huangyg123 / curricularface

CurricularFace(CVPR2020)

License: MIT License

Python 99.77% Shell 0.23%

curricularface's Introduction

CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, Feiyue Huang

This repository is the official PyTorch implementation of the paper CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition. (The work was accepted at CVPR 2020.)

We have also released a training framework for face recognition; see TFace for details.

Main requirements

torch == 1.1.0
torchvision == 0.3.0
tensorboardX == 1.7
bcolz == 1.2.1
Python 3

Usage

# To train the model:
sh train.sh
# To evaluate the model:
# (1) First download the val data from https://github.com/ZhaoJ9014/face.evoLVe.PyTorch.
# (2) Set the checkpoint dir in config.py.
sh evaluate.sh

You can change the experimental settings by modifying the parameters in config.py.

Model

The IR101 pretrained model can be downloaded from Baidu Cloud (link: https://pan.baidu.com/s/1bu-uocgSyFHf5pOPShhTyA, passwd: 5qa0) or Google Drive.

Result

The results of the released pretrained model are as follows:

Data	LFW	CFP-FP	CPLFW	AGEDB	CALFW	IJBB (TPR@FAR=1e-4)	IJBC (TPR@FAR=1e-4)
Result	99.80	98.36	93.13	98.37	96.05	94.86	96.15

The results are slightly different from the results in the paper because we replaced DataParallel with DistributedDataParallel and retrained the model.

Citing this repository

If you find this code useful in your research, please consider citing us:

@inproceedings{huang2020curricularface,
	title={CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition},
	author={Yuge Huang and Yuhan Wang and Ying Tai and Xiaoming Liu and Pengcheng Shen and Shaoxin Li and Jilin Li and Feiyue Huang},
	booktitle={CVPR},
	pages={1--8},
	year={2020}
}

Contacts

If you have any questions about our work, please do not hesitate to contact us by email.
Yuge Huang: [email protected]
Ying Tai: [email protected]

curricularface's Issues

Error when loading model

Hi, I ran into a problem when loading a trained model for testing. I trained with the distributed strategy on multiple GPUs on one machine. The error message is as follows:
File "evaluate.py", line 67, in <module>
    model_dict = torch.load(BACKBONE_RESUME_ROOT, map_location=torch.device('cuda'))
File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 580, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected -4925077422239741557 got 512.
It seems that something went wrong while saving the model, but no error message was raised at that time.
Thanks a lot!
PS: This does not happen every time. Sometimes the model loads correctly, and sometimes it fails.
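
One possible cause, offered as an assumption rather than a diagnosis: under distributed training, several processes may write the same checkpoint path at once and leave a partially written file, which would also explain why the failure is intermittent. A minimal sketch of a guard is shown below. The names `save_checkpoint`, `rank`, and `save_fn` are hypothetical and not part of this repository; `pickle.dump` stands in for `torch.save`, which accepts the same (object, file-like) arguments.

```python
import os
import pickle
import tempfile


def save_checkpoint(obj, path, rank, save_fn=pickle.dump):
    """Write a checkpoint from rank 0 only, via a temp file and an atomic rename.

    save_fn(obj, file_object) is a stand-in; torch.save could be passed instead.
    """
    if rank != 0:
        return False  # all other processes skip writing entirely
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            save_fn(obj, f)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Calling this with each process's own rank means only one writer ever touches the final path.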

Error in train.py about builtins.print

Hello, I get an error at line 51 of train.py.
The error message is: builtins.print = print_pass SyntaxError: invalid syntax. Can someone help me with this problem? Thanks a lot!
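
A likely explanation, stated as an assumption: the script was run with Python 2, where `print` is a statement and `builtins.print = ...` cannot be parsed (the README lists Python 3 as a requirement). Under Python 3 the same pattern is valid. The sketch below shows it in isolation; `silence_print_unless_master` is a hypothetical helper name, not a function from train.py.

```python
import builtins


def silence_print_unless_master(is_master):
    """Replace the global print with a no-op on non-master processes.

    Returns the original print so the caller can restore it.
    """
    original_print = builtins.print
    if not is_master:
        def print_pass(*args, **kwargs):
            pass  # swallow all output on worker processes
        builtins.print = print_pass
    return original_print
```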

How to prepare the data record dir?

Thanks for your generous sharing. But the README doesn't mention how to generate the RECORD_DIR.

An error occurs at mask = cos_theta > cos_theta_m with multiple GPUs

dimension mismatch

cos_theta = torch.mm(embbedings, kernel_norm)

When I run it, it says the sizes do not match. The code
cos_theta = torch.mm(embbedings, kernel_norm)
should be:
cos_theta = torch.mm(embbedings, kernel_norm.t())
But why is rank-1 only 0.002% when I test the model on MegaFace?

Some differences between the implementation and the paper

Hi,
When you update t, the paper uses 0.99 as alpha, but the code uses only 0.01. Which one is the right version? In my opinion, the code is right.

Why is s set to 64?

spawn.py problem

How can I solve this problem?

inference

Hello, author. First of all, thank you for sharing the project. I see that the project includes evaluation code but no prediction code, and I ran into a problem while writing a prediction script. I want to give the model an image and have it output the category from the value of l2_norm(backbone(ccropped.cuda())[0]).cpu() in utils, but I don't know how to go from that value to a category. Could you give me some ideas or a solution?

About "original_logits" and "outputs" issues

When calculating top1 and top5, why use "original_logits" instead of "outputs"? Although this does not affect the training process, I found that the top1 obtained with "original_logits" will be much larger than the top1 obtained with "outputs". This is exactly the opposite of my expectations. Why is that?
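
A plausible explanation, sketched under assumptions: if "outputs" are the logits after the margin is applied to the ground-truth class, the target logit is deliberately lowered during training, so predictions taken from "outputs" are penalized on exactly the correct class. The toy example below uses a single sample and a plain additive angular margin; `apply_margin` and `top1` are illustrative names, and the scale s is omitted because it does not change the argmax.

```python
import math


def apply_margin(cosines, label, m=0.5):
    """Return a copy of `cosines` with an additive angular margin on the target class only."""
    adjusted = list(cosines)
    theta = math.acos(cosines[label])
    adjusted[label] = math.cos(theta + m)  # smaller than cos(theta) while theta + m <= pi
    return adjusted


def top1(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda j: logits[j])


cosines = [0.70, 0.55, 0.10]  # toy cosine similarities; class 0 is the label
label = 0
with_margin = apply_margin(cosines, label)
```

In this toy case the unmodified cosines predict the correct class, while the margin-adjusted ones do not, which matches the direction of the gap described in the issue.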

The calculation of t is wrong

with torch.no_grad():
    self.t = target_logit.mean() * 0.01 + (1 - 0.01) * self.t
But the paper says: t(k) = αr(k) + (1 - α)t(k-1),
where t0 = 0 and α is the momentum parameter, set to 0.99.
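
To make the difference concrete, here is a small pure-Python sketch of both readings of the update. Which one the authors intended is not confirmed in this thread; one possible reading is that the paper's 0.99 describes the weight kept on the previous value, which is what the released code does. The function names and the toy batch means are illustrative.

```python
def update_t(t_prev, batch_mean, weight_on_new=0.01):
    """Exponential moving average: t_k = w * r_k + (1 - w) * t_{k-1}."""
    return weight_on_new * batch_mean + (1.0 - weight_on_new) * t_prev


def run(batch_means, weight_on_new):
    t = 0.0  # t_0 = 0, as stated in the paper
    for r in batch_means:
        t = update_t(t, r, weight_on_new)
    return t


noisy_means = [0.2, 0.8, 0.1, 0.9, 0.3]           # toy per-batch mean target cosines
t_code = run(noisy_means, weight_on_new=0.01)      # weights as in the released code
t_paper = run(noisy_means, weight_on_new=0.99)     # a literal reading of the formula with alpha = 0.99
```

With a weight of 0.01 on the new value, t rises slowly and smooths out batch noise; with 0.99, t essentially equals the most recent batch statistic.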

about class CurricularFace(nn.Module)

In this loss, you choose the hard examples with "mask = cos_theta > cos_theta_m" and "hard_example = cos_theta[mask]". However, cos_theta is always larger than cos_theta_m, so this code is useless and does not match your paper:

if cos(θyi + m) ≥ cosθj then
N(t,cosθj) = cosθj;
else
N(t,cosθj) = (t(k) + cosθj)cosθj ;
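
For reference, the rule quoted above can be written out for a single sample in pure Python. This is a sketch under assumptions: `cosines` holds the cosine similarity to every class, so the comparison is between each non-target cosine and the margin-adjusted target cosine, not between a value and its own margin-adjusted version. The function name is illustrative, and the fallback used when θ + m exceeds π is left out.

```python
import math


def modulate_negatives(cosines, label, t, m=0.5):
    """Apply N(t, cos θ_j) to the non-target classes of one sample."""
    cos_target_m = math.cos(math.acos(cosines[label]) + m)  # cos(θ_yi + m)
    out = []
    for j, c in enumerate(cosines):
        if j == label:
            out.append(cos_target_m)      # target class gets the margin
        elif cos_target_m >= c:
            out.append(c)                 # easy negative: unchanged
        else:
            out.append((t + c) * c)       # hard negative: re-weighted
    return out


cosines = [0.70, 0.55, 0.10]  # toy values; class 0 is the label
result = modulate_negatives(cosines, label=0, t=0.5)
```

Here class 1 is a hard negative (its cosine exceeds the margin-adjusted target) and class 2 is an easy one, so only class 1 is re-weighted.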

Different results on IJB-C

Hi Huang,

I really liked your new Curricular face paper especially the idea of automatic curriculum learning. I tried replicating your results on IJB-C dataset using the IR_101 pretrained model that you have supplied. Firstly, I have aligned the faces using the landmarks provided in the ArcFace repository. I then normalized these faces the same way you did for lfw and other datasets in the evaluate.py code. I used the same ArcFace evaluation code. However, the results I am getting are much lower than what is reported in the paper. Here are my results:

TAR @ FAR: [1:1 verification protocol]
1e-6 --> 6.34
1e-5 --> 15.76
1e-4 --> 55.29
1e-3 --> 75.83
1e-2 --> 84.34
1e-1 --> 91.17

These numbers seem buggy to me. Is the code you used to evaluate IJB-C the same as evaluate.py?

torch 1.2 error: one of the variables needed for gradient computation has been modified by an inplace operation

With torch 1.2, running 'train.sh' raises the following error at loss.backward():
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 5]], which is output 0 of ClampBackward, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)

Is this a torch version problem? Thanks

Equation (9) is inconsistent with your released code

code:
self.t = target_logit.mean() * 0.01 + (1 - 0.01) * self.t
paper:
t(k) = αr(k) + (1 − α)t(k−1)
α = 0.99,
r(k) = sum cosθyi
target_logit.mean() in your code is not the same as sum cosθyi in your paper.
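
The practical difference between the two reductions can be seen in a short sketch. Because t is later added to a cosine in (t + cos θj), a statistic that stays within [-1, 1] keeps the two terms on the same scale, whereas a raw sum grows with the batch size. This is an observation about the arithmetic only, not a statement of the authors' intent; `batch_statistic` is an illustrative name.

```python
def batch_statistic(target_cosines, reduce="mean"):
    """Reduce the per-sample target cosines of one batch to a single number."""
    total = sum(target_cosines)
    return total / len(target_cosines) if reduce == "mean" else total


small_batch = [0.5] * 4
large_batch = [0.5] * 512

mean_small = batch_statistic(small_batch, "mean")
mean_large = batch_statistic(large_batch, "mean")
sum_small = batch_statistic(small_batch, "sum")
sum_large = batch_statistic(large_batch, "sum")
```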

Model evaluation

Excuse me,
Why is the best threshold used in evaluation rather than a fixed threshold?
Shouldn't we use a fixed threshold in practical applications?
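
For context, verification benchmarks such as LFW conventionally use cross-validation: the threshold is chosen on the training folds and then applied unchanged to the held-out fold, so the reported accuracy is not tuned on the pairs it is measured on. Whether evaluate.py follows exactly this protocol is an assumption here. A minimal pure-Python sketch with similarity scores (higher means more likely the same person) and illustrative names:

```python
def accuracy(scores, is_same, threshold):
    """Fraction of pairs classified correctly at a given similarity threshold."""
    correct = sum((s >= threshold) == same for s, same in zip(scores, is_same))
    return correct / len(scores)


def best_threshold(scores, is_same, candidates):
    """First candidate threshold with the highest accuracy on the given pairs."""
    return max(candidates, key=lambda th: accuracy(scores, is_same, th))


def kfold_accuracy(scores, is_same, candidates, n_folds=2):
    """Pick the threshold on the training folds, then score the held-out fold."""
    n = len(scores)
    fold_accs = []
    for k in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == k]
        train_idx = [i for i in range(n) if i % n_folds != k]
        th = best_threshold([scores[i] for i in train_idx],
                            [is_same[i] for i in train_idx], candidates)
        fold_accs.append(accuracy([scores[i] for i in test_idx],
                                  [is_same[i] for i in test_idx], th))
    return sum(fold_accs) / n_folds


scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.85, 0.15]            # toy similarities
is_same = [True, True, False, False, True, False, True, False]  # toy ground truth
mean_acc = kfold_accuracy(scores, is_same, [0.0, 0.25, 0.5, 0.75, 1.0])
```

In deployment, a single threshold would typically be fixed once on a separate validation set, which is the "fixed threshold" the question refers to.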

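Training is slow

Thanks to the author for sharing. I tested the IR101 model on a private dataset (real-world surveillance scenes with multiple angles) and it works very well: 89%, versus only 83% for the ArcFace authors' model.

But when training on my own dataset with 4 GPUs, it is very slow, taking about 2.5 times as long as ArcFace.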

Image alignment and preprocessing for pre-trained model

I wonder, how to prepare the images in order to use the pre-trained model.

I guess that the images should be aligned and resized to 112x112, right? Your code also supports 224x224, but on what image size was the pre-trained net trained? I guess 112x112, because when I try 224x224, I get a 'RuntimeError: size mismatch'.
How exactly were alignment and resizing done? You refer to the val data in https://github.com/ZhaoJ9014/face.evoLVe.PyTorch. However, the cropped images of CFP (Version "Align_112x112") linked in the Data Zoo there are cropped differently from what the code of that repository produces. Can you give one or two examples of aligned and resized images used to train the pre-trained model?
The script utils.py contains the following function for preprocessing (after alignment and resizing):
ccrop = transforms.Compose([ de_preprocess, transforms.ToPILImage(), transforms.ToTensor(), transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) ])
with
def de_preprocess(tensor): return tensor * 0.5 + 0.5
It seems, that the validation data had already been normalized (mean, std), and here, you revert that to an image tensor, but then apply the same normalization again. So, I think, when performing inference on a single (aligned and resized) image, I should just use input_tensor = preprocess(croppedImg).unsqueeze(0) using this transform:
preprocess = transforms.Compose([ transforms.ToTensor(), transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) ])
Is this assumption right?
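
The arithmetic behind this assumption can be checked with scalars. With mean = std = 0.5, Normalize maps a [0, 1] pixel value to [-1, 1], and de_preprocess is its exact inverse, so applying de_preprocess followed by the same normalization returns the original normalized value (ignoring any 8-bit rounding introduced by converting to a PIL image in between). The `normalize` function below is a scalar stand-in for transforms.Normalize, not the torchvision implementation.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel operation of a mean/std normalization: (x - mean) / std."""
    return (x - mean) / std


def de_preprocess(x):
    """Inverse of normalize for mean = std = 0.5, as quoted from utils.py."""
    return x * 0.5 + 0.5


pixel = 0.8                       # a value in [0, 1], as produced by ToTensor
normalized = normalize(pixel)     # value the network would see
round_trip = normalize(de_preprocess(normalized))
```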

Your help is much appreciated.

I would also like to ask: the original training pictures are 250x250, so how do they become 112x112 for training?

For face recognition

Hello team,

Can the Curricular Face or ArcFace be used for face recognition in a single image?

why BatchNorm1d with affine=False?

BatchNorm1d is used in output_layer.
Why is BatchNorm1d used with affine=False?
thank you!

t cannot update correctly when using multiple GPUs

It seems that when training with multiple GPUs, t starts from 0 in every iteration. When training with a single GPU, t is updated correctly.

Error in evaluate.py

In utility.py, get_val_pair(path, name) raises an error:
carray = bcolz.carray(rootdir = os.path.join(path, name), mode = 'r')
FileNotFoundError: No such a file /public/share/dataset/Face/lfw/meta/sizes
Can someone help me with this? Thanks a lot
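
The path in the message suggests that the validation set was not found in the expected on-disk layout. As a hedged sketch, the check below lists which expected files are missing before any loading is attempted. The layout (a <name>/ directory containing meta/sizes, plus a <name>_list.npy file next to it) is an assumption based on the traceback above and on the face.evoLVe.PyTorch data format; `missing_val_files` is a hypothetical helper.

```python
import os


def missing_val_files(path, name):
    """Return the expected validation-set files that do not exist under `path`.

    Assumed layout: <path>/<name>/meta/sizes and <path>/<name>_list.npy.
    """
    expected = [
        os.path.join(path, name, "meta", "sizes"),
        os.path.join(path, name + "_list.npy"),
    ]
    return [p for p in expected if not os.path.isfile(p)]
```

An empty list means the assumed layout is present; otherwise the returned paths show what still has to be downloaded or extracted.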

prepare training data

Could you tell me how to prepare the training data and the 'refined_ms1m.txt'? Thanks.

About “seed” issues

Setting the seed in the main function seems to have no effect: each training run is different and the results cannot be reproduced. Should the seed be set in the main_worker function instead?
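
When worker processes are started fresh rather than forked, random-number state set in the parent is not carried over, so seeding inside each worker is the usual remedy. The sketch below shows the idea with the standard library only; `seed_worker` and the per-rank offset are illustrative choices, not the repository's code.

```python
import random


def seed_worker(base_seed, rank):
    """Seed the RNGs inside a worker process; call this first in the worker entry point."""
    seed = base_seed + rank  # distinct but reproducible stream per process
    random.seed(seed)
    # With NumPy and PyTorch installed, np.random.seed(seed) and
    # torch.manual_seed(seed) would be called here as well.
    return seed
```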

What does target_logit mean in the code?

CurricularFace/head/metrics.py /
target_logit = cos_theta[torch.arange(0, embbedings.size(0)), label].view(-1, 1)

Performance and backbone network

Thanks for sharing a great work. In your code, you provided some backbone network

support: ['ResNet_50', 'ResNet_101', 'ResNet_152', 'IR_50', 'IR_101', 'IR_152', 'IR_SE_50', 'IR_SE_101', 'IR_SE_152']

In the Model Zoo, insightface provides LResNet100E-IR:

Method	LFW(%)	CFP-FP(%)	AgeDB-30(%)	MegaFace(%)
Ours	99.77	98.27	98.28	98.47

This is the result of training IR-101 from scratch with your settings:

Evaluation: LFW Acc: 0.9976666666666667, CFP_FP Acc: 0.983, AgeDB Acc: 0.9818333333333333, CPLFW Acc: 0.9301666666666666 CALFW Acc: 0.9608333333333332

And this is your report

Data	LFW	CFP-FP	CPLFW	AGEDB	CALFW	IJBB (TPR@FAR=1e-4)	IJBC (TPR@FAR=1e-4)
Result	99.80	98.36	93.13	98.37	96.05	94.86	96.15

I have some questions after reading your code:

Do you use augmentation? (I found that only RandomHorizontalFlip is applied.) The insightface team uses augmentations such as flip, ColorJitterAug, compress_aug... https://github.com/deepinsight/insightface/blob/3866cd77a6896c934b51ed39e9651b791d78bb57/recognition/image_iter.py#L207
I am using 4 GPUs with a batch size of 700 per GPU. My performance is lower than your report. Do you think the number of GPUs is the reason (you used 8 GPUs)?
Is your IR_101 the same as LResNet100E-IR in terms of FLOPs and parameter count? I found that you save the backbone and head separately, while insightface saves them into one model. Is there any difference?
Have you measured the inference speed of IR_101? It feels much slower than MXNet.

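Which alignment method did you use?

I went through the issues and the alignment method has never been answered directly. Alignment clearly affects accuracy, and this is why people in the issues cannot reproduce the results.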

IR101 acc

Hi,

Thanks for sharing this code.

I have tried to test IR101 pretrained model with evaluate.sh, but I got this result:

============================================================
Evaluation: LFW Acc: 0.6561666666666667, CFP_FP Acc: 0.6135714285714285, AgeDB Acc: 0.5056666666666667, CPLFW Acc: 0.5356666666666666 CALFW Acc: 0.5483333333333333

what's wrong?

Thanks!!!

Output size

what would be the output size of curricularface? would it return a scalar value or a tensor of dimension [batch_size,no_of_classes]?

Run Time Error

Hi,

Thanks for sharing this code.

I have tried to train this repo with my data, but I got this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation (both ArcFace and CurricularFace)
Any suggestions?

BTW, the version of tqdm in requirements.txt seems wrong (no 4.31.2)

maybe:
4.31.1, 4.32.0, 4.32.1, 4.32.2

Thanks!!!

expect you to release source code ?

Hi,
Just wondering when you are planning to release the source code and data.

Thanks

is it any wrong in calculation of derivatives?

Why are the derivatives in equation (8) divided by sin(θj)? Where does this come from?
I am referring to this paper (CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition).
Can anyone help me understand? I get the same result but without sin(θyi) when j = yi, and when j ≠ yi I get an extra sin(θj) factor multiplying the existing result.
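
One way to see where a sine ratio can come from is the chain rule, taking the derivative with respect to cos θ rather than θ. The derivation below is a sketch under stated assumptions: the target logit is cos(θ_{y_i} + m), the hard-negative logit is (t + cos θ_j) cos θ_j as quoted elsewhere in these issues, and the scale s is omitted. It is not a reproduction of the paper's equation (8).

```latex
\frac{\partial \cos(\theta_{y_i} + m)}{\partial \cos\theta_{y_i}}
  = \frac{\partial \cos(\theta_{y_i} + m) / \partial \theta_{y_i}}
         {\partial \cos\theta_{y_i} / \partial \theta_{y_i}}
  = \frac{-\sin(\theta_{y_i} + m)}{-\sin\theta_{y_i}}
  = \frac{\sin(\theta_{y_i} + m)}{\sin\theta_{y_i}},
\qquad
\frac{\partial \big[(t + \cos\theta_j)\cos\theta_j\big]}{\partial \cos\theta_j}
  = 2\cos\theta_j + t .
```

Under these assumptions the division by a sine appears only for the target class; the hard-negative term is a polynomial in cos θ_j and carries no sine factor.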

can not perform same as your pretrained res101

hello

Evaluation: LFW Acc: 0.9968333333333332, CFP_FP Acc: 0.8959999999999999, AgeDB Acc: 0.9726666666666667

This is my training result after 20 epochs, with hyperparameters as you set them. The accuracy on CFP-FP is quite bad...

I got some issue

As you said, to evaluate the model, I should first download the val data from https://github.com/ZhaoJ9014/face.evoLVe.PyTorch.
But which dataset should I download?
Thanks.

self.t = target_logit.mean() * 0.01 + (1 - 0.01) * self.t

Hello! In the paper, t should be calculated as:

t(k) = αr(k) + (1 - α)t(k-1), with α = 0.99

That would correspond to "self.t = target_logit.mean() * 0.99 + (1 - 0.99) * self.t" in code, but I found "self.t = target_logit.mean() * 0.01 + (1 - 0.01) * self.t" in your code. I don't know why.
Thanks.

batch_norm parameters issue

On line 104 of train.py, you write "separate batch_norm parameters from others; do not do weight decay for batch_norm parameters to improve the generalizability"

CurricularFace/train.py

Line 104 in 8b2f473

 backbone_paras_only_bn, backbone_paras_wo_bn = separate_irse_bn_paras(backbone) # separate batch_norm parameters from others; do not do weight decay for batch_norm parameters to improve the generalizability 

Is this a conclusion you reached from experiments, or the conclusion of a particular paper? Can you explain the reason for it?

target_logit's shape

target_logit = cos_theta[torch.arange(0, embbedings.size(0)), label].view(-1, 1)

By this line, it seems the shape is [embbedings.size(0) * embbedings.size(0), 1]. Is it true?
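
When two index sequences of equal length are used together, PyTorch pairs them element by element, so the expression selects one entry per row and yields B values, which view(-1, 1) then reshapes to [B, 1]. A pure-Python analogue with nested lists, where `gather_target` is an illustrative name:

```python
def gather_target(cos_theta, labels):
    """Pure-Python analogue of cos_theta[torch.arange(B), labels].view(-1, 1)."""
    # Paired indices select (0, labels[0]), (1, labels[1]), ... : one value per row.
    picked = [cos_theta[i][labels[i]] for i in range(len(labels))]  # shape [B]
    return [[v] for v in picked]                                     # shape [B, 1]


cos_theta = [[0.9, 0.1, 0.2],
             [0.3, 0.8, 0.0]]   # B = 2 samples, 3 classes
labels = [0, 1]
target_logit = gather_target(cos_theta, labels)
```

So target_logit holds, for each sample, the cosine between its embedding and the weight vector of its own ground-truth class.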

final_target_logit = torch.where(target_logit > self.threshold, cos_theta_m, target_logit - self.mm)

In MV-softmax, the released code uses easy_margin as follows:

final_gt = torch.where(gt > 0, cos_theta_m, gt)

but the official ArcFace implementation does not use easy_margin:

final_gt =  cos(θ + m)

I tried MV-softmax on my own dataset without easy_margin, but it diverged (NaN). Using easy_margin fixed the divergence.

Compared with MV-softmax, there are two differences. The first is the threshold: CurricularFace uses self.threshold = math.cos(math.pi - m), which is understandable. The second is final_target_logit = torch.where(target_logit > self.threshold, cos_theta_m, target_logit - self.mm). What is the intuition behind target_logit - self.mm, where self.mm = math.sin(math.pi - m) * m?
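
One common interpretation, not confirmed by the authors in this thread: cos(θ + m) stops decreasing once θ + m passes π, so for those angles the code switches to cos θ minus a constant, which keeps the target logit decreasing as θ grows. The sketch below compares the two for angles from 0 to π, using the threshold and mm definitions quoted above; the function names are illustrative.

```python
import math


def target_with_margin(cos_theta, m=0.5):
    """cos(θ + m) while θ + m <= π, otherwise the fallback cos θ - m * sin(π - m)."""
    threshold = math.cos(math.pi - m)
    mm = math.sin(math.pi - m) * m
    if cos_theta > threshold:
        return math.cos(math.acos(cos_theta) + m)
    return cos_theta - mm


def naive_margin(cos_theta, m=0.5):
    """cos(θ + m) applied everywhere, without any fallback."""
    return math.cos(math.acos(cos_theta) + m)


# Cosines for θ evenly spaced from 0 to π (θ increasing, cosine decreasing).
cosines = [math.cos(math.pi * i / 200) for i in range(201)]
with_fallback = [target_with_margin(c) for c in cosines]
naive = [naive_margin(c) for c in cosines]
```

With the fallback, the adjusted logit keeps falling as the angle grows; without it, the value turns back up near θ = π, which would reward the worst-aligned samples.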

Cannot reproduce the results on IJBB and IJBC

I used the pretrained IR101 model and the IJB evaluation code from insightface. Following are the results I got:

IJB-B TAR@FAR:
1e-6 -> 41.76%
1e-5 -> 69.81%
1e-4 -> 87.14%
1e-3 -> 93.27%

IJB-C TAR@FAR:
1e-6 -> 62.64%
1e-5 -> 75.46%
1e-4 -> 87.53%
1e-3 -> 94.01%

There is no problem if I use the pretrained models from face.evoLVe.PyTorch. Would you please share the evaluation code on IJB-B and IJB-C?

A question about evaluate

Hello, I would like to ask why the evaluate file uses only the backbone and not the head.
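
A general explanation, offered as background rather than as the authors' answer: a margin-based head holds one weight vector per training identity, so it can only classify among the identities seen in training, whereas verification benchmarks compare pairs of unseen people using the backbone's embeddings. A minimal pure-Python sketch of such a comparison follows; the function names and the threshold value are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def same_identity(emb_a, emb_b, threshold):
    """Verification needs only backbone embeddings; `threshold` is a hypothetical value."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```

For example, same_identity(embedding_1, embedding_2, threshold=0.5) compares two backbone outputs directly, with no classifier weights involved.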