curricularface's Introduction

CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, Feiyue Huang

This repository is the official PyTorch implementation of the paper CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition, which has been accepted to CVPR 2020.

We have released a training framework for face recognition, TFace; please refer to it for details.

Main requirements

  • torch == 1.1.0
  • torchvision == 0.3.0
  • tensorboardX == 1.7
  • bcolz == 1.2.1
  • Python 3

Usage

# To train the model:
sh train.sh
# To evaluate the model:
# (1) First download the validation data from https://github.com/ZhaoJ9014/face.evoLVe.PyTorch.
# (2) Set the checkpoint directory in config.py.
sh evaluate.sh

You can change the experimental settings by modifying the parameters in config.py.

Model

The IR101 pretrained model can be downloaded from Baidu Cloud (link: https://pan.baidu.com/s/1bu-uocgSyFHf5pOPShhTyA, password: 5qa0) or Google Drive.

Result

The results of the released pretrained model are as follows:

Benchmark   LFW    CFP-FP  CPLFW  AgeDB  CALFW  IJB-B (TPR@FAR=1e-4)  IJB-C (TPR@FAR=1e-4)
Result (%)  99.80  98.36   93.13  98.37  96.05  94.86                 96.15

These results differ slightly from those in the paper because we replaced DataParallel with DistributedDataParallel and retrained the model.

Citing this repository

If you find this code useful in your research, please consider citing us:

@inproceedings{huang2020curricularface,
	title={CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition},
	author={Yuge Huang and Yuhan Wang and Ying Tai and Xiaoming Liu and Pengcheng Shen and Shaoxin Li and Jilin Li and Feiyue Huang},
	booktitle={CVPR},
	pages={1--8},
	year={2020}
}

Contacts

If you have any questions about our work, please do not hesitate to contact us by email. Yuge Huang: [email protected], Ying Tai: [email protected]

curricularface's People

Contributors

huangyg123

curricularface's Issues

Error when loading model

Hi, I ran into a problem when loading a trained model for testing. I trained it with a distributed strategy on multiple GPUs on one machine. The error message is as follows:

  File "evaluate.py", line 67, in <module>
    model_dict = torch.load(BACKBONE_RESUME_ROOT, map_location=torch.device('cuda'))
  File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 580, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected -4925077422239741557 got 512

It seems that something went wrong when the model was saved, but no error was reported at that time.
Thanks a lot!
PS: This does not happen every time. Sometimes the model loads correctly, and sometimes it fails.
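One plausible cause of "storage has wrong size", not confirmed in this issue, is a truncated or partially written checkpoint. If every DDP process saves to the same path, or the file is read while it is still being written, loading can fail only some of the time, which would match the behavior described. The sketch below saves from rank 0 only and writes through a temporary file. `save_checkpoint` and its arguments are hypothetical, not functions from this repository.

```python
import os

import torch


def save_checkpoint(state_dict, path, rank):
    """Write a checkpoint from rank 0 only, via a temporary file.

    Assumptions (not from the repo): `rank` is this process's distributed
    rank, and all ranks would otherwise save to the same `path`.
    """
    if rank != 0:
        return  # other ranks skip saving so they cannot race on the same file
    tmp_path = path + ".tmp"
    torch.save(state_dict, tmp_path)
    # On POSIX filesystems os.replace renames atomically, so a reader sees
    # either the previous file or the complete new one, never a partial write.
    os.replace(tmp_path, path)


# Usage sketch: call on every rank after an epoch; only rank 0 writes.
# save_checkpoint(backbone.state_dict(), "backbone_epoch_10.pth", rank)
```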

Error in train.py about builtins.print

Hello, I get an error in train.py at line 51. The error message is:

builtins.print = print_pass
SyntaxError: invalid syntax

Can someone help me with this problem? Thanks a lot!
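The failing line belongs to a common pattern for silencing `print()` on non-master processes during distributed training. One likely cause of this particular error is running train.py with Python 2. There `print` is a keyword, so `builtins.print = ...` is a syntax error, and the module is named `__builtin__` rather than `builtins`. The requirements above list Python 3. Here is a minimal sketch of the pattern, assuming Python 3 and a hypothetical `rank` argument:

```python
import builtins


def suppress_print_unless_master(rank):
    """Replace print() with a no-op on every process except rank 0 (Python 3 only)."""
    if rank != 0:
        def print_pass(*args, **kwargs):
            pass  # swallow output from non-master ranks

        builtins.print = print_pass
```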

cos_theta = torch.mm(embbedings, kernel_norm)

When I run it, I get a size-mismatch error. The line
cos_theta = torch.mm(embbedings, kernel_norm)
should be:
cos_theta = torch.mm(embbedings, kernel_norm.t())
However, when I test the model on MegaFace, the rank-1 accuracy is only 0.002%. Why?
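Whether `.t()` is needed depends only on how the kernel tensor is laid out. If the kernel has shape [in_features, out_features] and is normalized per column, `torch.mm(embbedings, kernel_norm)` already yields [batch, num_classes]. A size mismatch suggests the kernel was built with the other layout, or with an `in_features` that does not match the embedding size. That is an assumption, since the issue does not show how the head was constructed. The sketch below uses a hypothetical `cosine_logits` helper to show both layouts:

```python
import torch
import torch.nn.functional as F


def cosine_logits(embeddings, kernel, kernel_is_out_by_in=False):
    """Cosine similarity between L2-normalized embeddings and class weights.

    kernel_is_out_by_in=False: kernel has shape [in_features, out_features].
    kernel_is_out_by_in=True:  kernel has shape [out_features, in_features]
    (the nn.Linear layout), so it is transposed before torch.mm.
    """
    embeddings = F.normalize(embeddings, dim=1)        # unit-length rows
    if kernel_is_out_by_in:
        kernel_norm = F.normalize(kernel, dim=1).t()   # normalize rows, then -> [in, out]
    else:
        kernel_norm = F.normalize(kernel, dim=0)       # unit-length columns
    return torch.mm(embeddings, kernel_norm)           # [batch, out_features]
```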

Inference

Hello, and thank you for sharing the project. I see that it includes evaluation but no prediction script, and I have run into a problem while writing one. I want to give the model an image and have it output the identity (category) from the value of l2_norm(backbone(ccropped.cuda())[0]).cpu() in utils, but I don't know how to proceed. Could you give me some ideas or a solution?
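The backbone produces an embedding, not a class label. Identity is therefore usually obtained by comparing the query embedding (for example, the value of `l2_norm(backbone(ccropped.cuda())[0]).cpu()`) with embeddings of enrolled faces. Here is a minimal sketch. `identify`, the gallery, and `threshold` are all hypothetical, and the threshold would have to be tuned on your own data:

```python
import torch
import torch.nn.functional as F


def identify(query_emb, gallery_embs, gallery_ids, threshold):
    """Return (identity, score) of the most similar gallery face, or (None, score)."""
    q = F.normalize(query_emb.view(1, -1), dim=1)  # [1, D], unit length
    g = F.normalize(gallery_embs, dim=1)           # [N, D], unit length
    sims = torch.mm(q, g.t()).view(-1)             # cosine similarity to each gallery face
    best = int(torch.argmax(sims))
    score = float(sims[best])
    if score < threshold:
        return None, score                         # nobody similar enough: unknown face
    return gallery_ids[best], score
```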

About "original_logits" and "outputs" issues

When calculating top-1 and top-5 accuracy, why use "original_logits" instead of "outputs"? Although this does not affect training, I found that top-1 computed from "original_logits" is much higher than top-1 computed from "outputs". This is the opposite of what I expected. Why?
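During training, the margin lowers the target logit. In CurricularFace, hard negatives can also be scaled up. Both changes deliberately make the training objective harder, so argmax accuracy computed on the margin-adjusted `outputs` is expected to be lower than on the unmodified logits. The sketch below shows the effect using a plain additive margin on the target logit as a stand-in. It is not the CurricularFace formula, and the helper names are hypothetical:

```python
import torch


def top1(logits, label):
    """Fraction of rows whose highest logit is the true class."""
    return (logits.argmax(dim=1) == label).float().mean().item()


def subtract_target_margin(logits, label, margin):
    """Illustrative additive margin on the target logit only (CosFace-style)."""
    out = logits.clone()
    idx = torch.arange(logits.size(0))
    out[idx, label] -= margin  # lowering the target can only turn hits into misses
    return out


logits = torch.tensor([[0.6, 0.5], [0.9, 0.1]])
label = torch.tensor([0, 0])
print(top1(logits, label), top1(subtract_target_margin(logits, label, 0.2), label))
```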

Calculation of t is wrong

with torch.no_grad():
    self.t = target_logit.mean() * 0.01 + (1 - 0.01) * self.t
but your paper gives: t^(k) = α r^(k) + (1 − α) t^(k−1),
where t^(0) = 0 and α is the momentum parameter, set to 0.99.
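The two readings differ only in which term gets the weight 0.99. The code keeps 99% of the running value and mixes in 1% of the current batch mean, which matches treating 0.99 as the weight on the running value. The formula read literally with α = 0.99 would instead put 99% on the current batch. A small sketch with a hypothetical `ema_update` helper makes the difference concrete after one step from t^(0) = 0:

```python
def ema_update(t_prev, r_k, weight_on_new):
    """One exponential-moving-average step: weight_on_new * r_k + (1 - weight_on_new) * t_prev."""
    return weight_on_new * r_k + (1 - weight_on_new) * t_prev


t0 = 0.0
r1 = 0.5  # e.g. the mean target logit of the first batch
as_in_code = ema_update(t0, r1, weight_on_new=0.01)   # the self.t update in the repo
as_written = ema_update(t0, r1, weight_on_new=0.99)   # the formula read literally with alpha = 0.99
```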

About class CurricularFace(nn.Module)

In this loss, you select hard examples with "mask = cos_theta > cos_theta_m" and "hard_example = cos_theta[mask]". However, cos_theta is always larger than cos_theta_m, so this code has no effect and does not match your paper:

if cos(θ_yi + m) ≥ cos θ_j then
    N(t, cos θ_j) = cos θ_j;
else
    N(t, cos θ_j) = (t^(k) + cos θ_j) cos θ_j;
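`cos_theta_m` here is cos(θ_yi + m) of each sample's target class, with shape [B, 1]. The comparison `cos_theta > cos_theta_m` therefore checks every logit in a row against that row's own threshold. It is not true in general: a negative with θ_j > θ_yi + m is excluded. The sketch below is a vectorized version of the paper's rule, written independently of the repository. The function name and arguments are hypothetical:

```python
import torch


def curricular_modulation(cos_theta, label, m, t):
    """Sketch of N(t, cos θ_j) from the paper, applied to a [B, C] cosine matrix."""
    idx = torch.arange(cos_theta.size(0))
    target = cos_theta[idx, label].clamp(-1 + 1e-7, 1 - 1e-7).view(-1, 1)  # cos θ_yi, [B, 1]
    target_m = torch.cos(torch.acos(target) + m)                           # cos(θ_yi + m), [B, 1]
    hard = cos_theta > target_m            # row-wise comparison via broadcasting, [B, C]
    out = torch.where(hard, (t + cos_theta) * cos_theta, cos_theta)
    out[idx, label] = target_m.view(-1)    # the target logit becomes cos(θ_yi + m)
    return out
```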

Different results on IJB-C

Hi Huang,

I really liked your new CurricularFace paper, especially the idea of automatic curriculum learning. I tried to replicate your results on the IJB-C dataset using the IR_101 pretrained model you supplied. First, I aligned the faces using the landmarks provided in the ArcFace repository. I then normalized these faces the same way you did for LFW and the other datasets in evaluate.py, and I used the same ArcFace evaluation code. However, my results are much lower than those reported in the paper. Here are my results:

TAR @ FAR: [1:1 verification protocol]
1e-6 --> 6.34
1e-5 --> 15.76
1e-4 --> 55.29
1e-3 --> 75.83
1e-2 --> 84.34
1e-1 --> 91.17

These numbers seem buggy to me. Is the code you used to evaluate IJB-C the same as evaluate.py?
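When comparing numbers like these, it helps to be explicit about how TAR@FAR is computed. The threshold is set from the impostor (different-identity) scores so that at most a fraction FAR of them are accepted, and TAR is the fraction of genuine scores above that threshold. The sketch below is simplified and is not the evaluation code used by the authors or by ArcFace; `tar_at_far` is hypothetical:

```python
import numpy as np


def tar_at_far(genuine, impostor, far):
    """Return (TAR, threshold) such that at most `far` of impostor scores exceed the threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))[::-1]  # highest scores first
    n_false_accepts = int(np.floor(far * impostor.size))         # allowed false accepts
    if n_false_accepts >= impostor.size:
        threshold = -np.inf                                       # every pair is accepted
    else:
        threshold = impostor[n_false_accepts]                     # accept only scores above this
    return float(np.mean(genuine > threshold)), threshold
```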

torch 1.2 error: one of the variables needed for gradient computation has been modified by an inplace operation

With torch 1.2, running train.sh raises the following error at loss.backward():
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 5]], which is output 0 of ClampBackward, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)

Is this a torch version problem? Thanks.
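The message says that the output of a `clamp` was saved by a later operation for the backward pass and then modified in place. Below is a toy reproduction, not the repository's code. `pow()` saves `cos_theta`, and a later in-place write into `cos_theta` invalidates it. The same pattern produces this error in current PyTorch releases. Writing into a `clone()`, or using out-of-place ops such as `torch.where`, avoids it. `margin_logits_inplace` is hypothetical:

```python
import math

import torch


def margin_logits_inplace(embeddings, kernel, label, m=0.5, safe=True):
    """Toy margin head: writes cos(theta_y + m) into the target column in place."""
    cos_theta = torch.mm(embeddings, kernel).clamp(-1, 1)  # output of ClampBackward
    sin_theta = torch.sqrt(1.0 - cos_theta.pow(2))         # pow() saves cos_theta for backward
    idx = torch.arange(cos_theta.size(0))
    target = cos_theta[idx, label]                         # read the targets before editing
    target_m = target * math.cos(m) - sin_theta[idx, label] * math.sin(m)  # cos(theta + m)
    if safe:
        cos_theta = cos_theta.clone()  # edit a copy; the tensor saved by pow() stays untouched
    cos_theta[idx, label] = target_m   # in-place write bumps the tensor's version counter
    return cos_theta
```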

Model evaluation

Excuse me,
why is the best threshold used in evaluation rather than a fixed threshold?
Shouldn't a fixed threshold be used in practical applications?
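LFW-style benchmarks are commonly reported with 10-fold cross-validation. The threshold is chosen on nine folds and accuracy is measured on the held-out fold, so the reported accuracy is not tuned on the pairs it is scored on. Whether evaluate.py follows exactly this protocol is an assumption here. For deployment, a single threshold fixed on your own validation data is indeed what you would use. The minimal sketch below uses cosine-similarity scores and hypothetical helper names:

```python
import numpy as np


def accuracy_at(threshold, scores, same):
    """Fraction of pairs classified correctly when 'same person' means score > threshold."""
    return float(np.mean((scores > threshold) == same))


def kfold_accuracy(scores, same, n_folds=10, thresholds=np.arange(-1.0, 1.0, 0.01)):
    """Pick the best threshold on n_folds - 1 folds, score it on the held-out fold, average."""
    indices = np.arange(len(scores))
    accuracies = []
    for test_idx in np.array_split(indices, n_folds):
        train_idx = np.setdiff1d(indices, test_idx)
        best = max(thresholds, key=lambda t: accuracy_at(t, scores[train_idx], same[train_idx]))
        accuracies.append(accuracy_at(best, scores[test_idx], same[test_idx]))
    return float(np.mean(accuracies))
```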

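Slow training speed

Thanks to the authors for sharing. I tested the IR101 model on a private dataset (real-world surveillance scenes with multiple viewing angles), and the results are excellent: 89%, versus only 83% for the ArcFace authors' model.

However, when training on my own dataset with 4 GPUs, training is very slow, taking about 2.5 times as long as ArcFace.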

Image alignment and preprocessing for pre-trained model

I wonder how to prepare images for use with the pre-trained model.

  1. I guess the images should be aligned and resized to 112x112, right? Your code also supports 224x224, but which image size was the pre-trained net trained on? I guess 112x112, because when I try 224x224, I get a 'RuntimeError: size mismatch'.

  2. How exactly were alignment and resizing done? You refer to the val data in https://github.com/ZhaoJ9014/face.evoLVe.PyTorch. However, the cropped CFP images (version "Align_112x112") linked in the Data Zoo there are cropped differently from what that repository's code produces. Can you give one or two examples of the aligned and resized images used to train the pre-trained model?

  3. The script utils.py contains the following function for preprocessing (after alignment and resizing):

      ccrop = transforms.Compose([
          de_preprocess,
          transforms.ToPILImage(),
          transforms.ToTensor(),
          transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
      ])

    with

      def de_preprocess(tensor):
          return tensor * 0.5 + 0.5

    It seems that the validation data had already been normalized (mean, std). Here you revert it to an image tensor but then apply the same normalization again. So I think that, when performing inference on a single (aligned and resized) image, I should just use input_tensor = preprocess(croppedImg).unsqueeze(0) with this transform:

      preprocess = transforms.Compose([
          transforms.ToTensor(),
          transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
      ])

    Is this assumption right?

Your help is much appreciated.
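For a single aligned and resized image, the proposed `preprocess` scales pixels to [0, 1] and then maps them to [-1, 1]. The sketch below does the same with plain PyTorch. The hypothetical `preprocess_aligned` is equivalent to `ToTensor()` followed by `Normalize([0.5]*3, [0.5]*3)` for an HWC uint8 array. It assumes the image is already aligned to 112x112 and uses the same channel order as the training data, which this thread does not settle:

```python
import numpy as np
import torch


def preprocess_aligned(img_hwc_uint8):
    """HWC uint8 image -> [1, 3, H, W] float tensor with values in [-1, 1]."""
    x = torch.from_numpy(np.ascontiguousarray(img_hwc_uint8))  # H x W x 3, uint8
    x = x.permute(2, 0, 1).float() / 255.0                    # 3 x H x W in [0, 1] (like ToTensor)
    x = (x - 0.5) / 0.5                                       # [-1, 1] (like Normalize(0.5, 0.5))
    return x.unsqueeze(0)                                     # add batch dimension
```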

For face recognition

Hello team,

Can CurricularFace or ArcFace be used to recognize faces in a single image?

Error in evaluate.py

In utility.py, get_val_pair(path, name) fails with the following error:
    carray = bcolz.carray(rootdir = os.path.join(path, name), mode = 'r')
FileNotFoundError: No such a file /public/share/dataset/Face/lfw/meta/sizes
Can someone help me with this? Thanks a lot.
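bcolz opens `os.path.join(path, name)` as an on-disk carray directory and, as the error shows, expects a `meta/sizes` file inside it. The error therefore means that directory is missing or is not an extracted bcolz array. A common cause is `path` pointing one level too high or too low. A small pre-flight check with a hypothetical `check_val_dir` helper makes the expected layout explicit:

```python
import os


def check_val_dir(path, name):
    """Verify that os.path.join(path, name) looks like a bcolz carray directory."""
    rootdir = os.path.join(path, name)
    sizes_file = os.path.join(rootdir, "meta", "sizes")
    if not os.path.isfile(sizes_file):
        raise FileNotFoundError(
            f"{sizes_file} is missing: check that the '{name}' val data was "
            f"downloaded and extracted directly under {path}"
        )
    return rootdir
```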

prepare training data

Could you tell me how to prepare the training data and the 'refined_ms1m.txt'? Thanks.

About “seed” issues

Setting the seed in the main function seems to have no effect: each training run is different, and the results cannot be reproduced. Should the seed be set in the main_worker function instead?
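If main_worker is launched through torch.multiprocessing.spawn (an assumption based on the function name), each worker is a fresh process and does not inherit RNG state set in main. The seed therefore has to be set inside main_worker. Below is a sketch with a hypothetical `seed_everything` helper. Bit-exact reproducibility on GPU may additionally require deterministic cuDNN settings, which can slow training:

```python
import random

import numpy as np
import torch


def seed_everything(seed):
    """Seed Python, NumPy and PyTorch RNGs; call at the top of main_worker in every process."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)            # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # disable autotuning, which may pick different kernels per run
```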

Performance and backbone network

Thanks for sharing great work. Your code provides several backbone networks:

support: ['ResNet_50', 'ResNet_101', 'ResNet_152', 'IR_50', 'IR_101', 'IR_152', 'IR_SE_50', 'IR_SE_101', 'IR_SE_152']

In its Model Zoo, insightface provides LResNet100E-IR:

Method  LFW(%)  CFP-FP(%)  AgeDB-30(%)  MegaFace(%)
Ours    99.77   98.27      98.28        98.47

This is my result from training IR-101 from scratch with your settings:

Evaluation: LFW Acc: 0.9976666666666667, CFP_FP Acc: 0.983, AgeDB Acc: 0.9818333333333333, CPLFW Acc: 0.9301666666666666 CALFW Acc: 0.9608333333333332

And this is your report:

Benchmark   LFW    CFP-FP  CPLFW  AgeDB  CALFW  IJB-B (TPR@FAR=1e-4)  IJB-C (TPR@FAR=1e-4)
Result (%)  99.80  98.36   93.13  98.37  96.05  94.86                 96.15

I have some questions after reading your code:

  1. Do you use augmentation? (I found that only RandomHorizontalFlip is applied.) The insightface team used augmentations such as flip, ColorJitterAug, and compress_aug: https://github.com/deepinsight/insightface/blob/3866cd77a6896c934b51ed39e9651b791d78bb57/recognition/image_iter.py#L207

  2. I am using 4 GPUs with a batch size of 700 per GPU, and my performance is lower than your reported results. Do you think the number of GPUs is the reason (you used 8 GPUs)?

  3. Is your IR_101 the same as LResNet100E-IR in terms of FLOPs and number of parameters? I also found that you save the backbone and head separately, while insightface saves them in one model. Is there any difference?

  4. Have you measured the inference speed of IR_101? It feels much slower than MXNet.
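For questions 3 and 4, parameter counts and forward-pass latency can be measured directly in PyTorch. The sketch below uses a stand-in model. To measure the real network, plug in the IR_101 backbone from this repo; its constructor is not shown here, so none is assumed. FLOPs need a separate profiler and are not computed here:

```python
import time

import torch
import torch.nn as nn


def count_params(model):
    """Total number of parameters (trainable or not)."""
    return sum(p.numel() for p in model.parameters())


@torch.no_grad()
def mean_forward_time(model, example, warmup=3, iters=10):
    """Average seconds per forward pass on CPU; on GPU, call torch.cuda.synchronize() before each clock read."""
    model.eval()
    for _ in range(warmup):
        model(example)  # warm-up runs are excluded from the timing
    start = time.perf_counter()
    for _ in range(iters):
        model(example)
    return (time.perf_counter() - start) / iters


stand_in = nn.Linear(4, 3)  # placeholder for the real backbone
print(count_params(stand_in), mean_forward_time(stand_in, torch.randn(1, 4)))
```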

IR101 acc

Hi,

Thanks for sharing this code.

I tested the IR101 pretrained model with evaluate.sh, but got this result:

============================================================
Evaluation: LFW Acc: 0.6561666666666667, CFP_FP Acc: 0.6135714285714285, AgeDB Acc: 0.5056666666666667, CPLFW Acc: 0.5356666666666666 CALFW Acc: 0.5483333333333333

What could be wrong?

Thanks!!!
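Accuracies near chance on several benchmarks usually mean that the weights were not actually loaded, or that the inputs are preprocessed differently from training. The thread does not say which. One frequent cause, assumed here, is a checkpoint saved from a DistributedDataParallel-wrapped model, whose keys start with `module.`, being loaded into a plain backbone without strict key checking. Here is a sketch of a hypothetical `strip_module_prefix` helper:

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Drop the prefix DataParallel/DistributedDataParallel adds to every key."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Usage sketch (backbone and checkpoint_path are placeholders):
# state = torch.load(checkpoint_path, map_location="cpu")
# backbone.load_state_dict(strip_module_prefix(state), strict=True)  # strict=True reports key mismatches
```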

Output size

What is the output size of CurricularFace? Does it return a scalar or a tensor of shape [batch_size, num_classes]?
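A margin head of this kind typically returns logits of shape [batch_size, num_classes]. The scalar comes from the cross-entropy loss applied to those logits. The simplified sketch below computes cosine logits scaled by a hypothetical s = 64, without the CurricularFace margin itself:

```python
import torch
import torch.nn.functional as F


def scaled_cosine_logits(embeddings, kernel, s=64.0):
    """[B, D] embeddings and [D, C] kernel -> [B, C] logits."""
    cos_theta = torch.mm(F.normalize(embeddings, dim=1), F.normalize(kernel, dim=0))
    return cos_theta.clamp(-1, 1) * s


emb, kernel = torch.randn(8, 512), torch.randn(512, 100)
label = torch.randint(0, 100, (8,))
logits = scaled_cosine_logits(emb, kernel)   # shape [8, 100]
loss = F.cross_entropy(logits, label)        # 0-dim tensor (scalar)
```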

Run Time Error

Hi,

Thanks for sharing this code.

I have tried to train this repo with my data, but I got this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation (both ArcFace and CurricularFace)
Any suggestions?

BTW, the tqdm version in requirements.txt seems wrong (there is no 4.31.2).

Maybe one of:
4.31.1, 4.32.0, 4.32.1, 4.32.2

Thanks!!!

Is there an error in the calculation of the derivatives?

Why are the derivatives in Equation (8) divided by sin(θ_j)? Where does this come from?
I am referring to the paper CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition.
Can anyone help me understand? I get the same result, except that for j = y_i my result lacks the sin(θ_yi) factor, and for j ≠ y_i I get an extra factor sin(θ_j) multiplying the existing result.
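The exact form of Equation (8) cannot be confirmed from this thread, but the chain rule is one place where a 1/sin θ factor appears. Writing the loss L as a function of the cosines, with θ = arccos(cos θ), gives:

```latex
\frac{\partial \mathcal{L}}{\partial \theta_j}
  = \frac{\partial \mathcal{L}}{\partial \cos\theta_j}\,\frac{d\cos\theta_j}{d\theta_j}
  = -\sin\theta_j\,\frac{\partial \mathcal{L}}{\partial \cos\theta_j}
\quad\Longleftrightarrow\quad
\frac{\partial \mathcal{L}}{\partial \cos\theta_j}
  = -\frac{1}{\sin\theta_j}\,\frac{\partial \mathcal{L}}{\partial \theta_j},
\qquad
\frac{d\cos(\theta_{y_i} + m)}{d\cos\theta_{y_i}}
  = \frac{\sin(\theta_{y_i} + m)}{\sin\theta_{y_i}}.
```

A result that differs by a factor of sin θ_j therefore often means that one derivative was taken with respect to θ_j and the other with respect to cos θ_j.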

Cannot match the performance of your pretrained res101

Hello,

Evaluation: LFW Acc: 0.9968333333333332, CFP_FP Acc: 0.8959999999999999, AgeDB Acc: 0.9726666666666667

This is my training result after 20 epochs, with the hyperparameters you set. The accuracy on CFP-FP is quite poor.

self.t = target_logit.mean() * 0.01 + (1 - 0.01) * self.t

Hello! In the paper, t should be calculated as:
t^(k) = α r^(k) + (1 − α) t^(k−1), with α = 0.99
which would correspond to "self.t = target_logit.mean() * 0.99 + (1 - 0.99) * self.t" in the code.
But your code has "self.t = target_logit.mean() * 0.01 + (1 - 0.01) * self.t". I don't understand why.
Thanks.

batch_norm parameters issue

On line 104 of train.py, you write "separate batch_norm parameters from others; do not do weight decay for batch_norm parameters to improve the generalizability"

backbone_paras_only_bn, backbone_paras_wo_bn = separate_irse_bn_paras(backbone) # separate batch_norm parameters from others; do not do weight decay for batch_norm parameters to improve the generalizability
Is this a conclusion from your own experiments or from a particular paper? Can you explain the reasoning behind it?
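This is done with optimizer parameter groups: BatchNorm weights and biases go into a group whose weight_decay is 0. The sketch below uses a hypothetical `split_bn_params` helper, which is not the repo's `separate_irse_bn_paras`, with illustrative hyperparameters:

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def split_bn_params(model):
    """Return (batch-norm parameters, all other parameters)."""
    bn, other = [], []
    for module in model.modules():
        own = list(module.parameters(recurse=False))  # only this module's own parameters
        (bn if isinstance(module, BN_TYPES) else other).extend(own)
    return bn, other


model = nn.Sequential(nn.Conv2d(3, 8, 3, bias=False), nn.BatchNorm2d(8), nn.Flatten(), nn.Linear(8, 2))
bn_params, other_params = split_bn_params(model)
optimizer = torch.optim.SGD(
    [
        {"params": other_params, "weight_decay": 5e-4},  # decay conv/linear weights
        {"params": bn_params, "weight_decay": 0.0},      # no decay for BN weight/bias
    ],
    lr=0.1,
    momentum=0.9,
)
```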

target_logit's shape

target_logit = cos_theta[torch.arange(0, embbedings.size(0)), label].view(-1, 1)

From this line, it seems the shape is [embbedings.size(0) * embbedings.size(0), 1]. Is that true?
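With `label` of shape [B], the two index tensors are paired element-wise. The result has shape [B], and `.view(-1, 1)` gives [B, 1]. A [B*B, 1] shape only appears if `label` carries an extra dimension, such as [B, 1], because advanced-index tensors are broadcast against each other. A quick check:

```python
import torch

B, C = 4, 10
cos_theta = torch.randn(B, C)
label = torch.tensor([1, 2, 3, 4])
idx = torch.arange(0, B)

target_logit = cos_theta[idx, label].view(-1, 1)            # [B, 1]: one entry per sample
broadcast = cos_theta[idx, label.view(-1, 1)].view(-1, 1)   # [B*B, 1]: indices broadcast to [B, B]
```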

final_target_logit = torch.where(target_logit > self.threshold, cos_theta_m, target_logit - self.mm)

In MV-softmax, the released code use easy_margin like following:

final_gt = torch.where(gt > 0, cos_theta_m, gt)

but the official ArcFace implementation does not use easy_margin:

final_gt = cos(θ + m)

I tried MV-Softmax on my own dataset without easy_margin and ran into divergence (NaN); using easy_margin fixed it.

Compared with MV-Softmax, there are two differences. The first is the threshold: CurricularFace uses self.threshold = math.cos(math.pi - m), which is understandable. The second is final_target_logit = torch.where(target_logit > self.threshold, cos_theta_m, target_logit - self.mm). What is the intuition behind target_logit - self.mm, where self.mm = math.sin(math.pi - m) * m?
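The penalty is applied as a function of θ_yi, and cos(θ + m) stops decreasing once θ + m passes π. For θ > π − m, a larger angle would therefore give a larger target logit. The `threshold` test (cos θ > cos(π − m), i.e. θ < π − m) switches to cos θ − m·sin(m) in that region, since sin(π − m) = sin m. This keeps the target logit decreasing in θ. easy_margin instead switches at θ = π/2 and jumps upward there. The sketch below uses plain `math` and hypothetical function names to evaluate the three variants on a grid of angles:

```python
import math


def arcface_target(cos_t, m):
    """Plain ArcFace target logit cos(theta + m)."""
    return math.cos(math.acos(cos_t) + m)


def target_with_fallback(cos_t, m):
    """Threshold rule: cos(theta + m) while theta < pi - m, else cos(theta) - m * sin(m)."""
    threshold = math.cos(math.pi - m)
    mm = math.sin(math.pi - m) * m
    return arcface_target(cos_t, m) if cos_t > threshold else cos_t - mm


def easy_margin_target(cos_t, m):
    """easy_margin rule: apply the margin only while cos(theta) > 0."""
    return arcface_target(cos_t, m) if cos_t > 0 else cos_t


m = 0.5
thetas = [i * math.pi / 200 for i in range(201)]
with_fallback = [target_with_fallback(math.cos(t), m) for t in thetas]
plain = [arcface_target(math.cos(t), m) for t in thetas]
easy = [easy_margin_target(math.cos(t), m) for t in thetas]
```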

Cannot reproduce the results on IJBB and IJBC

I used the pretrained IR101 model and the IJB evaluation code from insightface. Following are the results I got:

IJB-B TAR@FAR:
1e-6 -> 41.76%
1e-5 -> 69.81%
1e-4 -> 87.14%
1e-3 -> 93.27%

IJB-C TAR@FAR:
1e-6 -> 62.64%
1e-5 -> 75.46%
1e-4 -> 87.53%
1e-3 -> 94.01%

There is no problem if I use the pretrained models from face.evoLVe.PyTorch. Would you please share the evaluation code on IJB-B and IJB-C?
