
adaface's People

Contributors

mk-minchul, zonepg


adaface's Issues

About the "train_data_subset" option

Hi author,
Based on the code, the number of classes is set to 70722 when train_data_subset is True. I'm just wondering where "70722" comes from. Also, what is the purpose of using a subset of the emore faces? Is it just to save training time? I'd appreciate your suggestions. Thanks.

Large-scale datasets

Hello!
Thank you for your incredible work! Analysis of margin losses through gradient scaling term is quite eye-opening.

My questions are:

  1. Why didn't you provide any experimental results on really large-scale datasets, such as WebFace42M?
  2. Do you have any observations or thoughts about the stability of AdaFace under noisy labels, compared to standard losses like CosFace/ArcFace?

Question about Figure 3

Hi Minchul Kim,
Thank you for your great work!
I am impressed by Figure 3 in your paper, which is an excellent illustration.
I wonder how such figures are drawn; would you please release the tool or code used to draw them?

When reproducing the paper, results on IJB-B, IJB-C, and TinyFace are lower than reported

I trained the model on MS1MV2 and, on the high-quality datasets, got a similar result: average 97.18 (reported 97.19).
But when testing on IJB-B, IJB-C, and TinyFace, my results are lower.

Dataset | Avg on high quality | IJB-B | IJB-C | TinyFace Rank-1 | TinyFace Rank-5
Reported in the paper | 97.19 | 95.67 | 96.89 | 68.21 | 71.54
Reproduced | 97.18 | 95.37 | 96.62 | 67.03 | 70.52

I followed the provided code and trained on 8 GPUs. I wonder if there are any special tricks when evaluating on the mixed- and low-quality datasets?

I also found that in Table 3(b) of the paper, AdaFace trained on MS1MV2 gets a better result on TinyFace than the model trained on MS1MV3. This is quite strange: the MS1MV3 dataset is larger than MS1MV2, so the result should be better, but TinyFace shows the reverse. Table 3 otherwise shows the trend that the model trained on MS1MV3 performs better than the one trained on MS1MV2; only on TinyFace is it different.

I wonder if the TinyFace results for MS1MV3 and MS1MV2 were accidentally swapped?

Hope to get your reply.

How to solve: can't find this file (mem_file.dat.conf)

I tried to train after processing the data with the convert.py script you provided, but I couldn't find the validation set's mem_file.dat.conf file.

initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=nccl
All DDP processes registered. Starting ddp with 1 processes

creating train dataset
creating val dataset
laoding validation data memfile
Traceback (most recent call last):
File "main.py", line 83, in
main(args)
File "main.py", line 55, in main
trainer.fit(trainer_mod, data_mod)
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
self._run(model)
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 865, in _run
self._call_setup_hook(model) # allow user to setup lightning_module in accelerator environment
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1169, in _call_setup_hook
self.datamodule.setup(stage=fn)
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/core/datamodule.py", line 428, in wrapped_fn
fn(*args, **kwargs)
File "/work/1.model_train/6.face/AdaFace-master/data.py", line 99, in setup
self.val_dataset = val_dataset(self.data_root, self.val_data_path, self.concat_mem_file_name)
File "/work/1.model_train/6.face/AdaFace-master/data.py", line 136, in val_dataset
val_data = evaluate_utils.get_val_data(os.path.join(data_root, val_data_path))
File "/work/1.model_train/6.face/AdaFace-master/evaluate_utils.py", line 12, in get_val_data
agedb_30, agedb_30_issame = get_val_pair(data_path, 'agedb_30')
File "/work/1.model_train/6.face/AdaFace-master/evaluate_utils.py", line 25, in get_val_pair
np_array = read_memmap(mem_file_name)
File "/work/1.model_train/6.face/AdaFace-master/evaluate_utils.py", line 53, in read_memmap
with open(mem_file_name+'.conf', 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'faces_emore/agedb_30/memfile/mem_file.dat.conf'

Can't run inference.py successfully; the error: IndexError: too many indices for array: array is 0-dimensional, but 3 were indexed

I tried to run the inference.py demo right after getting the code, but I get this error:

Traceback (most recent call last):
  File "inference.py", line 41, in <module>
    input = to_input(aligned_rgb_img)
  File "inference.py", line 24, in to_input
    brg_img = ((np_img[:,:,::-1] / 255.) - 0.5) / 0.5
IndexError: too many indices for array: array is 0-dimensional, but 3 were indexed

I found that the error occurs because face detection returns nothing (the aligned face is None), and I don't know why no face is detected. I only ran the code in a new environment and didn't change the code at all.
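
A minimal guard for this case, as a sketch rather than the repo's fix: skip images where detection/alignment yields nothing before calling to_input(). The names to_input and aligned_rgb_img come from the traceback above; the model call is an assumption.

if aligned_rgb_img is None:
    print('no face detected in this image, skipping')  # avoids the 0-dimensional array error
else:
    model_input = to_input(aligned_rgb_img)
    feature, _ = model(model_input)  # assumption: the backbone returns (feature, norm)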

I think the environment might be different; this is my environment:

Package                 Version
----------------------- -----------
absl-py                 1.1.0
aiohttp                 3.8.1
aiosignal               1.2.0
async-timeout           4.0.2
asynctest               0.13.0
attrs                   21.4.0
bcolz                   1.2.1
cachetools              5.2.0
certifi                 2022.6.15
charset-normalizer      2.0.12
cycler                  0.11.0
fonttools               4.33.3
frozenlist              1.3.0
fsspec                  2022.5.0
future                  0.18.2
google-auth             2.8.0
google-auth-oauthlib    0.4.6
graphviz                0.8.4
grpcio                  1.47.0
idna                    3.3
imageio                 2.19.3
importlib-metadata      4.12.0
joblib                  1.1.0
kiwisolver              1.4.3
Markdown                3.3.7
matplotlib              3.5.2
menpo                   0.11.0
multidict               6.0.2
mxnet                   1.9.1
networkx                2.6.3
numpy                   1.21.6
oauthlib                3.2.0
opencv-python           4.6.0.66
packaging               21.3
pandas                  1.3.5
Pillow                  9.1.1
pip                     22.1.2
prettytable             3.3.0
protobuf                3.19.4
pyasn1                  0.4.8
pyasn1-modules          0.2.8
pyDeprecate             0.3.1
pyparsing               3.0.9
python-dateutil         2.8.2
pytorch-lightning       1.4.4
pytz                    2022.1
PyWavelets              1.3.0
PyYAML                  6.0
requests                2.28.0
requests-oauthlib       1.3.1
rsa                     4.8
scikit-image            0.19.3
scikit-learn            1.0.2
scipy                   1.7.3
setuptools              62.6.0
six                     1.16.0
tensorboard             2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
threadpoolctl           3.1.0
tifffile                2021.11.2
torch                   1.8.1+cu111
torchaudio              0.8.1
torchmetrics            0.6.0
torchvision             0.9.1+cu111
tqdm                    4.64.0
typing_extensions       4.2.0
urllib3                 1.26.9
wcwidth                 0.2.5
Werkzeug                2.1.2
wheel                   0.37.1
yarl                    1.7.2
zipp                    3.8.0

Is it possible to replace EMA of feature norm calculation with a torch.nn.BatchNorm layer?

Hi,

I'm trying to implement AdaFace in MXNet and wondering whether it is reasonable to replace the EMA of the feature-norm statistics with a BatchNorm layer (without learnable parameters).

This is to say, can I replace this part

AdaFace/head.py

Lines 75 to 81 in 79af07a

with torch.no_grad():
    mean = safe_norms.mean().detach()
    std = safe_norms.std().detach()
    self.batch_mean = mean * self.t_alpha + (1 - self.t_alpha) * self.batch_mean
    self.batch_std = std * self.t_alpha + (1 - self.t_alpha) * self.batch_std

margin_scaler = (safe_norms - self.batch_mean) / (self.batch_std + self.eps)  # 66% between -1, 1

with this in constructor

self.norm_layer = torch.nn.BatchNorm1d(1, eps=self.eps, momentum=self.t_alpha, affine=False)

and this in forward()?

margin_scaler = self.norm_layer(safe_norms)

I'm not sure whether the torch.no_grad() should be kept, or whether there is something I haven't noticed.
Thank you.
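
For what it's worth, a small self-contained sketch (not the repo's code) that contrasts the two approaches. The momentum convention matches (BatchNorm uses running = (1 - momentum) * running + momentum * batch, the same direction as the EMA with momentum = t_alpha), but in training mode BatchNorm1d normalizes each batch by the current batch statistics and only uses its running estimates in eval mode, whereas the EMA code above always divides by the running estimates. The init values below follow those mentioned elsewhere in these issues (batch_mean = 20, batch_std = 100); eps is illustrative.

import torch

# Sketch: EMA normalization vs. a BatchNorm1d(affine=False) stand-in.
t_alpha, eps = 0.01, 1e-3
batch_mean = torch.tensor(20.0)
batch_std = torch.tensor(100.0)

bn = torch.nn.BatchNorm1d(1, eps=eps, momentum=t_alpha, affine=False)
bn.running_mean.fill_(20.0)
bn.running_var.fill_(100.0 ** 2)  # BatchNorm tracks variance, not std

safe_norms = torch.rand(256, 1) * 10 + 18  # fake feature norms, shape (batch, 1)

# EMA path, as in the snippet quoted above
with torch.no_grad():
    mean, std = safe_norms.mean(), safe_norms.std()
    batch_mean = mean * t_alpha + (1 - t_alpha) * batch_mean
    batch_std = std * t_alpha + (1 - t_alpha) * batch_std
ema_scaler = (safe_norms - batch_mean) / (batch_std + eps)

# BatchNorm path: batch statistics in train mode, running statistics in eval mode
bn_train_scaler = bn(safe_norms)
bn.eval()
bn_eval_scaler = bn(safe_norms)

print((ema_scaler - bn_train_scaler).abs().max())  # large gap: batch stats vs. running stats
print((ema_scaler - bn_eval_scaler).abs().max())   # closer, but eps/variance details still differ

Note also that with BatchNorm in training mode, gradients can flow through the batch statistics, whereas in the EMA code the statistics are detached, which may be the more important difference to check against the original behavior.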

Questions about GST visualization

Hi Minchul, thanks for this incredible work! I have a few questions about the GST visualization in Fig. 3 of the main paper.

  1. For the CosFace loss (1st column of Fig. 3), it looks like the GST value decreases rapidly near the boundary. How do you adjust the GST value from W_j to the boundary B_1, and what is the value of s? I thought the result might be based on the last graph in Fig. 1 of the supplementary material, then shifted by +0.5 (m = 0.5) on the x-axis and by -1 (P - 1) on the y-axis to get the GST as a function of cos_theta for CosFace; however, it looks different from the 1st column of Fig. 3.

  2. For the ArcFace loss (2nd column of Fig. 3), we can see the GST increases as cos_theta goes up. But according to Eq. 15, when cos_theta goes up, |(P - 1)| goes down and (cos(m) + ...) goes up. How is the GST value guaranteed to be positively correlated with cos_theta?

  3. The idea emphasizes hard samples with high norm and easy samples with low norm. But the AdaFace loss (7th column of Fig. 3) shows that the white triangle (hard sample, low norm) still has a large GST value, which doesn't make sense to me.

I'd appreciate it if you could help me with these questions :p

Training time

Would you be so kind as to include the time taken for training with the same procedure as in the paper?

Unexpected result when training ResNet50 on the MS1MV3 dataset

Hi,
I trained ResNet50 on the MS1MV3 dataset with the same params as given, but only got 0.8882 test_acc on the test set.

Detailed result:
{'agedb_30_num_test_samples': 12000.0,
'agedb_30_test_acc': 0.8686666488647461,
'agedb_30_test_best_threshold': 1.7070000171661377,
'cfp_fp_num_test_samples': 14000.0,
'cfp_fp_test_acc': 0.8317142724990845,
'cfp_fp_test_best_threshold': 1.7899999618530273,
'lfw_num_test_samples': 12000.0,
'lfw_test_acc': 0.9645000100135803,
'lfw_test_best_threshold': 1.5600000619888306,
'test_acc': 0.8882936239242554}

params:
--arch ir_50
--use_16bit
--batch_size 256
--num_workers 8
--epochs 50
--lr_milestones 12,20,24
--lr 0.1
--head adaface
--m 0.4
--h 0.333
--low_res_augmentation_prob 0.2
--crop_augmentation_prob 0.2
--photometric_augmentation_prob 0.2

About the usage of the loss function

Hi, authors. Thanks for open-sourcing such great work. I have a question about the usage of the loss function. In the README (https://github.com/mk-minchul/AdaFace#usage), you take the embedding before normalization as input to the AdaFace loss. But in your implementation, https://github.com/mk-minchul/AdaFace/blob/c4052220c51167a18c35ce15a450044180cbb281/train_val.py#L54, you take the embedding after normalization as input to the AdaFace loss. I am confused about that. In addition, since you use the norm of the embedding as a proxy for image quality, do you think an L2 normalization applied before generating the embedding would hurt the effectiveness of the AdaFace loss? For example, with CNN backbone -> (1) L2 normalization -> pooling -> FC -> embedding -> (2) L2 normalization, would the first L2 normalization degrade the effectiveness of the AdaFace loss?

About masked face recognition

Thank you for your excellent work! I tried your pretrained model on 1k-face recognition and it just worked very well!
Now I'm trying masked face recognition. I augmented the faces_emore dataset by putting 5 kinds of masks on the faces, so the number of training images is 5x larger than in the original dataset. I'm wondering how many epochs I should set to get a good 'ir50' model (or maybe an 'ir18' model)? Do you have any suggestions for training it? I intend to use the same parameters you show in the README. Since the training could take a long time, it would be very helpful if you could give me some advice so I can avoid several training runs.

Training on data other than faces_emore

Hi,
I want to ask you a question
Because the faces_emore data is in .rec format and your training needs images produced by convert.py, I modified the code to train on other image data (3 million images). Now it always shows "creating train dataset" and never starts training. Why?

(screenshot attached)

Opposite relation between norm and quality on CASIA-WebFace

Hi, thanks for your excellent work! It helps me a lot.
I've added your AdaFace loss to PartialFC to deal with OOM problems when training (ultra) large-scale datasets, and I have some questions:

  • I analyzed our CosFace model trained with PartialFC and found the opposite relation between norm and quality.
    (plot attached)
  • My implementation also shows the opposite relation on the small CASIA-WebFace dataset (10k ids, 0.5M images), but the accuracy on the BLUR domain is still higher (I think the adaptive-margin loss didn't work and the main accuracy improvement came only from augmentation).
    (plot attached)
    (Backbone is R50, init batch_mean = 20, batch_std = 100, t = 1.0; testing with t = 0.01 gives the same result.)
  • But for my large-scale training (5M ids, 105M images), your conclusion still holds:
# Our current checkpoint
Training: 2022-08-18 08:10:35,869-[lfw][210000]XNorm: 23.302152
Training: 2022-08-18 08:10:35,870-[lfw][210000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-08-18 08:10:35,870-[lfw][210000]Accuracy-Highest: 0.99800

Training: 2022-08-18 08:11:33,067-[cfp_fp][210000]XNorm: 19.032604
Training: 2022-08-18 08:11:33,067-[cfp_fp][210000]Accuracy-Flip: 0.96914+-0.00833
Training: 2022-08-18 08:11:33,067-[cfp_fp][210000]Accuracy-Highest: 0.96914

Training: 2022-08-18 08:12:21,623-[agedb_30][210000]XNorm: 21.835075
Training: 2022-08-18 08:12:21,624-[agedb_30][210000]Accuracy-Flip: 0.97567+-0.00782
Training: 2022-08-18 08:12:21,624-[agedb_30][210000]Accuracy-Highest: 0.97733
Training: 2022-08-18 08:12:21,624-[+][210000]Score / Score-Highest: 2.94281 / 2.94362

Do we need to rethink the properties of the training dataset?

Using it on video

First of all, great work and great paper!

Second of all, I've used the inference example and it worked as expected. However, is there a way to input a video and have it recognize the faces and compare them against a dataset of images?

I think it might be possible to do this after training the model, but I'm not certain how.
I could use any insight you have.

Thank you so much!
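
Not an official workflow, but a rough sketch of the idea: read frames with OpenCV, run the detection/alignment and embedding steps from inference.py on each frame, and compare every face embedding against a precomputed gallery. detect_and_align below is a hypothetical helper standing in for the repo's MTCNN-based alignment, and model / to_input are assumed to be set up as in inference.py.

import cv2
import numpy as np

def best_match(feat, gallery_feats, gallery_names):
    # cosine similarity between one query embedding and the gallery embeddings
    feat = feat / (np.linalg.norm(feat) + 1e-8)
    g = gallery_feats / (np.linalg.norm(gallery_feats, axis=1, keepdims=True) + 1e-8)
    scores = g @ feat
    i = int(np.argmax(scores))
    return gallery_names[i], float(scores[i])

# gallery_feats / gallery_names: embeddings of your reference images, computed
# once beforehand with the same model (placeholders here)
cap = cv2.VideoCapture('input.mp4')
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    for aligned_rgb_img in detect_and_align(frame_bgr):  # hypothetical helper
        feature, _ = model(to_input(aligned_rgb_img))    # names as in inference.py
        name, score = best_match(feature.detach().cpu().numpy().ravel(),
                                 gallery_feats, gallery_names)
        print(name, score)
cap.release()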

About hyper-parameters of reproduction.

Hi! It's wonderful and excellent work, and I'm interested in it.
But I ran into some problems when reproducing it: I cannot reach high enough accuracy for this setting: AdaFace / ResNet100 / MS1MV2. Could you please provide its bash file, e.g. a run.sh or an hparams.yaml?
Looking forward to your reply! Thanks a lot!

Train the model using my own dataset

Hi, thank you for your excellent work!
Can you give us a README section on how to train an AdaFace model on our own dataset? The training part in the current README is way too brief; it would be very nice if you could show more examples and data-preparation steps for training :)
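
Until such a README exists, here is a hedged sketch of what the data layout appears to be (inferred from the default --train_data_path of faces_emore/imgs and the images produced by convert.py, not from official docs): an image folder with one subdirectory per identity. The snippet below just sanity-checks a custom dataset arranged that way before launching training; paths are placeholders.

import os

data_root = '/path/to/data_root'      # placeholder, later passed as --data_root
train_data_path = 'my_dataset/imgs'   # placeholder, later passed as --train_data_path

train_dir = os.path.join(data_root, train_data_path)
identities = [d for d in sorted(os.listdir(train_dir))
              if os.path.isdir(os.path.join(train_dir, d))]
num_images = sum(len(os.listdir(os.path.join(train_dir, d))) for d in identities)
print(f'{len(identities)} identities, {num_images} images under {train_dir}')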

Why is the trained checkpoint 530 MB when I use the ir_18 model?

This is my config:

parent_parser.add_argument('--data_root', type=str, default='')
parent_parser.add_argument('--train_data_path', type=str, default='faces_emore/imgs')
parent_parser.add_argument('--val_data_path', type=str, default='faces_emore')
parent_parser.add_argument('--train_data_subset', action='store_true')
parent_parser.add_argument('--prefix', type=str, default='default')
parent_parser.add_argument('--gpus', type=int, default=4, help='how many gpus')
parent_parser.add_argument('--distributed_backend', type=str, default='ddp', choices=('dp', 'ddp', 'ddp2'),)
parent_parser.add_argument('--use_16bit', action='store_true', help='if true uses 16 bit precision')
parent_parser.add_argument('--epochs', default=26, type=int, metavar='N', help='number of total epochs to run')
parent_parser.add_argument('--seed', type=int, default=42, help='seed for initializing training.')
parent_parser.add_argument('--batch_size', default=1024, type=int,
                           help='mini-batch size (default: 256), this is the total '
                                'batch size of all GPUs on the current node when '
                                'using Data Parallel or Distributed Data Parallel')

parent_parser.add_argument('--lr',help='learning rate',default=0.002, type=float)
parent_parser.add_argument('--lr_milestones', default='12,20,24', type=str, help='epochs for reducing LR')
parent_parser.add_argument('--lr_gamma', default=0.1, type=float, help='multiply when reducing LR')

parent_parser.add_argument('--num_workers', default=36, type=int)
parent_parser.add_argument('--fast_dev_run', dest='fast_dev_run', action='store_true')
parent_parser.add_argument('--evaluate', action='store_true', help='use with start_from_model_statedict')
parent_parser.add_argument('--resume_from_checkpoint', type=str, default='')
parent_parser.add_argument('--start_from_model_statedict', type=str, default='')
parser.add_argument('--arch', default='ir_18')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M')
parser.add_argument('--weight_decay', default=1e-4, type=float)

parser.add_argument('--head', default='adaface', type=str, choices=('adaface'))
parser.add_argument('--m', default=0.4, type=float)
parser.add_argument('--h', default=0.333, type=float)
parser.add_argument('--s', type=float, default=64.0)
parser.add_argument('--t_alpha', default=0.01, type=float)

parser.add_argument('--low_res_augmentation_prob', default=0.2, type=float)
parser.add_argument('--crop_augmentation_prob', default=0.2, type=float)
parser.add_argument('--photometric_augmentation_prob', default=0.2, type=float)

parser.add_argument('--accumulate_grad_batches', type=int, default=1)
parser.add_argument('--test_run', action='store_true')
parser.add_argument('--save_all_models', action='store_true')
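
One way to see where the 530 MB goes, as a generic PyTorch sketch rather than repo-specific guidance: load the .ckpt and sum tensor sizes per top-level key. A full-class softmax head (tens of thousands of 512-d class weights) plus the optimizer state that Lightning saves in the checkpoint can easily outweigh a small ir_18 backbone.

import torch
from collections import defaultdict

ckpt = torch.load('path/to/checkpoint.ckpt', map_location='cpu')  # placeholder path

# size of the saved weights, grouped by top-level module name (e.g. backbone vs. head)
sizes = defaultdict(int)
for name, tensor in ckpt['state_dict'].items():
    sizes[name.split('.')[0]] += tensor.numel() * tensor.element_size()
for top, nbytes in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f'{top:20s} {nbytes / 1e6:8.1f} MB')

# Lightning checkpoints also embed optimizer state (e.g. SGD momentum buffers),
# which roughly doubles the on-disk size compared to the weights alone.
opt_bytes = 0
for opt_state in ckpt.get('optimizer_states', []):
    for param_state in opt_state.get('state', {}).values():
        for v in param_state.values():
            if torch.is_tensor(v):
                opt_bytes += v.numel() * v.element_size()
print(f'{"optimizer state":20s} {opt_bytes / 1e6:8.1f} MB')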

About the inference time

Hi,
thank you for your nice work,
I just tried the inference code, but it takes too long to complete. I wonder what the normal inference time per image is?
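
A simple way to measure this (a generic timing sketch, not the repo's benchmark) is to time only the forward pass, excluding detection/alignment and the one-time model load, and to synchronize when running on GPU. model is assumed to be an already-loaded AdaFace backbone; 112x112 matches the aligned crops used by the repo.

import time
import torch

model.eval()
device = next(model.parameters()).device
dummy = torch.randn(1, 3, 112, 112, device=device)

with torch.no_grad():
    for _ in range(10):                      # warm-up iterations
        model(dummy)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    n = 100
    for _ in range(n):
        model(dummy)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f'{1000 * elapsed / n:.2f} ms per image (forward pass only)')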

A question about the usage.

Hello,

Thank you for presenting the solid and wonderful work. I have a small question about the usage.

cosine_with_margin = adaface(embbedings, norms, labels)
loss = torch.nn.CrossEntropyLoss()(cosine_with_margin, labels)

After reading the code in head.py, I think the returned cosine_with_margin is a variant of the logits; in other words, it is the output of the adaptive margin function in the paper. Has it been passed through the softmax function? Why can we feed it into CrossEntropyLoss and compare it with the ground truth directly?

I'm looking forward to your reply. Thank you very much.
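
For reference, a small self-contained check of the relevant PyTorch behavior (not the authors' code): torch.nn.CrossEntropyLoss expects raw, unnormalized logits and applies log-softmax internally, which is why margin-adjusted cosine logits can be passed to it directly.

import torch
import torch.nn.functional as F

# CrossEntropyLoss = log_softmax + negative log-likelihood, so it takes raw
# logits (here standing in for the scaled, margin-adjusted cosines) without an
# explicit softmax beforehand.
torch.manual_seed(0)
logits = torch.randn(8, 10)            # stand-in for cosine_with_margin (batch x classes)
labels = torch.randint(0, 10, (8,))

loss_ce = torch.nn.CrossEntropyLoss()(logits, labels)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_ce, loss_manual))  # True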

Issue with Face Detection, MTCNN

Greetings,

I'm trying to run the 'try_mtcnn_steb_by_step' notebook, and after running it I got a vastly different result from the one you show (only four faces surrounded by boxes, with the final result being only 3 of the many faces in the image).

Is there any clue as to what causes these differences?

Thanks

Convert to TensorRT

Hi, thank you for your excellent work!
I want to deploy this model with TensorRT. Could you give me some guidelines?
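
One common route (a hedged sketch, not official deployment guidance) is to export the backbone to ONNX with torch.onnx.export and then build a TensorRT engine from the ONNX file, e.g. with trtexec. model is assumed to be a loaded AdaFace backbone; the output names are an assumption and should be adjusted to whatever your model actually returns.

import torch

model.eval()
dummy = torch.randn(1, 3, 112, 112)  # 112x112 matches the aligned input size used by the repo

torch.onnx.export(
    model,
    dummy,
    'adaface_backbone.onnx',
    input_names=['input'],
    output_names=['feature', 'norm'],       # assumption: the backbone returns (feature, norm)
    dynamic_axes={'input': {0: 'batch'}},   # allow variable batch size
    opset_version=12,
)

# Then, on a machine with TensorRT installed:
#   trtexec --onnx=adaface_backbone.onnx --saveEngine=adaface_backbone.trt --fp16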

About the processed emore faces

Hi author,
A naive question for you: the images extracted using

python convert.py --rec_path <DATASET_ROOT>/faces_emore

look somewhat weird. It seems the RGB channels are in the wrong order. Any comment? Thanks.
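
If the channel order really is the issue, a one-line swap on load (a hedged sketch, not a statement about what convert.py intends) makes the images viewable for inspection. Note that whether the training pipeline itself expects RGB or BGR is a separate question; to_input() in inference.py, quoted in an earlier issue, does its own channel flip via np_img[:,:,::-1]. The path below is hypothetical.

import numpy as np
from PIL import Image

img = np.array(Image.open('faces_emore/imgs/some_id/0.jpg'))  # hypothetical path
swapped = img[:, :, ::-1].copy()                               # RGB <-> BGR
Image.fromarray(swapped).save('check_swapped.jpg')             # view this to compare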

MODEL PERFORMANCE

I reviewed your work; the results look impressive. However, in my tests I noticed that ArcFace performs better. Yes, AdaFace is ahead on the benchmarks, but ArcFace gives better recognition on large datasets. Could it be something I did wrong? Or do you have a recommendation?

The performance of the model on the Glint360K dataset

Hi, thank you for sharing your work. I added AdaFace to the InsightFace project and verified its effect on the MS1MV3 dataset, but on Glint360K, AdaFace performs worse than CosFace. Have you verified it on Glint360K?

dataset | method | backbone | LFW | CFP-FP | AgeDB | IJB-C (1e-4)
MS1MV3 | AdaFace | r100 | 99.867 | 99.014 | 98.3 | 97.17
MS1MV3 | ArcFace | r100 | 99.85 | 98.9 | 98.55 | 96.85
Glint360K | AdaFace | r100 | 99.83 | 99.15 | 98.45 | 97.38
Glint360K | CosFace | r100 | 99.817 | 99.2 | 98.65 | 97.55

Table 3: our evaluation of the released model

Hello,
Thank you for your great work!
In Table 3, "our evaluation of the released model" refers to an ArcFace ResNet100 model trained on WebFace4M.
However, I cannot find the corresponding checkpoint anywhere in InsightFace.
Could you please share where you found the checkpoint?

If the checkpoint was actually trained by yourself, does the "ArcFace ResNet100 trained on WebFace4M" checkpoint also follow exactly the same data augmentation (crop, resizing, color jitter) as AdaFace?

Many thanks.
