koushiksrivats / flip

Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance" (ICCV 2023).
Home Page: https://koushiksrivats.github.io/FLIP/
Hi!
Thanks for your work!
I would like to know how we can deploy the trained model. Since prompts must be given for each image during training, is it correct that at inference time we only need to give the model an RGB image?
From a single image, the model should be able to decide whether it shows a spoof or a real person. Am I right?
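For context, FLIP-style inference can be sketched as comparing the image embedding against embeddings of "real"/"spoof" text prompts and taking a softmax over the cosine similarities. The embeddings below are random stand-ins; the actual encoders, prompt wording, and embedding size come from the trained model, so this is only an illustration of the scoring step, not the repository's inference code.

```python
import numpy as np

def classify(image_emb: np.ndarray, text_embs: np.ndarray, temperature: float = 0.01) -> np.ndarray:
    """Return class probabilities from cosine similarities (CLIP-style)."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = text_embs @ image_emb / temperature
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Stand-in embeddings; in practice these come from the trained image/text encoders.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(2, 512))  # row 0: "real face" prompt, row 1: "spoof face" prompt
probs = classify(image_emb, text_embs)
label = "real" if probs[0] > probs[1] else "spoof"
```

The prediction is then simply the prompt with the higher probability, possibly combined with a tuned threshold.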
Hi, thank you for the impressive research.
I can run your proposed model, but I cannot reproduce the ViT performance reported in the paper on the MCIO protocols.
I am wondering whether you were able to reproduce the ViT numbers: my TPR@FPR=1% differs from the paper by roughly 2-30 percentage points.
Here are the results I got by running python train_vit.py --config M, C, I, and O, respectively.
Can you give any suggestions?
--config M:

| Run | HTER | AUC | TPR@FPR=1% |
| --- | --- | --- | --- |
| 0 | 6.33 | 98.43 | 76.67 |
| 1 | 9.75 | 97.43 | 63.33 |
| 2 | 6.58 | 96.64 | 60.00 |
| 3 | 5.00 | 98.27 | 68.33 |
| 4 | 8.42 | 95.35 | 65.00 |
| Mean | 7.22 | 97.22 | 66.67 |
| Std dev | 1.67 | 1.13 | 5.68 |

--config C:

| Run | HTER | AUC | TPR@FPR=1% |
| --- | --- | --- | --- |
| 0 | 13.43 | 94.67 | 55.32 |
| 1 | 10.65 | 96.79 | 63.12 |
| 2 | 9.26 | 96.38 | 53.90 |
| 3 | 10.65 | 95.23 | 25.53 |
| 4 | 9.26 | 96.96 | 60.99 |
| Mean | 10.65 | 96.01 | 51.77 |
| Std dev | 1.52 | 0.90 | 13.56 |

--config I:

| Run | HTER | AUC | TPR@FPR=1% |
| --- | --- | --- | --- |
| 0 | 12.39 | 94.79 | 24.62 |
| 1 | 16.19 | 91.21 | 29.23 |
| 2 | 14.63 | 94.18 | 34.62 |
| 3 | 15.23 | 92.93 | 30.00 |
| 4 | 13.88 | 94.23 | 38.46 |
| Mean | 14.46 | 93.47 | 31.38 |
| Std dev | 1.29 | 1.28 | 4.75 |

--config O:

| Run | HTER | AUC | TPR@FPR=1% |
| --- | --- | --- | --- |
| 0 | 20.00 | 87.73 | 4.37 |
| 1 | 20.42 | 86.17 | 2.39 |
| 2 | 20.51 | 87.46 | 14.23 |
| 3 | 20.53 | 88.02 | 20.70 |
| 4 | 19.74 | 87.68 | 22.39 |
| Mean | 20.24 | 87.41 | 12.82 |
| Std dev | 0.32 | 0.65 | 8.20 |
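As a sanity check on the aggregates, the Mean and Std dev rows can be recomputed from the per-run values; the std that matches the reported numbers is the population std (ddof=0). Using the HTER column of the --config M run as an example:

```python
import numpy as np

# Per-run HTER values from the --config M results above.
hter = np.array([6.333333333333332, 9.75, 6.583333333333333, 5.0, 8.416666666666666])

mean = hter.mean()      # ~7.2167, matches the reported Mean row
std = hter.std(ddof=0)  # ~1.6705, population std, matches the reported Std dev row
```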
Hello, thank you very much for your work. Could you share the processed datasets (including CASIA-SURF, CASIA-CeFA, and WMCA) with me? My email is [email protected]
Hi, we would like to use FLIP to determine whether a face image is live or a spoof. Inputting an image into the FLIP model gives us a score. How can we use this score to decide whether the face image is live or a spoof?
Thank you.
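One common approach (hypothetical here; the repository's own evaluation code is authoritative) is to pick a decision threshold on a held-out development set, e.g. the threshold that equalizes the false-accept and false-reject rates (EER), and then compare each test score against it. The sketch below assumes a higher score means "more likely live":

```python
import numpy as np

def eer_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    """Threshold where false-accept rate ~= false-reject rate on a dev set.
    labels: 1 = live, 0 = spoof; higher score = more likely live (an assumption)."""
    best_t, best_gap = 0.0, np.inf
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)  # spoofs accepted as live
        frr = np.mean(scores[labels == 1] < t)   # live faces rejected
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t

# Toy dev-set scores; real scores would come from the trained model.
scores = np.array([0.9, 0.8, 0.75, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 0, 0, 0])
t = eer_threshold(scores, labels)                # 0.75 for this toy data
decision = "live" if 0.85 >= t else "spoof"
```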
Hi,
thank you for your amazing work.
I am currently trying to reproduce the results.
During dataset preprocessing, the stated protocol only samples two frames per video: frame[6] and frame[6 + math.floor(total_frames/2)].
If MTCNN does not detect any face in a selected frame, what is your approach in that scenario, given that your code requires both frame0 and frame1 to exist?
Thanks
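For reference, the stated sampling rule, plus one possible fallback (purely hypothetical, not the authors' approach) that scans forward from the chosen index until a face is detected, could look like:

```python
import math

def sample_frame_indices(total_frames: int) -> tuple:
    """The protocol's two sampled indices: frame[6] and frame[6 + floor(total_frames/2)]."""
    return 6, 6 + math.floor(total_frames / 2)

def first_frame_with_face(start, total_frames, has_face):
    """Hypothetical fallback: scan forward from `start` until has_face(i) is True.
    Returns None if no frame in range contains a detectable face."""
    for i in range(start, total_frames):
        if has_face(i):
            return i
    return None

idx0, idx1 = sample_frame_indices(100)                      # (6, 56)
fallback = first_frame_with_face(6, 100, lambda i: i >= 8)  # toy detector: face first appears at frame 8
```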
Hello! I have questions about your work.
In the paper, you mention: "In each of the three protocols, similar to [16], we include CelebA-Spoof [64] as the supplementary training data to increase the diversity of training samples."
Does this mean that you pre-train with the CelebA-Spoof dataset and then fine-tune on OCIM?
Or does it mean that you train on CelebA-Spoof together with three of the OCIM datasets at the same time and then test on the remaining one? If so, isn't the protocol OCI+CelebA-Spoof to M rather than OCI to M?
Thanks for sharing your work!
Thanks for sharing this great work.
Are there any plans to share the pretrained weights?
Nice work and thanks for sharing!
I have a question about the code: what is fake_shot.txt for, and where is it?
It gives the following error:

```
FLIP/utils/utils.py", line 65, in sample_frames
for i in open(dataroot + dataset_name + '_fake_shot.txt').readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'data/MCIO/txt/msu_fake_shot.txt'
```

Thanks!
Hello, thank you for sharing such great work. In the face alignment part, I guess you may have used a similarity or affine transform. Could you share the positions of the five predefined facial landmark points?
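For context, five-point alignment typically estimates a similarity transform that maps the detected landmarks onto a fixed template, e.g. with the Umeyama least-squares method. The template below is the one widely used for 112x112 ArcFace-style crops; whether FLIP uses these exact coordinates is precisely the question above, so treat them as an assumption:

```python
import numpy as np

def umeyama(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares similarity transform (scale, rotation, translation) src -> dst.
    Returns a 2x3 matrix in the same form cv2.warpAffine expects."""
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])

# Assumed template: the common ArcFace 5-point layout for 112x112 crops
# (left eye, right eye, nose tip, mouth corners); not confirmed for FLIP.
TEMPLATE = np.array([[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
                     [41.5493, 92.3655], [70.7299, 92.2041]])

detected = TEMPLATE * 2.0 + 10.0          # toy "detected" landmarks (scaled + shifted)
M = umeyama(detected, TEMPLATE)
aligned = detected @ M[:, :2].T + M[:, 2]  # lands back on the template
```

In practice the resulting 2x3 matrix would be passed to cv2.warpAffine to produce the aligned crop.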
Hello and thank you for the great work.
As mentioned, for each video you only sample two frames, frame[6] and frame[6+math.floor(total_frames/2)], and save them as videoname_frame0.png/videoname_frame1.png, except for the CelebA-Spoof dataset.
Questions:
1) The labels in data/MCIO/txt/, e.g., casia_fake_test.txt, only use frame0. However, in https://github.com/koushiksrivats/FLIP/blob/4f95def259e135a0cbaff1d770f559ca739c4c9f/config.py#L13C26-L14C3, it seems that frame0 is used for training, while both frame0 and frame1 are used for testing.
2) I am confused about the dataset split and the use of frame0 and frame1: are the results reported in Table 2 of the paper based on the original training/testing sets, or on the split given by the labels in data/MCIO/txt/? In protocol 1, is frame0 (the 6th frame) used for training, and are both frame0 and frame1 used for testing?
Thank you for open-sourcing this wonderful work. We are researchers from a different field and would like to build on your work in our own research. However, we are not well-versed in FAS, so obtaining the datasets and training the model from scratch would be quite challenging for us.
We are not interested in obtaining all the pre-trained weights from the paper; our goal is simply to acquire the specific weights needed to run inference.
Could you share the pre-trained weights necessary for running the inference process of your project?
Thank you very much.
Hello, thanks for sharing this impressive work!
I have been trying to reproduce the performance of FLIP-MCL.
I preprocessed the datasets in Protocol 1 and ran train_flip_mcl.py.
However, the average HTER in Protocol 1 is about twice as high as the number in the paper.
Can you give any suggestions or ideas for what I could change?
Here is the result I got by running python train_flip_mcl.py --config O:

| Run | HTER | AUC | TPR@FPR=1% |
| --- | --- | --- | --- |
| 0 | 7.58 | 97.71 | 60.99 |
| 1 | 8.58 | 96.90 | 46.76 |
| 2 | 7.74 | 97.57 | 51.55 |
| 3 | 8.30 | 97.39 | 61.13 |
| 4 | 7.64 | 97.56 | 58.03 |
| Mean | 7.97 | 97.42 | 55.69 |
| Std dev | 0.40 | 0.28 | 5.66 |
Thanks!