selimsef / dfdc_deepfake_challenge


725.0 725.0 190.0 69.16 MB

A prize-winning solution for the DFDC challenge

License: MIT License

Dockerfile 1.86% Shell 4.03% Python 94.11%

deepfake-detection deepfakes kaggle

dfdc_deepfake_challenge's People

killthehostage datalab-vn deeplearning2012 alex-engineer mteterin luuthienxuan yangsenwxy duykhanhbk jiaming-lee linhduongtuan maksimovkonstantin hadryan cxz tchigher amorgun mishinvd lubumobi apolmig ankitshah009 joefannie dataman-py gkn1fexxx mfts lasteg idcore michaelafanasev trendingtechnology yifan-zhao orlgln alexgliu momina04 unimauro phoenix9032 huangjiadidi zawecha1 dong03 fanhongxing jasmin-bharadiya wangyoucao fil82 tmvien otwen bapablo kukumayas piantic hajungong007 zhangconghhh donte-2019 ranjani94 lsq357 yj2victory vishnupnandanan anthonytasca zhuzhuxiaei datomi79 wangliwei-intel snokh simonasdev arafat-hasan mattgroh vamsijkrishna tony9402 bksain salihinsaealal f9g8h7j654 y0ngsheng rjcc jberros lucky7323 taylover-pei emransalehali sabrexa codingmice chenshen03 duydq12 rushi-the-neural-arch bmquynhlinh jiejie1993 shubhamkr96 hi-ilkin ogechionuoha thenero93 rtitov aramakus bholdmanny pattaro yjzst eurus202425 holmes-gu zhiwenshao bmehta001 getuntun 851624623 programmer2huang linh-amped 2016215226 lelechen63 jianwang-ntu tzuren edvlili

dfdc_deepfake_challenge's Issues

training dataset?

Thank you for your work. What training data did you use? Only DFDC? Your model basically fails to detect FOMM videos. I want to add this batch of data and train your model on it.

I encountered "continuous target data is not supported with label binarization"

I encountered this issue during validation:

  File "finetune_xy.py", line 446, in <module>
    main()
  File "finetune_xy.py", line 303, in main
    summary_writer=summary_writer)
  File "finetune_xy.py", line 311, in evaluate_val
    bce, probs, targets = validate(model, data_loader=data_val)
  File "finetune_xy.py", line 366, in validate
    fake_loss = log_loss(y[fake_idx], x[fake_idx], labels=[0, 1])
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2206, in log_loss
    transformed_labels = lb.transform(y_true)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 491, in transform
    sparse_output=self.sparse_output)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 680, in label_binarize
    "binarization" % y_type)
ValueError: continuous target data is not supported with label binarization
[1]+  Exit 1                  nohup python -u finetune_xy.py --config configs/b7.json > log.out

Could you explain a little what this part of your code does?

data_x = []
data_y = []
for vid, score in probs.items():
    score = np.array(score)
    lbl = targets[vid]

    score = np.mean(score)
    lbl = np.mean(lbl)
    data_x.append(score)
    data_y.append(lbl)
y = np.array(data_y)
x = np.array(data_x)
fake_idx = y > 0.1
real_idx = y < 0.1
fake_loss = log_loss(y[fake_idx], x[fake_idx], labels=[0, 1])
real_loss = log_loss(y[real_idx], x[real_idx], labels=[0, 1])
print("{}fake_loss".format(prefix), fake_loss)
print("{}real_loss".format(prefix), real_loss)

Thank you.
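A minimal sketch of what this block computes, assuming `probs` and `targets` map each video id to its per-frame predictions and per-frame labels. Frame scores are averaged into one score per video, videos are split into fake and real by their mean label, and the log loss is reported separately for each group. Because sklearn's `log_loss` rejects non-integer labels as "continuous" targets (the error in the previous issue), the sketch also snaps each mean label to 0 or 1 with the same 0.1 threshold; that snapping step is an assumption added here, not part of the original code, and `binary_log_loss` / `fake_real_losses` are hypothetical helpers, with the former standing in for sklearn's `log_loss`.

```python
import numpy as np


def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy; a NumPy stand-in for sklearn's log_loss."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))


def fake_real_losses(probs, targets, threshold=0.1):
    """probs, targets: dict of video_id -> list of per-frame predictions / labels."""
    videos = list(probs)
    # One score and one label per video: the mean over its frames.
    x = np.array([np.mean(probs[v]) for v in videos])
    y = np.array([np.mean(targets[v]) for v in videos])
    # Assumption: snap mean labels to hard 0/1 with the split threshold, so a
    # video whose frames have mixed labels (e.g. 0.5) is not a continuous target.
    y = (y > threshold).astype(int)
    fake_idx = y == 1
    real_idx = y == 0
    fake_loss = binary_log_loss(y[fake_idx], x[fake_idx])
    real_loss = binary_log_loss(y[real_idx], x[real_idx])
    return fake_loss, real_loss
```

Reporting the two losses separately shows whether the model is worse at catching fakes or at clearing real videos, which a single averaged loss would hide.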

I just have a quick question. You mention that you resize the videos before running the face detector.
Do we need to resize the videos before we run preprocess_data.sh?
Or does preprocess_data.sh also handle resizing the videos when it runs the face detector?

I cannot find the code where you resize the images. This is the closest thing I found in your code:

dfdc_deepfake_challenge/preprocessing/face_detector.py

Line 67 in ef703c2

frame = frame.resize(size=[s // 2 for s in frame.size])

Thank you!

ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

./train.sh path/to/data num-of-gpus
100%██████████████████████████████████████████████████████████████████████████████████| 2765/2765 [06:11<00:00, 7.45it/s]
Traceback (most recent call last):
  File "training/pipelines/train_classifier.py", line 363, in <module>
    main()
  File "training/pipelines/train_classifier.py", line 227, in main
    summary_writer=summary_writer)
  File "training/pipelines/train_classifier.py", line 235, in evaluate_val
    bce, probs, targets = validate(model, data_loader=data_val)
  File "training/pipelines/train_classifier.py", line 290, in validate
    real_loss = log_loss(y[real_idx], x[real_idx], labels=[0, 1])
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2186, in log_loss
    y_pred = check_array(y_pred, ensure_2d=False)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 653, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.
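The traceback shows `log_loss` receiving an empty array, which happens when `real_idx` (or `fake_idx`) selects no videos at all, for example when the validation fold contains no real videos. A hedged sketch of a guard, assuming it is acceptable to report `nan` for an empty group instead of crashing; `safe_group_loss` is a hypothetical helper, and its plain NumPy binary cross-entropy stands in for sklearn's `log_loss`.

```python
import numpy as np


def safe_group_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy for one group of videos, or nan if the group is empty."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.size == 0:
        # Nothing to score: sklearn's log_loss would raise "Found array with 0 sample(s)".
        return float("nan")
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))


y = np.array([1, 1, 1])          # a validation split that happens to contain only fakes
x = np.array([0.9, 0.8, 0.7])
real_idx = y == 0
print(safe_group_loss(y[real_idx], x[real_idx]))  # nan instead of an exception
```

A `nan` here is a signal to check that the validation fold really contains both real and fake videos, rather than a number to average into the score.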

metadata.json

I have downloaded the DFDC dataset, but I couldn't find metadata.json. Could you share the download link for this file? Thank you very much.

MTCNN thresholds

In face_detector.py:
self.detector = MTCNN(margin=0,thresholds=[0.85, 0.95, 0.95], device=device)

but in kernel_utils.py:
self.detector = MTCNN(margin=0, thresholds=[0.7, 0.8, 0.8], device="cuda")

Why are the thresholds different?

Thank you.
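For context: in facenet-pytorch's `MTCNN`, `thresholds` holds one face-probability cutoff per stage of the three-stage cascade (P-Net, R-Net, O-Net). The toy sketch below is a deliberate simplification, assuming each candidate already carries one score per stage (the real detector re-scores refined boxes and applies non-maximum suppression between stages); `surviving_candidates` and the scores are hypothetical. It only shows that lower cutoffs such as `[0.7, 0.8, 0.8]` keep more candidates (higher recall, more false positives) than `[0.85, 0.95, 0.95]`.

```python
def surviving_candidates(stage_scores, thresholds):
    """Toy cascade: keep a candidate only if every stage score meets that stage's cutoff.

    stage_scores: list of hypothetical (p_net, r_net, o_net) probabilities per candidate.
    """
    return [
        i for i, scores in enumerate(stage_scores)
        if all(s >= t for s, t in zip(scores, thresholds))
    ]


candidates = [
    (0.99, 0.98, 0.97),  # clear, frontal face
    (0.90, 0.85, 0.82),  # blurry or partly occluded face
    (0.75, 0.60, 0.50),  # background clutter
]
print(surviving_candidates(candidates, [0.85, 0.95, 0.95]))  # [0]
print(surviving_candidates(candidates, [0.7, 0.8, 0.8]))     # [0, 1]
```

Which cutoffs suit preprocessing versus inference is the author's choice; the sketch only illustrates the recall/precision trade-off the two settings imply.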

Could you explain what the argument "fold" means?

When I run generate_folds.py, I find that "fold" is always 0:

for k, v in metadata.items():
    fold = None
    for i, fold_dirs in enumerate(folds):
        #if part in fold_dirs:
            fold = i
            break
    assert fold is not None
    video_id = k[:-4]
    video_fold[video_id] = fold

I debugged this part and found that "fold" is always 0, after which the loop breaks.
Since I don't understand what "fold" means, I don't know how to fix it or what the correct behavior is.
Could you explain it in detail? Thank you!
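In k-fold cross-validation, a "fold" is the index of the subset a sample belongs to: one fold is held out for validation and the others are used for training. The loop above appears to give each video the index of the first group of DFDC part directories that contains the video's part; with the `if part in fold_dirs:` line commented out, it always breaks at the first group, which would explain the constant 0. A minimal sketch of that lookup, with `assign_fold` and the `folds` grouping as hypothetical examples rather than the script's actual values:

```python
def assign_fold(part, folds):
    """Return the index of the first group of part directories that contains `part`."""
    for i, fold_dirs in enumerate(folds):
        if part in fold_dirs:  # without this check the loop always stops at i == 0
            return i
    raise ValueError(f"{part!r} is not listed in any fold")


# Hypothetical grouping of DFDC training parts into 3 folds.
folds = [
    ["dfdc_train_part_0", "dfdc_train_part_1"],
    ["dfdc_train_part_2", "dfdc_train_part_3"],
    ["dfdc_train_part_4"],
]
print(assign_fold("dfdc_train_part_3", folds))  # 1
```

Grouping by part directory rather than by individual video keeps related videos in the same fold, so the validation fold is not filled with near-duplicates of training videos.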

Inference on CPU

Is there a way to use the trained weights and run inference on the CPU only? My GPU can't handle inference with the current settings...

I think something is wrong with the "remove_landmark" code

I visualized this part of the code and found that it doesn't work.
Could you check it, or open-source your visualization code?
Thank you very much!

submission.csv all predictions are below 0.5

Hi Selim,

Thank you for sharing your great work here. I tried to use your predict_submission.sh to reproduce submission.csv with the 7 EfficientNet-B7 models on the test_videos folder, but the prediction scores for all 400 videos are below 0.5, most of them around 0.3-0.4. I guess I did something wrong, but I cannot figure out the possible reasons. Could you help?

CUDNN_STATUS_NOT_INITIALIZED

I'm using Win11 WSL2 + Docker and have the same problem, "CUDNN_STATUS_NOT_INITIALIZED".

Is something wrong with that setup?

Google Colab

Hello! Is it possible to run this code on Google Colab?

Hi.

How about improving deepfakes instead of detecting them? :D

Why do I get RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED?

When I run train_classifier.py, I run into this problem.

I've been stuck here for two days.
Could you explain it in detail? Thank you!

When running the training script, the console prints the warning "<data-dir>/diffs/XXX.png" can't open/read file

When I finish data preparation and then run "training/pipelines/train_classifier.py", the console prints the warning "<data-dir>/diffs/XXX.png" can't open/read file

Is the top accuracy 65.18%?

https://ai.facebook.com/blog/deepfake-detection-challenge-results-an-open-initiative-to-advance-ai/

The webpage says that the top accuracy on the private evaluation dataset is about 65.18%, but when I evaluated the model you provided, I got an accuracy of about 89%. What's wrong?

Looking forward to any reply.
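One possible factor, offered as a hedged note rather than a diagnosis: accuracy and log loss measure different things, both depend heavily on which videos are scored, and the 65.18% figure refers to the private evaluation set, not to whatever data the 89% was measured on. The DFDC leaderboard itself was ranked by log loss. A small sketch computing both metrics for the same predictions, assuming a 0.5 decision threshold for accuracy; `accuracy_and_log_loss` is a hypothetical helper:

```python
import numpy as np


def accuracy_and_log_loss(y_true, y_pred, threshold=0.5, eps=1e-15):
    """Return (accuracy at `threshold`, mean binary log loss) for the same predictions."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=float)
    # Accuracy only looks at which side of the threshold each prediction falls on.
    accuracy = float(np.mean((y_pred >= threshold).astype(int) == y_true))
    # Log loss also penalizes how (un)confident each prediction is.
    p = np.clip(y_pred, eps, 1 - eps)
    loss = float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))
    return accuracy, loss


acc, loss = accuracy_and_log_loss([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1])
print(acc)  # 1.0, even though two of the predictions are far from confident
```

Comparing numbers is only meaningful when both the metric and the evaluation set match.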

Are the 32 frames selected consecutively or randomly?

Could you explain this in detail?
Thank you!
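The issue doesn't say which strategy the repo uses. One common choice, sketched here as an assumption rather than a confirmed description of the repo's video reader, is to take evenly spaced frame indices across the whole clip, so 32 frames cover the full video instead of a single short window. Both helpers are hypothetical:

```python
import numpy as np


def evenly_spaced_frame_indices(frame_count, num_frames=32):
    """Indices of `num_frames` frames spread evenly from the first to the last frame."""
    return np.linspace(0, frame_count - 1, num_frames, endpoint=True, dtype=int)


def consecutive_frame_indices(start, num_frames=32):
    """Alternative: a contiguous block of `num_frames` frames starting at `start`."""
    return np.arange(start, start + num_frames)


print(evenly_spaced_frame_indices(300)[:4])  # [ 0  9 19 28]
```

Evenly spaced sampling trades temporal continuity for coverage: a manipulation that only appears in part of the clip is more likely to be sampled at least once.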

Unable to understand extraction of crops

dfdc_deepfake_challenge/preprocessing/extract_crops.py

Line 42 in 9925d95

xmin, ymin, xmax, ymax = [int(b * 2) for b in bbox]

Why are you multiplying the bounding-box coordinates by 2? It will shift the region that we want to crop. I think they should not be multiplied.

Thanks.
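A likely explanation, assuming the boxes were detected on half-resolution frames: the face_detector.py line quoted in an earlier issue on this page halves each frame with `frame.resize(size=[s // 2 for s in frame.size])` before detection. The saved boxes would then be in half-scale coordinates, and multiplying by 2 maps them back onto the full-resolution frame rather than shifting them. A minimal sketch of that mapping; `scale_bbox` and the box values are hypothetical:

```python
def scale_bbox(bbox, scale=2):
    """Map (xmin, ymin, xmax, ymax) from a downscaled frame back to full resolution."""
    return [int(b * scale) for b in bbox]


# Hypothetical detection on a 960x540 frame (a 1920x1080 video halved before detection).
bbox_half = (400.5, 200.0, 560.0, 380.25)
xmin, ymin, xmax, ymax = scale_bbox(bbox_half)
print(xmin, ymin, xmax, ymax)  # 801 400 1120 760
```

Because every coordinate is scaled by the same factor, the box keeps its position relative to the frame; only its units change from half-size pixels to full-size pixels.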

Issue with bitmap masks.

dfdc_deepfake_challenge/training/datasets/classifier_dataset.py

Line 25 in ef703c2

mid_h = w // 2

I guess this should be mid_h = h // 2.
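A small sketch of why the distinction matters, assuming the code splits an image at its vertical midpoint: for a non-square image, `w // 2` is not the middle row, so the split lands in the wrong place or past the last row. `blackout_half` is a hypothetical helper, not the repo's function:

```python
import numpy as np


def blackout_half(image, top=True):
    """Zero out the top or bottom half of an H x W (x C) image; returns a copy."""
    out = image.copy()
    h = out.shape[0]
    mid_h = h // 2  # the middle row depends on the height, not the width
    if top:
        out[:mid_h] = 0
    else:
        out[mid_h:] = 0
    return out


img = np.ones((4, 10), dtype=np.uint8)  # 4 rows, 10 columns
print(blackout_half(img).sum())         # 20: only the top 2 rows were zeroed
```

With `mid_h = w // 2` the same call would use 5 as the split row and zero all 4 rows of this image.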

Will you publish a paper soon?

Looking forward to it.

generate_folds.py Line 106 KeyError

Hi, there.
I'm trying to run the preprocessing for your model; however, I ran into an issue running generate_folds.py.

Here is the screenshot of the error.
Running it multiple times shows different video file names.

I looked at metadata.json, and I guess the error comes from training videos that have no original video.
That would mean the assert comparing video_fold[video] with video_fold[ori_vid] fails.
I'm not sure how to fix this.
Hope to hear from you soon.

Thanks,
Silion
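A hedged sketch of one way to make the lookup tolerant, assuming DFDC's `metadata.json` layout (file name mapped to a dict with `label` and, for fakes, `original`) and that a fake video should inherit its original's fold. Fakes whose original is missing are collected and skipped instead of raising `KeyError`; whether skipping them is acceptable for training is a judgment call. `fold_for_videos` is a hypothetical helper, not code from generate_folds.py:

```python
def fold_for_videos(metadata, original_fold):
    """Give each fake the fold of its original video.

    metadata: {"abc.mp4": {"label": "FAKE", "original": "xyz.mp4"}, ...}
    original_fold: {"xyz": 0, ...} keyed by video id without the ".mp4" suffix.
    Returns (video_fold, skipped).
    """
    video_fold, skipped = {}, []
    for name, info in metadata.items():
        video_id = name[:-4]
        if info.get("label") == "REAL":
            source = video_id
        else:
            original = info.get("original")
            source = original[:-4] if original else None
        if source not in original_fold:
            skipped.append(video_id)  # original missing: skip instead of raising KeyError
            continue
        video_fold[video_id] = original_fold[source]
    return video_fold, skipped
```

Printing `skipped` after a run gives a concrete list of the videos to check against metadata.json.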

Does this detect face filters?

Hi! Thank you for your work. Does this detect face filters like the ones you see on Instagram, for example those people use for beauty purposes, such as adding makeup to the face or changing the face in general? Or does it strictly detect deepfakes?

Timm library version problem

A newer version of the timm library breaks the code.

dfdc_deepfake_challenge/Dockerfile

Line 59 in 9925d95

RUN pip install albumentations timm pytorch_toolbelt tensorboardx

Installing an older version resolved the issue.
pip install timm==0.1.26
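To catch this before training starts, a small check can compare the installed version against the pin. A sketch, assuming Python 3.8+ (`importlib.metadata`) and that 0.1.26, the version reported to work above, is the one you want; `check_pinned` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_TIMM = "0.1.26"  # version reported to work in this issue


def check_pinned(package, expected):
    """Return None if `package` is installed at `expected`, else a warning message."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed; try: pip install {package}=={expected}"
    if installed != expected:
        return f"{package} {installed} is installed, but {expected} is expected"
    return None


message = check_pinned("timm", EXPECTED_TIMM)
if message:
    print(message)
```

The same pin can also go directly into the Dockerfile's pip install line so the image is built with the working version.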

No module named 'VideoDataset'

When I ran python preprocessing/detect_original_faces.py --root-dir DATA_ROOT, I got an error:
Traceback (most recent call last):
  File "preprocessing/detect_original_faces.py", line 14, in <module>
    import face_detector, VideoDataset
ModuleNotFoundError: No module named 'VideoDataset'
How can I solve this problem?
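`import face_detector, VideoDataset` asks Python to import two modules. `VideoDataset` is presumably a class defined inside face_detector.py (an assumption based on the name and the frame-resizing code quoted earlier), not a module of its own, hence the `ModuleNotFoundError`. The same behavior is easy to reproduce with a standard-library module:

```python
import importlib

# `import json, JSONDecoder` would try to load a *module* named JSONDecoder and fail,
# just like `import face_detector, VideoDataset`.
try:
    importlib.import_module("JSONDecoder")
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'JSONDecoder'

# Importing the class *from* its module works.
from json import JSONDecoder

print(JSONDecoder.__name__)  # JSONDecoder
```

By analogy, `from face_detector import VideoDataset` (or `from preprocessing.face_detector import VideoDataset` when running from the repository root) is the likely fix, assuming that is where the class is defined.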

Could you explain how "metadata.json" is built in utils.py?
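For reference, the DFDC training download is split into `dfdc_train_part_*` directories, each of which normally ships its own `metadata.json` mapping a video file name to its `label`, `split`, and (for fakes) `original`. A sketch that merges them into one dictionary, assuming that layout; `load_all_metadata` is a hypothetical helper, not necessarily what utils.py does:

```python
import json
from pathlib import Path


def load_all_metadata(root_dir):
    """Merge every dfdc_train_part_*/metadata.json under root_dir into one dict.

    Each entry also records which part directory it came from.
    """
    merged = {}
    for meta_path in sorted(Path(root_dir).glob("dfdc_train_part_*/metadata.json")):
        with open(meta_path) as f:
            part_meta = json.load(f)
        for video_name, info in part_meta.items():
            merged[video_name] = dict(info, part=meta_path.parent.name)
    return merged
```

Recording the part directory alongside each entry is what makes a part-based fold split possible later on.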

Can I download the pretrained models for the DFDC dataset? Right now I get an error:

--2020-11-02 14:27:42-- https://github.com/selimsef/dfdc_deepfake_challenge/releases/download//final_999_DeepFakeClassifier_tf_efficientnet_b7_ns_0_23
Resolving github.com (github.com)... 15.164.81.167
Connecting to github.com (github.com)|15.164.81.167|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-11-02 14:27:42 ERROR 404: Not Found.

I encountered this issue. What can be done to solve it?
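The URL in the log has an empty path segment (`releases/download//final_999_...`), which suggests the release tag that should sit between `download/` and the file name was empty when the URL was built. A hedged sketch of building the URL defensively; `weight_url` is a hypothetical helper, and the real tag value must come from the repository's releases page, so it is left as a placeholder here:

```python
RELEASES = "https://github.com/selimsef/dfdc_deepfake_challenge/releases/download"


def weight_url(tag, filename):
    """Build a release-asset URL, refusing to produce one with an empty tag."""
    if not tag:
        raise ValueError("release tag is empty; the URL would contain 'download//'")
    return f"{RELEASES}/{tag}/{filename}"


# "TAG" is a placeholder for the real release tag, not an actual tag name.
print(weight_url("TAG", "final_999_DeepFakeClassifier_tf_efficientnet_b7_ns_0_23"))
```

Failing loudly on an empty tag turns a confusing 404 into an error that points straight at the missing value.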