
pytorch_retinaface's Issues

LeakyReLU activation function?

I found that your model uses the LeakyReLU activation function, while the original model uses ReLU. Does LeakyReLU perform better? Thanks!

About processing video

Your work is amazing. I tried it on a few sample images and it worked well in terms of both speed and accuracy. Then I tried to modify it to detect faces in a video file, but the processing time for each frame was very long. Can you provide code for running on video, or any suggestions?
This is what I tried (attached in "txt" format because that is what GitHub supports):
video.txt
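
For reference, a minimal per-frame sketch (hedged: detect_faces and load_retinaface are hypothetical wrappers around the repo's detection code; the key point is loading the network once, outside the loop, since re-initializing it per frame is the usual cause of very slow frame times):

    import cv2

    # net = load_retinaface(...)  # hypothetical: load weights ONCE, not per frame

    cap = cv2.VideoCapture("input.mp4")
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        dets = detect_faces(frame)  # hypothetical: boxes for one frame
        for b in dets:
            cv2.rectangle(frame, (int(b[0]), int(b[1])),
                          (int(b[2]), int(b[3])), (0, 0, 255), 2)
        cv2.imshow("retinaface", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()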

Dense Regression Branch

Could you please explain where the Dense Regression Branch is in your model? I can only find the bounding box, classification, and landmark branches, but not the dense regression branch. Thanks.

Loss weighting

In the RetinaFace paper there are 4 losses, each with a different weight, roughly in a ratio of 65:25:10:1.
I understand that you did not include the dense regression loss in your implementation, and I saw a 2:1:1 ratio in the config. Is that ratio the outcome of your cross-validation to balance the three losses?
Did you test other ratios?
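
For context, a hedged reading of how that 2:1:1 ratio appears in the code: the only explicit balancing factor is loc_weight, applied when the three losses are combined in train.py.

    # In config.py (hedged): 'loc_weight': 2.0
    # train.py combines the three losses roughly as:
    loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm  # 2 : 1 : 1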

Are you using Python 2?

from torch.jit.annotations import Dict
ImportError: cannot import name 'Dict'
Many dependencies fail to import.

ValueError: zero-size array to reduction operation minimum which has no identity

Hi, when I run test_widerface.py, it raises this ValueError:
File "test_widerface.py", line 107, in <module>
im_size_min = np.min(im_shape[0:2])
File "<__array_function__ internals>", line 6, in amin
File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 2746, in amin
keepdims=keepdims, initial=initial, where=where)
File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
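
A hedged guard worth adding right after the image read: cv2.imread returns None, without raising, when the path is wrong or the file is unreadable, and that silently propagated None is a common way to end up with an empty shape and exactly this zero-size reduction error.

    import cv2
    import numpy as np

    img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
    if img_raw is None:  # imread fails silently; fail loudly instead
        raise FileNotFoundError('could not read image: {}'.format(image_path))
    img = np.float32(img_raw)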

How to reduce the time of post-processing?

Although the network forward time is only about 5 ms, the post-processing time on my laptop is up to 150 ms.
The main cost is lines 98 to 102 in detect.py: generating the prior boxes takes a lot of time. What can we do to reduce it? Thanks.
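
One common approach (a sketch, not from the repo): the PriorBox grid depends only on the image size, so it can be computed once per size and cached instead of being rebuilt for every image. cfg, device, and PriorBox are assumed to be the ones detect.py already imports.

    from functools import lru_cache

    @lru_cache(maxsize=8)
    def cached_priors(im_height, im_width):
        # The prior grid depends only on (height, width), so memoize it;
        # repeated inputs of the same size skip the expensive rebuild.
        priorbox = PriorBox(cfg, image_size=(im_height, im_width))
        return priorbox.forward().to(device)

    priors = cached_priors(im_height, im_width)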

How the classification loss is computed

There is a part I don't quite understand:

https://github.com/biubug6/Pytorch_Retinaface/blob/master/layers/modules/multibox_loss.py#L101

loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))

batch_conf has shape (num_anchors, 2).

Here log_sum_exp exponentiates each anchor's score for every class, sums them, and takes the log (log_sum_exp(batch_conf)). But isn't the first class the background? Does the background need to be taken into account?

The second part confuses me even more: batch_conf.gather(1, conf_t.view(-1, 1)). Why is log_sum_exp not applied here?

Could you explain what this means?

Thanks!
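
For what it's worth, that expression is the standard cross-entropy written out by hand: CE(x, t) = log Σ_j exp(x_j) − x_t, i.e. −log softmax(x)[t]. The log-sum-exp has to run over all classes, background included, because it is the softmax normalizer; the gather term just picks out the raw score of the target class, which needs no log-sum-exp. A small self-contained check (values here are made up):

    import torch
    import torch.nn.functional as F

    # Hypothetical scores for 4 anchors and 2 classes (background, face).
    batch_conf = torch.randn(4, 2)
    conf_t = torch.tensor([0, 1, 1, 0])  # ground-truth class per anchor

    # The repo's formulation: log-sum-exp over ALL classes minus the raw
    # score of the target class. This equals -log softmax(x)[target].
    loss_c = torch.logsumexp(batch_conf, dim=1, keepdim=True) \
             - batch_conf.gather(1, conf_t.view(-1, 1))

    # Same value as PyTorch's built-in cross entropy (no reduction).
    reference = F.cross_entropy(batch_conf, conf_t, reduction='none')
    print(torch.allclose(loss_c.squeeze(1), reference))  # True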

How to Use model_best.pth.tar?

python3 train.py --ngpu 4 --resume_net model_best.pth.tar

Traceback (most recent call last):
File "train.py", line 67, in
net.load_state_dict(new_state_dict)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 777, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RetinaFace:
Missing key(s) in state_dict: "body.stage1.0.0.weight", "body.stage1.0.1.weight", "body.stage1.0.1.bias", "body.stage1.0.1.running_mean", "body.stage1.0.1.running_var", "body.stage1.1.0.weight", "body.stage1.1.1.weight", "body.stage1.1.1.bias", "body.stage1.1.1.running_mean", "body.stage1.1.1.running_var", "body.stage1.1.3.weight", "body.stage1.1.4.weight", "body.stage1.1.4.bias", "body.stage1.1.4.running_mean", "body.stage1.1.4.running_var", "body.stage1.2.0.weight", "body.stage1.2.1.weight", "body.stage1.2.1.bias", "body.stage1.2.1.running_mean", "body.stage1.2.1.running_var", "body.stage1.2.3.weight", "body.stage1.2.4.weight", "body.stage1.2.4.bias", "body.stage1.2.4.running_mean", "body.stage1.2.4.running_var", "body.stage1.3.0.weight", "body.stage1.3.1.weight", "body.stage1.3.1.bias", "body.stage1.3.1.running_mean", "body.stage1.3.1.running_var", "body.stage1.3.3.weight", "body.stage1.3.4.weight", "body.stage1.3.4.bias", "body.stage1.3.4.running_mean", "body.stage1.3.4.running_var", "body.stage1.4.0.weight", "body.stage1.4.1.weight", "body.stage1.4.1.bias", "body.stage1.4.1.running_mean", "body.stage1.4.1.running_var", "body.stage1.4.3.weight", "body.stage1.4.4.weight", "body.stage1.4.4.bias", "body.stage1.4.4.running_mean", "body.stage1.4.4.running_var", "body.stage1.5.0.weight", "body.stage1.5.1.weight", "body.stage1.5.1.bias", "body.stage1.5.1.running_mean", "body.stage1.5.1.running_var", "body.stage1.5.3.weight", "body.stage1.5.4.weight", "body.stage1.5.4.bias", "body.stage1.5.4.running_mean", "body.stage1.5.4.running_var", "body.stage2.0.0.weight", "body.stage2.0.1.weight", "body.stage2.0.1.bias", "body.stage2.0.1.running_mean", "body.stage2.0.1.running_var", "body.stage2.0.3.weight", "body.stage2.0.4.weight", "body.stage2.0.4.bias", "body.stage2.0.4.running_mean", "body.stage2.0.4.running_var", "body.stage2.1.0.weight", "body.stage2.1.1.weight", "body.stage2.1.1.bias", "body.stage2.1.1.running_mean", "body.stage2.1.1.running_var", "body.stage2.1.3.weight", "body.stage2.1.4.weight", "body.stage2.1.4.bias", "body.stage2.1.4.running_mean", "body.stage2.1.4.running_var", "body.stage2.2.0.weight", "body.stage2.2.1.weight", "body.stage2.2.1.bias", "body.stage2.2.1.running_mean", "body.stage2.2.1.running_var", "body.stage2.2.3.weight", "body.stage2.2.4.weight", "body.stage2.2.4.bias", "body.stage2.2.4.running_mean", "body.stage2.2.4.running_var", "body.stage2.3.0.weight", "body.stage2.3.1.weight", "body.stage2.3.1.bias", "body.stage2.3.1.running_mean", "body.stage2.3.1.running_var", "body.stage2.3.3.weight", "body.stage2.3.4.weight", "body.stage2.3.4.bias", "body.stage2.3.4.running_mean", "body.stage2.3.4.running_var", "body.stage2.4.0.weight", "body.stage2.4.1.weight", "body.stage2.4.1.bias", "body.stage2.4.1.running_mean", "body.stage2.4.1.running_var", "body.stage2.4.3.weight", "body.stage2.4.4.weight", "body.stage2.4.4.bias", "body.stage2.4.4.running_mean", "body.stage2.4.4.running_var", "body.stage2.5.0.weight", "body.stage2.5.1.weight", "body.stage2.5.1.bias", "body.stage2.5.1.running_mean", "body.stage2.5.1.running_var", "body.stage2.5.3.weight", "body.stage2.5.4.weight", "body.stage2.5.4.bias", "body.stage2.5.4.running_mean", "body.stage2.5.4.running_var", "body.stage3.0.0.weight", "body.stage3.0.1.weight", "body.stage3.0.1.bias", "body.stage3.0.1.running_mean", "body.stage3.0.1.running_var", "body.stage3.0.3.weight", "body.stage3.0.4.weight", "body.stage3.0.4.bias", "body.stage3.0.4.running_mean", "body.stage3.0.4.running_var", "body.stage3.1.0.weight", 
"body.stage3.1.1.weight", "body.stage3.1.1.bias", "body.stage3.1.1.running_mean", "body.stage3.1.1.running_var", "body.stage3.1.3.weight", "body.stage3.1.4.weight", "body.stage3.1.4.bias", "body.stage3.1.4.running_mean", "body.stage3.1.4.running_var", "fpn.output1.0.weight", "fpn.output1.1.weight", "fpn.output1.1.bias", "fpn.output1.1.running_mean", "fpn.output1.1.running_var", "fpn.output2.0.weight", "fpn.output2.1.weight", "fpn.output2.1.bias", "fpn.output2.1.running_mean", "fpn.output2.1.running_var", "fpn.output3.0.weight", "fpn.output3.1.weight", "fpn.output3.1.bias", "fpn.output3.1.running_mean", "fpn.output3.1.running_var", "fpn.merge1.0.weight", "fpn.merge1.1.weight", "fpn.merge1.1.bias", "fpn.merge1.1.running_mean", "fpn.merge1.1.running_var", "fpn.merge2.0.weight", "fpn.merge2.1.weight", "fpn.merge2.1.bias", "fpn.merge2.1.running_mean", "fpn.merge2.1.running_var", "ssh1.conv3X3.0.weight", "ssh1.conv3X3.1.weight", "ssh1.conv3X3.1.bias", "ssh1.conv3X3.1.running_mean", "ssh1.conv3X3.1.running_var", "ssh1.conv5X5_1.0.weight", "ssh1.conv5X5_1.1.weight", "ssh1.conv5X5_1.1.bias", "ssh1.conv5X5_1.1.running_mean", "ssh1.conv5X5_1.1.running_var", "ssh1.conv5X5_2.0.weight", "ssh1.conv5X5_2.1.weight", "ssh1.conv5X5_2.1.bias", "ssh1.conv5X5_2.1.running_mean", "ssh1.conv5X5_2.1.running_var", "ssh1.conv7X7_2.0.weight", "ssh1.conv7X7_2.1.weight", "ssh1.conv7X7_2.1.bias", "ssh1.conv7X7_2.1.running_mean", "ssh1.conv7X7_2.1.running_var", "ssh1.conv7x7_3.0.weight", "ssh1.conv7x7_3.1.weight", "ssh1.conv7x7_3.1.bias", "ssh1.conv7x7_3.1.running_mean", "ssh1.conv7x7_3.1.running_var", "ssh2.conv3X3.0.weight", "ssh2.conv3X3.1.weight", "ssh2.conv3X3.1.bias", "ssh2.conv3X3.1.running_mean", "ssh2.conv3X3.1.running_var", "ssh2.conv5X5_1.0.weight", "ssh2.conv5X5_1.1.weight", "ssh2.conv5X5_1.1.bias", "ssh2.conv5X5_1.1.running_mean", "ssh2.conv5X5_1.1.running_var", "ssh2.conv5X5_2.0.weight", "ssh2.conv5X5_2.1.weight", "ssh2.conv5X5_2.1.bias", "ssh2.conv5X5_2.1.running_mean", "ssh2.conv5X5_2.1.running_var", "ssh2.conv7X7_2.0.weight", "ssh2.conv7X7_2.1.weight", "ssh2.conv7X7_2.1.bias", "ssh2.conv7X7_2.1.running_mean", "ssh2.conv7X7_2.1.running_var", "ssh2.conv7x7_3.0.weight", "ssh2.conv7x7_3.1.weight", "ssh2.conv7x7_3.1.bias", "ssh2.conv7x7_3.1.running_mean", "ssh2.conv7x7_3.1.running_var", "ssh3.conv3X3.0.weight", "ssh3.conv3X3.1.weight", "ssh3.conv3X3.1.bias", "ssh3.conv3X3.1.running_mean", "ssh3.conv3X3.1.running_var", "ssh3.conv5X5_1.0.weight", "ssh3.conv5X5_1.1.weight", "ssh3.conv5X5_1.1.bias", "ssh3.conv5X5_1.1.running_mean", "ssh3.conv5X5_1.1.running_var", "ssh3.conv5X5_2.0.weight", "ssh3.conv5X5_2.1.weight", "ssh3.conv5X5_2.1.bias", "ssh3.conv5X5_2.1.running_mean", "ssh3.conv5X5_2.1.running_var", "ssh3.conv7X7_2.0.weight", "ssh3.conv7X7_2.1.weight", "ssh3.conv7X7_2.1.bias", "ssh3.conv7X7_2.1.running_mean", "ssh3.conv7X7_2.1.running_var", "ssh3.conv7x7_3.0.weight", "ssh3.conv7x7_3.1.weight", "ssh3.conv7x7_3.1.bias", "ssh3.conv7x7_3.1.running_mean", "ssh3.conv7x7_3.1.running_var", "ClassHead.0.conv1x1.weight", "ClassHead.0.conv1x1.bias", "ClassHead.1.conv1x1.weight", "ClassHead.1.conv1x1.bias", "ClassHead.2.conv1x1.weight", "ClassHead.2.conv1x1.bias", "BboxHead.0.conv1x1.weight", "BboxHead.0.conv1x1.bias", "BboxHead.1.conv1x1.weight", "BboxHead.1.conv1x1.bias", "BboxHead.2.conv1x1.weight", "BboxHead.2.conv1x1.bias", "LandmarkHead.0.conv1x1.weight", "LandmarkHead.0.conv1x1.bias", "LandmarkHead.1.conv1x1.weight", "LandmarkHead.1.conv1x1.bias", "LandmarkHead.2.conv1x1.weight", "LandmarkHead.2.conv1x1.bias".
Unexpected key(s) in state_dict: "epoch", "arch", "state_dict", "best_acc1", "optimizer".
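
Two things stand out in this error (a hedged reading): the Unexpected key(s) line shows model_best.pth.tar is a full checkpoint dict ("epoch", "arch", "state_dict", ...) rather than a bare state_dict, and the Missing key(s) list covers the entire RetinaFace model, so the file likely holds backbone-only weights intended as a pretrain rather than a --resume_net argument. A sketch of unwrapping such a checkpoint:

    import torch

    # model_best.pth.tar is a checkpoint dict, so unwrap the weights first.
    checkpoint = torch.load("model_best.pth.tar", map_location="cpu")
    state_dict = checkpoint["state_dict"]

    # Checkpoints saved from nn.DataParallel prefix every key with
    # "module."; strip it so the keys match a plain model.
    state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}

    # `backbone` is hypothetical here -- these weights appear to cover the
    # backbone only, not the full RetinaFace network.
    backbone.load_state_dict(state_dict)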

two eyes

When I run a similar trial in my own face detection codebase, the two eye landmark points overlap and always land between the eyebrows, yet the nose landmark is always correct. Do you know what the reason might be?

How to get the camera parameters

Hi, thanks for your excellent work. I ran detect.py and got the box and landmark results. However, I want to estimate camera parameters such as camera location, camera pose, and focal length. Is there an interface for this?
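
RetinaFace itself only outputs boxes and landmarks, so there is no built-in interface for camera parameters. One option (a hedged sketch, not part of the repo) is cv2.solvePnP over the five landmarks with a generic 3D face model; note the intrinsics are only approximated here, so focal length is assumed, not estimated:

    import cv2
    import numpy as np

    # Rough generic 3D reference points for the five RetinaFace landmarks:
    # left eye, right eye, nose tip, left mouth corner, right mouth corner.
    # These coordinates are illustrative assumptions, not calibrated values.
    model_points = np.array([
        [-30.0,  30.0, -30.0],
        [ 30.0,  30.0, -30.0],
        [  0.0,   0.0,   0.0],
        [-25.0, -30.0, -30.0],
        [ 25.0, -30.0, -30.0],
    ], dtype=np.float64)

    def estimate_pose(landmarks, image_size):
        """landmarks: (5, 2) pixel coords from detect.py; image_size: (h, w)."""
        h, w = image_size
        # Without calibration, approximate the intrinsics: focal ~ image
        # width, principal point at the image center.
        camera_matrix = np.array([[w, 0, w / 2],
                                  [0, w, h / 2],
                                  [0, 0, 1]], dtype=np.float64)
        dist_coeffs = np.zeros(4)  # assume no lens distortion
        ok, rvec, tvec = cv2.solvePnP(model_points,
                                      landmarks.astype(np.float64),
                                      camera_matrix, dist_coeffs,
                                      flags=cv2.SOLVEPNP_EPNP)
        return rvec, tvec  # camera pose relative to the face model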

Prefers resize to trained dimensions

[graph: detection results on PASCAL]
retinaface: resize = 1
retinaface-2: resize = 1.5 (or a resize that brings the image close to 640 px; same effect)

Just as FaceBoxes prefers its minimum face size (13 px) at its trained size (1024 px), RetinaFace also prefers input resized to its trained size (640 px).

Right now resize is hard-coded to 1. With resizing I now also get better results on AFW/PASCAL/etc.
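
A minimal sketch of the change this suggests (hedged; resize and img are the variables detect.py already uses): scale the shorter image side toward the 640 px training size instead of hard-coding resize = 1.

    # Scale the shorter side toward the 640 px training resolution.
    target_size = 640
    resize = float(target_size) / min(img.shape[0:2])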

Speed issue

I tested with origin_size set to True, which means using the original size, on some tiny images.

It runs in around 300 ms on a GTX 1080 Ti, which seems a little slow...

can't reproduce the same AP

Hi,
I used your code to train a RetinaFace detector with the ResNet50 pre-trained model and the WIDER FACE data you provide. I didn't change any configuration parameter in config.py except batch_size (from the original 24 down to 4).
However, I cannot reproduce your AP. With single-scale testing the results are: Easy Val AP 0.918, Medium Val AP 0.874, Hard Val AP 0.620.
I want to know how to reach the same AP, and why there is such a big gap between my results and yours. Thanks a lot.

About the meaning of conf elements.

Hi. The shape of conf is (N, boxes_num, 2), and (N, boxes_num, 1:2) is the face-classification confidence. May I ask, is (N, boxes_num, 0:1) the confidence of the not-a-face class? Thanks.
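
Yes: with two channels and a softmax over the last dimension, channel 0 is the background (not-a-face) score and channel 1 the face score, and the two sum to 1 per box. A minimal sketch of how detect.py-style code reads them:

    import torch.nn.functional as F

    # conf: (N, boxes_num, 2) raw scores; channel 0 = background, 1 = face.
    probs = F.softmax(conf, dim=-1)
    face_scores = probs[..., 1]        # probability that each box is a face
    background_scores = probs[..., 0]  # probability that it is not a face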

Is Retinaface particularly slow to see a new image size for the first time?

I've seen the speed problem in issue #1, which is closed now. I found that sometimes the speed is very slow, so I tested it in detect.py:

1. I added a line before # testing begin:

    test_dataset = ['./test_image/0_Parade_marchingband_1_465.jpg',
                    './test_image/0_Parade_Parade_0_43.jpg',
                    './test_image/1_Handshaking_Handshaking_1_94.jpg',
                    './test_image/1_Handshaking_Handshaking_1_158.jpg',
                    './test_image/1.jpg', './test_image/2.jpg', './test_image/3.jpg']

The first four pictures are from wider_val and all have different sizes;
the last three pictures, downloaded from the internet, are all the same size.
So the test_dataset list has 7 images and 5 different sizes.

2. I changed line 81 from image_path = "./curve/test.jpg" to
   image_path = test_dataset[i % len(test_dataset)]

Then I ran python detect.py and got this result:

net forward time: 1.5058
net forward time: 1.5821
net forward time: 2.2512
net forward time: 1.5150
net forward time: 2.6007

net forward time: 0.0072
net forward time: 0.0059
net forward time: 0.0061

I found that the first five runs were very slow.

So, is RetinaFace particularly slow the first time it sees a new image size?
And where is the main time spent? Thanks.
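
A likely explanation (hedged): detect.py enables cuDNN autotuning, and with torch.backends.cudnn.benchmark = True cuDNN re-benchmarks its convolution algorithms for every new input shape, so the first forward pass at each distinct size pays a large one-time cost — exactly five slow runs for five sizes. Two common workarounds:

    import torch

    # 1) Disable autotuning when input sizes vary a lot:
    torch.backends.cudnn.benchmark = False

    # 2) Or keep it on and pay the cost once up front by warming up the
    #    network with dummy inputs at every size you expect to see
    #    (sizes below are hypothetical; net/device come from detect.py):
    for h, w in [(640, 640), (720, 1280)]:
        with torch.no_grad():
            net(torch.zeros(1, 3, h, w, device=device))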

Issue with CUDA

Hi biu,
When I run the training command, I'm facing this issue:
RuntimeError: CUDA out of memory. Tried to allocate 200.00 MiB (GPU 0; 3.94 GiB total capacity; 2.79 GiB already allocated; 208.50 MiB free; 28.71 MiB cached).

Please give me some advice.

Best regards,
PeterPham

Can you share the final loss?

Hi,

I am fine-tuning the network. Can you share the final loss you reached when training mobile0.25?

You may have forgotten the exact values, but can you give a rough figure (like 3.xxx or 4.xxx)? I need a reference.

thanks.

about the pretrained model

Dear biubug6,
I would like to change the backbone network a bit. I wonder how you train a pre-trained model, as you did with mobilenet0.25. Any suggestion is highly appreciated. Thank you very much.

best,
XL

Hello, a question about offset detection results on a custom dataset

I trained models with RetinaFace (and other detectors built on your framework) on my own dataset, which has no landmarks (training size 512 x 512). At a test size of 512 the detection boxes are severely offset (always toward the bottom left), while testing at the original image size shows no offset and normal results. Models trained on the WIDER FACE dataset, which has landmarks, show essentially no offset, and on my data the same detectors show the offset whether or not landmarks are used.
Could you tell me which step I got wrong? When reading labels, I default every ground-truth box's landmark values to -1 and the class label to 1. Is that the right way to handle it?

an error in data augment?

I think you have a mistake in the preproc function in data_augment.py.
[screenshot of the code]
Maybe you should multiply by the resize size?

Runtime Error

@biubug6 Hi, when I try to use detect.py, it returns this error:

File "...../Pytorch_Retinaface/utils/box_utils.py", line 214, in decode
priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
RuntimeError: The size of tensor a (2688) must match the size of tensor b (2470) at non-singleton dimension 0
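
This mismatch usually means the prior grid was built for different image dimensions than the tensor actually fed to the network (2688 anchors on one side, 2470 on the other). A hedged sketch of keeping them in sync, using the same PriorBox call detect.py makes:

    # Rebuild the priors from the post-resize image dimensions so the
    # anchor count matches the network output for this exact input.
    im_height, im_width, _ = img.shape  # the image actually fed to the net
    priorbox = PriorBox(cfg, image_size=(im_height, im_width))
    priors = priorbox.forward().to(device)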

evaluation.py error

Hi, thanks for sharing.
First I ran this command: python setup.py build_ext --inplace
and then I ran: python evaluation.py -p widerface_txt/ -g ground_truth/
I get this error: ImportError: dynamic module does not define module export function (PyInit_bbox). It seems I have not succeeded in compiling box_overlaps.pyx,
but what should I do?
My environment is Python 3.5 and PyTorch 1.2.0.

Inf loss detected for Loc

While training, I hit an Inf loss for the bounding boxes.

Epoch:162/300 || Epochiter: 57/101 || Iter: 16318/30300 || Loc: inf Cla: 1.8687 Landm: 4.1464 || LR: 0.00100000 || Batchtime: 5.3929 s || ETA: 20:56:48
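
One plausible cause (hedged, not confirmed from this repo): the localization targets encode the ground-truth width/height as a log ratio, so an annotation with zero width or height encodes to -inf and the box loss then reports inf. A quick filter to run over the annotations:

    import numpy as np

    def filter_degenerate(boxes, min_size=1.0):
        """Drop ground-truth boxes with near-zero width or height.

        boxes: (N, 4) array of x1, y1, x2, y2 in pixels. Boxes narrower or
        shorter than min_size make the log-encoded regression target blow
        up, which would match the Loc: inf seen above.
        """
        w = boxes[:, 2] - boxes[:, 0]
        h = boxes[:, 3] - boxes[:, 1]
        return boxes[(w >= min_size) & (h >= min_size)]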

Hello, when I run detect.py with inputs of different sizes, why does the net forward time not change?

    for i in range(100):
        image_path = "./curve/3.jpg"
        img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
        img = np.float32(img_raw)

        # img = cv2.resize(img, (1280, 1920))
        target_size = 1080
        max_size = 1920
        im_shape = img.shape
        im_size_min = np.min(im_shape[0:2])
        im_size_max = np.max(im_shape[0:2])
        resize = float(target_size) / float(im_size_min)
        if np.round(resize * im_size_max) > max_size:
            resize = float(max_size) / float(im_size_max)
        if args.origin_size:
            resize = 1

        if resize != 1:
            img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)

        im_height, im_width, _ = img.shape
        scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
        img -= (104, 117, 123)
        img = img.transpose(2, 0, 1)
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.to(device)
        scale = scale.to(device)

        tic = time.time()
        loc, conf, landms = net(img)  # forward pass
        toc = time.time()
        print('net forward time: {:.4f}'.format(toc - tic))
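
A likely explanation (hedged): CUDA kernels launch asynchronously, so time.time() around net(img) measures only the kernel launch, not the computation itself; the GPU must be synchronized before reading the clock. A minimal fix to the timing above:

    import time
    import torch

    torch.cuda.synchronize()      # finish any pending GPU work first
    tic = time.time()
    loc, conf, landms = net(img)  # queues the forward pass
    torch.cuda.synchronize()      # wait for the forward pass to finish
    print('net forward time: {:.4f}'.format(time.time() - tic))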

Loss balancing parameters and normalization

Hi @biubug6,

First off, thank you for your work and for sharing. I had a question about your loss definitions.

The loss defined in the paper is L = L_cls(p_i, p_i*) + λ1·p_i*·L_box(t_i, t_i*) + λ2·p_i*·L_pts(l_i, l_i*) + λ3·p_i*·L_pixel. It contains three loss-balancing parameters, which the paper says are set to 0.25, 0.1, and 0.01 respectively.

However, when looking at the code I only see one loss-balancing parameter, for the localization loss. Furthermore, I do not see the balancing happening in the multibox loss module, and I do not see the pixel loss either. Is there a reason for this, or will it be added later?

The second question is about the normalization of the loss inside the multibox loss module. The classification loss is normalized the same way as the localization loss, whereas normally it is better to normalize the classification loss over all boxes and the localization loss over the positive ones. Is there a reason for this too?

Thanks in advance.

Best,

Casper Thuis

change backbone

@biubug6 hi, thanks for sharing your project.
Can you provide a short wiki page on changing the backbone?
How can I change the backbone and train?

performance on oriented images

Thanks for sharing your work.

I wanted to know your thoughts on how RetinaFace performs on rotated face images or faces at unusual angles.
When I tested it, the results were not very good.

Moreover,
how could I use the landmark information to align the image?

Thanks in advance.
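
A common alignment recipe (a hedged sketch, not from the repo): use the two eye landmarks from detect.py to rotate the face upright. landmarks is assumed to be the five-point output (left eye, right eye, nose, left mouth corner, right mouth corner) in pixel coordinates.

    import cv2
    import numpy as np

    def align_by_eyes(img, landmarks):
        """Rotate img so the eye line from landmarks becomes horizontal."""
        left_eye, right_eye = landmarks[0], landmarks[1]
        dx = right_eye[0] - left_eye[0]
        dy = right_eye[1] - left_eye[1]
        angle = np.degrees(np.arctan2(dy, dx))  # tilt of the eye line
        center = ((left_eye[0] + right_eye[0]) / 2.0,
                  (left_eye[1] + right_eye[1]) / 2.0)
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        return cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))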

dcn module

Note that dcn module was reported in paper, is there an implementation of dcn included in this repo?

Strange results

@biubug6 Hi, thanks for your great work.
I tested two pictures and got strange results.
These are the inputs:

[two input images]

and these are the outputs:

[two output images]

The first image gets a higher score than some of the faces in the second image.
Is that right?
