biubug6 / pytorch_retinaface
RetinaFace gets 80.99% on the WIDER FACE hard val set using mobilenet0.25.
License: MIT License
I found that your model uses the LeakyReLU activation function, while the original model uses ReLU. Does LeakyReLU give better results? Thanks.
Your work is amazing. I tried it on a few sample images and it works well in terms of both speed and accuracy. Then I tried to modify it to detect faces in a video file, but the processing time for each frame was very long. Can you provide code for running on video, or any suggestions about it?
This is what I tried (attached in "txt" format because GitHub supports it):
video.txt
Could you please explain where the Dense Regression Branch is in your model? I can only find the bounding box, cls, and landmark branches, but not the dense regression branch. Thanks.
Can you export the model to ONNX?
In the RetinaFace paper, there are 4 losses with a different weight for each, roughly in a ratio of 65:25:10:1.
I understand that you did not include the dense regression loss in your implementation, and I saw 2:1:1 in the config. Is this ratio the result of your cross-validation to balance the three losses?
Did you test other ratios?
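For readers comparing ratios, a weighted sum of the three implemented losses can be sketched as follows. The function name and the default weights are illustrative assumptions that mirror the 2:1:1 ratio mentioned in this issue; they are not copied from the repository.

```python
def total_loss(loss_loc, loss_cls, loss_landm,
               loc_weight=2.0, cls_weight=1.0, landm_weight=1.0):
    """Combine box, classification and landmark losses with fixed weights.

    The default 2:1:1 weights are an assumption based on the ratio quoted
    in this issue, not a verified copy of the repository config.
    """
    return (loc_weight * loss_loc
            + cls_weight * loss_cls
            + landm_weight * loss_landm)


# Example: the box loss counts twice as much as the other two terms.
print(total_loss(0.5, 1.0, 2.0))
```

Trying another ratio then only means passing different weights, which keeps experiments with the balance separate from the loss code itself.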
RT
from torch.jit.annotations import Dict
ImportError: cannot import name 'Dict'
Many dependencies cannot be imported.
What is the input shape of the ResNet50 RetinaFace?
Hi, when I run test_widerface.py, it raises a ValueError:
File "test_widerface.py", line 107, in
im_size_min = np.min(im_shape[0:2])
File "<array_function internals>", line 6, in amin
File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 2746, in amin
keepdims=keepdims, initial=initial, where=where)
File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
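One possible cause (an assumption, not confirmed in this thread) is that the image could not be read, which leaves an empty shape for the min/max reduction. The sketch below uses a hypothetical helper, `compute_resize`, that reproduces the resize logic with plain Python and fails early with a readable message; the default sizes are taken from a snippet elsewhere on this page and are only illustrative.

```python
def compute_resize(im_shape, target_size=1080, max_size=1920):
    """Return the scale factor used to resize an image before detection.

    im_shape is expected to be (height, width, ...). An empty shape
    usually means the image was not loaded, so raise a clear error
    instead of letting a reduction over an empty sequence fail.
    """
    if len(im_shape) < 2:
        raise ValueError(
            "image has no height/width; check that the image path exists "
            "and the file is readable"
        )
    im_size_min = min(im_shape[0:2])
    im_size_max = max(im_shape[0:2])
    resize = float(target_size) / float(im_size_min)
    # Prevent the longer side from exceeding max_size.
    if round(resize * im_size_max) > max_size:
        resize = float(max_size) / float(im_size_max)
    return resize
```

Checking the path list used by the test script against the files on disk is usually the quickest way to confirm whether this is the problem.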
Have you ever trained at a larger resolution, such as 800x800, with mobile0.25? I got much worse results than with the default 640x640.
Any idea?
Although the network forward pass takes only about 5 ms, the post-processing time on my laptop is up to 150 ms.
The most time-consuming part is lines 98 to 102 in detect.py, where generating the prior boxes takes a lot of time. What can we do to reduce it? Thanks.
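One way to reduce this cost, sketched under assumptions, is to build the priors once per image size and reuse them. `cached_priors` is a hypothetical helper, and the `min_sizes` and `steps` defaults are assumed values for illustration, not a verified copy of the repository config.

```python
from functools import lru_cache
from itertools import product
from math import ceil


@lru_cache(maxsize=32)
def cached_priors(image_h, image_w,
                  min_sizes=((16, 32), (64, 128), (256, 512)),
                  steps=(8, 16, 32)):
    """Build normalized (cx, cy, w, h) anchors once per image size."""
    anchors = []
    for sizes, step in zip(min_sizes, steps):
        fm_h, fm_w = ceil(image_h / step), ceil(image_w / step)
        for i, j in product(range(fm_h), range(fm_w)):
            for min_size in sizes:
                anchors.append((
                    (j + 0.5) * step / image_w,   # center x
                    (i + 0.5) * step / image_h,   # center y
                    min_size / image_w,           # width
                    min_size / image_h,           # height
                ))
    return tuple(anchors)


first = cached_priors(640, 640)
second = cached_priors(640, 640)   # served from the cache, no recomputation
print(len(first), first is second)
```

For video or batches of same-sized images, the second and later calls return immediately; the cached tuple can be converted to a tensor once and kept on the device.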
There is a part I don't quite understand:
https://github.com/biubug6/Pytorch_Retinaface/blob/master/layers/modules/multibox_loss.py#L101
loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
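The shape of batch_conf is (num_anchors, 2).
But here log_sum_exp sums the values of every class for each anchor in batch_conf and takes the log (log_sum_exp(batch_conf)). Isn't the first class the background? Does the background need to be taken into account?
The next part I understand even less: batch_conf.gather(1, conf_t.view(-1, 1)). Why is log_sum_exp not applied here?
Could you explain what this part means?
Thank you.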
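One way to read the quoted line is as softmax cross-entropy written out by hand: -log(softmax(x)[t]) = log_sum_exp(x) - x[t]. The first term is the softmax denominator, which needs every class including the background; the second is the logit of the true class, which needs no log_sum_exp because log(exp(x[t])) = x[t]. A small pure-Python check of that identity (the helper names are illustrative):

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(x))) over one anchor's class logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy(logits, target):
    """-log(softmax(logits)[target]) as log_sum_exp minus the target logit."""
    return log_sum_exp(logits) - logits[target]


logits = [1.2, -0.3]          # [background, face] scores for one anchor
softmax_face = math.exp(logits[1]) / sum(math.exp(x) for x in logits)
# The two printed values agree up to floating-point error.
print(cross_entropy(logits, 1), -math.log(softmax_face))
```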
Hi, can anyone tell me:
What is the difference between Pytorch (same parameter with Mxnet) and Pytorch (original image scale)?
How do I train the Pytorch (same parameter with Mxnet) model, or how do I get 79.69% on the WIDER FACE hard set using Mobilenet0.25?
Thanks!
python3 train.py --ngpu 4 --resume_net model_best.pth.tar
Traceback (most recent call last):
File "train.py", line 67, in
net.load_state_dict(new_state_dict)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 777, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RetinaFace:
Missing key(s) in state_dict: "body.stage1.0.0.weight", "body.stage1.0.1.weight", "body.stage1.0.1.bias", "body.stage1.0.1.running_mean", "body.stage1.0.1.running_var", "body.stage1.1.0.weight", "body.stage1.1.1.weight", "body.stage1.1.1.bias", "body.stage1.1.1.running_mean", "body.stage1.1.1.running_var", "body.stage1.1.3.weight", "body.stage1.1.4.weight", "body.stage1.1.4.bias", "body.stage1.1.4.running_mean", "body.stage1.1.4.running_var", "body.stage1.2.0.weight", "body.stage1.2.1.weight", "body.stage1.2.1.bias", "body.stage1.2.1.running_mean", "body.stage1.2.1.running_var", "body.stage1.2.3.weight", "body.stage1.2.4.weight", "body.stage1.2.4.bias", "body.stage1.2.4.running_mean", "body.stage1.2.4.running_var", "body.stage1.3.0.weight", "body.stage1.3.1.weight", "body.stage1.3.1.bias", "body.stage1.3.1.running_mean", "body.stage1.3.1.running_var", "body.stage1.3.3.weight", "body.stage1.3.4.weight", "body.stage1.3.4.bias", "body.stage1.3.4.running_mean", "body.stage1.3.4.running_var", "body.stage1.4.0.weight", "body.stage1.4.1.weight", "body.stage1.4.1.bias", "body.stage1.4.1.running_mean", "body.stage1.4.1.running_var", "body.stage1.4.3.weight", "body.stage1.4.4.weight", "body.stage1.4.4.bias", "body.stage1.4.4.running_mean", "body.stage1.4.4.running_var", "body.stage1.5.0.weight", "body.stage1.5.1.weight", "body.stage1.5.1.bias", "body.stage1.5.1.running_mean", "body.stage1.5.1.running_var", "body.stage1.5.3.weight", "body.stage1.5.4.weight", "body.stage1.5.4.bias", "body.stage1.5.4.running_mean", "body.stage1.5.4.running_var", "body.stage2.0.0.weight", "body.stage2.0.1.weight", "body.stage2.0.1.bias", "body.stage2.0.1.running_mean", "body.stage2.0.1.running_var", "body.stage2.0.3.weight", "body.stage2.0.4.weight", "body.stage2.0.4.bias", "body.stage2.0.4.running_mean", "body.stage2.0.4.running_var", "body.stage2.1.0.weight", "body.stage2.1.1.weight", "body.stage2.1.1.bias", "body.stage2.1.1.running_mean", "body.stage2.1.1.running_var", 
"body.stage2.1.3.weight", "body.stage2.1.4.weight", "body.stage2.1.4.bias", "body.stage2.1.4.running_mean", "body.stage2.1.4.running_var", "body.stage2.2.0.weight", "body.stage2.2.1.weight", "body.stage2.2.1.bias", "body.stage2.2.1.running_mean", "body.stage2.2.1.running_var", "body.stage2.2.3.weight", "body.stage2.2.4.weight", "body.stage2.2.4.bias", "body.stage2.2.4.running_mean", "body.stage2.2.4.running_var", "body.stage2.3.0.weight", "body.stage2.3.1.weight", "body.stage2.3.1.bias", "body.stage2.3.1.running_mean", "body.stage2.3.1.running_var", "body.stage2.3.3.weight", "body.stage2.3.4.weight", "body.stage2.3.4.bias", "body.stage2.3.4.running_mean", "body.stage2.3.4.running_var", "body.stage2.4.0.weight", "body.stage2.4.1.weight", "body.stage2.4.1.bias", "body.stage2.4.1.running_mean", "body.stage2.4.1.running_var", "body.stage2.4.3.weight", "body.stage2.4.4.weight", "body.stage2.4.4.bias", "body.stage2.4.4.running_mean", "body.stage2.4.4.running_var", "body.stage2.5.0.weight", "body.stage2.5.1.weight", "body.stage2.5.1.bias", "body.stage2.5.1.running_mean", "body.stage2.5.1.running_var", "body.stage2.5.3.weight", "body.stage2.5.4.weight", "body.stage2.5.4.bias", "body.stage2.5.4.running_mean", "body.stage2.5.4.running_var", "body.stage3.0.0.weight", "body.stage3.0.1.weight", "body.stage3.0.1.bias", "body.stage3.0.1.running_mean", "body.stage3.0.1.running_var", "body.stage3.0.3.weight", "body.stage3.0.4.weight", "body.stage3.0.4.bias", "body.stage3.0.4.running_mean", "body.stage3.0.4.running_var", "body.stage3.1.0.weight", "body.stage3.1.1.weight", "body.stage3.1.1.bias", "body.stage3.1.1.running_mean", "body.stage3.1.1.running_var", "body.stage3.1.3.weight", "body.stage3.1.4.weight", "body.stage3.1.4.bias", "body.stage3.1.4.running_mean", "body.stage3.1.4.running_var", "fpn.output1.0.weight", "fpn.output1.1.weight", "fpn.output1.1.bias", "fpn.output1.1.running_mean", "fpn.output1.1.running_var", "fpn.output2.0.weight", "fpn.output2.1.weight", 
"fpn.output2.1.bias", "fpn.output2.1.running_mean", "fpn.output2.1.running_var", "fpn.output3.0.weight", "fpn.output3.1.weight", "fpn.output3.1.bias", "fpn.output3.1.running_mean", "fpn.output3.1.running_var", "fpn.merge1.0.weight", "fpn.merge1.1.weight", "fpn.merge1.1.bias", "fpn.merge1.1.running_mean", "fpn.merge1.1.running_var", "fpn.merge2.0.weight", "fpn.merge2.1.weight", "fpn.merge2.1.bias", "fpn.merge2.1.running_mean", "fpn.merge2.1.running_var", "ssh1.conv3X3.0.weight", "ssh1.conv3X3.1.weight", "ssh1.conv3X3.1.bias", "ssh1.conv3X3.1.running_mean", "ssh1.conv3X3.1.running_var", "ssh1.conv5X5_1.0.weight", "ssh1.conv5X5_1.1.weight", "ssh1.conv5X5_1.1.bias", "ssh1.conv5X5_1.1.running_mean", "ssh1.conv5X5_1.1.running_var", "ssh1.conv5X5_2.0.weight", "ssh1.conv5X5_2.1.weight", "ssh1.conv5X5_2.1.bias", "ssh1.conv5X5_2.1.running_mean", "ssh1.conv5X5_2.1.running_var", "ssh1.conv7X7_2.0.weight", "ssh1.conv7X7_2.1.weight", "ssh1.conv7X7_2.1.bias", "ssh1.conv7X7_2.1.running_mean", "ssh1.conv7X7_2.1.running_var", "ssh1.conv7x7_3.0.weight", "ssh1.conv7x7_3.1.weight", "ssh1.conv7x7_3.1.bias", "ssh1.conv7x7_3.1.running_mean", "ssh1.conv7x7_3.1.running_var", "ssh2.conv3X3.0.weight", "ssh2.conv3X3.1.weight", "ssh2.conv3X3.1.bias", "ssh2.conv3X3.1.running_mean", "ssh2.conv3X3.1.running_var", "ssh2.conv5X5_1.0.weight", "ssh2.conv5X5_1.1.weight", "ssh2.conv5X5_1.1.bias", "ssh2.conv5X5_1.1.running_mean", "ssh2.conv5X5_1.1.running_var", "ssh2.conv5X5_2.0.weight", "ssh2.conv5X5_2.1.weight", "ssh2.conv5X5_2.1.bias", "ssh2.conv5X5_2.1.running_mean", "ssh2.conv5X5_2.1.running_var", "ssh2.conv7X7_2.0.weight", "ssh2.conv7X7_2.1.weight", "ssh2.conv7X7_2.1.bias", "ssh2.conv7X7_2.1.running_mean", "ssh2.conv7X7_2.1.running_var", "ssh2.conv7x7_3.0.weight", "ssh2.conv7x7_3.1.weight", "ssh2.conv7x7_3.1.bias", "ssh2.conv7x7_3.1.running_mean", "ssh2.conv7x7_3.1.running_var", "ssh3.conv3X3.0.weight", "ssh3.conv3X3.1.weight", "ssh3.conv3X3.1.bias", "ssh3.conv3X3.1.running_mean", 
"ssh3.conv3X3.1.running_var", "ssh3.conv5X5_1.0.weight", "ssh3.conv5X5_1.1.weight", "ssh3.conv5X5_1.1.bias", "ssh3.conv5X5_1.1.running_mean", "ssh3.conv5X5_1.1.running_var", "ssh3.conv5X5_2.0.weight", "ssh3.conv5X5_2.1.weight", "ssh3.conv5X5_2.1.bias", "ssh3.conv5X5_2.1.running_mean", "ssh3.conv5X5_2.1.running_var", "ssh3.conv7X7_2.0.weight", "ssh3.conv7X7_2.1.weight", "ssh3.conv7X7_2.1.bias", "ssh3.conv7X7_2.1.running_mean", "ssh3.conv7X7_2.1.running_var", "ssh3.conv7x7_3.0.weight", "ssh3.conv7x7_3.1.weight", "ssh3.conv7x7_3.1.bias", "ssh3.conv7x7_3.1.running_mean", "ssh3.conv7x7_3.1.running_var", "ClassHead.0.conv1x1.weight", "ClassHead.0.conv1x1.bias", "ClassHead.1.conv1x1.weight", "ClassHead.1.conv1x1.bias", "ClassHead.2.conv1x1.weight", "ClassHead.2.conv1x1.bias", "BboxHead.0.conv1x1.weight", "BboxHead.0.conv1x1.bias", "BboxHead.1.conv1x1.weight", "BboxHead.1.conv1x1.bias", "BboxHead.2.conv1x1.weight", "BboxHead.2.conv1x1.bias", "LandmarkHead.0.conv1x1.weight", "LandmarkHead.0.conv1x1.bias", "LandmarkHead.1.conv1x1.weight", "LandmarkHead.1.conv1x1.bias", "LandmarkHead.2.conv1x1.weight", "LandmarkHead.2.conv1x1.bias".
Unexpected key(s) in state_dict: "epoch", "arch", "state_dict", "best_acc1", "optimizer".
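The unexpected keys ("epoch", "arch", "state_dict", ...) suggest the file is a wrapper dict with the weights stored under "state_dict". A minimal sketch of unwrapping such a checkpoint before calling load_state_dict is shown below; `extract_state_dict` is a hypothetical helper, and whether the unwrapped keys then match RetinaFace depends on what the checkpoint was trained as (a backbone-only checkpoint would still not match).

```python
def extract_state_dict(checkpoint):
    """Return the model weights from a checkpoint dict.

    Handles two common layouts: a bare state_dict, or a wrapper dict that
    stores the weights under the "state_dict" key. A leading "module."
    (added by DataParallel) is stripped from every key.
    """
    state_dict = checkpoint.get("state_dict", checkpoint)
    prefix = "module."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


# Toy example with plain values standing in for tensors.
ckpt = {"epoch": 10, "state_dict": {"module.body.stage1.0.0.weight": 1}}
print(extract_state_dict(ckpt))
```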
1. Why is classnum=2? Isn't there only one class, the face?
2. In wider_face.py, lines 67 and 68:
if (annotation[0, 4]<0):
    annotation[0, 14] = -1
Why is it set to -1?
I hope you can clarify this. Thanks!
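A possible reading, stated as an assumption rather than the author's confirmed design: the two classes are background and face, and a label of -1 marks a face whose landmarks are not annotated, so it still counts for the box and classification losses but is left out of the landmark loss. A pure-Python sketch of that masking, with hypothetical names:

```python
def loss_masks(labels):
    """Split per-anchor labels into masks for the box and landmark losses.

    Assumed label convention: 0 = background, 1 = face with landmarks,
    -1 = face whose landmarks are not annotated.
    """
    box_mask = [label != 0 for label in labels]      # every face anchor
    landm_mask = [label > 0 for label in labels]     # only faces with landmarks
    return box_mask, landm_mask


print(loss_masks([0, 1, -1]))
```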
When I ran a similar experiment in my own face detection codebase, the landmark points for the two eyes overlapped and always fell between the eyebrows, while the nose landmark was always right. Do you know what might be the reason?
Hi, thanks for your excellent work. I ran detect.py and got the box and landmark results. However, I also want to get the camera parameters, such as camera location, camera pose, and focal length. Is there an interface for this?
Graph is on PASCAL
retinaface: resize = 1
retinaface-2: resize = 1.5 (or resize close to 640px, same effect)
Just like FaceBoxes prefers minimum face size (13 px) @ trained size (1024 px), RetinaFace also prefers resize to trained size (640px).
Right now resize is hardcoded to 1. I also now get better results on AFW, PASCAL, etc.
I tested on some tiny images with origin_size set to True, which means using the original size.
It takes around 300 ms on a GTX 1080 Ti, which seems a little slow...
Hi,
I used your code to train a RetinaFace detector with the ResNet50 pre-trained model and the WIDER FACE data you provide. I didn't change any configuration parameter in config.py except batch_size (from the original 24 to 4).
However, I cannot get the same AP as yours. With single-scale testing, the results are: Easy Val AP 0.918, Medium Val AP 0.874, Hard Val AP 0.620.
I want to know how to get the same AP and why there is such a big gap between my results and yours. Thanks a lot.
Hi. The shape of conf is (N, boxes_num, 2). (N, boxes_num, 1:2) is the confidence of the face class. May I ask, is (N, boxes_num, 0:1) the confidence of the non-face (background) class? Thanks.
What is the difference in post-processing between PyTorch and MXNet after getting the output blob? Are the anchors at each output layer generated the same way?
I converted the pth model trained with your code to a Caffe model and ran inference with this C++ implementation: https://github.com/clancylian/retinaface. But I got wrong results. Do you have any idea about this? Thanks.
Hi @biubug6, thanks for sharing your code! The PyTorch version is much clearer to me than the MXNet version. May I ask, did you keep the same network structure and loss function in this PyTorch version as in the original MXNet one?
There is a problem with mobilenetV1X0.25_pretrain.tar. Can you update the download link? Thanks.
I saw a speed problem reported in issue #1, which is now closed. I found that inference is sometimes very slow, so I tested it in detect.py.
I added the following before "# testing begin":
test_dataset = ['./test_image/0_Parade_marchingband_1_465.jpg',
'./test_image/0_Parade_Parade_0_43.jpg',
'./test_image/1_Handshaking_Handshaking_1_94.jpg',
'./test_image/1_Handshaking_Handshaking_1_158.jpg',
'./test_image/1.jpg', './test_image/2.jpg', './test_image/3.jpg']
The first four pictures are from wider_val and are all of different sizes.
The last three pictures, downloaded from the internet, are all the same size.
So the test_dataset list has 7 images and 5 different sizes.
Then I ran python detect.py and got this result:
net forward time: 1.5058
net forward time: 1.5821
net forward time: 2.2512
net forward time: 1.5150
net forward time: 2.6007
net forward time: 0.0072
net forward time: 0.0059
net forward time: 0.0061
I found that the first five runs were very slow.
So, is RetinaFace particularly slow the first time it sees a new image size?
And where is most of the time spent? Thanks.
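One possible explanation (an assumption, not confirmed in this thread) is one-off setup work the first time each input size is seen, for example cuDNN autotuning when torch.backends.cudnn.benchmark is enabled. To measure this separately from steady-state speed, timings can be grouped by input size as in the sketch below; `time_by_input_size`, `run`, and `size_of` are hypothetical names.

```python
import time
from collections import defaultdict


def time_by_input_size(run, inputs, size_of):
    """Time run(x) for each input, separating the first call per input size.

    Returns {size: {"first": seconds, "rest": [seconds, ...]}} so that
    one-off start-up cost is not mixed into the steady-state numbers.
    """
    stats = defaultdict(lambda: {"first": None, "rest": []})
    for x in inputs:
        tic = time.perf_counter()
        run(x)
        elapsed = time.perf_counter() - tic
        entry = stats[size_of(x)]
        if entry["first"] is None:
            entry["first"] = elapsed
        else:
            entry["rest"].append(elapsed)
    return dict(stats)


# Toy usage: the "model" is a no-op and each input is just its own size.
stats = time_by_input_size(lambda x: None, [(1, 2), (1, 2), (3, 4)], size_of=lambda x: x)
print(sorted(stats))
```

When timing on a GPU, `run` should also wait for the device to finish (for example by calling torch.cuda.synchronize() after the forward pass), otherwise the measured time may only cover launching the work.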
Hi biu,
When I run the training command, I get this error:
RuntimeError: CUDA out of memory. Tried to allocate 200.00 MiB (GPU 0; 3.94 GiB total capacity; 2.79 GiB already allocated; 208.50 MiB free; 28.71 MiB cached).
Please give me some advice.
Best regards,
PeterPham
Hi,
I am fine tuning the network. Can you share the final loss you trained with mobile0.25?
You may have forgotten the exact values, but can you give a rough value (like 3.xxx or 4.xxx)? I need a reference.
thanks.
RT
Dear biubug6,
I would like to change the backbone network a bit. I wonder how to train a pretrained model, as you did for mobilenet0.25. Any suggestion is highly appreciated. Thank you very much.
best,
XL
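I trained models with RetinaFace (and other detectors built within your framework) on my own dataset, which has no landmarks (training size 512 x 512). I found that testing at size 512 causes a severe offset of the detection boxes (always toward the lower left), while testing on the original image size shows no offset and gives normal results. However, a model trained on the WIDER FACE dataset with landmarks shows basically no offset. Also, on my dataset the same detector shows the offset both with and without the landmark branch.
Could you tell me which step I got wrong? When reading the labels, I set the landmark values of all GT boxes to -1 by default and the class label to 1. Is this handling correct?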
@biubug6
In the code, I found the crop coordinates, but the crop is not correct.
For example:
x = int(b[0])
y = int(b[1])
w = int(b[2])
h = int(b[3])
crop_img = img_raw[y:y+h,x:x+w]
Please provide the coordinates for cropping the face.
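If the detector returns boxes as corner coordinates (x1, y1, x2, y2) rather than (x, y, w, h), which is an assumption worth checking against how the boxes are drawn, the crop would use the corners directly. A minimal sketch with a hypothetical helper, `crop_bounds`, that also clamps the box to the image:

```python
def crop_bounds(box, img_h, img_w):
    """Convert an (x1, y1, x2, y2) box into clamped integer slice bounds."""
    x1, y1, x2, y2 = (int(v) for v in box[:4])
    x1, x2 = max(0, x1), min(img_w, x2)
    y1, y2 = max(0, y1), min(img_h, y2)
    return y1, y2, x1, x2


y1, y2, x1, x2 = crop_bounds([10.7, -3.0, 50.2, 40.9], img_h=100, img_w=100)
# With a NumPy image this would be: crop_img = img_raw[y1:y2, x1:x2]
print(y1, y2, x1, x2)
```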
@biubug6 Hi, when I try to use detect.py, it returns this error:
File "...../Pytorch_Retinaface/utils/box_utils.py", line 214, in decode
priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
RuntimeError: The size of tensor a (2688) must match the size of tensor b (2470) at non-singleton dimension 0
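This kind of mismatch means the number of generated priors differs from the number of boxes the network predicted. A quick diagnostic, sketched under the assumption of strides 8, 16, 32 and two anchors per cell (assumed values, not verified against the repository config), is to compute how many priors a given image size should produce and compare it with the network output:

```python
from math import ceil


def count_priors(image_h, image_w, steps=(8, 16, 32), anchors_per_cell=2):
    """Number of prior boxes generated for an image of the given size."""
    return sum(
        ceil(image_h / s) * ceil(image_w / s) * anchors_per_cell
        for s in steps
    )


print(count_priors(640, 640))
```

Under these assumptions a 256x256 input gives 2688 priors, though other sizes can give the same count. If the count for the actual input size does not match the first dimension of the network's loc output, the priors were probably built for a different size than the tensor fed to the model.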
Hi, thanks for sharing.
First I ran this command: python setup.py build_ext --inplace
Then I ran: python evaluation.py -p widerface_txt/ -g ground_truth/
I got this error: ImportError: dynamic module does not define module export function (PyInit_bbox). It seems that I did not succeed in compiling box_overlaps.pyx.
What should I do?
My environment is Python 3.5 and PyTorch 1.2.0.
While training, I hit an Inf loss for the bounding boxes.
Epoch:162/300 || Epochiter: 57/101 || Iter: 16318/30300 || Loc: inf Cla: 1.8687 Landm: 4.1464 || LR: 0.00100000 || Batchtime: 5.3929 s || ETA: 20:56:48
Hi,
Is the default min_size calculated based on (640x640)?
I found the scale of width and height of each anchor is weird.
https://github.com/biubug6/Pytorch_Retinaface/blob/master/layers/functions/prior_box.py#L23
s_kx = min_size / self.image_size[1]
I think it should be
s_kx = min_size * (self.image_size[1] / 640)
when testing with new image scales.
Not sure if I am right.
thank you.
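One way to look at the quoted line, offered as an interpretation rather than the author's statement: the prior stores a width normalized by the image width, and decoded boxes are later multiplied by the image width again, so the anchor keeps a width of min_size pixels whatever the input size. A small arithmetic sketch with a hypothetical helper:

```python
def anchor_width_pixels(min_size, image_w):
    """Width in pixels of an anchor whose normalized width is min_size / image_w."""
    s_kx = min_size / image_w      # normalized width stored in the prior
    return s_kx * image_w          # scaled back to pixels at decode time


for w in (640, 1280):
    print(w, anchor_width_pixels(16, w))
```

With the alternative formula proposed above, the pixel size of each anchor would instead grow with the image, which changes the face sizes each level is matched to.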
I want to evaluate the performance, but got the above trouble.
for i in range(100):
    image_path = "./curve/3.jpg"
    img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
    img = np.float32(img_raw)
    count = 0
    # img = cv2.resize(img, (1280, 1920))
    target_size = 1080
    max_size = 1920
    im_shape = img.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    resize = float(target_size) / float(im_size_min)
    if np.round(resize * im_size_max) > max_size:
        resize = float(max_size) / float(im_size_max)
    if args.origin_size:
        resize = 1
    if resize != 1:
        img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
    im_height, im_width, _ = img.shape
    scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
    img -= (104, 117, 123)
    img = img.transpose(2, 0, 1)
    img = torch.from_numpy(img).unsqueeze(0)
    img = img.to(device)
    scale = scale.to(device)
    tic = time.time()
    loc, conf, landms = net(img)  # forward pass
    toc = time.time()
    print('net forward time: {:.4f}'.format(toc - tic))
ModuleNotFoundError: No module named 'torchvision.models._utils'
This module cannot be imported.
As shown in the title
Hi @biubug6,
First off, thank you for your work and for sharing. I had a question about your loss definitions.
The loss defined in the paper is L = L_cls(p_i, p_i*) + λ1 · p_i* · L_box(t_i, t_i*) + λ2 · p_i* · L_pts(l_i, l_i*) + λ3 · p_i* · L_pixel. The loss contains
3 loss-balancing parameters, which the paper says are set to 0.25, 0.1 and 0.01 respectively.
However, when looking at the code I only see one loss-balancing parameter, for the localization loss. Furthermore, I do not see the loss balancing happening in the multibox loss module, and I do not see the pixel loss either. Is there a reason for this, or will it be added later?
The second question is about the normalization of the loss inside the multibox loss module. The classification loss is normalized in the same way as the localization loss; normally it is better to normalize the classification loss by all boxes and the localization loss by the positive ones. Is there a reason for this as well?
Thanks in advance.
Best,
Casper Thuis
@biubug6 hi, thanks for sharing your project.
Can you provide a short wiki on changing the backbone?
How can I change the backbone and train?
Thanks for sharing your excellent work.
I'd like to try it, but I can't find its license information.
Could you tell me the license to use this for my work?
It would be helpful if it were this license:
https://en.wikipedia.org/wiki/Permissive_software_license
Thanks for sharing your work.
I wanted to know your thoughts on how RetinaFace will perform on rotated face images or faces at unusual angles.
When I tested it, the results were not very good.
Also, how could I use the landmark information to align the image?
Thanks in advance.
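On the alignment question, a common approach is to estimate the in-plane rotation from the two eye landmarks and rotate the image so the eyes are level. The sketch below only computes the angle; it assumes the two eye points are available as (x, y) pixel coordinates, ordered from image-left to image-right, and the function name is illustrative.

```python
import math


def eye_roll_degrees(left_eye, right_eye):
    """Angle of the line between the eyes, in degrees.

    Points are (x, y) in image coordinates (y grows downward).
    0 means the eyes are level.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


print(eye_roll_degrees((30.0, 40.0), (70.0, 40.0)))   # level eyes
print(eye_roll_degrees((30.0, 40.0), (70.0, 80.0)))   # right eye lower in the image
```

Rotating the image by this angle around the midpoint between the eyes should level them; the sign convention of the rotation depends on the imaging library, so it is worth checking on one test image first.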
The paper reports using a DCN module. Is there an implementation of DCN included in this repo?
@biubug6 Hi, thanks for your great work.
I tested two pictures and got strange results.
these are inputs:
and these are outputs:
The face in the first image gets a higher score than some faces in the second image.
Is that right?