
hrnet-facial-landmark-detection's Issues

train error

ERROR: ValueError: only one element tensors can be converted to Python scalars

Please help, thank you.
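Not from the thread (no traceback was posted), but this particular ValueError is what PyTorch raises when float() or .item() is called on a tensor that holds more than one element. A minimal reproduction and the usual fix:

    import torch

    t = torch.tensor([1.0, 2.0])
    # float(t) or t.item() raises:
    #   ValueError: only one element tensors can be converted to Python scalars
    # If a scalar is wanted (e.g. a loss value for logging), reduce the tensor first:
    value = t.mean().item()   # or t.sum().item()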

training on AFLW

Hello,

I have a question regarding training on AFLW. The dataset provides annotations for landmark visibility. I am curious why you did not take this into account when training on AFLW; I would guess the NME would have been lower that way.

Thank you,
Andrei

Bug in loading the best model?

When I load the best saved model as below:

    python tools/test.py --cfg experiments/300w/face_alignment_300w_hrnet_w18.yaml --model-file output/300W/face_alignment_300w_hrnet_w18/model_best.pth

Step 1: in tools/test.py:

    # args.model_file = output/300W/face_alignment_300w_hrnet_w18/model_best.pth
    state_dict = torch.load(args.model_file)  # state_dict is not a dict but a module
    if 'state_dict' in state_dict.keys():     # this line raises the error
        state_dict = state_dict['state_dict']
        model.load_state_dict(state_dict)
    else:
        model.module.load_state_dict(state_dict)

Step 2: so I followed the best-model save code in lib/utils/utils.py, save_checkpoint():

    if is_best and 'state_dict' in states.keys():
        torch.save(states['state_dict'].module,
                   os.path.join(output_dir, 'model_best.pth'))
        # Here is the bug: this saves the module instead of the state_dict,
        # so the error is raised when we load the best model.
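Not an official fix, but given the save bug described above, a workaround sketch for loading model_best.pth that handles both a saved module and a plain state_dict (this assumes the reporter's analysis is correct):

    import torch

    checkpoint = torch.load(args.model_file, map_location='cpu')
    if isinstance(checkpoint, torch.nn.Module):
        # model_best.pth holds the whole module (the bug above); take its weights.
        state_dict = checkpoint.state_dict()
    elif 'state_dict' in checkpoint:
        state_dict = checkpoint['state_dict']
    else:
        state_dict = checkpoint
    model.load_state_dict(state_dict)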

Error while trying to run inference.

While trying to run inference on a webcam, I get this error:

    Traceback (most recent call last):
      File "camera.py", line 10, in <module>
        from model.pfld import PFLDInference, AuxiliaryNet
    ModuleNotFoundError: No module named 'model.pfld'; 'model' is not a package

Please help.

The effect of scale, center_w, center_h on performance?

Thanks for your code! The inconvenient part is that scale, center_w, and center_h are needed at inference time. Is there a substitute transformation that does not require them? And how much impact does preprocessing with scale, center_w, and center_h have on performance? Looking forward to a reply, thanks.

code usage problems

Hi, I've tried to use your code for some demo tests on my own images, but it seems it can't be used for a single forward inference. What should I do to use it in practice?
thanks a lot!

inference problem

The results look good on the dataset, but they depend on scale, center_w, and center_h. If I want to run inference on an image without scale, center_w, and center_h, how do I obtain them accurately? If these values are not accurate, the landmarks are not good, so I need accurate scale, center_w, and center_h. Any suggestions?
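Not an answer from the authors, but in similar heatmap-based alignment codebases the center is simply the middle of a face detector's bounding box, and the scale is the box size divided by a 200-pixel reference size, sometimes with extra padding. A minimal sketch; treat the 200-pixel reference and the padding factor as assumptions rather than values confirmed for this repo:

    # Hypothetical helper: derive center_w, center_h, scale from a face box.
    # The 200-pixel reference and the 1.25 padding factor are assumptions.
    def bbox_to_center_scale(x_min, y_min, x_max, y_max, padding=1.25):
        center_w = (x_min + x_max) / 2.0
        center_h = (y_min + y_max) / 2.0
        box_size = max(x_max - x_min, y_max - y_min)
        scale = box_size / 200.0 * padding
        return center_w, center_h, scale

If the detector box is looser or tighter than whatever produced the training annotations, the predicted landmarks will shift accordingly, which matches the behavior described in this issue.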

300W Dataset Numbers

Hi authors,
Thank you for releasing your code. I happened to notice different numbers in the README and in the arXiv paper for the 300W dataset (although the number for the test set is the same). It would be really great if you could explain the differences between the training of these two models.

                Common   Challenge   Full   Test
    README        2.91        5.11   3.34   3.85
    arXiv paper   2.87        5.15   3.32   3.85

Some questions about inference function

After running the WFLW dataset and models, I got confused.

    File "tools/test.py", line 78, in main
        nme, predictions = function.inference(config, test_loader, model)

    File "lib/core/function.py", line 194, in inference
        preds = decode_preds(score_map, meta['center'], meta['scale'], [64, 64])

    File "lib/core/evaluation.py", line 64, in decode_preds
        coords = get_preds(output)  # shape (8, 98, 2)

Those are the indices of the greatest value in each 64x64 output map, and I think those indices should be integers. I don't understand why you then do

    coords[n][p] += diff.sign() * .25

and

    coords += 0.5

Inference on our own images

Hi! Thank you for your work. Like some others, I've modified the code to take my own dataset and get predictions on it, using dummy values for the ground truths, but the points look nothing like what I expected. Any help would be appreciated.

Hey, I rewrote a script to do inference on a single image. However, I observed that the predicted points are not correct after the transformation. Any suggestions?

Originally posted by @testingshanu in #7 (comment)

Thank you! Ana
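Not from the thread, but a minimal single-image inference sketch under several assumptions: a 256x256 input crop, a 64x64 heatmap output, ImageNet mean/std standing in for the dataset's own normalization, a crude square crop instead of the repo's affine crop helper, and decode_preds imported from lib/core/evaluation.py as quoted in the other issues. It is a rough stand-in for quick experiments, not the authors' pipeline:

    import numpy as np
    import torch
    from PIL import Image

    # decode_preds is the repo helper quoted elsewhere in these issues;
    # the import path is assumed to be lib/core/evaluation.py.
    from lib.core.evaluation import decode_preds

    # Placeholder mean/std; the repo's dataset class keeps its own values
    # in self.mean and self.std.
    MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def predict_landmarks(model, image_path, center, scale, input_size=256):
        """Crop around (center_w, center_h), run the model, and map the 64x64
        heatmap peaks back to original-image coordinates with decode_preds."""
        img = Image.open(image_path).convert('RGB')
        # Crude square crop assuming a 200*scale reference box; the repo uses
        # its own affine crop, so the points may be slightly off.
        half = int(scale * 200 / 2)
        cx, cy = int(center[0]), int(center[1])
        crop = img.crop((cx - half, cy - half, cx + half, cy + half))
        crop = crop.resize((input_size, input_size))

        x = (np.asarray(crop, dtype=np.float32) / 255.0 - MEAN) / STD
        x = torch.from_numpy(x.transpose(2, 0, 1)).unsqueeze(0)

        model.eval()
        with torch.no_grad():
            score_map = model(x).cpu()

        preds = decode_preds(score_map,
                             torch.tensor([center], dtype=torch.float32),
                             torch.tensor([scale], dtype=torch.float32),
                             [64, 64])
        return preds[0]   # (num_landmarks, 2) in original-image coordinates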

The files on OneDrive are not reachable

Hi, I want to download the trained weights and preprocessed files, but the OneDrive link is not reachable. Could anyone share the trained weights and preprocessed annotation files from another location?

Some issues with the WFLW dataloader

wflw.py is the data loader for WFLW.

In line 63:
img = np.array(Image.open(image_path).convert('RGB'), dtype=np.float32)

After that, the image is normalized:
img = (img/255.0 - self.mean) / self.std

Why don't you just use PIL and torchvision.transforms to normalize during training and testing?
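For reference (not an answer from the authors), the same normalization expressed with torchvision.transforms would look roughly like this; the mean/std values below are placeholders standing in for whatever the dataset class stores in self.mean and self.std:

    from PIL import Image
    from torchvision import transforms

    # Placeholder mean/std; the repo's WFLW dataset keeps its own values.
    to_normalized_tensor = transforms.Compose([
        transforms.ToTensor(),                      # HWC uint8 [0, 255] -> CHW float [0, 1]
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = to_normalized_tensor(Image.open(image_path).convert('RGB'))

One likely reason the repo sticks with NumPy is that it applies its own cropping, rotation, and flipping to the float array (and to the landmark coordinates) before normalizing, which is awkward to express as a torchvision pipeline; that is a guess, though, not the authors' statement.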

Missing keys while loading the model

Hi,

I was trying out test.py, but I get a "Missing key(s) in state_dict" error followed by a list of keys.
Am I missing something?

    RuntimeError: Error(s) in loading state_dict for HighResolutionNet:
        Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", ...

and it goes on.

Explanation of Annotation .csv file

In your provided annotation file face_landmarks_wflw_test.csv, how are the annotations "scale", "center_w", and "center_h" computed from the original face bounding-box annotations <x_min, y_min, x_max, y_max> provided by the authors of the WFLW dataset? Could you explain?

About AFLW dataset

I downloaded the AFLW dataset from someone's Baidu Yun link; maybe some of the images are broken.
When I train the model, I get warning messages:

TiffImagePlugin.py:754: UserWarning: Possibly corrupt EXIF data. Expecting to read 19660800 bytes but only got 0. Skipping tag 0
" Skipping tag %s" % (size, len(data), tag))
PIL/TiffImagePlugin.py:771: UserWarning: Corrupt EXIF data. Expecting to read 12 bytes but only got 6.

I don't know whether the images in the original dataset are OK.
Have you ever come across the same situation? I want to make sure the Baidu Yun dataset is the same as the original one.

About Inference Time

When I run tools/test.py on the WFLW dataset, it takes about 115 s on a single 2080 Ti and 85 s on four 2080 Tis.
The CPU usage is high (around 300%), but GPU utilization is always 0% and GPU memory usage is very low, even with the batch size set to 256.

Is this working as intended? If not, can you tell me how to fix it?
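Not from the thread, but 0% GPU utilization with ~300% CPU usage usually means the model or the input batches never reach the GPU, or the run is bottlenecked by data loading. A quick generic PyTorch check (not specific to this repo):

    import torch

    print(torch.cuda.is_available())            # should print True on a working CUDA setup
    print(next(model.parameters()).device)      # should be a cuda device, not cpu

    # If the model is still on the CPU, move it and each input batch explicitly:
    model = model.cuda()
    inp = inp.cuda(non_blocking=True)

If both already report CUDA devices, the bottleneck is more likely the DataLoader (for example, too few workers), since an HRNet-W18 forward pass per image is cheap compared to JPEG decoding and preprocessing.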

No convergence on the WFLW dataset

Hi, thanks for your excellent work. When I try to train from the ImageNet-pretrained model on WFLW, it is weird that the loss converges at about 0.0011 no matter how I change the learning rate or the optimizer. BTW, the NME on the test set is about 0.20.
My configuration is exactly the config file you provided. Do you have any idea what could cause this?

[demo] for own dataset

Thanks for your great work!
Could you tell me how to configure test.py if I only want to extract facial landmarks on my own dataset?
Thanks in advance!

Default model doesn't match weights from the link?

Another question, please: following your tools/test.py, I use model = models.get_face_alignment_net(config) to build the default model and then load HR18-300W.pth. However, errors occur showing that the feature dimensions of the default model are inconsistent with the downloaded weights. Waiting for your response.

Some problems about generate_target

Hello, I was glad to read your paper and code, but I have some questions.

[screenshot] +1 in transform_pixel
[screenshot] -1 in __getitem__

I'm a little confused.

I debugged the code in generate_target:

import numpy as np


def generate_target(img, pt, sigma, label_type='Gaussian'):
    """
    :param img: heatmap of a landmark, 64 x 64
    :param pt: a landmark (1, 2)
    :param sigma:
    :param label_type:
    :return:
    """
    # Check that any part of the gaussian is in-bounds
    tmp_size = sigma * 3                                        # radius of influence: 4.5 for sigma = 1.5
    ul = [int(pt[0] - tmp_size), int(pt[1] - tmp_size)]
    br = [int(pt[0] + tmp_size + 1), int(pt[1] + tmp_size + 1)] # the +1 keeps the window symmetric around the point
    if (ul[0] >= img.shape[1] or ul[1] >= img.shape[0] or
            br[0] < 0 or br[1] < 0):                            # the gaussian falls completely outside the heatmap
        # If not, just return the image as is
        return img                                              # return the blank heatmap unchanged

    # Generate gaussian
    size = 2 * tmp_size + 1                                     # 10 for sigma = 1.5
    x = np.arange(0, size, 1, np.float32)
    y = x[:, np.newaxis]
    x0 = y0 = size // 2                                         # 5
    # The gaussian is not normalized, we want the center value to equal 1
    if label_type == 'Gaussian':                                # gaussian kernel
        g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
    else:
        g = sigma / (((x - x0) ** 2 + (y - y0) ** 2 + sigma ** 2) ** 1.5)

    # Usable gaussian range
    g_x = max(0, -ul[0]), min(br[0], img.shape[1]) - ul[0]
    g_y = max(0, -ul[1]), min(br[1], img.shape[0]) - ul[1]

    # Image range
    img_x = max(0, ul[0]), min(br[0], img.shape[1])
    img_y = max(0, ul[1]), min(br[1], img.shape[0])

    img[img_y[0]:img_y[1], img_x[0]:img_x[1]] = g[g_y[0]:g_y[1], g_x[0]:g_x[1]]
    return img


if __name__ == "__main__":
    heatmap = np.zeros((64, 64))
    pt = (0, 1)
    sigma = 1.5
    heatmap = generate_target(heatmap, pt, sigma)

The heatmap looks like this (screenshot): the index of the max value is (1, 1).

When I change pt to (10, 10), I get a correct heatmap (screenshot).

I'm not sure whether this is a bug or something else.

Thanks for your reply.

Hi! Some errors during training!

face_alignment_300w_hrnet_w18_2019-07-18-21-22_train.log
Hi! I am very interested in your code, and I want to retrain the network with your config file, but there seems to be a big difference between your results and mine. The log file above is mine; the NME is about 0.048, which is much bigger than your 0.038. Can you help me reproduce your result? By the way, in your training log the PyTorch version does not seem to be 1.0; is that right, and can you tell me which version you used? Thank you!

Results without pre-training

Hi, thanks for this awesome repo.
I noticed that all models use ImageNet-pretrained initialization. Would you mind providing results without ImageNet pre-training?

inconsistent implementation for saved .pth model

Hi,

Thanks for sharing such awesome work.

Since the current implementation does not use model.module anymore, using model.load_state_dict(state_dict) instead of model.module.load_state_dict(state_dict) works for all of the provided trained weights except HR18-300W.pth, which is still saved under model.module.

For now I just modified the keys as below and it works correctly.

from collections import OrderedDict

# Strip the 'module.' prefix left by DataParallel so the keys match the model.
new_state_dict = OrderedDict()
for key in state_dict:
    new_state_dict[key.replace('module.', '')] = state_dict[key]
model.load_state_dict(new_state_dict)

Rotations are not considered when the training NME is computed

Hello, it seems that rotations are not considered when computing the training NME.

I'll take the 300W experiments as an example.

In lib.datasets.face300w, the original ground-truth landmarks are passed through (in some cases they are flipped), and a random rotation is applied to the training images (note that the ground-truth landmarks are not rotated).

And in lib.core.function.train, the NME is computed with

preds = decode_preds(score_map, meta['center'], meta['scale'], [64, 64])
nme_batch = compute_nme(preds, meta)

where the parameters do not contain any rotation information.

And in lib.utils.transforms.transform_preds

coords[p, 0:2] = torch.tensor(transform_pixel(coords[p, 0:2], center, scale, output_size, 1, 0))

the rotation factor is set to constant 0.

So the training NME is computed from non-rotated ground truth and predictions made on a randomly rotated image (without rotating back). Is there anything wrong here?

Possibly a bug with last_epoch

In tools/train.py, line 69:

last_epoch = config.TRAIN.BEGIN_EPOCH

I think it should perhaps be config.TRAIN.END_EPOCH?

post-processing

Dear author, how can I understand the post-processing part in the decode_preds function?

# pose-processing
for n in range(coords.size(0)):
    for p in range(coords.size(1)):
        hm = output[n][p]
        px = int(math.floor(coords[n][p][0]))
        py = int(math.floor(coords[n][p][1]))
        if (px > 1) and (px < res[0]) and (py > 1) and (py < res[1]):
            diff = torch.Tensor([hm[py - 1][px] - hm[py - 1][px - 2], hm[py][px - 1]-hm[py - 2][px - 1]])
            coords[n][p] += diff.sign() * .25
coords += 0.5
preds = coords.clone()
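Not an authoritative answer, but reading the quoted code: get_preds only returns integer peak locations, so for each landmark the loop nudges the coordinate by a quarter pixel toward whichever neighbor of the peak has the larger heatmap value (diff.sign() is +1 or -1 per axis), and the final +0.5 presumably shifts from pixel indices to pixel centers. A tiny numeric illustration of the quarter-pixel nudge (values invented for the example):

    import torch

    # Invented heatmap values around a peak: the pixel to the right of the
    # peak (0.8) is larger than the one to the left (0.6), and the pixel
    # below (0.5) is smaller than the one above (0.7).
    right, left = 0.8, 0.6
    below, above = 0.5, 0.7

    diff = torch.tensor([right - left, below - above])   # tensor([ 0.2000, -0.2000])
    offset = diff.sign() * 0.25                          # tensor([ 0.2500, -0.2500])
    # The integer peak coordinate is shifted a quarter pixel toward the larger
    # neighbor on each axis, before the final +0.5 is added.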

model download

Hi, thank you for sharing your models. I can't download the models from the links you gave. Could you provide other links, e.g. Baidu Yun? Thank you!
