Comments (13)

TheSouthFrog commented on June 19, 2024

You need to run a face detector first to obtain the coordinates of the face bounding box, rather than directly resizing the original image. The standard training and testing images are cropped according to their bounding boxes, so the face fills a larger part of the image that is fed into the model.

from stylealign.

ilovecv commented on June 19, 2024

Hi @TheSouthFrog,

Thank you very much for your quick reply. I preprocessed the image the way you show in crop_pic.py, i.e., pre-crop with expand ratio = 0.2, then resize the image to 256*256. The result is still not very good. Should I expand the face bounding box? Here is the result:
[image: 2]

TheSouthFrog commented on June 19, 2024

The bounding box you provided is too small. Try expanding the ratio; I expect the result to improve a lot.

More concretely, you can also refer to this issue. The general rule is that you should provide bounding boxes in the same style as the ones used in your training data, e.g. MTCNN boxes for 300W. Since the model we are using is trained on WFLW, we are supposed to provide box annotations similar to those of the WIDER Face dataset. You can either expand the ratio or use a face detector pre-trained on WIDER Face.
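For readers following along, here is a minimal sketch of what "expanding the ratio" could look like. It assumes the ratio is a fraction of the box width and height added on every side and that the box is clamped to the image; the function name `expand_box` and the sample numbers are hypothetical, and crop_pic.py may define the ratio differently.

```python
def expand_box(box, ratio, image_width, image_height):
    """Grow an (x1, y1, x2, y2) box by `ratio` of its size on every side.

    Assumption: `ratio` is relative to the box width/height and applied per
    side. The result is rounded to integers and clamped to the image bounds.
    """
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * ratio  # horizontal margin added on the left and right
    dh = (y2 - y1) * ratio  # vertical margin added on the top and bottom
    return (
        max(0, int(round(x1 - dw))),
        max(0, int(round(y1 - dh))),
        min(image_width, int(round(x2 + dw))),
        min(image_height, int(round(y2 + dh))),
    )


# Hypothetical detector output on a 640x480 image.
detected = (200, 120, 400, 360)
print(expand_box(detected, 0.2, 640, 480))  # (160, 72, 440, 408)
```

The expanded box would then be used to crop the image before resizing it to the network input size (256*256 in this thread). The ratio that best matches the WFLW-style boxes is something to tune by comparing your crops with the provided training crops.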

ilovecv commented on June 19, 2024

Hi @TheSouthFrog,

As you suggested, I tried another face detector, one trained on the WIDER Face dataset: https://github.com/TencentYoutuResearch/FaceDetection-DSFD
It still did not help. Would it be possible for you to run your face detector and show the face alignment results? You can find the original image below. Thanks!
[image: 2]

[image: 1]

Here is the original image:
[image: 2]

TheSouthFrog commented on June 19, 2024

As I mentioned, the box you provided is too small. Have you tried expanding the box ratio?

Here is an example of a cropped training image, which shows the proper size of the input bounding box.
[image: WFLW_Train_98pt_7--Cheering_%dir%_7_Cheering_Cheering_7_367 jpg_0000006556]

ilovecv commented on June 19, 2024

Hi @TheSouthFrog,

I tried expanding the box ratio, and it did improve the result. However, what should I do for faces with large poses, like the one in the image below? Thanks!
[image: 1]

ilovecv commented on June 19, 2024

Hi @TheSouthFrog,

Now I get good results on landmark detection, and I have trained the variational U-Net for the style transfer. However, there is a problem with the result; please see the images below. It looks like the landmark positions are not correct:
[image: train_06095]
[image: transfer_0050100]

TheSouthFrog commented on June 19, 2024

Hi, that's actually a very interesting problem in facial landmark detection. One general approach to mitigating large poses is to first run a 5-point landmark detector to obtain the basic angles and coordinates, then align and crop the original image according to these initial points, and finally run a more fine-grained detector, e.g. the 98-point one in our case.
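As a rough illustration of the "obtain the basic angles, then align" step, the sketch below estimates the in-plane rotation (roll) from two eye points and rotates a set of points so that the eye line becomes horizontal. This is only one simple way to use coarse landmarks and not necessarily what the authors' pipeline does; the function names and sample coordinates are hypothetical, and it does not correct out-of-plane rotation (yaw or pitch).

```python
import math


def roll_angle(left_eye, right_eye):
    """Angle in degrees of the eye line relative to the image x-axis.

    `left_eye` is the eye with the smaller x coordinate in the image.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def level_points(points, center, angle_degrees):
    """Rotate (x, y) points about `center` by -angle_degrees.

    Passing the roll angle makes the eye line horizontal.
    """
    theta = math.radians(-angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    leveled = []
    for x, y in points:
        tx, ty = x - cx, y - cy
        leveled.append((cx + tx * cos_t - ty * sin_t,
                        cy + tx * sin_t + ty * cos_t))
    return leveled


# Hypothetical coarse eye positions from a 5-point detector.
left_eye, right_eye = (100.0, 100.0), (200.0, 200.0)
angle = roll_angle(left_eye, right_eye)
center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
print(angle, level_points([left_eye, right_eye], center, angle))
```

In practice the same angle and center would drive an image warp (for example an affine rotation in an image library) before cropping and running the fine-grained detector.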

TheSouthFrog commented on June 19, 2024

The cause of your second problem is that you didn't align the input images first.

If you want to train the generator on your own images, you have to make sure that the landmarks are aligned and centered, and that the corresponding regions of the input images are cropped. You can refer to the training set I provided, for which we have pre-processed and cropped the raw input data.
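One part of keeping landmarks and crops consistent is expressing the landmarks in the coordinate frame of the cropped, resized image. A minimal sketch follows, assuming an axis-aligned crop box resized to a square of 256 pixels (the size mentioned earlier in this thread); `landmarks_to_crop` is a hypothetical helper, not a function from the repository.

```python
def landmarks_to_crop(landmarks, crop_box, out_size=256):
    """Map (x, y) landmarks from original-image pixels to crop pixels.

    `crop_box` is (x1, y1, x2, y2) in the original image; the crop is
    assumed to be resized to out_size x out_size afterwards.
    """
    x1, y1, x2, y2 = crop_box
    scale_x = out_size / (x2 - x1)
    scale_y = out_size / (y2 - y1)
    return [((x - x1) * scale_x, (y - y1) * scale_y) for x, y in landmarks]


# A landmark at the center of a 512x512 crop lands at the center of the
# 256x256 output.
print(landmarks_to_crop([(356.0, 356.0)], (100, 100, 612, 612)))
# [(128.0, 128.0)]
```

If the landmarks drawn on top of your training crops do not line up with the faces, a mismatch in this mapping is a likely place to look.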

ilovecv commented on June 19, 2024

Hi @TheSouthFrog,

Thank you very much for your quick response. I learned a lot from it. I am wondering if you can provide the pre-processing script, so I can test on my own images?

TheSouthFrog commented on June 19, 2024

I am sorry that I can't provide my pre-processing script: we have shipped the detection, align and crop pipeline into an SDK for convenience, and we might not be able to release it.

However, I would recommend that you re-train the model on your own dataset. Note that in this work we are trying to augment the styles of the originally available landmarks, so we didn't put much emphasis on generalization to unseen images. If the testing image is significantly different from the training ones, the result may not be great. So I recommend that you first pre-process your data by aligning and cropping your own training set using an open-source script (there are quite a few), and then train a model using the given hyper-parameters. I suppose that way you can see some okay results.

ilovecv commented on June 19, 2024

Hi @TheSouthFrog,

Thank you very much for your suggestions. Could you point me to some open-source scripts for the preprocessing? Thank you very much!

TheSouthFrog commented on June 19, 2024

Sorry for the late response; I missed the notification and forgot to respond.

I believe the simplest tool you can use is dlib or a similar widely-used package.
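For anyone landing here later, a minimal sketch of detect, align and crop with dlib might look like the following. It assumes dlib is installed and that a landmark model file (for example dlib's 5-point or 68-point shape predictor) has been downloaded separately; `align_faces` and its default `size` and `padding` values are hypothetical choices, not the settings used to prepare the released training crops.

```python
def align_faces(image_path, predictor_path, size=256, padding=0.25):
    """Detect faces with dlib and return aligned, cropped face chips.

    `predictor_path` must point at a 5-point or 68-point dlib shape
    predictor model. `padding` controls how much context surrounds the
    face, which plays a role similar to the box expand ratio.
    """
    # Imported here so the sketch can be loaded where dlib is not installed.
    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(predictor_path)

    image = dlib.load_rgb_image(image_path)
    detections = detector(image, 1)  # 1 = upsample once to find smaller faces

    chips = []
    for rect in detections:
        shape = predictor(image, rect)  # coarse landmarks used for alignment
        chips.append(dlib.get_face_chip(image, shape, size=size, padding=padding))
    return chips
```

The returned chips are rotated upright and cropped around each face. The padding would likely need tuning until the crops resemble the example training crop shown earlier in the thread.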
