
Comments (6)

dragonmeteor commented on May 13, 2024

Thank you for your suggestions. However, to keep the functionality minimal, I will not be accepting such a big pull request if you were to send one. It's probably best to direct people to your fork instead.


graphemecluster commented on May 13, 2024

That's OK. I will mainly work on the first two points then.
For the fourth point, I am actually seeking someone's help with the algorithm. This is also one of the reasons why I opened this issue.


graphemecluster commented on May 13, 2024

I edited my fork to minimize the alterations to this repository, and I would like to open a pull request because I found some mistakes (I am not sure if they are intentional, though). I also think it is worth automating the process of converting every (n, n, n, 0) pixel to (0, 0, 0, 0).
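For concreteness, here is a minimal sketch of that pixel cleanup, assuming a Pillow/NumPy workflow; this is not the repository's actual code, just an illustration of the idea:

```python
# Minimal sketch: zero out the RGB channels of fully transparent pixels,
# turning every (n, n, n, 0) pixel into (0, 0, 0, 0).
import numpy as np
from PIL import Image

def clean_transparent_pixels(in_path: str, out_path: str) -> None:
    image = Image.open(in_path).convert("RGBA")
    pixels = np.array(image)                 # shape (H, W, 4), dtype uint8
    transparent = pixels[:, :, 3] == 0       # mask of pixels whose alpha is 0
    pixels[transparent, 0:3] = 0             # zero their RGB channels
    Image.fromarray(pixels, mode="RGBA").save(out_path)

# Example: clean_transparent_pixels("character.png", "character_clean.png")
```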

Additionally, I have read some parts of your article, and you mentioned keypoint-based tracking. Although a direct mapping may not be possible, I wonder if you have studied converting a set of facial landmarks returned by dlib to your poser's parameter set. I think it is a simple task, but I am not competent enough to do this job myself.


dragonmeteor commented on May 13, 2024

First, if you submit a pull request with the first two improvements, I will work with you to incorporate them into the main repository. I think they are simple enhancements that make the software easier to use.

As you might have already observed, the dlib landmarks are quite enough to determine how open or closed the eyes and the mouth are, so I used them in the much simpler Version 1 of the software.
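For example, a common way to estimate eye openness from dlib's 68-point landmarks is the eye aspect ratio; the sketch below only illustrates that idea and is not the formula Version 1 actually uses:

```python
# Minimal sketch: eye aspect ratio (EAR) from dlib's 68-point landmarks.
# Indices 36-41 are the left eye in the standard 68-point model; a small EAR
# means the eye is (nearly) closed, a larger EAR means it is open.
import numpy as np

def eye_aspect_ratio(landmarks: np.ndarray) -> float:
    """landmarks: (68, 2) array of (x, y) points from dlib's shape predictor."""
    eye = landmarks[36:42]
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)
```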

Can we determine the rest of the parameters from them? I don't think so. The landmarks simply do not have enough information. For example, they do not tell you where the irises are, so you cannot determine the iris parameters from them.

Even if you abandon the iris parameters, I also think determining the other parameters would be hard. The iPhone needs an RGB capture and probably a depth image to infer the 52 blendshape parameters. Works that determine Action Units, such as EmotioNet, use both shape (landmark) and shading (image) cues. The landmarks, on the other hand, give you only the shape, not the shading.


dragonmeteor commented on May 13, 2024

In the end, I think the problem you want to solve is how to control characters without having to own an iPhone. To do this, you need to be able to determine pose parameters from a video feed instead of dlib landmarks.

Guess what, I tried to solve this problem, but I gave up. I was thinking that I could leverage some free tools to do it. The most promising seemed to be OpenFace 2, which outputs a number of Action Units. Nonetheless, OpenFace 2 does not give me any information about the irises, so I decided to supplement it with outputs from the MediaPipe Iris and MediaPipe Face Mesh models.
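As an illustration of what those models expose (not the exact pipeline described here), MediaPipe's Face Mesh solution with refine_landmarks=True adds ten iris landmarks at indices 468-477:

```python
# Minimal sketch: read iris landmarks from MediaPipe Face Mesh on one frame.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                            refine_landmarks=True)
frame = cv2.imread("frame.png")                           # any BGR image
result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if result.multi_face_landmarks:
    landmarks = result.multi_face_landmarks[0].landmark
    left_iris = [landmarks[i] for i in range(468, 473)]   # center + 4 edge points
    print(left_iris[0].x, left_iris[0].y)                 # normalized coordinates
```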

At that point, I thought I could cook up some simple formulas to reliably determine the parameters from all the outputs. I was wrong. OpenFace 2 was not at all reliable. The face mesh deformed in weird ways when the face was not looking straight ahead. I also had a hard time determining how open or closed the eyes and the mouth were from OpenFace 2's landmarks and the face mesh, to the point that I had to revert to dlib landmarks to do the job. Nevertheless, dlib landmarks were also unstable, and the simple algorithm I used in Version 1 broke down when the face moved away from the camera. To get even passable results, I had to specify extra parameters for every video I tried to process. (Here is one such passable result: https://twitter.com/dragonmeteor/status/1329157949156061186/video/1.)

At one point, I realized that trying to solve this problem was not worth my effort. The problem I really want to solve is how to generate animations, not how to determine pose parameters from a video feed. So, I bought an iPhone and a copy of iFacialMocap, and I was able to quickly cook up simple formulas for the pose parameters from the blendshape parameters it output. I think this decision was a good one.
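Those "simple formulas" can be as direct as passing a few ARKit-style blendshape values straight through. The sketch below is hypothetical: the pose parameter names on the left are placeholders, not the repository's actual identifiers, and the clamping is only illustrative.

```python
# Hypothetical sketch: map a few iFacialMocap/ARKit blendshape values
# (eyeBlinkLeft, eyeBlinkRight, jawOpen) to made-up poser parameter names.
def blendshapes_to_pose(blendshapes: dict) -> dict:
    clamp = lambda x: min(max(x, 0.0), 1.0)
    return {
        "eye_wink_left":  clamp(blendshapes.get("eyeBlinkLeft", 0.0)),
        "eye_wink_right": clamp(blendshapes.get("eyeBlinkRight", 0.0)),
        "mouth_open":     clamp(blendshapes.get("jawOpen", 0.0)),
    }

# Example: blendshapes_to_pose({"eyeBlinkLeft": 0.8, "jawOpen": 0.3})
```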

So, what to do with the rest of the parameters? I don't know either. If I knew, the project would have contained many more demonstration videos. You have to do your own research. Please let me know if you are successful.


graphemecluster commented on May 13, 2024

Sorry for raising such a digressive problem and asking so much of you. I should have known that you had already put great effort into it. I am glad that you shared your experiences with this problem. Thank you for reading and writing such a long reply. I apologize if I caused any inconvenience; I am sorry to bother you, and please forgive my selfishness.
I will try my best to work on it when I have enough ability to do so.

