sd-wav2lip-uhq's Introduction

🔉👄 Wav2Lip UHQ extension for Stable Diffusion WebUI Automatic1111

💡 Description

This repository contains a Wav2Lip UHQ extension for Automatic1111.

It's an all-in-one solution: just choose a video and a speech file (wav or mp3), and the extension will generate a lip-sync video. It improves the quality of the lip-sync videos generated by the Wav2Lip tool by post-processing them with Stable Diffusion tools (mouth masking and face enhancement, described under "Behind the scenes").

🚀 Updates

2023.08.17

  • 🐛 Fixed purple lips bug

2023.08.16

  • ⚡ Added Wav2Lip and enhanced video outputs, with the option to download whichever works best for you (most likely the "generated video").
  • 🚢 Updated user interface: introduced control over CodeFormer Fidelity.
  • 👄 Removed image as input; SadTalker is better suited for this.
  • 🐛 Fixed a bug where a discrepancy between input and output video incorrectly positioned the mask.
  • 💪 Refined the quality process for greater efficiency.
  • 🚫 Interrupting the process now still generates a video from any frames already created.

2023.08.13

  • ⚡ Sped up computation
  • 🚢 Changed user interface: added controls for hidden parameters
  • 👄 Only track the mouth if needed
  • 📰 Added control over debug output
  • 🐛 Fixed resize factor bug

🔗 Requirements

  • The latest version of Stable Diffusion WebUI Automatic1111, installed by following the instructions in the Stable Diffusion WebUI repository.

💻 Installation

  1. Launch Automatic1111.
  2. In the Extensions tab, enter this repository's URL (given in the Citation section below) in the "Install from URL" field and click "Install".
  3. Go to the "Installed" tab within the Extensions tab and click "Apply and quit".
  4. If you don't see the "Wav2Lip UHQ" tab, restart Automatic1111.
  5. 🔥 Important: Get the weights. Download the model weights from the following locations and place them in the corresponding directories. Pay close attention to the filenames, especially for s3fd.

| Model | Description | Link to the model | Install folder |
|---|---|---|---|
| Wav2Lip | Highly accurate lip-sync | Link | `extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\` |
| Wav2Lip + GAN | Slightly inferior lip-sync, but better visual quality | Link | `extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\` |
| s3fd | Face detection pre-trained model | Link | `extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth` |
| Landmark predictor | Dlib 68-point face landmark prediction (click on the download icon) | Link | `extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat` |
| Landmark predictor | Dlib 68-point face landmark prediction (alternate link) | Link | `extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat` |
| Landmark predictor | Dlib 68-point face landmark prediction (alternate link, click on the download icon) | Link | `extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat` |
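
Because a misplaced or misnamed weight file is easy to miss, a small check can help before the first run. The following is a minimal sketch, not part of the extension: the helper names `to_local` and `missing_weights` are hypothetical, and it assumes the script is run from the Automatic1111 root folder. The table does not list the Wav2Lip checkpoint filenames, so the sketch only checks that the checkpoints folder is not empty.

```python
from pathlib import Path, PureWindowsPath

# Relative locations copied from the table above. The Wav2Lip checkpoint
# filenames are not listed there, so only the folder is checked for them.
REQUIRED_FILES = [
    r"extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth",
    r"extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat",
]
CHECKPOINT_DIR = r"extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints"


def to_local(webui_root, windows_relative):
    """Join a Windows-style relative path onto webui_root on any OS."""
    return Path(webui_root).joinpath(*PureWindowsPath(windows_relative).parts)


def missing_weights(webui_root):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for rel in REQUIRED_FILES:
        if not to_local(webui_root, rel).is_file():
            problems.append(f"missing file: {rel}")
    checkpoint_dir = to_local(webui_root, CHECKPOINT_DIR)
    if not checkpoint_dir.is_dir() or not any(checkpoint_dir.iterdir()):
        problems.append(f"no checkpoint found in: {CHECKPOINT_DIR}")
    return problems


if __name__ == "__main__":
    # "." assumes the current directory is the Automatic1111 root (an assumption).
    for problem in missing_weights("."):
        print(problem)
```

If the script prints nothing, the two fixed-name files are in place and the checkpoints folder contains at least one file.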

🐍 Usage

  1. Choose a video (avi or mp4 format) with a face in it. If even a single frame has no face, the process will fail. Note that an avi file will not be displayed in the Video input, but it will still be processed.
  2. Choose an audio file with speech.
  3. Choose a checkpoint (see the table above).
  4. Padding: Wav2Lip uses this to add a black border around the mouth, which helps prevent the mouth from being cropped by the face detection. You can change the padding value to suit your needs, but the default value gives good results.
  5. No Smooth: When checked, this option retains the original mouth shape without smoothing.
  6. Resize Factor: This is a resize factor for the video. The default value is 1.0, but you can change it to suit your needs. This is useful if the video is too large.
  7. Only Mouth: This option tracks only the mouth, removing other facial motions like those of the cheeks and chin.
  8. Mouth Mask Dilate: This dilates the mouth mask to cover more area around the mouth. The right value depends on the mouth size.
  9. Face Mask Erode: This erodes the face mask to remove some area around the face. The right value depends on the face size.
  10. Mask Blur: This blurs the mask to make it smoother. Try to keep it less than or equal to Mouth Mask Dilate.
  11. Code Former Fidelity:
    1. A value of 0 offers higher quality but may significantly alter the person's facial appearance and cause noticeable flickering between frames.
    2. A value of 1 provides lower quality but maintains the person's face more consistently and reduces frame flickering.
    3. Using a value below 0.5 is not advised. Adjust this setting to achieve optimal results. Starting with a value of 0.75 is recommended.
  12. Active debug: This creates step-by-step images in the debug folder.
  13. Click on the "Generate" button.
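
The recommendations above for Code Former Fidelity and Mask Blur can be written down as a small sanity check. This is only an illustrative sketch: the function `check_settings` and its parameter names are hypothetical and do not correspond to anything in the extension's code.

```python
def check_settings(code_former_fidelity, mask_blur, mouth_mask_dilate):
    """Return warnings for values outside the ranges recommended above."""
    warnings = []
    if not 0.0 <= code_former_fidelity <= 1.0:
        warnings.append("Code Former Fidelity should be between 0 and 1")
    elif code_former_fidelity < 0.5:
        warnings.append("Code Former Fidelity below 0.5 is not advised")
    if mask_blur > mouth_mask_dilate:
        warnings.append("Mask Blur should be under or equal to Mouth Mask Dilate")
    return warnings


# The suggested starting fidelity of 0.75 with blur <= dilate gives no warnings.
print(check_settings(0.75, mask_blur=10, mouth_mask_dilate=15))  # []
```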

📖 Behind the scenes

This extension operates in several stages to improve the quality of Wav2Lip-generated videos:

  1. Generate a Wav2Lip video: The script first generates a low-quality Wav2Lip video from the input video and audio.
  2. Mask Creation: The script creates a mask around the mouth and tries to preserve other facial motions, such as those of the cheeks and chin.
  3. Video Quality Enhancement: The script overlays the low-quality mouth from the Wav2Lip video onto the high-quality original video, guided by the mouth mask.
  4. Face Enhancer: The script then sends the original image with the low-quality mouth to the face_enhancer tool of Stable Diffusion to generate a high-quality mouth image.
  5. Video Generation: The script then overlays the high-quality mouth image onto the original image, guided by the mouth mask.
  6. Video Post Processing: The script then uses the ffmpeg tool to generate the final video.
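
The mask-guided overlay in steps 3 and 5 can be pictured as a per-pixel weighted mix of two frames. The sketch below is a simplified illustration under stated assumptions, not the extension's implementation: it uses plain Python lists and grayscale values to stay self-contained (real code would typically work on image arrays), and the function names are hypothetical.

```python
def blend_pixel(original, generated, alpha):
    """Mix two channel values; alpha=1.0 takes the generated value."""
    return round(alpha * generated + (1.0 - alpha) * original)


def overlay_with_mask(original, generated, mask):
    """Composite `generated` over `original`, weighted by `mask`.

    All three arguments are equally sized 2-D lists (rows of pixels).
    Frames hold grayscale values 0-255; the mask holds weights 0.0-1.0.
    """
    return [
        [blend_pixel(o, g, a) for o, g, a in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


original = [[200, 200, 200], [200, 200, 200]]   # stands in for the original frame
generated = [[50, 50, 50], [50, 50, 50]]        # stands in for the generated mouth
mask = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]       # soft edge, as a blurred mask would give

print(overlay_with_mask(original, generated, mask))
# [[200, 125, 50], [200, 125, 50]]
```

Where the mask is 0 the original pixel is kept, where it is 1 the generated pixel replaces it, and intermediate values give the soft transition that a blurred mask provides.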

💪 Quality tips

  • Use a high-quality video as input.
  • Use a video with a consistent frame rate. Occasionally, videos have unusual playback frame rates (not the standard 24, 25, 30, or 60), which can lead to issues with the face mask.
  • Use a high-quality audio file as input, without background noise or music. Clean the audio with a tool like https://podcast.adobe.com/enhance.
  • Minimize grain on the face in the input as much as possible. For example, you can use the "Restore faces" feature in img2img before using an image as input for Wav2Lip.
  • Dilate the mouth mask. This helps the model retain some facial motion and hide the original mouth.
  • Keep Mask Blur at no more than twice the value of Mouth Mask Dilate. If you want to increase the blur, increase Mouth Mask Dilate as well; otherwise the mouth will be blurred and the underlying mouth could be visible.
  • Upscaling can improve the result, particularly around the mouth area, but it lengthens processing. Use this tutorial from Olivio Sarikas to upscale your video: https://www.youtube.com/watch?v=3z4MKUqFEUk. Ensure the denoising strength is set between 0.0 and 0.05, select the 'revAnimated' model, and use the batch mode.
  • Ensure there is a face in every frame of the video. If no face is detected, the process will stop.
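
To act on the frame-rate tip above, one option is to inspect the rates that ffprobe reports for the video stream (`r_frame_rate` and `avg_frame_rate`, both printed as fractions such as `30/1`). The sketch below only interprets those two strings; the function name, the 1% tolerance, and the idea of treating a nominal/average mismatch as a sign of an inconsistent frame rate are assumptions made for illustration.

```python
from fractions import Fraction

STANDARD_RATES = (24, 25, 30, 60)  # the rates named in the tip above


def frame_rate_warnings(r_frame_rate, avg_frame_rate):
    """Return warnings for a video's nominal and average frame rates.

    Both arguments are fraction strings as printed by ffprobe, e.g. "30/1".
    """
    try:
        nominal = Fraction(r_frame_rate)
        average = Fraction(avg_frame_rate)
    except (ValueError, ZeroDivisionError):
        return ["frame rate could not be read"]
    warnings = []
    # Heuristic: a gap of more than 1% suggests the rate is not constant.
    if abs(nominal - average) > nominal / 100:
        warnings.append(
            f"inconsistent frame rate: nominal {float(nominal):.3f}, "
            f"average {float(average):.3f}"
        )
    if nominal not in STANDARD_RATES:
        warnings.append(f"non-standard frame rate: {float(nominal):.3f}")
    return warnings


print(frame_rate_warnings("30/1", "30/1"))  # []
```

Rates such as 30000/1001 (about 29.97) are reported as non-standard here only because the tip lists 24, 25, 30, and 60; whether they cause problems in practice is not something this sketch can tell.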

⚠ Noted Constraints

  • The model may struggle with beards.
  • If the initial phase takes too long, consider using the "resize factor" to reduce the video's dimensions.
  • There is no strict size limit for videos, but larger videos require more processing time. It's advisable to use the "resize factor" to reduce the video size and then upscale the video once processing is complete.
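
To get a feel for how much work the "resize factor" saves, the sketch below estimates the resulting frame size. It assumes the factor divides both width and height, which is how the option is described here (it decreases the video's dimensions); the exact rounding used by the extension is not documented in this README, and the function name is hypothetical.

```python
def resized_dimensions(width, height, resize_factor):
    """Return (width, height) after dividing both by resize_factor."""
    if resize_factor < 1:
        raise ValueError("resize_factor is assumed to be 1 or greater")
    # Floor division is an assumption about how fractional sizes are rounded.
    return int(width // resize_factor), int(height // resize_factor)


print(resized_dimensions(1920, 1080, 2))  # (960, 540)
```

Under this assumption, a factor of 2 leaves a quarter of the pixels per frame.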

📝 To do

  • Add Suno/Bark to generate text-to-speech audio as a wav file input (see bark), and add a way to generate a high-quality speech audio file from text input
  • Make it possible to resume a video generation
  • Rename the extension to "Wav2Lip Studio" in Automatic1111
  • Add more examples and tutorials
  • Convert avi to mp4. Avi files are not shown in the video input, but processing works fine

😎 Contributing

We welcome contributions to this project. When submitting pull requests, please provide a detailed description of the changes. See CONTRIBUTING for more information.

🙏 Appreciation

📝 Citation

If you use this project in your own work, in articles, tutorials, or presentations, we encourage you to cite this project to acknowledge the efforts put into it.

To cite this project, please use the following BibTeX format:

@misc{wav2lip_uhq,
  author = {numz},
  title = {Wav2Lip UHQ},
  year = {2023},
  howpublished = {GitHub repository},
  publisher = {numz},
  url = {https://github.com/numz/sd-wav2lip-uhq}
}

📜 License

  • The code in this repository is released under the MIT license as found in the LICENSE file.

Contributors

numz, vincentqyw
