wav2lip_redefined's Introduction

About this Repository:

NOTE: For lip-synced video results, scroll down to the results section.

To Run The Model:


1. Specify the file paths, and create a new conda environment with Python 3.6.

2. Install the necessary libraries from requirements.txt, then create a media folder and add the video and audio to it.

3. Run "python main.py" and your final video will be ready.

Objectives Achieved:

  1. Visual and Audio Quality Lip Sync: The project successfully lip-syncs videos with improved visual and audio quality, ensuring that the lip movements accurately match the spoken words.
  2. Robustness for Any Video: Unlike the original Wav2Lip model, the developed AI can handle videos with or without a face in each frame, making it more versatile and less error-prone.
  3. Support for Longer Videos: The model overcomes the limitations of the original Wav2Lip GAN model, now effectively lip-syncing longer videos exceeding 1 minute in duration.
  4. Easy Segment Extraction: Unlike the original model, any segment of the video, with or without a face, can be extracted and combined with the desired audio.

Metrics: I haven't trained the model on any further dataset, so the metrics are unchanged from the underlying model.

  1. Average Mean Squared Error = 5.050382572478908
  2. Average Peak Signal to Noise Ratio = 40.32044758489997
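For readers unfamiliar with these two metrics, here is a minimal sketch of how an average MSE and an average PSNR over frame pairs could be computed. It is not the evaluation code used for the numbers above. The function names, the flat-list frame representation, and the 8-bit peak value of 255 are all assumptions made for illustration.

```python
import math
from typing import Iterable, Sequence, Tuple

Frame = Sequence[float]  # hypothetical: one frame as a flat sequence of pixel values


def mse(reference: Frame, generated: Frame) -> float:
    """Mean squared error between two frames of the same size."""
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)


def psnr(reference: Frame, generated: Frame, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    error = mse(reference, generated)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


def average_metrics(frame_pairs: Iterable[Tuple[Frame, Frame]]) -> Tuple[float, float]:
    """Average the per-frame MSE and PSNR over (reference, generated) pairs."""
    pairs = list(frame_pairs)
    if not pairs:
        raise ValueError("at least one frame pair is required")
    avg_mse = sum(mse(r, g) for r, g in pairs) / len(pairs)
    avg_psnr = sum(psnr(r, g) for r, g in pairs) / len(pairs)
    return avg_mse, avg_psnr
```

Because PSNR is averaged per frame, the average PSNR generally differs from the PSNR computed from the average MSE.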

Challenges:

1. Since my aim was to extract any particular segment, I had to be very careful with timestamps.
2. Wav2Lip also has no mechanism to distinguish the target speaker from other faces that appear in the video.
3. The model does not perform well on high-resolution videos.
4. Long runtimes for longer videos.
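One way to keep timestamps consistent when cutting a video into pieces is to work in whole frame indices and convert to seconds only at the boundaries. The sketch below illustrates that idea. It is not taken from main.py, and the function names and the `chunk_seconds` parameter are hypothetical.

```python
from typing import List, Tuple


def chunk_frame_ranges(total_frames: int, fps: float, chunk_seconds: float) -> List[Tuple[int, int]]:
    """Split [0, total_frames) into consecutive half-open (start, end) frame ranges."""
    if total_frames < 0 or fps <= 0 or chunk_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and chunk_seconds must be > 0")
    # Round to a whole number of frames so chunk boundaries never fall between frames.
    frames_per_chunk = max(1, round(fps * chunk_seconds))
    return [
        (start, min(start + frames_per_chunk, total_frames))
        for start in range(0, total_frames, frames_per_chunk)
    ]


def frame_range_to_seconds(start: int, end: int, fps: float) -> Tuple[float, float]:
    """Convert a half-open frame range to (start_seconds, end_seconds) for audio cutting."""
    return start / fps, end / fps
```

For example, a 250-frame clip at 25 fps split into 4-second chunks gives the ranges (0, 100), (100, 200), and (200, 250). The last range maps to 8.0 through 10.0 seconds, so the matching audio slice can be cut at the same boundaries.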


Results include LipSync Videos for:

  1. Hindi Voice-Over on English Video.
  2. Long videos containing frames with no face or with faces other than the target speaker, with a lot of head and hand movement, and a Telugu-to-Hindi translation voice-over synced.
    Results can be reproduced using the Colab notebook, or accessed at this Google Drive link for reference: https://drive.google.com/file/d/1zjxMi1p3S9SL9UuatoC-2RgWbepyZWUY/view?usp=sharing

NOTE:

  1. Instead of rendering the whole video at once, my approach breaks it into small pieces, which makes the process faster.
  2. I took a different approach here: rather than just skipping the frames with no face (recommended), I wrote code to extract the video segments with faces and without faces and render them individually. This way, I didn't make any changes to the model's internal files.
  3. All my work is done in main.py
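The face and no-face split described in the notes above can be pictured as grouping per-frame detection results into contiguous runs. The following is a minimal sketch under that assumption, not the code in main.py. The `Segment` type and `group_face_segments` function are hypothetical names, and the per-frame flags are assumed to come from whatever face detector is in use.

```python
from itertools import groupby
from typing import List, NamedTuple, Sequence


class Segment(NamedTuple):
    start_frame: int  # inclusive
    end_frame: int    # exclusive
    has_face: bool


def group_face_segments(face_flags: Sequence[bool]) -> List[Segment]:
    """Group per-frame face flags into contiguous face / no-face segments."""
    segments: List[Segment] = []
    position = 0
    for has_face, run in groupby(face_flags):
        length = sum(1 for _ in run)  # number of consecutive frames sharing this flag
        segments.append(Segment(position, position + length, bool(has_face)))
        position += length
    return segments
```

With segments like these, the parts that contain a face could be passed to the lip-sync model while the parts without a face keep their original frames, and everything is then joined back in order.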

wav2lip_redefined's People

Contributors

arjundevsingla
