LangFlip: Advanced Video Translation System

Overview

LangFlip is a video translation system that automates translating the spoken content of videos into other languages. The project combines AI models and custom algorithms to transcribe, translate, and dub video content, aiming to deliver a seamless viewing experience in multiple languages with accurate subtitles and natural-sounding dubbed audio.

Core Features

  • Automated Speech Recognition (ASR): Utilizes the Whisper API for extracting and transcribing spoken content from videos with high accuracy, catering to diverse accents and dialects.
  • Translation Engine: Integrates with the LibreTranslate API for translating transcribed text into multiple target languages, emphasizing contextual accuracy and nuance preservation.
  • Dubbing and Subtitling: Employs a combination of TTS (Text-to-Speech) technologies, including the OpenVoice API, for generating natural-sounding voiceovers in the target language, alongside generating synchronized subtitles.
  • Lip-syncing Technology: Implements advanced deep learning algorithms to analyze and align the facial movements in the video with the new voiceovers, ensuring the speakers' lip movements match the spoken dialogue in the target language. This feature leverages face-alignment and custom lip-sync models to provide a cohesive and immersive viewing experience by minimizing audio-visual discrepancies.

License

This project is licensed under the GNU General Public License (GPL), version 3. For details, see the LICENSE file in the root directory of this source code or visit https://www.gnu.org/licenses/gpl-3.0.html.

The GPL ensures that the software and all derived works remain free and open source, and guarantees users the freedom to run, study, share, and modify the software.

Architecture Considerations

  • Modular Design: Structured to support plug-and-play functionality for different ASR, translation, and TTS services, allowing easy integration of alternative technologies (see the interface sketch after this list).
  • Scalability: Designed to efficiently handle video files of varying sizes and lengths, from short clips to full-length films, through optimized processing pipelines.
  • Quality Assurance: Incorporates automated and manual quality checks to ensure translation accuracy, proper timing of subtitles, and naturalness of dubbed audio.
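
The plug-and-play idea can be captured with small service interfaces. The sketch below is a minimal illustration using Python protocols; the class and method names are assumptions chosen for illustration, not LangFlip's actual internal API.

from typing import List, Optional, Protocol

class Transcriber(Protocol):
    def transcribe(self, audio_path: str, language: Optional[str] = None) -> List[dict]:
        """Return timed segments, e.g. [{"start": 0.8, "end": 3.4, "text": "..."}]."""
        ...

class Translator(Protocol):
    def translate(self, text: str, source: str, target: str) -> str: ...

class Synthesizer(Protocol):
    def synthesize(self, text: str, voice: str, out_path: str) -> str: ...

# A Whisper-backed Transcriber, a LibreTranslate-backed Translator, or an
# OpenVoice-backed Synthesizer can each be swapped for an alternative service
# as long as the replacement satisfies the same protocol.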

Project Setup

Prerequisites

  • Python 3.10
  • FFmpeg
  • Conda (installation notes below)

Setting Up the Environment

Right now the only way to set up LangFlip is with Conda. We are working on Poetry and Python venv setups.

1. Install Conda

If you haven't installed Conda, download and install it from Miniconda or Anaconda.

2. Download the project

git clone https://github.com/FlipSuite/LangFlip.git
cd LangFlip

3. Create and Activate the Conda Environment

From the LangFlip project directory, create the Conda environment from the env.yml file, activate it, then install the dependencies and the package in editable mode:

conda env create -f env.yml
conda activate langflip
pip install -r requirements.txt
pip install -e .

Technical Deep Dive

Video Processing Workflow

Audio Extraction: Utilizes the spleeter library, a state-of-the-art tool for music and speech separation, to extract clean audio tracks from videos. This separation allows for precise manipulation of speech content without background noise interference, essential for accurate speech recognition and dubbing.
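
As a rough sketch, the spleeter Python API can split a soundtrack into vocals and accompaniment. The file names and the choice of the 2-stems model below are assumptions for illustration.

from spleeter.separator import Separator

# Split the soundtrack into vocals (speech) and accompaniment (background).
separator = Separator("spleeter:2stems")

# Writes output/soundtrack/vocals.wav and output/soundtrack/accompaniment.wav;
# the vocals track is what later feeds speech recognition.
separator.separate_to_file("soundtrack.wav", "output/")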

Speech Recognition: Implements the Whisper API for converting spoken content in the extracted audio into text. Whisper is renowned for its robustness across various languages and dialects, offering high accuracy in transcribing diverse accents and noisy environments.
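
The description above refers to the Whisper API; as a local sketch, the open-source whisper package offers an equivalent workflow. The model size and file path below are assumptions.

import whisper

# Load a model; larger models ("medium", "large") trade speed for accuracy.
model = whisper.load_model("base")

# Transcribe the isolated vocals track; segments carry start/end timestamps
# that are reused later for subtitle timing and dubbing alignment.
result = model.transcribe("output/soundtrack/vocals.wav")
for segment in result["segments"]:
    print(f'{segment["start"]:.2f} -> {segment["end"]:.2f}: {segment["text"]}')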

Translation: Leverages the LibreTranslate API for translating the transcribed text into multiple target languages. LibreTranslate is chosen for its emphasis on privacy, open-source nature, and support for a wide range of languages, ensuring translations maintain contextual accuracy and cultural nuance.
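
LibreTranslate exposes a simple HTTP endpoint. A minimal sketch using the requests library is shown below; the server URL and language codes are placeholders (a self-hosted instance or a public one can be used).

import requests

LIBRETRANSLATE_URL = "http://localhost:5000/translate"  # placeholder; point at your instance

def translate_text(text: str, source: str = "en", target: str = "fr") -> str:
    # LibreTranslate's /translate endpoint accepts JSON and returns translatedText.
    response = requests.post(
        LIBRETRANSLATE_URL,
        json={"q": text, "source": source, "target": target, "format": "text"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["translatedText"]

print(translate_text("Welcome to the presentation."))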

Text Processing (Pre-Translation): Employs custom Python scripts and natural language processing (NLP) libraries, such as NLTK and spaCy, for cleaning and preparing the transcribed text for translation. This step includes punctuation normalization, spell checking, and splitting text into manageable segments for more accurate translation.
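
A small illustration of this cleanup step with spaCy follows; the model name and the specific normalization rules are assumptions, chosen only to show the shape of the step.

import re
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def prepare_for_translation(raw_text: str) -> list[str]:
    # Normalize whitespace and collapse repeated punctuation left by ASR.
    text = re.sub(r"\s+", " ", raw_text).strip()
    text = re.sub(r"([!?.]){2,}", r"\1", text)
    # Split into sentences so each translation request stays short and focused.
    doc = nlp(text)
    return [sentence.text.strip() for sentence in doc.sents]

print(prepare_for_translation("so um we launched the product.. it did  really well!!"))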

Text Processing (Post-Translation): Utilizes the same NLP libraries to refine the translated text, ensuring that it aligns with the target language's grammatical rules and stylistic nuances. This process includes adjusting sentence structure, verb tenses, and idiomatic expressions to enhance readability and naturalness in the target language.

Dubbing: Engages the OpenVoice API for generating natural-sounding voiceovers in the target language. OpenVoice offers a diverse range of voice models, allowing for the selection of voices that best match the original speaker's tone, pitch, and emotion, ensuring a seamless audio-visual experience.
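
Whatever backend synthesizes the per-segment clips (OpenVoice in this project), they still need to be placed back on the original timeline. The sketch below uses pydub for that assembly step; the clip file names, segment timings, and total duration are placeholder values, and the OpenVoice synthesis call itself is not shown.

from pydub import AudioSegment

# Placeholder timings/files; in practice these come from the Whisper segments
# and the TTS step that synthesized one clip per segment.
segments = [
    {"start": 0.8, "clip": "dub_000.wav"},
    {"start": 4.2, "clip": "dub_001.wav"},
]
total_duration_ms = 10_000  # length of the original soundtrack

# Start from silence and drop each dubbed clip at its original start time so
# the new voiceover stays aligned with the picture.
dubbed_track = AudioSegment.silent(duration=total_duration_ms)
for segment in segments:
    clip = AudioSegment.from_file(segment["clip"])
    dubbed_track = dubbed_track.overlay(clip, position=int(segment["start"] * 1000))

dubbed_track.export("dubbed_track.wav", format="wav")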

Subtitles Generation: Integrates Subtitle Edit or similar open-source subtitle creation tools for generating and synchronizing subtitles with the video. This involves timing the translated text with the video's frames, ensuring subtitles appear on-screen in sync with the corresponding spoken dialogue.
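
Regardless of the tool used, the result is typically an SRT file keyed to the transcription timestamps. A minimal sketch of writing one directly from timed segments follows; the segment data is a placeholder.

def srt_timestamp(seconds: float) -> str:
    # SRT timestamps use the form HH:MM:SS,mmm.
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"

def write_srt(segments: list[dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for index, seg in enumerate(segments, start=1):
            f.write(f"{index}\n")
            f.write(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n")
            f.write(f"{seg['text'].strip()}\n\n")

# Placeholder segment; in practice these are the translated Whisper segments.
write_srt([{"start": 0.8, "end": 3.4, "text": "Bienvenue dans la présentation."}], "output.srt")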

Video Preprocessing: Uses ffmpeg for initial video processing tasks, such as format conversion, resolution adjustment, and extracting raw video streams. This step prepares the video for detailed processing and ensures compatibility with subsequent workflow stages.
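
A typical invocation from Python might look like the sketch below; the input file, target resolution, and container are placeholder choices.

import subprocess

# Convert the source to H.264/AAC in an MP4 container and normalize the
# resolution so later stages work on a predictable format.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "source_video.mkv",
        "-vf", "scale=1920:-2",   # fix width, keep aspect ratio with an even height
        "-c:v", "libx264",
        "-c:a", "aac",
        "preprocessed.mp4",
    ],
    check=True,
)

# Extract a mono 16 kHz soundtrack for the separation and ASR stages.
subprocess.run(
    ["ffmpeg", "-y", "-i", "preprocessed.mp4", "-vn", "-ac", "1", "-ar", "16000", "soundtrack.wav"],
    check=True,
)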

Video Postprocessing: Combines processed audio tracks with the original video while ensuring lip-sync accuracy using custom deep learning models and the face-alignment library. Final adjustments, including color correction and audio normalization, are performed using ffmpeg, culminating in a professionally dubbed video ready for distribution.
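
Once the lip-synced video frames and the dubbed track exist, the final mux can be a single ffmpeg call. The sketch below keeps the video stream, replaces the audio with the dubbed track, and applies loudness normalization; the file names are placeholders, and subtitles are assumed to ship as a separate SRT.

import subprocess

# Keep the (lip-synced) video stream as-is, replace the audio with the dubbed
# track, and apply EBU R128 loudness normalization to the new audio.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "lip_synced_video.mp4",
        "-i", "dubbed_track.wav",
        "-map", "0:v", "-map", "1:a",
        "-c:v", "copy",
        "-af", "loudnorm",
        "-shortest",
        "final_dubbed.mp4",
    ],
    check=True,
)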
