
train_your_own_sora's Introduction

Latte Text to Video Training

In our view, Latte is the open-source video generation model that comes closest to OpenAI's Sora.

The original Latte repository does not provide text-to-video training code, so we reproduced the paper and implemented text-to-video training based on it.

For more details, see the paper:

Latte: Latent Diffusion Transformer for Video Generation

The architecture of Latte

Improvements

We made the following improvements to the training code:

  • added support for gradient accumulation (config: gradient_accumulation_steps)
  • added validation sample generation (config: validation), which generates test videos during training
  • added wandb support
  • added classifier-free guidance training (config: cfg_random_null_text_ratio)
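Two of these options can be illustrated with a small sketch. This is a hypothetical illustration, not the repo's code: it assumes cfg_random_null_text_ratio is the per-sample probability of replacing a caption with an empty (null) prompt, the usual way classifier-free guidance is trained, and it shows how gradient_accumulation_steps multiplies the number of samples behind each optimizer step. The function names are made up for this example.

```python
import random
from typing import List, Optional


def drop_prompts_for_cfg(prompts: List[str], cfg_random_null_text_ratio: float,
                         rng: Optional[random.Random] = None) -> List[str]:
    """Replace each prompt with the empty (null) prompt with the given probability.

    Training on a mix of real and null prompts lets one model learn both the
    conditional and the unconditional prediction that classifier-free guidance
    combines at sampling time. Assumption: the null condition is the empty string.
    """
    if not 0.0 <= cfg_random_null_text_ratio <= 1.0:
        raise ValueError("cfg_random_null_text_ratio must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    # rng.random() is in [0, 1), so a ratio of 0 never drops and 1 always drops.
    return ["" if rng.random() < cfg_random_null_text_ratio else p for p in prompts]


def effective_batch_size(per_gpu_batch_size: int, gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_gpu_batch_size * gradient_accumulation_steps * num_gpus


if __name__ == "__main__":
    captions = ["a cat surfing", "a city at night", "waves on a beach", "a red car"]
    print(drop_prompts_for_cfg(captions, 0.1, random.Random(0)))
    print(effective_batch_size(4, 8, num_gpus=2))  # 64
```

Gradient accumulation lets a GPU with limited memory train with a larger effective batch: with a per-GPU batch of 4, 8 accumulation steps, and 2 GPUs, each optimizer step sees 64 samples.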

Step 1: set up the environment

First, download and set up the repo:

git clone https://github.com/lyogavin/Latte_t2v_training.git
cd Latte_t2v_training
conda env create -f environment.yml
conda activate latte

If setting up the environment and resolving package versions, CUDA drivers, and so on is too complicated, you can try our vast.ai template here.
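Once the environment is active, a quick sanity check can save a failed run later. The helper below is hypothetical and not part of the repo, and its package list is an assumption; environment.yml is the authoritative list of dependencies.

```python
import importlib.util
from typing import Dict, Iterable, Optional

# Assumption: packages a Latte training environment is likely to need.
# Check environment.yml for the real dependency list.
ASSUMED_PACKAGES = ["torch", "diffusers", "transformers", "wandb"]


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to True if it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available() -> Optional[bool]:
    """Return torch's CUDA availability, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check works without torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, ok in check_packages(ASSUMED_PACKAGES).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
    print("CUDA available:", cuda_available())
```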

Step 2: download pretrained model

You can download the pretrained model as follows:

sudo apt-get install git-lfs # or: sudo yum install git-lfs
git lfs install

git clone --depth=1 --no-single-branch  https://huggingface.co/maxin-cn/Latte /root/pretrained_Latte/
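Two things commonly go wrong here: git-lfs was not set up before cloning, so the checkpoint is only a small text pointer file, or the paths the config needs are hard to find. The sketch below is a hypothetical helper, not part of the repo. It searches the clone for the t2v.pt file and the t2v_required_models directory that the config entries point to, and detects Git LFS pointer files by their standard first line.

```python
from pathlib import Path
from typing import Dict, Optional

# First line of every Git LFS pointer file (per the Git LFS spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is an un-downloaded Git LFS pointer instead of real weights."""
    with path.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def locate_pretrained(root: str) -> Dict[str, Optional[Path]]:
    """Find t2v.pt and the t2v_required_models directory anywhere under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {"pretrained": None, "pretrained_model_path": None}
    ckpt = next((p for p in root_path.rglob("t2v.pt") if p.is_file()), None)
    models_dir = next(
        (p for p in root_path.rglob("t2v_required_models") if p.is_dir()), None
    )
    return {"pretrained": ckpt, "pretrained_model_path": models_dir}


if __name__ == "__main__":
    found = locate_pretrained("/root/pretrained_Latte/")
    for key, path in found.items():
        print(f"{key}: {path if path else 'NOT FOUND'}")
    ckpt = found["pretrained"]
    if ckpt is not None and is_lfs_pointer(ckpt):
        print("t2v.pt is a Git LFS pointer; run `git lfs pull` inside the clone.")
```

The printed paths are the values to use for pretrained and pretrained_model_path in the config.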

Step 3: prepare training data

Put the video files in a directory and create a CSV file that specifies the prompt for each video.

The csv file format:

video_file_name prompt
VIDEO_FILE_001.mp4 PROMPT_001
VIDEO_FILE_002.mp4 PROMPT_002
... ...
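If your prompts already live in Python (for example, a dict from file name to caption), the CSV can be generated with the standard csv module. This sketch assumes the header shown above (video_file_name, prompt), a comma separator, and .mp4 files; check the repo's dataset loader before relying on those assumptions. The helper name write_prompt_csv is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Assumption: only .mp4 is listed; extend if the loader accepts other formats.
VIDEO_EXTENSIONS = {".mp4"}


def write_prompt_csv(video_folder: str, prompts: Dict[str, str], csv_path: str) -> List[str]:
    """Write a video_file_name,prompt CSV for every video in video_folder.

    Returns the video file names that had no prompt; they are left out of the CSV.
    """
    folder = Path(video_folder)
    videos = sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    missing = [name for name in videos if name not in prompts]
    # newline="" is what the csv module expects; prompts containing commas get quoted.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["video_file_name", "prompt"])
        for name in videos:
            if name in prompts:
                writer.writerow([name, prompts[name]])
    return missing
```

Any names returned as missing are videos that will not be used for training until they get a prompt; the folder and the CSV path are what video_folder and csv_path should point to in the config.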

Step 4: config

The config file is configs/t2v/t2v_img_train.yaml, and it is fairly self-explanatory.

A few config entries to note:

  • set video_folder and csv_path to the locations of your training videos and CSV file
  • set pretrained_model_path to the t2v_required_models directory of the downloaded model
  • set pretrained to the t2v.pt file in the downloaded model
  • change text_prompt under the validation section to your own validation prompts. Every ckpt_every steps during training, the script generates videos from these prompts and publishes them to wandb so you can check progress.
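Before launching a long run, it can help to check these entries programmatically. The helper below is hypothetical and not part of the repo. It assumes the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load), that video_folder, csv_path, pretrained_model_path, and pretrained are top-level keys, and that text_prompt sits under validation.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


def _as_path(cfg: Dict[str, Any], key: str) -> Optional[Path]:
    """Return cfg[key] as a Path, or None if the key is missing or empty."""
    value = cfg.get(key)
    return Path(value) if value else None


def check_t2v_config(cfg: Dict[str, Any]) -> List[str]:
    """List problems with the config entries this README asks you to set."""
    problems = []
    # Assumption: these two must be existing directories.
    for key in ("video_folder", "pretrained_model_path"):
        path = _as_path(cfg, key)
        if path is None or not path.is_dir():
            problems.append(f"{key} should point to an existing directory, got {cfg.get(key)!r}")
    # Assumption: these two must be existing files; pretrained is the t2v.pt checkpoint.
    for key in ("csv_path", "pretrained"):
        path = _as_path(cfg, key)
        if path is None or not path.is_file():
            problems.append(f"{key} should point to an existing file, got {cfg.get(key)!r}")
        elif key == "pretrained" and path.name != "t2v.pt":
            problems.append(f"pretrained should be the t2v.pt checkpoint, got {path.name!r}")
    validation = cfg.get("validation") or {}
    if not validation.get("text_prompt"):
        problems.append("validation.text_prompt has no prompts to sample during training")
    return problems
```

An empty list means every entry named above points somewhere plausible; it does not check whether the files themselves are valid.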

Step 5: train!

./run_img_t2v_train.sh

Cloud GPUs

We recommend vast.ai GPUs for training.

We find it a good option: low prices, good network speed, and a wide range of GPUs to choose from, with everything optimized for AI training.

Feel free to use our template here, which has the environment ready to use.

Inference

For inference, refer to the original Latte repo.

Stay Connected with Us

WeChat public account

WeChat group

Discord

Tech Blog

Little RedBook

Contribution

Buy me a coffee please! 🙏

By: Anima AI


Contributors

lyogavin
