LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

Python 3.8 pytorch 1.12.0

This repository is the official implementation of LAMP.

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang
( * indicates corresponding author)

[Arxiv Paper]  [Website Page]  [Google Drive]  [Baidu Disk (pwd: ffsp)]  [Colab Notebook]

🚀 LAMP is a few-shot-based method for text-to-video generation. You only need 8~16 videos and 1 GPU (> 15 GB VRAM) for training! After training, you can generate videos with the learned motion pattern.

News

  • [2023/11/02] The Colab demo is released! Thanks to @ShashwatNigam99 for the PR.
  • [2023/10/21] We added a Google Drive link for our checkpoints and training data.
  • [2023/10/17] We release our checkpoints and Arxiv paper.
  • [2023/10/16] Our code is publicly available.

Preparation

Dependencies and Installation

  • Ubuntu > 18.04
  • CUDA=11.3
  • Others:
# clone the repo
git clone https://github.com/RQ-Wu/LAMP.git
cd LAMP

# create virtual environment
conda create -n LAMP python=3.8
conda activate LAMP

# install packages
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
pip install xformers==0.0.13
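
After installation, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it only reads package metadata with the standard library, so it does not verify that CUDA itself works.

```python
from importlib import metadata

# Versions pinned in the install commands above.
# Local build tags such as "+cu113" are ignored when comparing.
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
    "xformers": "0.0.13",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Strip a local tag like "+cu113" before comparing.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        print(f"{name}: installed={installed} expected={EXPECTED[name]} ok={ok}")
```

A package that is missing is reported as `(None, False)` rather than raising, so the whole environment can be inspected in one pass.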

Weights and Data

  1. You can download pre-trained T2I diffusion models from Hugging Face. In our work, we use Stable Diffusion v1.4 as our backbone network. Clone the pretrained weights with git-lfs and put them in ./checkpoints.

  2. Our checkpoints and training data are listed as follows. You can also collect video data on your own (suggested websites: pexels, frozen-in-time) and put the .mp4 files in ./training_videos/[motion_name]/

Motion Name | Checkpoint Link | Training Data
Birds fly | Baidu Disk (pwd: jj0o) | Baidu Disk (pwd: w96b)
Firework | Baidu Disk (pwd: wj1p) | Baidu Disk (pwd: oamp)
Helicopter | Baidu Disk (pwd: egpe) | Baidu Disk (pwd: t4ba)
Horse run | Baidu Disk (pwd: 19ld) | Baidu Disk (pwd: mte7)
Play the guitar | Baidu Disk (pwd: l4dw) | Baidu Disk (pwd: js26)
Rain | Baidu Disk (pwd: jomu) | Baidu Disk (pwd: 31ug)
Turn to smile | Baidu Disk (pwd: 2bkl) | Baidu Disk (pwd: l984)
Waterfall | Baidu Disk (pwd: vpkk) | Baidu Disk (pwd: 2edp)
All | Baidu Disk (pwd: ifsm) | Baidu Disk (pwd: 2i2k)
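
If you collect your own clips, a quick check of the folder layout before training can catch a misplaced file early. The sketch below is an illustration only: the function names are hypothetical, and the 8 to 16 bounds simply restate the clip count quoted at the top of this README rather than a limit enforced by the training code.

```python
from pathlib import Path


def list_training_videos(motion_name, root="./training_videos"):
    """Return the sorted .mp4 files under <root>/<motion_name>/."""
    folder = Path(root) / motion_name
    if not folder.is_dir():
        raise FileNotFoundError(f"missing folder: {folder}")
    # Only .mp4 files are considered, matching the layout described above.
    return sorted(p for p in folder.iterdir() if p.suffix.lower() == ".mp4")


def check_video_count(videos, low=8, high=16):
    """True when the number of clips falls inside the 8~16 range."""
    return low <= len(videos) <= high
```

For example, `check_video_count(list_training_videos("my_motion"))` returns `False` when the hypothetical `my_motion` folder holds fewer than 8 or more than 16 clips.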

Get Started

1. Training to learn a motion pattern

CUDA_VISIBLE_DEVICES=X accelerate launch train_lamp.py config="configs/XXX.yaml"
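
When training several motion patterns in a row, the same command can be issued from Python. This is a rough sketch under stated assumptions: `build_train_command` and `launch_training` are hypothetical helpers, the argument list mirrors the shell command above word for word, and the config path is left for you to supply.

```python
import os
import subprocess


def build_train_command(config_path):
    """Mirror the shell command above as an argument list."""
    # The shell strips the quotes, so the script receives config=<path> as one argument.
    return ["accelerate", "launch", "train_lamp.py", f"config={config_path}"]


def launch_training(config_path, gpu_id, dry_run=True):
    """Run training on one GPU; with dry_run=True nothing is executed."""
    # Copy the current environment and pin the visible GPU, as in the command above.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    cmd = build_train_command(config_path)
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env["CUDA_VISIBLE_DEVICES"]
```

Calling `launch_training(path, 0)` with the default `dry_run=True` only returns the command and GPU id, which makes it easy to inspect before committing GPU time.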

2. Inference

Here is an example command for inference

python inference_script.py \
    --weight ./my_weight/turn_to_smile/unet \
    --pretrain_weight ./checkpoints/stable-diffusion-v1-4 \
    --first_frame_path ./benchmark/head_photo_of_a_cute_girl,_comic_style.png \
    --prompt "head photo of a cute girl, comic style, turns to smile" \
    # the default prompt is the same as the image's filename
    # [Other optional configs...]
    # --output results/ \
    # --height 320 \
    # --width 512 \
    # --length 16 \
    # --cfg 12.5 \
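
The command above can also be assembled programmatically, which is convenient when running many first frames. The sketch below is hedged: both helper names are hypothetical, the flag names are copied from the example command, and the filename-to-prompt rule (drop the extension, turn underscores into spaces) is an assumption inferred from the example file name, not a confirmed description of what inference_script.py does.

```python
from pathlib import Path


def prompt_from_filename(image_path):
    """Assumed rule: drop the extension and turn underscores into spaces."""
    return Path(image_path).stem.replace("_", " ")


def build_inference_args(weight, pretrain_weight, first_frame_path, prompt=None, **optional):
    """Build the argument list for the inference command shown above."""
    args = [
        "python", "inference_script.py",
        "--weight", weight,
        "--pretrain_weight", pretrain_weight,
        "--first_frame_path", first_frame_path,
    ]
    # Leave --prompt out entirely so the script falls back to its own default.
    if prompt is not None:
        args += ["--prompt", prompt]
    # Optional flags from the example: output, height, width, length, cfg.
    for flag, value in optional.items():
        args += [f"--{flag}", str(value)]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`; passing a list avoids shell quoting problems with the comma and spaces in the prompt.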

Visual Examples

Few-Shot-Based Text-to-Video Generation

Horse run
A horse runs in the universe. A horse runs on the Mars. A horse runs on the road.
Firework
Fireworks in desert night. Fireworks over the mountains. Fireworks in the night city.
Play the guitar
GTA5 poster, a man plays the guitar. A woman plays the guitar. An astronaut plays the guitar, photorealistic.
Birds fly
Birds fly in the pink sky. Birds fly in the sky, over the sea. Many Birds fly over a plaza.

Video Editing

Origin Videos Editing Result-1 Editing Result-2
A girl in black runs on the road. A man runs on the road.
A man is dancing. A girl in white is dancing.

Citation

If you find our repo useful for your research, please cite us:

@article{wu2023lamp,
    title={LAMP: Learn a Motion Pattern by Few-Shot Tuning a Text-to-Image Diffusion Model},
    author={Wu, Ruiqi and Chen, Liangyu and Yang, Tong and Guo, Chunle and Li, Chongyi and Zhang, Xiangyu},
    journal={arXiv preprint arXiv:2310.10769},
    year={2023}
}

License

Licensed under a Creative Commons Attribution-NonCommercial 4.0 International license, for non-commercial use only. Any commercial use requires formal permission first.

Acknowledgement

This repository is maintained by Ruiqi Wu. The code is built on Tune-A-Video. Thanks for the excellent open-source code!
