This repository is the official implementation of LAMP
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang
( * indicates corresponding author)
[Arxiv Paper] [Website Page] [Google Drive] [Baidu Disk (pwd: ffsp)] [Colab Notebook]
🚀 LAMP is a few-shot method for text-to-video generation. You only need 8~16 videos and 1 GPU (>15 GB VRAM) for training! Then you can generate videos with the learned motion pattern.
- [2023/11/02] The Colab demo is released! Thanks to @ShashwatNigam99 for the PR.
- [2023/10/21] We add Google Drive links for our checkpoints and training data.
- [2023/10/17] We release our checkpoints and Arxiv paper.
- [2023/10/16] Our code is publicly available.
- Ubuntu >= 18.04
- CUDA 11.3
- Other dependencies: install as follows
# clone the repo
git clone https://github.com/RQ-Wu/LAMP.git
cd LAMP
# create virtual environment
conda create -n LAMP python=3.8
conda activate LAMP
# install packages
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
pip install xformers==0.0.13
-
You can download pre-trained T2I diffusion models from Hugging Face. In our work, we use Stable Diffusion v1.4 as our backbone network. Clone the pretrained weights with git-lfs and put them in ./checkpoints
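A minimal sketch of that step, assuming git-lfs is already installed (the v1.4 weights are several GB, so the clone takes a while):

```shell
# Enable Git LFS once per machine so the large weight files are actually fetched.
git lfs install
# Clone Stable Diffusion v1.4 from Hugging Face into ./checkpoints,
# matching the --pretrain_weight path used by the inference script below.
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 ./checkpoints/stable-diffusion-v1-4
```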
-
Our checkpoints and training data are listed below. You can also collect video data on your own (suggested websites: pexels, frozen-in-time) and put the .mp4 files in ./training_videos/[motion_name]/
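For example, the expected layout can be prepared like this (the motion name "waterfall" and the clip names are placeholders; in practice you would copy 8~16 real .mp4 clips into the folder):

```shell
# Hypothetical motion name "waterfall"; LAMP trains on 8~16 short clips per motion.
mkdir -p training_videos/waterfall
# Stand-ins for real clips; replace with e.g.:
#   cp ~/downloads/waterfall_*.mp4 training_videos/waterfall/
touch training_videos/waterfall/clip_01.mp4 training_videos/waterfall/clip_02.mp4
ls training_videos/waterfall
```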
| Motion Name | Checkpoint Link | Training Data |
| :--- | :--- | :--- |
| Birds fly | Baidu Disk (pwd: jj0o) | Baidu Disk (pwd: w96b) |
| Firework | Baidu Disk (pwd: wj1p) | Baidu Disk (pwd: oamp) |
| Helicopter | Baidu Disk (pwd: egpe) | Baidu Disk (pwd: t4ba) |
| Horse run | Baidu Disk (pwd: 19ld) | Baidu Disk (pwd: mte7) |
| Play the guitar | Baidu Disk (pwd: l4dw) | Baidu Disk (pwd: js26) |
| Rain | Baidu Disk (pwd: jomu) | Baidu Disk (pwd: 31ug) |
| Turn to smile | Baidu Disk (pwd: 2bkl) | Baidu Disk (pwd: l984) |
| Waterfall | Baidu Disk (pwd: vpkk) | Baidu Disk (pwd: 2edp) |
| All | Baidu Disk (pwd: ifsm) | Baidu Disk (pwd: 2i2k) |
CUDA_VISIBLE_DEVICES=X accelerate launch train_lamp.py config="configs/XXX.yaml"

where X is the id of the GPU to use and configs/XXX.yaml is the config file for the motion you want to learn.
Here is an example command for inference:
python inference_script.py \
--weight ./my_weight/turn_to_smile/unet \
--pretrain_weight ./checkpoints/stable-diffusion-v1-4 \
--first_frame_path ./benchmark/head_photo_of_a_cute_girl,_comic_style.png \
--prompt "head photo of a cute girl, comic style, turns to smile"
# By default, the prompt is the same as the first frame image's filename.
# Other optional arguments:
# --output results/
# --height 320
# --width 512
# --length 16
# --cfg 12.5
| Origin Videos | Editing Result-1 | Editing Result-2 |
| :--- | :--- | :--- |
| A girl in black runs on the road. | A man runs on the road. | |
| A man is dancing. | A girl in white is dancing. | |
If you find our repo useful for your research, please cite us:
@article{wu2023lamp,
title={LAMP: Learn a Motion Pattern by Few-Shot Tuning a Text-to-Image Diffusion Model},
author={Wu, Ruiqi and Chen, Liangyu and Yang, Tong and Guo, Chunle and Li, Chongyi and Zhang, Xiangyu},
journal={arXiv preprint arXiv:2310.10769},
year={2023}
}
Licensed under a Creative Commons Attribution-NonCommercial 4.0 International License for non-commercial use only. Any commercial use requires formal permission in advance.
This repository is maintained by Ruiqi Wu. The code is built on Tune-A-Video. Thanks for the excellent open-source code!