ControlVideo

Official PyTorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"


ControlVideo adapts ControlNet to video generation without any finetuning, aiming to directly inherit its high-quality and consistent generation.

Setup

1. Download Weights

Download all pre-trained weights to the checkpoints/ directory, including the pre-trained weights of Stable Diffusion v1.5 and of ControlNet conditioned on canny edges, depth maps, and human poses. The file flownet.pkl contains the weights of RIFE. The final file tree looks like this:

checkpoints
├── stable-diffusion-v1-5
├── sd-controlnet-canny
├── sd-controlnet-depth
├── sd-controlnet-openpose
├── flownet.pkl
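
Before running inference, it can help to confirm that the tree above is complete. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and it only checks that each entry exists, not that the weights inside are valid.

```python
from pathlib import Path

# Entries listed in the file tree above.
EXPECTED = [
    "stable-diffusion-v1-5",
    "sd-controlnet-canny",
    "sd-controlnet-depth",
    "sd-controlnet-openpose",
    "flownet.pkl",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing under checkpoints/:", ", ".join(missing))
    else:
        print("All expected checkpoint entries are present.")
```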

2. Requirements

conda create -n controlvideo python=3.10
conda activate controlvideo
pip install -r requirements.txt

Installing xformers is recommended to reduce memory usage and running time.
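
To see whether xformers is available in the active environment, a short check like the one below can be used. This is only a sketch: `has_module` is a hypothetical helper, and it reports whether the package can be found, not whether the inference code actually enables it.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("xformers"):
        print("xformers is installed.")
    else:
        print("xformers is not installed; inference may use more memory and time.")
```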

Inference

To perform text-to-video generation, run the following command from inference.sh:

python inference.py \
    --prompt "A striking mallard floats effortlessly on the sparkling pond." \
    --condition "depth" \
    --video_path "data/mallard-water.mp4" \
    --output_path "outputs/" \
    --video_length 15 \
    --smoother_steps 19 20 \
    --width 512 \
    --height 512 \
    # --is_long_video

where --video_length is the length of the synthesized video, --condition is the type of structure sequence, --smoother_steps determines at which timesteps smoothing is performed, and --is_long_video enables efficient long-video synthesis.
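
When generating several videos, the same command can be assembled from Python. The sketch below uses only the flags shown in the command above; `build_command` is a hypothetical helper, and it assumes that --is_long_video is a valueless switch, as the commented-out line suggests.

```python
import shlex
import sys


def build_command(prompt, condition, video_path, output_path="outputs/",
                  video_length=15, smoother_steps=(19, 20),
                  width=512, height=512, is_long_video=False):
    """Assemble the inference.py argument list (hypothetical helper)."""
    cmd = [
        sys.executable, "inference.py",
        "--prompt", prompt,
        "--condition", condition,
        "--video_path", video_path,
        "--output_path", output_path,
        "--video_length", str(video_length),
        # --smoother_steps takes one or more timesteps as separate arguments.
        "--smoother_steps", *(str(step) for step in smoother_steps),
        "--width", str(width),
        "--height", str(height),
    ]
    if is_long_video:
        # Assumed to be a flag without a value.
        cmd.append("--is_long_video")
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "A striking mallard floats effortlessly on the sparkling pond.",
        "depth",
        "data/mallard-water.mp4",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain commas, quotes, or spaces.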

Visualizations

ControlVideo on depth maps

"A charming flamingo gracefully wanders in the calm and serene water, its delicate neck curving into an elegant shape." "A striking mallard floats effortlessly on the sparkling pond." "A gigantic yellow jeep slowly turns on a wide, smooth road in the city."
"A sleek boat glides effortlessly through the shimmering river, van gogh style." "A majestic sailing boat cruises along the vast, azure sea." "A contented cow ambles across the dewy, verdant pasture."

ControlVideo on canny edges

"A young man riding a sleek, black motorbike through the winding mountain roads." "A white swan moving on the lake, cartoon style." "A dusty old jeep was making its way down the winding forest road, creaking and groaning with each bump and turn."
"A shiny red jeep smoothly turns on a narrow, winding road in the mountains." "A majestic camel gracefully strides across the scorching desert sands." "A fit man is leisurely hiking through a lush and verdant forest."

ControlVideo on human poses

"James bond moonwalk on the beach, animation style." "Goku in a mountain range, surreal style." "Hulk is jumping on the street, cartoon style." "A robot dances on a road, animation style."

Long video generation

"A steamship on the ocean, at sunset, sketch style." "Hulk is dancing on the beach, cartoon style."

Citation

If you make use of our work, please cite our paper.

@article{zhang2023controlvideo,
  title={ControlVideo: Training-free Controllable Text-to-Video Generation},
  author={Zhang, Yabo and Wei, Yuxiang and Jiang, Dongsheng and Zhang, Xiaopeng and Zuo, Wangmeng and Tian, Qi},
  journal={arXiv preprint arXiv:2305.13077},
  year={2023}
}

Acknowledgement

This repository borrows heavily from Diffusers, ControlNet, Tune-A-Video, and RIFE.

There are also many interesting works on video generation, including Tune-A-Video, Text2Video-Zero, Follow-Your-Pose, and Control-A-Video.
