
vid2vid-zero

We propose vid2vid-zero, a simple yet effective method for zero-shot video editing. vid2vid-zero leverages off-the-shelf image diffusion models and does not require training on any video. At the core of our method are a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos.
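
To make the cross-frame idea concrete, here is a minimal pure-Python sketch of bi-directional cross-frame attention: the tokens of every frame attend to the tokens of all frames, earlier and later alike. It is an illustration under simplifying assumptions, not the vid2vid-zero implementation. It omits the learned query/key/value projections of a real attention layer, and the function names `attend` and `cross_frame_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def cross_frame_attention(frames):
    """Let each frame's tokens attend to the tokens of ALL frames.

    `frames` is a list of frames; each frame is a list of token vectors.
    Assumption: tokens act as queries, keys, and values directly
    (no learned projections), which keeps the sketch short.
    """
    all_tokens = [token for frame in frames for token in frame]
    return [attend(frame, all_tokens, all_tokens) for frame in frames]


# Two frames with two 2-dimensional tokens each.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
edited = cross_frame_attention(video)
```

Because keys and values are gathered from every frame, information flows in both temporal directions, and the output keeps the same frame and token layout as the input.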

Highlights

  • Video editing with off-the-shelf image diffusion models.

  • No training on any video.

  • Promising results in editing attributes, subjects, places, etc., in real-world videos.

News

  • [2023.4.12] Online Gradio Demo is available here.
  • [2023.4.11] Added Gradio Demo (runs locally).
  • [2023.4.9] Code released!

Installation

Requirements

pip install -r requirements.txt

Installing xformers is highly recommended for improved efficiency and speed on GPUs.
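
A quick way to confirm that xformers is visible to your Python environment before running the scripts is sketched below. It only checks importability with the standard library; it does not verify GPU support or show how this repository enables xformers. The helper name `has_module` is hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("xformers"):
        print("xformers is installed.")
    else:
        print("xformers was not found; the code may run slower without it.")
```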

Weights

[Stable Diffusion] Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The pre-trained Stable Diffusion models can be downloaded from 🤗 Hugging Face (e.g., Stable Diffusion v1-4, v2-1). We use Stable Diffusion v1-4 by default.
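
As one possible way to fetch the weights, the sketch below uses `snapshot_download` from the `huggingface_hub` package. The Hub repository IDs are the public ones for these Stable Diffusion versions. The helper names and the `local_dir` value are assumptions for illustration; check your config file for the path the scripts actually expect.

```python
# Public Hugging Face Hub repository IDs for the versions mentioned above.
MODEL_IDS = {
    "v1-4": "CompVis/stable-diffusion-v1-4",
    "v2-1": "stabilityai/stable-diffusion-2-1",
}


def resolve_model_id(version: str = "v1-4") -> str:
    """Map a short version name to its Hub repository ID."""
    try:
        return MODEL_IDS[version]
    except KeyError:
        raise ValueError(f"Unknown version {version!r}; choose from {sorted(MODEL_IDS)}")


def download_weights(version: str = "v1-4", local_dir: str = None) -> str:
    """Download a model snapshot and return the local path.

    Requires `pip install huggingface_hub`; imported lazily so the
    helper above works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=resolve_model_id(version), local_dir=local_dir)


# Example (hypothetical target directory):
# download_weights("v1-4", local_dir="checkpoints/stable-diffusion-v1-4")
```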

Zero-shot testing

Simply run:

accelerate launch test_vid2vid_zero.py --config path/to/config

For example:

accelerate launch test_vid2vid_zero.py --config configs/car-moving.yaml
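
If you want to run several configs in a row, the same command can be driven from Python. This is a convenience sketch, not part of the repository: `build_command` and `run_all` are hypothetical helpers, and it assumes that every `.yaml` file in `configs/` is a valid test config.

```python
import subprocess
from pathlib import Path


def build_command(config_path):
    """Build the launch command shown above for a single config file."""
    return [
        "accelerate", "launch", "test_vid2vid_zero.py",
        "--config", str(config_path),
    ]


def run_all(config_dir="configs"):
    """Run every YAML config in `config_dir`, stopping at the first failure."""
    for config in sorted(Path(config_dir).glob("*.yaml")):
        print("Running", config)
        subprocess.run(build_command(config), check=True)


# Example: run_all() from the repository root, or a single config with
# subprocess.run(build_command("configs/car-moving.yaml"), check=True)
```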

Gradio Demo

Launch the local demo built with gradio:

python app.py

Or you can use our online gradio demo here.

Note that we disable Null-text Inversion and enable fp16 for faster demo response.

Examples

Input Video | Output Video | Input Video | Output Video
"A car is moving on the road" | "A Porsche car is moving on the desert" | "A car is moving on the road" | "A jeep car is moving on the snow"
"A man is running" | "Stephen Curry is running in Time Square" | "A man is running" | "A man is running in New York City"
"A child is riding a bike on the road" | "a child is riding a bike on the flooded road" | "A child is riding a bike on the road" | "a lego child is riding a bike on the road"
"A car is moving on the road" | "A car is moving on the snow" | "A car is moving on the road" | "A jeep car is moving on the desert"

Citation

@article{vid2vid-zero,
  title={Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models},
  author={Wang, Wen and Xie, Kangyang and Liu, Zide and Chen, Hao and Cao, Yue and Wang, Xinlong and Shen, Chunhua},
  journal={arXiv preprint arXiv:2303.17599},
  year={2023}
}

Acknowledgement

Tune-A-Video, diffusers, prompt-to-prompt.

Contact

We are hiring at all levels at the BAAI Vision Team, including full-time researchers, engineers, and interns. If you are interested in working with us on foundation models, visual perception, and multimodal learning, please contact Xinlong Wang ([email protected]) and Yue Cao ([email protected]).
