
animatelcm's Introduction

⚡️AnimateLCM: Accelerating the Animation of Your Personalized Models and Adapters through Decoupled Consistency Learning

[Paper] [Project Page ✨] [Demo in 🤗Hugging Face] [Pre-trained Models] [Civitai]

by Fu-Yun Wang, Zhaoyang Huang📮, Xiaoyu Shi, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li📮

If you use any component of our work, please cite it.

@article{wang2024animatelcm,
  title={AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning},
  author={Wang, Fu-Yun and Huang, Zhaoyang and Shi, Xiaoyu and Bian, Weikang and Song, Guanglu and Liu, Yu and Li, Hongsheng},
  journal={arXiv preprint arXiv:2402.00769},
  year={2024}
}

Here is a screen recording of usage. Prompt: "river reflecting mountain".

[screen recording]

Introduction

Consistency models, proposed by Professor Yang Song, are a promising new family of generative models for fast yet high-quality generation.

AnimateLCM is a pioneering, exploratory work on fast animation generation in the style of consistency models, able to generate good-quality animations in 4 inference steps.

It relies on a decoupled learning paradigm: it first learns the image-generation prior and then the temporal-generation prior for fast sampling, which greatly boosts training efficiency.

The high-level workflow of AnimateLCM is illustrated below:

[workflow diagram]

Demos

We have posted many demo videos generated by AnimateLCM on the Project Page. Generally speaking, AnimateLCM supports fast text-to-video, control-to-video, image-to-video, video-to-video stylization, and longer video generation.

[demo comparison]

Models

So far, we have released three models:

  • Animate-LCM-T2V: A spatial LoRA weight and a motion module for personalized video generation. Community experiments indicate that the motion module is also compatible with many personalized models tuned for LCM, for example Dreamshaper-LCM.

  • AnimateLCM-SVD-xt: I provide AnimateLCM-SVD-xt and AnimateLCM-SVD-xt 1.1, tuned from SVD-xt and SVD-xt 1.1, respectively. They support high-resolution image animation, generating 25 frames in 1~8 steps. You can try it with the Hugging Face Demo. Thanks to the Hugging Face team for providing the GPU grants.

  • AnimateLCM-I2V: A spatial LoRA weight and a motion module with an additional image encoder for personalized image animation. It is our attempt to directly train an image animation model for fast sampling without any teacher model. It can animate a personalized image in 2~4 steps. However, because the training resources were very limited, it is not as stable as I would like (like most I2V models built on Stable-Diffusion-v1-5, which are generally not very stable).

Install & Usage Instruction

The code is split into two folders, animatelcm_sd15 and animatelcm_svd, which rely on different environments. Please refer to README_animatelcm_sd15 and README_animatelcm_svd for instructions.

Usage Tips

  • AnimateLCM-T2V:

    • 4 steps generally work well. For better quality, use 6~8 inference steps.
    • The CFG scale should be set between 1~2. Setting CFG=1 cuts the sampling cost in half; however, I generally prefer CFG=1.5 with proper negative prompts for better quality.
    • Set the video length to 16 frames for sampling; this is the length the model was trained with.
    • The models should work with IP-Adapter, ControlNet, and many adapters tuned for Stable Diffusion in a zero-shot manner. If you want better results from these combinations, you can tune them together with the teacher-free adaptation script I provide; this does not hurt the sampling speed. A minimal diffusers sampling sketch is given after this list.
  • AnimateLCM-I2V:

    • 2~4 steps should work for personalized image animation.

    • In most cases, the model does not need CFG; just set CFG=1 to reduce the inference cost.

    • I additionally expose a motion scale hyper-parameter. Set it to 0.8 as the default choice. Setting it to 0.0 always yields static animations; increasing the motion scale produces larger motions but can sometimes cause generation failures.

    • The typical workflow is:

      • Use your personalized image model to generate a high-quality image.
      • Use the generated image as input and reuse the same prompt for image animation.
      • You can even apply AnimateLCM-T2V afterwards to refine the final motion quality.
  • AnimateLCM-SVD:

    • 1~4 steps should work.
    • SVD requires two CFG values, CFG_min and CFG_max. By default, CFG_min is set to 1; slightly adjusting CFG_max within [1, 1.5] gives good results. Again, you can simply set it to 1 to reduce the inference cost.
    • For the other hyper-parameters of AnimateLCM-SVD-xt, just follow the original SVD design. A minimal image-to-video sketch is also given after this list.
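
Below is a minimal text-to-video sampling sketch with diffusers that follows the AnimateLCM-T2V tips above. The hub ids ("wangfuyun/AnimateLCM", the "AnimateLCM_sd15_t2v_lora.safetensors" weight name, and the "emilianJR/epiCRealism" base model) are assumptions; swap in the released checkpoints or your own personalized SD1.5 model. Treat this as a sketch, not the official usage script.

import torch
from diffusers import AnimateDiffPipeline, LCMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Hub ids below are assumptions; replace them with your preferred checkpoints.
adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism",  # any personalized SD1.5 model should work here
    motion_adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")

# Load the AnimateLCM spatial LoRA.
pipe.load_lora_weights(
    "wangfuyun/AnimateLCM",
    weight_name="AnimateLCM_sd15_t2v_lora.safetensors",
    adapter_name="lcm-lora",
)
pipe.set_adapters(["lcm-lora"], [0.8])

pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt="river reflecting mountain",
    negative_prompt="bad quality, worse quality, low resolution",
    num_frames=16,           # the length the model was trained with
    guidance_scale=1.5,      # CFG in the recommended 1~2 range
    num_inference_steps=4,   # 4 steps generally work; try 6~8 for higher quality
    generator=torch.Generator("cpu").manual_seed(0),
)
export_to_gif(output.frames[0], "animatelcm_t2v.gif")

Setting guidance_scale=1 disables classifier-free guidance and halves the sampling cost, at some quality loss.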
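
Similarly, here is a minimal image-to-video sketch for AnimateLCM-SVD-xt, assuming the checkpoint is available in diffusers format (the "wangfuyun/AnimateLCM-SVD-xt" hub id and the example image URL are assumptions) and drops into the stock StableVideoDiffusionPipeline:

import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# The hub id is an assumption; point it at the actual AnimateLCM-SVD-xt weights.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "wangfuyun/AnimateLCM-SVD-xt", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# Conditioning image for the animation.
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

frames = pipe(
    image,
    num_inference_steps=4,   # 1~4 steps should work
    min_guidance_scale=1.0,  # CFG_min
    max_guidance_scale=1.2,  # CFG_max, slightly adjusted within [1, 1.5]
    decode_chunk_size=8,
    generator=torch.manual_seed(42),
).frames[0]
export_to_video(frames, "animatelcm_svd.mp4", fps=7)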

Related Notes

Comparison

Screen recording of AnimateLCM-T2V. Prompt: "dog with sunglasses".

[screen recording]

[comparison figure]

Contact & Collaboration

I am open to collaboration, but not to a full-time internship. If you find some of my work interesting and would like to collaborate or discuss in any format, please do not hesitate to contact me.

📧 Email: [email protected]

Acknowledgments

I would like to thank AK for broadcasting our work and the Hugging Face team for their help in building the Gradio demo and hosting the models. Thanks also to Dhruv Nair for help with diffusers.


animatelcm's Issues

Not working on M1

Code:

import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)

Error:

Traceback (most recent call last):
  File "p.py", line 16, in <module>
    frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
  File "/Users/yuki/anaconda3/envs/ai/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/yuki/anaconda3/envs/ai/lib/python3.8/site-packages/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py", line 441, in __call__
    image_embeddings = self._encode_image(image, device, num_videos_per_prompt, self.do_classifier_free_guidance)
  File "/Users/yuki/anaconda3/envs/ai/lib/python3.8/site-packages/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py", line 168, in _encode_image
    image = image.to(device=device, dtype=dtype)
  File "/Users/yuki/anaconda3/envs/ai/lib/python3.8/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
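
A possible workaround sketch for Apple Silicon, assuming the installed torch build supports the "mps" backend and that this pipeline actually runs on it (not verified here): skip the CUDA-dependent CPU offloading and move the pipeline to a device explicitly.

import torch

# enable_model_cpu_offload() assumes a CUDA device; on Apple Silicon, move the
# pipeline to the MPS backend instead (falling back to CPU if MPS is unavailable).
# float16 on MPS can be flaky with some torch versions; float32 is the safer choice.
device = "mps" if torch.backends.mps.is_available() else "cpu"
pipe = pipe.to(device)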

SDXL

Thanks for the great work. Are you planning to shift from SD1.5 to SDXL, and would it take much effort?

Distillation of Video Diffusion Based Model

Hi, firstly, great work.

I have a question regarding the distillation of the video diffusion model. Did you use the DDIM sampler while distilling from the video-based diffusion model, and did the training use skipped timesteps while training the online consistency distillation model?

Also, how many optimization steps did the training involve to generate good results with the distilled model?

Thanks for the help.

Teacher-Free Adaptation is Latent Consistency Fine-tuning (LCF)?

Great work~

I am confused about 'Teacher-Free Adaptation'. Does it mean Latent Consistency Fine-tuning (LCF)? That is, directly select two time steps, get the noised data z_{t} and z_{t-1}, and then directly calculate the consistency loss between these two time steps to enforce the self-consistency property, as LCF does in the LCM paper?

So the training procedure is:

  1. Train the base image diffusion model with Latent Consistency Distillation on image data.
  2. Freeze the LCM image diffusion weights and add a trainable temporal layer, trained with Latent Consistency Distillation, the new initialization strategy, and video data.
  3. Add other controls (ControlNet or IP-Adapter) with Latent Consistency Fine-tuning on data containing the control conditions.

Do I have the correct understanding?
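
For reference, the self-consistency objective described in this question can be sketched as follows (my own notation, based on the questioner's description and the LCM paper's LCF formulation, not copied from this repo):

\mathcal{L}(\theta, \theta^-) = \mathbb{E}_{z_0,\, n}\Big[ d\big( f_{\theta}(z_{t_{n+1}}, t_{n+1}),\ f_{\theta^-}(z_{t_n}, t_n) \big) \Big]

where z_{t_{n+1}} and z_{t_n} are obtained by directly adding forward-process noise to the clean latent z_0 at two adjacent timesteps (no teacher ODE-solver step), f_\theta is the consistency function, \theta^- is an EMA copy of \theta, and d is a distance such as the Huber loss.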

usable in sd-webui?

Hello,
Could you help? I am missing something in my sd-webui (1.6.1) setup:
[attached screenshot]

Any idea?

Image to Video generation

Hello devs!

I would like to know if it's possible to start from an image instead of generating the image. I have many nice images I would like to animate with this. I've skimmed the code but couldn't find an easy way.

Thanks for the answer!
Kind regard,
Timon Käch

KeyError in webui

model_channels = state_dict['{}input_blocks.0.0.weight'.format(key_prefix)].shape[0]
KeyError: 'model.diffusion_model.input_blocks.0.0.weight'

License

Would it be possible to add a license to the repo? My company would love to use the project, but without a license it's not possible to do so.

UCF101 evaluation details

Dear authors,

Thank you for the great work!

I would like to seek some clarification on the evaluation details described in Section 5.1 of your paper, particularly the resolution of the snippets generated for the UCF101 analysis. The section mentions that the snippets are generated at a resolution of 512x512, yet the original UCF101 videos are 320x240 and the I3D classifier is trained on 224x224 inputs.

Could you kindly provide further insight into the rationale behind selecting a 512x512 resolution for the snippets in this context?

Thank you in advance!

Regards,
Yuanhao

Possible Typo in the Paper

Hi, Thanks for the interesting paper.

I think the part shown in equation (10) should be c. If x is correct, could you tell me why?

[attached screenshot]

How to generate long video?

Dear AnimateLCM team,

Thank you for your great work, I really like it.

Could you tell me how to generate a long video (>10 s) like the ones shown on the README page? I tried increasing num_frames from 16 to 32, but the results degrade a lot.

output = pipe(
    prompt="a young woman walking on street, 4k, high resolution",
    negative_prompt="bad quality, worse quality, low resolution",
    num_frames=32, #16
    guidance_scale=2.0,
    num_inference_steps=6,
    generator=torch.Generator("cpu").manual_seed(0),
)
frames = output.frames[0]
export_to_gif(frames, "animatelcm.gif")

Thank you for your help.

Best Wishes,
Zongze

import error

from diffusers import AnimateDiffPipeline, LCMScheduler, MotionAdapter
ImportError: cannot import name 'AnimateDiffPipeline' from 'diffusers' (/root/miniconda3/lib/python3.7/site-packages/diffusers/__init__.py)

My diffusers version is 0.21.4.
