
mofa-video's Introduction

🦄️ MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model (ECCV 2024)

1 The University of Tokyo   2 Tencent AI Lab   * Corresponding Author  

In European Conference on Computer Vision (ECCV) 2024

     

🔥🔥🔥 New Features/Updates

  • (2024.07.15) We have released the training code for trajectory-based image animation! Please refer to Here for more instructions.

  • MOFA-Video will appear at ECCV 2024! 🇮🇹🇮🇹🇮🇹

  • We have released the Gradio inference code and the checkpoints for Hybrid Controls! Please refer to Here for more instructions.

  • Free online demo via HuggingFace Spaces will be coming soon!

  • If you find this work interesting, please do not hesitate to give a ⭐!

📰 CODE RELEASE

  • (2024.05.31) Gradio demo and checkpoints for trajectory-based image animation
  • (2024.06.22) Gradio demo and checkpoints for image animation with hybrid control
  • (2024.07.15) Training scripts for trajectory-based image animation
  • Inference scripts and checkpoints for keypoint-based facial image animation
  • Training scripts for keypoint-based facial image animation

TL;DR

Image 🏞️ + Hybrid Controls 🕹️ = Videos 🎬🍿




Trajectory + Landmark Control




Trajectory Control





Landmark Control
Check the gallery of our project page for more visual results!

Introduction

We introduce MOFA-Video, a method designed to adapt motions from different domains to the frozen Video Diffusion Model. By employing sparse-to-dense (S2D) motion generation and flow-based motion adaptation, MOFA-Video can effectively animate a single image using various types of control signals, including trajectories, keypoint sequences, and their combinations.

During the training stage, we generate sparse control signals through sparse motion sampling and then train different MOFA-Adapters to generate videos via the pre-trained SVD. During the inference stage, different MOFA-Adapters can be combined to jointly control the frozen SVD.
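The following is a minimal conceptual sketch in Python of how the pieces fit together at inference time. It is not the repository's actual API: the names traj_adapter, ldmk_adapter, sparse_to_dense, and the pipeline signature are hypothetical placeholders used only to illustrate the sparse-to-dense step and the joint control of the frozen SVD.

import torch

def animate(first_frame, trajectories, landmarks, pipeline, traj_adapter, ldmk_adapter):
    # Sparse-to-dense (S2D): each MOFA-Adapter turns its sparse hints
    # (user trajectories / facial landmark sequences) into dense flow fields.
    traj_flow = traj_adapter.sparse_to_dense(first_frame, trajectories)  # [T, 2, H, W]
    ldmk_flow = ldmk_adapter.sparse_to_dense(first_frame, landmarks)     # [T, 2, H, W]

    # At inference the adapters are combined: the SVD backbone stays frozen
    # and is only conditioned through the adapters' dense motion fields.
    with torch.no_grad():
        frames = pipeline(image=first_frame, controlnet_flows=[traj_flow, ldmk_flow])
    return frames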

🕹️ Image Animation with Hybrid Controls

1. Clone the Repository

git clone https://github.com/MyNiuuu/MOFA-Video.git
cd ./MOFA-Video

2. Environment Setup

The demo has been tested with CUDA 11.7.

cd ./MOFA-Video-Hybrid
conda create -n mofa python==3.10
conda activate mofa
pip install -r requirements.txt
pip install opencv-python-headless
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

IMPORTANT: ⚠️⚠️⚠️ The Gradio version (4.5.0) pinned in requirements.txt must be strictly followed, since other versions may cause errors.
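As an optional sanity check (our own addition, not part of the official setup), the following short Python snippet verifies the pinned Gradio version and that PyTorch can see a CUDA device before launching the demos:

import torch
import gradio

# The repo pins Gradio 4.5.0; other versions are known to cause errors.
assert gradio.__version__ == "4.5.0", f"Gradio {gradio.__version__} found; 4.5.0 is required."
print("CUDA available:", torch.cuda.is_available())
print("Built against CUDA:", torch.version.cuda)  # the demo was tested with CUDA 11.7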

3. Downloading Checkpoints

  1. Download the checkpoint of CMP from here and put it into ./MOFA-Video-Hybrid/models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints.

  2. Download the ckpts folder from the HuggingFace repo, which contains the necessary pretrained checkpoints, and put it under ./MOFA-Video-Hybrid. You may use git lfs to download the entire ckpts folder:

    1. Download git lfs from https://git-lfs.github.com. It is commonly used for cloning repositories with large model checkpoints on HuggingFace.
    2. Execute git clone https://huggingface.co/MyNiuuu/MOFA-Video-Hybrid to download the complete HuggingFace repository, which currently only includes the ckpts folder.
    3. Copy or move the ckpts folder to the GitHub repository.

    NOTE: If you encounter the error git: 'lfs' is not a git command on Linux, you can try this solution, which worked well in our case.

    Finally, the checkpoints should be organized as shown in ./MOFA-Video-Hybrid/ckpt_tree.md.
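To double-check the layout before launching the demo, a small helper sketch (paths taken from the steps above) can be run from the repository root:

from pathlib import Path

# Expected checkpoint locations, per the download steps above.
root = Path("./MOFA-Video-Hybrid")
expected = [
    root / "ckpts",
    root / "models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints",
]
for path in expected:
    print(("OK      " if path.exists() else "MISSING "), path)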

4. Run Gradio Demo

Using audio to animate the facial part

cd ./MOFA-Video-Hybrid
python run_gradio_audio_driven.py

🪄🪄🪄 The Gradio interface is shown below. Please follow the instructions on the Gradio interface during inference!

Using reference video to animate the facial part

cd ./MOFA-Video-Hybrid
python run_gradio_video_driven.py

🪄🪄🪄 The Gradio interface is shown below. Please follow the instructions on the Gradio interface during inference!

💫 Trajectory-based Image Animation

Please refer to Here for instructions.

Training your own MOFA-Adapter

Please refer to Here for more instructions.

Citation

@article{niu2024mofa,
  title={MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model},
  author={Niu, Muyao and Cun, Xiaodong and Wang, Xintao and Zhang, Yong and Shan, Ying and Zheng, Yinqiang},
  journal={arXiv preprint arXiv:2405.20222},
  year={2024}
}

Acknowledgements

We sincerely appreciate the code release of the following projects: DragNUWA, SadTalker, AniPortrait, Diffusers, SVD_Xtend, Conditional-Motion-Propagation, and Unimatch.

mofa-video's People

Contributors

myniuuu, vinthony, sushanthpy, eltociear


mofa-video's Issues

Please fix this project to work with the latest CUDA and, if possible, add a video tutorial. TIA

After the workarounds I managed to open Gradio, but I got this error:

(mofa) D:\AI\MOFA-Video\MOFA-Video-Hybrid>python run_gradio_audio_driven.py
C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\requests\__init__.py:86: RequestsDependencyWarning: Unable to find acceptable character detection dependency (chardet or charset_normalizer).
warnings.warn(
start loading models...
IMPORTANT: You are using gradio version 4.5.0, however version 4.29.0 is available, please upgrade.

layers per block is 2
layers per block is 2
=> loading checkpoint './models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar'
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder4.8.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder1.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.0.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.2.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_encoder.features.5.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.0.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.0.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.fusion8.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder8.8.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.1.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.2.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.0.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder4.5.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.0.downsample.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.2.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.0.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.2.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder8.5.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.2.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.0.downsample.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.2.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.2.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.3.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.1.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder1.7.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.1.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.0.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.0.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.skipconv4.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.2.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.3.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder4.2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.2.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.0.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder1.4.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.1.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.0.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.0.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.3.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder8.2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_encoder.features.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.4.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.0.downsample.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.fusion4.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.3.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.2.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder2.5.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.5.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder2.2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.0.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.1.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.0.downsample.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.1.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.1.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.1.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.1.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.5.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.2.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.4.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.5.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.decoder2.8.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer1.1.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.1.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.2.bn1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer4.1.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer2.3.bn3.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.fusion2.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.4.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.0.bn2.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.flow_decoder.skipconv2.1.num_batches_tracked
caution: missing keys from checkpoint ./models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints\ckpt_iter_42000.pth.tar: module.image_encoder.layer3.3.bn2.num_batches_tracked
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 80.03it/s]
models loaded.
Running on local URL: http://127.0.0.1:9080

To create a public link, set share=True in launch().
You selected None at [170, 216] from image
[[[170, 216]]]
You selected None at [162, 265] from image
[[[170, 216], [162, 265]]]
torch.Size([1, 24, 2, 512, 512])
You selected None at [274, 231] from image
[[[170, 216], [162, 265]], [[274, 231]]]
You selected None at [284, 269] from image
[[[170, 216], [162, 265]], [[274, 231], [284, 269]]]
torch.Size([1, 24, 2, 512, 512])
torch.Size([1, 24, 2, 512, 512])
You selected None at [280, 222] from image
[[[170, 216], [162, 265]], [[280, 222]]]
You selected None at [267, 276] from image
[[[170, 216], [162, 265]], [[280, 222], [267, 276]]]
torch.Size([1, 24, 2, 512, 512])
C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\requests\__init__.py:86: RequestsDependencyWarning: Unable to find acceptable character detection dependency (chardet or charset_normalizer).
warnings.warn(
using safetensor as default
load [net_G] and [net_G_ema] from ./ckpts/sad_talker\epoch_00190_iteration_000400000_checkpoint.pt
3DMM Extraction for source image
Traceback (most recent call last):
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\sadtalker_audio2pose\inference.py", line 187, in
main(args)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\sadtalker_audio2pose\inference.py", line 76, in main
first_coeff_path, crop_pic_path, crop_info = preprocess_model.generate(pic_path, first_frame_dir, args.preprocess,
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\sadtalker_audio2pose\src\utils\preprocess.py", line 103, in generate
x_full_frames, crop, quad = self.propress.crop(x_full_frames, still=True if 'ext' in crop_or_resize.lower() else False, xsize=512)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\sadtalker_audio2pose\src\utils\croper.py", line 129, in crop
lm = self.get_landmark(img_np)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\sadtalker_audio2pose\src\utils\croper.py", line 35, in get_landmark
lm = landmark_98_to_68(self.predictor.detector.get_landmarks(img)) # [0]
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\facexlib\alignment\awing_arch.py", line 373, in get_landmarks
pred = calculate_points(heatmaps).reshape(-1, 2)
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\facexlib\alignment\awing_arch.py", line 18, in calculate_points
preds = preds.astype(np.float, copy=False)
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\numpy_init_.py", line 324, in getattr
raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'float'.
np.float was a deprecated alias for the builtin float. To avoid this error in existing code, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'cfloat'?
Traceback (most recent call last):
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 456, in call_prediction
output = await route_utils.call_process_api(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\blocks.py", line 1522, in process_api
result = await self.call_function(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\blocks.py", line 1144, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\utils.py", line 674, in wrapper
response = f(*args, **kwargs)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 860, in run
outputs = self.forward_sample(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 442, in forward_sample
ldmk_controlnet_flow, ldmk_pose_imgs, landmarks, num_frames = self.get_landmarks(save_root, first_frame_path, audio_path, input_first_frame[0], self.model_length, ldmk_render=ldmk_render)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 708, in get_landmarks
ldmknpy_dir = self.audio2landmark(audio_path, first_frame_path, ldmk_dir, ldmk_render)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 688, in audio2landmark
assert return_code == 0, "Errors in generating landmarks! Please trace back up for detailed error report."
AssertionError: Errors in generating landmarks! Please trace back up for detailed error report.
Traceback (most recent call last):
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 456, in call_prediction
output = await route_utils.call_process_api(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\blocks.py", line 1522, in process_api
result = await self.call_function(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\blocks.py", line 1144, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\utils.py", line 674, in wrapper
response = f(*args, **kwargs)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 860, in run
outputs = self.forward_sample(
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 442, in forward_sample
ldmk_controlnet_flow, ldmk_pose_imgs, landmarks, num_frames = self.get_landmarks(save_root, first_frame_path, audio_path, input_first_frame[0], self.model_length, ldmk_render=ldmk_render)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 708, in get_landmarks
ldmknpy_dir = self.audio2landmark(audio_path, first_frame_path, ldmk_dir, ldmk_render)
File "D:\AI\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 688, in audio2landmark
assert return_code == 0, "Errors in generating landmarks! Please trace back up for detailed error report."
AssertionError: Errors in generating landmarks! Please trace back up for detailed error report.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 501, in process_events
response = await self.call_prediction(awake_events, batch)
File "C:\Users\Renel\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 465, in call_prediction
raise Exception(str(error) if show_error else None) from error
Exception: None
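A common workaround for the np.float error above (a suggestion inferred from the traceback, not an official fix from the maintainers): facexlib's awing_arch.py still uses the np.float alias that NumPy 1.24 removed, so either install numpy<1.24 or shim the alias before facexlib is imported, e.g.:

import numpy as np

# np.float was deprecated in NumPy 1.20 and removed in 1.24; restore the old
# alias so facexlib's awing_arch.py can keep calling preds.astype(np.float).
if not hasattr(np, "float"):
    np.float = float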


Spent a whole day getting it installed, but I don't know how to use it

Is there a concrete step-by-step tutorial? MOFA-Video-Traj is relatively simple, but for MOFA-Video-Hybrid, after uploading, how do I draw the inputs, especially the last two, one for the trajectory and one for the landmarks? Does anyone have a concrete tutorial? Thanks.

How to run the audio-driven workflow

Thank you for your great work.

I want to test your audio-driven generation to make a talking head,
but it only generates about 1 second of video regardless of the input audio.

Is it working properly in your environments?

Click "Add Trajectory" Error

After uploading an image and clicking "Add Trajectory", I get this error:
Traceback (most recent call last):
File "C:\conda\envs\DiffSynthStudio\lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "C:\conda\envs\DiffSynthStudio\lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "C:\.conda\envs\DiffSynthStudio\lib\site-packages\gradio\blocks.py", line 1103, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:.conda\envs\DiffSynthStudio\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:.conda\envs\DiffSynthStudio\lib\site-packages\anyio_backends_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "C:.conda\envs\DiffSynthStudio\lib\site-packages\anyio_backends_asyncio.py", line 851, in run
result = context.run(func, *args)
File "C:.conda\envs\DiffSynthStudio\lib\site-packages\gradio\utils.py", line 707, in wrapper
response = f(*args, **kwargs)
File "e:\004-VideoGen\code\MOFA-Video-Traj\run_gradio.py", line 692, in add_drag
tracking_points.constructor_args['value'].append([])
AttributeError: 'State' object has no attribute 'constructor_args'

Can you give some suggestions?

Error when running pip install "git+https://github.com/facebookresearch/pytorch3d.git"

Installed following readme.md.

Steps to reproduce:

git clone https://github.com/MyNiuuu/MOFA-Video.git
cd MOFA-Video/
cd MOFA-Video-Hybrid/
conda create -n mofa python==3.10
conda activate mofa
pip install -r requirements.txt

Got one warning:
WARNING: typer 0.12.3 does not provide the extra 'all'

Continued with the next step:

pip install opencv-python-headless
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

Got this error:


(mofa) lq@ubuntu:~/MOFA-Video/MOFA-Video-Hybrid$ pip install "git+https://github.com/facebookresearch/pytorch3d.git"
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/facebookresearch/pytorch3d.git
  Cloning https://github.com/facebookresearch/pytorch3d.git to /tmp/pip-req-build-jgh4go7x
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/pytorch3d.git /tmp/pip-req-build-jgh4go7x
  Resolved https://github.com/facebookresearch/pytorch3d.git to commit 89653419d0973396f3eff1a381ba09a07fffc2ed
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-jgh4go7x/setup.py", line 17, in <module>
          from torch.utils.cpp_extension import CppExtension, CUDA_HOME, CUDAExtension
        File "/home/lq/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 25, in <module>
          from pkg_resources import packaging  # type: ignore[attr-defined]
      ImportError: cannot import name 'packaging' from 'pkg_resources' (/home/lq/.local/lib/python3.10/site-packages/pkg_resources/__init__.py)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(mofa) lq@ubuntu:~/MOFA-Video/MOFA-Video-Hybrid$

Tried using a gitee mirror instead:


(mofa) lq@ubuntu:~/MOFA-Video/MOFA-Video-Hybrid$ pip install "git+https://gitee.com/yimlu/pytorch3d.git@stable"
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://gitee.com/yimlu/pytorch3d.git@stable
  Cloning https://gitee.com/yimlu/pytorch3d.git (to revision stable) to /tmp/pip-req-build-qere_0o2
  Running command git clone --filter=blob:none --quiet https://gitee.com/yimlu/pytorch3d.git /tmp/pip-req-build-qere_0o2
  Running command git checkout -q 2f11ddc5ee7d6bd56f2fb6744a16776fab6536f7
  Resolved https://gitee.com/yimlu/pytorch3d.git to commit 2f11ddc5ee7d6bd56f2fb6744a16776fab6536f7
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-qere_0o2/setup.py", line 17, in <module>
          from torch.utils.cpp_extension import CppExtension, CUDA_HOME, CUDAExtension
        File "/home/lq/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 25, in <module>
          from pkg_resources import packaging  # type: ignore[attr-defined]
      ImportError: cannot import name 'packaging' from 'pkg_resources' (/home/lq/.local/lib/python3.10/site-packages/pkg_resources/__init__.py)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(mofa) lq@ubuntu:~/MOFA-Video/MOFA-Video-Hybrid$
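One plausible cause (an assumption based on the traceback, not confirmed in the thread) is that torch.utils.cpp_extension runs `from pkg_resources import packaging`, which newer setuptools releases no longer provide, and that the failing pkg_resources is loaded from ~/.local rather than the conda env. A quick Python diagnostic sketch:

import pkg_resources
import setuptools

print("setuptools:", setuptools.__version__)
print("pkg_resources loaded from:", pkg_resources.__file__)
try:
    from pkg_resources import packaging  # the import torch's cpp_extension performs
    print("pkg_resources.packaging is importable; the pytorch3d build should get past this point.")
except ImportError:
    print("pkg_resources.packaging is missing; pinning an older setuptools "
          "(commonly reported: pip install 'setuptools<70') or removing the "
          "shadowing ~/.local install are frequently suggested workarounds.")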

MOFA-Video-Traj gradio demo error

I'm using the MOFA-Video-Traj gradio demo and would like to ask how to solve this.

All the models have been placed in their proper locations.

Microsoft Windows [Version 10.0.19045.4529]
(c) Microsoft Corporation. All rights reserved.

C:\MOFA\MOFA-Video-Traj>conda activate traj

(traj) C:\MOFA\MOFA-Video-Traj>python run_gradio.py
Traceback (most recent call last):
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy_init_.py", line 18, in
from cupy import core # NOQA
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy_core_init
.py", line 1, in
from cupy._core import core # NOQA
File "cupy_core\core.pyx", line 1, in init cupy.core.core
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy\cuda_init
.py", line 8, in
from cupy.cuda import compiler # NOQA
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy\cuda\compiler.py", line 13, in
from cupy.cuda import device
File "cupy\cuda\device.pyx", line 1, in init cupy.cuda.device
ImportError: DLL load failed while importing runtime: 找不到指定的模組。

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\MOFA\MOFA-Video-Traj\run_gradio.py", line 655, in
DragNUWA_net = Drag("cuda:0", target_size, target_size, 25)
File "C:\MOFA\MOFA-Video-Traj\run_gradio.py", line 225, in init
self.pipeline, self.cmp = init_models(
File "C:\MOFA\MOFA-Video-Traj\run_gradio.py", line 93, in init_models
from pipeline.pipeline import FlowControlNetPipeline
File "C:\MOFA\MOFA-Video-Traj\pipeline\pipeline.py", line 9, in
from models.svdxt_featureflow_forward_controlnet_s2d_fixcmp_norefine import FlowControlNet
File "C:\MOFA\MOFA-Video-Traj\models\svdxt_featureflow_forward_controlnet_s2d_fixcmp_norefine.py", line 12, in
from models.softsplat import softsplat
File "C:\MOFA\MOFA-Video-Traj\models\softsplat.py", line 4, in
import cupy
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy_init_.py", line 20, in
raise ImportError(f'''
ImportError:

Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html

Original error:
ImportError: DLL load failed while importing runtime: 找不到指定的模組。

IMPORTANT: You are using gradio version 4.5.0, however version 4.29.0 is available, please upgrade.

(traj) C:\MOFA\MOFA-Video-Traj>conda uninstall cupy-cuda12x

PackagesNotFoundError: The following packages are missing from the target environment:

  • cupy-cuda12x

(traj) C:\MOFA\MOFA-Video-Traj>pip list
Package Version


accelerate 0.30.1
aiofiles 23.2.1
altair 5.3.0
annotated-types 0.7.0
ansicon 1.89.0
anyio 4.4.0
attrs 23.2.0
av 12.2.0
blessed 1.20.0
certifi 2024.6.2
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
colorlog 6.8.2
contourpy 1.2.1
cupy-cuda117 10.6.0
cycler 0.12.1
diffusers 0.24.0
dnspython 2.6.1
einops 0.8.0
email_validator 2.2.0
exceptiongroup 1.2.1
fastapi 0.111.0
fastapi-cli 0.0.4
fastrlock 0.8.2
ffmpy 0.3.2
filelock 3.15.4
fonttools 4.53.0
fsspec 2024.6.1
gpustat 1.1.1
gradio 4.5.0
gradio_client 0.7.0
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.4
idna 3.7
imageio 2.34.2
importlib_metadata 8.0.0
importlib_resources 6.4.0
Jinja2 3.1.4
jinxed 1.2.1
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lazy_loader 0.4
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.0
mdurl 0.1.2
mpmath 1.3.0
networkx 3.3
numpy 1.24.4
nvidia-ml-py 12.555.43
opencv-python 4.10.0.84
opencv-python-headless 4.10.0.84
orjson 3.10.5
packaging 24.1
pandas 2.2.2
pillow 10.4.0
pip 24.0
psutil 6.0.0
pydantic 2.8.0
pydantic_core 2.20.0
pydub 0.25.1
Pygments 2.18.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
rich 13.7.1
rpds-py 0.18.1
safetensors 0.4.3
scikit-image 0.24.0
scipy 1.14.0
semantic-version 2.10.0
setuptools 69.5.1
shellingham 1.5.4
six 1.16.0
sniffio 1.3.1
starlette 0.37.2
sympy 1.12.1
tifffile 2024.6.18
tokenizers 0.19.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.0.1
torchvision 0.15.2
tqdm 4.66.4
transformers 4.41.1
typer 0.12.3
typing_extensions 4.12.2
tzdata 2024.1
ujson 5.10.0
urllib3 2.2.2
uvicorn 0.30.1
watchfiles 0.22.0
wcwidth 0.2.13
websockets 11.0.3
wheel 0.43.0
zipp 3.19.2

(traj) C:\MOFA\MOFA-Video-Traj>where nvcc
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe

(traj) C:\MOFA\MOFA-Video-Traj>pip install --upgrade cupy-cuda117
Requirement already satisfied: cupy-cuda117 in c:\users\bomchoho\appdata\local\anaconda3\envs\traj\lib\site-packages (10.6.0)
Requirement already satisfied: numpy<1.25,>=1.18 in c:\users\bomchoho\appdata\local\anaconda3\envs\traj\lib\site-packages (from cupy-cuda117) (1.24.4)
Requirement already satisfied: fastrlock>=0.5 in c:\users\bomchoho\appdata\local\anaconda3\envs\traj\lib\site-packages (from cupy-cuda117) (0.8.2)

(traj) C:\MOFA\MOFA-Video-Traj>python run_gradio.py
Traceback (most recent call last):
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy_init_.py", line 18, in
from cupy import core # NOQA
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy_core_init
.py", line 1, in
from cupy._core import core # NOQA
File "cupy_core\core.pyx", line 1, in init cupy.core.core
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy\cuda_init
.py", line 8, in
from cupy.cuda import compiler # NOQA
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy\cuda\compiler.py", line 13, in
from cupy.cuda import device
File "cupy\cuda\device.pyx", line 1, in init cupy.cuda.device
ImportError: DLL load failed while importing runtime: 找不到指定的模組。

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\MOFA\MOFA-Video-Traj\run_gradio.py", line 655, in
DragNUWA_net = Drag("cuda:0", target_size, target_size, 25)
File "C:\MOFA\MOFA-Video-Traj\run_gradio.py", line 225, in init
self.pipeline, self.cmp = init_models(
File "C:\MOFA\MOFA-Video-Traj\run_gradio.py", line 93, in init_models
from pipeline.pipeline import FlowControlNetPipeline
File "C:\MOFA\MOFA-Video-Traj\pipeline\pipeline.py", line 9, in
from models.svdxt_featureflow_forward_controlnet_s2d_fixcmp_norefine import FlowControlNet
File "C:\MOFA\MOFA-Video-Traj\models\svdxt_featureflow_forward_controlnet_s2d_fixcmp_norefine.py", line 12, in
from models.softsplat import softsplat
File "C:\MOFA\MOFA-Video-Traj\models\softsplat.py", line 4, in
import cupy
File "C:\Users\bomchoho\AppData\Local\anaconda3\envs\traj\lib\site-packages\cupy_init_.py", line 20, in
raise ImportError(f'''
ImportError:

Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html

Original error:
ImportError: DLL load failed while importing runtime: 找不到指定的模組。

IMPORTANT: You are using gradio version 4.5.0, however version 4.29.0 is available, please upgrade.

(traj) C:\MOFA\MOFA-Video-Traj>

CMP model is now gone

There's now a 404 error when downloading the first CMP checkpoint in the instructions. Was it taken down by huggingface? Is there another source somewhere?

About train dataset processing

Hi, thank you for such wonderful work! I would like to ask a question about the preparation of the training set. I noticed that you mentioned in the paper: "During training, we randomly sample 14 video frames with a stride of 4. ...with a resolution of 256 × 256. We first train ... and directly taking the first frame together with the estimated optical flow from Unimatch."

So my question is: what value of nms_ks did you use in the flow_sampler function of the watershed sampler? I set it to 3 to get as many sampling points as possible, but it is hard to reconstruct the original video from just these points. Is this normal?

By the way, I found that one possible reason is that the masks are all taken from the first frame. If an object in the first frame does not move, it is difficult for the watershed algorithm to sample points there, resulting in a lack of guidance for that object in the sparse flow guidance sequence, so the reconstruction is not ideal. Is that right?

No module named 'cv2'

Traceback (most recent call last):
File "G:\AI\ZHB\MOFA-Video\MOFA-Video-Hybrid\aniportrait\audio2ldmk.py", line 6, in
import cv2
ModuleNotFoundError: No module named 'cv2'

opencv-python is already installed, yet it complains that cv2 is missing. How can I solve this?
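A quick diagnostic sketch (our suggestion, not from the thread): audio2ldmk.py may be executed by a different Python interpreter than the one where opencv-python was installed, so check which interpreter and cv2 installation are actually visible:

import sys

print("Python executable:", sys.executable)
try:
    import cv2
    print("cv2", cv2.__version__, "from", cv2.__file__)
except ImportError as exc:
    print("cv2 not importable in this interpreter:", exc)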

How to install?

I would appreciate a simple guide on how to install this. I've already wasted 2 hours trying everything, git clone and so on.

Only 25 frames?

which means that the first 25 frames of the generated landmark sequences are used to obtain the result.

Is the final result only 25 frames? Can I get more frames?

Diffusion process reached 100% in 10h and still no output

I ran MOFA-Video-Traj in a PyTorch NGC Container.
I followed the demo's detailed instructions to set the trajectories. After initiating the demo, it processed for 10 hours.
While running, the demo only utilized 7GB of memory and 1 CPU, despite having access to more resources.
When the Stable Diffusion process reached 100%, it didn't output any video.


audio_driven run error

Hello, I ran the audio-driven gradio demo and added, in order, an image (luna), audio (a BGM clip), trajectories, a motion brush, and a landmark mask, but it throws an error when running:
AssertionError: Errors in generating landmarks! Please trace back up for detailed error report.
What could be the cause?

Questions with hand-crafted trajectories

First of all, thank you very much for sharing such a wonderful work!

I have a question with the formulation of hand-crafted trajectory.
If I understand correctly, hand-crafted trajectories F_{i-1} are sparse motion hints between frames i and 0. However, according to the definition of F (from Section 3.2), F^s_i is a sparse forward optical flow between frames i and i+1.
If the network was trained on the sparse optical flow samples from the definition from 3.2, wouldn't this mismatch between the two definitions cause unexpected behavior, since definition for F for hand-crafted trajectory is vastly different from what the network was trained on?

Let me know if I am missing something.
Thanks!

MOFA-Video-Traj gradio demo error

Hi authors, I have downloaded all the models, but when launching gradio I get missing keys warnings:
image
Also, after I upload an image and start selecting the trajectory start point, an error occurs:
image

unable to run due to errors

Great Work ! :). I liked the concept and started experimenting, but I'm encountering the following issue. I need your help to fix it.


without paint mark

File "/mnt/d/MOFA-Video/MOFA-Video-Traj/venv/lib/python3.10/site-packages/scipy/interpolate/_cubic.py", line 249, in init
x, _, y, axis, _ = prepare_input(x, y, axis)
File "/mnt/d/MOFA-Video/MOFA-Video-Traj/venv/lib/python3.10/site-packages/scipy/interpolate/_cubic.py", line 55, in prepare_input
raise ValueError("x must contain at least 2 elements.")
ValueError: x must contain at least 2 elements.

with paint mark
es/gradio/utils.py", line 832, in wrapper
response = f(*args, **kwargs)
File "/mnt/d/MOFA-Video/MOFA-Video-Traj/run_gradio.py", line 507, in run
divide_points_afterinterpolate(resized_all_points_384, motion_brush_mask_384)
File "/mnt/d/MOFA-Video/MOFA-Video-Traj/run_gradio.py", line 43, in divide_points_afterinterpolate
starts = resized_all_points[:, 0] # [K, 2]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Please post video tutorial as an example

After hours of installing and fixing issues, I couldn't put it to use.

If anyone has figured out how to use it, please create a small video tutorial. I know there are instructions in the WebUI, but they may not be helpful for everyone, especially those who are not technical or new to this.

landmarks condition is stronger than flow condition

Hi,
I see that your FlowControlNet injects landmark conditions in addition to flow. Through my experiments, I find that the landmark condition is actually stronger than the flow condition. Below are my results:

output_flow.mp4

This is the result only using flow condition.

output_ldmk.mp4

This is the result only using landmark condition.

output_flow_ldmk.mp4

This is the result using landmark and flow condition simultaneously.

Can you explain this result? How can we get good results using only flow?

run gradio error

Hi, when I run the gradio demo, the error message is as follows. Can you help me out? Thanks a lot.
Traceback (most recent call last):
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 501, in process_events
response = await self.call_prediction(awake_events, batch)
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 465, in call_prediction
raise Exception(str(error) if show_error else None) from error
Exception: None
torch.Size([1, 24, 2, 512, 512])
Traceback (most recent call last):
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 456, in call_prediction
output = await route_utils.call_process_api(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1522, in process_api
result = await self.call_function(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1144, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/utils.py", line 674, in wrapper
response = f(*args, **kwargs)
File "/workspace/MOFA-Video/MOFA-Video-Hybrid/run_gradio_audio_driven.py", line 760, in run
save_root = os.path.join(os.path.dirname(audio_path), save_name)
File "/opt/miniconda/envs/mofa/lib/python3.10/posixpath.py", line 152, in dirname
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 456, in call_prediction
output = await route_utils.call_process_api(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1522, in process_api
result = await self.call_function(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1144, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/utils.py", line 674, in wrapper
response = f(*args, **kwargs)
File "/workspace/MOFA-Video/MOFA-Video-Hybrid/run_gradio_audio_driven.py", line 760, in run
save_root = os.path.join(os.path.dirname(audio_path), save_name)
File "/opt/miniconda/envs/mofa/lib/python3.10/posixpath.py", line 152, in dirname
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 501, in process_events
response = await self.call_prediction(awake_events, batch)
File "/opt/miniconda/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 465, in call_prediction
raise Exception(str(error) if show_error else None) from error
Exception: None


Hybrid gradio error

Hi,
Thank you for your great work. I met the following error in my environment. Could you please help me check it? Thank you.
Traceback (most recent call last):
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 456, in call_prediction
output = await route_utils.call_process_api(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1522, in process_api
result = await self.call_function(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1144, in call_function
prediction = await anyio.to_thread.run_sync(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/utils.py", line 674, in wrapper
response = f(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/run_gradio_audio_driven.py", line 860, in run
outputs = self.forward_sample(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/run_gradio_audio_driven.py", line 452, in forward_sample
val_output = self.pipeline(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/pipeline/pipeline.py", line 454, in call
down_res_face_tmp, mid_res_face_tmp, controlnet_flow, _ = self.face_controlnet(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/ldmk_ctrlnet.py", line 446, in forward
warped_cond_feature, occlusion_mask = self.get_warped_frames(cond_feature, scale_flows[fh // ch], fh // ch)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/ldmk_ctrlnet.py", line 300, in get_warped_frames
warped_frame = softsplat(tenIn=first_frame.float(), tenFlow=flows[:, i].float(), tenMetric=None, strMode='avg').to(dtype) # [b, c, w, h]
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/softsplat.py", line 251, in softsplat
tenOut = softsplat_func.apply(tenIn, tenFlow)
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
return fwd(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/softsplat.py", line 284, in forward
cuda_launch(cuda_kernel('softsplat_out', '''
File "cupy/_util.pyx", line 67, in cupy._util.memoize.decorator.ret
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/softsplat.py", line 225, in cuda_launch
return cupy.cuda.compile_with_cache(objCudacache[strKey]['strKernel'], tuple(['-I ' + os.environ['CUDA_HOME'], '-I ' + os.environ['CUDA_HOME'] + '/include'])).get_function(objCudacache[strKey]['strFunction'])
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 464, in compile_with_cache
return _compile_module_with_cache(*args, **kwargs)
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 492, in _compile_module_with_cache
return _compile_with_cache_cuda(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 561, in _compile_with_cache_cuda
mod.load(cubin)
File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 266, in cupy.cuda.function.Module.load
File "cupy_backends/cuda/api/driver.pyx", line 210, in cupy_backends.cuda.api.driver.moduleLoadData
File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid
Traceback (most recent call last):
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 456, in call_prediction
output = await route_utils.call_process_api(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1522, in process_api
result = await self.call_function(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/blocks.py", line 1144, in call_function
prediction = await anyio.to_thread.run_sync(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/utils.py", line 674, in wrapper
response = f(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/run_gradio_audio_driven.py", line 860, in run
outputs = self.forward_sample(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/run_gradio_audio_driven.py", line 452, in forward_sample
val_output = self.pipeline(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/pipeline/pipeline.py", line 454, in call
down_res_face_tmp, mid_res_face_tmp, controlnet_flow, _ = self.face_controlnet(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/ldmk_ctrlnet.py", line 446, in forward
warped_cond_feature, occlusion_mask = self.get_warped_frames(cond_feature, scale_flows[fh // ch], fh // ch)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/ldmk_ctrlnet.py", line 300, in get_warped_frames
warped_frame = softsplat(tenIn=first_frame.float(), tenFlow=flows[:, i].float(), tenMetric=None, strMode='avg').to(dtype) # [b, c, w, h]
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/softsplat.py", line 251, in softsplat
tenOut = softsplat_func.apply(tenIn, tenFlow)
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
return fwd(*args, **kwargs)
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/softsplat.py", line 284, in forward
cuda_launch(cuda_kernel('softsplat_out', '''
File "cupy/_util.pyx", line 67, in cupy._util.memoize.decorator.ret
File "/autodl-fs/data/yt/MOFA-Video-Hybrid/models/softsplat.py", line 225, in cuda_launch
return cupy.cuda.compile_with_cache(objCudacache[strKey]['strKernel'], tuple(['-I ' + os.environ['CUDA_HOME'], '-I ' + os.environ['CUDA_HOME'] + '/include'])).get_function(objCudacache[strKey]['strFunction'])
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 464, in compile_with_cache
return _compile_module_with_cache(*args, **kwargs)
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 492, in _compile_module_with_cache
return _compile_with_cache_cuda(
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 561, in _compile_with_cache_cuda
mod.load(cubin)
File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 266, in cupy.cuda.function.Module.load
File "cupy_backends/cuda/api/driver.pyx", line 210, in cupy_backends.cuda.api.driver.moduleLoadData
File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 501, in process_events
response = await self.call_prediction(awake_events, batch)
File "/root/miniconda3/envs/mofa/lib/python3.10/site-packages/gradio/queueing.py", line 465, in call_prediction
raise Exception(str(error) if show_error else None) from error
Exception: None
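For anyone hitting the same CUDA_ERROR_INVALID_SOURCE from cupy's just-in-time kernel compilation: in my experience this typically means the CUDA toolkit that CUDA_HOME points to, the installed cupy-cudaXXx wheel, and the GPU driver do not agree, or the compiled kernel targets the wrong GPU architecture. Below is a small diagnostic sketch (not part of the repo) that prints the versions worth comparing before reinstalling a matching cupy wheel; it assumes cupy and torch import correctly and a CUDA GPU is visible.

```python
import os
import cupy
import torch

# Hedged diagnostic: compare the CUDA versions seen by the environment, torch,
# cupy, and the driver. CUDA_ERROR_INVALID_SOURCE during kernel load usually
# means one of these disagrees with the others.
print("CUDA_HOME             :", os.environ.get("CUDA_HOME"))
print("torch CUDA version    :", torch.version.cuda)
print("cupy version          :", cupy.__version__)
print("cupy runtime version  :", cupy.cuda.runtime.runtimeGetVersion())
print("driver version        :", cupy.cuda.runtime.driverGetVersion())
print("GPU compute capability:", torch.cuda.get_device_capability(0))
```

If the runtime version reported by cupy does not match the toolkit under CUDA_HOME (or torch's CUDA version), reinstalling the cupy wheel built for the locally installed toolkit is a common fix.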

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None) handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>

Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 501, in process_events
response = await self.call_prediction(awake_events, batch)
File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 465, in call_prediction
raise Exception(str(error) if show_error else None) from error
Exception: None
You selected None at [202, 183] from image
[[[202, 183]]]
You selected None at [266, 181] from image
[[[202, 183], [266, 181]]]
torch.Size([1, 24, 2, 512, 512])
You selected None at [361, 184] from image
[[[202, 183], [266, 181]], [[361, 184]]]
You selected None at [322, 183] from image
[[[202, 183], [266, 181]], [[361, 184], [322, 183]]]
torch.Size([1, 24, 2, 512, 512])
torch.Size([1, 24, 2, 512, 512])
torch.Size([1, 24, 2, 512, 512])
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ckpts/aniportrait/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ckpts/aniportrait/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1720787583.023144 13052 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1720787583.041082 2184 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1720787583.049103 12412 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1720787583.057753 13100 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
warnings.warn('SymbolDatabase.GetPrototype() is deprecated. Please '
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [00:47<00:00, 1.89s/it]
100%|█████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 192.78it/s]
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>

Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\mofa\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "D:\ProgramData\anaconda3\envs\mofa\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.

Trying to set up

Hi, thanks for this wonderful code!

I'm having trouble installing it; I tried to follow the expected folder structure.

(cupyenv) PS E:\MOFA-Video-main\MOFA-Video-main\MOFA-Video-Traj> python run_gradio.py
start loading models...
IMPORTANT: You are using gradio version 4.5.0, however version 4.29.0 is available, please upgrade.

Traceback (most recent call last):
File "run_gradio.py", line 655, in
DragNUWA_net = Drag("cuda:0", target_size, target_size, 25)
File "run_gradio.py", line 225, in init
self.pipeline, self.cmp = init_models(
File "run_gradio.py", line 101, in init_models
vae = AutoencoderKLTemporalDecoder.from_pretrained(
File "C:\Users\aravi.conda\envs\cupyenv\lib\site-packages\diffusers\models\modeling_utils.py", line 712, in from_pretrained
config, unused_kwargs, commit_hash = cls.load_config(
File "C:\Users\aravi.conda\envs\cupyenv\lib\site-packages\diffusers\configuration_utils.py", line 365, in load_config
raise EnvironmentError(
OSError: Error no file named config.json found in directory ckpts/stable-video-diffusion-img2vid-xt-1-1.

I can't find any config.json for stable-video-diffusion-img2vid-xt-1-1.

I downloaded the diffusers files one by one from HuggingFace, but maybe this needs to be done in a more convenient way?

thanks in advance !
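The demo expects the full diffusers-format SVD checkpoint (including model_index.json and the per-module config files) under ckpts/stable-video-diffusion-img2vid-xt-1-1, so a flattened, file-by-file download will not satisfy from_pretrained. Below is a minimal sketch using huggingface_hub; the repo id stabilityai/stable-video-diffusion-img2vid-xt-1-1 and the local path are assumptions based on the error message, and that repo is gated, so you may need to accept its license and run huggingface-cli login first.

```python
# Hedged sketch: mirror the whole diffusers-format checkpoint folder locally so
# that from_pretrained finds every config file it expects. The repo id below is
# an assumption (the official, gated SVD repo); adjust it if the MOFA-Video
# README points to a different source for these weights.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    local_dir="ckpts/stable-video-diffusion-img2vid-xt-1-1",
)
```

This should preserve the subfolder layout (unet/, vae/, image_encoder/, scheduler/, ...) that from_pretrained looks for.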

Error when running the traj demo

Hi author, when trying to run the traj demo I went through the steps in order: upload image -> add trajectories -> add motion brush here. I found that the visualized flow window never showed an image, and after clicking run I got an error. The error message is as follows:

File "/viscam/projects/wonderland/haoyi/MOFA-Video/MOFA-Video-Traj/run_gradio.py", line 498, in run
new_resized_all_points.append(interpolate_trajectory(input_all_points[tnum], self.model_length))
File "/viscam/projects/wonderland/haoyi/MOFA-Video/MOFA-Video-Traj/run_gradio.py", line 168, in interpolate_trajectory
fx = PchipInterpolator(t, x)
File "/viscam/u/haoyid/.conda/envs/mofa/lib/python3.10/site-packages/scipy/interpolate/_cubic.py", line 249, in init
x, _, y, axis, _ = prepare_input(x, y, axis)
File "/viscam/u/haoyid/.conda/envs/mofa/lib/python3.10/site-packages/scipy/interpolate/_cubic.py", line 55, in prepare_input
raise ValueError("x must contain at least 2 elements.")
ValueError: x must contain at least 2 elements.

Do you know how to solve this problem? Thanks.
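For reference, the ValueError is raised inside scipy's PchipInterpolator, which requires at least two sample points per trajectory; it fires when a drawn trajectory contains only a single click. A minimal guard sketch follows; the names mirror interpolate_trajectory in run_gradio.py but are reconstructed from the traceback, not copied from the repo.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def interpolate_trajectory_safe(points, target_len):
    """Resample a clicked trajectory to target_len points.

    Hedged sketch: PchipInterpolator raises "x must contain at least 2
    elements" for single-point trajectories, so such a trajectory is simply
    repeated instead of interpolated.
    """
    points = np.asarray(points, dtype=float)  # shape [n, 2] of (x, y) clicks
    if len(points) == 0:
        raise ValueError("empty trajectory: click at least two points in the UI")
    if len(points) == 1:
        return np.repeat(points, target_len, axis=0)
    t = np.linspace(0.0, 1.0, len(points))
    fx = PchipInterpolator(t, points[:, 0])
    fy = PchipInterpolator(t, points[:, 1])
    new_t = np.linspace(0.0, 1.0, target_len)
    return np.stack([fx(new_t), fy(new_t)], axis=1)
```

In practice, making sure every trajectory has at least two clicked points before pressing run avoids the error without any code change.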

Clarifications on experiments

Hi,

I have several questions regarding the ablation settings and training details.

(1) Could you elaborate on the experimental condition for the non-tuning model? (Section 4.2)
Does this setting include training the reference encoder, and warping features with dense optical flow estimated through Unimatch?
Or, is this purely a tuning-free model (without ControlNet) where the dense optical flow is concatenated with the reference frame and given as a condition to SVD?

(2) Could you elaborate on the first stage of training the model in the Implementation Details section? (Section 4)
The paper says, "We first train the model as a flow-based reconstruction model by removing the S2D motion generator and directly taking the first frame together with the estimated optical flow from Unimatch"
The spirit of the question is the same as in question (1). Does this mean warping the features from the reference encoder using the dense optical flow from Unimatch?

Thank you.

Problem reproducing the results of Periodic Sampling for Longer Animation

Following the Periodic Sampling for Longer Animation scheme in the paper, I modified the original code so that, at each denoising step, the latent is split into sliding windows, each window is passed through the UNet separately, and the windows are then merged. After reproducing this, however, the motion of objects always jitters. I don't know what is going on; has anyone run into a similar problem?
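For comparison, here is a minimal sketch of one denoising step with overlapping sliding windows whose overlapping frames are blended by averaging. This is only my reading of the periodic-sampling idea, not the authors' released implementation; unet, latents, and the window/stride sizes are placeholders. Jitter at window boundaries is often reduced by using overlapping windows instead of disjoint ones.

```python
import torch

@torch.no_grad()
def denoise_step_sliding(unet, latents, t, window=16, stride=8, **unet_kwargs):
    """One denoising step over a long latent [B, C, T, H, W] using overlapping
    temporal windows. Hedged sketch: each window is passed through the UNet
    separately and overlapping frames are averaged, which tends to suppress
    the jitter that appears with non-overlapping windows."""
    B, C, T, H, W = latents.shape
    out = torch.zeros_like(latents)
    count = torch.zeros(1, 1, T, 1, 1, device=latents.device, dtype=latents.dtype)
    starts = list(range(0, max(T - window, 0) + 1, stride))
    if starts[-1] + window < T:  # make sure the tail frames are covered
        starts.append(T - window)
    for s in starts:
        e = s + window
        pred = unet(latents[:, :, s:e], t, **unet_kwargs)  # placeholder call signature
        out[:, :, s:e] += pred
        count[:, :, s:e] += 1
    return out / count
```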

AssertionError: Errors in generating landmarks! Please trace back up for detailed error report.

Running the application, I uploaded the inputs as follows,

After clicking the run button, I got the error below:


D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
  warnings.warn('SymbolDatabase.GetPrototype() is deprecated. Please '
Traceback (most recent call last):
  File "D:\MOFA-Video\MOFA-Video-Hybrid\aniportrait\audio2ldmk.py", line 309, in <module>
    main()
  File "D:\MOFA-Video\MOFA-Video-Hybrid\aniportrait\audio2ldmk.py", line 253, in main
    audio_chunks[-2] = torch.cat((audio_chunks[-2], audio_chunks[-1]), dim=1)
IndexError: list index out of range
Traceback (most recent call last):
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 456, in call_prediction
    output = await route_utils.call_process_api(
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\blocks.py", line 1522, in process_api
    result = await self.call_function(
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\blocks.py", line 1144, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\utils.py", line 674, in wrapper
    response = f(*args, **kwargs)
  File "D:\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 860, in run
    outputs = self.forward_sample(
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 442, in forward_sample
    ldmk_controlnet_flow, ldmk_pose_imgs, landmarks, num_frames = self.get_landmarks(save_root, first_frame_path, audio_path, input_first_frame[0], self.model_length, ldmk_render=ldmk_render)
  File "D:\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 708, in get_landmarks
    ldmknpy_dir = self.audio2landmark(audio_path, first_frame_path, ldmk_dir, ldmk_render)
  File "D:\MOFA-Video\MOFA-Video-Hybrid\run_gradio_audio_driven.py", line 698, in audio2landmark
    assert return_code == 0, "Errors in generating landmarks! Please trace back up for detailed error report."
AssertionError: Errors in generating landmarks! Please trace back up for detailed error report.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 501, in process_events
    response = await self.call_prediction(awake_events, batch)
  File "D:\ProgramData\anaconda3\envs\mofa\lib\site-packages\gradio\queueing.py", line 465, in call_prediction
    raise Exception(str(error) if show_error else None) from error
Exception: None

pytorch3d and cupy-12x installed successfully. Any workaround for this?
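The root failure in the log above is the IndexError in aniportrait/audio2ldmk.py: the line `audio_chunks[-2] = torch.cat((audio_chunks[-2], audio_chunks[-1]), dim=1)` assumes the input audio splits into at least two chunks, so a very short audio clip leaves the list with a single element, the landmark script exits non-zero, and the AssertionError in the Gradio app follows. A minimal guard sketch; the chunking details are assumptions inferred only from the traceback.

```python
import torch

def merge_tail_chunk(audio_chunks):
    """Hedged sketch: fold the (possibly short) last audio chunk into the
    previous one, but only if there are at least two chunks. With a single
    chunk (very short audio) the list is returned unchanged instead of
    raising IndexError as in the traceback above."""
    if len(audio_chunks) >= 2:
        audio_chunks[-2] = torch.cat((audio_chunks[-2], audio_chunks[-1]), dim=1)
        audio_chunks = audio_chunks[:-1]
    return audio_chunks
```

Using a slightly longer audio clip is likely the simplest workaround while the script assumes multiple chunks.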

Question about the mask in paper `Sparse Motion Vectors from Dense Optical Flow`

Hi, this work is great and I wish to try keypoints from a driving video! There is no corresponding inference script for keypoints from a driving video, so I am trying to write the inference code myself.

I ran into problems generating sparse motion vectors from a video's dense optical flow. It is unclear how to obtain the mask with the watershed sampling strategy for a given video. How many masks should we get if we have a 25-frame video?
If the total number of frames is 25, should the mask be computed frame by frame (for each optical flow), i.e. 24 masks, or do we only need the mask from the first frame?

Thanks.
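For what it's worth, below is a minimal sketch of extracting sparse motion vectors from dense flow given a single sampling mask computed on the first frame and reused for all 24 flow maps of a 25-frame clip. Whether one first-frame mask or per-frame masks is what the paper intends is exactly the open question here, so treat this single-mask variant as an assumption rather than the authors' confirmed design.

```python
import torch

def sparse_from_dense(dense_flow, mask):
    """Hedged sketch: keep dense flow only at sampled keypoint locations.

    dense_flow: [T-1, 2, H, W] flow maps (24 for a 25-frame clip).
    mask:       [H, W] boolean map of sampled points, e.g. from watershed
                sampling on the first frame (assumption: one mask reused for
                every flow map).
    Returns sparse flow of the same shape, zero everywhere off the mask.
    """
    sparse = torch.zeros_like(dense_flow)
    sparse[:, :, mask] = dense_flow[:, :, mask]
    return sparse
```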
