
musev's Introduction

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Zhiqiang Xia *, Zhaokang Chen*, Bin Wu, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou (*co-first author, Corresponding Author, [email protected])

Lyra Lab, Tencent Music Entertainment

github huggingface HuggingfaceSpace project Technical report (coming soon)

We have held the world-simulator vision since March 2023, believing that diffusion models can simulate the world. MuseV was a milestone we achieved around July 2023. Amazed by the progress of Sora, we decided to open-source MuseV, hoping it will benefit the community. Next we will move on to the promising diffusion+transformer scheme.

Update:

  1. We have released MuseTalk, a real-time, high-quality lip-sync model, which can be combined with MuseV as a complete virtual human generation solution.
  2. 🆕 We are thrilled to announce that MusePose has been released. MusePose is an image-to-video generation framework for virtual humans under control signals such as pose. Together with MuseV and MuseTalk, we hope the community can join us and march towards the vision of generating virtual humans end-to-end, with native full-body movement and interaction.

Overview

MuseV is a diffusion-based virtual human video generation framework, which

  1. supports infinite-length generation using a novel Visual Conditioned Parallel Denoising scheme.
  2. provides checkpoints for virtual human video generation, trained on a human dataset.
  3. supports Image2Video, Text2Image2Video, and Video2Video.
  4. is compatible with the Stable Diffusion ecosystem, including base_model, lora, controlnet, etc.
  5. supports multi-reference-image technology, including IPAdapter, ReferenceOnly, ReferenceNet, and IPAdapterFaceID.
  6. training code (coming very soon).

Important bug fixes

  1. musev_referencenet_pose: the model_name of unet and ip_adapter in the command was incorrect; please use musev_referencenet_pose instead of musev_referencenet.

News

  • [03/27/2024] Released the MuseV project and the trained models musev and musev_referencenet.
  • [03/30/2024] Added a Hugging Face Space Gradio demo to generate videos in a GUI.

Model

Overview of model structure

[figure: model_structure]

Parallel denoising

[figure: parallel_denoise]
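The parameters time_size, context_frames, and context_overlap referred to here are described in the inference section below. As a rough, hedged illustration of the windowing idea only (not MuseV's actual scheduler; the helper function below is hypothetical), a long frame sequence can be split into overlapping sub-windows that are denoised together while the vision condition frames stay fixed:

# Hedged sketch: split a long frame-index sequence into overlapping sub-windows
# (context_frames wide, sharing context_overlap frames) for parallel denoising.
# Default values mirror the inference defaults documented below.
def split_into_windows(time_size, context_frames=12, context_overlap=4):
    windows, start = [], 0
    step = context_frames - context_overlap
    while start < time_size:
        end = min(start + context_frames, time_size)
        windows.append(list(range(start, end)))
        if end == time_size:
            break
        start += step
    return windows

print(split_into_windows(time_size=24))
# -> three overlapping windows covering frames 0-11, 8-19, and 16-23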

Cases

All frames were generated directly by the text2video model, without any post-processing. More cases, including 1-2 minute videos, can be found on the project page.

The examples below can be accessed at configs/tasks/example.yaml.

Text/Image2Video

Human

video | prompt
yongen_c.mp4 | (masterpiece, best quality, highres:1),(1boy, solo:1),(eye blinks:1.8),(head wave:1.3)
seaside4.mp4 | (masterpiece, best quality, highres:1), peaceful beautiful sea scene
seaside_girl.mp4 | (masterpiece, best quality, highres:1), peaceful beautiful sea scene
boy_play_guitar.mp4 | (masterpiece, best quality, highres:1), playing guitar
girl_play_guitar2_c.mp4 | (masterpiece, best quality, highres:1), playing guitar
dufu.mp4 | (masterpiece, best quality, highres:1),(1man, solo:1),(eye blinks:1.8),(head wave:1.3),Chinese ink painting style
Mona_Lisa_c.mp4 | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3)

Scene

video | prompt
waterfall4_c.mp4 | (masterpiece, best quality, highres:1), peaceful beautiful waterfall, an endless waterfall
seaside2_c.mp4 | (masterpiece, best quality, highres:1), peaceful beautiful sea scene

VideoMiddle2Video

pose2video: in the duffy example, the pose of the vision condition frame is not aligned with the first frame of the control video. The posealign module will solve this problem.

video | prompt
video1_plus.mp4 | (masterpiece, best quality, highres:1), a girl is dancing, animation
pose2video_bear_with_audio.mp4 | (masterpiece, best quality, highres:1), is dancing, animation

MuseTalk

The character in the talk videos, Sun Xinying, is a supermodel KOL. You can follow her on Douyin.

name | video
talk | sun02.mp4
sing | default.mp4

TODO:

  • technical report (coming soon).
  • training code.
  • release a pretrained unet model trained with controlnet, referencenet, and IPAdapter, which performs better on pose2video.
  • support the diffusion transformer generation framework.
  • release the posealign module.

Quickstart

Prepare the Python environment and install extra packages such as diffusers, controlnet_aux, and mmcm.

Third party integration

Thanks to the third-party integrations, which make installation and use more convenient for everyone. Please note that we have not verified, maintained, or updated these third-party integrations; refer to the corresponding project for specific results.

netdisk: https://www.123pan.com/s/Pf5Yjv-Bb9W3.html (code: glut)

Prepare environment

We recommend using Docker to prepare the Python environment.

prepare python env

Attention: we have only tested with Docker; there may be problems with the conda or requirements setups, which we will try to fix. Please use Docker.

Method 1: docker

  1. pull docker image
docker pull anchorxia/musev:latest
  2. run docker
docker run --gpus all -it --entrypoint /bin/bash anchorxia/musev:latest

The default conda env is musev.

Method 2: conda

Create the conda environment from environment.yml:

conda env create --name musev --file ./environment.yml

Method 3: pip requirements

pip install -r requirements.txt

Prepare mmlab package

If you are not using Docker, you should additionally install the MMLab packages.

pip install --no-cache-dir -U openmim 
mim install mmengine 
mim install "mmcv>=2.0.1" 
mim install "mmdet>=3.1.0" 
mim install "mmpose>=1.1.0" 

Prepare custom package / modified package

clone

git clone --recursive https://github.com/TMElyralab/MuseV.git

prepare PYTHONPATH

current_dir=$(pwd)
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/MMCM
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/diffusers/src
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/controlnet_aux/src
cd MuseV
  1. MMCM: multimedia, cross-modal processing package.
  2. diffusers: modified diffusers package, based on diffusers.
  3. controlnet_aux: modified package, based on controlnet_aux.
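If exporting PYTHONPATH in the shell is inconvenient, an equivalent sketch is to extend sys.path at the top of your own entry script (this assumes MuseV was cloned with --recursive into the current working directory, as above):

# Alternative to the PYTHONPATH exports above (a sketch, not an official API).
import os
import sys

repo = os.path.join(os.getcwd(), "MuseV")
for sub in ("", "MMCM", "diffusers/src", "controlnet_aux/src"):
    sys.path.append(os.path.join(repo, sub))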

Download models

git clone https://huggingface.co/TMElyralab/MuseV ./checkpoints
  • motion: text2video models, trained on tiny UCF101 and tiny WebVid datasets, approximately 60K video-text pairs. GPU memory consumption tested at resolution $=512*512$, time_size=12.
    • musev/unet: contains and trains only the unet motion module. GPU memory consumption $\approx 8G$.
    • musev_referencenet: trains the unet motion module, referencenet, and IPAdapter. GPU memory consumption $\approx 12G$.
      • unet: motion module, whose to_k, to_v layers in Attention refer to IPAdapter.
      • referencenet: similar to AnimateAnyone.
      • ip_adapter_image_proj.bin: image clip embedding projection layer, referring to IPAdapter.
    • musev_referencenet_pose: based on musev_referencenet; freezes referencenet and controlnet_pose, trains the unet motion module and IPAdapter. GPU memory consumption $\approx 12G$.
  • t2i/sd1.5: text2image model, whose parameters are frozen when training the motion module. Different t2i base_models have a significant impact and could be replaced with other t2i bases.
  • IP-Adapter/models: download from IPAdapter.
    • image_encoder: vision clip model.
    • ip-adapter_sd15.bin: original IPAdapter model checkpoint.
    • ip-adapter-faceid_sd15.bin: original IPAdapterFaceID model checkpoint.
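A quick, optional way to confirm the download above landed where the example inference commands expect it; the sub-folder names below are inferred from the default configs and example commands, so treat this as a sketch rather than an authoritative list:

# Hedged check of the ./checkpoints layout assumed by the example commands.
from pathlib import Path

ckpt = Path("./checkpoints")
for sub in (
    "motion/musev",                     # unet-only motion module
    "motion/musev_referencenet",        # unet + referencenet + IPAdapter
    "motion/musev_referencenet_pose",   # pose-controlled variant
    "t2i/sd1.5",                        # frozen text2image base models
    "IP-Adapter/models/image_encoder",  # vision clip model used by IPAdapter
    "vae/sd-vae-ft-mse",                # vae used by the inference scripts
):
    print(sub, "ok" if (ckpt / sub).exists() else "MISSING")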

Inference

Prepare model_path

Skip this step when running an example task with the example inference command. Otherwise, set the model paths and abbreviations in the config files, so the abbreviations can be used in the inference scripts.

  • T2I SD: refer to musev/configs/model/T2I_all_model.py
  • Motion Unet: refer to musev/configs/model/motion_model.py
  • Task: refer to musev/configs/tasks/example.yaml
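For orientation, these config files map a model abbreviation to its path. An illustrative (not authoritative) T2I entry might look like the following, matching the fantasticmix_v10 path printed by the inference scripts; see the real musev/configs/model/T2I_all_model.py for the actual schema:

# Illustrative sketch only: one possible entry of the MODEL_CFG mapping used by
# the T2I config; the real file defines the authoritative structure.
import os

MODEL_CFG = {
    "fantasticmix_v10": {
        "sd": os.path.join(
            os.path.dirname(__file__), "../../checkpoints/t2i/sd1.5/fantasticmix_v10"
        ),
    },
}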

musev_referencenet

text2video

python scripts/inference/text2video.py   --sd_model_name majicmixRealv6Fp16   --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet   -test_data_path ./configs/tasks/example.yaml  --output_dir ./output  --n_batch 1  --target_datas yongen  --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder  --time_size 12 --fps 12  

common parameters:

  • test_data_path: path of the task file, in yaml format.
  • target_datas: separated by ,; samples the subtasks whose name in test_data_path is in target_datas.
  • sd_model_cfg_path: T2I sd model path, either a model config path or a model path.
  • sd_model_name: sd model name, used to choose the full model path in sd_model_cfg_path. Multiple model names are separated by ,, or use all.
  • unet_model_cfg_path: motion unet model config path or model path.
  • unet_model_name: unet model name, used to get the model path in unet_model_cfg_path and to init the unet class instance in musev/models/unet_loader.py. Multiple model names are separated by ,, or use all. If unet_model_cfg_path is a model path, unet_name must be supported in musev/models/unet_loader.py.
  • time_size: number of frames per diffusion denoising pass. Default=12.
  • n_batch: number of generated shots, $total_frames = n_batch * time_size + n_viscond$, default=1.
  • context_frames: number of context frames. If time_size > context_frames, the time_size window is split into many sub-windows for parallel denoising. Default=12.

To generate long videos, there are two ways:

  1. visual conditioned parallel denoising: set n_batch=1 and time_size = all the frames you want (see the worked example below).
  2. traditional end-to-end: set time_size = context_frames = frames of a shot (12) and context_overlap = 0.
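As a small worked example of the frame-count arithmetic above (illustrative values only; n_viscond is the number of vision condition frames, 1 by default):

# Worked example of total_frames = n_batch * time_size + n_viscond for the two
# long-video strategies described above (values are illustrative).
def total_frames(n_batch, time_size, n_viscond=1):
    return n_batch * time_size + n_viscond

# 1. visual conditioned parallel denoising: one batch with a large time_size
print(total_frames(n_batch=1, time_size=96))  # 97 frames in a single pass
# 2. traditional end-to-end: several short shots of time_size = context_frames
print(total_frames(n_batch=8, time_size=12))  # 97 frames across 8 shots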

Model parameters: supports referencenet, IPAdapter, IPAdapterFaceID, and Facein.

  • referencenet_model_name: referencenet model name.
  • ImageClipVisionFeatureExtractor: name of the ImageEmbExtractor, which extracts the vision clip embedding used in IPAdapter.
  • vision_clip_model_path: ImageClipVisionFeatureExtractor model path.
  • ip_adapter_model_name: from IPAdapter; it is the ImagePromptEmbProj, used together with the ImageEmbExtractor.
  • ip_adapter_face_model_name: IPAdapterFaceID, from IPAdapter, used to keep the face ID; face_image_path should be set.

Some parameters that affect the motion range and generation results

  • video_guidance_scale: similar to text2image, controls the influence between cond and uncond. Default=3.5.
  • use_condition_image: whether to use the given first frame for video generation; if not, vision condition frames are generated first. Default=True.
  • redraw_condition_image: whether to redraw the given first-frame image.
  • video_negative_prompt: abbreviation of the full negative_prompt in the config path. Default=V2.

video2video

The t2i base_model has a significant impact. In this case, fantasticmix_v10 performs better than majicmixRealv6Fp16.

python scripts/inference/video2video.py --sd_model_name fantasticmix_v10  --unet_model_name musev_referencenet --referencenet_model_name   musev_referencenet --ip_adapter_model_name musev_referencenet    -test_data_path ./configs/tasks/example.yaml    --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder      --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas dance1 --fps 12 --time_size 12

Important parameters

Most of the parameters are the same as in musev_text2video. Parameters specific to video2video are:

  1. video_path needs to be set as the reference video in test_data. The reference video currently supports rgb video and controlnet_middle_video.
  • which2video: whether the rgb video influences the initial noise; the influence of rgb is stronger than that of the controlnet condition.
  • controlnet_name: whether to use a controlnet condition, such as dwpose, depth.
  • video_is_middle: whether video_path is an rgb video or a controlnet_middle_video. Can be set for every test_data in test_data_path.
  • video_has_condition: whether condition_images is aligned with the first frame of video_path. If not, the condition of condition_images is extracted first and then aligned with the video by concatenation. Set in test_data. A sample test_data entry is sketched below.
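For reference, the per-task fields mentioned above live in configs/tasks/example.yaml; the duffy entry quoted later in the issues looks roughly like the following, shown here as a Python dict purely for illustration (the real file is YAML):

# Illustration only: the fields of one test_data entry, mirrored as a dict.
duffy_task = {
    "name": "duffy",
    "prompt": "(best quality), ((masterpiece)), (highres), illustration, original, extremely detailed wallpaper",
    "video_path": "./data/source_video/pose-for-Duffy-4.mp4",
    "condition_images": "./data/images/duffy.png",
    "refer_image": "./data/images/duffy.png",      # same as condition_images
    "ipadapter_image": "./data/images/duffy.png",  # same as condition_images
    "height": 1280,
    "width": 704,
    "img_length_ratio": 1.0,
    # True means video_path is a controlnet condition, not a natural rgb video
    "video_is_middle": True,
}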

All controlnet_names refer to mmcm:

['pose', 'pose_body', 'pose_hand', 'pose_face', 'pose_hand_body', 'pose_hand_face', 'dwpose', 'dwpose_face', 'dwpose_hand', 'dwpose_body', 'dwpose_body_hand', 'canny', 'tile', 'hed', 'hed_scribble', 'depth', 'pidi', 'normal_bae', 'lineart', 'lineart_anime', 'zoe', 'sam', 'mobile_sam', 'leres', 'content', 'face_detector']

musev_referencenet_pose

Only used for pose2video. Trained based on musev_referencenet: referencenet, pose-controlnet, and T2I are frozen, while the motion module and IPAdapter are trained.

The t2i base_model has a significant impact. In this case, fantasticmix_v10 performs better than majicmixRealv6Fp16.

python scripts/inference/video2video.py --sd_model_name fantasticmix_v10  --unet_model_name musev_referencenet_pose --referencenet_model_name   musev_referencenet --ip_adapter_model_name musev_referencenet_pose    -test_data_path ./configs/tasks/example.yaml    --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder      --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas  dance1   --fps 12 --time_size 12

musev

Only has the motion module, no referencenet, so it requires less GPU memory.

text2video

python scripts/inference/text2video.py   --sd_model_name majicmixRealv6Fp16   --unet_model_name musev   -test_data_path ./configs/tasks/example.yaml  --output_dir ./output  --n_batch 1  --target_datas yongen  --time_size 12 --fps 12

video2video

python scripts/inference/video2video.py --sd_model_name fantasticmix_v10  --unet_model_name musev    -test_data_path ./configs/tasks/example.yaml --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas  dance1   --fps 12 --time_size 12

Gradio demo

MuseV provides a Gradio script to launch a GUI on a local machine for generating videos conveniently.

cd scripts/gradio
python app.py

Acknowledgements

  1. MuseV refers heavily to TuneAVideo, diffusers, Moore-AnimateAnyone, animatediff, IP-Adapter, AnimateAnyone, VideoFusion, and insightface.
  2. MuseV was built on the UCF101 and WebVid datasets.

Thanks for open-sourcing!

Limitation

There are still many limitations, including

  1. Lack of generalization ability. Some visual condition images perform well, while others perform badly. Some t2i pretrained models perform well, while others perform badly.
  2. Limited types of video generation and limited motion range, partly because of the limited types of training data. The released MuseV was trained on approximately 60K human text-video pairs at resolution 512*320. MuseV has a greater motion range but lower video quality at lower resolutions, and tends to generate a smaller motion range with higher video quality. Training on a larger, higher-resolution, higher-quality text-video dataset may make MuseV better.
  3. Watermarks may appear because of WebVid. A cleaner dataset without watermarks may solve this issue.
  4. Limited types of long-video generation. Visual Conditioned Parallel Denoising can solve the accumulated error of video generation, but the current method is only suitable for relatively fixed camera scenes.
  5. Undertrained referencenet and IP-Adapter, because of limited time and resources.
  6. Understructured code. MuseV supports rich and dynamic features, but the code is complex and unrefactored; it takes time to become familiar with it.

Citation

@article{musev,
  title={MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising},
  author={Xia, Zhiqiang and Chen, Zhaokang and Wu, Bin and Li, Chao and Hung, Kwok-Wai and Zhan, Chao and He, Yingjie and Zhou, Wenjiang},
  journal={arxiv},
  year={2024}
}

Disclaimer/License

  1. code: The code of MuseV is released under the MIT License. There is no limitation for both academic and commercial usage.
  2. model: The trained models are available for non-commercial research purposes only.
  3. other opensource model: Other open-source models used must comply with their license, such as insightface, IP-Adapter, ft-mse-vae, etc.
  4. The test data are collected from the internet and are available for non-commercial research purposes only.
  5. AIGC: This project strives to impact the domain of AI-driven video generation positively. Users are granted the freedom to create videos using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

musev's People

Contributors

asdf2kr, czk32611, eltociear, honestqiao, itechmusic, mahone3297, phighting, xzqjack, zhanchao019, zhoupb01


musev's Issues

Error when running the example

Traceback (most recent call last):
File "/workspace/MuseV/scripts/inference/text2video.py", line 16, in
from diffusers.models.autoencoder_kl import AutoencoderKL
ModuleNotFoundError: No module named 'diffusers.models.autoencoder_kl'

The environment was pulled with Docker.

training memory usage

Could you please provide some information about gpus you use for training and the duration of your training stages? As you kindly plan to open source your training code soon, I'm wondering whether it is possible to train it on my devices!

How to control the blink frequency

How can the prompt control the blink frequency? With eye blinks:1.8 the character currently blinks too often, which looks unnatural.

error when video2video

2024-04-14 17:41:48,897- root:180- ERROR- Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-MuseV\nodes.py", line 2255, in run
) = sd_predictor.run_pipe_video2video(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\ComfyUI/custom_nodes/ComfyUI-MuseV\musev\pipelines\pipeline_controlnet_predictor.py", line 1237, in run_pipe_video2video
out_videos = np.concatenate(out_videos, axis=2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: need at least one array to concatenate

"pose2video" Long video generation results are incorrect.

python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet_pose --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet_pose -test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dacne1 --fps 12 --time_size 96

vm2v_m.majicmixRealv6Fp16_rm.musev_referencenet_c.dacne1_w.512_h.960_t.12_n.8_vn.10_w.0.001_w.0.5_s.22855934_n.dwpose_body_hand_s.0.8_g.7.5_vs.1.0_vg.3.5_p.e1d72_V2_r.spa_ip.spa_f.no.mp4

Which built-in variables are supported in the prompt, and where is the documentation?

mmajicmixRealv6Fp16_rmno_caseclv0cr7um0000q6pnythq75xm_w1024_h1792_t12_nb1_s26305314_p9b52a_w0.001_ms8.0_s0.8_g3.5_c-iclv0c_r-cFalse_w0.5_V2_rno_ipno_fno.mp4

prompt:(masterpiece, best quality, highres:1)(eye blinks:1.8)1monkey,applauding

The generated video has no eye blinks or applause.

Help: where can I find documentation for prompt variables like eye blinks and head wave?

Starting app.py reports that a path cannot be found

How can I solve this problem? Any help appreciated.

Traceback (most recent call last):
File "G:\AI\ZHB\MuseV\scripts\gradio\app.py", line 32, in
from gradio_video2video import online_v2v_inference
File "G:\AI\ZHB\MuseV\scripts\gradio\gradio_video2video.py", line 302, in
sd_model_params_dict_src = load_pyhon_obj(sd_model_cfg_path, "MODEL_CFG")
File "G:\AI\ZHB\MuseV\MMCM\mmcm\utils\load_util.py", line 33, in load_pyhon_obj
spec.loader.exec_module(module)
File "", line 879, in exec_module
File "", line 1016, in get_code
File "", line 1073, in get_data
FileNotFoundError: [Errno 2] No such file or directory: 'G:\AI\ZHB\MuseV\../../configs/model/T2I_all_model.py'

10GB of GPU memory is still not enough to run MuseV

10GB of GPU memory is still not enough to run MuseV... The requirements are too high... Could the GPU memory usage of the code be optimized?
torch.cuda.OutOfMemoryError
Also, could the hardware requirements be clearly described at the beginning of the documentation? I spent a lot of time installing and running it, only to be told I did not have enough GPU memory.
If I had known earlier that the GPU memory requirements were this high, I would not have installed it.

can't run gradio app.py

When I run app.py, an error appears as below:
Traceback (most recent call last):
File "/home/qm/MuseV/scripts/gradio/app.py", line 6, in
from gradio_videocreation_text2video import online_t2v_inference
File "/home/qm/MuseV/scripts/gradio/gradio_videocreation_text2video.py", line 20, in
from mmcm.utils.load_util import load_pyhon_obj
File "/home/qm/agui/miniconda/envs/musev/lib/python3.11/site-packages/mmcm/init.py", line 5, in
from .vision import *
File "/home/qm/agui/miniconda/envs/musev/lib/python3.11/site-packages/mmcm/vision/init.py", line 8, in
from .transition.TransNetV2.transnetv2_predictor import TransNetV2Predictor
ModuleNotFoundError: No module named 'mmcm.vision.transition.TransNetV2'

I've installed:
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"

and set up pythonpath env:
current_dir=$(pwd)
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/MMCM
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/diffusers/src
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/controlnet_aux/src

I don't know what else I should do to solve this problem.

pose2map

cannot import name pose2map from controlnet_aux.dwpose

from_tf=true issue

I set up the environment by pulling the Docker image as described on GitHub, but a from_tf error appears when loading the pytorch_model.bin of t2i/sd1.5/majicmixRealv6Fp16/text_encoder. Has anyone else run into this problem?

Apple M2 compatibility

After lots of research I ended up with the following error while trying to run Gradio:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Is an NVIDIA GPU a must to run this code locally?

ModuleNotFoundError: No module named 'mmcm.vision.human.face_cluster_by_infomap'

(musev) root@ubuntu096058:/data/heming/MuseV/scripts/gradio# python app.py
2024-03-28 03:37:50.401133: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-28 03:37:50.940254: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/data/heming/MuseV/scripts/gradio/app.py", line 6, in
from gradio_videocreation_text2video import online_t2v_inference
File "/data/heming/MuseV/scripts/gradio/gradio_videocreation_text2video.py", line 20, in
from mmcm.utils.load_util import load_pyhon_obj
File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcm/init.py", line 5, in
from .vision import *
File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcm/vision/init.py", line 1, in
from .human import (
File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcm/vision/human/init.py", line 1, in
from .face_cluster_by_infomap.face_cluster_by_infomap import FaceClusterByInfomap
ModuleNotFoundError: No module named 'mmcm.vision.human.face_cluster_by_infomap'

(musev) root@ubuntu096058:/data/heming/MuseV/scripts/gradio# python gradio_videocreation_video2video.py
2024-03-28 03:33:27.103658: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-28 03:33:27.650329: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/data/heming/MuseV/scripts/gradio/gradio_videocreation_video2video.py", line 19, in
from mmcm.utils.load_util import load_pyhon_obj
File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcm/init.py", line 5, in
from .vision import *
File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcm/vision/init.py", line 1, in
from .human import (
File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcm/vision/human/init.py", line 1, in
from .face_cluster_by_infomap.face_cluster_by_infomap import FaceClusterByInfomap
ModuleNotFoundError: No module named 'mmcm.vision.human.face_cluster_by_infomap'

(musev) root@ubuntu096058:/data/heming/MuseV# pip list|grep mmcm
mmcm 1.0.0

Generation succeeded; how can the watermark problem be solved?

Hello, this is really an excellent open source project!
Is there any way to prevent watermarks from the original training set from appearing in the generated video?
My original input image was 1024 * 1024, but I set the output to 512 * 512. Is this possibly related?
MuseV__00005

MuseV__00005.mp4

How to reproduce the result of pose2video?

Hello. Thank you for your great work.
I can't reproduce the result when using the pose2video model. Can you provide any advice to help me resolve this issue?

command
python scripts/inference/video2video.py --sd_model_name fantasticmix_v10 --unet_model_name musev_referencenet_pose --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet_pose -test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir /projects/MuseV/output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas duffy --fps 12 --time_size 96

configs/tasks/example.yaml
- name: "duffy"
prompt: "(best quality), ((masterpiece)), (highres), illustration, original, extremely detailed wallpaper"
video_path: ./data/source_video/pose-for-Duffy-4.mp4
condition_images: ./data/images/duffy.png
refer_image: ${.condition_images}
ipadapter_image: ${.condition_images}
height: 1280
width: 704
img_length_ratio: 1.0
video_is_middle: True # if true, means video_path is controlnet condition, not natural rgb video

output
https://github.com/TMElyralab/MuseV/assets/34409364/149fec6e-62cc-4865-bd99-5ee5caadbf52

text2video generation result is incorrect

Hello, thanks for open-sourcing.
When running text2video, the generated video contains multiple people instead of the given person moving. What is the reason for this, and how should it be fixed?
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet -test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --target_datas yongen --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --time_size 12 --fps 12
Result:

m.majicmixRealv6Fp16_rm.musev_referencenet_case.yongen_w.704_h.1216_t.12_nb.1_s.22451218_p.632d2_w.0.001_ms.8.0_s.0.8_g.3.5_c-i.yonge_r-c.False_w.0.5_V2_r.yon_ip.yon_f.no.mp4

After video2video finishes, the Gradio frontend cannot retrieve the generated video

System environment: Win11
Problem description: when running video2video through Gradio, the generated video can be found under the results directory.

out_videos.shape (1, 3, 30, 1152, 832)
Save to ./results/vm2v_m=majicmixRealv6Fp16_rm=musev_referencenet_c=clutf0ynv000254vswb8l2rf5_w=832_h=1152_t=30_n=1_vn=10_w=0.001_w=0.5_s=7133457_n=dwpose_body_hand_s=0.8_g=7.5_vs=1.0_vg=3.5_p=84d96_V2_r=clu_ip=clu_f=no.mp4

Traceback (most recent call last):
File "D:\condax\musev\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "D:\condax\musev\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "D:\condax\musev\lib\site-packages\gradio\blocks.py", line 1440, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "D:\condax\musev\lib\site-packages\gradio\blocks.py", line 1341, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "D:\condax\musev\lib\site-packages\gradio\components\video.py", line 281, in postprocess
processed_files = (self._format_video(y), None)
File "D:\condax\musev\lib\site-packages\gradio\components\video.py", line 355, in _format_video
video = self.make_temp_copy_if_needed(video)
File "D:\condax\musev\lib\site-packages\gradio\components\base.py", line 234, in make_temp_copy_if_needed
shutil.copy2(file_path, full_temp_file_path)
File "D:\condax\musev\lib\shutil.py", line 434, in copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "D:\condax\musev\lib\shutil.py", line 256, in copyfile
with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\moviex\AppData\Local\Temp\gradio\4c8e4ba69b6c12f49c4755c1290e941cafe1d67e\vm2v_mmajicmixRealv6Fp16_rmmusev_referencenet_cclutf0ynv000254vswb8l2rf5_w832_h1152_t30_n1_vn10_w0.001_w0.5_s7133457_ndwpose_body_hand_s0.8_g7.5_vs1.0_vg3.5_p84d96_V2_rclu_ipclu_fno.mp4'

Docker run error

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container:

Report error

out_videos.shape (1, 3, 12, 448, 256)
G:\BaiduNetdiskDownload\MuseV-240404\MuseV\scripts\gradio\results\vm2v_m=majicmixRealv6Fp16_rm=musev_referencenet_c=cluslkee5000148un6078qfr1_w=256_h=448_t=12_n=1_vn=10_w=0.001_w=0.5_s=2918172_n=dwpose_body_hand_s=0.8_g=7.5_vs=1.0_vg=3.5_p=1ec06_V2_r=clu_ip=clu_f=no.mp4: No such file or directory
Traceback (most recent call last):
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio_ffmpeg_io.py", line 630, in write_frames
p.stdin.write(bb)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\blocks.py", line 1437, in process_api
result = await self.call_function(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\anyio_backends_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\anyio_backends_asyncio.py", line 851, in run
result = context.run(func, *args)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\utils.py", line 641, in wrapper
response = f(*args, **kwargs)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\utils.py", line 641, in wrapper
response = f(*args, **kwargs)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV\scripts\gradio\gradio_video2video.py", line 1013, in online_v2v_inference
save_videos_grid_with_opencv(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV\musev\utils\util.py", line 230, in save_videos_grid_with_opencv
imageio.mimsave(path, outputs, **params)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\v2.py", line 495, in mimwrite
return file.write(ims, is_batch=True, **kwargs)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\core\legacy_plugin_wrapper.py", line 253, in write
writer.append_data(image, metadata)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\core\format.py", line 590, in append_data
return self._append_data(im, total_meta)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\plugins\ffmpeg.py", line 600, in _append_data
self._write_gen.send(im)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio_ffmpeg_io.py", line 637, in write_frames
raise IOError(msg)
OSError: [Errno 32] Broken pipe

FFMPEG COMMAND:
G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio_ffmpeg\binaries\ffmpeg-win64-v4.2.2.exe -y -f rawvideo -vcodec rawvideo -s 256x448 -pix_fmt rgb24 -r 6.00 -i - -an -vcodec libx264 -pix_fmt yuv420p -crf 5 -v warning G:\BaiduNetdiskDownload\MuseV-240404\MuseV\scripts\gradio\results\vm2v_m=majicmixRealv6Fp16_rm=musev_referencenet_c=cluslkee5000148un6078qfr1_w=256_h=448_t=12_n=1_vn=10_w=0.001_w=0.5_s=2918172_n=dwpose_body_hand_s=0.8_g=7.5_vs=1.0_vg=3.5_p=1ec06_V2_r=clu_ip=clu_f=no.mp4

FFMPEG STDERR OUTPUT:

i_test_data {'name': 'cluslpl9j000248unfxzvkihc', 'prompt': '(masterpiece, best quality, highres:1)', 'video_path': 'C:\Users\Mocopi\AppData\Local\Temp\gradio\6353fca934d3b8c613a99a607046f336f601fddf\111.mp4', 'condition_images': './t2v_input_image\cluslpl9j000248unfxzvkihc.jpg', 'refer_image': './t2v_input_image\cluslpl9j000248unfxzvkihc.jpg', 'ipadapter_image': './t2v_input_image\cluslpl9j000248unfxzvkihc.jpg', 'height': -1.0, 'width': -1.0, 'img_length_ratio': 1.0} majicmixRealv6Fp16
{'condition_images': './t2v_input_image\cluslpl9j000248unfxzvkihc.jpg',
'height': -1.0,
'img_length_ratio': 1.0,
'ipadapter_image': './t2v_input_image\cluslpl9j000248unfxzvkihc.jpg',
'name': 'cluslpl9j000248unfxzvkihc',
'prompt': '(masterpiece, best quality, highres:1)',
'prompt_hash': '1ec06',
'refer_image': './t2v_input_image\cluslpl9j000248unfxzvkihc.jpg',
'video_path': 'C:\Users\Mocopi\AppData\Local\Temp\gradio\6353fca934d3b8c613a99a607046f336f601fddf\111.mp4',
'width': -1.0}
test_data_height=448
test_data_width=256
2024-04-10 00:33:56,352- py.warnings:109- WARNING- G:\BaiduNetdiskDownload\MuseV-240404\MuseV\diffusers\src\diffusers\configuration_utils.py:135: FutureWarning: Accessing config attribute vae_scale_factor directly via 'VaeImageProcessor' object attribute is deprecated. Please access 'vae_scale_factor' over 'VaeImageProcessor's config object instead, e.g. 'scheduler.config.vae_scale_factor'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)

2024-04-10 00:33:56,689- py.warnings:109- WARNING- G:\BaiduNetdiskDownload\MuseV-240404\MuseV\musev\models\unet_3d_condition.py:910: FutureWarning: Accessing config attribute temporal_transformer directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'temporal_transformer' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.temporal_transformer'.
if self.temporal_transformer is not None:

2024-04-10 00:33:56,689- py.warnings:109- WARNING- G:\BaiduNetdiskDownload\MuseV-240404\MuseV\musev\models\unet_3d_condition.py:931: FutureWarning: Accessing config attribute temporal_transformer directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'temporal_transformer' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.temporal_transformer'.
if self.temporal_transformer is not None:

2024-04-10 00:35:39,240- py.warnings:109- WARNING- G:\BaiduNetdiskDownload\MuseV-240404\MuseV\diffusers\src\diffusers\pipelines\controlnet\pipeline_controlnet_img2img.py:485: FutureWarning: The decode_latents method is deprecated and will be removed in 1.0.0. Please use VaeImageProcessor.postprocess(...) instead
deprecate("decode_latents", "1.0.0", deprecation_message, standard_warn=False)

out_videos.shape (1, 3, 12, 448, 256)
G:\BaiduNetdiskDownload\MuseV-240404\MuseV\scripts\gradio\results\vm2v_m=majicmixRealv6Fp16_rm=musev_referencenet_c=cluslpl9j000248unfxzvkihc_w=256_h=448_t=12_n=1_vn=10_w=0.001_w=0.5_s=13436546_n=dwpose_body_hand_s=0.8_g=7.5_vs=1.0_vg=3.5_p=1ec06_V2_r=clu_ip=clu_f=no.mp4: No such file or directory
Traceback (most recent call last):
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio_ffmpeg_io.py", line 630, in write_frames
p.stdin.write(bb)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\blocks.py", line 1437, in process_api
result = await self.call_function(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\anyio_backends_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\anyio_backends_asyncio.py", line 851, in run
result = context.run(func, *args)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\utils.py", line 641, in wrapper
response = f(*args, **kwargs)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\gradio\utils.py", line 641, in wrapper
response = f(*args, **kwargs)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV\scripts\gradio\gradio_video2video.py", line 1013, in online_v2v_inference
save_videos_grid_with_opencv(
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV\musev\utils\util.py", line 230, in save_videos_grid_with_opencv
imageio.mimsave(path, outputs, **params)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\v2.py", line 495, in mimwrite
return file.write(ims, is_batch=True, **kwargs)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\core\legacy_plugin_wrapper.py", line 253, in write
writer.append_data(image, metadata)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\core\format.py", line 590, in append_data
return self._append_data(im, total_meta)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio\plugins\ffmpeg.py", line 600, in _append_data
self._write_gen.send(im)
File "G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio_ffmpeg_io.py", line 637, in write_frames
raise IOError(msg)
OSError: [Errno 32] Broken pipe

FFMPEG COMMAND:
G:\BaiduNetdiskDownload\MuseV-240404\MuseV.glut\lib\site-packages\imageio_ffmpeg\binaries\ffmpeg-win64-v4.2.2.exe -y -f rawvideo -vcodec rawvideo -s 256x448 -pix_fmt rgb24 -r 6.00 -i - -an -vcodec libx264 -pix_fmt yuv420p -crf 5 -v warning G:\BaiduNetdiskDownload\MuseV-240404\MuseV\scripts\gradio\results\vm2v_m=majicmixRealv6Fp16_rm=musev_referencenet_c=cluslpl9j000248unfxzvkihc_w=256_h=448_t=12_n=1_vn=10_w=0.001_w=0.5_s=13436546_n=dwpose_body_hand_s=0.8_g=7.5_vs=1.0_vg=3.5_p=1ec06_V2_r=clu_ip=clu_f=no.mp4

FFMPEG STDERR OUTPUT:

Cuda Deprecation Notice

I get this notice when I try to use Docker.

2024-04-02 09:51:12
2024-04-02 09:51:12 ==========
2024-04-02 09:51:12 == CUDA ==
2024-04-02 09:51:12 ==========
2024-04-02 09:51:12
2024-04-02 09:51:12 CUDA Version 11.7.0
2024-04-02 09:51:12
2024-04-02 09:51:12 Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-04-02 09:51:12
2024-04-02 09:51:12 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2024-04-02 09:51:12 By pulling and using the container, you accept the terms and conditions of this license:
2024-04-02 09:51:12 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2024-04-02 09:51:12
2024-04-02 09:51:12 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2024-04-02 09:51:12
2024-04-02 09:51:12 WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
2024-04-02 09:51:12 Use the NVIDIA Container Toolkit to start this container with GPU support; see
2024-04-02 09:51:12 https://docs.nvidia.com/datacenter/cloud-native/ .
2024-04-02 09:51:12
2024-04-02 09:51:12 *************************
2024-04-02 09:51:12 ** DEPRECATION NOTICE! **
2024-04-02 09:51:12 *************************
2024-04-02 09:51:12 THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
2024-04-02 09:51:12 https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

mmcm issues despite being installed properly (running through docker)

Hi! First I just want to say, thank you for making this opensource and doing such an amazing job with this.

I'm having no problems with text2video, but I'm having problems with video2video. None of the video2video examples in the GitHub readme work.

I'm running the example: python scripts/inference/video2video.py --sd_model_name fantasticmix_v10 --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet -test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dance1 --fps 12 --time_size 12

Here's the full error:

(musev) root@10703020883e:/workspace/MuseV# python scripts/inference/video2video.py --sd_model_name fantasticmix_v10  --unet_model_name musev_referencenet --referencenet_model_name   musev_referencenet --ip_adapter_model_name musev_referencenet    -test_data_path ./configs/tasks/example.yaml    --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder      --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas dance1 --fps 12 --time_size 12
/opt/conda/envs/musev/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-04-05 20:19:43.562604: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-05 20:19:43.584943: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-05 20:19:44.056965: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/opt/conda/envs/musev/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/workspace/MuseV/diffusers/src/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
args
{'add_static_video_prompt': False,
 'context_batch_size': 1,
 'context_frames': 12,
 'context_overlap': 4,
 'context_schedule': 'uniform_v2',
 'context_stride': 1,
 'controlnet_conditioning_scale': 1.0,
 'controlnet_name': 'dwpose_body_hand',
 'cross_attention_dim': 768,
 'enable_zero_snr': False,
 'end_to_end': True,
 'face_image_path': None,
 'facein_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/facein.py',
 'facein_model_name': None,
 'facein_scale': 1.0,
 'fix_condition_images': False,
 'fixed_ip_adapter_image': True,
 'fixed_refer_face_image': True,
 'fixed_refer_image': True,
 'fps': 12,
 'guidance_scale': 7.5,
 'height': None,
 'img_length_ratio': 1.0,
 'img_weight': 0.001,
 'interpolation_factor': 1,
 'ip_adapter_face_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/ip_adapter.py',
 'ip_adapter_face_model_name': None,
 'ip_adapter_face_scale': 1.0,
 'ip_adapter_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/ip_adapter.py',
 'ip_adapter_model_name': 'musev_referencenet',
 'ip_adapter_scale': 1.0,
 'ipadapter_image_path': None,
 'lcm_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/lcm_model.py',
 'lcm_model_name': None,
 'log_level': 'INFO',
 'motion_speed': 8.0,
 'n_batch': 1,
 'n_cols': 3,
 'n_repeat': 1,
 'n_vision_condition': 1,
 'need_hist_match': False,
 'need_img_based_video_noise': True,
 'need_return_condition': False,
 'need_return_videos': False,
 'need_video2video': False,
 'negative_prompt': 'V2',
 'negprompt_cfg_path': '/workspace/MuseV/scripts/inference/../../configs/model/negative_prompt.py',
 'noise_type': 'video_fusion',
 'num_inference_steps': 30,
 'output_dir': './output',
 'overwrite': False,
 'pose_guider_model_path': None,
 'prompt_only_use_image_prompt': False,
 'record_mid_video_latents': False,
 'record_mid_video_noises': False,
 'redraw_condition_image': False,
 'redraw_condition_image_with_facein': True,
 'redraw_condition_image_with_ip_adapter_face': True,
 'redraw_condition_image_with_ipdapter': True,
 'redraw_condition_image_with_referencenet': True,
 'referencenet_image_path': None,
 'referencenet_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/referencenet.py',
 'referencenet_model_name': 'musev_referencenet',
 'sample_rate': 1,
 'save_filetype': 'mp4',
 'save_images': False,
 'sd_model_cfg_path': '/workspace/MuseV/scripts/inference/../../configs/model/T2I_all_model.py',
 'sd_model_name': 'fantasticmix_v10',
 'seed': None,
 'strength': 0.8,
 'target_datas': 'dance1',
 'test_data_path': './configs/tasks/example.yaml',
 'time_size': 12,
 'unet_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/motion_model.py',
 'unet_model_name': 'musev_referencenet',
 'use_condition_image': True,
 'vae_model_path': './checkpoints/vae/sd-vae-ft-mse',
 'video_guidance_scale': 3.5,
 'video_guidance_scale_end': None,
 'video_guidance_scale_method': 'linear',
 'video_has_condition': True,
 'video_is_middle': False,
 'video_negative_prompt': 'V2',
 'video_num_inference_steps': 10,
 'video_overlap': 1,
 'video_strength': 1.0,
 'vision_clip_extractor_class_name': 'ImageClipVisionFeatureExtractor',
 'vision_clip_model_path': './checkpoints/IP-Adapter/models/image_encoder',
 'w_ind_noise': 0.5,
 'which2video': 'video_middle',
 'width': None,
 'write_info': False}


running model, T2I SD
{'fantasticmix_v10': {'sd': '/workspace/MuseV/configs/model/../../checkpoints/t2i/sd1.5/fantasticmix_v10'}}
lcm:  None None
unet_model_params_dict_src dict_keys(['musev', 'musev_referencenet', 'musev_referencenet_pose'])
unet:  musev_referencenet /workspace/MuseV/configs/model/../../checkpoints/motion/musev_referencenet
referencenet_model_params_dict_src dict_keys(['musev_referencenet'])
referencenet:  musev_referencenet /workspace/MuseV/configs/model/../../checkpoints/motion/musev_referencenet
ip_adapter_model_params_dict_src dict_keys(['IPAdapter', 'IPAdapterPlus', 'IPAdapterPlus-face', 'IPAdapterFaceID', 'musev_referencenet', 'musev_referencenet_pose'])
ip_adapter:  musev_referencenet {'ip_image_encoder': '/workspace/MuseV/configs/model/../../checkpoints/IP-Adapter/image_encoder', 'ip_ckpt': '/workspace/MuseV/configs/model/../../checkpoints/motion/musev_referencenet/ip_adapter_image_proj.bin', 'ip_scale': 1.0, 'clip_extra_context_tokens': 4, 'clip_embeddings_dim': 1024, 'desp': ''}
facein:  None None
ip_adapter_face:  None None
video_negprompt V2 badhandv4, ng_deepnegative_v1_75t, (((multiple heads))), (((bad body))), (((two people))), ((extra arms)), ((deformed body)), (((sexy))), paintings,(((two heads))), ((big head)),sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, (((nsfw))), nipples, extra fingers, (extra legs), (long neck), mutated hands, (fused fingers), (too many fingers)
negprompt V2 badhandv4, ng_deepnegative_v1_75t, (((multiple heads))), (((bad body))), (((two people))), ((extra arms)), ((deformed body)), (((sexy))), paintings,(((two heads))), ((big head)),sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, (((nsfw))), nipples, extra fingers, (extra legs), (long neck), mutated hands, (fused fingers), (too many fingers)
n_test_datas 1
2024-04-05 20:19:51,813- musev:997- INFO- vision_clip_extractor, name=ImageClipVisionFeatureExtractor, path=./checkpoints/IP-Adapter/models/image_encoder
test_model_vae_model_path ./checkpoints/vae/sd-vae-ft-mse
Traceback (most recent call last):
  File "/workspace/MuseV/scripts/inference/video2video.py", line 1102, in <module>
    sd_predictor = DiffusersPipelinePredictor(
  File "/workspace/MuseV/musev/pipelines/pipeline_controlnet_predictor.py", line 165, in __init__
    controlnet, controlnet_processor, processor_params = load_controlnet_model(
  File "/workspace/MuseV/MMCM/mmcm/vision/feature_extractor/controlnet.py", line 856, in load_controlnet_model
    controlnet_processor = ControlnetProcessor(
  File "/workspace/MuseV/MMCM/mmcm/vision/feature_extractor/controlnet.py", line 71, in __init__
    self.processor = processor_cls()
  File "/workspace/MuseV/controlnet_aux/src/controlnet_aux/dwpose/__init__.py", line 141, in __init__
    self.pose_estimation = Wholebody(
  File "/workspace/MuseV/controlnet_aux/src/controlnet_aux/dwpose/wholebody.py", line 53, in __init__
    self.detector = init_detector(det_config, det_ckpt, device=device)
NameError: name 'init_detector' is not defined

The error tells me there's no variable called "init_detector". Looking into the code, it is defined here:
/workspace/MuseV/controlnet_aux/src/controlnet_aux/dwpose/wholebody.py

This line (the import sits inside a try/except block):

try:
    from mmdet.apis import inference_detector, init_detector
except ImportError:
    warnings.warn(
        "The module 'mmdet' is not installed. The package will have limited functionality. Please install it using the command: mim install 'mmdet>=3.1.0'"
    )

Interestingly, I don't get a warning that "mmdet" is not installed, as you'd expect. But the script believes that init_detector is not defined. So, I took the line from mmdet.apis import inference_detector, init_detector out of the try statement to see what would happen.

I still get an error, but a different one:

(musev) root@10703020883e:/workspace/MuseV# python scripts/inference/video2video.py --sd_model_name fantasticmix_v10  --unet_model_name musev_referencenet --referencenet_model_name   musev_referencenet --ip_adapter_model_name musev_referencenet    -test_data_path ./configs/tasks/example.yaml    --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder      --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas dance1 --fps 12 --time_size 12
/opt/conda/envs/musev/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-04-05 20:26:05.951185: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-05 20:26:05.973544: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-05 20:26:06.415460: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/opt/conda/envs/musev/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/workspace/MuseV/diffusers/src/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
args
{'add_static_video_prompt': False,
 'context_batch_size': 1,
 'context_frames': 12,
 'context_overlap': 4,
 'context_schedule': 'uniform_v2',
 'context_stride': 1,
 'controlnet_conditioning_scale': 1.0,
 'controlnet_name': 'dwpose_body_hand',
 'cross_attention_dim': 768,
 'enable_zero_snr': False,
 'end_to_end': True,
 'face_image_path': None,
 'facein_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/facein.py',
 'facein_model_name': None,
 'facein_scale': 1.0,
 'fix_condition_images': False,
 'fixed_ip_adapter_image': True,
 'fixed_refer_face_image': True,
 'fixed_refer_image': True,
 'fps': 12,
 'guidance_scale': 7.5,
 'height': None,
 'img_length_ratio': 1.0,
 'img_weight': 0.001,
 'interpolation_factor': 1,
 'ip_adapter_face_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/ip_adapter.py',
 'ip_adapter_face_model_name': None,
 'ip_adapter_face_scale': 1.0,
 'ip_adapter_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/ip_adapter.py',
 'ip_adapter_model_name': 'musev_referencenet',
 'ip_adapter_scale': 1.0,
 'ipadapter_image_path': None,
 'lcm_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/lcm_model.py',
 'lcm_model_name': None,
 'log_level': 'INFO',
 'motion_speed': 8.0,
 'n_batch': 1,
 'n_cols': 3,
 'n_repeat': 1,
 'n_vision_condition': 1,
 'need_hist_match': False,
 'need_img_based_video_noise': True,
 'need_return_condition': False,
 'need_return_videos': False,
 'need_video2video': False,
 'negative_prompt': 'V2',
 'negprompt_cfg_path': '/workspace/MuseV/scripts/inference/../../configs/model/negative_prompt.py',
 'noise_type': 'video_fusion',
 'num_inference_steps': 30,
 'output_dir': './output',
 'overwrite': False,
 'pose_guider_model_path': None,
 'prompt_only_use_image_prompt': False,
 'record_mid_video_latents': False,
 'record_mid_video_noises': False,
 'redraw_condition_image': False,
 'redraw_condition_image_with_facein': True,
 'redraw_condition_image_with_ip_adapter_face': True,
 'redraw_condition_image_with_ipdapter': True,
 'redraw_condition_image_with_referencenet': True,
 'referencenet_image_path': None,
 'referencenet_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/referencenet.py',
 'referencenet_model_name': 'musev_referencenet',
 'sample_rate': 1,
 'save_filetype': 'mp4',
 'save_images': False,
 'sd_model_cfg_path': '/workspace/MuseV/scripts/inference/../../configs/model/T2I_all_model.py',
 'sd_model_name': 'fantasticmix_v10',
 'seed': None,
 'strength': 0.8,
 'target_datas': 'dance1',
 'test_data_path': './configs/tasks/example.yaml',
 'time_size': 12,
 'unet_model_cfg_path': '/workspace/MuseV/scripts/inference/../.././configs/model/motion_model.py',
 'unet_model_name': 'musev_referencenet',
 'use_condition_image': True,
 'vae_model_path': './checkpoints/vae/sd-vae-ft-mse',
 'video_guidance_scale': 3.5,
 'video_guidance_scale_end': None,
 'video_guidance_scale_method': 'linear',
 'video_has_condition': True,
 'video_is_middle': False,
 'video_negative_prompt': 'V2',
 'video_num_inference_steps': 10,
 'video_overlap': 1,
 'video_strength': 1.0,
 'vision_clip_extractor_class_name': 'ImageClipVisionFeatureExtractor',
 'vision_clip_model_path': './checkpoints/IP-Adapter/models/image_encoder',
 'w_ind_noise': 0.5,
 'which2video': 'video_middle',
 'width': None,
 'write_info': False}


running model, T2I SD
{'fantasticmix_v10': {'sd': '/workspace/MuseV/configs/model/../../checkpoints/t2i/sd1.5/fantasticmix_v10'}}
lcm:  None None
unet_model_params_dict_src dict_keys(['musev', 'musev_referencenet', 'musev_referencenet_pose'])
unet:  musev_referencenet /workspace/MuseV/configs/model/../../checkpoints/motion/musev_referencenet
referencenet_model_params_dict_src dict_keys(['musev_referencenet'])
referencenet:  musev_referencenet /workspace/MuseV/configs/model/../../checkpoints/motion/musev_referencenet
ip_adapter_model_params_dict_src dict_keys(['IPAdapter', 'IPAdapterPlus', 'IPAdapterPlus-face', 'IPAdapterFaceID', 'musev_referencenet', 'musev_referencenet_pose'])
ip_adapter:  musev_referencenet {'ip_image_encoder': '/workspace/MuseV/configs/model/../../checkpoints/IP-Adapter/image_encoder', 'ip_ckpt': '/workspace/MuseV/configs/model/../../checkpoints/motion/musev_referencenet/ip_adapter_image_proj.bin', 'ip_scale': 1.0, 'clip_extra_context_tokens': 4, 'clip_embeddings_dim': 1024, 'desp': ''}
facein:  None None
ip_adapter_face:  None None
video_negprompt V2 badhandv4, ng_deepnegative_v1_75t, (((multiple heads))), (((bad body))), (((two people))), ((extra arms)), ((deformed body)), (((sexy))), paintings,(((two heads))), ((big head)),sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, (((nsfw))), nipples, extra fingers, (extra legs), (long neck), mutated hands, (fused fingers), (too many fingers)
negprompt V2 badhandv4, ng_deepnegative_v1_75t, (((multiple heads))), (((bad body))), (((two people))), ((extra arms)), ((deformed body)), (((sexy))), paintings,(((two heads))), ((big head)),sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, (((nsfw))), nipples, extra fingers, (extra legs), (long neck), mutated hands, (fused fingers), (too many fingers)
n_test_datas 1
2024-04-05 20:26:13,677- musev:997- INFO- vision_clip_extractor, name=ImageClipVisionFeatureExtractor, path=./checkpoints/IP-Adapter/models/image_encoder
test_model_vae_model_path ./checkpoints/vae/sd-vae-ft-mse
Traceback (most recent call last):
  File "/workspace/MuseV/scripts/inference/video2video.py", line 1102, in <module>
    sd_predictor = DiffusersPipelinePredictor(
  File "/workspace/MuseV/musev/pipelines/pipeline_controlnet_predictor.py", line 165, in __init__
    controlnet, controlnet_processor, processor_params = load_controlnet_model(
  File "/workspace/MuseV/MMCM/mmcm/vision/feature_extractor/controlnet.py", line 856, in load_controlnet_model
    controlnet_processor = ControlnetProcessor(
  File "/workspace/MuseV/MMCM/mmcm/vision/feature_extractor/controlnet.py", line 71, in __init__
    self.processor = processor_cls()
  File "/workspace/MuseV/controlnet_aux/src/controlnet_aux/dwpose/__init__.py", line 139, in __init__
    from .wholebody import Wholebody
  File "/workspace/MuseV/controlnet_aux/src/controlnet_aux/dwpose/wholebody.py", line 24, in <module>
    from mmdet.apis import inference_detector, init_detector
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/apis/__init__.py", line 2, in <module>
    from .det_inferencer import DetInferencer
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/apis/det_inferencer.py", line 22, in <module>
    from mmdet.evaluation import INSTANCE_OFFSET
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/evaluation/__init__.py", line 3, in <module>
    from .metrics import *  # noqa: F401,F403
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/evaluation/metrics/__init__.py", line 5, in <module>
    from .coco_metric import CocoMetric
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/evaluation/metrics/coco_metric.py", line 16, in <module>
    from mmdet.datasets.api_wrappers import COCO, COCOeval, COCOevalMP
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/datasets/__init__.py", line 26, in <module>
    from .utils import get_loading_pipeline
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/datasets/utils.py", line 5, in <module>
    from mmdet.datasets.transforms import LoadAnnotations, LoadPanopticAnnotations
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/datasets/transforms/__init__.py", line 6, in <module>
    from .formatting import (ImageToTensor, PackDetInputs, PackReIDInputs,
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/datasets/transforms/formatting.py", line 11, in <module>
    from mmdet.structures.bbox import BaseBoxes
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/structures/bbox/__init__.py", line 2, in <module>
    from .base_boxes import BaseBoxes
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/structures/bbox/base_boxes.py", line 9, in <module>
    from mmdet.structures.mask.structures import BitmapMasks, PolygonMasks
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/structures/mask/__init__.py", line 3, in <module>
    from .structures import (BaseInstanceMasks, BitmapMasks, PolygonMasks,
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmdet/structures/mask/structures.py", line 12, in <module>
    from mmcv.ops.roi_align import roi_align
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcv/ops/__init__.py", line 3, in <module>
    from .active_rotated_filter import active_rotated_filter
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
    ext_module = ext_loader.load_ext(
  File "/opt/conda/envs/musev/lib/python3.10/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/opt/conda/envs/musev/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /opt/conda/envs/musev/lib/python3.10/site-packages/mmcv/_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops10zeros_like4callERKNS_6TensorEN3c108optionalINS5_10ScalarTypeEEENS6_INS5_6LayoutEEENS6_INS5_6DeviceEEENS6_IbEENS6_INS5_12MemoryFormatEEE

Re-running the same command twice more reproduces the identical undefined-symbol ImportError from mmcv._ext, once through the mmdet import chain and once through the mmpose import chain.

Any help would be appreciated!
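
The mangled symbol in the ImportError is an ATen (libtorch) symbol, which usually means the prebuilt mmcv extension was compiled against a different torch build than the one installed in the environment. A possible fix is sketched below, assuming a prebuilt wheel matching the local torch/CUDA is available via openmim; the exact mmcv version pin should follow MuseV's own requirements.

pip uninstall -y mmcv mmcv-full
pip install -U openmim
mim install "mmcv>=2.0.0"        # openmim selects a wheel built for the installed torch/CUDA
python -c "from mmcv.ops import roi_align; print('mmcv ops OK')"   # verify the compiled ops now load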

About the checkpoints

Hi, I'd like to ask whether the checkpoints from the Windows one-click package can be used directly inside Docker. In the server environment it is not convenient to download them from huggingface.
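
Mounting the checkpoint directory from the host into the container should work, assuming the one-click package keeps the layout the inference scripts expect (./checkpoints under /workspace/MuseV, as in the logs above). A minimal sketch; the host path and image name below are placeholders:

docker run -it --gpus all -v /data/MuseV/checkpoints:/workspace/MuseV/checkpoints <your-musev-image> bash
# inside the container, paths such as ./checkpoints/vae/sd-vae-ft-mse should then resolve as in the logs above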

Gradio failed: login to server failed: dial tcp 44.237.78.176:7000: i/o timeout

cd scripts/gradio
python app.py

Running on local URL: http://0.0.0.0:7860
2024-04-12 16:32:30,732- httpx:1026- INFO- HTTP Request: GET http://localhost:7860/startup-events "HTTP/1.1 200 OK"
2024-04-12 16:32:30,746- httpx:1026- INFO- HTTP Request: HEAD http://localhost:7860/ "HTTP/1.1 200 OK"
2024-04-12 16:32:31,389- httpx:1026- INFO- HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
2024-04-12 16:32:31,849- httpx:1026- INFO- HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2024-04-12 16:32:32,460- httpx:1026- INFO- HTTP Request: POST https://api.gradio.app/gradio-initiated-analytics/ "HTTP/1.1 200 OK"
2024-04-12 16:32:42,914- httpx:1026- INFO- HTTP Request: GET https://api.gradio.app/v2/tunnel-request "HTTP/1.1 200 OK"
2024/04/12 16:32:52 [W] [service.go:132] login to server failed: dial tcp 44.237.78.176:7000: i/o timeout

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
2024-04-12 16:32:53,938- httpx:1026- INFO- HTTP Request: POST https://api.gradio.app/gradio-error-analytics/ "HTTP/1.1 200 OK"
2024-04-12 16:32:53,938- httpx:1026- INFO- HTTP Request: POST https://api.gradio.app/gradio-launched-telemetry/ "HTTP/1.1 200 OK"

I ran it in Docker, and the other demos ran normally. Only the Gradio demo hit this error, and the web page could not be opened.

Please, could someone help me?
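
The log shows the app itself is already serving on the local URL; only the share tunnel to Gradio's relay (44.237.78.176:7000) times out, which typically happens when that outbound connection is blocked. A possible workaround, sketched below, is to skip the share link and reach port 7860 directly; the image name is a placeholder for whatever container was started:

docker run -it --gpus all -p 7860:7860 <your-musev-image> bash
# inside the container:
cd /workspace/MuseV/scripts/gradio && python app.py
# then open http://<host-ip>:7860 in a browser instead of waiting for a *.gradio.live share link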

ip_adapter_faceid

Traceback (most recent call last):
  File "G:\AI\ZHB\MuseV\scripts\gradio\app.py", line 32, in <module>
    from gradio_video2video import online_v2v_inference
  File "G:\AI\ZHB\MuseV\scripts\gradio\gradio_video2video.py", line 36, in <module>
    from musev.models.ip_adapter_face_loader import (
  File "G:\AI\ZHB\MuseV\musev\models\ip_adapter_face_loader.py", line 38, in <module>
    from ip_adapter.ip_adapter_faceid import ProjPlusModel, MLPProjModel
ImportError: cannot import name 'ProjPlusModel' from 'ip_adapter.ip_adapter_faceid' (G:\AI\ZHB\MuseV\env\lib\site-packages\ip_adapter\ip_adapter_faceid.py)
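
ProjPlusModel is defined in the ip_adapter package shipped with the official tencent-ailab/IP-Adapter repository, so this error suggests the copy in site-packages is an older or different build. A minimal sketch of a workaround, assuming that repository's module layout is what musev expects; the clone path is an example, and on Windows the variable is set with set rather than export:

git clone https://github.com/tencent-ailab/IP-Adapter.git /workspace/IP-Adapter
export PYTHONPATH=/workspace/IP-Adapter:$PYTHONPATH   # PYTHONPATH entries take precedence over site-packages
python -c "from ip_adapter.ip_adapter_faceid import ProjPlusModel, MLPProjModel; print('ok')"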

cannot import name 'ForkProcess' from 'multiprocessing.context'

Starting the latest main branch raises this error. How can it be solved?

Traceback (most recent call last):
  File "G:\AI\ZHB\MuseV\scripts\gradio\app.py", line 7, in <module>
    import spaces
  File "G:\AI\ZHB\MuseV\env\lib\site-packages\spaces\__init__.py", line 10, in <module>
    from .zero.decorator import GPU
  File "G:\AI\ZHB\MuseV\env\lib\site-packages\spaces\zero\decorator.py", line 21, in <module>
    from .wrappers import regular_function_wrapper
  File "G:\AI\ZHB\MuseV\env\lib\site-packages\spaces\zero\wrappers.py", line 14, in <module>
    from multiprocessing.context import ForkProcess
ImportError: cannot import name 'ForkProcess' from 'multiprocessing.context' (G:\AI\ZHB\MuseV\env\lib\multiprocessing\context.py)
