Comments (16)
Hi, I tried to sample from the pre-trained LatteT2V model by running it on CPU, but I hit several errors while running the code.
Steps to reproduce the error
- modifying environment.yml to match the requirements
- downloading t2v.pt and the whole folder from https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models, keeping the same structure; naming this folder t2v, so we have t2v/scheduler, ..., t2v/model_index.json, and t2v/t2v.pt
- in t2v_sample.yaml, setting ckpt = "t2v/t2v.pt" and pretrained_model_path = "t2v"
- renaming the file transformer_config.json in t2v/transformer to config.json, because I got RuntimeError: t2v\transformer\config.json does not exist at line 982 of latte_t2v.py, in from_pretrained_2d
- Then I got RuntimeError: None does not exist at line 1000 of latte_t2v.py, in from_pretrained_2d. Since diffusion_pytorch_model.safetensors is stored in t2v/vae and there is no .bin file in t2v, there are no .safetensors or .bin files in the t2v/transformer folder.
Should I move the .safetensors file to t2v/transformer? Could you please review this part?
Thank you for your issue. I did not provide the PixArt-alpha model weights within https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models/transformer, so you can follow this link to modify your code.
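For reference, here is a minimal workaround sketch in the spirit of that fix (my reconstruction, not the linked code): build the LatteT2V transformer from its config and load t2v.pt directly, so from_pretrained_2d never has to find a .safetensors/.bin file inside t2v/transformer. The module path and checkpoint keys are assumptions based on the repo layout:

import json
import os

import torch
from models.latte_t2v import LatteT2V  # module path assumed from the Latte repo

pretrained_model_path = "t2v"
config_file = os.path.join(pretrained_model_path, "transformer", "transformer_config.json")
with open(config_file) as f:
    config = json.load(f)

# drop diffusers bookkeeping keys such as "_class_name" before constructing
model = LatteT2V(**{k: v for k, v in config.items() if not k.startswith("_")})

checkpoint = torch.load(os.path.join(pretrained_model_path, "t2v.pt"), map_location="cpu")
# the "ema"/"model" key names are assumptions; a plain state dict also works
checkpoint = checkpoint.get("ema", checkpoint.get("model", checkpoint))
model.load_state_dict(checkpoint)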
Thank you for your help; I will update with the result tomorrow.
Hi, after commenting out lines 988 to 1015 (from model_files = to model.load_state_dict) and renaming some xxx_config.json files to config.json in those subfolders, I encountered a runtime error:
File "E:\Latte\sample\sample_t2v.py", line 40, in main
text_encoder = T5EncoderModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder", torch_dtype=torch.float16).to(device)
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory t2v.
Where should I find those missing files? Also, if I want to run this model using float32, besides setting use_fp16 = False in t2v_sample.yaml, should I also manually set the torch_dtype of vae, text_encoder, and transformer_model to float32 in sample_t2v.py?
Please make sure that args.pretrained_model_path contains the text_encoder folder, which contains the checkpoints. use_fp16 in the yaml is a deprecated option for text2video; if you want to use fp32 for inference, change all torch.float16 in sample_t2v.py to torch.float32.
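Concretely, the loading calls in sample_t2v.py would become something like the following sketch (the text_encoder line matches the traceback above; the vae line is my assumption of the parallel change):

import torch
from diffusers.models import AutoencoderKL
from transformers import T5EncoderModel

# args and device are defined earlier in sample_t2v.py; only the dtypes change
text_encoder = T5EncoderModel.from_pretrained(
    args.pretrained_model_path, subfolder="text_encoder",
    torch_dtype=torch.float32).to(device)
vae = AutoencoderKL.from_pretrained(
    args.pretrained_model_path, subfolder="vae",
    torch_dtype=torch.float32).to(device)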
Thank you, I found there are some missing files in your Hugging Face repo. After replacing your videogen_pipeline with the PixArt-alpha one:
from diffusers import PixArtAlphaPipeline
import torch
videogen_pipeline = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-512x512", torch_dtype=torch.float16)
I can now load the checkpoints and pipelines, but my GPU memory seems to be below PixArt's requirements; I will try it later. Thank you for not being annoyed by my dumb questions and for your meticulous help.
Hi, I also get an OOM error with a T4. Did you find a good solution? Thanks.
Could you please provide more details? Thanks~
Thanks. The OOM error is on an Nvidia T4, and the logs are as follows. Should I modify some model config, or use a GPU with more memory such as an A100?
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.73it/s]
Traceback (most recent call last):
  File "/app/alpaca-lora/voice/clip_proj/Latte/sample/sample_t2v.py", line 160, in <module>
    main(OmegaConf.load(args.config))
  File "/app/alpaca-lora/voice/clip_proj/Latte/sample/sample_t2v.py", line 36, in main
    text_encoder = T5EncoderModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder", torch_dtype=torch.float16).to(device)
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2595, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacty of 14.58 GiB of which 19.31 MiB is free. Process 14798 has 2.72 GiB memory in use. Process 37381 has 11.83 GiB memory in use. Of the allocated memory 10.90 GiB is allocated by PyTorch, and 293.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
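Note from the log that two other processes already hold roughly 14.5 GiB of the T4's 14.58 GiB, so freeing the GPU may matter as much as any config change. One option worth trying before moving to a larger GPU (my suggestion, not something confirmed to work for Latte): let diffusers offload idle submodules to the CPU instead of calling .to(device) on every component:

# Standard diffusers memory saver (requires the accelerate package); this is a
# sketch and assumes the components are wrapped in a diffusers pipeline object.
videogen_pipeline.enable_model_cpu_offload()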
Hi, I just tested the GPU memory requirements for t2v inference. Inference on the A100 requires 20916 MiB of GPU memory in fp16 precision mode.
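For anyone reproducing this measurement, a minimal sketch (torch-only; watching nvidia-smi during sampling works just as well):

import torch

torch.cuda.reset_peak_memory_stats()
# ... run the full t2v sampling pipeline here ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")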
May I ask how long t2v inference takes to generate one video on an 80G A100? @maxin-cn thanks
About 30s to generate one video on 80G A100.
When I use the A100 to generate one video, the quality of the generated video is not as good as what is shown in the paper.
The quality of the generated video may be related to the initialization seed. The publicly available t2v model is a very early model of ours; we are working on improving its stability and will release a stable t2v model as soon as possible. Please stay tuned~
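A quick way to explore the effect of the seed (a sketch; exactly where sample_t2v.py consumes the seed is an assumption, so this just sets the global torch seed):

import torch

for seed in (0, 42, 1234):  # try a few seeds and keep the best-looking video
    torch.manual_seed(seed)
    # ... rerun the sampling for the same prompt and compare outputs ...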
Maybe my question is too simple or dumb for you, because I'm new to this area. According to your Hugging Face repo, I believe model-00001-of-00004.safetensors through model-00004-of-00004.safetensors are the checkpoints you mentioned above, but they don't follow the naming rules given in the error message. I tried renaming them but it didn't work. How should I rename the files, or am I going about this the wrong way?
Also, I found that you mentioned t2v training is not supported right now in train.py; does that mean the text-prompt training part is not available?
Please ensure the file structure of pretrained_model_path is as follows:
├── pretrained_model_path
│ ├── scheduler
│ ├── text_encoder
│ ├── tokenizer
│ ├── transformer
│ ├── vae
You can also download the corresponding checkpoints from PixArt-alpha; the model weights in t2v_required_models are likewise from PixArt-alpha.
You do not need to rename the text encoder checkpoints. If the text encoder loads successfully, the terminal will show the checkpoint shards being loaded, as in the log above.
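For example, a hypothetical download sketch for the PixArt-alpha route (repo id taken from the pipeline snippet earlier in this thread; the folder patterns are my assumptions):

from huggingface_hub import snapshot_download

# fetch only the subfolders Latte expects next to t2v.pt
snapshot_download(
    "PixArt-alpha/PixArt-XL-2-512x512",
    local_dir="t2v",
    allow_patterns=["model_index.json", "scheduler/*", "text_encoder/*",
                    "tokenizer/*", "vae/*"],
)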
train.py currently only supports training on four datasets: FaceForensics, SkyTimelapse, Taichi-HD, and UCF101.
Hi, thank you for your feedback. Please see this issue.