
Comments (4)

maxin-cn avatar maxin-cn commented on August 22, 2024

(latte) yueyc@super-AS-4124GS-TNR:~/Latte$ bash sample/ffs_ddp.sh
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Using Ema!
WARNING: using half precision for inferencing!
Using Ema!
WARNING: using half precision for inferencing!
Saving .mp4 samples at ./test

**First question:** when I run `bash sample/ffs_ddp.sh`, it gets as far as "Saving .mp4 samples at ./test" and then appears to hang at that step with no further output. It only stops when I terminate it manually, so I cannot get DDP sampling to work.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100% 4/4 [00:03<00:00, 1.08it/s]
Loading checkpoint shards: 100% 4/4 [00:04<00:00, 1.13s/it]
Processing the (Yellow and black tropical fish dart through the sea.) prompt
Processing the (Yellow and black tropical fish dart through the sea.) prompt
0%| 0/50 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 120, in <module>
    main(OmegaConf.load(args.config))
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 86, in main
    videos = videogen_pipeline(prompt,
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/yueyc/Latte/sample/pipeline_videogen.py", line 706, in __call__
    noise_pred = self.transformer(
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yueyc/Latte/models/latte_t2v.py", line 773, in forward
    hidden_states = self.pos_embed(hidden_states)  # already add positional embeddings
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 187, in forward
    return (latent + pos_embed).to(latent.dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.50 GiB (GPU 0; 47.54 GiB total capacity; 26.67 GiB already allocated; 60.81 MiB free; 26.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

(A second rank fails with the same traceback, ending in:)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.00 GiB (GPU 0; 47.54 GiB total capacity; 17.67 GiB already allocated; 76.81 MiB free; 17.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 68022) of binary: /home/yueyc/anaconda3/envs/latte/bin/python

**Second question:** when I run `bash sample/t2v.sh`, it always reports insufficient GPU memory. I tried using multiple GPUs, whether 2 cards or all of our cards, and the CUDA out of memory error persists. Our lab's GPUs should certainly have enough memory in total, so why do I get out-of-memory errors no matter how many cards I use? I spent a whole day debugging, repeatedly checking whether the GPU memory is actually sufficient and constantly switching cards, but could not resolve it. I would be grateful for the author's answer, thank you very much.

Thanks for your interest.

  1. Check the save path to see whether any videos have already been written there; if so, this behavior is normal.
  2. Inferring one video on an A100 requires 20916 MiB of GPU memory in fp16 precision mode (t2v). Multi-GPU mode cannot reduce the peak video memory each GPU needs during inference, since every process still runs the full model.
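The OOM message in the log itself points at one knob worth trying. Purely as a hedged sketch (not from the repo's documentation): setting PyTorch's allocator config before launching the sampling script can reduce fragmentation when reserved memory far exceeds allocated memory.

```shell
# Allocator tuning suggested by the OOM message above; the 512 MiB
# split cap is an illustrative value to tune, not a repo default.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"
# then re-run the sampling script, e.g.: bash sample/t2v.sh
```

If fragmentation is not the issue, this will not help with a genuinely too-large allocation; reducing the sampling batch size is the other lever.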

from latte.

likeatingcake avatar likeatingcake commented on August 22, 2024


Hello, the generated videos were not found in the corresponding folder; it seems to still be processing.


maxin-cn avatar maxin-cn commented on August 22, 2024


Could you tell me what you have set this parameter to:

per_proc_batch_size: 2


likeatingcake avatar likeatingcake commented on August 22, 2024


This is the ddp sample config:

per_proc_batch_size: 2
num_fvd_samples: 2048

I have not modified any parameters in the configuration file.
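Not part of the thread's resolution, but a hedged reading of the numbers above: with per_proc_batch_size: 2 each process samples two videos at once, so halving it should roughly halve the per-GPU activation memory. A sketch of the change (num_fvd_samples left as in the quoted config):

```yaml
# ddp sample config — illustrative edit, not the repo default
per_proc_batch_size: 1   # was 2; halves the per-process sampling batch
num_fvd_samples: 2048
```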

