
latte's People

Contributors

alonzoleeeooo, eltociear, maxin-cn, tianyma, upper9527, wyhsirius, xinyuanc91, xszheng2020


latte's Issues

Why does this error occur when running the sample.py module? ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings'

Traceback (most recent call last):
File "C:\Users\Dell\Desktop\Project\Latte-main\sample\sample.py", line 29, in
from models import get_models
File "C:\Users\Dell\Desktop\Project\Latte-main\models_init_.py", line 7, in
from .latte_t2v import LatteT2V
File "C:\Users\Dell\Desktop\Project\Latte-main\models\latte_t2v.py", line 11, in
from diffusers.models.embeddings import get_1d_sincos_pos_embed_from_grid, ImagePositionalEmbeddings, CaptionProjection, PatchEmbed, CombinedTimestepSizeEmbeddings
ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings'
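One possible compatibility shim, assuming the class was simply renamed to PixArtAlphaTextProjection in newer diffusers releases (please verify against the installed diffusers version before relying on it):

    # Hedged import shim: newer diffusers versions are assumed to expose the same
    # projection module under the name PixArtAlphaTextProjection instead of CaptionProjection.
    try:
        from diffusers.models.embeddings import CaptionProjection
    except ImportError:
        from diffusers.models.embeddings import PixArtAlphaTextProjection as CaptionProjection

Alternatively, pinning diffusers to the version listed in the repo's environment.yml should avoid the rename entirely.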

Re-implementation error on the ffs experiment

Good job, but I have some questions about the ffs checkpoint inference experiment.
1) I set "ckpt" in ffs.sh to the folder containing https://huggingface.co/maxin-cn/Latte/blob/main/ffs.pt, and set "pretrained_model_path" to the folder containing https://huggingface.co/maxin-cn/Latte/tree/main/vae.
But the quality of the generated video is bad. Is there anything wrong with my process?

sample.mp4

2) Besides, I edited the code in sample.py. If I keep the line "samples = vae.decode(samples / 0.18215).sample", I get a segmentation fault, so I replaced it with the following. Is there anything wrong with my process?
[Screenshot 2024-02-27, 5:14 PM: the modified decoding code]

Question

I have a question: after setting up t2v and running t2v.sh, the generated output is all noise.

Some errors when running LatteT2V

Hi, I tried to sample from the pre-trained LatteT2V model by running on CPU, but I ran into several errors while running the code.

Steps to reproduce the error

  1. Modified environment.yml to match the requirements.
  2. Downloaded t2v.pt and the whole folder from https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models, keeping the same structure, and named this folder t2v, so we have t2v/scheduler, ..., t2v/model_index.json and t2v/t2v.pt.
  3. In t2v_sample.yaml, set ckpt = "t2v/t2v.pt" and pretrained_model_path = "t2v".
  4. Renamed the file transformer_config.json in t2v/transformer to config.json, because I got RuntimeError: t2v\transformer\config.json does not exist at line 982 in latte_t2v.py, in from_pretrained_2d.
  5. I then got RuntimeError: None does not exist at line 1000 in latte_t2v.py, in from_pretrained_2d. We store diffusion_pytorch_model.safetensors in t2v/vae and there is no .bin file in t2v, so there are no .safetensors or .bin files in the t2v/transformer folder.

Should I move .safetensors file to t2v/transformer? Could you please review this part?
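Not an official answer, but a quick way to see what the loader will find is to check the layout directly; a minimal sketch (the paths below just mirror the folder names described in this issue and are assumptions):

    import os

    # Sketch: list which of the files from_pretrained_2d is likely to look for actually exist.
    root = "t2v"
    candidates = [
        os.path.join(root, "model_index.json"),
        os.path.join(root, "transformer", "config.json"),
        os.path.join(root, "transformer", "diffusion_pytorch_model.bin"),
        os.path.join(root, "transformer", "diffusion_pytorch_model.safetensors"),
    ]
    for path in candidates:
        print(path, "->", "found" if os.path.exists(path) else "MISSING")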

GPU Memory cost

What are the minimum requirements for GPU memory for training and inference?

Where is the paper?

Sorry to bother you again, but where can I find the paper? The paper link in the project seems to be invalid.

Asking for training code for t2v

I was trying to train the text-to-video generation model. Can you please provide a code base for this? train.py says that T2V training is not supported at the moment, so how can I do it?

please: one step take all (experts, please go all the way in one step)

one step take all:

way 1: offline/explicit
1. A 4D (time + stereo) world: strongly and physically stereo-consistent, any camera pose, offline rendering, highly controllable interactive semantics.

OR

way 2: online/implicit
Binocular stereo-consistent generation, in-place changes of the observer pose, online and real-time, highly controllable interactive semantics: a world.
The technical path: train on binocular video from Unreal Engine or the physical world.

Your team is very strong and capable of achieving this.
This would be the endgame of visual games.

Training BatchSize

Hi :)! First, thank you for your excellent work!

I am trying to reproduce the results of Latte, and I would like to know the total batch size for each dataset (local_batch_size * num_gpus). Can you share more information on the experimental setups?

I tried the 1e-r learning rate with a total batch size of 32 on the small Latte-S version, but I can't generate good results. So I wonder: are the batch size and model size highly relevant to the final results? Thank you!

Real-time WeChat discussion group for Latte

I just tried the t2v demo, and there is still a noticeable quality gap.
I am wondering about a couple of things:
1. Would scaling up training improve the results a lot and greatly narrow the gap with Sora?
2. How can we obtain higher-quality training material, for example video generation with legible text, similar to DALL·E?

Is there a real-time discussion group, such as a WeChat group? If not, I could create one.

latte

Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device

Hi, I have successfully loaded the t2v model using
bash sample/t2v.sh
but it shows the model is running on the CPU. How do I set it to run on the GPU? Thanks.

Loading checkpoint shards: 100%|████████| 4/4 [00:01<00:00, 2.74it/s]
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Processing the (Yellow and black tropical fish dart through the sea.) prompt
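For reference, the usual pattern for forcing GPU execution when one is available, sketched here under the assumption that the sampling script selects its device explicitly (the commented variable names are illustrative, not necessarily the repo's):

    import torch

    # Choose the GPU when available; float16 weights generally require an accelerator.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Illustrative: move whatever models the sampling script builds onto that device, e.g.
    # transformer_model = get_models(args).to(device, dtype=dtype)
    # vae = vae.to(device, dtype=dtype)
    # text_encoder = text_encoder.to(device, dtype=dtype)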

Cannot find model: LatteT2V.from_pretrained_2d

~/Latte# bash sample/t2v.sh
/root/miniconda3/envs/latte/lib/python3.12/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/root/miniconda3/envs/latte/lib/python3.12/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Traceback (most recent call last):
  File "/root/Latte/sample/sample_t2v.py", line 160, in <module>
    main(OmegaConf.load(args.config))
  File "/root/Latte/sample/sample_t2v.py", line 30, in main
    transformer_model = get_models(args).to(device, dtype=torch.float16)
                        ^^^^^^^^^^^^^^^^
  File "/root/Latte/models/__init__.py", line 42, in get_models
    return LatteT2V.from_pretrained_2d(pretrained_model_path, subfolder="transformer")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Latte/models/latte_t2v.py", line 992, in from_pretrained_2d
    raise RuntimeError(f"{model_file} does not exist")
RuntimeError: None does not exist

Code:

        model = cls.from_config(config)
        
        model_files = [
            os.path.join(pretrained_model_path, 'diffusion_pytorch_model.bin'),
            os.path.join(pretrained_model_path, 'diffusion_pytorch_model.safetensors')
        ]

        model_file = None

        for fp in model_files:
            if os.path.exists(fp):
                model_file = fp

        if not model_file:
            raise RuntimeError(f"{model_file} does not exist")

The t2v/transformer folder does not contain either of these two weight files:
[screenshot of the transformer folder contents]

FVD values of PVDM are strange

Hi, I am the first author of PVDM, and I just noticed that the FVD values reported for PVDM are much worse than the values I reported in the paper. Could you tell me why such differences exist?

Many people have tried (and succeeded) to reproduce the original values, so this is strange to me.

Import error

ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings'
Can anyone please help with this error?

Implementation of compression frame patch embedding (Fig. 3b)

Hi,
Thanks for the great work. I have a few questions:

  1. By default which "patch embedding" is used? Fig.3(a) or (b)?
  2. Is there a parameter to switch between (a) and (b) in a config file?
  3. I'd like to take a look at the implementation of (b), the compression frame patch embedding. I see PatchEmbed in several places, and it comes from different libraries (sometimes diffusers, sometimes timm). Do you have a pointer to the code where Fig. 3(b) is implemented? (A generic sketch of the idea follows below.)
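For reference only (not the repo's actual code), a minimal sketch of what a compression-style patch embedding can look like: each token spans several frames via a 3D convolution. The parameter values below are illustrative assumptions:

    import torch
    import torch.nn as nn

    class TubeletPatchEmbed(nn.Module):
        # Each token covers a temporal window of `tubelet_size` frames and a
        # `patch_size` x `patch_size` spatial patch (Fig. 3(b)-style compression).
        def __init__(self, patch_size=2, tubelet_size=2, in_channels=4, embed_dim=1152):
            super().__init__()
            self.proj = nn.Conv3d(
                in_channels, embed_dim,
                kernel_size=(tubelet_size, patch_size, patch_size),
                stride=(tubelet_size, patch_size, patch_size),
            )

        def forward(self, x):                     # x: (B, C, T, H, W)
            x = self.proj(x)                      # (B, D, T/t, H/p, W/p)
            return x.flatten(2).transpose(1, 2)   # (B, num_tokens, D)

    tokens = TubeletPatchEmbed()(torch.randn(1, 4, 16, 32, 32))
    print(tokens.shape)  # torch.Size([1, 2048, 1152])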

Can you provide the code for DDIM sampler

I tried changing the 'sample_method' hyperparameter to 'DDIM' in Latte/configs/t2v/t2v_sample.yaml, but this makes the output quality bad. Can you provide some scripts for the DDIM sampler, or does the model simply not work well with DDIM sampling?

Sincerely
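In case it helps, switching a diffusers pipeline to DDIM usually just means swapping the scheduler; a minimal sketch, assuming the t2v sampling script builds a standard diffusers pipeline (the Latte sampler itself may handle this differently):

    from diffusers import DDIMScheduler

    def use_ddim(pipe, num_inference_steps=50):
        # Reuse the existing scheduler config so betas / prediction type stay consistent.
        pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
        pipe.scheduler.set_timesteps(num_inference_steps)
        return pipe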

Train code of t2v?

Do you have any plans to make the t2v training part of the code public? And to release the best T2V model?

Bug

ModuleNotFoundError: No module named 'petrel_client'. How can I solve this problem? pip does not seem to work for it.
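petrel_client is an internal object-storage SDK and is not on PyPI, so pip cannot install it. If it is only used for cluster storage paths, one common workaround is to guard the import, sketched below (this is an assumption about how the repo uses it, so check the call sites first):

    # Sketch: make petrel_client optional so local-file runs do not need it.
    try:
        from petrel_client.client import Client
        HAS_PETREL = True
    except ImportError:
        Client = None
        HAS_PETREL = False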

Inference code

Can you provide me with the inference code for text to video?

cannot import name 'CaptionProjection' from 'diffusers.models.embeddings'

I created the conda environment with conda env create -f environment.yml && conda activate latte, but I still
encounter this error when I run
bash sample/ffs.sh

My diffusers version is 0.26.3. I have also tried installing diffusers from source, but I found no mention of CaptionProjection in the library. Which version did you use?

Bug

Where is transformer/config.json when running sh t2v.sh?

Non-consecutive added token '<extra_id_99>' found.

When execution reached

text_encoder = T5EncoderModel.from_pretrained(
    args.pretrained_model_path, subfolder="text_encoder", 
    torch_dtype=torch.float16
).to(device)

the following error occurred

ValueError: 
Non-consecutive added token '<extra_id_99>' found.
Should have index 32100 but has index 32000 in saved vocabulary.

What is the reason for this? Is it because the t2v_required_models/tokenizer/spiece.model file on Hugging Face is outdated?

Code Reuse

Hi, thanks for your great work, your code and ablation experiments have inspired us a lot. Is it possible for me to make modifications based on your code to adapt it to Open-Sora Plan? Thank you.

run bash sample/t2v.sh error

When running bash sample/t2v.sh, I get the error: "Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference." Where should I set it so that it uses the GPU when running?

Is there any bug in text2video generation mode?

When using 'args.extras=78', i.e. text2video generation mode, I noticed that this line https://github.com/maxin-cn/Latte/blob/c4df091565fa6675f39d2fd1f8292295e202a43a/train.py#L221 uses the pooled text embeddings ([batch, 768]) instead of the token-level text embeddings ([batch, 77, 768]), which is not compatible with this line https://github.com/maxin-cn/Latte/blob/c4df091565fa6675f39d2fd1f8292295e202a43a/models/latte.py#L241

As a result, I got this error RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x768 and 59136x1152)
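To illustrate the mismatch, here is a quick shape check, assuming the text encoder is a CLIP text model (which the [batch, 77, 768] shape suggests; the checkpoint name is an illustrative assumption):

    import torch
    from transformers import CLIPTextModel, CLIPTokenizer

    name = "openai/clip-vit-large-patch14"  # assumed encoder, for illustration only
    tokenizer = CLIPTokenizer.from_pretrained(name)
    encoder = CLIPTextModel.from_pretrained(name)

    tokens = tokenizer(["a cat"], padding="max_length", max_length=77, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**tokens)

    print(out.last_hidden_state.shape)  # torch.Size([1, 77, 768]) -- per-token embeddings
    print(out.pooler_output.shape)      # torch.Size([1, 768])     -- pooled embedding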

diffusion noise modify

Hi, great work! I hope you have time to answer a simple question: where can I modify the Gaussian noise parameters in the inference/sampling stage? Also, if I change the input to an image or a video, can the model generate a video from that image, or turn an existing poor-quality video into a good one? Thanks.
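Regarding the noise question: in most diffusion samplers the initial Gaussian latent is simply a torch.randn draw, so seeding its generator is where the noise is controlled. A generic sketch (the latent shape below is an assumption, not Latte's actual one):

    import torch

    generator = torch.Generator(device="cpu").manual_seed(42)  # fix the seed for reproducible samples
    latent_shape = (1, 4, 16, 32, 32)   # (batch, channels, frames, height, width) -- illustrative
    z = torch.randn(latent_shape, generator=generator)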

Preprocess of UCF101

Thanks for your great work!
I would like to know how to generate /path/to/datasets/UCF101/train_256_list.txt for UCF101 training.
After downloading the UCF101 videos, and given that the paper says "We extract 16-frame video clips from these datasets", are there any preprocessing scripts we can follow?
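In case it is useful, a minimal sketch of building such a list file, assuming train_256_list.txt is simply one video path per line relative to the dataset root (the actual format expected by the repo may differ):

    import os

    ucf_root = "/path/to/datasets/UCF101"
    video_dir = os.path.join(ucf_root, "videos")  # assumed location of the downloaded .avi files

    with open(os.path.join(ucf_root, "train_256_list.txt"), "w") as f:
        for dirpath, _, filenames in os.walk(video_dir):
            for name in sorted(filenames):
                if name.endswith(".avi"):
                    f.write(os.path.relpath(os.path.join(dirpath, name), ucf_root) + "\n")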

TypeError: PatchEmbed.__init__() got an unexpected keyword argument 'bias'

When I run
bash sample/ffs.sh
I get this error.

Traceback (most recent call last):
File "/app/alpaca-lora/voice/clip_proj/Latte/sample/sample.py", line 138, in
main(omega_conf)
File "/app/alpaca-lora/voice/clip_proj/Latte/sample/sample.py", line 56, in main
model = get_models(args).to(device)
File "/app/alpaca-lora/voice/clip_proj/Latte/models/__init__.py", line 44, in get_models
return Latte_models[args.model](
File "/app/alpaca-lora/voice/clip_proj/Latte/models/latte.py", line 465, in Latte_XL_2
return Latte(depth=28, hidden_size=1152, patch_size=2, num_heads=16, **kwargs)
File "/app/alpaca-lora/voice/clip_proj/Latte/models/latte.py", line 233, in __init__
self.x_embedder = PatchEmbed(input_size, patch_size, in_channels, hidden_size, bias=True)
TypeError: PatchEmbed.__init__() got an unexpected keyword argument 'bias'
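This usually points to an installed timm release whose PatchEmbed predates the bias keyword; a quick diagnostic sketch to check the installed version's signature (just a check, not a fix):

    import inspect
    from timm.models.vision_transformer import PatchEmbed

    # True on timm versions whose PatchEmbed accepts bias=..., False on older ones.
    print("bias" in inspect.signature(PatchEmbed.__init__).parameters)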

preprocess dataset & t2v training

I saw in the README that the FaceForensics dataset can be used to train two models, class-conditional and unconditional Latte. Do I need to do any additional preprocessing on the FaceForensics dataset? In what form should the data be organized?
In addition, how do I train the t2v model?

torchrun --nnodes=1 --nproc_per_node=2 train_with_img.py --config ./configs/sky/sky_img_train.yaml error

A very impressive job. There are several issues when using the SkyTimelapse data for image and video pre-training:

  1. from utils import (clip_grad_norm_, create_logger, update_ema, requires_grad, cleanup, create_tensorboard, write_tensorboard, setup_distributed, fetch_files_by_numbers, get_experiment_dir, separation_content_motion) fails with "ImportError: cannot import name 'fetch_files_by_numbers' from 'utils' (Latte/utils.py)"; fetch_files_by_numbers and separation_content_motion are not found in utils. After commenting out the corresponding imports it runs. What do these functions mainly do? Do they operate on the original videos directly? Is it safe to simply comment them out?

  2. With args.dataset == 'webvideo2mlaion':

Traceback (most recent call last):

File "/data/zqzx/latte/latte_main/latte/train_with_img.py", line 361, in

main(OmegaConf.load(args.config))

File "/data/zqzx/latte/latte_main/latte/train_with_img.py", line 221, in main

logger.info(f"Dataset contains {len(dataset):,} videos ({args.webvideo_data_path})")

File "/data/miniconda3/envs/yxl/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__

self._format_and_raise()

Can this be solved simply by adding the corresponding entry (a path to the actual .mp4 videos) to sky_img_train.yaml? Or can we point this path at our own video dataset?

  3. With args.test_run: after commenting it out directly, it can now be run.

Thank you.

What is the minimum GPU memory requirement?

I'm asking for the lowest amount of GPU video memory (VRAM) necessary to run Latte video generation effectively, for both training and inference.

sh sample/t2v.sh error

sh sample/t2v.sh
Using model!
Traceback (most recent call last):
File "/data/zhangmaolin/code/Latte/sample/sample_t2v.py", line 160, in
main(OmegaConf.load(args.config))
File "/data/zhangmaolin/code/Latte/sample/sample_t2v.py", line 34, in main
vae = AutoencoderKL.from_pretrained(args.pretrained_model_path, subfolder="vae", torch_dtype=torch.float16).to(device)
File "/home/user/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/models/modeling_utils.py", line 812, in from_pretrained
unexpected_keys = load_model_dict_into_meta(
File "/home/user/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/models/modeling_utils.py", line 155, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load /data/zhangmaolin/code/Lattle_file/Latte/t2v_required_models because decoder.conv_in.bias expected shape tensor(..., device='meta', size=(64,)), but got torch.Size([512]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: huggingface/diffusers#1619 (comment) as an example.

Hello, could you please help me figure out how to solve this problem? I get this error when running t2v.sh. Looking forward to your reply.
