
cogview's Introduction

Generate vivid Images for Any (Chinese) text

[teaser image]

News! The paper of ImageReward is accepted by NeurIPS 2023!

News! The codes of ImageReward (paper link) have been released at https://github.com/THUDM/ImageReward! ImageReward is the first general-purpose text-to-image human preference RM.

News! The codes of CogView2 (paper link) have been released at https://github.com/THUDM/CogView2!

News! The demo for a better and faster CogView2 (formal version, March 2022) is available! The latest model also supports English input, but translating it into Chinese often yields better results.

News! The demo for a better and faster CogView2 (new version) is available!

News! The paper of CogView is accepted by NeurIPS 2021!

CogView is a pretrained (4B-param) transformer for general-domain text-to-image generation.

  • Read our paper CogView: Mastering Text-to-Image Generation via Transformers on arXiv for a formal introduction. The PB-relax and Sandwich-LN techniques can also help you train large and deep transformers stably (e.g. eliminating NaN losses).
  • Visit our demo at Github Page or Wudao! (Without post-selection or super-resolution, it currently only supports simplified Chinese input, but one can translate text from other languages into Chinese for input. Note: Wudao provides faster access for users from mainland China.)
  • Download our pretrained models from Tsinghua Cloud.
  • Cite our paper if you find our work helpful~
@article{ding2021cogview,
  title={CogView: Mastering Text-to-Image Generation via Transformers},
  author={Ding, Ming and Yang, Zhuoyi and Hong, Wenyi and Zheng, Wendi and Zhou, Chang and Yin, Da and Lin, Junyang and Zou, Xu and Shao, Zhou and Yang, Hongxia and Tang, Jie},
  journal={arXiv preprint arXiv:2105.13290},
  year={2021}
}
  • Google Colab: Two contributors successfully set up CogView on Colab. Links to Colab!

Getting Started

Setup

  • Hardware: Linux servers with Nvidia V100s or A100s are recommended, but it is also okay to run the pretrained models with a smaller --max-inference-batch-size, or to train smaller models, on less powerful GPUs.

  • Environment (Option 1): Please first install PyTorch (>=1.7.0) and apex, and then install the other dependencies via pip install -r requirements.txt. (A sketch of the apex install appears after this list.)

  • Environment (Option 2): We provide a Docker image in case you have trouble setting up the environment yourself. Pull the image, create a (background) container and get into it via:

    docker pull cogview/cuda111_torch181_deepspeed040
    ./env/start_docker.sh && docker exec -it bg-cogview bash
    
    cd /root/cogview # in the container
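For Environment (Option 1) above, here is a minimal sketch of the apex installation; apex builds the fused_layer_norm_cuda extension that several issues below ask about. This assumes a CUDA toolkit matching your PyTorch build, and the build flags follow apex's own README, which may change between apex versions:

    # Option 1 sketch: install PyTorch (>=1.7.0) first, then build apex with its CUDA
    # extensions so that fused_layer_norm_cuda is available, then the remaining deps.
    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    cd ..
    pip install -r requirements.txt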
    

Download

  1. Download the image tokenizer vqvae_hard_biggerset_011.pt from BAAI website or Tsinghua Cloud. Place the file under pretrained/vqvae.
wget 'https://cloud.tsinghua.edu.cn/f/71607a5dca69417baa8c/?dl=1' -O pretrained/vqvae/vqvae_hard_biggerset_011.pt
  2. Download models from Project Wudao-Wenhui.

    FileName              Description
    cogview-base.tar      The pretrained text-to-image model.
    cogview-caption.tar   The finetuned image-to-text model, also used for reranking.
    cogview-sr.tar        The finetuned super-resolution model. (Warning: it runs slowly.)

    Uncompress them into pretrained/cogview/. The following command should be modified based on the model name.

    tar -xvf cogview-{base, sr, caption}.tar -C pretrained/cogview/
    
  3. (Only for the training tutorial; skip it for inference.) Download a small "bird-and-animal" example dataset from our link at Tsinghua Cloud.

wget https://cloud.tsinghua.edu.cn/f/1e4963ec8ac84941ba68/?dl=1 -O data/bird_animal.bin

Run CogView! (Model Inference)

We encapsulate the generation functions into scripts. See generate_samples.py and arguments.py for details.

Text-to-Image Generation

Write text queries (one per line) into input.txt and run:

./scripts/text2image.sh --debug

The results will be saved in a new folder samples_text2image/.
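For example, input.txt could contain queries like the following, one per line (these sample queries are taken from elsewhere in this README):

    一只可爱的小猫。
    一个漂亮的女孩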

The main arguments useful for inference are listed below; a sample invocation follows the list.

  • --input-source [path or "interactive"]. The path of the input file, can also be "interactive", which will launch a CLI.
  • --output-path [path]. The folder containing the results.
  • --batch-size [int]. The number of samples generated per query.
  • --max-inference-batch-size [int]. Maximum batch size per forward. Reduce it if OOM.
  • --debug. Only save concatenated images for all generated samples, and name them by input text and date.
  • --with-id. When toggled, you must specify an "id" before each input, e.g. 001\t一个漂亮的女孩, where \t denotes a TAB (NOT a space). It will generate batch-size separate images in a folder named by the id for each input. Conflicts with --debug.
  • --device [int]. Which GPU to run on.
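Putting a few of these together (assuming the wrapper script forwards extra flags to generate_samples.py, as the --debug example above suggests), a sample invocation might look like:

    # Hypothetical example: read queries interactively, generate 4 samples per query,
    # shrink the per-forward batch to avoid OOM, and write results to a custom folder.
    ./scripts/text2image.sh --input-source interactive --batch-size 4 \
        --max-inference-batch-size 2 --output-path my_samples --device 0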

Super-resolution

Run the following script and input text\t{image_path}, where {image_path} is the path of a previously generated image.

./scripts/super_resolution.sh
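For example, once the script is running, an input line could look like the following, where the image path is a placeholder for one of your previously generated files and \t denotes a TAB:

    一只可爱的小猫。\tsamples_text2image/your_generated_image.png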

Note: It is only effective for generated images from our Image Tokenizer (due to the token distribution).

Image-to-Text

The input is one image path per line; the results are printed to stdout.

./scripts/image2text.sh
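A minimal sketch, assuming the wrapper reads the same input.txt convention as text2image.sh (check image2text.sh for the exact input path it expects); the image paths below are placeholders:

    # List previously generated images, one path per line, then run the script.
    printf 'samples_text2image/your_image_1.png\nsamples_text2image/your_image_2.png\n' > input.txt
    ./scripts/image2text.sh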

Note: The model is not optimized for this task, so it might not be very competitive (but acceptable). We may release a version finetuned on this task for a longer period in the future. (TODO)

Post-selection

This application only takes file inputs, where each line is {text}\t{image_path1}\t{image_path2}\t{image_path3}.... The output is {output_path}/scores.txt, where each line is a list of scores corresponding to the same line of the input.

./scripts/post_selection.sh
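A minimal sketch of building an input file, assuming the wrapper forwards --input-source to generate_samples.py like the other scripts (otherwise edit the input path inside post_selection.sh); the image paths are placeholders:

    # One query, then TAB-separated candidate image paths. The matching line in
    # {output_path}/scores.txt will then contain one score per candidate image.
    printf '一只可爱的小猫。\tcat_0.png\tcat_1.png\tcat_2.png\n' > post_input.txt
    ./scripts/post_selection.sh --input-source post_input.txt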

Note: In the released code, for simplicity, we did not expose the raw API, which supports some advanced generation modes, e.g. generation from text plus part of an image.

Training

Here we use the small bird-and-animal subset of our dataset for the tutorial. The binary dataset is generated by our cogdata toolkit. Please wait for a formal release of cogdata with tutorials (although it is already available now).

Single Node

After downloading the dataset, directly run

./scripts/pretrain_single_node.sh

Multiple Nodes

If you want to train the models on multiple servers interconnected by InfiniBand without a shared file system (you may need pdsh to accelerate this process):

  1. On each server, use git clone to download this repo, and make sure the data (LMDB format) are moved into the data subfolder.
  2. On each server, echo "ip1 ip2 <other IPs>" > ./docker/ip_list.txt, and then start the docker by ./env/start_docker.sh.
  3. Get into the container on the first node via docker exec -it bg-cogview bash.
  4. Get into /root/cogview and run ./scripts/pretrain_multiple_nodes.sh. You may need to change the config (especially OPTIONS_NCCL) in the shell script; a sketch of a DeepSpeed hostfile follows this list.
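For reference, the launch scripts quoted in the issues below pass --hostfile ${HOST_FILE_PATH} to DeepSpeed; DeepSpeed's hostfile format lists one node per line with its GPU slot count. A minimal sketch with placeholder hostnames:

    # Hypothetical DeepSpeed hostfile for two 8-GPU nodes.
    ip1 slots=8
    ip2 slots=8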

See arguments.py for advanced training options. (TODO)

Gallery

[more samples image]

cogview's People

Contributors

dm-thu, sleepychord, somefive, wenyihong, xujz18


cogview's Issues

Where is the code?

In your paper, you claim that the model was released open source. Unfortunately, this repo does not appear to contain any code. Do you have plans to release the code and/or correct the paper?

fused_layer_norm_cuda

游星, when training in my own environment I run into the fused_layer_norm_cuda problem. I have reinstalled apex many times but it is still not solved. When I use the docker image you provide, I get a "no GPU resources available" error. Am I doing something wrong somewhere?

colab

Please add a Google Colab for inference, thanks.

script to finetune Cogview-base

Hi, I'm trying to finetune the CogView pretrained model. However, when I try to load the model weights, I get the following error:
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for word_embeddings.weight: copying a param with shape torch.Size([14560, 2560]) from checkpoint, the shape in current model is torch.Size([14592, 2560]).

Here is my script:

NUM_WORKERS=1
NUM_GPUS_PER_WORKER=4
MP_SIZE=1

script_path=$(realpath $0)
echo $script_path
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)

OPTIONS_NCCL="NCCL_DEBUG=info"
HOST_FILE_PATH="hostfile_single"

config_json="$script_dir/ds_config_zero.json"
gpt_options="
--experiment-name cogview-test_finetune
--img-tokenizer-num-tokens 8192
--dataset-type TokenizedDataset
--model-parallel-size ${MP_SIZE}
--batch-size 4
--num-layers 48
--hidden-size 2560
--num-attention-heads 40
--save ./
--train-iters 2000
--save-interval 800
--resume-dataloader
--train-data /path/to/my/data
--split 90,5,5
--distributed-backend nccl
--lr-decay-style cosine
--warmup .1
--checkpoint-activations
--deepspeed-activation-checkpointing
--max-position-embeddings 1089
--max-memory-length 0
--fp16
--txt-loss-scale 5
--load /path/to/cogview
--no-load-rng
--model-parallel-size 2
--num-workers 16
--is-sparse 0
--finetune
--shuffle
"

gpt_options="${gpt_options}
--deepspeed
--deepspeed_config ${config_json}
"

run_cmd="${OPTIONS_NCCL} deepspeed --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --hostfile ${HOST_FILE_PATH} pretrain_gpt2.py $@ ${gpt_options}"

It will be great if you can provide some details for finetuning. Thanks!

Waiting time on the demo

Is there any way to reduce the waiting time in the demo website? I know it is working this way because people are spamming prompts, but this is a bit too much...

Attention Analysis

Excuse me, could you explain how to read the figure in Appendix C.1?

What do the vertical and horizontal axes represent?

How to finetune the CogView to perform image captioning?

Hello, I wonder how to finetune the CogView model to perform image captioning?
Here is my question:
What is the format of the input text? I notice that the format of the input text in your code is [ROI1], text, [BASE], [BOI1], image, [EOI1]. So what should I change to finetune for image captioning? Just change the format to [BASE], [BOI1], image, [EOI1], [ROI1], text, or something else?

Looking forward to your reply, thanks!

Hello! CUDA out of memory when load the pretrained model of cogview-caption

Hello! Our team plans to load the pretrained cogview-caption model and finetune it on V100s, which is consistent with what the paper says about pretraining on V100s. But we get "CUDA out of memory", and training cannot be launched until model-parallel-size is set to 4. How can we load the pretrained model and finetune it on V100s?
@neozhangthe1 @Sleepychord @lykeven @cenyk1230 @Somefive

CUDA Memory Error when finetuning the cogview-base model.

Hello. I am trying to finetune the provided cogview-base model. Following the settings in the paper, I set the number of transformer layers to 48 and the hidden size to 2560. I run the model on a machine with 8 NVIDIA Tesla V100 GPUs, but it causes a CUDA memory error even if I set the batch size to 1.

I also tried enlarging MP_SIZE to 2, but the GPU memory is still insufficient.

The settings are as follows:

#! /bin/bash

# Change for multinode config

NUM_WORKERS=1
NUM_GPUS_PER_WORKER=8
MP_SIZE=1

script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)

# OPTIONS_NCCL="NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_SOCKET_IFNAME=bond0 NCCL_IB_GID_INDEX=3 NCCL_NET_GDR_LEVEL=0"
OPTIONS_NCCL="NCCL_DEBUG=info"
HOST_FILE_PATH="hostfile_single"

config_json="$script_dir/ds_config.json"
gpt_options=" \
       --experiment-name cogview-yikai-finetune \
       --img-tokenizer-num-tokens 8192 \
       --dataset-type TokenizedDataset \
       --model-parallel-size ${MP_SIZE} \
       --num-layers 48 \
       --hidden-size 2560 \
       --num-attention-heads 40 \
       --save $main_dir/data/checkpoints \
       --train-iters 20000 \
       --resume-dataloader \
       --train-data ./data/yikai_sticker.lmdb \
       --split 949,50,1 \
       --distributed-backend nccl \
       --lr-decay-style constant \
       --lr 1e-5 \
       --warmup .1 \
       --checkpoint-activations \
       --deepspeed-activation-checkpointing \
       --max-position-embeddings 1089 \
       --max-memory-length 0 \
       --fp16 \
       --batch-size 1 \
       --load pretrained/cogview/cogview-base/
       --finetune \
       --txt-loss-scale 5 
"

gpt_options="${gpt_options}
               --deepspeed \
               --deepspeed_config ${config_json} \
"


run_cmd="${OPTIONS_NCCL} python `which deepspeed` --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --hostfile ${HOST_FILE_PATH} pretrain_gpt2.py $@ ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}

set +x


Code release date?

Is it possible to give an approximate date that the code and models will be released? By the way, this model looks amazing, thank you for planning to make it open-source.

Thanks, devetec

fused_layer_norm_cuda

Hi, I am very interested in your research. When I tried to reproduce your code, the following problem appeared and left me very confused.
Traceback (most recent call last):
File "generate_samples.py", line 326, in
main()
File "generate_samples.py", line 321, in main
model = setup_model(args)
File "generate_samples.py", line 52, in setup_model
model = get_model(args)
File "E:\CogView-main\pretrain_gpt2.py", line 65, in get_model
model = GPT2Model(num_layers=args.num_layers,
File "E:\CogView-main\model\gpt2_modeling.py", line 91, in init
self.transformer = mpu.GPT2ParallelTransformer(num_layers,
File "E:\CogView-main\mpu\sparse_transformer.py", line 460, in init
[get_layer(layer_id) for layer_id in range(num_layers)])
File "E:\CogView-main\mpu\sparse_transformer.py", line 460, in
[get_layer(layer_id) for layer_id in range(num_layers)])
File "E:\CogView-main\mpu\sparse_transformer.py", line 441, in get_layer
return GPT2ParallelTransformerLayer(
File "E:\CogView-main\mpu\sparse_transformer.py", line 283, in init
self.input_layernorm = LayerNorm(hidden_size, eps=layernorm_epsilon)
File "E:\CogView-main\mpu\sparse_transformer.py", line 42, in init
super().init(*args, **kwargs)
File "D:\software\Anaconda3\envs\syq\lib\site-packages\apex\normalization\fuse
d_layer_norm.py", line 133, in init
fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
File "D:\software\Anaconda3\envs\syq\lib\importlib_init_.py", line 127, in
import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in _find_and_load
File "", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

The math in this paper

I would like to ask about the math in this paper:

  1. Is there a more detailed derivation of the ELBO after the text is added? Is it simply adding an NLL loss for the text to both sides of the image-only ELBO inequality?
    It looks to me like the text plays no role in the VQVAE training process, and this ELBO is for the VQVAE, so I don't quite understand why there is a loss term for the text.
  2. I don't quite understand how Eq. (2) becomes Eq. (3).

Thanks

vqvae pretrained model

Hi, thank you for your excellent work!
How can I train my own VQVAE or VQGAN model?

Got error 'IndexError: tuple index out of range' running super-res on Colab with a Tesla V100

/content/CogView
Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1

using dynamic loss scaling
initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
padded vocab (size: 58219) with 21 dummy tokens (new size: 58240)
prepare tokenizer done
building CogView2 model ...
number of parameters on model parallel rank 0: 3928849920
current device: 0
tcmalloc: large alloc 7881007104 bytes == 0x5637e3fb2000 @ 0x7f61e428db6b 0x7f61e42ad379 0x7f6171f1e25e 0x7f6171f1f9d2 0x7f61aff48e7d 0x7f61c0b43120 0x7f61c0781bd9 0x5637152088a8 0x56371527bfd5 0x5637152767ad 0x5637152093ea 0x5637152773b5 0x5637152767ad 0x563715209003 0x563715208b09 0x56371535028d 0x5637152bf1db 0x563715207bb1 0x5637152f8fed 0x56371527b988 0x5637152767ad 0x563715148e2c 0x563715278bb5 0x5637152764ae 0x5637152093ea 0x56371527832a 0x56371520930a 0x5637152773b5 0x56371520930a 0x5637152773b5 0x5637152764ae
Load model file pretrained/cogview/cogview-sr/20000/mp_rank_00_model_states.pt
Working on No. 0 on 0...
Traceback (most recent call last):
File "generate_samples.py", line 326, in
main()
File "generate_samples.py", line 323, in main
generate_images_continually(model, args)
File "generate_samples.py", line 215, in generate_images_continually
for raw_text, seq, output_path in get_context(args, query_template):
File "generate_samples.py", line 132, in get_context
seq = _parse_and_to_tensor(raw_text, img_size=img_size, query_template=query_template)
File "generate_samples.py", line 70, in _parse_and_to_tensor
text = query_template.format(*text.split('\t'))
IndexError: tuple index out of range
/content

A method to prevent generating watermark

You can use "HD photo" style or directly add ",高清图像。"(HD image) at the end of the text (default style).
This will greatly reduce the probability to generate watermarks to nearly zero, but not absolute.
Have fun with CogView.
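For example, taking a query used earlier in this README and appending the suggested suffix:

    一个漂亮的女孩,高清图像。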

Why can finetuning be done on one DGX?

The paper says pretraining used 512 V100s, but finetuning can be done on a single DGX?
Was any pruning or distillation applied?
What makes this difference?
And what parallelization do you use in training?

Congrats!

Great work with this whole project! Can't wait for the future.

docker pull failed

Running docker pull cogview/cuda111_torch181_deepspeed040 gives the following result:
Using default tag: latest

Error response from daemon: manifest for cogview/cuda111_torch181_deepspeed040:latest not found

Another question: could you tell me which versions of torch and CUDA (nvcc -V) you use?

CUDA Memory Error when finetuning the cogview-caption model

I run the model on a machine with 8 NVIDIA Tesla V100 16GB GPUs.
Here is my script:
#! /bin/bash

# Change for multinode config

NUM_WORKERS=1
NUM_GPUS_PER_WORKER=8
MP_SIZE=1

script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)

OPTIONS_NCCL="NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_SOCKET_IFNAME=bond0 NCCL_IB_GID_INDEX=3 NCCL_NET_GDR_LEVEL=0"

OPTIONS_NCCL="NCCL_DEBUG=info"
HOST_FILE_PATH="hostfile_single"

config_json="$script_dir/ds_config_zero.json"
gpt_options="
--experiment-name cogview-caption
--img-tokenizer-num-tokens 8192
--dataset-type CompactBinaryDataset
--model-parallel-size ${MP_SIZE}
--num-layers 48
--hidden-size 2560
--num-attention-heads 40
--save $main_dir/data/checkpoints
--train-iters 200
--resume-dataloader
--train-data ./data/merge.bin
--split 949,50,1
--distributed-backend nccl
--lr-decay-style constant
--warmup .1
--load pretrained/cogview/cogview-caption/
--finetune
--checkpoint-activations
--deepspeed-activation-checkpointing
--max-position-embeddings 1089
--max-memory-length 0
--fp16
--txt-loss-scale 5
"

gpt_options="${gpt_options}
--deepspeed
--deepspeed_config ${config_json}
"

run_cmd="${OPTIONS_NCCL} deepspeed --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --hostfile ${HOST_FILE_PATH} pretrain_gpt2.py $@ ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}

set +x

deepspeed:
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "steps_per_print": 1,
  "gradient_clipping": 0.1,
  "zero_optimization": {
    "stage": 2,
    "cpu_offload": false,
    "contiguous_gradients": false,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 100000000,
    "allgather_bucket_size": 1000000000
  },
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 400,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.00005,
      "betas": [0.9, 0.95],
      "eps": 1e-8,
      "weight_decay": 4e-2
    }
  },
  "activation_checkpointing": {
    "partition_activations": false,
    "contiguous_memory_optimization": false
  },
  "wall_clock_breakdown": false
}
@Sleepychord

Layernorm form in paper

The formulation of layernorm in the paper multiplies by sqrt(d) compared to the layernorm introduced in the PyTorch documentation. Why add this multiplication? Thank you.

Finetuning image-to-text generation

Which part of the code should be modified to finetune image-to-text generation, or should the dataset format be changed instead?

Out of memory when using Text 2 Image

[Wed Jun 16 19:21:01 2021] Memory cgroup out of memory: Killed process 15052 (python3) total-vm:19700460kB, anon-rss:11917232kB, file-rss:89696kB, shmem-rss:12288kB, UID:0 pgtables:25896kB oom_score_adj:0

This is what I get after the process has been killed.
Is there a way to optimize this to run on a GPU with less RAM? I'm using a Tesla T4 on Google Colab.

Thanks.

pretrain and finetune loss and lr

Two questions please, thanks:
1. How does the loss decrease during pretraining? What are the initial values, and what do they eventually converge to? What does this look like during finetuning?
2. The learning rate is usually closely tied to the batch size; roughly what batch size and learning rate did you use during finetuning?

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

The environment is the docker image provided by the authors.
The GPU is a Tesla P100-PCIE 16GB.
The error occurs when running ./scripts/text2image.sh --debug.
The error output is as follows:
Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1

using dynamic loss scaling
initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
padded vocab (size: 58219) with 21 dummy tokens (new size: 58240)
prepare tokenizer done
building CogView2 model ...
number of parameters on model parallel rank 0: 3928849920
current device: 1
Load model file pretrained/cogview/cogview-base/142000/mp_rank_00_model_states.pt
Working on No. 0 on 0...
show raw text: 一只可爱的小猫。
Traceback (most recent call last):
File "generate_samples.py", line 329, in
main()
File "generate_samples.py", line 326, in main
generate_images_continually(model, args)
File "generate_samples.py", line 221, in generate_images_continually
generate_images_once(model, args, raw_text, seq, num=args.batch_size, output_path=output_path)
File "generate_samples.py", line 166, in generate_images_once
output_tokens_list.append(filling_sequence(model, seq.clone(), args))
File "/root/cogview/generation/sampling.py", line 128, in filling_sequence
logits, *mems = model(tokens, position_ids, attention_mask, txt_indices_bool, img_indices_bool, is_sparse=args.is_sparse, *mems)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(input, **kwargs)
File "/root/cogview/fp16/fp16.py", line 65, in forward
return fp16_to_fp32(self.module(
(fp32_to_fp16(inputs)), **kwargs))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/model/gpt2_modeling.py", line 112, in forward
transformer_output = self.transformer(embeddings, position_ids, attention_mask, txt_indices_bool, img_indices_bool, is_sparse, *mems)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/sparse_transformer.py", line 604, in forward
hidden_states = layer(*args, mem=mem_i)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/sparse_transformer.py", line 322, in forward
attention_output = self.attention(layernorm_output1, ltor_mask, pivot_idx, is_sparse, mem)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/sparse_transformer.py", line 166, in forward
output = self.dense(context_layer)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/layers.py", line 319, in forward
output_parallel = F.linear(input_parallel, self.weight)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
I hope someone can answer this question for me, thanks.

EOFError: Ran out of input while loading VQVAETokenizer

It happens both locally and on the colab
Full traceback:
Traceback (most recent call last):
  File "generate_samples.py", line 326, in <module>
    main()
  File "generate_samples.py", line 318, in main
    tokenizer = prepare_tokenizer(args)
  File "generate_samples.py", line 276, in prepare_tokenizer
    tokenizer = get_tokenizer(args)
  File "/home/bohdan_pytaichuk/CogView/CogViewMain/CogView/data_utils/unified_tokenizer.py", line 202, in get_tokenizer
    get_tokenizer.tokenizer = UnifiedTokenizer(
  File "/home/bohdan_pytaichuk/CogView/CogViewMain/CogView/data_utils/unified_tokenizer.py", line 30, in __init__
    self.img_tokenizer = VQVAETokenizer(model_path=img_tokenizer_path, device=self.device)
  File "/home/bohdan_pytaichuk/CogView/CogViewMain/CogView/data_utils/vqvae_tokenizer.py", line 38, in __init__
    ckpt = torch.load(model_path, map_location=torch.device(device))
  File "/home/bohdan_pytaichuk/env/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/bohdan_pytaichuk/env/lib/python3.8/site-packages/torch/serialization.py", line 762, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Filter activating with anything

I don't know how long it has been like this, but it seems that in the demo, the sensitive-material filter has started activating on any kind of prompt. I tested "Surrealism", "3D Game", "Dancing Skeleton Toy" and "Golden Sphere". These prompts used to work, but now they trigger the filter.

Can't apply to download model.

Good afternoon,

I tried to register on wudaoai and put in my information so I could download the model, but Wudaoai did not accept my phone number, probably because I am from Brazil. I had to look for a Chinese phone number online to apply, but I doubt they will allow me to download the model because of this. What can I do?

Colab error

I found a Colab file referenced in a closed issue. The last cell (inference) shows this error. Does anyone know of a solution or a more recent Colab notebook? The web-based version of CogView works but takes a while to process queued requests.
I used the "insert code" icon when editing this and for some reason the output is all run together, making it unreadable.

/content/CogView
Traceback (most recent call last):
  File "generate_samples.py", line 28, in <module>
    from utils import Timers
  File "/content/CogView/utils.py", line 25, in <module>
    from fp16 import FP16_Optimizer
  File "/content/CogView/fp16/__init__.py", line 15, in <module>
    from .fp16util import (
  File "/content/CogView/fp16/fp16util.py", line 21, in <module>
    import mpu
  File "/content/CogView/mpu/__init__.py", line 35, in <module>
    from .layers import ColumnParallelLinear
  File "/content/CogView/mpu/layers.py", line 28, in <module>
    from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm
ModuleNotFoundError: No module named 'apex'

New cogview model

Will the new one have its own repository? Or will it be released here along with the model + generation script?

Awesome work as always!

docker pull error, "You have reached your pull rate limit"

Is there any possibility to share the Dockerfile?

I was trying to use docker since it's difficult to build the apex library. However, when I ran docker pull as below,
"docker pull cogview/cuda111_torch181_deepspeed040"

I got the following error message:
"Using default tag: latest
Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"

Therefore, an original Dockerfile may be better.
