
cogview's Introduction

Generate vivid Images for Any (Chinese) text

[teaser image]

News! The paper of ImageReward is accepted by NeurIPS 2023!

News! The codes of ImageReward (paper link) have been released at https://github.com/THUDM/ImageReward! ImageReward is the first general-purpose text-to-image human preference RM.

News! The codes of CogView2 (paper link) have been released at https://github.com/THUDM/CogView2!

News! The demo for a better and faster CogView2 (formal version, March 2022) is available! The latest model also supports English input, but translating it into Chinese often yields better results.

News! The demo for a better and faster CogView2 (new version) is available!

News! The paper of CogView is accepted by NeurIPS 2021!

CogView is a pretrained (4B-param) transformer for general-domain text-to-image generation.

  • Read our paper CogView: Mastering Text-to-Image Generation via Transformers on arXiv for a formal introduction. The PB-relax and Sandwich-LN techniques can also help you train large and deep transformers stably (e.g. eliminating NaN losses).
  • Visit our demo at Github Page or Wudao! (Without post-selection or super-resolution, it currently only supports simplified Chinese input, but one can translate text from other languages into Chinese for input. Note: Wudao provides faster access for users from mainland China.)
  • Download our pretrained models from Tsinghua Cloud.
  • Cite our paper if you find our work helpful~
@article{ding2021cogview,
  title={CogView: Mastering Text-to-Image Generation via Transformers},
  author={Ding, Ming and Yang, Zhuoyi and Hong, Wenyi and Zheng, Wendi and Zhou, Chang and Yin, Da and Lin, Junyang and Zou, Xu and Shao, Zhou and Yang, Hongxia and Tang, Jie},
  journal={arXiv preprint arXiv:2105.13290},
  year={2021}
}
  • Google Colab: Two contributors successfully set up CogView on Colab. Links to Colab!

Getting Started

Setup

  • Hardware: Linux servers with Nvidia V100s or A100s are recommended, but it is also okay to run the pretrained models with a smaller --max-inference-batch-size, or to train smaller models, on less powerful GPUs.

  • Environment (Option 1): Please first install PyTorch (>=1.7.0) and apex, and then install the other dependencies via pip install -r requirements.txt. (A sketch of the apex install appears after this list.)

  • Environment (Option 2): We provide a Docker image in case you have trouble setting up the environment yourself. Pull the image, create a (background) container and get into it via:

    docker pull cogview/cuda111_torch181_deepspeed040
    ./env/start_docker.sh && docker exec -it bg-cogview bash
    
    cd /root/cogview # in the container
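For Environment (Option 1) above, here is a minimal sketch of the apex installation; apex builds the fused_layer_norm_cuda extension that several issues below ask about. This assumes a CUDA toolkit matching your PyTorch build, and the build flags follow apex's own README, which may change between apex versions:

    # Option 1 sketch: install PyTorch (>=1.7.0) first, then build apex with its CUDA
    # extensions so that fused_layer_norm_cuda is available, then the remaining deps.
    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    cd ..
    pip install -r requirements.txt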
    

Download

  1. Download the image tokenizer vqvae_hard_biggerset_011.pt from BAAI website or Tsinghua Cloud. Place the file under pretrained/vqvae.
wget 'https://cloud.tsinghua.edu.cn/f/71607a5dca69417baa8c/?dl=1' -O pretrained/vqvae/vqvae_hard_biggerset_011.pt
  2. Download models from Project Wudao-Wenhui.

    FileName              Description
    cogview-base.tar      The pretrained text-to-image model.
    cogview-caption.tar   The finetuned image-to-text model, also used for reranking.
    cogview-sr.tar        The finetuned super-resolution model. (Warning: it runs slowly.)

    Uncompress them into pretrained/cogview/. The following command should be modified based on the model name.

    tar -xvf cogview-{base, sr, caption}.tar -C pretrained/cogview/
    
  3. (Only for the training tutorial; skip it for inference.) Download a small "bird-and-animal" example dataset from our link at Tsinghua Cloud.

wget https://cloud.tsinghua.edu.cn/f/1e4963ec8ac84941ba68/?dl=1 -O data/bird_animal.bin

Run CogView! (Model Inference)

We encapsulate the generation functions into scripts. See generate_samples.py and arguments.py for details.

Text-to-Image Generation

Write text queries (one per line) into input.txt and run:

./scripts/text2image.sh --debug

The results will be saved in a new folder samples_text2image/.
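For example, input.txt could contain queries like the following, one per line (these sample queries are taken from elsewhere in this README):

    一只可爱的小猫。
    一个漂亮的女孩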

The main arguments useful for inference are listed below; a sample invocation follows the list.

  • --input-source [path or "interactive"]. The path of the input file, can also be "interactive", which will launch a CLI.
  • --output-path [path]. The folder containing the results.
  • --batch-size [int]. The number of samples generated per query.
  • --max-inference-batch-size [int]. Maximum batch size per forward. Reduce it if OOM.
  • --debug. Only save concatenated images for all generated samples, and name them by input text and date.
  • --with-id. When toggled, you must specify an "id" before each input, e.g. 001\t一个漂亮的女孩, where \t denotes a TAB (NOT a space). It will generate batch-size separate images in a folder named by the id for each input. Conflicts with --debug.
  • --device [int]. Which GPU to run on.
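Putting a few of these together (assuming the wrapper script forwards extra flags to generate_samples.py, as the --debug example above suggests), a sample invocation might look like:

    # Hypothetical example: read queries interactively, generate 4 samples per query,
    # shrink the per-forward batch to avoid OOM, and write results to a custom folder.
    ./scripts/text2image.sh --input-source interactive --batch-size 4 \
        --max-inference-batch-size 2 --output-path my_samples --device 0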

Super-resolution

Run the following script and input text\t{image_path}, where {image_path} is the path of a previously generated image.

./scripts/super_resolution.sh
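For example, once the script is running, an input line could look like the following, where the image path is a placeholder for one of your previously generated files and \t denotes a TAB:

    一只可爱的小猫。\tsamples_text2image/your_generated_image.png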

Note: It is only effective for generated images from our Image Tokenizer (due to the token distribution).

Image-to-Text

The input is one image path per line; the results are printed to stdout.

./scripts/image2text.sh
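A minimal sketch, assuming the wrapper reads the same input.txt convention as text2image.sh (check image2text.sh for the exact input path it expects); the image paths below are placeholders:

    # List previously generated images, one path per line, then run the script.
    printf 'samples_text2image/your_image_1.png\nsamples_text2image/your_image_2.png\n' > input.txt
    ./scripts/image2text.sh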

Note: The model is not optimized for this task, so it might not be very competitive (but acceptable). We may release a version finetuned on this task for a longer period in the future. (TODO)

Post-selection

This application only takes file inputs, where each line is {text}\t{image_path1}\t{image_path2}\t{image_path3}.... The output is {output_path}/scores.txt, where each line is a list of scores corresponding to the same line of the input.

./scripts/post_selection.sh
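A minimal sketch of building an input file, assuming the wrapper forwards --input-source to generate_samples.py like the other scripts (otherwise edit the input path inside post_selection.sh); the image paths are placeholders:

    # One query, then TAB-separated candidate image paths. The matching line in
    # {output_path}/scores.txt will then contain one score per candidate image.
    printf '一只可爱的小猫。\tcat_0.png\tcat_1.png\tcat_2.png\n' > post_input.txt
    ./scripts/post_selection.sh --input-source post_input.txt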

Note: In the released code, for simplicity, we did not expose the raw API, which supports some advanced generation modes, e.g. generation from text plus part of an image.

Training

Here we use the small bird-and-animal subset of our dataset for the tutorial. The binary dataset is generated by our cogdata toolkit. Please wait for a formal release of cogdata with tutorials (although it is already available now).

Single Node

After downloading the dataset, directly run

./scripts/pretrain_single_node.sh

Multiple Nodes

If you want to train the models on multiple servers interconnected by InfiniBand without a shared file system (you may need pdsh to accelerate this process):

  1. On each server, use git clone to download this repo, and make sure the data (LMDB format) are moved into the data subfolder.
  2. On each server, echo "ip1 ip2 <other IPs>" > ./docker/ip_list.txt, and then start the docker by ./env/start_docker.sh.
  3. Get into the container on the first node via docker exec -it bg-cogview bash.
  4. Get into /root/cogview and run ./scripts/pretrain_multiple_nodes.sh. You may need to change the config (especially OPTIONS_NCCL) in the shell script; a sketch of a DeepSpeed hostfile follows this list.
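For reference, the launch scripts quoted in the issues below pass --hostfile ${HOST_FILE_PATH} to DeepSpeed; DeepSpeed's hostfile format lists one node per line with its GPU slot count. A minimal sketch with placeholder hostnames:

    # Hypothetical DeepSpeed hostfile for two 8-GPU nodes.
    ip1 slots=8
    ip2 slots=8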

See arguments.py for advanced training options. (TODO)

Gallery

[more samples image]

cogview's People

Contributors

dm-thu, sleepychord, somefive, wenyihong, xujz18


cogview's Issues

Where is the code?

In your paper, you claim that the model was released open source. Unfortunately, this repo does not appear to contain any code. Do you have plans to release the code and/or correct the paper?

fused_layer_norm_cuda

游星, when training in my own environment I run into the fused_layer_norm_cuda problem. I have reinstalled apex many times but it is still not solved. When I use the docker image you provide, I get a "no GPU resources available" error. Am I doing something wrong somewhere?

colab

Please add a Google Colab for inference, thanks.

script to finetune Cogview-base

Hi, I'm trying to finetune the CogView pretrained model. However, when I try to load the model weights, I get the following error:
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for word_embeddings.weight: copying a param with shape torch.Size([14560, 2560]) from checkpoint, the shape in current model is torch.Size([14592, 2560]).

Here is my script:

NUM_WORKERS=1
NUM_GPUS_PER_WORKER=4
MP_SIZE=1

script_path=$(realpath $0)
echo $script_path
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)

OPTIONS_NCCL="NCCL_DEBUG=info"
HOST_FILE_PATH="hostfile_single"

config_json="$script_dir/ds_config_zero.json"
gpt_options="
--experiment-name cogview-test_finetune
--img-tokenizer-num-tokens 8192
--dataset-type TokenizedDataset
--model-parallel-size ${MP_SIZE}
--batch-size 4
--num-layers 48
--hidden-size 2560
--num-attention-heads 40
--save ./
--train-iters 2000
--save-interval 800
--resume-dataloader
--train-data /path/to/my/data
--split 90,5,5
--distributed-backend nccl
--lr-decay-style cosine
--warmup .1
--checkpoint-activations
--deepspeed-activation-checkpointing
--max-position-embeddings 1089
--max-memory-length 0
--fp16
--txt-loss-scale 5
--load /path/to/cogview
--no-load-rng
--model-parallel-size 2
--num-workers 16
--is-sparse 0
--finetune
--shuffle
"

gpt_options="${gpt_options}
--deepspeed
--deepspeed_config ${config_json}
"

run_cmd="${OPTIONS_NCCL} deepspeed --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --hostfile ${HOST_FILE_PATH} pretrain_gpt2.py $@ ${gpt_options}"

It will be great if you can provide some details for finetuning. Thanks!

Waiting time on the demo

Is there any way to reduce the waiting time in the demo website? I know it is working this way because people are spamming prompts, but this is a bit too much...

Attention Analysis

Excuse me, could you explain how to read the figure in Appendix C.1?

What do the vertical and horizontal axes represent?

How to finetune the CogView to perform image captioning?

Hello, I wonder how to finetune the CogView model to perform image captioning?
Here is my question:
What is the format of the input text? I notice that the format of the input text in your code is [ROI1], text, [BASE], [BOI1], image, [EOI1]. So what should I change to finetune for image captioning? Just change the format to [BASE], [BOI1], image, [EOI1], [ROI1], text, or something else?

Looking forward to your reply, thanks!

Hello! CUDA out of memory when load the pretrained model of cogview-caption

Hello! Our team plans to load the pretrained cogview-caption model and finetune it on V100s, which is consistent with what the paper says about pretraining on V100s. But we get "CUDA out of memory", and training cannot be launched until model-parallel-size is set to 4. How can we load the pretrained model and finetune it on V100s?
@neozhangthe1 @Sleepychord @lykeven @cenyk1230 @Somefive

CUDA Memory Error when finetuning the cogview-base model.

Hello. I am trying to finetune the provided cogview-base model. Following the settings in the paper, I set the number of transformer layers to 48 and the hidden size to 2560. I run the model on a machine with 8 NVIDIA Tesla V100 GPUs, but it causes a CUDA memory error even if I set the batch size to 1.

I also tried enlarging MP_SIZE to 2, but the GPU memory is still insufficient.

The settings are as follows:

#! /bin/bash

# Change for multinode config

NUM_WORKERS=1
NUM_GPUS_PER_WORKER=8
MP_SIZE=1

script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)

# OPTIONS_NCCL="NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_SOCKET_IFNAME=bond0 NCCL_IB_GID_INDEX=3 NCCL_NET_GDR_LEVEL=0"
OPTIONS_NCCL="NCCL_DEBUG=info"
HOST_FILE_PATH="hostfile_single"

config_json="$script_dir/ds_config.json"
gpt_options=" \
       --experiment-name cogview-yikai-finetune \
       --img-tokenizer-num-tokens 8192 \
       --dataset-type TokenizedDataset \
       --model-parallel-size ${MP_SIZE} \
       --num-layers 48 \
       --hidden-size 2560 \
       --num-attention-heads 40 \
       --save $main_dir/data/checkpoints \
       --train-iters 20000 \
       --resume-dataloader \
       --train-data ./data/yikai_sticker.lmdb \
       --split 949,50,1 \
       --distributed-backend nccl \
       --lr-decay-style constant \
       --lr 1e-5 \
       --warmup .1 \
       --checkpoint-activations \
       --deepspeed-activation-checkpointing \
       --max-position-embeddings 1089 \
       --max-memory-length 0 \
       --fp16 \
       --batch-size 1 \
       --load pretrained/cogview/cogview-base/
       --finetune \
       --txt-loss-scale 5 
"

gpt_options="${gpt_options}
               --deepspeed \
               --deepspeed_config ${config_json} \
"


run_cmd="${OPTIONS_NCCL} python `which deepspeed` --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --hostfile ${HOST_FILE_PATH} pretrain_gpt2.py $@ ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}

set +x


Code release date?

Is it possible to give an approximate date that the code and models will be released? By the way, this model looks amazing, thank you for planning to make it open-source.

Thanks, devetec

fused_layer_norm_cuda

Hi, I am very interested in your research. When I tried to reproduce your code, the following problem appeared and left me very confused.
Traceback (most recent call last):
File "generate_samples.py", line 326, in
main()
File "generate_samples.py", line 321, in main
model = setup_model(args)
File "generate_samples.py", line 52, in setup_model
model = get_model(args)
File "E:\CogView-main\pretrain_gpt2.py", line 65, in get_model
model = GPT2Model(num_layers=args.num_layers,
File "E:\CogView-main\model\gpt2_modeling.py", line 91, in init
self.transformer = mpu.GPT2ParallelTransformer(num_layers,
File "E:\CogView-main\mpu\sparse_transformer.py", line 460, in init
[get_layer(layer_id) for layer_id in range(num_layers)])
File "E:\CogView-main\mpu\sparse_transformer.py", line 460, in
[get_layer(layer_id) for layer_id in range(num_layers)])
File "E:\CogView-main\mpu\sparse_transformer.py", line 441, in get_layer
return GPT2ParallelTransformerLayer(
File "E:\CogView-main\mpu\sparse_transformer.py", line 283, in init
self.input_layernorm = LayerNorm(hidden_size, eps=layernorm_epsilon)
File "E:\CogView-main\mpu\sparse_transformer.py", line 42, in init
super().init(*args, **kwargs)
File "D:\software\Anaconda3\envs\syq\lib\site-packages\apex\normalization\fuse
d_layer_norm.py", line 133, in init
fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
File "D:\software\Anaconda3\envs\syq\lib\importlib_init_.py", line 127, in
import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in _find_and_load
File "", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

The math in this paper

I would like to ask about the math in this paper:

  1. Is there a more detailed derivation of the ELBO after the text is added? Is it simply adding an NLL loss for the text to both sides of the image-only ELBO inequality?
    It looks to me like the text plays no role in the VQVAE training process, and this ELBO is for the VQVAE, so I don't quite understand why there is a loss term for the text.
  2. I don't quite understand how Eq. (2) becomes Eq. (3).

Thanks

vqvae pretrained model

Hi, thank you for your excellent work!
How can I train my own VQVAE or VQGAN model?

Got error 'IndexError: tuple index out of range' running super-res on Colab with a Tesla V100

/content/CogView
Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1

using dynamic loss scaling
initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
padded vocab (size: 58219) with 21 dummy tokens (new size: 58240)
prepare tokenizer done
building CogView2 model ...
number of parameters on model parallel rank 0: 3928849920
current device: 0
tcmalloc: large alloc 7881007104 bytes == 0x5637e3fb2000 @ 0x7f61e428db6b 0x7f61e42ad379 0x7f6171f1e25e 0x7f6171f1f9d2 0x7f61aff48e7d 0x7f61c0b43120 0x7f61c0781bd9 0x5637152088a8 0x56371527bfd5 0x5637152767ad 0x5637152093ea 0x5637152773b5 0x5637152767ad 0x563715209003 0x563715208b09 0x56371535028d 0x5637152bf1db 0x563715207bb1 0x5637152f8fed 0x56371527b988 0x5637152767ad 0x563715148e2c 0x563715278bb5 0x5637152764ae 0x5637152093ea 0x56371527832a 0x56371520930a 0x5637152773b5 0x56371520930a 0x5637152773b5 0x5637152764ae
Load model file pretrained/cogview/cogview-sr/20000/mp_rank_00_model_states.pt
Working on No. 0 on 0...
Traceback (most recent call last):
File "generate_samples.py", line 326, in
main()
File "generate_samples.py", line 323, in main
generate_images_continually(model, args)
File "generate_samples.py", line 215, in generate_images_continually
for raw_text, seq, output_path in get_context(args, query_template):
File "generate_samples.py", line 132, in get_context
seq = _parse_and_to_tensor(raw_text, img_size=img_size, query_template=query_template)
File "generate_samples.py", line 70, in _parse_and_to_tensor
text = query_template.format(*text.split('\t'))
IndexError: tuple index out of range
/content

A method to prevent generating watermark

You can use "HD photo" style or directly add ",高清图像。"(HD image) at the end of the text (default style).
This will greatly reduce the probability to generate watermarks to nearly zero, but not absolute.
Have fun with CogView.
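For example, taking a query used earlier in this README and appending the suggested suffix:

    一个漂亮的女孩,高清图像。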

Why can finetuning be done on one DGX?

The paper says pretraining used 512 V100s, but finetuning can be done on a single DGX?
Was any pruning or distillation applied?
What makes this difference?
And what parallelization do you use in training?

Congrats!

Great work with this whole project! Can't wait for the future.

docker pull failed

Running docker pull cogview/cuda111_torch181_deepspeed040 gives the following result:
Using default tag: latest

Error response from daemon: manifest for cogview/cuda111_torch181_deepspeed040:latest not found

Another question: could you tell me which versions of torch and CUDA (nvcc -V) you use?

CUDA Memory Error when finetuning the cogview-caption model

I run the model on a machine with 8 NVIDIA Tesla V100 16GB GPUs.
Here is my script:
#! /bin/bash

# Change for multinode config

NUM_WORKERS=1
NUM_GPUS_PER_WORKER=8
MP_SIZE=1

script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)

OPTIONS_NCCL="NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_SOCKET_IFNAME=bond0 NCCL_IB_GID_INDEX=3 NCCL_NET_GDR_LEVEL=0"

OPTIONS_NCCL="NCCL_DEBUG=info"
HOST_FILE_PATH="hostfile_single"

config_json="$script_dir/ds_config_zero.json"
gpt_options="
--experiment-name cogview-caption
--img-tokenizer-num-tokens 8192
--dataset-type CompactBinaryDataset
--model-parallel-size ${MP_SIZE}
--num-layers 48
--hidden-size 2560
--num-attention-heads 40
--save $main_dir/data/checkpoints
--train-iters 200
--resume-dataloader
--train-data ./data/merge.bin
--split 949,50,1
--distributed-backend nccl
--lr-decay-style constant
--warmup .1
--load pretrained/cogview/cogview-caption/
--finetune
--checkpoint-activations
--deepspeed-activation-checkpointing
--max-position-embeddings 1089
--max-memory-length 0
--fp16
--txt-loss-scale 5
"

gpt_options="${gpt_options}
--deepspeed
--deepspeed_config ${config_json}
"

run_cmd="${OPTIONS_NCCL} deepspeed --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --hostfile ${HOST_FILE_PATH} pretrain_gpt2.py $@ ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}

set +x

deepspeed:
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "steps_per_print": 1,
  "gradient_clipping": 0.1,
  "zero_optimization": {
    "stage": 2,
    "cpu_offload": false,
    "contiguous_gradients": false,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 100000000,
    "allgather_bucket_size": 1000000000
  },
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 400,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.00005,
      "betas": [0.9, 0.95],
      "eps": 1e-8,
      "weight_decay": 4e-2
    }
  },
  "activation_checkpointing": {
    "partition_activations": false,
    "contiguous_memory_optimization": false
  },
  "wall_clock_breakdown": false
}
@Sleepychord

Layernorm form in paper

The formulation of layernorm in the paper multiplies by sqrt(d) compared to the layernorm introduced in the PyTorch documentation. Why add this multiplication? Thank you.

Finetuning image-to-text generation

Which part of the code should be modified to finetune image-to-text generation, or should the dataset format be changed instead?

Out of memory when using Text 2 Image

[Wed Jun 16 19:21:01 2021] Memory cgroup out of memory: Killed process 15052 (python3) total-vm:19700460kB, anon-rss:11917232kB, file-rss:89696kB, shmem-rss:12288kB, UID:0 pgtables:25896kB oom_score_adj:0

This is what I get after the process has been killed.
Is there a way to optimize this to run on a GPU with less RAM? I'm using a Tesla T4 on Google Colab.

Thanks.

pretrain and finetune loss and lr

Two questions please, thanks:
1. How does the loss decrease during pretraining? What are the initial values, and what do they eventually converge to? What does this look like during finetuning?
2. The learning rate is usually closely tied to the batch size; roughly what batch size and learning rate did you use during finetuning?

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

The environment is the docker image provided by the authors.
The GPU is a Tesla P100-PCIE 16GB.
The error occurs when running ./scripts/text2image.sh --debug.
The error output is as follows:
Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1

using dynamic loss scaling
initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
padded vocab (size: 58219) with 21 dummy tokens (new size: 58240)
prepare tokenizer done
building CogView2 model ...
number of parameters on model parallel rank 0: 3928849920
current device: 1
Load model file pretrained/cogview/cogview-base/142000/mp_rank_00_model_states.pt
Working on No. 0 on 0...
show raw text: 一只可爱的小猫。
Traceback (most recent call last):
File "generate_samples.py", line 329, in
main()
File "generate_samples.py", line 326, in main
generate_images_continually(model, args)
File "generate_samples.py", line 221, in generate_images_continually
generate_images_once(model, args, raw_text, seq, num=args.batch_size, output_path=output_path)
File "generate_samples.py", line 166, in generate_images_once
output_tokens_list.append(filling_sequence(model, seq.clone(), args))
File "/root/cogview/generation/sampling.py", line 128, in filling_sequence
logits, *mems = model(tokens, position_ids, attention_mask, txt_indices_bool, img_indices_bool, is_sparse=args.is_sparse, *mems)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(input, **kwargs)
File "/root/cogview/fp16/fp16.py", line 65, in forward
return fp16_to_fp32(self.module(
(fp32_to_fp16(inputs)), **kwargs))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/model/gpt2_modeling.py", line 112, in forward
transformer_output = self.transformer(embeddings, position_ids, attention_mask, txt_indices_bool, img_indices_bool, is_sparse, *mems)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/sparse_transformer.py", line 604, in forward
hidden_states = layer(*args, mem=mem_i)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/sparse_transformer.py", line 322, in forward
attention_output = self.attention(layernorm_output1, ltor_mask, pivot_idx, is_sparse, mem)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/sparse_transformer.py", line 166, in forward
output = self.dense(context_layer)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/cogview/mpu/layers.py", line 319, in forward
output_parallel = F.linear(input_parallel, self.weight)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
I hope someone can answer this question for me, thanks.

EOFError: Ran out of input while loading VQVAETokenizer

It happens both locally and on the colab
Full traceback:
Traceback (most recent call last):
  File "generate_samples.py", line 326, in <module>
    main()
  File "generate_samples.py", line 318, in main
    tokenizer = prepare_tokenizer(args)
  File "generate_samples.py", line 276, in prepare_tokenizer
    tokenizer = get_tokenizer(args)
  File "/home/bohdan_pytaichuk/CogView/CogViewMain/CogView/data_utils/unified_tokenizer.py", line 202, in get_tokenizer
    get_tokenizer.tokenizer = UnifiedTokenizer(
  File "/home/bohdan_pytaichuk/CogView/CogViewMain/CogView/data_utils/unified_tokenizer.py", line 30, in __init__
    self.img_tokenizer = VQVAETokenizer(model_path=img_tokenizer_path, device=self.device)
  File "/home/bohdan_pytaichuk/CogView/CogViewMain/CogView/data_utils/vqvae_tokenizer.py", line 38, in __init__
    ckpt = torch.load(model_path, map_location=torch.device(device))
  File "/home/bohdan_pytaichuk/env/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/bohdan_pytaichuk/env/lib/python3.8/site-packages/torch/serialization.py", line 762, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Filter activating with anything

I don't know how long it has been like this, but it seems that in the demo, the sensitive-material filter has started activating on any kind of prompt. I tested "Surrealism", "3D Game", "Dancing Skeleton Toy" and "Golden Sphere". These prompts used to work, but now they trigger the filter.

Can't apply to download model.

Good afternoon,

I tried to register on wudaoai and put in my information so I could download the model, but Wudaoai did not accept my phone number, probably because I am from Brazil. I had to look for a Chinese phone number online to apply, but I doubt they will allow me to download the model because of this. What can I do?

Colab error

I found a Colab file referenced in a closed issue. The last cell (inference) shows this error. Does anyone know of a solution or a more recent Colab notebook? The web-based version of CogView works but takes a while to process queued requests.
I used the "insert code" icon when editing this and for some reason the output is all run together, making it unreadable.

/content/CogView
Traceback (most recent call last):
  File "generate_samples.py", line 28, in <module>
    from utils import Timers
  File "/content/CogView/utils.py", line 25, in <module>
    from fp16 import FP16_Optimizer
  File "/content/CogView/fp16/__init__.py", line 15, in <module>
    from .fp16util import (
  File "/content/CogView/fp16/fp16util.py", line 21, in <module>
    import mpu
  File "/content/CogView/mpu/__init__.py", line 35, in <module>
    from .layers import ColumnParallelLinear
  File "/content/CogView/mpu/layers.py", line 28, in <module>
    from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm
ModuleNotFoundError: No module named 'apex'

New cogview model

Will the new one have its own repository? Or will it be released here along with the model + generation script?

Awesome work as always!

docker pull error, "You have reached your pull rate limit"

Is there any possibility to share the Dockerfile?

I was trying to use docker since it's difficult to build the apex library. However, when I ran docker pull as below,
"docker pull cogview/cuda111_torch181_deepspeed040"

I got the following error message:
"Using default tag: latest
Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"

Therefore, an original Dockerfile may be better.
