maskgit-pytorch's People

Contributors

dome272

maskgit-pytorch's Issues

Missing datasets: landscape and flowers

When running training_transformer.py and training_vqgan.py, I don't have the landscape and flowers datasets.

Could the author release the datasets in the repository?

Codebook loss

Is the beta weight applied to the wrong loss term? Here it weights the embedding loss, but in the vanilla VQ-VAE it should weight the commitment loss.
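
For reference, a minimal sketch of how the two terms are weighted in the vanilla VQ-VAE; the function and variable names here are illustrative, not the repo's:

import torch
import torch.nn.functional as F

def vq_losses(z_e: torch.Tensor, z_q: torch.Tensor, beta: float = 0.25) -> torch.Tensor:
    # Codebook (embedding) loss: moves the codebook vectors toward the encoder outputs.
    codebook_loss = F.mse_loss(z_q, z_e.detach())
    # Commitment loss: keeps the encoder outputs close to the chosen codebook vectors.
    # In the original VQ-VAE formulation, beta weights THIS term, not the codebook loss.
    commitment_loss = F.mse_loss(z_e, z_q.detach())
    return codebook_loss + beta * commitment_loss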

sample_good() function in transformer.py

Hi!

I think the shape of logits from self.tokens_to_logits is [batch, 257, 1026] because you defined self.tok_emb = nn.Embedding(args.num_codebook_vectors + 2, args.dim).

However, the codebook only has 1024 embeddings, so this causes errors.
Haven't you seen these errors during sampling?
Did I miss something here?
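
If the extra two classes are indeed the issue, one workaround is to slice the logits down to the codebook size before sampling, so sampled ids always stay inside the 1024-entry codebook. A hedged sketch assuming the shapes above; tokens_to_logits and num_codebook_vectors are taken from the repo, the rest is illustrative:

logits = self.tokens_to_logits(tokens)                # assumed shape [batch, 257, 1026]
logits = logits[..., :args.num_codebook_vectors]      # keep only the 1024 codebook classes
probs = torch.softmax(logits[:, 1:, :], dim=-1)       # drop the sos position
sampled = torch.multinomial(probs.reshape(-1, probs.size(-1)), 1)
sampled = sampled.reshape(probs.shape[0], probs.shape[1])  # [batch, 256] codebook indices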

How to adapt to image extrapolation?

I don't know how to adapt this method to the image extrapolation task, since the number of tokens is fixed (e.g. 32x32). Could you explain how this can be achieved?

Pretrained model for VQGAN

Thank you for the implementation! I would like to tune the second-stage transformer, but my VQGAN trained on the Flickr landscape dataset is not very good. I see there is a load call for 'vq_flickr.pt', and you have much better landscape results. Could you kindly share that checkpoint? Thanks!


Problems sampling images

Good repo! When I try to run training_transformer.py and generate images using line 59, I run into a problem: there is no codebook attribute on self.vqgan in transformer.py, so I assume I should comment out this line (198) and uncomment line 199.
However, if I do this, I run into errors like:
, thread: [30,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [164,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed...
Would you mind clarifying what I should do to sample images? I use the default hyperparameters, and my dataset is a single image from the landscape dataset duplicated 12 times.

And here's the error log:
Traceback (most recent call last):
File "training_transformer.py", line 143, in
train_transformer = TrainTransformer(args)
File "training_transformer.py", line 36, in init
self.train(args)
File "training_transformer.py", line 48, in train
logits, target = self.model(imgs)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kaiwen/Desktop/interview/copilot4d/test/transformer.py", line 57, in forward
_, z_indices = self.encode_to_z(x)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/kaiwen/Desktop/interview/copilot4d/test/transformer.py", line 42, in encode_to_z
quant_z, _, (_, _, indices) = self.vqgan.encode(x)
File "/home/kaiwen/Desktop/interview/copilot4d/test/vq_f16.py", line 27, in encode
h = self.encoder(x)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kaiwen/Desktop/interview/copilot4d/test/vq_modules.py", line 267, in forward
hs = [self.conv_in(x)]
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
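
For what it's worth, the indexSelect assertions usually mean an index passed to an nn.Embedding lookup is out of range (e.g. a token id that is at least as large as the embedding table), and the later cuDNN error is often just fallout from those asserts; running on CPU or with CUDA_LAUNCH_BLOCKING=1 usually gives a clearer trace. A hedged sanity-check sketch; the attribute names are assumptions, not necessarily the repo's exact layout:

# place before the transformer embedding lookup, e.g. inside encode_to_z/forward
assert z_indices.min() >= 0, "negative token index"
assert z_indices.max() < self.transformer.tok_emb.num_embeddings, (
    f"token index {z_indices.max().item()} is out of range for an embedding "
    f"table of size {self.transformer.tok_emb.num_embeddings}"
)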

Isn't loss only supposed to be calculated on masked tokens?

In the training loop we have:

imgs = imgs.to(device=args.device)
logits, target = self.model(imgs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
loss.backward()

However, the output of the transformer is:

  _, z_indices = self.encode_to_z(x)
.
.
.
  a_indices = mask * z_indices + (~mask) * masked_indices

  a_indices = torch.cat((sos_tokens, a_indices), dim=1)

  target = torch.cat((sos_tokens, z_indices), dim=1)

  logits = self.transformer(a_indices)

  return logits, target

which means the returned target consists of the original, unmasked image tokens.

The MaskGIT paper, however, seems to state that the loss is only computed on the masked tokens.

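For reference, a minimal sketch of restricting the loss to the masked positions only, assuming forward() is changed to also return the boolean mask from the snippet above (where mask is True at positions whose original token was kept):

import torch.nn.functional as F

logits, target, mask = self.model(imgs)          # assumes forward() also returns mask
loss_per_token = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),
    target.reshape(-1),
    reduction="none",
).reshape(target.shape)
masked_positions = (~mask).float()               # 1.0 where the token was replaced by the mask id
# skip the sos position (index 0) so shapes line up with the mask over image tokens
loss = (loss_per_token[:, 1:] * masked_positions).sum() / masked_positions.sum()
loss.backward()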

Training on Colab - CUDA out of memory

Hi, I would like to ask whether anyone has tried to train the model on Colab. Yesterday I tried to launch training on the GPU, but it runs out of memory: it instantly fills almost all 15 GB. I tried smaller batch sizes (6 and 8) but had the same problem.
Also, I replaced the model used in VQGAN training with the same one used for inference by the transformer (vq_f16).
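
Not the repo's code, but two standard PyTorch techniques that often make training fit in Colab's ~15 GB are automatic mixed precision and gradient accumulation. A hedged sketch; model, optimizer, dataloader, and device stand in for the repo's own objects:

import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()
accum_steps = 4                                  # effective batch = batch_size * accum_steps

for step, imgs in enumerate(dataloader):
    imgs = imgs.to(device)
    with torch.cuda.amp.autocast():              # run the forward pass in mixed precision
        logits, target = model(imgs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               target.reshape(-1)) / accum_steps
    scaler.scale(loss).backward()                # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()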

Additionally, if @dome272 could upload pretrained weights for both models I would be grateful (I need them for my exam project at uni, ahaha).

Many Thanks

Learning rate & its scheduling

I cannot find the specific learning rate value or how the authors schedule the learning rate over epochs.
How did you implement this and reproduce the results in the paper?
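
I can't speak for the authors, but a common transformer recipe is linear warmup followed by cosine decay. A hedged sketch with purely illustrative values; the paper's exact settings are not confirmed here:

import math
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)    # lr is an assumption
warmup_steps, total_steps = 5_000, 500_000                    # illustrative values

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)                    # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))         # cosine decay to 0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# call scheduler.step() once per optimizer step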

Are the VQGAN models in class TrainVQGAN and class VQmodel different?

I am going through your MaskGIT code to study how to implement it, thank you!
But I have a question about the VQGAN used for tokenization.
I think the VQGAN used for tokenization and the VQGAN in training_vqgan.py are different, because their parameters are not the same.
If I am mistaken, please let me know. Thanks!

Question about the mask token id and sos token id

Hi,
In transformer.py, I see that mask_token_id is set to args.num_image_tokens. Shouldn't it be args.num_codebook_vectors? I don't think we want the mask token id to collide with one of the codebook indices. The same applies to the sos token id.
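
For illustration, a sketch of the indexing the question suggests: codebook ids 0..V-1, with the two special tokens placed just past the codebook. The arg names follow the repo; the rest is an assumption:

import torch.nn as nn

V = args.num_codebook_vectors            # e.g. 1024
mask_token_id = V                        # 1024, never collides with a codebook index
sos_token_id = V + 1                     # 1025
tok_emb = nn.Embedding(V + 2, args.dim)  # matches the +2-sized embedding table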

vq_gan reconstruction results blurry using default code

Hi, thank you for this very interesting work! I'm currently trying to train the VQ-GAN part on my few-shot dataset (e.g. ~300 dog or cat images) at resolution 256x256. However, using the default settings in the code, after training for 200 epochs the reconstruction results still look somewhat blurry (as shown below; the first row is the real image, the second row the reconstruction after training).
(attached reconstruction samples: 100_20, 199_60, 199_10)

And after comparing the code with the setup in the paper, I have so far found two differences:

  1. the default embedding dimension in the code is 256, whereas the paper uses 768
  2. the non-local block uses single-head attention, whereas the paper uses 8-head attention

I'm not sure whether these differences can cause blurry results to this extent, or whether there are other factors I need to pay attention to? Thanks!

About Class-conditional Image Synthesis

Hi, thanks for open-sourcing this; it is great work.
I want to ask a question about the paper. The bidirectional transformer is trained without any conditional input; it just tries to predict the masked tokens. But at inference time, for example when using the model for a class-conditional image synthesis task, how can the class-conditioning information be used?
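
One common approach in token-based models (an illustration only, not necessarily the paper's or this repo's exact mechanism) is to reserve one token id per class and feed it in place of the generic sos token, so every layer can attend to the class token. A hedged sketch:

import torch
import torch.nn as nn

num_classes = 1000
class_offset = args.num_codebook_vectors + 2           # ids beyond the codebook + special tokens
tok_emb = nn.Embedding(class_offset + num_classes, args.dim)

def prepend_class_token(z_indices, class_id):
    # replace the sos token with a learned, class-specific token id
    cls = torch.full((z_indices.size(0), 1), class_offset + class_id,
                     dtype=torch.long, device=z_indices.device)
    return torch.cat((cls, z_indices), dim=1)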

Issue about generated images

Hi

I have also tried to reproduce MaskGIT recently. After training for 150 epochs on ImageNet, our model only achieves 8.4% accuracy on token classification. During sampling, we find that our model generates monochrome images (nearly white). Have you encountered a similar problem?
