maskgit-pytorch's People

Contributors

dome272

maskgit-pytorch's Issues

Missing datasets: landscape and flowers

When running training_transformer.py and training_vqgan.py, I don't have the landscape and flowers datasets.

Could the author release the datasets in the repository?

Codebook loss

Is the beta weight applied to the wrong loss term? Here it weights the embedding loss, but in the vanilla VQ-VAE it should weight the commitment loss.
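
For reference, a minimal sketch of how the two terms are weighted in the vanilla VQ-VAE; the function and variable names here are illustrative, not the repo's:

import torch
import torch.nn.functional as F

def vq_losses(z_e: torch.Tensor, z_q: torch.Tensor, beta: float = 0.25) -> torch.Tensor:
    # Codebook (embedding) loss: moves the codebook vectors toward the encoder outputs.
    codebook_loss = F.mse_loss(z_q, z_e.detach())
    # Commitment loss: keeps the encoder outputs close to the chosen codebook vectors.
    # In the original VQ-VAE formulation, beta weights THIS term, not the codebook loss.
    commitment_loss = F.mse_loss(z_e, z_q.detach())
    return codebook_loss + beta * commitment_loss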

sample_good() function in transformer.py

Hi!

I think the shape of logits from self.tokens_to_logits is [batch, 257, 1026] because you defined self.tok_emb = nn.Embedding(args.num_codebook_vectors + 2, args.dim).

However, the codebook only has 1024 embeddings, so this causes errors.
Haven't you seen these errors during sampling?
Did I miss something here?
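
If the extra two classes are indeed the issue, one workaround is to slice the logits down to the codebook size before sampling, so sampled ids always stay inside the 1024-entry codebook. A hedged sketch assuming the shapes above; tokens_to_logits and num_codebook_vectors are taken from the repo, the rest is illustrative:

logits = self.tokens_to_logits(tokens)                # assumed shape [batch, 257, 1026]
logits = logits[..., :args.num_codebook_vectors]      # keep only the 1024 codebook classes
probs = torch.softmax(logits[:, 1:, :], dim=-1)       # drop the sos position
sampled = torch.multinomial(probs.reshape(-1, probs.size(-1)), 1)
sampled = sampled.reshape(probs.shape[0], probs.shape[1])  # [batch, 256] codebook indices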

How to adapt to image extrapolation?

I don't know how to adapt this method to the image extrapolation task, since the number of tokens is fixed (e.g. 32x32). Could you explain how this can be achieved?

Pretrained model for VQGAN

Thank you for the implementation! I would like to tune the second-stage transformer, but my VQGAN trained on the Flickr landscape dataset is not very good. I see there is a load call for 'vq_flickr.pt', and you have much better landscape results. Could you kindly share that checkpoint? Thanks!


Problems sampling images

Good repo! When I try to run training_transformer.py and generate images using line 59, I run into a problem: there is no codebook attribute on self.vqgan in transformer.py, so I assume I should comment out this line (198) and uncomment line 199.
However, if I do this, I run into errors like:
, thread: [30,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [164,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed...
Would you mind clarifying what I should do to sample images? I use the default hyperparameters, and my dataset is a single image from the landscape dataset duplicated 12 times.

And here's the error log:
Traceback (most recent call last):
File "training_transformer.py", line 143, in
train_transformer = TrainTransformer(args)
File "training_transformer.py", line 36, in init
self.train(args)
File "training_transformer.py", line 48, in train
logits, target = self.model(imgs)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kaiwen/Desktop/interview/copilot4d/test/transformer.py", line 57, in forward
_, z_indices = self.encode_to_z(x)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/kaiwen/Desktop/interview/copilot4d/test/transformer.py", line 42, in encode_to_z
quant_z, _, (_, _, indices) = self.vqgan.encode(x)
File "/home/kaiwen/Desktop/interview/copilot4d/test/vq_f16.py", line 27, in encode
h = self.encoder(x)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kaiwen/Desktop/interview/copilot4d/test/vq_modules.py", line 267, in forward
hs = [self.conv_in(x)]
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/kaiwen/anaconda3/envs/ddpm/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
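
For what it's worth, the indexSelect assertions usually mean an index passed to an nn.Embedding lookup is out of range (e.g. a token id that is at least as large as the embedding table), and the later cuDNN error is often just fallout from those asserts; running on CPU or with CUDA_LAUNCH_BLOCKING=1 usually gives a clearer trace. A hedged sanity-check sketch; the attribute names are assumptions, not necessarily the repo's exact layout:

# place before the transformer embedding lookup, e.g. inside encode_to_z/forward
assert z_indices.min() >= 0, "negative token index"
assert z_indices.max() < self.transformer.tok_emb.num_embeddings, (
    f"token index {z_indices.max().item()} is out of range for an embedding "
    f"table of size {self.transformer.tok_emb.num_embeddings}"
)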

Isn't loss only supposed to be calculated on masked tokens?

In the training loop we have:

imgs = imgs.to(device=args.device)
logits, target = self.model(imgs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
loss.backward()

However, the output of the transformer is:

  _, z_indices = self.encode_to_z(x)
.
.
.
  a_indices = mask * z_indices + (~mask) * masked_indices

  a_indices = torch.cat((sos_tokens, a_indices), dim=1)

  target = torch.cat((sos_tokens, z_indices), dim=1)

  logits = self.transformer(a_indices)

  return logits, target

which means the returned target consists of the original, unmasked image tokens.

The MaskGIT paper, however, seems to state that the loss is only computed on the masked tokens.

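For reference, a minimal sketch of restricting the loss to the masked positions only, assuming forward() is changed to also return the boolean mask from the snippet above (where mask is True at positions whose original token was kept):

import torch.nn.functional as F

logits, target, mask = self.model(imgs)          # assumes forward() also returns mask
loss_per_token = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),
    target.reshape(-1),
    reduction="none",
).reshape(target.shape)
masked_positions = (~mask).float()               # 1.0 where the token was replaced by the mask id
# skip the sos position (index 0) so shapes line up with the mask over image tokens
loss = (loss_per_token[:, 1:] * masked_positions).sum() / masked_positions.sum()
loss.backward()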

Training on Colab - CUDA out of memory

Hi, I would like to ask whether anyone has tried to train the model on Colab. Yesterday I tried to launch training on the GPU, but it runs out of memory: it instantly fills almost all 15 GB. I tried smaller batch sizes (6 and 8) but had the same problem.
Also, I replaced the model used in VQGAN training with the same one used for inference by the transformer (vq_f16).
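
Not the repo's code, but two standard PyTorch techniques that often make training fit in Colab's ~15 GB are automatic mixed precision and gradient accumulation. A hedged sketch; model, optimizer, dataloader, and device stand in for the repo's own objects:

import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()
accum_steps = 4                                  # effective batch = batch_size * accum_steps

for step, imgs in enumerate(dataloader):
    imgs = imgs.to(device)
    with torch.cuda.amp.autocast():              # run the forward pass in mixed precision
        logits, target = model(imgs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               target.reshape(-1)) / accum_steps
    scaler.scale(loss).backward()                # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()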

Additionally, if @dome272 could upload pretrained weights for both models I would be grateful (I need them for my exam project at uni, ahaha).

Many Thanks

Learning rate & its scheduling

I cannot find the specific learning rate value or how the authors schedule the learning rate over epochs.
How did you implement this and reproduce the results in the paper?
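
I can't speak for the authors, but a common transformer recipe is linear warmup followed by cosine decay. A hedged sketch with purely illustrative values; the paper's exact settings are not confirmed here:

import math
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)    # lr is an assumption
warmup_steps, total_steps = 5_000, 500_000                    # illustrative values

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)                    # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))         # cosine decay to 0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# call scheduler.step() once per optimizer step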

Are the VQGAN models in class TrainVQGAN and class VQmodel different?

I am going through your MaskGIT code to study how to implement it, thank you!
But I have a question about the VQGAN used for tokenization.
I think the VQGAN used for tokenization and the VQGAN in training_vqgan.py are different, because their parameters are not the same.
If I am mistaken, please let me know. Thanks!

Question about the mask token id and sos token id

Hi,
In transformer.py, I see that mask_token_id is set to args.num_image_tokens. Shouldn't it be args.num_codebook_vectors? I don't think we want the mask token id to collide with one of the codebook indices. The same applies to the sos token id.
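
For illustration, a sketch of the indexing the question suggests: codebook ids 0..V-1, with the two special tokens placed just past the codebook. The arg names follow the repo; the rest is an assumption:

import torch.nn as nn

V = args.num_codebook_vectors            # e.g. 1024
mask_token_id = V                        # 1024, never collides with a codebook index
sos_token_id = V + 1                     # 1025
tok_emb = nn.Embedding(V + 2, args.dim)  # matches the +2-sized embedding table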

vq_gan reconstruction results blurry using default code

Hi, thank you for this very interesting work! I'm currently trying to train the VQ-GAN part on my few-shot dataset (e.g. ~300 dog or cat images) at resolution 256x256. However, using the default settings in the code, after training for 200 epochs the reconstruction results still look somewhat blurry (as shown below; the first row is the real image, the second row the reconstruction after training).
(attached reconstruction samples: 100_20, 199_60, 199_10)

And after comparing the code with the setup in the paper, I have so far found two differences:

  1. the default embedding dimension in the code is 256, whereas the paper uses 768
  2. the non-local block uses single-head attention, whereas the paper uses 8-head attention

I'm not sure whether these differences can cause blurry results to this extent, or whether there are other factors I need to pay attention to? Thanks!

About Class-conditional Image Synthesis

Hi, thanks for open-sourcing this; it is great work.
I want to ask a question about the paper. The bidirectional transformer is trained without any conditional input; it just tries to predict the masked tokens. But at inference time, for example when using the model for a class-conditional image synthesis task, how can the class-conditioning information be used?
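
One common approach in token-based models (an illustration only, not necessarily the paper's or this repo's exact mechanism) is to reserve one token id per class and feed it in place of the generic sos token, so every layer can attend to the class token. A hedged sketch:

import torch
import torch.nn as nn

num_classes = 1000
class_offset = args.num_codebook_vectors + 2           # ids beyond the codebook + special tokens
tok_emb = nn.Embedding(class_offset + num_classes, args.dim)

def prepend_class_token(z_indices, class_id):
    # replace the sos token with a learned, class-specific token id
    cls = torch.full((z_indices.size(0), 1), class_offset + class_id,
                     dtype=torch.long, device=z_indices.device)
    return torch.cat((cls, z_indices), dim=1)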

Issue about generated images

Hi

I have also tried to reproduce MaskGIT recently. After training for 150 epochs on ImageNet, our model only achieves 8.4% accuracy on token classification. During sampling, we find that our model generates monochrome images (nearly white). Have you encountered a similar problem?
