daa233 / generative-inpainting-pytorch
A PyTorch reimplementation of the paper Generative Image Inpainting with Contextual Attention (https://arxiv.org/abs/1801.07892)
License: MIT License
I'm confused about why the inputs f and b on line 194 are the same.
Hi,
I ran your code on my dataset with masks covering 50% of the image and got an L1 loss of 5.6%, lower than the 8.6% reported in the paper. The result looks great. Is this plausible, or is there a problem?
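Reported L1 numbers depend heavily on how the error is averaged. Here is a minimal sketch, assuming pixel values in [0, 1] and a binary mask where 1 marks the hole. The helper `mean_l1` is hypothetical and not from the repository. It shows how averaging over the whole image and averaging over the hole alone give different figures for the same prediction:

```python
def mean_l1(pred, target, mask=None):
    """Mean absolute error between two flat lists of pixel values in [0, 1].

    If mask is given (1 = hole, 0 = known), only hole pixels are averaged.
    Hypothetical helper for illustration; not part of the repository.
    """
    if mask is None:
        mask = [1] * len(pred)
    diffs = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(diffs) / len(diffs)


pred = [0.5, 0.5, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0]
hole = [1, 1, 0, 0]  # the first two pixels are missing

print(mean_l1(pred, target))        # 0.25: averaged over all pixels
print(mean_l1(pred, target, hole))  # 0.5: averaged over the hole only
```

If known pixels are copied back from the input, the whole-image error equals the hole-only error times the hole fraction (0.5 for a 50% mask). The averaging convention, test images, and mask shape therefore all need to match before comparing with the paper's 8.6%.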
Hey, I got the following error while running the training script on MNIST (with 3 channels - converted to RGB).
torch.Size([16, 128, 4, 4])
torch.Size([1, 9, 3, 3])
2020-06-30 06:07:48,118 ERROR Given transposed=1, weight of size [16, 128, 4, 4], expected input[1, 9, 3, 3] to have 16 channels, but got 9 channels instead
Traceback (most recent call last):
File "train.py", line 177, in <module>
main()
File "train.py", line 173, in main
raise e
File "train.py", line 116, in main
losses, inpainted_result, offset_flow = trainer(x, bboxes, mask, ground_truth, compute_g_loss)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/content/generative-inpainting-pytorch/trainer.py", line 40, in forward
x1, x2, offset_flow = self.netG(x, masks)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/content/generative-inpainting-pytorch/model/networks.py", line 27, in forward
x_stage2, offset_flow = self.fine_generator(x, x_stage1, mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/content/generative-inpainting-pytorch/model/networks.py", line 164, in forward
x, offset_flow = self.contextul_attention(x, x, mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/content/generative-inpainting-pytorch/model/networks.py", line 328, in forward
yi = F.conv_transpose2d(yi, wi_center, stride=self.rate, padding=1) / 4. # (B=1, C=128, H=64, W=64)
RuntimeError: Given transposed=1, weight of size [16, 128, 4, 4], expected input[1, 9, 3, 3] to have 16 channels, but got 9 channels instead
I got a similar error in the original TensorFlow repo as well. The two shapes printed above come from print statements I inserted:
print(wi_center.shape)
print(yi.shape)
I am also attaching part of my YAML config file:
# data parameters
dataset_name: MNIST
data_with_subfolder: False
train_data_path: training_data/training
val_data_path:
resume:
batch_size: 32
image_shape: [28, 28, 3]
mask_shape: [16, 16]
mask_batch_same: True
max_delta_shape: [12, 12]
margin: [0, 0]
discounted_mask: True
spatial_discounting_gamma: 0.9
random_crop: False
mask_type: hole # hole | mosaic
mosaic_unit_size: 6
When I index an element of the Dataset, I get a torch tensor of size [3, 28, 28], which seems fine (even though the channel dimension comes last in the YAML file).
Any help would be great!
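The two printed shapes point to a mismatch between two patch counts. There are 16 raw background patches used to paste results back (`wi_center`) but only 9 downsampled patches used for matching (the channel dimension of `yi`). Below is a minimal sketch of the arithmetic. It assumes the layer follows Yu et al.'s TensorFlow design:

- raw patches are extracted at full resolution with stride `rate` and 'same' padding;
- matching patches are extracted with stride 1 after nearest downsampling by `rate`;
- the fine branch downsamples the image by 4 before the attention layer.

The function names are hypothetical.

```python
import math


def raw_patch_count(h, w, rate=2):
    """Patches extracted at full resolution: 'same' padding with stride=rate
    keeps ceil(size / rate) positions along each axis."""
    return math.ceil(h / rate) * math.ceil(w / rate)


def matching_patch_count(h, w, rate=2):
    """Patches extracted after nearest downsampling by 1/rate: the resized map
    is floor(size / rate) on each axis, and stride-1 'same' extraction keeps
    every position."""
    return (h // rate) * (w // rate)


# A 28x28 MNIST image reaches the attention layer as a 7x7 feature map
# (assuming the fine branch downsamples by 4 before contextual attention).
feat = 28 // 4
print(raw_patch_count(feat, feat))       # 16 -> first dim of wi_center
print(matching_patch_count(feat, feat))  # 9  -> channel dim of yi
```

Whenever a feature-map side is odd, ceil and floor disagree and `conv_transpose2d` receives mismatched channel counts. The 7238 vs 7038 error reported further down fits the same pattern for a 93×307 map. Under these assumptions, image sides that are multiples of 8 (for example 32×32 instead of 28×28) keep the map even.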
Dear author:
I have a little question for you:
In the CoarseGenerator, the network input is:
x = self.conv1(torch.cat([x, ones, mask], dim=1))
In the FineGenerator, the network input is:
xnow = torch.cat([x1_inpaint, ones, mask], dim=1)
My question is about the concatenated "mask": 0 indicates the known region outside the hole, and 1 indicates the missing region inside it.
Doesn't this mean the operation masks out the known region and shows the missing region? Is that right?
Thank you for your answer.
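Concatenating along dim=1 appends the mask as extra input channels. It does not multiply the image by the mask, so nothing is hidden: the known pixels stay visible in x, and the mask channel only tells the network where the hole is. Below is a minimal pure-Python sketch of the channel layout for one 2×2 image. The pixel values are made up, and zeroing the hole pixels mirrors the `x * (1. - mask)` line quoted elsewhere on this page.

```python
# One 2x2 RGB image, stored channel-first as [channel][row][col].
# The bottom-right pixel is the hole; its RGB values have been zeroed.
x = [
    [[0.2, 0.4], [0.6, 0.0]],  # R
    [[0.1, 0.3], [0.5, 0.0]],  # G
    [[0.9, 0.8], [0.7, 0.0]],  # B
]
ones = [[[1.0, 1.0], [1.0, 1.0]]]  # constant channel
mask = [[[0.0, 0.0], [0.0, 1.0]]]  # 1 = missing, 0 = known

# Concatenating along the channel axis (like torch.cat(..., dim=1) on a
# batch) just appends channels: 3 + 1 + 1 = 5 input channels.
net_input = x + ones + mask

print(len(net_input))        # 5
print(net_input[:3] == x)    # True: the image channels are unchanged
```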
Hello, I used your code on 5,000 images with a batch size of 16 and trained for 50,000 and 100,000 iterations respectively. Why are the results after 100,000 iterations worse? If you know, please reply. Thank you very much!
When I visualize training with TensorBoard, I can see about 35k iterations, but do you have any idea how many epochs that is?
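The logs count iterations (batches), not epochs. One epoch is one pass over the dataset, so the conversion is simple arithmetic. The example below uses the numbers from the previous question (5,000 images, batch size 16) as an illustration rather than any particular run:

```python
def iterations_to_epochs(iterations, batch_size, dataset_size):
    """Number of passes over the dataset covered by `iterations` batches."""
    return iterations * batch_size / dataset_size


print(iterations_to_epochs(50_000, 16, 5_000))   # 160.0
print(iterations_to_epochs(100_000, 16, 5_000))  # 320.0
```

On a small dataset, hundreds of epochs may let a GAN overfit or drift. Comparing saved checkpoints on held-out images is a safer way to pick one than assuming the latest is best.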
Hello,
I am wondering if you are planning to release the code used to convert the TensorFlow model to PyTorch or, alternatively, converted snapshots for the other datasets used in the original paper (e.g. CelebA, Places).
Keep up the great work!
Can you also provide the original (uncut) images?
Hello, I am Daisy.
I was pleasantly surprised to find that you recently finished this PyTorch version of the paper. Thanks a lot! However, when I tried to train the model, the image files could not be read correctly. The error is:
ERROR num_samples should be a positive integeral value, but got num_samples=0
May I ask whether your training images are named like "n02128925_9771.JPEG"? I downloaded them from the ILSVRC2012 website -> Images -> Training images (Task 1 & 2) (http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads).
Thanks again!!
@daisy91530 I copied your question here in case someone else runs into the same problem.
Hello, sorry to bother you. Could you please explain why you keep only the first mask in the batch, in your implementation of the contextual attention module?
Hi,
Could you please tell me which GPU you are using to train the model?
I use one 11 GB GTX 1080 Ti with a batch size of 8 or less, or three 11 GB 1080 Tis with a batch size of 36 or less, and an out-of-memory (OOM) error occurs.
Hello,
I have a very basic question: could I test a 1920×1080 image with the trained model? Thank you very much for your help.
Thanks for your reimplementation.
When I run the test_contextual_attention function in networks.py, an error occurs: RuntimeError: Given transposed=1, weight of size [4096, 3, 4, 4], expected input[1, 4032, 166, 250] to have 4096 channels, but got 4032 channels instead
In my setting, --imageA is bnw_butterfly.png and --imageB is bike.jpg, both from the examples/style_transfer folder of the official CA repo.
Could you please help me with it? Thanks.
Best,
Could an expert please tell me how to fix this error that occurs during training? Any pointers would be much appreciated.
Hi,
Thank you for your work, it is very helpful.
I was wondering if you could explain why you use x2_inpaint.detach() when running a forward pass through the discriminator to calculate the discriminator's loss, but do not call detach() when calling the discriminator to calculate the generator's loss.
In this case, no gradient flows back into the generator from the D loss on the fake images, but it does from the G loss.
I understand why you use detach() when calculating the gradient penalty (you only want the gradient for the GP term, not to recalculate the discriminator's gradient again), but I cannot understand why it is used as described above.
Thank you
Could you please specify a license? Preferably MIT.
Cosine similarity requires the norms of both patches, but in the code I cannot find where the norm of xi is calculated.
max_wi = torch.max(torch.sqrt(reduce_sum(torch.pow(wi, 2),axis=[1, 2, 3],keepdim=True)),escape_NaN)
wi_normed = wi / max_wi
yi = F.conv2d(xi, wi_normed, stride=1) # [1, L, H, W]
I am confused about this. Thank you very much~
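One plausible reading concerns what the missing norm actually changes. At a fixed foreground position, every candidate background patch is scored against the same xi. Dividing all of those scores by the norm of xi therefore applies one positive constant, which cannot change which background patch ranks highest. It does change the effective softmax temperature, though. A minimal pure-Python check with made-up 3-element patches:

```python
import math


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


xi = [3.0, 1.0, 2.0]                     # foreground patch (not normalized)
candidates = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [2.0, 1.0, 2.0]]

# What the quoted code computes: xi against normalized background patches.
partial = [dot(xi, [a / norm(w) for a in w]) for w in candidates]
# Full cosine similarity: additionally divide by the norm of xi.
cosine = [s / norm(xi) for s in partial]

best_partial = max(range(len(partial)), key=partial.__getitem__)
best_cosine = max(range(len(cosine)), key=cosine.__getitem__)
print(best_partial == best_cosine)  # True: the ranking is identical
```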
Thanks for your code, it's amazing! But how do I get the attention visualization? What command should I run?
Hello, thanks for providing the code.
I read the code and I have some questions.
Does your code only work with 256×256 images?
I changed the image size in the config, but the contextual attention layer broke:
RuntimeError: Given transposed=1, weight of size [7238, 128, 4, 4], expected input[1, 7038, 46, 153] to have 7238 channels, but got 7038 channels instead.
Yu's code can handle images of any size, so could you tell me how yours differs from Yu's?
Thanks!
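The 7238 vs 7038 counts above fit an odd-sized 93×307 feature map at the attention layer: 47·154 raw patches versus 46·153 downsampled ones. Resizing or padding inputs so that this map stays even may therefore avoid the crash. The sketch below assumes the fine branch downsamples by 4 and the attention rate is 2, which means image sides should be multiples of 8. The helper is hypothetical.

```python
def valid_shape(h, w, multiple=8, round_up=True):
    """Return (h, w) rounded to a multiple of `multiple` (hypothetical helper).

    round_up=True suits padding; round_up=False suits cropping.
    """
    def snap(n):
        if round_up:
            return -(-n // multiple) * multiple  # ceiling division
        return (n // multiple) * multiple
    return snap(h), snap(w)


print(valid_shape(28, 28))                     # (32, 32)
print(valid_shape(372, 1230, round_up=False))  # (368, 1224)
```

Padding keeps every original pixel, while cropping discards a thin border. Either way, the result can be cropped back to the original size after inpainting.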
Hi daa233, I found some differences between your code and Jiahui Yu's code. At the end of each gen_conv layer in Yu's code, the tensor is split into two parts and a different activation function is applied to each. Do you have any idea why? It seems that you simply removed this part:
x, y = tf.split(x, 2, 3)
x = activation(x)
y = tf.nn.sigmoid(y)
x = x * y
return x
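This split matches the gated convolution from Yu et al.'s follow-up work (DeepFill v2, "Free-Form Image Inpainting with Gated Convolution"). That work likely postdates the Contextual Attention paper this repository reimplements. Half of the output channels are features, and the other half are soft gates in (0, 1) that scale them, so the network can learn where to let information through. Below is a pure-Python sketch on a flat list of channel values, with ELU as the assumed activation:

```python
import math


def elu(v, alpha=1.0):
    return v if v > 0 else alpha * (math.exp(v) - 1.0)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_output(channels, activation=elu):
    """Split channel values in half (features, gates) and multiply them."""
    half = len(channels) // 2
    features, gates = channels[:half], channels[half:]
    return [activation(f) * sigmoid(g) for f, g in zip(features, gates)]


print(gated_output([2.0, -1.0, 0.0, 0.0]))  # gates of 0 halve each feature
```

tf.split(x, 2, 3) splits the last (channel) axis of an NHWC tensor. In channel-first PyTorch the equivalent split would be torch.chunk(x, 2, dim=1), and the preceding convolution must output twice as many channels.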
To generate the masked image, the code is:
result = x * (1. - mask)
but x has three channels while mask has only one.
Is this right? Can the operation be carried out?
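Yes, the operation works as long as the mask keeps a channel axis of size 1. Elementwise multiplication then broadcasts it across the image's three channels. PyTorch follows NumPy's broadcasting rules, so NumPy stands in for it here. The shapes are illustrative, and a [B, 1, H, W] mask layout is an assumption.

```python
import numpy as np

x = np.random.rand(2, 3, 4, 4)      # batch of RGB images, NCHW
mask = np.zeros((2, 1, 4, 4))       # one mask channel per image
mask[:, :, 1:3, 1:3] = 1.0          # 1 marks the hole

result = x * (1.0 - mask)           # the size-1 channel axis broadcasts to 3

print(result.shape)                          # (2, 3, 4, 4)
print(np.all(result[:, :, 1:3, 1:3] == 0))   # True: hole zeroed in all channels
```

If the mask were stored as [B, H, W] without the channel axis, broadcasting would align it against the wrong dimensions. With the shapes above it would raise an error.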
Hello, I am Chen Longwhen. When I train the model, I get the following error:
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383
My environment is:
Pytorch 1.1.0
torchvision 0.3.0
Thanks for your brilliant code
One question: in the official implementation, the fine and coarse generators share a single encoder network, but in your implementation the two generators use different encoders. Does this give better performance?
Thanks~
In networks.py line 79, in the coarse model, I found that you use the "interpolate" function, whereas the TensorFlow version uses a deconv layer. Is there a reason for this?
I'm studying this paper. Thanks a lot!
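If I read Yu's inpaint_ops.py correctly, its gen_deconv is not a transposed convolution. It resizes the feature map with nearest-neighbor interpolation and then applies an ordinary convolution. An interpolate call (assuming mode='nearest' and scale_factor=2) followed by a convolution would then be a close translation. A pure-Python sketch of the nearest-neighbor 2× step:

```python
def upsample_nearest_2x(grid):
    """Nearest-neighbor 2x upsampling of a 2D list: each value becomes a 2x2 block."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


print(upsample_nearest_2x([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Resize-then-convolve is often preferred over transposed convolution because it tends to avoid checkerboard artifacts.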
Dear author:
Thanks for your re-implementation, it's helpful!
When I tried to train with multiple GPUs, I noticed you have not converted batch norm to synchronized batch norm. Was this intentional? Thank you.
I am currently attempting to train the GAN but the masked images (x) and the inpainted results are the same. Do you have any idea what could be causing this?
I trained the model on my own dataset and got the error:
I found that this line causes the error:
generative-inpainting-pytorch/train.py
Line 135 in ee1fd75
Thank you for your code. But when I use it to train on the CelebA dataset, the validation results saved every 1000 iterations during training (hole_benchmark folder) show poor inpainting.
Have you ever trained on CelebA? Could you provide a well-trained model?
Could you post some result images in the README file? Thanks.
Hi~
Which training dataset did you use? ImageNet 2012 has many subsets, and I'd like to know which one you used.
I was looking through the code and found a flow argument, but I can't find an example that uses it, so I don't know how to make it work.
Thanks for your help
Hello!
I've run into a bug that is hard to solve.
I've made many modifications to your code, and everything was fine.
Last week I modified the original code for a new dataset and ran the original code as a baseline. The original code worked fine, but the modified code hit this bug.
The dataset is not corrupted, and no matter how I check the code and dataset, the loss becomes NaN before iteration 10,000. The strange thing is that when I re-run the original code, the same bug happens, even though last week's run of the original code trained without problems.
Can you run the original code? I don't know why the loss is NaN. Could you help me solve this bug? It's driving me crazy.
Hi~ First, your implementation is awesome, thank you! But I have some questions.
1. What does 'mm' mean in the function 'contextual_attention'?
2. I'm confused about why, after 'xi' is convolved with 'wi_normed', the result is convolved two more times.
It's kind of you if you could help me!
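The two extra convolutions appear to implement the paper's attention propagation. The score map is convolved with an identity kernel, which sums scores along diagonals: s'[x][x'] = Σ_{i=-k}^{k} s[x+i][x'+i]. A foreground pixel's match is thus reinforced when its neighbor matches the neighboring background pixel. The second pass presumably repeats this after a transpose to cover the other spatial direction. A one-dimensional pure-Python sketch of a single pass, treating positions outside the map as zero:

```python
def propagate(scores, k=1):
    """Sum scores along diagonals: out[x][y] = sum of scores[x+i][y+i], |i| <= k.

    `scores[x][y]` is the similarity between foreground position x and
    background position y; positions outside the map count as zero.
    """
    n, m = len(scores), len(scores[0])
    out = [[0.0] * m for _ in range(n)]
    for x in range(n):
        for y in range(m):
            for i in range(-k, k + 1):
                if 0 <= x + i < n and 0 <= y + i < m:
                    out[x][y] += scores[x + i][y + i]
    return out


# Foreground positions 0-2 each match the same-index background position.
identity = [[1.0 if x == y else 0.0 for y in range(3)] for x in range(3)]
print(propagate(identity))
# [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
```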
When testing with your data, I get the gradient function CloneBackward for interpolates and AddmmBackward for disc_interpolates, but with my data I get no gradient function at all (I printed out the tensors, which is how I know). Can you speculate on what the problem might be? How does calc_gradient_penalty produce a gradient function for these variables automatically, and how can I force one (maybe even manually)?
Thanks for your help
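For reference, the standard WGAN-GP penalty (Gulrajani et al., "Improved Training of Wasserstein GANs") that a function like calc_gradient_penalty usually computes is shown below. Here x is a real image, x̃ a generated one, and λ the penalty weight. The penalty differentiates D with respect to x̂, so x̂ must be part of the autograd graph, typically via requires_grad_(). A tensor with no grad_fn and requires_grad=False cannot supply that gradient. This is a general sketch, not a reading of this repository's exact code:

```latex
\hat{x} = \alpha\, x + (1 - \alpha)\, \tilde{x}, \qquad \alpha \sim \mathcal{U}[0, 1]

\mathcal{L}_{\mathrm{GP}} = \lambda \, \mathbb{E}_{\hat{x}}\left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right]
```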
Dear author:
Thanks for your re-implementation, it's helpful! Now I have a little question for you:
In the training phase, the training image is scaled to 256×256; the code in dataset.py is:
if self.random_crop:
    imgw, imgh = img.size
    if imgh < self.image_shape[0] or imgw < self.image_shape[1]:
        img = transforms.Resize(min(self.image_shape))(img)
    img = transforms.RandomCrop(self.image_shape)(img)
else:
    img = transforms.Resize(self.image_shape)(img)
    img = transforms.RandomCrop(self.image_shape)(img)
In the testing phase, the testing image is scaled to 256×256; the code in test_single.py is:
x = transforms.Resize(config['image_shape'][:-1])(x)
x = transforms.CenterCrop(config['image_shape'][:-1])(x)
mask = transforms.Resize(config['image_shape'][:-1])(mask)
mask = transforms.CenterCrop(config['image_shape'][:-1])(mask)
Are the two scaling procedures the same?
Thank you for your answer.
I see that the author's new implementation now works with both rectangular and free-form masks. Can your implementation do the same?
Thanks,
Since the validation set section is currently commented out, how do you think a validation set is best used for this kind of neural network?
Thanks
Hey,
Do you have pretrained weights for the Places2 dataset? It would be great if you could share them!
First off, thank you for providing this implementation! I am new to pytorch and ML in general and I'm about to train on my own dataset, but I wanted to ask if there's a simple way to integrate early stopping? Greatly appreciate any advice you can provide. Thank you in advance :)
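The sketch below is a generic early-stopping helper, not part of this repository. It tracks a validation metric where lower is better (for example the L1 error on held-out images) and signals a stop after patience checks without improvement. The class name and parameters are hypothetical.

```python
class EarlyStopping:
    """Stop when a validation metric (lower is better) stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # checks to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, metric):
        """Record one validation result; return True when training should stop."""
        if metric < self.best - self.min_delta:
            self.best = metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopping(patience=2)
for it, val_l1 in enumerate([0.30, 0.25, 0.26, 0.27, 0.20]):
    if stopper.step(val_l1):
        print(f"stop at check {it}, best {stopper.best}")  # stop at check 3, best 0.25
        break
```

The adversarial losses of a GAN tend to oscillate, so a reconstruction metric on validation images is usually a steadier signal than the generator or discriminator loss.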
Could you help me visualize the training loss with tensorboardX?
Hi! Thank you for your code. According to the original paper, Algorithm 1 updates the two critics 5 times per iteration. However, your implementation updates both the generator and the critics in every iteration.
Could you explain this? I am sorry if my interpretation is mistaken.
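The traceback earlier on this page shows the trainer being called with a compute_g_loss flag. That suggests the generator loss is not computed on every iteration. Below is a sketch of the usual schedule, assuming the flag comes from the iteration count and an n_critic setting of 5; check train.py and the config for the actual names. The critics get a step every iteration and the generator once per n_critic iterations.

```python
def update_schedule(num_iterations, n_critic=5):
    """For each iteration, list which networks get a gradient step.

    The critics (discriminators) are updated every iteration; the generator
    only when iteration % n_critic == 0, i.e. once per n_critic critic steps.
    """
    schedule = []
    for iteration in range(1, num_iterations + 1):
        compute_g_loss = iteration % n_critic == 0
        schedule.append(("D", "G") if compute_g_loss else ("D",))
    return schedule


print(update_schedule(10))
# [('D',), ('D',), ('D',), ('D',), ('D', 'G'), ('D',), ('D',), ('D',), ('D',), ('D', 'G')]
```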
Thank you very much for your code.
When I run test_contextual_attention() with two images from Yu's webpage, I don't get his result.
The third image in the first row is what I got; the third in the second row is from Yu's webpage.
Do you have any idea why? I'd really appreciate your answer.
Also, when I test with two identical images, I get a strange result. The left image is the test input, which I use as both the foreground and the background image; the other image is what I got. I have checked the code and couldn't find the reason.
As I understand it, if the two images are identical, the reconstruction should be very similar to the ground truth, right?
Thank you very much for your time.
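Not necessarily. Even when each foreground patch's best match is itself (cosine similarity 1), the layer does not copy the top match. It mixes patches with a softmax over scaled similarities. Assume a softmax scale of 10 (the default in Yu's contextual_attention, as far as I know) and made-up similarities for a textured region with near-duplicate patches. In that case the patch itself gets only about half the weight:

```python
import math


def softmax(scores, scale=10.0):
    exps = [math.exp(scale * s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Cosine similarity of one patch to itself (1.0) and to three other patches.
similarities = [1.0, 0.95, 0.9, 0.2]
weights = softmax(similarities)
print(round(weights[0], 2))  # 0.51
```

Overlapping pasted patches are also averaged (the `/ 4.` after F.conv_transpose2d in the traceback near the top of this page), which blurs the output further.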
Hi @daa233
Thanks for your great work.
I got the following error in torch.stack after about 3k iterations of training the model, and I am unable to find the cause.
Kindly help me with this error.
Thanks in advance.