daa233 / generative-inpainting-pytorch

A PyTorch reimplementation of the paper "Generative Image Inpainting with Contextual Attention" (https://arxiv.org/abs/1801.07892)

License: MIT License

Python 100.00%

image-inpainting generative-adversarial-network attention-model deep-neural-networks pytorch

generative-inpainting-pytorch's Issues

network.py

I'm confused about why the inputs for f and b on line 194 are the same.

Possibly better behavior on other datasets?

Hi,
I ran your code on my own dataset with a mask size of 50%. The l1 loss is 5.6%, lower than the 8.6% reported in the paper. The results look great. Is this plausible, or does it point to a problem?

Error while training MNIST

Hey, I got the following error while running the training script on MNIST (with 3 channels - converted to RGB).

torch.Size([16, 128, 4, 4])
torch.Size([1, 9, 3, 3])
2020-06-30 06:07:48,118 ERROR Given transposed=1, weight of size [16, 128, 4, 4], expected input[1, 9, 3, 3] to have 16 channels, but got 9 channels instead
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    main()
  File "train.py", line 173, in main
    raise e
  File "train.py", line 116, in main
    losses, inpainted_result, offset_flow = trainer(x, bboxes, mask, ground_truth, compute_g_loss)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/generative-inpainting-pytorch/trainer.py", line 40, in forward
    x1, x2, offset_flow = self.netG(x, masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/generative-inpainting-pytorch/model/networks.py", line 27, in forward
    x_stage2, offset_flow = self.fine_generator(x, x_stage1, mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/generative-inpainting-pytorch/model/networks.py", line 164, in forward
    x, offset_flow = self.contextul_attention(x, x, mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/generative-inpainting-pytorch/model/networks.py", line 328, in forward
    yi = F.conv_transpose2d(yi, wi_center, stride=self.rate, padding=1) / 4.  # (B=1, C=128, H=64, W=64)
RuntimeError: Given transposed=1, weight of size [16, 128, 4, 4], expected input[1, 9, 3, 3] to have 16 channels, but got 9 channels instead

I got a similar error in the original TensorFlow repo as well. The two shapes printed above the error come from print statements I inserted:

print(wi_center.shape)
print(yi.shape)

I am attaching a part of the YAML file as well

# data parameters
dataset_name: MNIST
data_with_subfolder: False
train_data_path: training_data/training
val_data_path:
resume:
batch_size: 32
image_shape: [28, 28, 3]
mask_shape: [16, 16]
mask_batch_same: True
max_delta_shape: [12, 12]
margin: [0, 0]
discounted_mask: True
spatial_discounting_gamma: 0.9
random_crop: False
mask_type: hole     # hole | mosaic
mosaic_unit_size: 6

When I index an element of the Dataset, I get a torch tensor of size [3,28,28], which seems OK (although the channel dimension comes last in the YAML file).
Any help would be great!
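
The RuntimeError above comes from a shape check rather than from the data itself: F.conv_transpose2d expects a weight laid out as (in_channels, out_channels, kH, kW), so the input must have as many channels as weight.shape[0]. The pure-Python sketch below only mirrors that check and the standard output-size formula (no dilation, no output_padding, groups=1). The helper name is hypothetical and stride=2 is an assumed value for illustration. It does not explain why the patch count (16) and the score-map channels (9) differ for 28x28 inputs.

```python
def conv_transpose2d_out_shape(input_shape, weight_shape, stride=1, padding=0):
    """Mirror the channel check and output size of a 2D transposed convolution.

    input_shape:  (N, C_in, H, W)
    weight_shape: (C_in, C_out, kH, kW), the layout F.conv_transpose2d expects
    Assumes dilation=1, output_padding=0 and groups=1.
    """
    n, c_in, h, w = input_shape
    w_in, c_out, kh, kw = weight_shape
    if c_in != w_in:
        raise ValueError(
            f"expected input to have {w_in} channels, but got {c_in} channels instead"
        )
    h_out = (h - 1) * stride - 2 * padding + kh
    w_out = (w - 1) * stride - 2 * padding + kw
    return (n, c_out, h_out, w_out)


# Shapes taken from the print statements in the report; stride=2 is assumed.
weight_shape = (16, 128, 4, 4)
try:
    conv_transpose2d_out_shape((1, 9, 3, 3), weight_shape, stride=2, padding=1)
except ValueError as err:
    print("mismatch:", err)
```

With a 16-channel input of the same spatial size the check passes, which suggests that the number of extracted patches and the attention-map channels have to agree before this call.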

question about input of CoarseGenerator and FineGenerator

Dear author:
I have a small question for you:
In the CoarseGenerator, the network input is:
x = self.conv1(torch.cat([x, ones, mask], dim=1))

In the FineGenerator, the network input is:
xnow = torch.cat([x1_inpaint, ones, mask], dim=1)

My question is about the "mask" passed to torch.cat: 0 marks the known region outside the hole and 1 marks the missing region inside it.
Doesn't this mean the operation masks out the known region and exposes the missing region? Is that right?

Thank you for your answer.

Same model, different result?

Hi, I used the pre-trained model you provide and changed the "network.py" file to run "test_single.py". Why do I get different results from yours?

question about train

Hello, I used your code with 5,000 images and a batch size of 16, and trained for 50,000 and 100,000 iterations respectively. Why are the results worse after 100,000 iterations? If you know, please reply. Thank you very much!

Epoch num

When I visualize training with TensorBoard, I can see that it has run about 35k iterations, but do you have any idea how many epochs that corresponds to?
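
An iteration count can be converted to epochs once the dataset size and batch size are known. The sketch below assumes one iteration consumes one batch. The function name and the example numbers (batch size 48, 100,000 images) are hypothetical, not values taken from this repository.

```python
import math


def iterations_to_epochs(iterations, batch_size, dataset_size, drop_last=False):
    """Convert a number of training iterations into (fractional) epochs.

    One epoch is one full pass over the dataset. With drop_last=True the
    final incomplete batch is skipped, so an epoch has fewer iterations.
    """
    if drop_last:
        iterations_per_epoch = dataset_size // batch_size
    else:
        iterations_per_epoch = math.ceil(dataset_size / batch_size)
    return iterations / iterations_per_epoch


# Hypothetical numbers for illustration only.
print(iterations_to_epochs(35_000, batch_size=48, dataset_size=100_000))
```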

Code to convert TensorFlow models

Hello,

I am wondering if you are planning to release the code used to convert the TensorFlow model to PyTorch or, alternatively, converted snapshots for the other datasets used in the original paper (e.g. CelebA, Places).

Keep up the great work!

Uncut images

Can you also provide the original (uncut) images?

Problems with test results

Thank you very much for sharing. When I tested my own image, the results looked foggy and unclear.

ERROR num_samples should be a positive integeral value, but got num_samples=0

Hello, I am Daisy.
I was really surprised to find that you recently finished this PyTorch version of the paper. Thanks a lot. However, when I tried to train the model, the image files could not be read correctly. The error is:

ERROR num_samples should be a positive integeral value, but got num_samples=0

May I ask whether your training images are named like "n02128925_9771.JPEG"? I downloaded them from the ILSVRC2012 website -> Images -> Training images (Task 1 & 2) (http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads).
Thanks again!!

@daisy91530 I copied your question here in case someone meets the same problem.
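
The num_samples=0 message is raised by the PyTorch sampler when the dataset has length zero, which means no image files were found under the training path. One plausible cause (an assumption, not confirmed in this thread) is that the images sit in per-class subfolders while the loader lists only the top-level directory, or that the file extension is not matched. The hypothetical helper below counts image files either way, so the path and the data_with_subfolder setting can be checked before training. The extension list is assumed.

```python
import os

# Assumed set of extensions; adjust to whatever the dataset loader accepts.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def count_images(root, recursive=False):
    """Count image files under root, matching extensions case-insensitively.

    With recursive=False only files directly inside root are counted.
    A missing directory yields 0.
    """
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += sum(
            1 for name in filenames if name.lower().endswith(IMAGE_EXTENSIONS)
        )
        if not recursive:
            break  # os.walk visits root first, so stop after the top level
    return count
```

If count_images(path) is 0 but count_images(path, recursive=True) is not, the images are in subfolders and the config probably needs to reflect that.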

Why keep only the first mask in the batch?

generative-inpainting-pytorch/model/networks.py

Line 262 in ee1fd75

m = m[0] # m shape: [L, C, k, k]

Hello, sorry to bother you. Could you please explain why you keep only the first mask in the batch, in your implementation of the contextual attention module?

cuda out of memory

Hi,
Could you please tell me which GPU you used to train the model?
With one 1080 Ti (11 GB) and a batch size of 8 or less, or three 1080 Ti (11 GB) and a batch size of 36 or less, I run out of memory (OOM).

test image resolution

Hello,

I have a very basic question: could I test the trained model on a 1920*1080 image? Thank you very much for your help.

Problem in contextual attention module

Thanks for your reimplementation.

When I run the test_contextual_attention function in network.py, an error occurs: RuntimeError: Given transposed=1, weight of size 4096 3 4 4, expected input[1, 4032, 166, 250] to have 4096 channels, but got 4032 channels instead
In my setting, --imageA is bnw_butterfly.png and --imageB is bike.jpg, both from the examples/style_transfer folder of the official CA repo.

Could you please help me with it? Thanks.

Best,

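Problem during training: File "D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 572, in next raise StopIteration StopIteration

Could someone please tell me how to fix this error, which occurs during training? Any pointers would be appreciated.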
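
StopIteration is what next() raises when an iterator, including a DataLoader iterator, has no more batches. Iteration-based training loops usually recreate the iterator at that point. If the error still escapes, one possible cause (an assumption here) is that the loader yields no batches at all, for example when the dataset is smaller than the batch size and incomplete batches are dropped. The sketch below is a hypothetical helper, not code from this repository. It cycles over any iterable and fails with a clearer message when the loader is empty.

```python
def cycle_batches(make_iterator):
    """Yield batches forever, recreating the iterator whenever it runs out.

    make_iterator is a zero-argument callable returning a fresh iterator,
    e.g. lambda: iter(train_loader).
    """
    while True:
        produced = False
        for batch in make_iterator():
            produced = True
            yield batch
        if not produced:
            raise RuntimeError(
                "the loader yielded no batches; check dataset size against batch_size"
            )


# Stand-in for a DataLoader: any re-iterable object works.
fake_loader = ["batch0", "batch1", "batch2"]
batches = cycle_batches(lambda: iter(fake_loader))
first_five = [next(batches) for _ in range(5)]
```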

Question about computing losses

Hi,

Thank you for your work, it is very helpful.

I was wondering if you could explain why you use x2_inpaint.detach() when running a forward pass through the discriminator for calculating the discriminator's loss, but do not call detach() when calling the discriminator for calculating the generator's loss.

In this case, gradient will not be computed for the D loss for the fake images, but will for the G loss.

I understand why you use detach() when calculating the gradient penalty (you only want the gradient for the GP term and do not want to recompute the gradient for the discriminator), but I cannot see why it is used as described above.

Thank you

License?

Can you please specify a license? Preferably MIT.

Is the real cos similarity being used in Contextual Attention?

Cosine similarity requires the norm of each patch, but I cannot find where the code computes the norm of xi.
max_wi = torch.max(torch.sqrt(reduce_sum(torch.pow(wi, 2),axis=[1, 2, 3],keepdim=True)),escape_NaN)
wi_normed = wi / max_wi
yi = F.conv2d(xi, wi_normed, stride=1) # [1, L, H, W]
I am confused about this. Thank you very much!
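
The snippet above normalizes only the background patches wi, so each score equals the norm of xi times the cosine similarity. For a fixed foreground location, the norm of xi is the same factor for every candidate patch, so the ranking of patches matches true cosine similarity. After a softmax, however, that factor acts like a per-location scale, so the attention weights are not identical. The pure-Python sketch below checks the first claim on toy vectors. The function names and values are hypothetical, and it says nothing about the author's intent.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def scores_patch_normalized(x, patches, eps=1e-4):
    """Score x against each patch, dividing by the patch norm only."""
    return [dot(x, w) / max(norm(w), eps) for w in patches]


def scores_cosine(x, patches, eps=1e-4):
    """True cosine similarity: divide by both norms."""
    return [dot(x, w) / (max(norm(x), eps) * max(norm(w), eps)) for w in patches]


x = [1.0, 2.0, 2.0]                      # toy foreground patch, norm 3
patches = [[2.0, 0.0, 0.0], [0.0, 3.0, 4.0], [1.0, 1.0, 1.0]]

partial = scores_patch_normalized(x, patches)
full = scores_cosine(x, patches)

# Both scorings pick the same best-matching patch.
best_partial = max(range(len(patches)), key=lambda i: partial[i])
best_full = max(range(len(patches)), key=lambda i: full[i])
```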

How do I get the attention visualization?

Thanks for your code, it's amazing! How do I get the attention visualization? What command should I run?

question about test

Hello, thank you for providing the code.
I read the code and have some questions.
Is your code only suitable for 256*256 images?
I changed the image_size in the config, but it then breaks in the contextual attention layer.

RuntimeError: Given transposed=1, weight of size [7238, 128, 4, 4], expected input[1, 7038, 46, 153] to have 7238 channels, but got 7038 channels instead.

Yu's code can handle images of any size, so could you tell me what the difference is between your code and Yu's?
Thanks!

question about convolution layer

Hi DAA233, I found some differences between your code and Jiahuiyu's code. At the end of each gen_conv layer in Jiahuiyu's code, the tensor is split into two parts and a different activation function is applied to each. Do you have any idea why? It seems that you simply removed this part:
x, y = tf.split(x, 2, 3)
x = activation(x)
y = tf.nn.sigmoid(y)
x = x * y
return x
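
The quoted lines split the channels in half, pass one half through the regular activation and the other through a sigmoid, and multiply the two. This appears to correspond to a gated convolution, where the sigmoid half acts as a soft per-feature gate in [0, 1]. The pure-Python sketch below reproduces that arithmetic on a flat list of channel values for a single pixel. The use of ELU as the activation is an assumption, and the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def elu(v, alpha=1.0):
    return v if v > 0 else alpha * (math.exp(v) - 1.0)


def gated_activation(channels, activation=elu):
    """Split channel values in half and gate the first half with the second.

    channels: list of 2*C values for one pixel; returns C gated values.
    """
    if len(channels) % 2 != 0:
        raise ValueError("channel count must be even to split into two halves")
    half = len(channels) // 2
    features, gates = channels[:half], channels[half:]
    return [activation(f) * sigmoid(g) for f, g in zip(features, gates)]


# A gate input of 0 halves the feature; a large gate input passes it through.
gated = gated_activation([2.0, -1.0, 0.0, 100.0])
```

Note that the layer's output has half as many channels as the convolution produced, so removing the gate changes the channel bookkeeping as well as the nonlinearity.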

Is the mask not three-channel?

To generate the masked image, the code is:
result = x * (1. - mask)
But x has three channels, while the mask has one,
so is this right? Can the operation be carried out?
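
Elementwise operations in PyTorch broadcast dimensions of size 1, so a one-channel mask can be multiplied against a three-channel image: the same mask plane is applied to every channel. The pure-Python sketch below spells out that behavior with nested lists. The function name and the tiny 2x2 example are hypothetical.

```python
def apply_mask(image, mask):
    """Zero out masked pixels in every channel.

    image: C x H x W nested lists
    mask:  1 x H x W nested lists, 1 = missing (hole), 0 = known
    """
    (mask_plane,) = mask  # the mask has exactly one channel
    return [
        [
            [pixel * (1.0 - m) for pixel, m in zip(row, mask_row)]
            for row, mask_row in zip(channel, mask_plane)
        ]
        for channel in image
    ]


image = [
    [[1, 2], [3, 4]],      # channel 0
    [[5, 6], [7, 8]],      # channel 1
    [[9, 10], [11, 12]],   # channel 2
]
mask = [[[0, 1], [1, 0]]]  # single channel, shared by all three

masked = apply_mask(image, mask)
```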

Error:RuntimeError: cuda runtime error (11)

Hello, I am Chen Longwhen. When I train the model, I get the following error:
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383
My environment is:
Pytorch 1.1.0
torchvision 0.3.0

Difference from the official implementation

Thanks for your brilliant code.

One question: in the official implementation, the fine and coarse generators share a single encoder network, but your implementation uses a separate encoder for each generator. Does this give better performance?

Thanks~

StopIteration

Thank you for your code. I have a question: when I run train.py, it reaches this point and raises StopIteration.

Why use F.interpolate in networks.py?

In networks.py line 79, in the coarse model, I found that you use the "interpolate" function, whereas the TensorFlow version uses a deconv layer. Is there any reason?
I'm studying this paper, thanks a lot!

Output of the inpainter gives a gray boundary

Most of the images generated by running "python test_tf_model.py --image examples/imagenet/imagenet_patches_ILSVRC2012_val_00008210_input.png --mask examples/center_mask_256.png --output examples/output.png --model-path torch_model.p" have a constant gray boundary, for all inputs.

Sync Batch norm not used?

Dear author:
Thanks for your re-implementation, it's helpful!
When I tried to train on multiple GPUs, I noticed that you have not converted the batch norm layers to synchronized batch norm. I wonder whether you are aware of this. Thank you.

Attempting to train the GAN but the inpainted result is the same as the masked images (as shown with viz_images)

I am currently attempting to train the GAN but the masked images (x) and the inpainted results are the same. Do you have any idea what could be causing this?

Error backward loss generator

I trained the model on my own dataset and got the error:

I found that this line resulted in error.

generative-inpainting-pytorch/train.py

Line 135 in ee1fd75

losses['g'].backward()

Could you help me fix this?
Thank you!

What is the meaning of the parameter n_critic?

Why is the generator loss computed only once every five iterations instead of every iteration?
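
In WGAN-style training the critic is commonly updated several times for each generator update, and n_critic sets that ratio. The sketch below assumes the critic is updated on every iteration and the generator only when the iteration number is a multiple of n_critic. That schedule and the function name are assumptions for illustration, not a confirmed description of train.py.

```python
def generator_update_steps(total_iterations, n_critic=5):
    """List the 1-based iterations on which the generator would be updated.

    Assumes the critic is updated on every iteration and the generator
    only when the iteration number is divisible by n_critic.
    """
    return [it for it in range(1, total_iterations + 1) if it % n_critic == 0]


# With n_critic=5, 12 iterations give 12 critic updates and 2 generator updates.
steps = generator_update_steps(12, n_critic=5)
```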

bad inpainting results on CelebA?

Thank you for your code. When I use it to train on the CelebA dataset, the validation images saved every 1000 iterations during training (hole_benchmark folder) show poor inpainting results.
Have you ever trained on this dataset (CelebA)? Could you provide a well-trained model?

Experiment Result

Can you post some result images in the README file? Thanks.

About train dataset.

Hi~
What train dataset did you use? I mean in ImageNet2012 there are many sub-datasets, I want to know which did you use?

How can I make use of the FLOW argument?

I was looking into the code and there is a flow argument but I can't find an example that uses it so I don't know how I can make it work.

Thanks for your help

an intermittent bug: loss=NaN

Hello!
I've run into a bug that is hard to solve.
I've made many modifications to your code, and everything was fine.
Last week I modified the original code for a new dataset, and ran the original code as a baseline. The original code worked fine, but the modified code hit this bug.

The dataset is not corrupted. Yet no matter how I check the code and dataset, the loss is NaN when iter<10000. The strange thing is that when I re-ran the original code, the same bug happened, although last week the original code trained without any problem.

Can you run the original code? I don't know why the loss is NaN. Can you help me solve this bug? It is driving me crazy.

This is my config.yaml, which is almost identical to yours.

About contextual attention.

Hi! First, your implementation is awesome, thank you! But I have some questions.
1. What is the meaning of 'mm' in the function 'contextual_attention'?
2. I'm confused about why, after 'xi' is convolved with 'wi_normed', two more convolutions are applied.
It would be kind of you to help!

Training Gradient Function is Missing

When testing with your data I'm getting the training gradient function CloneBackward for interpolates and AddmmBackward for disc_interpolates but I'm not getting any gradient function (I printed out the tensors, that's how I know) when using my data. By any chance can you speculate what might be the problem, as in how is the calc_gradient_penalty producing a gradient function for these variables automatically? and how can I force a gradient function (maybe even manually) for them?

Thanks for your help

question about test

Dear author:
Thanks for your re-implementation, it's helpful! I have a small question:
In the training phase, the training image will be scaled to 256*256, the code in dataset.py is:
if self.random_crop:
    imgw, imgh = img.size
    if imgh < self.image_shape[0] or imgw < self.image_shape[1]:
        img = transforms.Resize(min(self.image_shape))(img)
    img = transforms.RandomCrop(self.image_shape)(img)
else:
    img = transforms.Resize(self.image_shape)(img)
    img = transforms.RandomCrop(self.image_shape)(img)

In the testing phase, the testing image will be scaled to 256*256, the code in test_single.py is:
x = transforms.Resize(config['image_shape'][:-1])(x)
x = transforms.CenterCrop(config['image_shape'][:-1])(x)
mask = transforms.Resize(config['image_shape'][:-1])(mask)
mask = transforms.CenterCrop(config['image_shape'][:-1])(mask)

Are the scaling procedures the same in both cases?
Thank you for your answer.

can this work on free form of mask?

I see from the author's new implementation that it now works with both rectangular and free-form masks. Can your implementation do the same?
Thanks,

Using the validation set

Since the validation set section is currently commented out, how do you think it is best used for this case of neural network?

Thanks

Places2 pretrained weights

Hey,
Do you have the pretrained weights for the Places2 dataset? It would be great if you can share that!

Early Stopping Integration

First off, thank you for providing this implementation! I am new to pytorch and ML in general and I'm about to train on my own dataset, but I wanted to ask if there's a simple way to integrate early stopping? Greatly appreciate any advice you can provide. Thank you in advance :)
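
Early stopping can be added around any training loop by tracking a validation metric and stopping once it has not improved for a set number of checks. The sketch below is a minimal, framework-independent version. The class name, the patience and min_delta parameters, and the choice of metric are all assumptions. GAN losses are often poor indicators of image quality, so a held-out reconstruction error such as l1 on validation images may be a more useful signal than the adversarial loss.

```python
class EarlyStopping:
    """Signal a stop when a metric (lower is better) stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # checks to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, value):
        """Record a new validation value; return True if training should stop."""
        if value < self.best - self.min_delta:
            self.best = value
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Hypothetical validation values, checked e.g. once per validation interval.
stopper = EarlyStopping(patience=2)
decisions = [stopper.step(v) for v in [1.0, 0.9, 0.95, 0.92, 0.5]]
```

In a real loop, training would break the first time step() returns True, ideally after saving the checkpoint that produced the best value.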

Visualize train loss

Can you help me to visualize train loss with tensorboardx?

Leaf Variable has been moved to graph interior

Any idea why this error is occurring?

the critics updating

Hi! Thank you for your code. According to Algorithm 1 in the original paper, the two critics are updated 5 times per iteration. However, your implementation updates both the generator and the critics at the same time in every iteration.
Please tell me about that. I am sorry if my interpretation is mistaken.

about testing attention layer

Thank you very much for your code.
When I run test_contextual_attention() with two images from Yu's webpage, I don't get his result.
The third image in the first row is what I got; the third image in the second row is from Yu's webpage.

Do you have any idea why? I would really appreciate your answer.
Also, when I test with two identical images, I get a strange result. The left image is the test image, which I use as both the foreground and the background image. The left is what I got. I have checked the code and could not find the reason.
As I understand it, when the two images are identical, the reconstruction should be very similar to the ground truth, right?

Thank you very much for your time.