brownvc / ganimorph Goto Github PK

Source code and information for the ECCV 2018 paper: Gokaslan et al., 'Improving Shape Deformation in Unsupervised Image-to-Image Translation'

License: Other

Python 100.00%

ganimorph's People

Stargazers

Watchers

ganimorph's Issues

The Versions of Packages

Thanks for sharing your code with us. It will save our time if you can specify the versions of packages :)

Thanks for your work, it's great.
I have one question about the kernel size which you set in down_sample and up_sample layer, why did you choose 4x4 but in resblock you choose 3x3? It make me confused and actually 4x4 kernel usually doesn't been optimized.

difference from paper in discriminator

the paper show last layer in discriminator has kernel 1x1. but in the code it is 4x4

ERR Deadline exceeded while waiting for data from the queue!

I rename my dataset folder as trainA, trainB, testA and TestB. Then I install the tensorpack with the required version. However, the training process is stopped by "ERR Deadline exceeded while waiting for data from the queue!".
Here is the traceback:

0%| |0/1000[00:00<?,?it/s] Start Epoch 1 ...
2018-08-24 14:30:39.133698: W tensorflow/core/kernels/queue_base.cc:295] _0_input_queue: Skipping cancelled enqueue attempt with queue not closed
[0824 14:30:39 @input_source.py:204] ERR Deadline exceeded while waiting for data from the queue!
Traceback (most recent call last):
File "/media/he/80FE99D1FE99BFB8/ganimorph/ganimorph/tensorpack/train/input_source.py", line 200, in run
sess.run(self.op, feed_dict=feed, options=opt)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
DeadlineExceededError: Timed out waiting for notification
[0824 14:30:39 @input_source.py:212] EnqueueThread Exited.
[0824 14:30:50 @base.py:200] Training was stopped.
2018-08-24 14:30:50.966173: W tensorflow/core/framework/op_kernel.cc:1192] Out of range: FIFOQueue '_0_input_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: input_deque = QueueDequeueV2component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
2018-08-24 14:30:50.966501: W tensorflow/core/framework/op_kernel.cc:1192] Out of range: FIFOQueue '_0_input_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: input_deque = QueueDequeueV2component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
4%|3 |35/1000[00:36<16:52, 0.95it/s]
Prefetch process exited.
Prefetch process exited.
Prefetch process exited.
Prefetch process exited.

Process finished with exit code 0

Could you please tell me how I can fix it?

Toy dataset

Dear authors,

Appreciating the work!
And would it be possible for you to share the toy dataset to run the demo?

Thanks in advance

good hyperparams for large set

hi,
i got 54500 images for trainA and 54500 for trainB, when i would set steps_per_epoch= to 54500, it would take on a K80/12GB around 54 hours for one epoch ? what is a good value for larger datasets ?

When i train the 54500 dataset with 100, or 500, or 5000 steps_per_epoch the strange thing happens that every input image gives exactly the same output image ? is it because i dont use 54500, all images for steps_per_epoch?

thanks

How to run inference?

I am trying to get human → anime conversion to run. My current code is as follows:

import argparse
from pathlib import Path

import numpy as np
import skimage.io
from tensorpack import SaverRestore, PredictConfig, OfflinePredictor

from model import Model

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', help='path to trained model', required=True)
parser.add_argument('--input_image_path', help='path to load input image from', required=True, type=Path)
parser.add_argument('--output_image_path', help='path to save output images', required=True, type=Path)
args = parser.parse_args()

if __name__ == '__main__':
    pred_config = PredictConfig(
        model=Model(),
        session_init=SaverRestore(args.model_path),
        input_names=['inputB'],
        output_names=['gen/A/deconv3/output:0'],
    )
    predictor = OfflinePredictor(pred_config)

    image = skimage.io.imread(args.input_image_path)
    image = image.astype(np.float32) / 255

    inputB = image.copy()[np.newaxis, ...]

    outputA, = predictor(inputB)
    outputA = (outputA[0].transpose((1, 2, 0)) * 255).astype(np.uint8)

    args.output_image_path.mkdir(exist_ok=True, parents=True)
    skimage.io.imsave(args.output_image_path / 'a.png', outputA)

As input, I am using this Brad Pitt photo. However, the output I am getting with getchu_anime/JNet_dilsc_rsep_sl_r0316-084543/model-140000.index model is:

which looks like I miss some normalization or maybe retrieve a wrong tensor.

Same code with good_anime/model-260000.index:

With good_anime/JNet_dilsc_rsep_sl0304-120055/model-180000.index:

What am I doing wrong?

Feature map loss

Hi. Because the data is unpaired. So, Is the feature map correct, eq. (1)?

Feature Matching Loss. To increase the stability of the model, our objective
function uses a feature matching loss

The eq. 1 take MSE for the feature in D network. However, the data is unpaired, so the real feature from A and fake feature from A also unpaired. Is it correct using direct MSE to compute the feature map between real A and fake A features?

Getting results from pretrained models?

Hi, First of all I would like to thanks authors for this amazing work. I am new to this field. Can you pls let me know that how to run your pretrained model good_cat2dog_face for getting results? The folder has two files model-160000.data-00000-of-00001 and model-160000.index.
User @muxgt has shared his notebook in which he mentioned the command "!python /content/ganimorph/main.py --data /content/ganimorph/datasets/cat_dog_face --load /content/ganimorph/checkpoints/model-160000" I have tried this also, but giving error.
Pls help me regarding this matter. Thanks in advance.

Where is model.get_inputs_desc() defined?

Hi @Skylion007 ,

Thanks for publishing your code. Upon trying to run the code, I get the following error-

I see that there is no function called 'get_inputs_desc' in model.py? How to make it work then? Could you please advise?

No definition about moving_averages

Hi, I find that one interesting loss in the paper is the schedule loss normalization (SLN), but I didn't see the code implementation for this part (please check the below code line), it seems that "moving_averages" is missing in the current version of code. Looking forward the reply~ Thanks in advanced.

ganimorph/model.py

Line 106 in cf94339

moving_averages.assign_moving_average(

Deadline exceeded while waiting for data from queue

Regarding #5 , seemed like it is related to this line in tensorpack.
I'm guessing the model saving op cost more than 10 seconds and results in a DeadlineExceededError.

I modified 10000 to 20000 and it worked....

How to make the code run default on GPU?

Hi @Skylion007

I'm observing that the default trainer function SeparateGANTrainer is running on CPU, which is quite slow. Is there a way to run it on multi-GPU?

Although, I see that you've provided GPU supported trainers such as MultiGPUGANTrainer, but I'm wondering why these functions do not support the 2:1 training strategy you are using in SeparateGANTrainer. Is there a wayout?

Also, please let me know if I'm misunderstanding anything. Thanks!

Are the cropped cats and dogs datasets available?

Still Coming Soon?

Is this a working implementation?

Any brief setup and train/test hints?

Comparing Different models

Hi,
Could you please tell me how you compared different models? Did you use the same learning rate, number of epochs, Number of decay epochs, image size, optimizer among all models? Also, did you collect test results using the final saved generator or did you use the best results testing all saved generators at different epochs?

Dataset structure

I would like to try the model on my own dataset. What is the required structure of the folders?

tensorflow 2.0 implementation

hi,
i started a tensorflow 2.0 implementation of ganimorph. First i learn more about GANs, second it seemed more usable with gcp and more GPUs/TPUs (which would reduce training time) or hyperparametertuning by gcp and other stuff. It could also be trained/developed for free with https://colab.research.google.com/notebooks/welcome.ipynb (GPU,TPU support) , where you can clone the github repository from inside colab notebook. very simple.

Would be nice if one could review the code or help writing it :)

Or download/clone it:
https://github.com/flobotics/colab.git

flo

how does this run ?

hi,
i try your ganimorph implementation , but i dont get it run . You write that you use tensorpack with commit-id 01cab873feced688070d3ab113530f03c2525723 but in code you use FeedfreeTrainerBase , which only exists till v-0.5 , your commit id refers to v-0.9.01 ? what is correct.

thanks

Pale images in pretrained model

Hi! First of all thanks for your work! I'm trying to reproduce your results with the pretrained model on Cats2Dogs. I'm using Google Colab and in TensorBoard the result looks like this:

Since there is no "test" option I put steps_per_epoch = 0 and visualising after each epoch. Didn't change anything else in the code.
Do you know what the problem may be?

It is a great work! Can you share your trained model firstly?

runtime error

Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node LossA/GAN_loss/discrim/ArithmeticOptimizer/HoistCommonFactor_Add_loss is missing output properties at position :0 (num_outputs=0)

how can i fix it? thx

question to updated code

hi,
i saw your code changes for tensorpack. You write its for tensorpack==0.8.9, but then i get the error:
ImportError: cannot import name 'MovingAverageSummary'

If i install tensorpack==0.9.4 it works. (with tensorflow 1.13)

But with multi-gpu it is not faster than with one gpu ? I changed line 78 in GAN.py to
def init(self, input, model, num_gpu=2):

and in main.py i changed SeparateGANTrainer( to GANTrainer(

Is that correct, or do i something wrong ? Is it possible to use more than 2 GPUs ? e.g. 8 GPUs ?
thanks
flo