
kkanshul / finegan

277 stars · 15 watchers · 43 forks · 14.27 MB

FineGAN: Unsupervised Hierarchical Disentanglement for Fine-grained Object Generation and Discovery

Home Page: http://krsingh.cs.ucdavis.edu/krishna_files/papers/finegan/index.html

License: BSD 2-Clause "Simplified" License

Python 100.00%
gans computer-vision pytorch fine-grained disentangled-representations image-generation image-manipulation deep-learning generative-adversarial-network

finegan's Introduction

FineGAN

PyTorch implementation for learning to synthesize images in a hierarchical, stage-wise manner by disentangling background, object shape, and object appearance.



FineGAN: Unsupervised Hierarchical Disentanglement for Fine-grained Object Generation and Discovery

Krishna Kumar Singh*, Utkarsh Ojha*, Yong Jae Lee
project | arxiv | demo video | talk video
CVPR 2019 (Oral Presentation)

Architecture


Requirements

  • Linux
  • Python 2.7
  • PyTorch 0.4.1
  • TensorboardX 1.2
  • NVIDIA GPU + CUDA cuDNN

Getting started

Clone the repository

git clone https://github.com/kkanshul/finegan
cd finegan

Setting up the data

Note: You only need to download the data if you wish to train your own model.

Download the formatted CUB data from this link and extract it inside the data directory

cd data
unzip birds.zip
cd ..

Downloading pretrained models

Pretrained generator models for CUB and Stanford Dogs are available at this link. Download and extract them into the models directory.

cd models
unzip netG.zip
cd ../code/

Evaluating the model

In cfg/eval.yml:

  • Specify the model path in TRAIN.NET_G.
  • Specify the output directory to save the generated images in SAVE_DIR.
  • Specify the number of super and fine-grained categories in SUPER_CATEGORIES and FINE_GRAINED_CATEGORIES according to our paper.
  • Specify the option for using 'tied' latent codes in TIED_CODES (see the config sketch after this list):
    • if True, specify the child code in TEST_CHILD_CLASS; the background and parent codes are derived from the child code in this case.
    • if False, i.e., the parent, child, and background codes are independent, specify each of them in TEST_PARENT_CLASS, TEST_CHILD_CLASS, and TEST_BACKGROUND_CLASS respectively.
  • Run python main.py --cfg cfg/eval.yml --gpu 0
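
For orientation, here is a minimal sketch of what cfg/eval.yml might contain. Only the key names come from the steps above; every value (paths, the CUB category counts, the chosen class index) is an illustrative assumption, not a repository default.

TRAIN:
    NET_G: '../models/netG/netG_birds.pth'  # assumed checkpoint path
SAVE_DIR: '../output/eval_birds'            # assumed output directory
SUPER_CATEGORIES: 20                        # parent codes for CUB (per the paper)
FINE_GRAINED_CATEGORIES: 200                # child codes for CUB (per the paper)
TIED_CODES: True
TEST_CHILD_CLASS: 17                        # background/parent codes derived from this
# If TIED_CODES is False, also set TEST_PARENT_CLASS and TEST_BACKGROUND_CLASS.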

Training your own model

In cfg/train.yml:

  • Specify the dataset location in DATA_DIR.
    • NOTE: If you wish to train this on your own (different) dataset, please make sure it is formatted in a way similar to the CUB dataset that we've provided.
  • Specify the number of super and fine-grained categories that you wish for FineGAN to discover, in SUPER_CATEGORIES and FINE_GRAINED_CATEGORIES.
  • Specify the training hyperparameters in TRAIN (see the sketch after this list).
  • Run python main.py --cfg cfg/train.yml --gpu 0
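
As a rough sketch, cfg/train.yml might look like the following; the top-level keys are the ones named above, while the hyperparameter names and values under TRAIN are examples rather than the repository's defaults.

DATA_DIR: '../data/birds'       # the formatted CUB data from the setup step
SUPER_CATEGORIES: 20            # parent categories FineGAN should discover
FINE_GRAINED_CATEGORIES: 200    # child categories FineGAN should discover
TRAIN:
    BATCH_SIZE: 16              # example values; tune for your GPU and dataset
    MAX_EPOCH: 600
    # ...remaining TRAIN hyperparameters (learning rates, snapshot interval, etc.)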

Sample generation results of FineGAN

1. Stage-wise image generation results

2. Grouping among the generated images (child).

Citation

If you find this code useful in your research, consider citing our work:

@inproceedings{singh-cvpr2019,
  title = {FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery},
  author = {Krishna Kumar Singh and Utkarsh Ojha and Yong Jae Lee},
  booktitle = {CVPR},
  year = {2019}
}

Acknowledgement

We thank the authors of StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks for releasing their source code.

Contact

For any questions regarding our paper or code, contact Krishna Kumar Singh and Utkarsh Ojha.

finegan's People

Contributors

kkanshul, utkarshojha


finegan's Issues

About fine-tuned InceptionV3 .pth file

Hi, what wonderful work!
When I evaluate the generated images, the IS value only gets up to about 4.x without a fine-tuned Inception net on the respective datasets.
Could you release the fine-tuned InceptionV3 .pth files for the respective datasets (birds, cars, dogs)?
Thanks!

bounding box to mask

Hello,
I'm interested in your work, but I'm confused about how you turn the bounding box into the bird mask. After reading your code, I still cannot figure out the process. What does "warped_bbox" mean in your code?
Looking forward to your reply. The related code is shown below:

x1 = self.warped_bbox[0][i]
x2 = self.warped_bbox[2][i]
y1 = self.warped_bbox[1][i]
y2 = self.warped_bbox[3][i]

# Translate the bounding-box corners into the range of discriminator output
# patches whose receptive fields overlap the box.
a1 = max(torch.tensor(0).float().cuda(),
         torch.ceil((x1 - self.recp_field) / self.patch_stride))
a2 = min(torch.tensor(self.n_out - 1).float().cuda(),
         torch.floor((self.n_out - 1) - ((126 - self.recp_field) - x2) / self.patch_stride)) + 1
b1 = max(torch.tensor(0).float().cuda(),
         torch.ceil((y1 - self.recp_field) / self.patch_stride))
b2 = min(torch.tensor(self.n_out - 1).float().cuda(),
         torch.floor((self.n_out - 1) - ((126 - self.recp_field) - y2) / self.patch_stride)) + 1

# Zero the loss weights for every output patch that overlaps the object, so
# only pure-background patches contribute to the background discriminator loss.
if x1 != x2 and y1 != y2:
    weights_real[i, :, a1.type(torch.int):a2.type(torch.int),
                 b1.type(torch.int):b2.type(torch.int)] = 0.0

About fine-tuned inception model file

Hi! When i test your fine-tuned model file, it happens some errors:
Error(s) in loading state_dict for Inception3: Missing key(s) in state_dict: "AuxLogits.fc.weight", "AuxLogits.fc.bias", "fc.weight", "fc.bias". Unexpected key(s) in state_dict: "fc_new.weight", "fc_new.bias", "AuxLogits.fc_new.weight", "AuxLogits.fc_new.bias".
Could u mind send a fine-tuned inception model file(inceptionv2.py) for me, thx!!!

Other Question~, how to fine-tuned the inception model on bird datasets?

The update of parameters in G

Hi, I found that you do this every time after calling the train_Gnet function:

 for p, avg_p in zip(self.netG.parameters(), avg_param_G):
     avg_p.mul_(0.999).add_(0.001, p.data)

avg_param_G holds the parameters that end up in your saved model. It looks like you update the model again after opt.step(). Why don't you save the final model directly instead of using avg_param_G?

Could you explain this to me? Thank you!
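
For context, that update keeps an exponential moving average (Polyak average) of the generator's weights; the smoothed copy typically yields better samples than the raw weights right after an optimizer step, which is presumably why it is the one saved. A hypothetical sketch of how such averaged parameters get swapped in at checkpoint time (netG and avg_param_G follow the snippet above; the file name is made up):

import torch

backup = [p.data.clone() for p in netG.parameters()]  # stash the live training weights
for p, avg_p in zip(netG.parameters(), avg_param_G):
    p.data.copy_(avg_p)                                # load the averaged weights
torch.save(netG.state_dict(), 'netG_avg.pth')          # checkpoint the smoothed model
for p, b in zip(netG.parameters(), backup):
    p.data.copy_(b)                                    # restore and keep training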

Unsupervised clustering on real images

In the paper it's mentioned that clustering is done on real images using the automatically learned representations. Can you please show, or direct me to, the example/code used for doing this?

some questions about dataset

Thanks for your work!
I can't download your formatted CUB data, so I downloaded the original CUB data and resized the images to 128×128 for training. I don't know whether that's right. Could you share your method of preparing the dataset?

Another question: how do you get the real background images during the background stage? The discriminator D_b needs both real and fake background images.

Score for background and foreground

Hi, thanks for releasing the code; it's great work. But I have a small question about the background classification score. In line 337 of model.py, the background score is set to 0:
classi_score = self.uncond_logits1(x_code) # Background vs Foreground classification score (0 - background and 1 - foreground)
That is consistent with the statement in your paper. But in lines 227 and 330 of trainer.py, the score of the background seems to be set to 1:
errD_real_uncond_classi = criterion(real_logits[0], weights_real) # Background/foreground classification loss
errG_classi = criterion_one(outputs[0], real_labels) # Background/Foreground classification loss for the fake background image (on patch level)
So I am confused; could you point out anything wrong in my understanding? That would be very helpful, thank you.

high res version

Thank you for sharing.
Did you try a high-resolution version (like 256x or 512x)?
If not, what difficulties would you anticipate?

About the data and pretrain models

Could you please upload the formatted CUB data and pretrained models to Baidu Cloud (https://pan.baidu.com)? It is very hard for us to download large files in China; downloads from Google Drive are unstable. I don't know if it's convenient for you to access that website, but if possible, please upload the files there. Thank you very much.

Reproduce Table 1 IS and FID score on Birds

Hi, nice work!

I am evaluating your released model on the Birds dataset. In particular, I want to reproduce the numbers claimed in your paper, to make sure that everything I have done is correct. The paper claims IS = 52.53±0.45 and FID = 11.25, but I got IS = 43.20±0.54 and FID = 22.08. I think I might have made some mistakes in the details.

What I did

I generated 30K 128x128 child images with your released model on Birds, i.e., 150 images for each child category.
(1) I computed the IS with your released fine-tuned Inception model. The generated images are resized to 299x299 and normalized to [-1, 1] before being fed into the network. Mean and std are computed over 10 splits.
(2) I computed the FID with the default Inception model using (see the code sketch after the links below):

calculate_fid_given_paths(['/path/to/generated/images', '/path/to/real/images'], batch_size=1, cuda=True, dims=2048)

Note that /path/to/real/images points to the original CUB images, without any cropping or resizing.

Evaluation codes

IS: https://github.com/sbarratt/inception-score-pytorch
FID: https://github.com/mseitzer/pytorch-fid
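
For concreteness, the two measurements above would look roughly like this, assuming the entry points those two repositories exposed at the time (module paths and signatures may differ across versions, and the fine-tuned Inception used for IS would have to replace the scorer's default torchvision model):

from inception_score import inception_score         # sbarratt/inception-score-pytorch
from fid_score import calculate_fid_given_paths     # mseitzer/pytorch-fid

# `generated` is assumed to be a torch Dataset of the 30K child images,
# scaled to [-1, 1]; resize=True lets the scorer upsample to 299x299 itself.
is_mean, is_std = inception_score(generated, cuda=True, batch_size=32,
                                  resize=True, splits=10)

fid = calculate_fid_given_paths(['/path/to/generated/images',
                                 '/path/to/real/images'],
                                batch_size=1, cuda=True, dims=2048)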

Questions

  1. Do you use the fine-tuned Inception model for computing FID?
  2. Should I use the images that are cropped with 1.5x bounding boxes from the original images to compute FID?
  3. Should I first resize the real images to 128x128 and then feed them into the Inception network (which automatically resizes input to 299x299) to compute FID?
  4. Does the number I got lie within normal variation? I suppose the quality of the generated images may differ from run to run.

BTW, I am also curious about the LR-GAN results you report in Table 1. Did you train LR-GAN on the original CUB images or on the cropped images?

BTW, I am using pytorch==1.3.0; not sure if there is any version issue.

FID-calculate-method

Thank you for sharing.
I have a question about how to calculate the FID score on the 'birds' dataset. Did you use the fine-tuned Inception network to calculate the FID score, or an official FID implementation?

Training for parent discriminator

Hi, thanks for publishing your code. But after reading it, I am confused: the parent discriminator seems not to be updated after initialization, which seems different from the paper. Have I missed anything important?
Looking forward to your answer, thanks!

About test image generation

I'm confused by this sentence in your paper: 'We evaluate image generation using Inception Score (IS) [42] and Frechet Inception Distance (FID) [20], which are computed on 30K randomly generated images (equal number of images for each child code c)'.
How can I generate 30K test images on the CUB200 dataset? :)
In CUB200 there are 20 parent classes, 200 child classes, and 200 background classes, so there are about 20x200x200 = 800K possible combinations. How do I get just 30K test images?

Also, how do you calculate the FID value on the CUB200 dataset (e.g., how do you get the CUB200 mu and sigma)?

Hoping for your reply!
