
t2f's Introduction

! Attention !

Unfortunately, this project is no longer being worked on.

Please head over to the following much cooler project that takes the idea of Text-2-Image generation to the next level:

DallE: Original PyTorch

⭐ [NEW] ⭐

T2F - 2.0 Teaser (coming soon ...)

2.0 Teaser

Please note that all the faces in the above samples are generated. T2F 2.0 will use MSG-GAN instead of ProGAN for the image generation module. Please refer to the linked page for more information about MSG-GAN. This update to the repository is coming soon 👍.

T2F

Text-to-Face generation using Deep Learning. This project combines two recent architectures, StackGAN and ProGAN, to synthesize faces from textual descriptions.
The project uses the Face2Text dataset, which contains 400 facial images with textual captions for each of them. The data can be obtained by contacting either the RIVAL group or the authors of the Face2Text paper.

Some Examples:

Examples

Architecture:

Architecture Diagram

The textual description is encoded into a summary vector using an LSTM network. This summary vector, the Embedding (psy_t) in the diagram, is passed through the Conditioning Augmentation block (a single linear layer), which uses a VAE-like reparameterization technique to produce the textual part of the GAN's input latent vector. The second part of the latent vector is random Gaussian noise. The resulting latent vector is fed to the generator, while the embedding is fed to the final layer of the discriminator for conditional distribution matching. Training proceeds exactly as described in the ProGAN paper, i.e. layer by layer at increasing spatial resolutions. Each new layer is introduced using the fade-in technique to avoid destroying previous learning.
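
As a rough illustration of how the latent vector described above could be assembled, the following plain-Python sketch (deliberately without PyTorch, for readability) applies a VAE-style reparameterization to a text embedding and appends Gaussian noise. The function names, the bias-free linear map, the split of its output into mean and log standard deviation, and the toy sizes are all assumptions made for illustration; this is not the repository's actual code.

```python
import math
import random


def conditioning_augmentation(embedding, weights, rng):
    """Hypothetical CA block: one bias-free linear layer whose output is
    split into (mu, log_sigma), followed by c = mu + sigma * eps."""
    projected = [sum(w * e for w, e in zip(row, embedding)) for row in weights]
    half = len(projected) // 2
    mu, log_sigma = projected[:half], projected[half:]
    eps = [rng.gauss(0.0, 1.0) for _ in range(half)]
    return [m + math.exp(s) * n for m, s, n in zip(mu, log_sigma, eps)]


def build_latent(embedding, weights, latent_size, rng):
    """Concatenate the text-conditioned part with Gaussian noise."""
    text_part = conditioning_augmentation(embedding, weights, rng)
    noise = [rng.gauss(0.0, 1.0) for _ in range(latent_size - len(text_part))]
    return text_part + noise


rng = random.Random(0)
embedding = [0.1, -0.2, 0.3, 0.05]       # toy stand-in for the LSTM summary vector
weights = [[0.0] * 4 for _ in range(6)]  # 6 rows: 3 values of mu, 3 of log_sigma
latent = build_latent(embedding, weights, latent_size=5, rng=rng)
print(len(latent))
```

If the real code splits the latent vector the same way, a ca_out_size of 178 and a latent_size of 256 would leave 78 dimensions for the noise part; that split is an assumption here, so check it against train_network.py.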

Running the code:

The code is in the implementation/ subdirectory and is written using the PyTorch framework. To run it, please install PyTorch version 0.4.0 before continuing.

Code organization:
configs: contains the configuration files for training the network. (You can use any one, or create your own)
data_processing: package containing data processing and loading modules
networks: package containing the network implementations
processed_annotations: directory that stores the output of the process_text_annotations.py script
process_text_annotations.py: processes the captions and stores the output in the processed_annotations/ directory. (There is no need to run this script; the pickle file is included in the repo.)
train_network.py: script for training the network

Sample configuration:

# All paths to different required data objects
images_dir: "../data/LFW/lfw"
processed_text_file: "processed_annotations/processed_text.pkl"
log_dir: "training_runs/11/losses/"
sample_dir: "training_runs/11/generated_samples/"
save_dir: "training_runs/11/saved_models/"

# Hyperparameters for the Model
captions_length: 100
img_dims:
  - 64
  - 64

# LSTM hyperparameters
embedding_size: 128
hidden_size: 256
num_layers: 3  # number of LSTM cells in the encoder network

# Conditioning Augmentation hyperparameters
ca_out_size: 178

# Pro GAN hyperparameters
depth: 5
latent_size: 256
learning_rate: 0.001
beta_1: 0
beta_2: 0
eps: 0.00000001
drift: 0.001
n_critic: 1

# Training hyperparameters:
epochs:
  - 160
  - 80
  - 40
  - 20
  - 10

# % of epochs for fading in the new layer
fade_in_percentage:
  - 85
  - 85
  - 85
  - 85
  - 85

batch_sizes:
  - 16
  - 16
  - 16
  - 16
  - 16

num_workers: 3
feedback_factor: 7  # number of logs generated per epoch
checkpoint_factor: 2  # save the models after these many epochs
use_matching_aware_discriminator: True  # use the matching aware discriminator
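
To make the per-stage lists above easier to read, here is a small sketch of how they might map onto the progressive training schedule. It assumes the first stage is 4x4 and that the resolution doubles at each stage (so depth 5 ends at 64x64, matching img_dims), and that the new layer's blend weight ramps linearly from 0 to 1 over the first fade_in_percentage of a stage. The helper names are hypothetical, and the exact ramp used by the training code may differ.

```python
def resolution_at(stage, base=4):
    """Spatial resolution of a 0-based stage, doubling at every stage."""
    return base * 2 ** stage


def fade_in_alpha(step, total_steps, fade_in_percentage):
    """Blend weight of the newly added layer: linear ramp 0 -> 1, then held at 1."""
    fade_steps = max(1, int(total_steps * fade_in_percentage / 100))
    return min(1.0, step / fade_steps)


# Values copied from the sample configuration.
epochs = [160, 80, 40, 20, 10]
fade_in_percentage = [85, 85, 85, 85, 85]

for stage, (n_epochs, pct) in enumerate(zip(epochs, fade_in_percentage)):
    res = resolution_at(stage)
    print(f"stage {stage}: {res}x{res}, {n_epochs} epochs, fade-in over first {pct}%")
```

Note that each list needs one entry per stage, so changing depth means changing the length of epochs, fade_in_percentage and batch_sizes together.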

Use the requirements.txt to install all the dependencies for the project.

$ workon [your virtual environment]
$ pip install -r requirements.txt

Sample run:

$ mkdir training_runs
$ mkdir training_runs/generated_samples training_runs/losses training_runs/saved_models
$ python train_network.py --config=configs/11.conf
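
The sample configuration writes to training_runs/11/losses/, training_runs/11/generated_samples/ and training_runs/11/saved_models/, whereas the mkdir commands create those directories directly under training_runs/. If your configuration uses a numbered run directory, a small helper along these lines could create the expected layout before training. The function name is hypothetical, and the script may or may not create missing directories itself.

```python
import os


def make_run_dirs(run_root):
    """Create the three output directories used by the sample configuration."""
    created = []
    for name in ("losses", "generated_samples", "saved_models"):
        path = os.path.join(run_root, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

For the sample configuration, this would be called as make_run_dirs(os.path.join("training_runs", "11")) from the implementation/ directory.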

Other links:

blog: https://medium.com/@animeshsk3/t2f-text-to-face-generation-using-deep-learning-b3b6ba5a5a93
training_time_lapse video: https://www.youtube.com/watch?v=NO_l87rPDb8
ProGAN package (Separate library): https://github.com/akanimax/pro_gan_pytorch

#TODO:

1.) Create a simple demo.py for running inference on the trained models

t2f's People

Contributors

ahmedhani, akanimax

t2f's Issues

Use this repo or wait for v2?

Hi,
Thank you so much for your work! I just obtained v2 of the dataset, which has 10 times as many images (4000 now), and wanted to get started on this task. Would it still be a good idea to use this repo, or is T2F v2 right around the corner? Or can you suggest changes I can make to bring this implementation as close to v2 as possible?
Thanks

Mode collapse

I replaced ProGAN with MSG-StyleGAN as you mentioned before. I used the 400 images from the RIVAL group, and mode collapse always happens. Any ideas about this?
Thanks.

Broken code in train_network.py

I installed pro_gan_pytorch from your other github repo and ran train_network.py. This is the error I got.
Traceback (most recent call last):
  File "train_network.py", line 427, in <module>
    main(parse_arguments())
  File "train_network.py", line 307, in main
    from pro_gan_pytorch.PRO_GAN import ConditionalProGAN
ModuleNotFoundError: No module named 'pro_gan_pytorch'

Find/Create an open dataset

The closed nature of the dataset used creates trouble for outside contributors, including me, who wish to debug and improve the project.
If there is an open alternative, it should be linked; if not, one should be [collaboratively] created.

Image generated not good, not even comparable.

@akanimax @AhmedHani I trained the model up to the 6th depth with epochs (640, 320, 160, 80, 40, 20) and a batch size of 16 at each depth, but the output from the final .pth file wasn't up to the mark. It also generates 16 images for a single description. Can you explain what exactly those 16 images are, and why there isn't a single image for a single description? The generated images aren't good either. What can I do to improve them? Could you provide the trained models for augmentation, encoder, gen and dis if they generate good images, as you have provided for celeb using msg?
I tried various depths and resolutions, but the output isn't clear.

Description:
he is an old man with a wrinkled face , gray hair and no beard he has dark eyes and seems happy about something

Image:
he is an old man with a wrinkled face , gray hair and no beard   he has dark eyes and seems happy about something

How do we give input to the model, and where are the processed images stored?

I have trained the model, but now I need to test it.
I took the demo.py as inspiration for the new demo, and am trying to give my custom caption as input. However I do not know how to do so.

# load the model for the demo
gen = th.nn.DataParallel(pg.Generator(depth=9))
gen.load_state_dict(th.load("GAN_GEN_SHADOW_8.pth", map_location=str(device)))

How do I change the above code for making my trained model work?

TypeError: __init__() got an unexpected keyword argument 'embedding_size'

When I run train_network.py,

I get the error:


Traceback (most recent call last):
  File "train_network.py", line 428, in <module>
    main(parse_arguments())
  File "train_network.py", line 381, in main
    device=device
TypeError: __init__() got an unexpected keyword argument 'embedding_size'

Does anyone know how to solve this?

Slack Group for GAN / Deep RL enthusiasts.

Dear watchers,

I have created a Slack group for GAN and Deep RL enthusiasts. I hope we can discuss problems faced while running code or training a GAN in general, or even new potential project ideas. My hope is that if I am not available, then perhaps someone in the group who has faced the same problem could help those in need. Proactive participation in the group will really benefit us all. I hope this group helps.

link to the group -> https://join.slack.com/t/amlrldl/shared_invite/enQtNDcyMTIxODg3NjIzLTA3MTlmMDg0YmExYjY5OTgyZTg4MTg5ZGE1YzRlYjljZmM4MzI0MTg1OTcxOTc5NDQ4ZTcwMGVkZjBjZmU5ZWM

Best regards,
Animesh

p.s. This issue will be closed in a week

Code error

Traceback (most recent call last):
  File "C:/Users/zhoug/Desktop/T2F-master/implementation/train_network.py", line 427, in <module>
    main(parse_arguments())
  File "C:/Users/zhoug/Desktop/T2F-master/implementation/train_network.py", line 380, in main
    device=device
TypeError: __init__() got an unexpected keyword argument 'embedding_size'

Could you please tell me how to deal with this ERROR?

python version incompatibility

For numpy we need Python version 2.7, and for tensorflow we need 3.5.*, 3.6.* or 3.7.*. This puts the required Python versions in conflict. What should be done in this scenario?

Generator Evaluation Metric

How would we go about implementing an evaluation metric for the generator?

I have tried loading a pre-trained discriminator in addition to the discriminator that is trained during training, and tested the generator against the pre-trained discriminator at the end of each epoch. But I am not sure if this is a good way to measure the performance of the generator.

Are there any feasible methods? I have done a small literature survey of methods for evaluating GANs, but they are usually for datasets with certain classes. Since we do not have classes in T2F (or do we?), I have a hard time implementing methods such as Inception Score, Frechet Inception Distance, etc.

One method that I found is CrossLID (https://arxiv.org/abs/1905.00643), which also has an implementation on GitHub. However, I have not tried to implement it yet, as I am unsure whether it is suitable for this dataset and model.

I am getting this error and I checked out 1 parameter was passed in Losses.py. Can you help

Traceback (most recent call last):
  File "train_network.py", line 427, in <module>
    main(parse_arguments())
  File "train_network.py", line 380, in main
    device=device
  File "/home/mukesh/env1/local/lib/python2.7/site-packages/pro_gan_pytorch/PRO_GAN.py", line 523, in __init__
    self.loss = self.__setup_loss(loss)
  File "/home/mukesh/env1/local/lib/python2.7/site-packages/pro_gan_pytorch/PRO_GAN.py", line 552, in __setup_loss
    loss = losses.CondWGAN_GP(self.device, self.dis, self.drift, use_gp=True)
  File "/home/mukesh/env1/local/lib/python2.7/site-packages/pro_gan_pytorch/Losses.py", line 136, in __init__
    super().__init__(device, dis)
TypeError: super() takes at least 1 argument (0 given)

missing json

Hi @akanimax, I downloaded the LFW dataset but could not find clean.json. It seems that you have cleaned the LFW dataset; can you tell me how, or provide the clean.json file? Thanks very much.

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

I am getting this error and don't know how to solve it, can you help please?

Starting the training process ...
Traceback (most recent call last):
  File "train_network.py", line 426, in <module>
    main(parse_arguments())
  File "train_network.py", line 420, in main
    use_matching_aware_dis=config.use_matching_aware_discriminator
  File "train_network.py", line 138, in train_networks
    fixed_embeddings = encoder(fixed_captions)
  File "C:\Users\Giacobbe\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Download\T2F-master\T2F-master\implementation\networks\TextEncoder.py", line 42, in forward
    output, (_, _) = self.network(x)
  File "C:\Users\Giacobbe\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Giacobbe\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\Giacobbe\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Giacobbe\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\sparse.py", line 117, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\Giacobbe\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\functional.py", line 1506, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'
