
alae's Introduction


[CVPR2020] Adversarial Latent Autoencoders

Stanislav Pidhorskyi, Donald A. Adjeroh, Gianfranco Doretto

Official repository of the paper


Google Drive folder with models and qualitative results

ALAE

Adversarial Latent Autoencoders
Stanislav Pidhorskyi, Donald Adjeroh, Gianfranco Doretto

Abstract: Autoencoder networks are unsupervised approaches aiming at combining generative and representational properties by learning simultaneously an encoder-generator map. Although studied extensively, the issues of whether they have the same generative power of GANs, or learn disentangled representations, have not been fully addressed. We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE). It is a general architecture that can leverage recent improvements on GAN training procedures. We designed two autoencoders: one based on a MLP encoder, and another based on a StyleGAN generator, which we call StyleALAE. We verify the disentanglement properties of both architectures. We show that StyleALAE can not only generate 1024x1024 face images with comparable quality of StyleGAN, but at the same resolution can also produce face reconstructions and manipulations based on real images. This makes ALAE the first autoencoder able to compare with, and go beyond the capabilities of a generator-only type of architecture.

Citation

  • Stanislav Pidhorskyi, Donald A. Adjeroh, and Gianfranco Doretto. Adversarial Latent Autoencoders. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2020. [to appear]
@InProceedings{pidhorskyi2020adversarial,
 author   = {Pidhorskyi, Stanislav and Adjeroh, Donald A and Doretto, Gianfranco},
 booktitle = {Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)},
 title    = {Adversarial Latent Autoencoders},
 year     = {2020},
 note     = {[to appear]},
}

preprint on arXiv: 2004.04467

To run the demo

To run the demo, you will need a CUDA-capable GPU, PyTorch >= v1.3.1, and CUDA/cuDNN drivers installed. Install the required packages:

pip install -r requirements.txt

Download pre-trained models:

python training_artifacts/download_all.py

Run the demo:

python interactive_demo.py

You can specify which YAML config to use. Configs are located here: https://github.com/podgorskiy/ALAE/tree/master/configs. By default, it uses the one for the FFHQ dataset. You can change the config with the -c parameter. To run on CelebA-HQ at 256x256 resolution, run:

python interactive_demo.py -c celeba-hq256

However, for configs other than FFHQ, you need to obtain new principal direction vectors for the attributes.
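The scripts for computing these vectors live in the principal_directions folder (see the repository structure below). As a rough sketch only, assuming the same launcher-style -c flag as the other runnable scripts (the script name is referenced elsewhere in the repository, but the exact arguments here are an assumption):

# hypothetical invocation; the -c value and overall workflow are assumptions
python principal_directions/find_principal_directions.py -c celeba-hq256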

Repository organization

Running scripts

The code in the repository is organized in such a way that all scripts must be run from the root of the repository. If you use an IDE (e.g. PyCharm or Visual Studio Code), just set Working Directory to point to the root of the repository.

If you want to run from the command line, then you also need to set the PYTHONPATH variable to point to the root of the repository.

For example, let's say we've cloned the repository to the ~/ALAE directory; then do:

$ cd ~/ALAE
$ export PYTHONPATH=$PYTHONPATH:$(pwd)


Now you can run scripts as follows:

$ python style_mixing/stylemix.py

Repository structure

Path Description
ALAE Repository root folder
├  configs Folder with yaml config files.
│  ├  bedroom.yaml Config file for LSUN bedroom dataset at 256x256 resolution.
│  ├  celeba.yaml Config file for CelebA dataset at 128x128 resolution.
│  ├  celeba-hq256.yaml Config file for CelebA-HQ dataset at 256x256 resolution.
│  ├  celeba_ablation_nostyle.yaml Config file for CelebA 128x128 dataset for ablation study (no styles).
│  ├  celeba_ablation_separate.yaml Config file for CelebA 128x128 dataset for ablation study (separate encoder and discriminator).
│  ├  celeba_ablation_z_reg.yaml Config file for CelebA 128x128 dataset for ablation study (regress in Z space, not W).
│  ├  ffhq.yaml Config file for FFHQ dataset at 1024x1024 resolution.
│  ├  mnist.yaml Config file for MNIST dataset using Style architecture.
│  └  mnist_fc.yaml Config file for MNIST dataset using only fully connected layers (Permutation Invariant MNIST).
├  dataset_preparation Folder with scripts for dataset preparation.
│  ├  prepare_celeba_hq_tfrec.py To prepare TFRecords for CelebA-HQ dataset at 256x256 resolution.
│  ├  prepare_celeba_tfrec.py To prepare TFRecords for CelebA dataset at 128x128 resolution.
│  ├  prepare_mnist_tfrec.py To prepare TFRecords for MNIST dataset.
│  ├  split_tfrecords_bedroom.py To split official TFRecords from StyleGAN paper for LSUN bedroom dataset.
│  └  split_tfrecords_ffhq.py To split official TFRecords from StyleGAN paper for FFHQ dataset.
├  dataset_samples Folder with sample inputs for different datasets. Used for figures and for test inputs during training.
├  make_figures Scripts for making various figures.
├  metrics Scripts for computing metrics.
├  principal_directions Scripts for computing principal direction vectors for various attributes. For interactive demo.
├  style_mixing Sample inputs and script for producing style-mixing figures.
├  training_artifacts Default place for saving checkpoints/sample outputs/plots.
│  └  download_all.py Script for downloading all pretrained models.
├  interactive_demo.py Runnable script for interactive demo.
├  train_alae.py Runnable script for training.
├  train_alae_separate.py Runnable script for training for ablation study (separate encoder and discriminator).
├  checkpointer.py Module for saving/restoring model weights, optimizer state and loss history.
├  custom_adam.py Customized adam optimizer for learning rate equalization and zero second beta.
├  dataloader.py Module with dataset classes, loaders, iterators, etc.
├  defaults.py Definition for config variables with default values.
├  launcher.py Helper for running multi-GPU, multiprocess training. Sets up config and logging.
├  lod_driver.py Helper class for managing growing/stabilizing network.
├  lreq.py Custom Linear, Conv2d and ConvTranspose2d modules for learning rate equalization.
├  model.py Module with high-level model definition.
├  model_separate.py Same as above, but for ablation study.
├  net.py Definition of all network blocks for multiple architectures.
├  registry.py Registry of network blocks for selecting from config file.
├  scheduler.py Custom schedulers with warm start and aggregating several optimizers.
├  tracker.py Module for plotting losses.
└  utils.py Decorator for async call, decorator for caching, registry for network blocks.

Configs

In this codebase yacs is used to handle configurations.

Most of the runnable scripts accept a -c parameter that specifies the config file to use. For example, to make reconstruction figures, you can run:

python make_figures/make_recon_figure_paged.py
python make_figures/make_recon_figure_paged.py -c celeba
python make_figures/make_recon_figure_paged.py -c celeba-hq256
python make_figures/make_recon_figure_paged.py -c bedroom

The default config is ffhq.

Datasets

Training is done using TFRecords. TFRecords are read using DareBlopy, which allows using them with PyTorch.

In the config files, as well as in all preparation scripts, it is assumed that all datasets are in /data/datasets/. You can either change the path in the config files or create a symlink to wherever you store the datasets.
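For example, a symlink could be set up like this (the source path below is hypothetical; point it at wherever your datasets actually live):

# hypothetical source path; adjust to your real dataset location
sudo mkdir -p /data
sudo ln -s /path/to/my/datasets /data/datasets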

The official way of generating CelebA-HQ can be challenging. Please refer to this page: https://github.com/suvojit-0x55aa/celebA-HQ-dataset-download You can get the pre-generated dataset from: https://drive.google.com/drive/folders/11Vz0fqHS2rXDb5pprgTjpD7S2BAJhi1P

Pre-trained models

To download pre-trained models run:

python training_artifacts/download_all.py

Note: There used to be problems with downloading models from Google Drive due to a download limit. Now the script is set up in such a way that if it fails to download data from Google Drive, it will try to download it from S3.

If you experience problems, try deleting all *.pth files, updating the dlutils package (pip install dlutils --upgrade), and then running download_all.py again. If that does not solve the problem, please open an issue. You can also try downloading the models manually from here: https://drive.google.com/drive/folders/1tsI1q1u8QRX5t7_lWCSjpniLGlNY-3VY?usp=sharing
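Putting those recovery steps together as a sketch (it assumes the partially downloaded *.pth files sit under training_artifacts/, the default OUTPUT_DIR location; adjust if you changed it):

# sketch of the recovery steps above; training_artifacts/ is the default location
find training_artifacts -name "*.pth" -delete
pip install dlutils --upgrade
python training_artifacts/download_all.py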

In config files, OUTPUT_DIR points to where weights are saved to and read from. For example: OUTPUT_DIR: training_artifacts/celeba-hq256

A file named last_checkpoint is saved in OUTPUT_DIR; it contains the path to the actual .pth pickle with the model weights. If you want to test the model with a specific weight file, simply modify the last_checkpoint file.
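For example (the checkpoint file name below is hypothetical; write the path of the .pth file you actually want to load):

# hypothetical checkpoint name; replace with the .pth file you want to test
echo "training_artifacts/celeba-hq256/model_194.pth" > training_artifacts/celeba-hq256/last_checkpoint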

Generating figures

Style-mixing

To generate style-mixing figures run:

python style_mixing/stylemix.py -c <config>

where <config> is one of: ffhq, celeba, celeba-hq256, bedroom.
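For instance, to produce the FFHQ style-mixing figure:

python style_mixing/stylemix.py -c ffhq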

Reconstructions

To generate reconstructions with multiple image scales:

python make_figures/make_recon_figure_multires.py -c <config>

To generate reconstructions from all sample inputs on multiple pages:

python make_figures/make_recon_figure_paged.py -c <config>

There are also:

python make_figures/old/make_recon_figure_celeba.py
python make_figures/old/make_recon_figure_bed.py

To generate reconstructions from the FFHQ test set:

python make_figures/make_recon_figure_ffhq_real.py

To generate interpolation figure:

python make_figures/make_recon_figure_interpolation.py -c <config>

To generate traversals figure:

(For datasets other than FFHQ, you will need to find the principal directions first.)

python make_figures/make_traversarls.py -c <config>

Generations

To make a generation figure, run:

python make_figures/make_generation_figure.py -c <config>

Training

In addition to installing required packages:

pip install -r requirements.txt

You will need to install DareBlopy:

pip install dareblopy

To run training:

python train_alae.py -c <config>

It will run multi-GPU training on all available GPUs. It uses DistributedDataParallel for parallelism. If only one GPU is available, it will run on a single GPU; no special care is needed.

The recommended number of GPUs is 8. Reproducing results on a smaller number of GPUs may be problematic. You might need to adjust the batch size in the config file depending on the memory size of the GPUs.
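If you want to restrict training to specific GPUs instead of all visible ones, the standard CUDA environment variable works; this is a general CUDA/PyTorch mechanism rather than something specific to this repository:

# train on GPU 0 only; CUDA_VISIBLE_DEVICES is a standard CUDA environment variable
CUDA_VISIBLE_DEVICES=0 python train_alae.py -c <config>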

Running metrics

In addition to installing required packages and DareBlopy, you need to install TensorFlow and dnnlib from StyleGAN.

TensorFlow must be version 1.10:

pip install tensorflow-gpu==1.10

It requires CUDA version 9.0.

Perhaps the best way is to use Anaconda to handle this, but I prefer installing CUDA 9.0 from the Pop!_OS repositories (works on Ubuntu):

sudo echo "deb http://apt.pop-os.org/proprietary bionic main" | sudo tee -a /etc/apt/sources.list.d/pop-proprietary.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key 204DD8AEC33A7AFF
sudo apt update

sudo apt install system76-cuda-9.0
sudo apt install system76-cudnn-9.0

Then just set the LD_LIBRARY_PATH variable:

export LD_LIBRARY_PATH=/usr/lib/cuda-9.0/lib64
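As an optional sanity check that TensorFlow 1.10 can see the CUDA 9.0 libraries after setting LD_LIBRARY_PATH:

# should print 1.10.0 without complaining about missing libcublas/libcudart
python -c "import tensorflow as tf; print(tf.__version__)"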

Dnnlib is a package used in StyleGAN. You can install it with:

pip install https://github.com/podgorskiy/dnnlib/releases/download/0.0.1/dnnlib-0.0.1-py3-none-any.whl

All the code for running metrics is heavily based on that from the StyleGAN repository. It also uses the same pre-trained models:

https://github.com/NVlabs/stylegan#licenses

inception_v3_features.pkl and inception_v3_softmax.pkl are derived from the pre-trained Inception-v3 network by Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. The network was originally shared under Apache 2.0 license on the TensorFlow Models repository.

vgg16.pkl and vgg16_zhang_perceptual.pkl are derived from the pre-trained VGG-16 network by Karen Simonyan and Andrew Zisserman. The network was originally shared under Creative Commons BY 4.0 license on the Very Deep Convolutional Networks for Large-Scale Visual Recognition project page.

vgg16_zhang_perceptual.pkl is further derived from the pre-trained LPIPS weights by Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The weights were originally shared under BSD 2-Clause "Simplified" License on the PerceptualSimilarity repository.

Finally, to run metrics:

python metrics/fid.py -c <config>       # FID score on generations
python metrics/fid_rec.py -c <config>   # FID score on reconstructions
python metrics/ppl.py -c <config>       # PPL score on generations
python metrics/lpips.py -c <config>     # LPIPS score of reconstructions

alae's People

Contributors

jxcodetw, mattybv3, podgorskiy, rolisz


alae's Issues

The display pane width is zero at start

Windows 10, Anaconda, PyTorch 1.4.0, CUDA 10.1 (for GeForce RTX 2080 with Max-Q Design), Visual Studio C++ 14.0 (for bimpy compilation).

It runs without any errors, yet the screen presents no results:

[screenshot]

The narrow vertical line is not expandable.

EDIT:

With python interactive_demo.py -c celeba-hq256 I've found it is possible to grab the lower-right corner and resize the window manually. With the original config, it is out of my reach.

[screenshot]

Error on style_ranges[row]

I added 9 images to the src and dst folders of /ALAE/style_mixing/test_images/set_ffhq/ and updated src_len and dst_len in the code, but got the following error.

2020-04-29 22:11:35,659 logger INFO: Trainable parameters generator:
2020-04-29 22:11:35,660 logger INFO: Trainable parameters discriminator:
2020-04-29 22:11:35,660 logger INFO: Loading checkpoint from training_artifacts/ffhq/model_157.pth
2020-04-29 22:11:35,918 logger INFO: Model trained for 157 epochs
Traceback (most recent call last):
  File "style_mixing/stylemix.py", line 192, in <module>
    world_size=gpu_count, write_log=False)
  File "/home/ubuntu/projects/ALAE/launcher.py", line 131, in run
    _run(0, world_size, fn, defaults, write_log, no_cuda, args)
  File "/home/ubuntu/projects/ALAE/launcher.py", line 96, in _run
    fn(**matching_args)
  File "style_mixing/stylemix.py", line 42, in main
    _main(cfg, logger)
  File "style_mixing/stylemix.py", line 180, in _main
    style = mix_styles(src_latents, row_latents, style_ranges[row])
IndexError: list index out of range

Training Time

How long did it take to train StyleALAE at 1024x1024?

Async call to save_pic seems to cause some stability issue

train_alae.py seems to crash from time to time with the below error:

902it [04:31, 3.29it/s]Exception ignored in: <function Image.__del__ at 0x7ff8eb6549d8>
Traceback (most recent call last):
  File "/usr/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
Abort

As a workaround, I deleted the line @utils.async_func before def save_pic(x_rec) and this seems to solve the problem. It looks like some thread safety issue with the PIL Image destructor.
Update: I still got a crash with the workaround.

Load pretrained model from latest LOD

Dear @podgorskiy:
Thank you for your great work on ALAE! When I continue training at LOD=7 and load the checkpoint from lod_6.pth, the loss is 100 times larger than at LOD 6, and the sampled images are bad, not normal RGB images, so it seems the checkpoint is not loaded well. But when I sample images after each loss.backward, I found the first picture (LOD=7, after the first loss.backward) is normal; as time goes on, the picture breaks down. So it seems the checkpoint is loaded well, but something about the transition between LOD 6 and LOD 7 is lost. Could you help me? Thank you!

what is interactive_slider.py

Hello, I am trying to find some new directions in W space. But what is interactive_slider.py, which is mentioned in principal_directions/README.md? I didn't find such a file in the repository. Does anybody know? Many thanks.

Hitting error: 'WarmupMultiStepLR' has no attribute 'verbose'

I'm trying to train on MNIST and hitting the above error. Here's the backtrace:

Traceback (most recent call last):
  File "train_alae.py", line 352, in <module>
    run(train, get_cfg_defaults(), description='StyleGAN', default_config='configs/mnist.yaml',
  File "/home/james/src/spliqsml/ALAE/launcher.py", line 131, in run
    _run(0, world_size, fn, defaults, write_log, no_cuda, args)
  File "/home/james/src/spliqsml/ALAE/launcher.py", line 96, in _run
    fn(**matching_args)
  File "train_alae.py", line 186, in train
    scheduler = ComboMultiStepLR(optimizers=
  File "/home/james/src/spliqsml/ALAE/scheduler.py", line 91, in __init__
    self.schedulers[name] = WarmupMultiStepLR(opt, lr=base_lr, **kwargs)
  File "/home/james/src/spliqsml/ALAE/scheduler.py", line 52, in __init__
    self.step(last_epoch)
  File "/home/james/anaconda3/envs/tf2/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 166, in step
    self.print_lr(self.verbose, i, lr, epoch)
AttributeError: 'WarmupMultiStepLR' object has no attribute 'verbose'

Kinda strange, since it seems like that's in your code, but called by torch's lr_scheduler... ??

tensorflow 2.3.1
libcudart.so.10.1

Went through all the setup on a new conda env and ran with: python train_alae.py -c mnist

Any help appreciated.

VRAM

How much VRAM is needed to train the 1024x1024 model from scratch?

Exception: process 0 terminated with signal SIGABRT

Hello, I'm trying to get this running in a conda environment and running into SIGABRT when starting training.

I've installed all the listed dependencies and created the tfrecords for my dataset by modifying prepare_celeba_hq_tfrecords.py to grab images from my own folder instead. This all seemed to go fine, but when training I get the following error:

Traceback (most recent call last):
  File "train_alae.py", line 375, in <module>
    run(train, get_cfg_defaults(), description="StyleGAN", default_config="configs/ffhq.yaml", world_size=gpu_count)
  File "/home/hans/code/ALAE/launcher.py", line 122, in run
    mp.spawn(_run, args=(world_size, fn, defaults, write_log, no_cuda, args), nprocs=world_size, join=True)
  File "/home/hans/.conda/envs/alae/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/hans/.conda/envs/alae/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/hans/.conda/envs/alae/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 108, in join
    (error_index, name)
Exception: process 0 terminated with signal SIGABRT

The traceback isn't very useful, but I believe it's happening on the model.train() call just before for x_orig in tqdm(batches): in train_alae.py.

I'm running this on Ubuntu 18.10 with 2x1080Ti GPUs with NVidia driver version: 435.21 / CUDA version: 10.1.

I've tried installing cudatoolkit=9.0 with conda, but then the environment can only solve up to pytorch=1.1. With cudatoolkit=9.2 I was able to get pytorch=1.3, but with the same result.

My conda env:

# packages in environment at /home/hans/.conda/envs/alae:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.9.0                    pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates           2020.1.1                      0  
certifi                   2020.4.5.1               py36_0  
cudatoolkit               10.1.243             h6bb024c_0  
cycler                    0.10.0                   pypi_0    pypi
dareblopy                 0.0.2                    pypi_0    pypi
dlutils                   0.0.12                   pypi_0    pypi
dnnlib                    0.0.1                    pypi_0    pypi
freetype                  2.9.1                h8a8886c_1  
future                    0.18.2                   pypi_0    pypi
gast                      0.3.3                    pypi_0    pypi
grpcio                    1.28.1                   pypi_0    pypi
imageio                   2.8.0                    pypi_0    pypi
intel-openmp              2020.0                      166  
joblib                    0.14.1                   pypi_0    pypi
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.2.0                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_0  
markdown                  3.2.1                    pypi_0    pypi
matplotlib                3.2.1                    pypi_0    pypi
mkl                       2020.0                      166  
mkl-service               2.3.0            py36he904b0f_0  
mkl_fft                   1.0.15           py36ha843d7b_0  
mkl_random                1.1.0            py36hd6b4f25_0  
ncurses                   6.2                  he6710b0_0  
ninja                     1.9.0            py36hfd86e86_0  
numpy                     1.14.5                   pypi_0    pypi
olefile                   0.46                     py36_0  
openssl                   1.1.1g               h7b6447c_0  
packaging                 20.3                     pypi_0    pypi
pillow                    7.0.0            py36hb39fc2d_0  
pip                       20.0.2                   py36_1  
protobuf                  3.11.3                   pypi_0    pypi
pyparsing                 2.4.7                    pypi_0    pypi
python                    3.6.10               hcf32534_1  
python-dateutil           2.8.1                    pypi_0    pypi
pytorch                   1.5.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
pyyaml                    5.3.1                    pypi_0    pypi
readline                  8.0                  h7b6447c_0  
scikit-learn              0.22.2.post1             pypi_0    pypi
scipy                     1.4.1                    pypi_0    pypi
setuptools                39.1.0                   pypi_0    pypi
six                       1.14.0                   py36_0  
sklearn                   0.0                      pypi_0    pypi
sqlite                    3.31.1               h62c20be_1  
tensorboard               1.10.0                   pypi_0    pypi
tensorflow-gpu            1.10.0                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tk                        8.6.8                hbc83047_0  
torchvision               0.6.0                py36_cu101    pytorch
tqdm                      4.45.0                   pypi_0    pypi
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.34.2                   py36_0  
xz                        5.2.5                h7b6447c_0  
yacs                      0.1.7                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0  

Typo in paper

I'm sorry I know this is not a place for this but it is the only way I knew I how to contact.

In the Paper
It is written "possible with SyleGAN alone," 2 lines above Acknowledgments.

Nice work BTW.

Question about output size of D network

The output of the original StyleGAN's discriminator is a scalar, predicting whether the given image is real or fake. However, the output shape of your D network is batch x (2 * dlatent_size) in the line below.

ALAE/net.py

Line 893 in 5d8362f

outputs = 2 * dlatent_size if i == mapping_layers - 1 else mapping_fmaps

Therefore, you selected one element among the 2*dlatent_size elements as the final output of the D network (which is used for the loss function) in the line below (Z_).

ALAE/model.py

Line 111 in 5d8362f

return Z[:, :1], Z_[:, 1, 0]

I'm curious why the output shape of the D network is batch x (2 * dlatent_size), since only one element is used for training and the others are unused.

Plus, I can't understand why the output of the D network is reshaped like this.

ALAE/net.py

Line 903 in 5d8362f

return x.view(x.shape[0], 2, x.shape[2] // 2)

google colab interactive_demo error

Hi,
Congratulations & thanks for this paper! I am running interactive_demo on Colab and getting a non-stop error:

ERROR: ImGui_ImplOpenGL3_CreateDeviceObjects: failed to compile fragment shader!
ERROR: ImGui_ImplOpenGL3_CreateDeviceObjects: failed to link shader program!
ERROR: ImGui_ImplOpenGL3_CreateDeviceObjects: failed to compile vertex shader!
ERROR: ImGui_ImplOpenGL3_CreateDeviceObjects: failed to compile fragment shader!
ERROR: ImGui_ImplOpenGL3_CreateDeviceObjects: failed to link shader program!
ERROR: ImGui_ImplOpenGL3_CreateDeviceObjects: failed to compile vertex shader!
ERROR: ImGui_ImplOpenGL3_CreateDeviceObjects: failed to compile fragment shader!
Any solution?
Greetings!

Training on Google CoLab

Hey, the paper is amazing. I really loved it. I am trying to recreate the MNIST experiment that you did.
I only have Google CoLab as compute resource.

I prepared mnist TFRecords using available prepare_mnist_tfrecords.py. The created data dir is in the root directory.

When I try to train with mnist_fc.yaml as the config file, training seems to start. But after showing this log, nothing happens.

[screenshot]

I tried to debug the issue,
batches = make_dataloader(cfg, logger, dataset, lod2batch.get_per_GPU_batch_size(), local_rank)

After this line print('debug') is not printing debug. Inside make_dataloader function. The same is not happening after,
batches = db.data_loader(iter(dataset), BatchCollator(local_rank), len(dataset) // GPU_batch_size)

Am I not providing something to the training script? I have installed dareblopy in my runtime.

To reproduce the experiment you can check out this notebook.

What's happening? Any suggestions?
Thank you.

Custom image

Unable to generate a female version of a custom image.
Can anyone help me?
Thank you in advance.

The reconstructed image of the custom image is an image of a completely different person with the same default settings.

Fine-tuning trained model on new dataset

A big advantage of StyleGAN is the seamless fine-tuning process, where a previous checkpoint can be used as a starting point for training on a new dataset (say for example fine-tune the FFHQ model on paintings).

Is this possible for ALAE too? Do you have any pointers or feedback on how to approach it?

Hi There

Just wanted to see if this is working or not

Trying to train on Google CoLab

Hey, it's a super interesting paper and reading it was an awesome experience. Really interesting work.

I am trying to replicate your work to get more insight. I tried training on Google CoLab, Here's the link to the notebook: https://colab.research.google.com/drive/14CpH6eU4XsHPN_y4lhfpEZGHPpXT0AxL

  • The training script seems to take all the parameters from the config file.
  • It's getting the name of the GPU.
  • It loads the checkpoints.
  • It starts from epoch 157.
  • The transition starts but then it ends.

[screenshot]

I am not able to understand what's happening. In case I am doing something wrong, do correct me. If I am not following the instructions properly, point me to the step I am missing.

Thanks in advance :D

Dependency incompatibility when trying to Calculate Principal Directions on Google Colab

I've used the command %pip install tensorflow-gpu==1.10 following your README.
The command is run after

%pip install -r requirements.txt
%pip install dareblopy

Then I copied find_principal_directions.py into the Colab notebook like this:

from dataloader import *
import numpy as np
import tensorflow as tf
import principal_directions.classifier


def parse_tfrecord_np(record):
    ex = tf.train.Example()
    ex.ParseFromString(record)
    shape = ex.features.feature['shape'].int64_list.value
    data = ex.features.feature['data'].bytes_list.value[0]
    dlat = ex.features.feature['dlat'].bytes_list.value[0]
    lat = ex.features.feature['lat'].bytes_list.value[0]
    return np.fromstring(data, np.uint8).reshape(shape), np.fromstring(dlat, np.float32), np.fromstring(lat, np.float32)


class Predictions:
    def __init__(self, cfg, minibatch_gpu):
        self.minibatch_size = minibatch_gpu
        self.cfg = cfg

    def evaluate(self, logger, mapping, decoder, lod, attrib_idx):
        result_expr = []

        rnd = np.random.RandomState(5)

        with tf.Graph().as_default(), tf.Session() as sess:
            ds = tf.data.TFRecordDataset("principal_directions/generated_data.000")
            ds = ds.batch(self.minibatch_size)
            batch = ds.make_one_shot_iterator().get_next()

            classifier = principal_directions.classifier.make_classifier(attrib_idx)
            i = 0
            while True:
                try:
                    records = sess.run(batch)
                    images = []
                    dlats = []
                    lats = []
                    for r in records:
                        im, dlat, lat = parse_tfrecord_np(r)

                        # plt.imshow(im.transpose(1, 2, 0), interpolation='nearest')
                        # plt.show()

                        images.append(im)
                        dlats.append(dlat)
                        lats.append(lat)
                    images = np.stack(images)
                    dlats = np.stack(dlats)
                    lats = np.stack(lats)
                    logits = classifier.run(images, None, num_gpus=1, assume_frozen=True)
                    logits = torch.tensor(logits)
                    predictions = torch.softmax(torch.cat([logits, -logits], dim=1), dim=1)

                    result_dict = dict(latents=lats, dlatents=dlats)
                    result_dict[attrib_idx] = predictions.cpu().numpy()
                    result_expr.append(result_dict)
                    i += 1
                except tf.errors.OutOfRangeError:
                    break

        results = {key: np.concatenate([value[key] for value in result_expr], axis=0) for key in result_expr[0].keys()}

        np.save("principal_directions/wspace_att_%d" % attrib_idx, results)


def main(cfg, logger):
    torch.cuda.set_device(0)
    model = Model(
        startf=cfg.MODEL.START_CHANNEL_COUNT,
        layer_count=cfg.MODEL.LAYER_COUNT,
        maxf=cfg.MODEL.MAX_CHANNEL_COUNT,
        latent_size=cfg.MODEL.LATENT_SPACE_SIZE,
        truncation_psi=cfg.MODEL.TRUNCATIOM_PSI,
        truncation_cutoff=cfg.MODEL.TRUNCATIOM_CUTOFF,
        mapping_layers=cfg.MODEL.MAPPING_LAYERS,
        channels=cfg.MODEL.CHANNELS,
        generator=cfg.MODEL.GENERATOR,
        encoder=cfg.MODEL.ENCODER)

    model.cuda(0)
    model.eval()
    model.requires_grad_(False)

    decoder = model.decoder
    encoder = model.encoder
    mapping_tl = model.mapping_tl
    mapping_fl = model.mapping_fl
    dlatent_avg = model.dlatent_avg

    logger.info("Trainable parameters generator:")
    count_parameters(decoder)

    logger.info("Trainable parameters discriminator:")
    count_parameters(encoder)

    arguments = dict()
    arguments["iteration"] = 0

    model_dict = {
        'discriminator_s': encoder,
        'generator_s': decoder,
        'mapping_tl_s': mapping_tl,
        'mapping_fl_s': mapping_fl,
        'dlatent_avg': dlatent_avg
    }

    checkpointer = Checkpointer(cfg,
                                model_dict,
                                {},
                                logger=logger,
                                save=False)

    checkpointer.load()

    model.eval()

    layer_count = cfg.MODEL.LAYER_COUNT

    logger.info("Extracting attributes")

    decoder = nn.DataParallel(decoder)

    indices = [0, 1, 2, 3, 4, 10, 11, 17, 19]
    with torch.no_grad():
        p = Predictions(cfg, minibatch_gpu=4)
        for i in indices:
            p.evaluate(logger, mapping_fl, decoder, cfg.DATASET.MAX_RESOLUTION_LEVEL - 2, i)


if __name__ == "__main__":
    gpu_count = 1
    run(main, get_cfg_defaults(), description='StyleGAN', default_config='configs/celeba.yaml',
        world_size=gpu_count, write_log=False)

Here are the logs when I run each of the code blocks respectively:

Collecting tensorflow-gpu==1.10
  Downloading https://files.pythonhosted.org/packages/64/ca/830b7cedb073ae264d215d51bd18d7cff7a2a47e39d79f6fa23edae17bb2/tensorflow_gpu-1.10.0-cp36-cp36m-manylinux1_x86_64.whl (253.2MB)
     |████████████████████████████████| 253.3MB 52kB/s 
Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (0.8.1)
Collecting numpy<=1.14.5,>=1.13.3
  Downloading https://files.pythonhosted.org/packages/68/1e/116ad560de97694e2d0c1843a7a0075cc9f49e922454d32f49a80eb6f1f2/numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
     |████████████████████████████████| 12.2MB 38.8MB/s 
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (1.1.0)
Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (0.34.2)
Requirement already satisfied: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (0.3.3)
Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (3.12.4)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (1.31.0)
Collecting tensorboard<1.11.0,>=1.10.0
  Downloading https://files.pythonhosted.org/packages/c6/17/ecd918a004f297955c30b4fffbea100b1606c225dbf0443264012773c3ff/tensorboard-1.10.0-py3-none-any.whl (3.3MB)
     |████████████████████████████████| 3.3MB 44.2MB/s 
Collecting setuptools<=39.1.0
  Downloading https://files.pythonhosted.org/packages/8c/10/79282747f9169f21c053c562a0baa21815a8c7879be97abd930dbcf862e8/setuptools-39.1.0-py2.py3-none-any.whl (566kB)
     |████████████████████████████████| 573kB 42.4MB/s 
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (1.15.0)
Requirement already satisfied: absl-py>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==1.10) (0.9.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.11.0,>=1.10.0->tensorflow-gpu==1.10) (3.2.2)
Requirement already satisfied: werkzeug>=0.11.10 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.11.0,>=1.10.0->tensorflow-gpu==1.10) (1.0.1)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.6/dist-packages (from markdown>=2.6.8->tensorboard<1.11.0,>=1.10.0->tensorflow-gpu==1.10) (1.7.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.6/dist-packages (from importlib-metadata; python_version < "3.8"->markdown>=2.6.8->tensorboard<1.11.0,>=1.10.0->tensorflow-gpu==1.10) (3.1.0)
ERROR: xarray 0.15.1 has requirement numpy>=1.15, but you'll have numpy 1.14.5 which is incompatible.
ERROR: xarray 0.15.1 has requirement setuptools>=41.2, but you'll have setuptools 39.1.0 which is incompatible.
ERROR: umap-learn 0.4.6 has requirement numpy>=1.17, but you'll have numpy 1.14.5 which is incompatible.
ERROR: tifffile 2020.7.24 has requirement numpy>=1.15.1, but you'll have numpy 1.14.5 which is incompatible.
ERROR: tensorflow 2.3.0 has requirement numpy<1.19.0,>=1.16.0, but you'll have numpy 1.14.5 which is incompatible.
ERROR: tensorflow 2.3.0 has requirement tensorboard<3,>=2.3.0, but you'll have tensorboard 1.10.0 which is incompatible.
ERROR: spacy 2.2.4 has requirement numpy>=1.15.0, but you'll have numpy 1.14.5 which is incompatible.
ERROR: plotnine 0.6.0 has requirement numpy>=1.16.0, but you'll have numpy 1.14.5 which is incompatible.
ERROR: numba 0.48.0 has requirement numpy>=1.15, but you'll have numpy 1.14.5 which is incompatible.
ERROR: imgaug 0.2.9 has requirement numpy>=1.15.0, but you'll have numpy 1.14.5 which is incompatible.
ERROR: google-auth 1.17.2 has requirement setuptools>=40.3.0, but you'll have setuptools 39.1.0 which is incompatible.
ERROR: fastai 1.0.61 has requirement numpy>=1.15, but you'll have numpy 1.14.5 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
ERROR: cvxpy 1.0.31 has requirement numpy>=1.15, but you'll have numpy 1.14.5 which is incompatible.
ERROR: blis 0.4.1 has requirement numpy>=1.15.0, but you'll have numpy 1.14.5 which is incompatible.
ERROR: astropy 4.0.1.post1 has requirement numpy>=1.16, but you'll have numpy 1.14.5 which is incompatible.
ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.
Installing collected packages: numpy, tensorboard, setuptools, tensorflow-gpu
  Found existing installation: numpy 1.18.5
    Uninstalling numpy-1.18.5:
      Successfully uninstalled numpy-1.18.5
  Found existing installation: tensorboard 2.3.0
    Uninstalling tensorboard-2.3.0:
      Successfully uninstalled tensorboard-2.3.0
  Found existing installation: setuptools 49.2.0
    Uninstalling setuptools-49.2.0:
      Successfully uninstalled setuptools-49.2.0
Successfully installed numpy-1.14.5 setuptools-39.1.0 tensorboard-1.10.0 tensorflow-gpu-1.10.0
WARNING: The following packages were previously imported in this runtime:
  [numpy,pkg_resources]
You must restart the runtime in order to use newly installed versions.
[autoreload of pkg_resources._vendor.six failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
AttributeError: 'NoneType' object has no attribute 'cStringIO'
]
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:882: UserWarning: add_newdoc was used on a pure-python object <function empty_like at 0x7fd680b3e7b8>. Prefer to attach it directly to the source.
  """)
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:1239: UserWarning: add_newdoc was used on a pure-python object <function concatenate at 0x7fd680b3e8c8>. Prefer to attach it directly to the source.
  """)
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:1313: UserWarning: add_newdoc was used on a pure-python object <function inner at 0x7fd680b3e9d8>. Prefer to attach it directly to the source.
  """)
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:1519: UserWarning: add_newdoc was used on a pure-python object <function where at 0x7fd680b3eae8>. Prefer to attach it directly to the source.
  """)
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:1596: UserWarning: add_newdoc was used on a pure-python object <function lexsort at 0x7fd680b3ebf8>. Prefer to attach it directly to the source.
  """)
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:1704: UserWarning: add_newdoc was used on a pure-python object <function can_cast at 0x7fd680b3ed08>. Prefer to attach it directly to the source.
  """)
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:1804: UserWarning: add_newdoc was used on a pure-python object <function min_scalar_type at 0x7fd680b3ee18>. Prefer to attach it directly to the source.
  """)
/usr/local/lib/python3.6/dist-packages/numpy/add_newdocs.py:1873: UserWarning: add_newdoc was used on a pure-python object <function result_type at 0x7fd680b3ef28>. Prefer to attach it directly to the source.
  """)
[autoreload of numpy failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
AttributeError: module 'numpy.core.multiarray' has no attribute 'newbuffer'
]
[autoreload of numpy.core failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name '_numpy_tester'
]
[autoreload of numpy.core.numerictypes failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
IndexError: string index out of range
]
[autoreload of numpy.core.numeric failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name 'TooHardError'
]
[autoreload of numpy.lib failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
NameError: name 'type_check' is not defined
]
[autoreload of numpy.matrixlib failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
NameError: name 'defmatrix' is not defined
]
[autoreload of numpy.linalg failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name '_numpy_tester'
]
[autoreload of numpy.lib.function_base failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name 'digitize'
]
[autoreload of numpy.fft failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name '_FFTCache'
]
[autoreload of numpy.polynomial failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name '_numpy_tester'
]
[autoreload of numpy.random failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name '_numpy_tester'
]
[autoreload of numpy.ma failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
ImportError: cannot import name '_numpy_tester'
]
[autoreload of numpy.ma.core failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
AttributeError: module 'numpy' has no attribute 'rank'
]
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py in <module>()
     57 
---> 58   from tensorflow.python.pywrap_tensorflow_internal import *
     59   from tensorflow.python.pywrap_tensorflow_internal import __version__

7 frames
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py in <module>()
     72 for some common reasons and solutions.  Include the entire stack trace
     73 above this error message when asking for help.""" % traceback.format_exc()
---> 74   raise ImportError(msg)
     75 
     76 # pylint: enable=wildcard-import,g-import-not-at-top,unused-import,line-too-long

ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

Segmentation fault (core dumped)

I tested interactive_demo.py on a p2.8xlarge (8 GPUs), but I encountered:

Segmentation fault (core dumped)

Could you give me advice on which part I should fix for this problem?

Exception: process 0 terminated with signal SIGABRT

I used pip install dareblopy to reinstall dareblopy. However, after reinstalling, the problem still exists, and I made sure the path to the tfrecords I created with python dataset_preparation/prepare_celeba_hq_tfrecords.py is correct.

Details on calculating principal direction vectors for attributes

Hi,

Thanks for releasing such well-written code and the interactive demo for the paper. Even when I am testing it on real-world images, the reconstruction and semantic changes work very well.

However, I was wondering whether you are planning to release more details on the semantic editing part. In particular, I couldn't find the details on how the principal direction vectors are calculated in the paper. Surprisingly, the paper doesn't have any results on semantic editing, i.e., the ones demonstrated in the demo.

I wonder whether you are planning to release any additional documents including these details? Or is it a generic methodology well-known in the community? I am relatively new in this domain and not sure about it.

Thanks.

RuntimeError: cuDNN error during backpropagation

I get this error when backpropagating, at the line below:
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

loss_d.backward()

I installed all the requirements, and used PyTorch v1.4.0.
I included contiguous() everywhere after .view() and .reshape(), so it might not be a non-contiguity problem.
I'm currently using a single GPU, so it might not be a multi-GPU problem.

Do you have any idea about this error?

Demo on remote server

Is it possible to support the functionality where we can run the interactive demo on a remote server (similar to tensorboard)? I have GPUs available only on a headless server, which might be the case for many others.

Thanks.

Training on ImageNet

Hi,

I see that you have implemented TFRecords for ImageNet in ALAE/dataset_preparation/prepare_imagenet.py. Do you also have trained models for this dataset, and if yes, what was your experience?

Thanks for the information!

Training fails to initialize

Hi,
Could you please help with starting up training?

After starting the training script, I get this output:

2020-07-06 19:11:33,163 logger INFO: Namespace(config_file='configs\7359-frackles.yaml', opts=[])
2020-07-06 19:11:33,163 logger INFO: World size: 1
2020-07-06 19:11:33,163 logger INFO: Loaded configuration file configs\7359-frackles.yaml
2020-07-06 19:11:33,163 logger INFO:
NAME: 7359-frackles-test
PPL_CELEBA_ADJUSTMENT: True
DATASET:
PART_COUNT: 16
SIZE: 20000
SIZE_TEST: 49000-20000
PATH: M:/dev/ALAE/project/ALAE-master/data/datasets/frackleLeft_20200108_x128color-dataset/frackleLeft_20200108_x128-dataset-r%02d.tfrecords.%03d
PATH_TEST: M:/dev/ALAE/project/ALAE-master/data/datasets/frackleLeft_20200108_x128color-dataset/frackleLeft_20200108_x128-dataset-r%02d.tfrecords.%03d
MAX_RESOLUTION_LEVEL: 7
STYLE_MIX_PATH: style_mixing/test_images/set_celeba
MODEL:
LATENT_SPACE_SIZE: 256
LAYER_COUNT: 6
MAX_CHANNEL_COUNT: 256
START_CHANNEL_COUNT: 64
DLATENT_AVG_BETA: 0.995
MAPPING_LAYERS: 8
OUTPUT_DIR: training_artifacts/7359-frackles-test
TRAIN:
BASE_LEARNING_RATE: 0.002
EPOCHS_PER_LOD: 6
LEARNING_DECAY_RATE: 0.1
LEARNING_DECAY_STEPS: []
TRAIN_EPOCHS: 80

4 8 16 32 64 128 256 512 1024

LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32]
LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 32, 32, 16]
LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 32, 16]
LOD_2_BATCH_1GPU: [128, 128, 128, 64, 32, 16]

LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]

2020-07-06 19:11:33,164 logger INFO: Running with config:
DATASET:
FFHQ_SOURCE: /data/datasets/ffhq-dataset/tfrecords/ffhq/ffhq-r%02d.tfrecords
FLIP_IMAGES: True
MAX_RESOLUTION_LEVEL: 7
PART_COUNT: 16
PART_COUNT_TEST: 1
PATH: M:/dev/ALAE/project/ALAE-master/data/datasets/frackleLeft_20200108_x128color-dataset/frackleLeft_20200108_x128-dataset-r%02d.tfrecords.%03d
PATH_TEST: M:/dev/ALAE/project/ALAE-master/data/datasets/frackleLeft_20200108_x128color-dataset/frackleLeft_20200108_x128-dataset-r%02d.tfrecords.%03d
SAMPLES_PATH: dataset_samples/faces/realign128x128
SIZE: 20000
SIZE_TEST: 29000
STYLE_MIX_PATH: style_mixing/test_images/set_celeba
MODEL:
CHANNELS: 3
DLATENT_AVG_BETA: 0.995
ENCODER: EncoderDefault
GENERATOR: GeneratorDefault
LATENT_SPACE_SIZE: 256
LAYER_COUNT: 6
MAPPING_FROM_LATENT: MappingFromLatent
MAPPING_LAYERS: 8
MAPPING_TO_LATENT: MappingToLatent
MAX_CHANNEL_COUNT: 256
START_CHANNEL_COUNT: 64
STYLE_MIXING_PROB: 0.9
TRUNCATIOM_CUTOFF: 8
TRUNCATIOM_PSI: 0.7
Z_REGRESSION: False
NAME: 7359-frackles-test
OUTPUT_DIR: training_artifacts/7359-frackles-test
PPL_CELEBA_ADJUSTMENT: True
TRAIN:
ADAM_BETA_0: 0.0
ADAM_BETA_1: 0.99
BASE_LEARNING_RATE: 0.002
EPOCHS_PER_LOD: 6
LEARNING_DECAY_RATE: 0.1
LEARNING_DECAY_STEPS: []
LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]
LOD_2_BATCH_1GPU: [128, 128, 128, 64, 32, 16]
LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 32, 16]
LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 32, 32, 16]
LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32]
REPORT_FREQ: [100, 80, 60, 30, 20, 10, 10, 5, 5]
SNAPSHOT_FREQ: [300, 300, 300, 100, 50, 30, 20, 20, 10]
TRAIN_EPOCHS: 80
Running on GeForce RTX 2080 Ti
2020-07-06 19:11:35,057 logger INFO: Trainable parameters generator:
2020-07-06 19:11:35,059 logger INFO: Trainable parameters discriminator:
2020-07-06 19:11:35,062 logger INFO: No checkpoint found. Initializing model from scratch
2020-07-06 19:11:35,062 logger INFO: Starting from epoch: 0
2020-07-06 19:11:35,116 logger INFO: ################################################################################
2020-07-06 19:11:35,117 logger INFO: # Switching LOD to 0
2020-07-06 19:11:35,117 logger INFO: # Starting transition
2020-07-06 19:11:35,117 logger INFO: ################################################################################
2020-07-06 19:11:35,117 logger INFO: ################################################################################
2020-07-06 19:11:35,117 logger INFO: # Transition ended
2020-07-06 19:11:35,117 logger INFO: ################################################################################
2020-07-06 19:11:35,119 logger INFO: Batch size: 128, Batch size per GPU: 128, LOD: 0 - 4x4, blend: 1.000, dataset size: 20000
Backend TkAgg is interactive backend. Turning interactive mode on.

Process finished with exit code -1073741819 (0xC0000005)

When debugging in PyCharm, I have found that the error occurs on line 74 in data_loader.py when calling b = next(yielder).

But since I have very little experience debugging Python, I would be glad if you know what the problem might be.

Thank you very much in advance.

CelebA-HQ Train/Test Split

Hello authors,

Thank you for your incredibly interesting paper. I had a very quick question about the CelebA-HQ train/test split.

I believe the config uses the split 29000/1000: https://github.com/podgorskiy/ALAE/blob/master/configs/celeba-hq256.yaml#L6-L7

And in the paper (page 7, bottom left column), you say: "We follow [16, 17, 27, 23] and use CelebAHQ downscaled to 256 × 256 with training/testing split of 27000/3000."

If I am looking to compare results, which split size should I use -- or is there something here that I am missing?

Thank you!

Interactive demo doesn't show up on Windows

Windows 10, Python 3.6, all the dependencies installed
python interactive_demo.py
It works for a few seconds and then exits with no visual result or prompt. Any ideas on how to debug?

Bimpy issue

Thank you for your great work! However, I'm having trouble with Bimpy.

I got a segmentation fault error message on line 184 of interactive_demo.py.

ctx.init(1800, 1600, "Styles")

The segmentation fault seems to be caused by ctx.init, as I tried the simple example below. Do you have any idea about this case? I'm running on Linux.
[screenshot]
