stylegan2-ada-pytorch's Introduction

StyleGAN2-ADA — Official PyTorch implementation

Teaser image

Training Generative Adversarial Networks with Limited Data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
https://arxiv.org/abs/2006.06676

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Release notes

This repository is a faithful reimplementation of StyleGAN2-ADA in PyTorch, focusing on correctness, performance, and compatibility.

Correctness

  • Full support for all primary training configurations.
  • Extensive verification of image quality, training curves, and quality metrics against the TensorFlow version.
  • Results are expected to match in all cases, excluding the effects of pseudo-random numbers and floating-point arithmetic.

Performance

  • Training is typically 5%–30% faster compared to the TensorFlow version on NVIDIA Tesla V100 GPUs.
  • Inference is up to 35% faster in high resolutions, but it may be slightly slower in low resolutions.
  • GPU memory usage is comparable to the TensorFlow version.
  • Faster startup time when training new networks (<50s), and also when using pre-trained networks (<4s).
  • New command line options for tweaking the training performance.

Compatibility

  • Compatible with old network pickles created using the TensorFlow version.
  • New ZIP/PNG based dataset format for maximal interoperability with existing 3rd party tools.
  • TFRecords datasets are no longer supported — they need to be converted to the new format.
  • New JSON-based format for logs, metrics, and training curves.
  • Training curves are also exported in the old TFEvents format if TensorBoard is installed.
  • Command line syntax is mostly unchanged, with a few exceptions (e.g., dataset_tool.py).
  • Comparison methods are not supported (--cmethod, --dcap, --cfg=cifarbaseline, --aug=adarv).
  • Truncation is now disabled by default.

Data repository

Path Description
stylegan2-ada-pytorch Main directory hosted on Amazon S3
  ├  ada-paper.pdf Paper PDF
  ├  images Curated example images produced using the pre-trained models
  ├  videos Curated example interpolation videos
  └  pretrained Pre-trained models
    ├  ffhq.pkl FFHQ at 1024x1024, trained using original StyleGAN2
    ├  metfaces.pkl MetFaces at 1024x1024, transfer learning from FFHQ using ADA
    ├  afhqcat.pkl AFHQ Cat at 512x512, trained from scratch using ADA
    ├  afhqdog.pkl AFHQ Dog at 512x512, trained from scratch using ADA
    ├  afhqwild.pkl AFHQ Wild at 512x512, trained from scratch using ADA
    ├  cifar10.pkl Class-conditional CIFAR-10 at 32x32
    ├  brecahad.pkl BreCaHAD at 512x512, trained from scratch using ADA
    ├  paper-fig7c-training-set-sweeps Models used in Fig.7c (sweep over training set size)
    ├  paper-fig11a-small-datasets Models used in Fig.11a (small datasets & transfer learning)
    ├  paper-fig11b-cifar10 Models used in Fig.11b (CIFAR-10)
    ├  transfer-learning-source-nets Models used as starting point for transfer learning
    └  metrics Feature detectors used by the quality metrics

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1–8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using NVIDIA DGX-1 with 8 Tesla V100 GPUs.
  • 64-bit Python 3.7 and PyTorch 1.7.1. See https://pytorch.org/ for PyTorch install instructions.
  • CUDA toolkit 11.0 or later. Use at least version 11.1 if running on RTX 3090. (Why is a separate CUDA toolkit installation required? See comments in #2.)
  • Python libraries: pip install click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3. We use the Anaconda3 2020.11 distribution which installs most of these by default.
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvars64.bat".

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate uncurated MetFaces images with truncation (Fig.12 upper left)
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate class conditional CIFAR-10 images (Fig.17 left, Car)
python generate.py --outdir=out --seeds=0-35 --class=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/cifar10.pkl

# Style mixing example
python style_mixing.py --outdir=out --rows=85,100,75,458,1500 --cols=55,821,1789,293 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Outputs from the above commands are placed under out/*.png, controlled by --outdir. Downloaded network pickles are cached under $HOME/.cache/dnnlib, which can be overridden by setting the DNNLIB_CACHE_DIR environment variable. The default PyTorch extension build directory is $HOME/.cache/torch_extensions, which can be overridden by setting TORCH_EXTENSIONS_DIR.
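
Both cache locations can also be redirected from Python before any networks are downloaded or custom ops are built. A minimal sketch, with placeholder paths:

import os

# Redirect the pickle download cache and the PyTorch extension build cache.
# The paths below are placeholders; point them at any writable directory.
os.environ['DNNLIB_CACHE_DIR'] = '/scratch/dnnlib-cache'
os.environ['TORCH_EXTENSIONS_DIR'] = '/scratch/torch-extensions'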

Docker: You can run the above curated image example using Docker as follows:

docker build --tag sg2ada:latest .
./docker_run.sh python3 generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Note: The Docker image requires NVIDIA driver release r455.23 or later.

Legacy networks: The above commands can load most of the network pickles created using the previous TensorFlow versions of StyleGAN2 and StyleGAN2-ADA. However, for future compatibility, we recommend converting such legacy pickles into the new format used by the PyTorch version:

python legacy.py \
    --source=https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/stylegan2-cat-config-f.pkl \
    --dest=stylegan2-cat-config-f.pkl
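
Converted and legacy pickles can also be loaded directly from Python, following the same pattern as generate.py; this sketch assumes the repository's dnnlib and legacy modules are on PYTHONPATH:

import dnnlib
import legacy

# Open a local file or URL; legacy TensorFlow pickles are converted on the fly.
with dnnlib.util.open_url('stylegan2-cat-config-f.pkl') as f:
    G = legacy.load_network_pkl(f)['G_ema'].cuda()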

Projecting images to latent space

To find the matching latent vector for a given image file, run:

python projector.py --outdir=out --target=~/mytargetimg.png \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

For optimal results, the target image should be cropped and aligned similar to the FFHQ dataset. The above command saves the projection target out/target.png, result out/proj.png, latent vector out/projected_w.npz, and progression video out/proj.mp4. You can render the resulting latent vector by specifying --projected_w for generate.py:

python generate.py --outdir=out --projected_w=out/projected_w.npz \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl
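
The same latent can also be rendered directly from Python. A minimal sketch, assuming a generator G has already been loaded as described in the next section and that out/projected_w.npz exists:

import numpy as np
import torch

# Load the W vector saved by projector.py and synthesize the corresponding image.
ws = torch.tensor(np.load('out/projected_w.npz')['w']).cuda()  # [1, G.num_ws, G.w_dim]
img = G.synthesis(ws, noise_mode='const')                      # NCHW, float32, range [-1, +1]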

Using networks from Python

You can use pre-trained networks in your own Python code as follows:

import pickle
import torch

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda()    # latent codes
c = None                                # class labels (not used in this example)
img = G(z, c)                           # NCHW, float32, dynamic range [-1, +1]

The above code requires torch_utils and dnnlib to be accessible via PYTHONPATH. It does not need source code for the networks themselves — their class definitions are loaded from the pickle via torch_utils.persistence.
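
To save the result as an image file, the output tensor can be mapped from [-1, +1] to 8-bit RGB using the same conversion that generate.py applies:

import PIL.Image

# Convert NCHW float32 in [-1, +1] to HWC uint8 in [0, 255] and save as PNG.
img_uint8 = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
PIL.Image.fromarray(img_uint8[0].cpu().numpy(), 'RGB').save('out.png')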

The pickle contains three networks. 'G' and 'D' are instantaneous snapshots taken during training, and 'G_ema' represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.

The generator consists of two submodules, G.mapping and G.synthesis, that can be executed separately. They also support various additional options:

w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, noise_mode='const', force_fp32=True)
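
For class-conditional models such as cifar10.pkl, the label c is a one-hot vector of shape [1, G.c_dim]; this mirrors how generate.py constructs labels. A short sketch:

# One-hot class label for a conditional model (class index 1 is 'Car' in the CIFAR-10 example above).
c = torch.zeros([1, G.c_dim]).cuda()
c[:, 1] = 1
w = G.mapping(z, c, truncation_psi=0.7)
img = G.synthesis(w, noise_mode='const')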

Please refer to generate.py, style_mixing.py, and projector.py for further examples.

Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.

Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.
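
Class labels live in the dataset.json metadata file. As a minimal sketch, assuming the {"labels": [[<filename>, <class index>], ...]} layout used inside the ZIP datasets, such a file could be generated from a folder of class subdirectories like this:

import json
import os

# Assumed layout: <source_dir>/<class_name>/<image>.png, one subfolder per class.
# The {"labels": [[filename, class_index], ...]} structure is an assumption based on
# the ZIP dataset format; check dataset_tool.py and training/dataset.py to confirm.
source_dir = 'my-labeled-images'  # hypothetical folder
class_names = sorted(d for d in os.listdir(source_dir) if os.path.isdir(os.path.join(source_dir, d)))
labels = []
for class_idx, class_name in enumerate(class_names):
    for fname in sorted(os.listdir(os.path.join(source_dir, class_name))):
        labels.append([f'{class_name}/{fname}', class_idx])
with open(os.path.join(source_dir, 'dataset.json'), 'w') as f:
    json.dump({'labels': labels}, f)

Running dataset_tool.py on such a folder should then carry the labels into the resulting ZIP, after which --cond=1 can be used during training.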

Legacy TFRecords datasets are not supported — see below for instructions on how to convert them.

FFHQ:

Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.

Step 2: Extract images from TFRecords using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

Step 3: Create ZIP archive using dataset_tool.py from this repository:

# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip

# Scaled down 256x256 resolution.
#
# Note: --resize-filter=box is required to reproduce FID scores shown in the
# paper.  If you don't need to match exactly, it's better to leave this out
# and default to Lanczos.  See https://github.com/NVlabs/stylegan2-ada-pytorch/issues/283#issuecomment-1731217782
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
    --width=256 --height=256 --resize-filter=box

MetFaces: Download the MetFaces dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/metfaces/images --dest=~/datasets/metfaces.zip

AFHQ: Download the AFHQ dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/cifar-10-python.tar.gz --dest=~/datasets/cifar10.zip

LSUN: Download the desired categories from the LSUN project page and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
    --transform=center-crop --width=256 --height=256 --max_images=200000

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000

BreCaHAD:

Step 1: Download the BreCaHAD dataset.

Step 2: Extract 512x512 resolution crops using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

Step 3: Create ZIP archive using dataset_tool.py from this repository:

python dataset_tool.py --source=/tmp/brecahad-crops --dest=~/datasets/brecahad.zip

Training new networks

In its most basic form, training new networks boils down to:

python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1

The first command is optional; it validates the arguments, prints out the training configuration, and exits. The second command kicks off the actual training.

In this example, the results are saved to a newly created directory ~/training-runs/<ID>-mydataset-auto1, controlled by --outdir. The training exports network pickles (network-snapshot-<INT>.pkl) and example images (fakes<INT>.png) at regular intervals (controlled by --snap). For each pickle, it also evaluates FID (controlled by --metrics) and logs the resulting scores in metric-fid50k_full.jsonl (as well as TFEvents if TensorBoard is installed).

The name of the output directory reflects the training configuration. For example, 00000-mydataset-auto1 indicates that the base configuration was auto1, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by --cfg:

Base config Description
auto (default) Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results.
stylegan2 Reproduce results for StyleGAN2 config F at 1024x1024 using 1, 2, 4, or 8 GPUs.
paper256 Reproduce results for FFHQ and LSUN Cat at 256x256 using 1, 2, 4, or 8 GPUs.
paper512 Reproduce results for BreCaHAD and AFHQ at 512x512 using 1, 2, 4, or 8 GPUs.
paper1024 Reproduce results for MetFaces at 1024x1024 using 1, 2, 4, or 8 GPUs.
cifar Reproduce results for CIFAR-10 (tuned configuration) using 1 or 2 GPUs.

The training configuration can be further customized with additional command line options:

  • --aug=noaug disables ADA.
  • --cond=1 enables class-conditional training (requires a dataset with labels).
  • --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
  • --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
  • --resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl resumes a previous training run.
  • --gamma=10 overrides R1 gamma. We recommend trying a couple of different values for each new dataset.
  • --aug=ada --target=0.7 adjusts ADA target value (default: 0.6).
  • --augpipe=blit enables pixel blitting but disables all other augmentations.
  • --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).

Please refer to python train.py --help for the full list.

Expected training time

The total training time depends heavily on resolution, number of GPUs, dataset, desired quality, and hyperparameters. The following table lists expected wallclock times to reach different points in the training, measured in thousands of real images shown to the discriminator ("kimg"):

Resolution GPUs 1000 kimg 25000 kimg sec/kimg GPU mem CPU mem
128x128 1 4h 05m 4d 06h 12.8–13.7 7.2 GB 3.9 GB
128x128 2 2h 06m 2d 04h 6.5–6.8 7.4 GB 7.9 GB
128x128 4 1h 20m 1d 09h 4.1–4.6 4.2 GB 16.3 GB
128x128 8 1h 13m 1d 06h 3.9–4.9 2.6 GB 31.9 GB
256x256 1 6h 36m 6d 21h 21.6–24.2 5.0 GB 4.5 GB
256x256 2 3h 27m 3d 14h 11.2–11.8 5.2 GB 9.0 GB
256x256 4 1h 45m 1d 20h 5.6–5.9 5.2 GB 17.8 GB
256x256 8 1h 24m 1d 11h 4.4–5.5 3.2 GB 34.7 GB
512x512 1 21h 03m 21d 22h 72.5–74.9 7.6 GB 5.0 GB
512x512 2 10h 59m 11d 10h 37.7–40.0 7.8 GB 9.8 GB
512x512 4 5h 29m 5d 17h 18.7–19.1 7.9 GB 17.7 GB
512x512 8 2h 48m 2d 22h 9.5–9.7 7.8 GB 38.2 GB
1024x1024 1 1d 20h 46d 03h 154.3–161.6 8.1 GB 5.3 GB
1024x1024 2 23h 09m 24d 02h 80.6–86.2 8.6 GB 11.9 GB
1024x1024 4 11h 36m 12d 02h 40.1–40.8 8.4 GB 21.9 GB
1024x1024 8 5h 54m 6d 03h 20.2–20.6 8.3 GB 44.7 GB

The above measurements were done using NVIDIA Tesla V100 GPUs with default settings (--cfg=auto --aug=ada --metrics=fid50k_full). "sec/kimg" shows the expected range of variation in raw training performance, as reported in log.txt. "GPU mem" and "CPU mem" show the highest observed memory consumption, excluding the peak at the beginning caused by torch.backends.cudnn.benchmark.

In typical cases, 25000 kimg or more is needed to reach convergence, but the results are already quite reasonable around 5000 kimg. 1000 kimg is often enough for transfer learning, which tends to converge significantly faster. The following figure shows example convergence curves for different datasets as a function of wallclock time, using the same settings as above:

Training curves

Note: --cfg=auto serves as a reasonable first guess for the hyperparameters but it does not necessarily lead to optimal results for a given dataset. For example, --cfg=stylegan2 yields considerably better FID for FFHQ-140k at 1024x1024 than illustrated above. We recommend trying out at least a few different values of --gamma for each new dataset.

Quality metrics

By default, train.py automatically computes FID for each network pickle exported during training. We recommend inspecting metric-fid50k_full.jsonl (or TensorBoard) at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics=none to speed up the training slightly (3%–9%).
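
The JSONL log is easy to inspect from Python; a small sketch, assuming each line is a JSON object with a results dict keyed by the metric name and a snapshot_pkl field:

import json

# Print the FID recorded for each snapshot (field names assumed as described above).
with open('metric-fid50k_full.jsonl') as f:
    for line in f:
        entry = json.loads(line)
        print(entry.get('snapshot_pkl'), entry['results']['fid50k_full'])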

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to JSONL file.
python calc_metrics.py --metrics=pr50k3_full \
    --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq.zip --mirror=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

The first example looks up the training configuration and performs the same operation as if --metrics=pr50k3_full had been specified during training. The second example downloads a pre-trained network pickle, in which case the values of --mirror and --data must be specified explicitly.

Note that many of the metrics have a significant one-off cost when calculating them for the first time for a new dataset (up to 30min). Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.

We employ the following metrics in the ADA paper. Execution time and GPU memory usage is reported for one NVIDIA Tesla V100 GPU at 1024x1024 resolution:

Metric Time GPU mem Description
fid50k_full 13 min 1.8 GB Fréchet inception distance[1] against the full dataset
kid50k_full 13 min 1.8 GB Kernel inception distance[2] against the full dataset
pr50k3_full 13 min 4.1 GB Precision and recall[3] against the full dataset
is50k 13 min 1.8 GB Inception score[4] for CIFAR-10

In addition, the following metrics from the StyleGAN and StyleGAN2 papers are also supported:

Metric Time GPU mem Description
fid50k 13 min 1.8 GB Fréchet inception distance against 50k real images
kid50k 13 min 1.8 GB Kernel inception distance against 50k real images
pr50k3 13 min 4.1 GB Precision and recall against 50k real images
ppl2_wend 36 min 2.4 GB Perceptual path length[5] in W, endpoints, full image
ppl_zfull 36 min 2.4 GB Perceptual path length in Z, full paths, cropped image
ppl_wfull 36 min 2.4 GB Perceptual path length in W, full paths, cropped image
ppl_zend 36 min 2.4 GB Perceptual path length in Z, endpoints, cropped image
ppl_wend 36 min 2.4 GB Perceptual path length in W, endpoints, cropped image

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Bińkowski et al. 2018
  3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkäänniemi et al. 2019
  4. Improved Techniques for Training GANs, Salimans et al. 2016
  5. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018

License

Copyright © 2021, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License.

Citation

@inproceedings{Karras2020ada,
  title     = {Training Generative Adversarial Networks with Limited Data},
  author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle = {Proc. NeurIPS},
  year      = {2020}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke for helpful comments; Tero Kuosmanen and Sabu Nadarajan for their support with compute infrastructure; and Edgar Schönfeld for guidance on setting up unconditional BigGAN.

stylegan2-ada-pytorch's Issues

Questions about the dataset preparation: the necessity of .zip format?

@nurpax Hi, I have a question about the dataset preparation:

In the Preparing datasets section, you mention: "Alternatively, the folder can also be used directly as a dataset... but doing so may lead to suboptimal performance."

I'm a bit confused about why we have to convert a folder dataset into the .zip format. Is it because the original folder may not contain uncompressed PNG files, or are there other considerations behind the dataset format?

Thanks in advance!

No module named 'upfirdn2d_plugin'

Describe the bug
Trying to use class AugmentPipe in my project.

To Reproduce
Run this notebook
https://colab.research.google.com/drive/1WyNx2jSNlXPBtXgpOmLQMxRE2GzFjeFM?usp=sharing

Trace

No module named 'upfirdn2d_plugin'
warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + str(sys.exc_info()[1]))
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
tensor(0.8200, device='cuda:0', grad_fn=)
/content/ada/torch_utils/ops/upfirdn2d.py:34: UserWarning: Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:

Desktop (please complete the following information):
Google colab, linux, latest pytorch

How does one get an intermediate output?

I'm trying to generate intermediate outputs from a pre-trained model for a different downstream task. Since the network I'm feeding it to is 224x224 I only need to generate a 256x256 image, not the full 1024x1024.

Digging through the code, it looks like I could do something like this:

import pickle

with open('stylegan2-ada-pytorch_pretrained_metfaces.pkl', 'rb') as f:
    styleGAN2_ADA = pickle.load(f)
    G = styleGAN2_ADA['G_ema']
    G.synthesis.block_resolutions = [4, 8, 16, 32, 64, 128, 256]

Which sort of works, but I think I'm missing the is_final flag on the last block. I'm not sure how to change it for a pre-trained model.

Output when changing .block_resolutions: (image attached)

Standard output, same latent vector: (image attached)

Different from tf-version Discriminator architecture?

Hi,

I am curious whether the architecture is different from the TensorFlow version of StyleGAN2.
I found:
if c_dim > 0: self.mapping = MappingNetwork(z_dim=0, c_dim=c_dim, w_dim=cmap_dim, num_ws=None, w_avg_beta=None, **mapping_kwargs)

That is not present in the TF version, and it adds several additional layers.
Is this done somewhere in the TF version too, or is it a change?

No module named 'upfirdn2d_plugin'

Hello,

When I start training, everything seems to work fine however I get the below message which does not interrupt the training

I am using Python 3.8,
torch==1.7.0+cu110 torchvision==0.8.1+cu110
and Cuda 11.0
my hardware is NVIDIA RTX 3090

No module named 'upfirdn2d_plugin'
  warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + str(sys.exc_info()[1]))
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
C:\Users\VENUM\Google Drive\PyTorch\stylegan2-ada-pytorch\torch_utils\ops\upfirdn2d.py:34: UserWarning: Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:

Hints on RGBA image dataset

Hi. First of all, thanks for the release!

Currently, only grayscale and RGB image datasets are supported.
For example, dataset_tool.py won't even allow RGBA images as input.
Could you give me some hints on the simplest modification to the source code to support RGBA images, and how feasible it is for stylegan2-ada to train on an RGBA image dataset?

Thanks!

No module named 'upfirdn2d_plugin'

Describe the bug

No module named 'upfirdn2d_plugin'
  warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + str(sys.exc_info()[1]))
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
.../stylegan2-ada-pytorch/torch_utils/ops/upfirdn2d.py:34: UserWarning: Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:

To Reproduce
Steps to reproduce the behavior:

python3 train.py --outdir=train_result --data=../data/dataset1.zip --gpus=8


Desktop (please complete the following information):

  • CUDA 11.0
  • Debian
  • GPU V100
  • torch python3 train.py --outdir=train_result --data=../data/dataset1.zip --gpus=8


watching youtube VR 360 "errors"

Additional context
What is going on with mixed reality, the "HP Reverb G2 headset", and "YouTube 360 VR video"?
We are getting an error in the headset that says "can't read page". When we click on the headset icon under the VR 360 YouTube videos, it attempts to go into VR, but then a message pops up in the headset saying "can't read page".
Specs:

  • Dell Alien R-11 desktop
  • Windows 10 64 bit
  • i9 cpu with 64 gb
  • gpu= rtx-3090 with 24gb
  • mixed reality headset HP Reverb G2
  • latest versions

P.S. In Steam, the Steam VR 360 videos work.

SynthesisNetwork and number of Ws

Hi! On file "training/networks.py" line 445:

class SynthesisNetwork(torch.nn.Module):
    def __init__(...
        ...
        for res in self.block_resolutions:
          ...
          is_last = (res == self.img_resolution)
          block = SynthesisBlock(in_channels, out_channels, w_dim=w_dim, resolution=res,
              img_channels=img_channels, is_last=is_last, use_fp16=use_fp16, **block_kwargs)
          self.num_ws += block.num_conv
          if is_last:
              self.num_ws += block.num_torgb
          ...

As you can see, except on the last block, even in the skip SynthesisLayer version you are only adding two counts of ws instead of three. Then on line 459, when splitting ws:

    ...
    def forward(self, ws, **block_kwargs):
        block_ws = []
        with torch.autograd.profiler.record_function('split_ws'):
            misc.assert_shape(ws, [None, self.num_ws, self.w_dim])
            ws = ws.to(torch.float32)
            w_idx = 0
            for res in self.block_resolutions:
                block = getattr(self, f'b{res}')
                block_ws.append(ws.narrow(1, w_idx, block.num_conv + block.num_torgb))
                w_idx += block.num_conv

w_idx increases by two, but block_ws takes the following three. In this scenario, as far as I can tell, self.b[n].torgb and self.b[n+1].conv0 receive the exact same view of w. Is this intended behaviour?
On the other hand, is there a reason to use torch.Tensor.repeat, and consequently allocate new memory for exact copies of the original w, instead of using torch.Tensor.expand to create new views?

Cheers

Bad output from generator

Describe the bug

Values in the output array are in the incorrect range: I expect -1 to 1 but got approximately -150 to 50. When attempting to transform to the 0-255 range using the transformation x * 127.5 + 128, all values are clipped at the extremes. Changing this to a scale of 1 and a shift of 170 puts all values in the right range, but the image is still full of artifacts. See screenshots below.

Both images do contain the shape of a face, so something is working correctly.

To Reproduce

Can get this by cloning the repo and running the example command:

python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
      --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

I used the following code, based on generate.py, to rescale the image values:

import pickle
import numpy as np
import torch
from PIL import Image


DEVICE = 'cuda:0'

def transform_output(a, scale, offset):
    return (a.permute(0, 2, 3, 1) * scale + offset).clamp(0, 255).to(torch.uint8)

def latent_from_seed(seed):
    return torch.from_numpy(np.random.RandomState(seed).randn(1, G.z_dim)).to(DEVICE) 


# Load net
with open('pretrained/metfaces.pkl', 'rb') as f:
    data = pickle.load(f)

G = data['G_ema'].cuda()


# Generate
z = latent_from_seed(85)
a = G(z, None, truncation_psi=1, noise_mode='const')


# Image using standard transformation
a2 = transform_output(a, scale=127.5, offset=128)
img = Image.fromarray(a2[0].cpu().numpy(), 'RGB')


# Image using alternate transformation
a2 = transform_output(a, scale=1, offset=170)
img = Image.fromarray(a2[0].cpu().numpy(), 'RGB')

Screenshots

Output of metfaces network using seed 85 in example command:

seed0085

After attempting to scale/shift values into the correct range:

image

Distribution of channel values in output of generator:

image

System info

  • Pop!_OS 20.10 (Ubuntu 20.10)
  • pytorch 1.7.1
  • CUDA toolkit 11.2
  • NVIDIA driver 460.39
  • GTX 1650 4GB (figure this should be OK for just running the generator?)
  • No docker

Errors related to the generated image

The following error occurred when generating images with self-trained pkl files:
python generate.py --outdir outt --seeds 0-50 --network ./00005-data-auto2/network-snapshot-000600.pkl

Traceback (most recent call last):
  File "generate.py", line 127, in <module>
    generate_images() # pylint: disable=no-value-for-parameter
  File "/mistgpu/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/mistgpu/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/mistgpu/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mistgpu/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/mistgpu/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "generate.py", line 121, in generate_images
    PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(f'{outdir}/seed{seed:04d}.png')
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2785, in fromarray
    return frombuffer(mode, size, obj, "raw", rawmode, 0, 1)
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2725, in frombuffer
    return frombytes(mode, size, data, decoder_name, args)
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2671, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 800, in frombytes
    raise ValueError("not enough image data")
ValueError: not enough image data

What could be causing this error?

legacy.py conversion fails for some network pickles

Describe the bug
Some StyleGAN network pickles define "resolution_w" and "resolution_h" in addition to "resolution". legacy.py only handles "resolution" and will fail with "Unknown TensorFlow kwarg".

To Reproduce
I have looked through the source of previous StyleGAN & StyleGAN2 releases and have not found which version defines "resolution_w" and "resolution_h" in addition to "resolution".

Steps to reproduce the behavior:

  1. Download a legacy network pickle, for example faces (FFHQ config-f 512x512) listed here or download from Direct Link on Mega
  2. Run python legacy.py --source=./ffhq-512-avg-tpurun1.pkl --dest=stylegan2-ada-ffh512-config-f.pkl
  3. This will fail with the error "Unknown TensorFlow kwarg" due to the "resolution_w" and "resolution_h" kwargs. (In the example pkl, "resolution", "resolution_w" and "resolution_h" are all equal to 512.)

Expected behavior
legacy.py should account for the existence of "resolution_w" and "resolution_h" kwargs.

Fix
In legacy.py lines 148 & 248 we check for unknown kwargs.

I'm not much of a python programmer, but in both of those sections adding this code will fix the issue:

    if 'resolution_w' in tf_kwargs:
        tf_kwargs.pop('resolution_w', None)
        tf_kwargs.pop('resolution_h', None)

This code should be inserted before lines 153 and 250:

unknown_kwargs = list(set(tf_kwargs.keys()) - known_kwargs)

Using this fix I have successfully converted the above linked FFHQ-config-f-512x512 network and run transfer learning against it.

Desktop (please complete the following information):

  • OS: Google Colab Pro, Python 3.6.9
  • PyTorch version 1.7.0+cu101
  • CUDA toolkit version release 10.1, V10.1.243
  • NVIDIA driver version NVIDIA UNIX x86_64 Kernel Module 460.32.03
  • GPU NVIDIA V100

Result of projected image is different from tensorflow implementation

Dear NVlabs,

I tried projecting a Robert Downey Jr. image using both the TensorFlow and PyTorch versions. The results are different: the TensorFlow one produces much better output. I'm using legacy StyleGAN2 weights.

arobertdowneyjr_01 (1)
Original image

arobertdowneyjr_01
Tensorflow projected image.

proj
pytorch projected image

Greyscale Projection

Projector.py converts your target image to RGB but this of course causes an assertion error with a network trained on greyscale images. I am working on resolving this on my own. Here is the snippet of code that seems to be the first of the greyscale problems:

    # Load target image.
    target_pil = PIL.Image.open(target_fname).convert('RGB')
    w, h = target_pil.size
    s = min(w, h)
    target_pil = target_pil.crop(((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2))
    target_pil = target_pil.resize((G.img_resolution, G.img_resolution), PIL.Image.LANCZOS)
    target_uint8 = np.array(target_pil, dtype=np.uint8)

    # Optimize projection.
    start_time = perf_counter()
    projected_w_steps = project(
        G,
        target=torch.tensor(target_uint8.transpose([2, 0, 1]), device=device), # pylint: disable=not-callabl

Weight conversion to StyleGAN2

I was wondering if there is a tool available that accomplishes the reverse of what legacy.py does. I want to take my weights generated from this repo and convert them into StyleGAN2 (without ADA) architecture. Is this possible?

Generation of dataset.json for the pytorch version

How can I generate dataset.json for training a brand new model?
My datasets are currently sorted in a folder structure.
I want to train on full-body pictures. I see that in some datasets the face areas are marked in this file.
Is there a guide or documentation on how to create the dataset metadata file and what arguments it uses?
Thanks

How (if possible) to use multiclass generation?

I have a dataset of landscapes with labels like "day time", "hills", "grass", "trees", each given as a value between 0 and 1, and I want to use it to train my model. The closest scenario I see in this repo is the conditional GAN based on CIFAR-10, but that task involves non-intersecting classes.
Can I use multiple labels in generation, or could I fork this repo and change some part of the latent generation process? If the latter, can you advise which places in the training loop I should change for my purpose?

semaphore_tracker error on configurations paper512 and cifar

first, thank you for writing this library!

Describe the bug
getting this error running these configurations

nohup python train.py --cfg=paper512 --gpus=8 --resume=ffhq512 --outdir=training-runs  --data=../data/processed-icons-nocond-512.zip > output.txt &
nohup python train.py --cfg=cifar --gpus=8 --resume=ffhq512 --outdir=training-runs  --data=../data/processed-icons-nocond-512.zip > output.txt &
(pytorch_latest_p37) ubuntu@ip-172-31-0-96:~/stylegan2-ada-pytorch$ /home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 17 leaked semaphores to clean up at shutdown

while these configurations do not have errors

nohup python train.py --cfg=auto --gpus=8 --resume=ffhq512 --outdir=training-runs  --data=../data/processed-icons-nocond-512.zip > output.txt &
nohup python train.py --cfg=stylegan2 --gpus=8 --resume=ffhq512 --outdir=training-runs  --data=../data/processed-icons-nocond-512.zip > output.txt &
nohup python train.py --cfg=paper1024 --gpus=8 --resume=ffhq512 --outdir=training-runs  --data=../data/processed-icons-nocond-512.zip > output.txt &

To Reproduce
I don't have steps now, but can try to create them. I was hoping there might be some general advice for the error above.

Desktop (please complete the following information):

  • OS: Ubuntu 18
  • PyTorch 1.7.1
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
    |-------------------------------+----------------------+----------------------+
    | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
    | | | MIG M. |
    |===============================+======================+======================|
    | 0 Tesla V100-SXM2... On | 00000000:00:17.0 Off | 0 |
    | N/A 43C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |
    | | | N/A |
    ... 8 cores
    using p3.16.xlarge

freezing layers for transfer learning

Hello,

For transfer learning, in addition to the command below to resume training, is it possible to pass an argument to freeze layers, or does this need to be done by modifying the training file?
--resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl resumes a previous training run.

Thanks,

some errors with windows10 with RTX-3070

Describe the bug
I'm new to this and ran into these errors. I already installed the required libraries and cleared the cache.
I ran train.py with a custom dataset zip file and I keep getting the 'No module named 'upfirdn2d_plugin'' warning.
But it seems to still be running, since GPU memory is highly utilized.
Then comes 'Evaluating metrics...', after which I got this error:
'RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory'

Screenshots
image

Desktop (please complete the following information):

  • OS: [Windows 10]
  • PyTorch version (pytorch 1.7.1)
  • CUDA toolkit version (CUDA 11.0) [specifically it shows like this in pip list: 'torch 1.7.1+cu110']
  • NVIDIA driver version 8.1.940.0
  • GPU [ RTX 3070]
  • Docker: did you use Docker? no

Here is my log
log.txt

Injecting latent back into projector

Say for example that one has a dlatent for a specified image, how would one inject that dlatent back into the projector to resume embedding?

It seems that this would have to be replaced:

z_samples = np.random.RandomState(123).randn(w_avg_samples, G.z_dim)

but, I am not sure what other changes have to be made.

any tips about how to train my own dataset

I have 2000+ images at 256x256 resolution.
My configuration is as such:
--kimg=25000 --augpipe=bgcfn --gpus=2 --resume=ffhq256 --cfg=paper256 --snap=10

The FID value is around 60 and it's difficult to decrease it. Any suggestions for how to decrease it?

How to use center_crop_wide for 128x128 resolution?

Hi,
I wanted to prepare LSUN Car dataset at 128x128. Based on your instruction

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000

I ran

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=128 --height=96 --max-images=200000

However, I got deformed resized images as shown below
image

Could you advise on the correct way to do this?
I also notice that you recommend center-crop for LSUN Cat but center-crop-wide for LSUN Car. Is there a specific reason for that?

Thanks a lot!

How to use the Discriminator

Hi, I want to use Discriminator, but I am not sure what parameters are required.
I would appreciate it if you let me know.

Thank you.

How can we use our own image dataset to generate a dataset with labeled format suitable for ADA training? (like cifar-10 for example)

How can we use our own image dataset to generate a dataset with labeled format suitable for ADA training? (like cifar-10 for example)
I have separated my images into differently named folders and then used dataset_tool.py to generate the ZIP dataset, but when I open the zip package, I find that all the images are put into one folder. Is there any way to use my own image dataset to generate a dataset with labels?
Thank you.

Enhancement - generate.py - documentation on how to load a local pkl file with docker

@click.option('--network', 'network_pkl', help='Network pickle filename', required=True) // should be false
@click.option('--local', 'local_pkl', help='Local pickle filename')

I recall some code that fell back to loading the file locally when a URL is not specified. Will dig around.

UPDATE
this probably isn't going to work so well with Docker; it needs instructions.
https://github.com/BartvLaatum/lsde_stylegan/blob/de75d15f80bb3d2bc8463b1487628ce4b6f57105/open_dlat.py

UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte

Describe the bug

(stylegan2-ada) E:\Projects\stylegan2-ada>python ./stylegan2-ada/generate.py --outdir=out --seeds=0-35 --class=2 --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/cifar10.pkl
Loading networks from "https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/cifar10.pkl"...
Generating image for seed 0 (0/36) ...
Setting up PyTorch plugin "bias_act_plugin"... C:\Users\oleg\AppData\Local\Continuum\anaconda3\envs\stylegan2-ada\lib\site-packages\torch\utils\cpp_extension.py:287: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... C:\Users\oleg\AppData\Local\Continuum\anaconda3\envs\stylegan2-ada\lib\site-packages\torch\utils\cpp_extension.py:287: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
Done.
Generating image for seed 1 (1/36) ...
Generating image for seed 2 (2/36) ...
Generating image for seed 3 (3/36) ...

I saw these warnings, but the images were generated successfully.

Desktop (please complete the following information):

  • OS: Windows 10
  • PyTorch version: 1.7.1
  • CUDA toolkit version: 11.0
  • NVIDIA driver version: 461.40
  • GPU RTX 2080
  • Docker: —

RuntimeError: AssertionError:

Hi. I'm trying to run the sample code but it raises an error.

tick 0     kimg 0.0      time 1m 02s       sec/tick 15.7    sec/kimg 3923.85 maintenance 46.2   cpumem 3.91   gpumem 37.23  augment 0.000
Evaluating metrics...
Traceback (most recent call last):
  File "train.py", line 530, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "train.py", line 523, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "train.py", line 376, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/workspace/training/training_loop.py", line 371, in training_loop
    result_dict = metric_main.calc_metric(metric=metric, G=snapshot_data['G_ema'],
  File "/workspace/metrics/metric_main.py", line 45, in calc_metric
    results = _metric_dict[metric](opts)
  File "/workspace/metrics/metric_main.py", line 85, in fid50k_full
    fid = frechet_inception_distance.compute_fid(opts, max_real=None, num_gen=50000)
  File "/workspace/metrics/frechet_inception_distance.py", line 25, in compute_fid
    mu_real, sigma_real = metric_utils.compute_feature_stats_for_dataset(
  File "/workspace/metrics/metric_utils.py", line 216, in compute_feature_stats_for_dataset
    features = detector(images.to(opts.device), **detector_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
torch.jit.Error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 20, in forward
      pass
    else:
      ops.prim.RaiseException("AssertionError: ")
      ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    if use_fp16:
      _4 = 5

Traceback of TorchScript, original code (most recent call last):
  File "c:\p4research\research\tkarras\dnn\gan3support\feature_detectors\inception.py", line 197, in forward
    def forward(self, img, return_features: bool = False, use_fp16: bool = False, no_output_bias: bool = False):
        batch_size, channels, height, width = img.shape # [NCHW]
        assert channels == 3
        ~~~~~~~~~~~~~~~~~~~~ <--- HERE

        # Cast to float.
RuntimeError: AssertionError:

Do you have any idea how to solve this problem? Thanks in advance

cublas_v2.h: No such file or directory

Describe the bug
I am encountering a cublas_v2.h: No such file or directory error when using the command python train.py --outdir=outdir --data=../fmwa256x256.zip --gpus=1
Here's the more detailed error message:

Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"... Failed!
/home/oliviern/stylegan2-ada-pytorch/torch_utils/ops/bias_act.py:50: UserWarning: Failed to build CUDA kernels for bias_act. Falling back to slow reference implementation. Details:

Error building extension 'bias_act_plugin': [1/2] /home/oliviern/venv/singularity_pytorch/bin/x86_64-conda_cos6-linux-gnu-c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include/TH -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/oliviern/venv/singularity_pytorch/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/oliviern/stylegan2-ada-pytorch/torch_utils/ops/bias_act.cpp -o bias_act.o 
FAILED: bias_act.o 
/home/oliviern/venv/singularity_pytorch/bin/x86_64-conda_cos6-linux-gnu-c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include/TH -isystem /home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/oliviern/venv/singularity_pytorch/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/oliviern/stylegan2-ada-pytorch/torch_utils/ops/bias_act.cpp -o bias_act.o 
In file included from /home/oliviern/stylegan2-ada-pytorch/torch_utils/ops/bias_act.cpp:10:0:
/home/oliviern/venv/singularity_pytorch/lib/python3.6/site-packages/torch/include/ATen/cuda/CUDAContext.h:7:10: fatal error: cublas_v2.h: No such file or directory
 #include <cublas_v2.h>
          ^~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

  warnings.warn('Failed to build CUDA kernels for bias_act. Falling back to slow reference implementation. Details:\n\n' + str(sys.exc_info()[1]))
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
/home//oliviern/stylegan2-ada-pytorch/torch_utils/ops/upfirdn2d.py:34: UserWarning: Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation.

To Reproduce
python train.py --outdir=outdir --data=../fmwa256x256.zip --gpus=1
I've also tried CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/cuda/targets/x86_64-linux/include/ CUDA_HOME=/usr/local/cuda CUDA_PATH_V10_1=/usr/local/cuda/bin PYTHONPATH=/home/oliviern python train.py --outdir=outdir --data=../fmwa256x256.zip --gpus=1 which gives the same error

Desktop (please complete the following information):

  • Linux (CentOS 7, 3.10.0-862.3.2.el7.x86_64)
  • PyTorch 1.7.0
  • CUDA 10.2
  • Tesla V100 32GB
  • Running on a Singularity env

Additional Information
cublas_v2.h is present in /usr/local/cuda/targets/x86_64-linux/include/

ls /usr/local/cuda/targets/x86_64-linux/include/cublas*
/usr/local/cuda/targets/x86_64-linux/include/cublas_api.h  /usr/local/cuda/targets/x86_64-linux/include/cublas_v2.h
/usr/local/cuda/targets/x86_64-linux/include/cublas.h	   /usr/local/cuda/targets/x86_64-linux/include/cublasXt.h

What could be the issue ?

where is Mapping network code

Where can I find the mapping network implementation, so I can configure it and see how the image is tuned? Also, how is the generated latent vector fed into the synthesis network (G)?

Process hangs on 'Setting up PyTorch plugin "bias_act_plugin"...' when using multiple GPUs

I added these lines to train.py as lines 13 and 14 (right under import os):

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,3,4"

I tested the process with --gpus 1 and it spent a few minutes on Setting up PyTorch plugin "bias_act_plugin"... but then proceeded to train. However with --gpus 4 it has been hanging on this line for an hour and a half.

Creating output directory...
Launching processes...
Loading training set...

Num images:  505487
Image shape: [3, 256, 256]
Label shape: [0]

Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"...

Here's the nvidia-smi printout as well. As you can see three of the cores (2,3,4) have 100% GPU utilization while the first core (0) has 0%. The memory usage does not seem to be changing.


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   33C    P0    57W / 300W |   2088MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:1B:00.0 Off |                    0 |
| N/A   34C    P0    59W / 300W |  31147MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   35C    P0    68W / 300W |   4261MiB / 32510MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:3E:00.0 Off |                    0 |
| N/A   31C    P0    68W / 300W |   4345MiB / 32510MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  Off  | 00000000:88:00.0 Off |                    0 |
| N/A   33C    P0    71W / 300W |   4201MiB / 32510MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Do I just need to be more patient? On one core it really only took a couple of minutes to begin training.

EDIT: note that the cores (0,2,3,4) are not consecutive.

upfirdn2d_plugin Problem

Describe the bug
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!

Please stop closing people's issues without a confirmed fix for this problem. #2 (comment) does not work and there is no confirmed fix on that issue that was closed without a confirmed fix.

Please be serious about it and let's work together for a fix instead of ignoring the problem and referring people to a close topic that does not offer any solution to their problem.

We tried everything proposed, including both CUDA 11.0 and 11.1 with different versions of PyTorch, just in case.
We are a team of 5 people and we all had the same problem on both Windows and Linux machines, and even in Google Colab, which tells me this is more than just a configuration problem.

No, %pip install ninja did not solve the problem on any of the machines we have in our lab.
Also, setting verbosity = 'full' does not seem to include any additional helpful information.
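
For what it's worth, a quick way to check whether the JIT-compilation path these plugins rely on is usable at all (this is a generic diagnostic sketch, not an official fix) is to inspect what torch.utils.cpp_extension reports:

# Generic diagnostic sketch: the bias_act/upfirdn2d plugins are compiled via
# torch.utils.cpp_extension (ninja + nvcc), so these values should look sane.
import torch
from torch.utils import cpp_extension

print(torch.__version__, torch.version.cuda)      # PyTorch and its bundled CUDA version
print(torch.cuda.is_available())                  # should be True
print(cpp_extension.CUDA_HOME)                    # should point to an installed CUDA toolkit
print(cpp_extension.is_ninja_available())         # should be True if ninja can be found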

Desktop (please complete the following information):

These are the two machines I used:

Machine 1

  • Ubuntu 20.04.1
  • PyTorch 1.7.1
  • CUDA 11.1
  • RTX 3090

Machine 2

  • Windows 10
  • PyTorch 1.7.1
  • CUDA 11.1 (also tried CUDA 11.0)
  • NVIDIA driver version 461.40
  • RTX 3090

Error: seeds required when generating from projected_w.npz

With the current code, one has to specify a dummy seed when generating images from projected_w.npz.

!python stylegan2-ada-pytorch/generate.py \
 --network={network_pkl} \
 --outdir={output_folder} \
 --projected-w={input_latent_vector}

is met with:

Usage: generate.py [OPTIONS]
Try 'generate.py --help' for help.

Error: Missing option '--seeds'.

However:

!python stylegan2-ada-pytorch/generate.py \
 --network={network_pkl} \
 --outdir={output_folder} \
 --seeds={dummy_seed} \
 --projected-w={input_latent_vector}

is met with:

Loading networks from "/content/lsundog-res256-paper256-kimg100000-noaug.pkl"...
warn: --seeds is ignored when using --projected-w
Generating images from projected W "out/1113080/projected_w.npz"
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

First, this behavior follows from the code logic below, and I am not sure whether it is intended:

# Synthesize the result of a W projection.
if projected_w is not None:
    if seeds is not None:
        print ('warn: --seeds is ignored when using --projected-w')
    print(f'Generating images from projected W "{projected_w}"')
    ws = np.load(projected_w)['w']
    ws = torch.tensor(ws, device=device) # pylint: disable=not-callable
    assert ws.shape[1:] == (G.num_ws, G.w_dim)
    for idx, w in enumerate(ws):
        img = G.synthesis(w.unsqueeze(0), noise_mode=noise_mode)
        img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
        img = PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(f'{outdir}/proj{idx:02d}.png')
    return

followed by:
if seeds is None:
    ctx.fail('--seeds option is required when not using --projected-w')

Second, the "fail" message is not shown. Only the "warn" message and the "missing argument" message are shown.

RuntimeError: DataLoader worker (pid 31703) is killed by signal: Terminated.

Traceback (most recent call last):
  File "train.py", line 530, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/home/osm/torchgan/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/osm/torchgan/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/osm/torchgan/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/osm/torchgan/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/osm/torchgan/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "train.py", line 523, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "train.py", line 376, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/home/osm/stylegan2-ada-pytorch-main/training/training_loop.py", line 147, in training_loop
    G = dnnlib.util.construct_class_by_name(**G_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
  File "/home/osm/stylegan2-ada-pytorch-main/dnnlib/util.py", line 289, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/home/osm/stylegan2-ada-pytorch-main/dnnlib/util.py", line 284, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/home/osm/stylegan2-ada-pytorch-main/torch_utils/persistence.py", line 104, in __init__
    super().__init__(*args, **kwargs)
  File "/home/osm/stylegan2-ada-pytorch-main/training/networks.py", line 493, in __init__
    self.synthesis = SynthesisNetwork(w_dim=w_dim, img_resolution=img_resolution, img_channels=img_channels, **synthesis_kwargs)
  File "/home/osm/stylegan2-ada-pytorch-main/torch_utils/persistence.py", line 104, in __init__
    super().__init__(*args, **kwargs)
  File "/home/osm/stylegan2-ada-pytorch-main/training/networks.py", line 434, in __init__
    assert img_resolution >= 4 and img_resolution & (img_resolution - 1) == 0
AssertionError
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/home/osm/torchgan/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 31703) is killed by signal: Terminated.
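
Judging from the inner traceback, the root cause appears to be the assertion in training/networks.py (line 434 above), which requires the training resolution to be a power of two and at least 4; the DataLoader termination message looks like a secondary effect of the main process dying. A minimal sketch of the condition that fails:

# The condition asserted in training/networks.py (line 434 in the traceback):
# img_resolution must be >= 4 and a power of two.
def is_valid_resolution(img_resolution: int) -> bool:
    return img_resolution >= 4 and img_resolution & (img_resolution - 1) == 0

print(is_valid_resolution(256))  # True  -> networks construct normally
print(is_valid_resolution(250))  # False -> AssertionError during construction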

Default pl_batch_shrink requires batch to be >= 2

The batch size is integer-divided by loss.pl_batch_shrink on this line, which leads to an error when --batch=1, since the resulting batch size becomes 0. Given that train.py accepts a minimum batch size of 1, the value should probably be clamped to 1 when batch == 1, or train.py should enforce that the minimum batch size is 2.
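
One possible fix, sketched below (hypothetical helper, not the repo's actual loss.py), is to clamp the shrunken batch size to at least 1:

# Hypothetical helper illustrating the clamp; in training/loss.py the shrink
# is applied as an integer division by pl_batch_shrink (default 2).
def pl_batch_size(batch: int, pl_batch_shrink: int = 2) -> int:
    return max(batch // pl_batch_shrink, 1)

print(pl_batch_size(1))   # 1 instead of 0, so --batch=1 no longer fails
print(pl_batch_size(32))  # 16, unchanged for normal batch sizes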

Entanglement while generating interpolation

Hello,

I have trained a 256×256 model on a dataset of 1,700 images, using the paper256 configuration with aug=ada, augpipe=blit, and target=0.7. After training, I tried interpolation and several unrelated-looking images appear during the interpolation. Can you please advise what I can do here to improve disentanglement?

Thank you.
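
For reference, interpolation here means blending between two latent codes and synthesizing each intermediate image; a minimal sketch of linear interpolation in W space with a generator trained by this repo (the pickle path is hypothetical):

# Minimal sketch of W-space interpolation; 'network-snapshot.pkl' is a
# hypothetical path to a pickle produced by train.py.
import pickle
import numpy as np
import torch

with open('network-snapshot.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda().eval()

z = torch.from_numpy(np.random.RandomState(0).randn(2, G.z_dim)).cuda()
w = G.mapping(z, None)                          # [2, num_ws, w_dim], no class labels
for t in np.linspace(0.0, 1.0, 8):
    w_t = (1.0 - t) * w[0:1] + t * w[1:2]       # linear blend in W space
    img = G.synthesis(w_t, noise_mode='const')  # [1, 3, H, W], values roughly in [-1, 1]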

TypeError: forward() takes 2 positional arguments but 4 were given

I modified the code like this to get the output of an intermediate layer of the discriminator:

D(x_hat, None)

DataParallel(
  (module): Discriminator(
    (b1024): DiscriminatorBlock(
      (fromrgb): Conv2dLayer()
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b512): DiscriminatorBlock(
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b256): DiscriminatorBlock(
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b128): DiscriminatorBlock(
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b64): DiscriminatorBlock(
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b32): DiscriminatorBlock(
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b16): DiscriminatorBlock(
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b8): DiscriminatorBlock(
      (conv0): Conv2dLayer()
      (conv1): Conv2dLayer()
      (skip): Conv2dLayer()
    )
    (b4): Sequential(
      (0): MinibatchStdLayer()
      (1): Conv2dLayer()
    )
  )
)

However, this error occurs:
TypeError: forward() takes 2 positional arguments but 4 were given

I'd appreciate it if you could tell me the solution.

Thank you.

Specify GPU in training options

Hi, I was wondering if there's any way to specify which GPU to train on. train.py only seems to have an option for the number of GPUs.

I'm working on a shared system, so it would be great if we could selectively train on GPU 0, 1, or 2 only, so training doesn't grab a GPU that someone else is using, and so two models can be trained simultaneously.
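
As a workaround (not a built-in train.py option), the standard CUDA_VISIBLE_DEVICES environment variable can restrict which physical GPUs the process sees, so that --gpus=1 then trains on the single device you exposed. A minimal sketch, setting it before CUDA is initialized (the same idea as the os.environ edit in the multi-GPU issue above):

# Restrict the process to physical GPU 2 only; PyTorch will see it as cuda:0.
# Equivalent to launching with CUDA_VISIBLE_DEVICES=2 on the command line.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '2'  # must be set before CUDA is initialized

import torch
print(torch.cuda.device_count())  # expected: 1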
