moono / stylegan2-tf-2.x

stylegan2, tensorflow 2, keras subclassing

Python 86.91% Dockerfile 0.55% Cuda 12.54%

stylegan2-tf-2.x's Issues

What are the "labels" in the generator and discriminator?

Thank you for this awesome repo! I am struggling to understand what the labels in the generator and discriminator are supposed to be. Are they actually unused, since labels_dim is 0 by default for training (cf. labels in g_logistic_ns_pathreg)?

different resolution pkl file

Hi,

When I try to run inference_from_official_weights with the required checkpoint files (in the official-pretrained folder) for a resolution other than 1024x1024, for example the StyleGAN2 pkl file that corresponds to 256x256 (config d), it doesn't work. Any hint as to why this happens?

when training in 256x256, the output is 256x512 pixels

First, thanks for this great repository. It is very useful for studying the StyleGAN2 architecture!

When training at 256x256 resolution, the output images have a size of 256x512 (h x w). These are in fact two images stacked on top of each other. I can easily 'unstack' this output by reshaping the tensor, but I wonder why it happens. If my batch size is 2, I get 4 outputs. This will become problematic when I increase the resolution and need to generate just a single 512x512 image; I don't want the system to actually generate a 512x1024 one.

The two 'stacked' images are also quite similar.

Memory leak

Hi,

My dataset is only around 25 GB in memory, but after training for a few hours the memory usage is already more than 100 GB and it keeps increasing slowly but steadily. I'm using a custom dataset created with tf.data.Dataset.from_generator. Do you know where the problem could be?

Using 1 GPU (3090), with custom CUDA, a batch size of 4, and no labels. The rest is more or less the original code.

Training details

Hi,
This is exactly what I was looking for. Thank you.
But, it is not clear to me how the training needs to be done. Could you please help me with that?

NHWC Support

Hi, thanks for the great repo. I am trying to convert the generator model to tflite. I get this error: "Unexpected value for attribute 'data_format'. Expected 'NHWC'." Do you have an option for NHWC? Or do you have any other idea for how to convert the generator to a tflite file? Thanks.

Training crash randomly

A few thousand steps after training starts, I get the following error:

ValueError: in user code:

    /content/drive/My Drive/stylegan2-master/train.py:198 dist_d_train_step  *
        per_replica_losses = strategy.run(fn=self.d_train_step, args=(inputs,))
    /content/drive/My Drive/stylegan2-master/train.py:129 d_train_step  *
        d_loss = d_logistic(real_images, self.generator, self.discriminator, self.g_params['z_dim'])
    /content/drive/My Drive/stylegan2-master/losses.py:12 d_logistic  *
        real_scores = discriminator([real_images, labels], training=True)
    /content/drive/My Drive/stylegan2-master/stylegan2_ref/discriminator.py:129 call  *
        x = self.last_block(x)
    /content/drive/My Drive/stylegan2-master/stylegan2_ref/discriminator.py:85 call  *
        x = self.minibatch_std(x)
    /content/drive/My Drive/stylegan2-master/stylegan2_ref/custom_layers.py:158 call  *
        y = tf.reshape(x, [group_size, -1, self.num_new_features, s[1] // self.num_new_features, s[2], s[3]])
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 
    [...]  
    raise ValueError(str(e))

ValueError: Dimension size must be evenly divisible by 32768 but is 49152 for '{{node discriminator/4x4/minibatchstd/Reshape}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32](discriminator/8x8/mul, discriminator/4x4/minibatchstd/Reshape/shape)' with input shapes: [6,512,4,4], [6] and with input tensors computed as partial shapes: input[1] = [4,?,1,512,4,4].

The model is learning and I can generate images, but a restart is needed every 10 minutes.
Any idea how to fix it?
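
The numbers in the error message can be reproduced with plain arithmetic. The sketch below assumes the reshape target shown in the traceback (`[group_size, -1, num_new_features, C // num_new_features, H, W]`) with a group size of 4 and a batch of 6, as the reported shapes suggest. The helper names are hypothetical and not part of the repository. One possible source of a batch of 6, not confirmed by the report, is a final partial batch from the dataset, in which case `drop_remainder=True` in `tf.data.Dataset.batch` may be worth trying.

```python
def reshape_is_valid(batch, channels, height, width, group_size, num_new_features=1):
    """Check whether a [batch, C, H, W] tensor can be reshaped to
    [group_size, -1, num_new_features, C // num_new_features, H, W]."""
    total = batch * channels * height * width
    # Product of every known dimension in the target shape (all but the -1).
    known = group_size * num_new_features * (channels // num_new_features) * height * width
    return total % known == 0, total, known


def compatible_group_size(batch, max_group_size=4):
    """Largest group size <= max_group_size that divides the batch evenly
    (hypothetical workaround, not the repository's behaviour)."""
    for g in range(min(max_group_size, batch), 0, -1):
        if batch % g == 0:
            return g
    return 1


ok, total, known = reshape_is_valid(6, 512, 4, 4, group_size=4)
print(ok, total, known)          # False 49152 32768
print(compatible_group_size(6))  # 3
print(compatible_group_size(8))  # 4
```

The values 49152 and 32768 match the ones in the `ValueError`, which is consistent with the batch size not being a multiple of the minibatch-std group size.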

Error

Your ResizeConv2D does not work when upsampling, so I can't use this module.

Need help understanding the role of the 'upfirdn' after the transpose_conv

Hello, I don't really get the point of the upfirdn, apart from reducing the height and width of x by one.
How is the resample kernel chosen ([1,3,3,1] in the code)?
Why is the filter 4x4? The SciPy documentation for upfirdn says the filter should be 1-D: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.upfirdn.html
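
On the 4x4 question: in the reference StyleGAN2 implementation, a 1-D kernel such as [1,3,3,1] is expanded into a 2-D kernel by taking its outer product with itself and normalizing it to sum to 1. The sketch below shows only that expansion, assuming this repository follows the same convention. `make_2d_kernel` is a hypothetical name.

```python
def make_2d_kernel(k1d):
    """Expand a 1-D FIR kernel into a normalized 2-D kernel via outer product."""
    outer = [[a * b for b in k1d] for a in k1d]
    total = sum(sum(row) for row in outer)
    # Normalize so the filter preserves the overall signal level.
    return [[v / total for v in row] for row in outer]


kernel = make_2d_kernel([1, 3, 3, 1])
for row in kernel:
    print(row)
```

The result is a separable 4x4 low-pass (binomial) filter, so applying it in 2-D is equivalent to applying the 1-D filter along each axis in turn.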

checkpoint of FFHQ 256*256?

Hi, thanks for your wonderful code! I wonder if you could share your trained FFHQ 256*256 checkpoint?

10 errors detected in the compilation of upfirdn_2d.cu

Thank you very much for your great work and contribution!

I am trying to get the code running with CUDA 11.6, TensorFlow 2.8 and cuDNN 8303.

When calling the discriminator through train.py I get several NVCC errors in upfirdn_2d.cu.

Tensorflow version: 2.8.0
2022-11-17 09:21:46.087225: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-17 09:21:46.430941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13626 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
1 Physical GPUs, 1 Logical GPUs
2022-11-17 09:21:56.034905: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-11-17 09:21:56.258153: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8303

Setting up TensorFlow plugin "upfirdn_2d.cu": PreprocessingC:... 2022-11-17 09:22:04.023226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 13626 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
CompilingC:... Failed!
Traceback (most recent call last):
File "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "c:\Users\XX.vscode\extensions\ms-python.python-2022.18.2\pythonFiles\lib\python\debugpy_main.py", line 39, in
cli.main()
File "c:\Users\XX.vscode\extensions\ms-python.python-2022.18.2\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
run()
File "c:\Users\XX.vscode\extensions\ms-python.python-2022.18.2\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 284, in run_file
runpy.run_path(target, run_name="main")
File "c:\Users\XX.vscode\extensions\ms-python.python-2022.18.2\pythonFiles\lib\python\debugpy_vendored\pydevd_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "c:\Users\XX.vscode\extensions\ms-python.python-2022.18.2\pythonFiles\lib\python\debugpy_vendored\pydevd_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "c:\Users\XX.vscode\extensions\ms-python.python-2022.18.2\pythonFiles\lib\python\debugpy_vendored\pydevd_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "C:...\train.py", line 438, in
main()
File "C:...\train.py", line 432, in main
trainer = Trainer(training_parameters, name=f'stylegan2-ffhq-{args["train_res"]}x{args["train_res"]}')
File "C:...\train.py", line 63, in init
self.discriminator, self.generator, self.g_clone = initiate_models(self.g_params,
File "C:...\train.py", line 16, in initiate_models
generator = load_generator(g_params=g_params, is_g_clone=False, ckpt_dir=None, custom_cuda=use_custom_cuda)
File "C:...\load_models.py", line 25, in load_generator
_ = generator([test_latent, test_labels])
File "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:...\stylegan2\generator.py", line 104, in call
image_out = self.synthesis(w_broadcasted)
File "C:...\stylegan2\layers\synthesis_block.py", line 144, in call
x = block([x, w0, w1])
File "C:...\stylegan2\layers\synthesis_block.py", line 83, in call
x = self.conv_0([x, w0])
File "C:...\stylegan2\layers\modulated_conv2d.py", line 71, in call
x = upsample_conv_2d(x, self.in_res, w, self.kernel, self.kernel, self.pad0, self.pad1, self.k)
File "C:...\stylegan2\layers\cuda\upfirdn_2d_v2.py", line 88, in upsample_conv_2d
return _simple_upfirdn_2d(x, new_x_res, k, pad0=pad0, pad1=pad1)
File "C:...\stylegan2\layers\cuda\upfirdn_2d_v2.py", line 106, in _simple_upfirdn_2d
y = upfirdn_2d_cuda(y, k, upx=up, upy=up, downx=down, downy=down, padx0=pad0, padx1=pad1, pady0=pad0, pady1=pad1)
File "C:...\stylegan2\layers\cuda\upfirdn_2d_v2.py", line 146, in upfirdn_2d_cuda
return func(x)
File "C:...\stylegan2\layers\cuda\upfirdn_2d_v2.py", line 138, in func
y = _get_plugin().up_fir_dn2d(x=x, k=kc, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)
File "C:...\stylegan2\layers\cuda\upfirdn_2d_v2.py", line 10, in _get_plugin
return custom_ops.get_plugin(os.path.join(loc, cu_fn))
File "C:...\stylegan2\layers\cuda\custom_ops.py", line 148, in get_plugin
_run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
File "C:...\stylegan2\layers\cuda\custom_ops.py", line 62, in _run_cmd
raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
RuntimeError: Exception encountered when calling layer "conv_0" (type ModulatedConv2D).

NVCC returned an error. See below for full command line and output log:

nvcc --std=c++11 -DNDEBUG "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python_pywrap_tensorflow_internal.lib" --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include" --include-path "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include\external\protobuf_archive\src" --include-path "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include\external\com_google_absl" --include-path "C:\Users\XX\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include\external\eigen_archive" --compiler-bindir "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.31.31103/bin/Hostx64/x64" 2>&1 "C:...\stylegan2\layers\cuda\upfirdn_2d.cu" --shared -o "C:\Users\XX\AppData\Local\Temp\tmp7cgcj6sd\upfirdn_2d_tmp.dll" --keep --keep-dir "C:\Users\XX\AppData\Local\Temp\tmp7cgcj6sd"

C:...\stylegan2\layers\cuda\upfirdn_2d.cu(310): error: expected an expression
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(310): error: no instance of constructor "tensorflow::register_op::OpDefBuilderWrapper::OpDefBuilderWrapper" matches the argument list
argument types are: (const char [10], __nv_bool)
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(323): error: expected an expression
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(323): error: expected an expression
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(323): error: expected a type specifier
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(323): error: expected an expression
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(324): error: expected an expression
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(324): error: expected an expression
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(324): error: expected a type specifier
C:...\stylegan2\layers\cuda\upfirdn_2d.cu(324): error: expected an expression

10 errors detected in the compilation of "w:/Entwicklung/300_Neural_Network/331_StyleGAN_Keras/stylegan2/layers/cuda/upfirdn_2d.cu".
nvcc warning : The -std=c++11 flag is not supported with the configured host compiler. Flag will be ignored.
_pywrap_tensorflow_internal.lib
upfirdn_2d.cu

Call arguments received:
• inputs=['tf.Tensor(shape=(1, 512, 4, 4), dtype=float32)', 'tf.Tensor(shape=(1, 512), dtype=float32)']
• training=None
• mask=None

Do you have any idea how to fix it?

Please update readme

Hello,

This is a very useful repo. Could you please update the README so one can figure out how to train on custom datasets?

Thank you,
Siavash

Training on custom dataset failed

Hello, I was trying to run the code with the following command:

python train.py --tfrecord_dir=../datasets/butterfly-dataset --train_res=1024

and get the following error message:

2020-09-30 19:31:44.655206: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at conv_grad_input_ops.cc:1103 : Resource exhausted: OOM when allocating tensor with shape[4,64,511,511] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "train.py", line 438, in
main()
File "train.py", line 433, in main
trainer.train(dist_dataset, strategy)
File "train.py", line 261, in train
d_loss = dist_d_train_step((real_images, ))
File "/media/chembiodep/Storage/GREAT/penv38/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in call
result = self._call(*args, **kwds)
File "/media/chembiodep/Storage/GREAT/penv38/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/media/chembiodep/Storage/GREAT/penv38/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2829, in call
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/media/chembiodep/Storage/GREAT/penv38/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
return self._call_flat(
File "/media/chembiodep/Storage/GREAT/penv38/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/media/chembiodep/Storage/GREAT/penv38/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
outputs = execute.execute(
File "/media/chembiodep/Storage/GREAT/penv38/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: 3 root error(s) found.
(0) Internal: cudaErrorNoKernelImageForDevice
[[node replica_1/discriminator/1024x1024/skip/UpFirDn2D (defined at :98) ]]
[[mul/_222]]
(1) Internal: cudaErrorNoKernelImageForDevice
[[node replica_1/discriminator/1024x1024/skip/UpFirDn2D (defined at :98) ]]
(2) Internal: cudaErrorNoKernelImageForDevice
[[node replica_1/discriminator/1024x1024/skip/UpFirDn2D (defined at :98) ]]
[[AddN_94/_234]]
0 successful operations.
0 derived errors ignored. [Op:__inference_dist_d_train_step_61417]

Errors may have originated from an input operation.
Input Source operations connected to node replica_1/discriminator/1024x1024/skip/UpFirDn2D:
replica_1/discriminator/1024x1024/skip/Reshape (defined at /media/chembiodep/Storage/GREAT/butterfly/stylegan2-tf-2.x/stylegan2/layers/cuda/upfirdn_2d_v2.py:105)
replica_1/discriminator/1024x1024/skip/Const (defined at /media/chembiodep/Storage/GREAT/butterfly/stylegan2-tf-2.x/stylegan2/layers/cuda/upfirdn_2d_v2.py:129)

Input Source operations connected to node replica_1/discriminator/1024x1024/skip/UpFirDn2D:
replica_1/discriminator/1024x1024/skip/Reshape (defined at /media/chembiodep/Storage/GREAT/butterfly/stylegan2-tf-2.x/stylegan2/layers/cuda/upfirdn_2d_v2.py:105)
replica_1/discriminator/1024x1024/skip/Const (defined at /media/chembiodep/Storage/GREAT/butterfly/stylegan2-tf-2.x/stylegan2/layers/cuda/upfirdn_2d_v2.py:129)

Function call stack:
dist_d_train_step -> dist_d_train_step -> dist_d_train_step

Here is the file structure:

butterfly/
├── datasets/
│   └── butterfly-dataset
│       └── ...
└── stylegan2-tf-2.x
    ├── train.py
    └── ...

Spec:
Quadro P6000
TITAN V
CUDA version: 11.1
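
A note on the `cudaErrorNoKernelImageForDevice` lines above: this CUDA error means the compiled kernel contains no code for the architecture of the device it was launched on. The two GPUs listed have different compute capabilities (Quadro P6000 is 6.1, TITAN V is 7.0), so a custom op built for only one of them could fail on the other replica. The sketch below builds nvcc `-gencode` flags that cover both. `gencode_flags` is a hypothetical helper, and whether the repository's custom_ops.py can accept such flags is an assumption.

```python
# Compute capabilities of the two GPUs listed in the report.
COMPUTE_CAPABILITY = {
    "Quadro P6000": (6, 1),  # Pascal
    "TITAN V": (7, 0),       # Volta
}


def gencode_flags(gpu_names):
    """Build nvcc -gencode flags covering every listed GPU (hypothetical helper)."""
    archs = sorted({COMPUTE_CAPABILITY[name] for name in gpu_names})
    return " ".join(
        f"-gencode arch=compute_{major}{minor},code=sm_{major}{minor}"
        for major, minor in archs
    )


print(gencode_flags(["Quadro P6000", "TITAN V"]))
# -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70
```

Restricting training to a single GPU type would be another way to test whether the mixed architectures are the cause.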

Cannot convert pkl file to ckpt when the size is 256 or 512.

As the title says.
Thanks for your great work!
Could you tell me how to convert pkl files of other sizes to ckpt?

convert official weights

Dear Moono,

Thank you for sharing this great implementation. I really appreciate it.

If I want to convert the weights from TF1 to TF2, I should download the weights from the StyleGAN2 website (stylegan2-ffhq-config-f.pkl) and place them under the official-pretrained folder. Is this correct?

Then I run inference_from_official_weights.py. It outputs:

DataLossError: Unable to open table file ./official-pretrained/stylegan2-ffhq-config-f.pkl: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

Could you tell me how to solve this issue?

Thank you again for your help.

Best Wishes,

Alex
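
A note on the error above: "not an sstable (bad magic number)" is what TensorFlow's checkpoint reader reports when the file it is given is not a checkpoint table. The official .pkl is a Python pickle, so the message suggests the .pkl itself was passed to a checkpoint restore. The sketch below is a quick way to confirm that a file is a pickle rather than a checkpoint. `looks_like_pickle` is a hypothetical helper, and the demo runs on a throwaway file, not the real weights.

```python
import os
import pickle
import tempfile


def looks_like_pickle(path):
    """Return True if the file starts with the pickle PROTO opcode (protocol >= 2)."""
    with open(path, "rb") as f:
        return f.read(1) == pickle.PROTO  # b'\x80'


# Self-contained demo: write a small pickle to a temporary file and check it.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump({"demo": 1}, tmp)
result = looks_like_pickle(tmp.name)
os.remove(tmp.name)
print(result)  # True
```

How the repository expects the official weights to be converted before inference is not stated in the report.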
