Comments (5)
Could you please give more details?
What does the custom dataset look like? What is the resolution of the images, and how was it created?
What's your config?
How many GPUs?
Windows or Linux?
Maybe you can create a minimal example of the dataset with ~8-16 images that causes the same problem?
"Aborted": is that the entire error message, or is there more to it?
Could you please confirm that, before it fails at the line you mentioned, the dataset is not empty and has the expected size?
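One quick way to check this outside the training code is to count the records in each tfrecords file directly. Below is a minimal sketch, assuming the files use the standard TFRecord framing (8-byte length, 4-byte length CRC, payload, 4-byte payload CRC). `count_tfrecords` is a hypothetical helper name, not part of ALAE, and it does not validate the CRCs.

```python
import struct


def count_tfrecords(path):
    """Count records in a TFRecord file without TensorFlow (CRCs are not checked)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64, little-endian: payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            f.seek(4, 1)  # skip the CRC of the length field
            payload = f.read(length)
            if len(payload) < length:
                raise ValueError("truncated record payload")
            if len(f.read(4)) < 4:  # CRC of the payload
                raise ValueError("truncated record footer")
            count += 1
    return count
```

Calling `count_tfrecords(...)` on each file that the config's `PATH` pattern resolves to, and comparing the result with what `SIZE` and `PART_COUNT` imply, should show whether a file is empty or shorter than expected.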
from alae.
Hi,
The custom dataset consists of grayscale images of size (128, 128). I created tfrecords for the samples.
My config file looks like this:
DATASET:
  PART_COUNT: 1
  SIZE: 12000
  #PATH: /data/datasets/mnist/tfrecords/mnist-r%02d.tfrecords.%03d
  SAMPLES_PATH: no_path
  PATH: /home/Documents/ALAE-master/data/datasets/test/tfrecords/test-r%02d.tfrecords.%03d
  #PATH_TEST: /home/Documents/ALAE-master/data/datasets/test/tfrecords/test-r%02d.test_tfrecords.%03d
  MAX_RESOLUTION_LEVEL: 7
MODEL:
  LATENT_SPACE_SIZE: 256
  LAYER_COUNT: 6
  MAX_CHANNEL_COUNT: 256
  START_CHANNEL_COUNT: 64
  DLATENT_AVG_BETA: 0.995
  MAPPING_LAYERS: 8
  OUTPUT_DIR: results
  CHANNELS: 1
TRAIN:
  BASE_LEARNING_RATE: 0.002
  EPOCHS_PER_LOD: 6
  LEARNING_DECAY_RATE: 0.1
  LEARNING_DECAY_STEPS: []
  TRAIN_EPOCHS: 80
  # 4 8 16 32 64 128 256 512 1024
  LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32]
  LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 32, 32, 16]
  LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 32, 16]
  LOD_2_BATCH_1GPU: [128, 128, 128, 64, 32, 16]
  LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]
I am running this on Linux with 1 GPU.
"Aborted" is the only error message I am getting.
Yes, I confirmed the dataset length before the mentioned line.
Also, the same dataset at resolution 32 works without errors when used with mnist.yaml.
from alae.
Does it fail only later in training, once it reaches resolutions above 32, or does it fail immediately? Could you share the whole log on pastebin?
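If the log really contains nothing beyond "Aborted", Python's standard `faulthandler` module can often get more out of the crash: it writes the Python traceback of every thread when the process receives a fatal signal such as SIGABRT or SIGSEGV. A minimal sketch, assuming you can add a few lines at the top of the training entry point; the helper name and the log file name `crash_traceback.log` are made up for illustration.

```python
import faulthandler


def enable_crash_tracebacks(log_path="crash_traceback.log"):
    """Write Python tracebacks to log_path if the process dies on a fatal signal."""
    log_file = open(log_path, "w")
    # faulthandler keeps using this file object, so it must stay open
    # for as long as the handler is enabled.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_tracebacks()
    # ... start training here ...
```

The same handler can be switched on without code changes by running the script with `python -X faulthandler`, or by setting the `PYTHONFAULTHANDLER` environment variable, in which case the traceback goes to stderr.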
You said that it works fine at resolution 32 with mnist.yaml. Did you use exactly the same dataset files, or did you create them separately? The first 4 tfrecord files (which correspond to 4x4, 8x8, 16x16, and 32x32) should be identical in both cases, so you should be able to run mnist.yaml on the dataset created for resolution 128.
If mnist.yaml works with the 128 dataset, I would then change only MAX_RESOLUTION_LEVEL to 7 and LAYER_COUNT to 6 and see what happens.
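To make the numbers above concrete, here is a rough sketch of how the settings relate to the image size. It assumes that the resolution level is log2 of the image size, that LAYER_COUNT is the level minus 1 (inferred from the 7/6 pairing and the four files for 32x32), and that the two placeholders in the `PATH` pattern are filled with (level, part index) starting from level 2. The function names are hypothetical and not part of ALAE.

```python
import os


def expected_settings(image_size, path_pattern, part_count=1):
    """Return (max_resolution_level, layer_count, expected tfrecord paths)."""
    if image_size < 4 or image_size & (image_size - 1):
        raise ValueError("image_size must be a power of two and at least 4")
    level = image_size.bit_length() - 1  # 128 -> 7, 32 -> 5
    layer_count = level - 1              # one layer per resolution from 4x4 up
    files = [
        path_pattern % (lvl, part)       # assumed order: (level, part index)
        for lvl in range(2, level + 1)
        for part in range(part_count)
    ]
    return level, layer_count, files


def missing_files(files):
    """List the expected files that do not exist on disk."""
    return [f for f in files if not os.path.exists(f)]
```

For a 128x128 dataset with a pattern ending in `test-r%02d.tfrecords.%03d` and one part, this gives level 7, 6 layers, and six files from `test-r02.tfrecords.000` to `test-r07.tfrecords.000`; passing that list to `missing_files` shows whether any of them is absent.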
from alae.
I had entered the folds size incorrectly in my case. Once I corrected it, everything worked fine.
from alae.
> I had entered the folds size incorrectly in my case. Once I corrected it, everything worked fine.
What does that mean? I am stuck on the same "Aborted" issue while running on CelebA at 128 :(
Any advice?
from alae.