
Comments (5)

podgorskiy commented on July 3, 2024

Could you please give more details?
What does the custom dataset look like? What is the resolution of the images, and how was it created?
What's your config?
How many GPUs?
Windows or Linux?
Could you create a minimal example of the dataset, with ~8-16 images, that causes the same problem?

Is "Aborted" the only error reported, or is there more to it?

Could you please confirm that, before it fails at the line you mentioned, the dataset is not empty and has the expected size?

from alae.

amruz commented on July 3, 2024

Hi,
The custom dataset consists of grayscale images of size 128x128. I created tfrecords for the samples.
My config file looks like this:
```yaml
DATASET:
  PART_COUNT: 1
  SIZE: 12000
  #PATH: /data/datasets/mnist/tfrecords/mnist-r%02d.tfrecords.%03d
  SAMPLES_PATH: no_path
  PATH: /home/Documents/ALAE-master/data/datasets/test/tfrecords/test-r%02d.tfrecords.%03d
  #PATH_TEST: /home/Documents/ALAE-master/data/datasets/test/tfrecords/test-r%02d.test_tfrecords.%03d
  MAX_RESOLUTION_LEVEL: 7

MODEL:
  LATENT_SPACE_SIZE: 256
  LAYER_COUNT: 6
  MAX_CHANNEL_COUNT: 256
  START_CHANNEL_COUNT: 64
  DLATENT_AVG_BETA: 0.995
  MAPPING_LAYERS: 8
  CHANNELS: 1
OUTPUT_DIR: results
TRAIN:
  BASE_LEARNING_RATE: 0.002
  EPOCHS_PER_LOD: 6
  LEARNING_DECAY_RATE: 0.1
  LEARNING_DECAY_STEPS: []
  TRAIN_EPOCHS: 80

  # 4 8 16 32 64 128 256 512 1024

  LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32]
  LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 32, 32, 16]
  LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 32, 16]
  LOD_2_BATCH_1GPU: [128, 128, 128, 64, 32, 16]

  LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]
```

I am running this on Linux with 1 GPU.
"Aborted" is the only error message I get.
Yes, I confirmed the dataset length before the mentioned line.
Also, the same dataset at resolution 32 works without errors when used with mnist.yaml.
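One way to rule out a config mismatch before a long run is to check that the resolution-related values above agree with each other. The sketch below is a hypothetical helper (`check_config` is not part of ALAE) that takes the config as a plain Python dict. The rules it checks (2**MAX_RESOLUTION_LEVEL equals the image side, LAYER_COUNT equals MAX_RESOLUTION_LEVEL - 1, and at least one batch size and one learning rate per level of detail) are assumptions inferred from this thread and the config layout, not ALAE's own validation.

```python
# Hypothetical consistency check for an ALAE-style config held as a plain dict.
# The rules below are assumptions inferred from this thread, not ALAE's own checks.


def check_config(cfg, image_size, gpu_count=1):
    """Return a list of problems; an empty list means no mismatch was found."""
    problems = []
    level = cfg["DATASET"]["MAX_RESOLUTION_LEVEL"]
    layers = cfg["MODEL"]["LAYER_COUNT"]

    # Assumed: resolution level r means 2**r x 2**r images (level 7 -> 128x128).
    if 2 ** level != image_size:
        problems.append(
            f"MAX_RESOLUTION_LEVEL={level} means {2 ** level}px, images are {image_size}px"
        )

    # Assumed: training starts at 4x4 (level 2), one layer per level up to `level`.
    if layers != level - 1:
        problems.append(f"LAYER_COUNT={layers}, expected {level - 1}")

    # Assumed: each level of detail (LOD) needs its own batch size and learning rate.
    batch_key = f"LOD_2_BATCH_{gpu_count}GPU"
    for key in (batch_key, "LEARNING_RATES"):
        values = cfg["TRAIN"].get(key, [])
        if len(values) < layers:
            problems.append(f"{key} has {len(values)} entries, needs at least {layers}")

    return problems


# Relevant values copied from the config posted above (1 GPU, 128x128 images).
cfg = {
    "DATASET": {"MAX_RESOLUTION_LEVEL": 7},
    "MODEL": {"LAYER_COUNT": 6},
    "TRAIN": {
        "LOD_2_BATCH_1GPU": [128, 128, 128, 64, 32, 16],
        "LEARNING_RATES": [0.0015] * 6 + [0.002, 0.003, 0.003],
    },
}

print(check_config(cfg, image_size=128))  # -> []
```

With the posted values, all of these checks pass.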

podgorskiy commented on July 3, 2024

Does it fail only later in training, when it reaches resolutions above 32, or does it fail immediately? Could you share the whole log on pastebin?
You said that it works fine at resolution 32 with mnist.yaml. Do you use exactly the same files as the dataset, or did you create them separately? The first 4 tfrecord files (which correspond to 4x4, 8x8, 16x16 and 32x32) should be the same, so you should be able to run mnist.yaml on the dataset created for resolution 128.
If mnist.yaml works fine with the 128 dataset, I would then just change MAX_RESOLUTION_LEVEL to 7 and LAYER_COUNT to 6 and see what happens.
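The point about the first four tfrecord files can be sketched by expanding the `PATH` template from the config posted earlier. This assumes, as the `r%02d` naming suggests, that the first placeholder is the resolution level (log2 of the image side, starting at 2 for 4x4) and the second is the part index. `tfrecord_files` is a hypothetical helper, not ALAE's data loader.

```python
# Sketch: expand a DATASET.PATH template into the tfrecord files each level reads.
# Assumed: the first placeholder (r%02d) is the resolution level, i.e. log2 of the
# image side starting at 2 (4x4), and the second (%03d) is the part index.


def tfrecord_files(path_template, max_level, part_count):
    """Map each resolution level to its list of tfrecord file names."""
    return {
        level: [path_template % (level, part) for part in range(part_count)]
        for level in range(2, max_level + 1)
    }


PATH = "/home/Documents/ALAE-master/data/datasets/test/tfrecords/test-r%02d.tfrecords.%03d"

files_128 = tfrecord_files(PATH, max_level=7, part_count=1)  # 128x128 config
files_32 = tfrecord_files(PATH, max_level=5, part_count=1)   # 32x32 config

for level, names in files_128.items():
    side = 2 ** level
    print(f"{side}x{side}: {names}")

# The 32x32 config reads only levels 2-5, the first four files of the 128x128 set.
assert all(files_32[level] == files_128[level] for level in files_32)
```

Under these assumptions, a resolution-32 config only touches the r02 to r05 files, so it can run on a dataset prepared for 128x128 as long as its PATH points there.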

amruz commented on July 3, 2024

I had entered the folds size incorrectly. Once I corrected it, everything worked fine.

Harsha-Musunuri commented on July 3, 2024

> I have entered the folds size wrong in my case. When I corrected it, everything works fine

What does that mean? I am stuck at the same "Aborted" issue while running on CelebA 128 :(
Any advice?