Comments (5)
Because ActNorm uses the statistics of an individual batch to initialize its parameters, it scrambles model training in the DataParallel scenario (much like batch norm). If you forward one batch on a single GPU first (no backward pass needed), ActNorm is initialized properly and you can then use DataParallel to train your model. I found this enables multi-GPU training; without it the model is not trainable.
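The initialization behavior described above can be sketched as follows. This is a minimal illustrative ActNorm, not the glow-pytorch implementation: on the first forward pass it sets its shift and scale from the batch statistics, which is exactly the step that goes wrong when DataParallel splits the batch across replicas.

```python
import torch
from torch import nn

class ActNorm(nn.Module):
    """Minimal sketch of an ActNorm layer with data-dependent init."""

    def __init__(self, num_channels):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        # Flag recording whether data-dependent init has run.
        self.register_buffer("initialized", torch.tensor(False))

    def forward(self, x):
        if not self.initialized:
            # First batch seen: set loc/scale so the output of this
            # batch has zero mean and unit variance per channel.
            with torch.no_grad():
                mean = x.mean(dim=(0, 2, 3), keepdim=True)
                std = x.std(dim=(0, 2, 3), keepdim=True)
                self.loc.copy_(-mean)
                self.scale.copy_(1.0 / (std + 1e-6))
                self.initialized.fill_(True)
        return (x + self.loc) * self.scale
```

Under DataParallel, each replica would run this first-forward branch on its own shard of the batch, so the replicas end up with different statistics — which is why initializing once on one GPU first avoids the problem.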
from glow-pytorch.
Aha! I thought it had something to do with ActNorm, but your explanation made it very clear. Do you know the best way to forward a batch on one GPU while avoiding doing so on the others?
You can check f8805e7; this is my workaround. If you can forward one batch on a single GPU, it will work. I think you can use it even with torch.no_grad, so this may not be a problem.
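The workaround pattern can be sketched as a small helper: run one forward pass on a single device (under torch.no_grad, since no backward is needed for initialization) and only then wrap the model in DataParallel. The helper name and model are placeholders, not part of glow-pytorch.

```python
import torch
from torch import nn

def init_then_parallel(model, init_batch, device_ids=None):
    """Hypothetical helper: forward one batch on a single device so
    data-dependent layers (e.g. ActNorm) initialize, then wrap the
    model in DataParallel for multi-GPU training."""
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    with torch.no_grad():  # no backward pass needed for initialization
        model(init_batch.to(device))
    # DataParallel re-broadcasts the (now initialized) master weights
    # to every replica on each forward call.
    return nn.DataParallel(model, device_ids=device_ids)
```

A usage example: `wrapped = init_then_parallel(model, next(iter(loader))[0])`, after which `wrapped` can be trained as usual.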
Thank you for the change, but it didn't quite work for me. My understanding is that the problem has to do with the two GPUs having different weights after initialization, and I don't think calling forward on individual GPUs would synchronize their weights.
I have a similar problem running the code: running on a single GPU works fine, but logdet takes different values in the multi-GPU case.
Related Issues (20)
- Something wrong with affine coupling? HOT 4
- Question: Does this work with larger resolutions? HOT 2
- coupling.py is error:The size of tensor a (2) must match the size of tensor b (62) at non-singleton dimension 1
- Conditional Gaussian prior parameters produce unnormalized likelihoods HOT 2
- Change image_size to 256 HOT 7
- File Checkpoint HOT 1
- a question about dataset
- Loss value HOT 2
- a question about the sigmoid function in the affine coupling layer HOT 2
- loss NAN HOT 1
- Maybe something wrong with affine paramter in argparse? HOT 1
- Act Norm Output issue HOT 1
- z_list HOT 9
- Why my sample pictures are black? HOT 2
- If you are interested in image generation or Glow, or need help, you can contact me
- any pretrained models?
- too smalll value of logP
- Flow not perfectly invertible HOT 4
- why with torch.no_grad() when i == 0: HOT 1
- what's the difference between the " reconstruct=True" and " reconstruct=False"