Comments (5)
Because ActNorm uses the statistics of an individual batch to initialize its parameters, it scrambles model training in the DataParallel scenario (much like batch norm). If you forward one batch on a single GPU first (no backward pass needed), ActNorm is initialized properly and you can then use DataParallel to train your model. I found this enables multi-GPU training; without it the model is not trainable.
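The initialization behavior described above can be sketched as follows. This is a minimal illustrative ActNorm, not the glow-pytorch implementation: on the first forward pass it sets its shift and scale from the batch statistics, which is exactly the step that goes wrong when DataParallel splits the batch across replicas.

```python
import torch
from torch import nn

class ActNorm(nn.Module):
    """Minimal sketch of an ActNorm layer with data-dependent init."""

    def __init__(self, num_channels):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        # Flag recording whether data-dependent init has run.
        self.register_buffer("initialized", torch.tensor(False))

    def forward(self, x):
        if not self.initialized:
            # First batch seen: set loc/scale so the output of this
            # batch has zero mean and unit variance per channel.
            with torch.no_grad():
                mean = x.mean(dim=(0, 2, 3), keepdim=True)
                std = x.std(dim=(0, 2, 3), keepdim=True)
                self.loc.copy_(-mean)
                self.scale.copy_(1.0 / (std + 1e-6))
                self.initialized.fill_(True)
        return (x + self.loc) * self.scale
```

Under DataParallel, each replica would run this first-forward branch on its own shard of the batch, so the replicas end up with different statistics — which is why initializing once on one GPU first avoids the problem.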
from glow-pytorch.
Aha! I thought it had something to do with ActNorm, but your explanation made it very clear. Do you know the best way to forward a batch on one GPU while avoiding doing so on the others?
You can check f8805e7; this is my workaround. If you can forward one batch on a single GPU, it will work. I think you can use it even with torch.no_grad, so this may not be a problem.
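The workaround pattern can be sketched as a small helper: run one forward pass on a single device (under torch.no_grad, since no backward is needed for initialization) and only then wrap the model in DataParallel. The helper name and model are placeholders, not part of glow-pytorch.

```python
import torch
from torch import nn

def init_then_parallel(model, init_batch, device_ids=None):
    """Hypothetical helper: forward one batch on a single device so
    data-dependent layers (e.g. ActNorm) initialize, then wrap the
    model in DataParallel for multi-GPU training."""
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    with torch.no_grad():  # no backward pass needed for initialization
        model(init_batch.to(device))
    # DataParallel re-broadcasts the (now initialized) master weights
    # to every replica on each forward call.
    return nn.DataParallel(model, device_ids=device_ids)
```

A usage example: `wrapped = init_then_parallel(model, next(iter(loader))[0])`, after which `wrapped` can be trained as usual.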
Thank you for the change, but it didn't quite work for me. My understanding is that the problem has to do with the two GPUs having different weights after initialization, and I don't think calling forward on individual GPUs would synchronize their weights.
I have a similar problem running the code: running on a single GPU works fine, but logdet takes different values in the multi-GPU case.
Related Issues (20)
- Something wrong with affine coupling? HOT 4
- Question: Does this work with larger resolutions? HOT 2
- coupling.py is error:The size of tensor a (2) must match the size of tensor b (62) at non-singleton dimension 1
- Conditional Gaussian prior parameters produce unnormalized likelihoods HOT 2
- Change image_size to 256 HOT 7
- File Checkpoint HOT 1
- a question about dataset
- Loss value HOT 2
- a question about the sigmoid function in the affine coupling layer HOT 2
- loss NAN HOT 1
- Maybe something wrong with affine paramter in argparse? HOT 1
- Act Norm Output issue HOT 1
- z_list HOT 9
- Why my sample pictures are black? HOT 2
- If you are interested in image generation or Glow, or need help, you can contact me
- any pretrained models?
- too smalll value of logP
- Flow not perfectly invertible HOT 4
- why with torch.no_grad() when i == 0: HOT 1
- what's the difference between the " reconstruct=True" and " reconstruct=False"