Comments (5)

rosinality commented on August 10, 2024

ActNorm computes statistics from the batch it sees and uses them to initialize its parameters, so in a DataParallel setup, where each GPU gets its own slice of the batch, this scrambles model training (much like batch norm). If you run one forward pass on a single GPU first (no backward pass needed), ActNorm is initialized properly and you can then use DataParallel to train your model. I found that this enables multi-GPU training; without it the model is not trainable.
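
A minimal sketch of the idea, assuming a simplified ActNorm layer rather than the exact one in glow-pytorch: the layer sets `loc` and `scale` from the first batch it receives, and a hypothetical `warm_up` helper runs one forward pass on a single device before the model is wrapped in `nn.DataParallel`. The class layout, the `initialized` buffer and the `1e-6` epsilon are illustrative choices.

```python
import torch
from torch import nn


class ActNorm(nn.Module):
    """Per-channel affine layer whose parameters are set from the first batch."""

    def __init__(self, num_channels):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        # A buffer, so the "already initialized" flag is stored in state_dict.
        self.register_buffer("initialized", torch.tensor(0, dtype=torch.uint8))

    @torch.no_grad()
    def initialize(self, x):
        # One mean/std per channel, taken over batch and spatial dimensions.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        std = x.std(dim=(0, 2, 3), keepdim=True)
        self.loc.copy_(-mean)
        self.scale.copy_(1.0 / (std + 1e-6))
        self.initialized.fill_(1)

    def forward(self, x):
        if self.initialized.item() == 0:
            self.initialize(x)
        _, _, height, width = x.shape
        logdet = height * width * torch.log(torch.abs(self.scale)).sum()
        return self.scale * (x + self.loc), logdet


def warm_up(model, batch, device):
    """Hypothetical helper: one forward pass on a single device, no backward."""
    model.to(device)
    with torch.no_grad():
        model(batch.to(device))
    return model


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
layer = warm_up(ActNorm(3), torch.randn(16, 3, 8, 8) * 5 + 2, device)

# Wrap for multi-GPU training only after the data-dependent init has run.
model = nn.DataParallel(layer) if torch.cuda.device_count() > 1 else layer
```

Because the warm-up pass happens before wrapping, the parameters that DataParallel later copies to each GPU already hold the initialized values.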

from glow-pytorch.

tangbinh commented on August 10, 2024

Aha! I thought it had something to do with ActNorm, but your explanation made it very clear. Do you know the best way to forward a batch on one GPU while avoiding doing so on the others?

rosinality commented on August 10, 2024

You can check f8805e7, which is my workaround. If you can forward one batch on one GPU, this will work. I think you can even do this under torch.no_grad, so maybe this is not a problem.

tangbinh commented on August 10, 2024

Thank you for the change, but it didn't quite work for me. My understanding is that the problem has something to do with the two GPUs having different weights after initialization. I don't think calling forward on individual GPUs would synchronize their weights.
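
To illustrate the concern, here is a small CPU-only sketch that assumes ActNorm-style statistics (per-channel mean and standard deviation) and uses two halves of one batch to stand in for the slices two GPUs would receive. The function name `actnorm_init_stats` is hypothetical, and this does not reproduce DataParallel's replication logic; it only shows that initializing from different slices gives different parameters, and therefore different log-determinant terms.

```python
import torch


def actnorm_init_stats(x):
    """Return the (loc, scale) an ActNorm-style layer would derive from x."""
    mean = x.mean(dim=(0, 2, 3))
    std = x.std(dim=(0, 2, 3))
    return -mean, 1.0 / (std + 1e-6)


torch.manual_seed(0)
batch = torch.randn(8, 3, 4, 4) * 3 + 1

# DataParallel scatters along dim 0; two halves stand in for two GPUs here.
shard_a, shard_b = batch.chunk(2, dim=0)
loc_a, scale_a = actnorm_init_stats(shard_a)
loc_b, scale_b = actnorm_init_stats(shard_b)

# The log-determinant of this kind of layer depends only on scale and the
# spatial size, so different scales give different values.
height, width = batch.shape[2:]
logdet_a = height * width * torch.log(scale_a.abs()).sum()
logdet_b = height * width * torch.log(scale_b.abs()).sum()

print("loc differs:   ", not torch.allclose(loc_a, loc_b))
print("scale differs: ", not torch.allclose(scale_a, scale_b))
print("logdet a vs b: ", logdet_a.item(), logdet_b.item())
```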

eugenelet commented on August 10, 2024

I have a similar problem running the code. Running on a single GPU works fine, but logdet takes different values in the multi-GPU case.

