
Comments (5)

zhangweifeng1218 avatar zhangweifeng1218 commented on May 23, 2024 2

Thanks, I have found the reason. The implementation of weight_norm in PyTorch 0.4.0 is a little different. When dim is set to None, weight_norm in 0.4.0 outputs a 0-dim weight_g, which cannot be broadcast to multiple GPUs. Your code works well in PyTorch 0.3.1, whose weight_norm outputs a 1-dim weight_g when dim is None.
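The version difference can be checked directly (a minimal sketch; the layer shape is arbitrary and not taken from the repository):

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

# With dim=None, weight_norm normalizes over the whole weight tensor.
# In PyTorch >= 0.4.0 the resulting weight_g is a scalar (0-dim) tensor,
# which DataParallel's Broadcast cannot slice; 0.3.1 produced a 1-dim tensor.
layer = weight_norm(nn.Linear(16, 8), dim=None)
print(layer.weight_g.dim())  # 0 on PyTorch >= 0.4.0
```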

from ban-vqa.

jnhwkim avatar jnhwkim commented on May 23, 2024

Yes, you're right. Can you send me a pull request for it?
Note that if the number of glimpses is fewer than 32, it is not affected, though.

from ban-vqa.

zhangweifeng1218 avatar zhangweifeng1218 commented on May 23, 2024

Thanks for your reply.
I have downloaded your code and the required data, and ran 'python3 main.py --use_both True --use_vg True' on my machine, which has 4 Tesla V100 GPUs and PyTorch 0.4.0 installed.
But I got the following runtime error:

Traceback (most recent call last):
  File "main.py", line 99, in <module>
    train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
  File "/home1/yul/zwf/ban-vqa-master/train.py", line 72, in train
    pred, att = model(v, b, q, a)
  File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 113, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 118, in replicate
    return replicate(module, device_ids)
  File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast.apply(devices, *params)
RuntimeError: slice() cannot be applied to a 0-dim tensor.

It seems that something goes wrong when torch copies the model onto the 4 GPUs. But there is no such error when I train other networks on multiple GPUs using nn.DataParallel. It is really confusing and I have not found the reason yet....
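One possible workaround under PyTorch 0.4.0 (a hypothetical sketch, not the repository's official fix) is to promote the scalar weight_g to a 1-dim parameter after applying weight_norm, so that Broadcast can slice it during replication; whether the reshaped weight_g behaves identically in the forward pass may depend on the PyTorch version:

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

# Hypothetical workaround: weight_norm(dim=None) in PyTorch 0.4.0 yields a
# 0-dim weight_g, which DataParallel's Broadcast cannot slice. Reshape it
# into a 1-element, 1-dim parameter before wrapping the model.
layer = weight_norm(nn.Linear(16, 8), dim=None)
if layer.weight_g.dim() == 0:
    layer.weight_g = nn.Parameter(layer.weight_g.data.view(1))
print(layer.weight_g.dim())  # now 1-dim, so Broadcast can slice it
```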

from ban-vqa.

jnhwkim avatar jnhwkim commented on May 23, 2024

@zhangweifeng1218 Unfortunately, our code is tested on PyTorch 0.3.1, as the README describes. I recommend you check the migration procedure or related issues. Is the error persistent when you run the code in 0.3.1? Also, I used 4 Titan Xp GPUs when I trained the model.

from ban-vqa.

jnhwkim avatar jnhwkim commented on May 23, 2024

@zhangweifeng1218 Good, thanks for the info.

from ban-vqa.
