Comments (5)
Thanks, I have found the reason. The implement of weight_norm in pytorch 0.4.0 is a little different. When the dim is set to be None, weight_norm in 0.4.0 output a 0-dim weight_g which cannot be broadcast to multiple GPUs. Your code work well in pytorch 0.3.1 whose weight_norm output a 1-dim weight_g when dim is None.
from ban-vqa.
Yes, you're right. Can you send me a pull request for it?
Notice that if the number of glimpses is fewer than 32, it does not affect, though.
from ban-vqa.
Thanks for you reply.
I have downloaded your code, data required and run 'python3 main.py --use_both True --use_vg True' on my machine which has 4 tesla v100 GPUs and pytorch 0.4.0 installed.
But I got the following runtime error:
Traceback (most recent call last):
File "main.py", line 99, in
train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
File "/home1/yul/zwf/ban-vqa-master/train.py", line 72, in train
pred, att = model(v, b, q, a)
File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 113, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 118, in replicate
return replicate(module, device_ids)
File "/home1/yul/.conda/envs/py3.5/lib/python3.5/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
RuntimeError: slice() cannot be applied to a 0-dim tensor.
It seems like that something wrong happened when torch copies the model into the 4 GPUs. But there is no such error when I train other networks distributedly by using nn.DataParallel. It is really confusing and I have not find the reason yet....
from ban-vqa.
@zhangweifeng1218 Unfortunately, our code is tested on PyTorch 0.3.1 as README describes. I recommend you to check the migration procedure or related issues. The error is persistent when you run the code in 0.3.1? And, I also had used 4 TItan XPs when I trained the model.
from ban-vqa.
@zhangweifeng1218 Good, thanks for the info.
from ban-vqa.
Related Issues (20)
- Why the learning rate is different with or without evalLoader ? HOT 3
- What does the 'xhyk,bvk,bqk->bhvq' mean??? HOT 6
- Can not download the image feature HOT 2
- train36_imgid2idx.pkl file HOT 1
- Annotation and Sentence dataset download
- How to get labels for objects? HOT 2
- can not found "Annotation and Sentence files to data/flickr30k/Flickr30kEntities.tar.gz. " HOT 1
- error when using adaptive_detection_features_converter.py HOT 2
- Evaluate.py HOT 1
- test.py HOT 1
- tar cache.pkl.tgz error, when downloading Pickle caches for the pretrained model HOT 1
- Attention Visualization HOT 1
- where to find "val_flickr30k_resnet101_faster_rcnn_genome.tsv.3"? HOT 1
- flickr 30k features download HOT 4
- how to get the files HOT 1
- Download Flickr30k features HOT 3
- Pretrained model for Flickr30k HOT 2
- Error in Flickr30k features HOT 2
- link no longer works HOT 2
- Trouble creating ID.pkls HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ban-vqa.