
ban-vqa's Introduction

Bilinear Attention Networks

โš ๏ธ Regrettably, I cannot perform maintenance due to the loss of the materials. I'm archiving this repository for reference

This repository is the implementation of Bilinear Attention Networks for the visual question answering and Flickr30k Entities tasks.

For the visual question answering task, our single model achieved 70.35 and an ensemble of 15 models achieved 71.84 (Test-standard, VQA 2.0). For the Flickr30k Entities task, our single model achieved 69.88 / 84.39 / 86.40 for Recall@1, 5, and 10, respectively (slightly better than the original paper). For details, please refer to our technical report.

This repository is based on and inspired by @hengyuan-hu's work. We sincerely thank them for sharing their code.

Overview of bilinear attention networks
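
The overview figure referenced above illustrates how bilinear attention relates image regions to question words. As a rough sketch (not the repository's exact code; all shapes and names below are assumptions), the per-glimpse attention logits can be computed with a single torch.einsum, which is what the einsum update below refers to:

import torch

# Hedged illustration of bilinear attention logits via torch.einsum.
# Shapes are assumptions:
#   V: (batch, num_objects, d)  -- projected visual features
#   Q: (batch, num_tokens, d)   -- projected question features
#   p: (glimpses, d)            -- per-glimpse projection vectors
V = torch.randn(8, 36, 512)
Q = torch.randn(8, 14, 512)
p = torch.randn(4, 512)

# logits[b, g, i, j] = sum_k p[g, k] * V[b, i, k] * Q[b, j, k]
logits = torch.einsum('gk,bik,bjk->bgij', p, V, Q)

# normalize over all object-word pairs within each glimpse
att = torch.softmax(logits.flatten(2), dim=-1).view_as(logits)
print(att.shape)  # torch.Size([8, 4, 36, 14])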

Updates

  • Bilinear attention networks using torch.einsum, backward-compatible. (12 Mar 2019)
  • Now compatible with PyTorch v1.0.1. (12 Mar 2019)

Prerequisites

You may need a machine with 4 GPUs, 64GB memory, and PyTorch v1.0.1 for Python 3.

  1. Install PyTorch with CUDA and Python 3.6.
  2. Install h5py.

WARNING: do not use PyTorch v1.0.0 due to a bug that degrades performance.

VQA

Preprocessing

Our implementation uses the pretrained features from bottom-up-attention (the adaptive 10-100 features per image) and the GloVe vectors. For simplicity, the script below helps you avoid the hassle.

All data should be downloaded to a data/ directory in the root directory of this repository.

The easiest way to download the data is to run the provided script tools/download.sh from the repository root. If the script does not work, examine it and adapt the steps outlined in it to your needs. Then run tools/process.sh from the repository root to process the data into the correct format.
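
Once the scripts finish, you can sanity-check the processed data. This is a hedged sketch, not part of the repository; the file names follow paths quoted elsewhere on this page (data/train_imgid2idx.pkl, the 'image_features' HDF5 key), so adjust them if your layout differs.

import pickle
import h5py

# Hedged sanity check after tools/process.sh; file names are assumptions
# based on paths quoted elsewhere on this page.
with open('data/train_imgid2idx.pkl', 'rb') as f:
    train_imgid2idx = pickle.load(f)
print('images indexed for train:', len(train_imgid2idx))

feature_file = 'data/train.hdf5'  # assumption -- the exact name depends on your processing setup
with h5py.File(feature_file, 'r') as hf:
    print('image_features shape:', hf['image_features'].shape)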

For now, you need to manually download the data for the options below (used in our best single model).

We use a part of the Visual Genome dataset for data augmentation. The image metadata and question answers of version 1.2 need to be placed in data/.

We use MS COCO captions to extract semantically connected words for the extended word embeddings, along with the questions of VQA 2.0 and Visual Genome. You can download them here. Since the contribution of these captions is minor, you can skip processing the MS COCO captions by removing the cap elements in the target option in this line.

The counting module (Zhang et al., 2018) is integrated into this repository as counting.py for your convenience. The source repository can be found at @Cyanogenoid's vqa-counting.
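
Below is a hedged usage sketch of the counting module; the argument names and tensor shapes are my assumptions from reading counting.py, so double-check the file for the exact signature.

import torch
from counting import Counter  # counting.py in this repository (from @Cyanogenoid's vqa-counting)

# Hedged sketch -- verify the exact signature in counting.py.
# boxes: (batch, 4, num_objects) with (x1, y1, x2, y2) per object,
# attention: (batch, num_objects) attention weights over the objects.
counter = Counter(objects=10, already_sigmoided=False)  # argument names are assumptions

boxes = torch.rand(2, 4, 10)
attention = torch.rand(2, 10)
count_features = counter(boxes, attention)
print(count_features.shape)  # expected: (2, objects + 1)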

Training

$ python3 main.py --use_both True --use_vg True

Run the command above to start training (the options enable training on the train+val splits and with Visual Genome, respectively). The training and validation scores will be printed every epoch, and the best model will be saved under the directory saved_models. The default hyperparameters should give you the best single-model result, which is around 70.04 for the test-dev split.
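
The training log excerpts quoted in the issues below ("optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25", "gradual warmup lr: ...") suggest Adamax with a gradual learning-rate warmup followed by step decay. Here is a minimal, hedged sketch of such a schedule; it is illustrative only, not the repository's exact implementation.

import torch

# Hedged sketch of the schedule suggested by the training logs quoted below
# (Adamax, gradual warmup to 2x the base lr, step decay later); illustrative only.
model = torch.nn.Linear(10, 10)                        # stand-in for the BAN model
base_lr = 0.0007
optimizer = torch.optim.Adamax(model.parameters(), lr=base_lr)

warmup = [0.5, 1.0, 1.5, 2.0]                          # 0.0003 -> 0.0007 -> 0.0010 -> 0.0014 as in the logs
for epoch in range(13):
    if epoch < len(warmup):
        lr = base_lr * warmup[epoch]                   # gradual warmup
    elif epoch >= 10 and (epoch - 10) % 2 == 0:        # assumed: decay_step=2 applied after epoch 10
        lr = optimizer.param_groups[0]['lr'] * 0.25    # decay_rate=0.25 from the logs
    else:
        lr = optimizer.param_groups[0]['lr']
    for group in optimizer.param_groups:
        group['lr'] = lr

    # dummy step so the sketch runs end to end
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)  # grad_clip=0.25 from the logs
    optimizer.step()
    optimizer.zero_grad()
    print(f'epoch {epoch}, lr {lr:.4f}')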

Validation

If you trained a model with the training split using

$ python3 main.py

then you can run evaluate.py with appropriate options to evaluate its score for the validation split.
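
For example (the --input and --epoch flags below are inferred from the Flickr30k section later on this page, so adjust them to your setup):

$ python3 evaluate.py --input saved_models/ban --epoch 12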

Pretrained model

We provide the pretrained model reported as the best single model in the paper (70.04 for test-dev, 70.35 for test-standard).

Please download the model from the link and move it to saved_models/ban/model_epoch12.pth (you may encounter a redirection page asking you to confirm). The training log can be found here.

$ python3 test.py --label mytest

The resulting JSON file will be written to the results/ directory.

Without Visual Genome augmentation

Without the Visual Genome augmentation, we get 69.50 (average of 8 models, with a standard deviation of 0.096) for the test-dev split. We use the 8-glimpse model with a learning rate starting at 0.001 (please see this change for better results), 13 epochs, and a batch size of 256.

Flickr30k Entities

Preprocessing

You have to manually download the Annotation and Sentence files to data/flickr30k/Flickr30kEntities.tar.gz. Then run the provided scripts tools/download_flickr.sh and tools/process_flickr.sh from the root of this repository, similarly to the VQA case. Note that the image features of Flickr30k were generated using the bottom-up-attention pretrained model.

Training

$ python3 main.py --task flickr --out saved_models/flickr

Run the command above to start training. The --gamma option is not applied here. The default hyperparameters should give you approximately 69.6 for Recall@1 on the test split.

Validation

Please download the model from the link and move it to saved_models/flickr/model_epoch5.pth (you may encounter a redirection page asking you to confirm).

$ python3 evaluate.py --task flickr --input saved_models/flickr --epoch 5

Run the command above to evaluate the scores on the test split.

Troubleshooting

Please check the troubleshooting wiki and the previous issue history.

Citation

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{Kim2018,
author = {Kim, Jin-Hwa and Jun, Jaehyun and Zhang, Byoung-Tak},
booktitle = {Advances in Neural Information Processing Systems 31},
title = {{Bilinear Attention Networks}},
pages = {1571--1581},
year = {2018}
}

License

MIT License

ban-vqa's People

Contributors

jaesuny, jnhwkim


ban-vqa's Issues

How to use the pretrained model

Hello,
This is my first time using a VQA network. How can I use the pretrained model to ask a question about an image and get a response? Thank you.

Attention Visualization

Hi,
Love your work and repository

I just want to know how I can get the attention visualization (like Figures 3 and 4 in the paper).

bug in bc.py

Line 39 in bc.py is:
self.h_net = weight_norm(nn.Linear(h_dim, h_out), dim=None)
Should this be
self.h_net = weight_norm(nn.Linear(h_dim*self.k, h_out), dim=None)?

cannot reproduce the best result of single model

I followed all the instructions and used the default hyperparameters, which should give the best results. However, with the default random seed 1204, I only get 69.84 on the test-dev split, which is 0.2 lower than the reported result. I also notice that the standard deviation reported on the val split is around 0.11.
Can you give me some advice on how to close the gap?
Thanks!

test.py

Hello,

I'd like to use the model. I expected to provide a question as a string and an image path, but in test.py the input is the saved model. Where do I input the question and image to test the model?

how to get the files

I don't have 'data/question_answers.json' or 'image_data.json'. How can I get or generate them?

Out of memory while executing loss.backward()

Hello, thanks for your great code! I have some trouble while running

python3 main.py --use_both True --use_vg True

I have 4 TITAN Xp GPUs with 12.2 GB of memory each, and I set the batch size to 256. Then I get the following error:

nParams= 90618566
optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25
gradual warmup lr: 0.0003
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "main.py", line 97, in
train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
File "/home/Project/ban-vqa/train.py", line 74, in train
loss.backward()
File "/home/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/autograd/init.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

If I set the batch size to 128, it occupies ~12 GB of GPU memory during the early stage and then drops to ~6 GB per GPU. Is there something wrong with my setup?
Thanks!

KeyError: 1 when running test.py

When I run python test.py --label mytest, I get this error:

Traceback (most recent call last):
  File "test.py", line 91, in <module>
    eval_dset = VQAFeatureDataset(args.split, dictionary, adaptive=True)
  File "/home/gwh/Downloads/ban-vqa-master/dataset.py", line 244, in __init__
    self.entries = _load_dataset(dataroot, name, self.img_id2idx, self.label2ans)
  File "/home/gwh/Downloads/ban-vqa-master/dataset.py", line 142, in _load_dataset
    entries.append(_create_entry(img_id2val[img_id], question, None))
KeyError: 1

I find that data/test2015_imgid2idx.pkl is {}; the file was generated with python3 tools/adaptive_detection_features_converter.py.

Can you help me? @jnhwkim Thanks in advance for any suggestions.

Inaccessible questions and annotations

Hello, thanks for your work! The links for the questions and annotations in download.sh are inaccessible to me, so I used the questions and annotations from VQA [https://visualqa.org], such as this one (https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip). However, I got a huge train_loss while running python main.py --use_both True --use_vg True --batch_size 32.
I was wondering if I used the wrong data. If so, could anyone tell me or provide another valid link?

Question about Visual Genome version

Hi @jnhwkim, I have a question about the Visual Genome version.

The README.md says to use Visual Genome version 1.2 [screenshot].

But in dataset.py, the version 1.2 image_data.json does not have a key called id; this key exists in version 1.0 [screenshot]. Here is an example from version 1.2 [screenshot].

So which version should I use?

Thank you in advance!

How to get labels for objects?

Hi, I am very interested in your BAN model on Flickr30k. Do you provide labels for the detected objects together with the bounding boxes and features, as faster-rcnn or bottom-up attention would? Since I am not sure how you prepared your dataset, I'm afraid that if I use pre-trained models to predict labels myself, the dataloader pipeline would have problems. Thanks!

Reproducing error

While trying to reproduce this result, I encountered the following error.
Traceback (most recent call last):
File "main.py", line 96, in
train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
File "/home/tingting/Documents/tingting/ban-vqa/train.py", line 72, in train
pred, att = model(v, b, q, a)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 113, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 118, in replicate
return replicate(module, device_ids)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
RuntimeError: slice() cannot be applied to a 0-dim tensor

After tracing the code, I found that it works well if I delete "nn.DataParallel(model).cuda()".

I use four GTX 1080 Ti GPUs. Have you encountered the same thing before?

flickr30k upperbound

Hello,

I used Bottom-up Attention to get boxes for the Flickr30k data. Unfortunately, I could not reach the upper bound you reported in the paper: I get 0.6507, while you reported 0.8745. Would you mind providing details on how you used the Bottom-up model to induce boxes? My settings are listed below:

model_name: resnet101_faster_rcnn_final.caffemodel
conf_thresh=0.2
min_boxes=10
max_boxes=100

UPDATE:

When I increase the number of boxes I get a better upper bound, but it is still not as good as yours; the setup below gives me an upper bound of 0.8530:

model_name: resnet101_faster_rcnn_final.caffemodel
conf_thresh=0.01
min_boxes=200
max_boxes=200

Training too slow

My machine has three 1080 Ti GPUs, 12 Intel i7 CPU cores, and 65 GB of memory in total. However, the program takes more than 5800 seconds per epoch.
My command is python3 main.py --use_both True --use_vg True --batch_size 128, because batch size 256 runs out of memory.

epoch 1, time: 5844.42
        train_loss: 3.32, norm: 4.2468, score: 51.21
gradual warmup lr: 0.0010
epoch 2, time: 5844.72
        train_loss: 3.05, norm: 2.5201, score: 55.44
gradual warmup lr: 0.0014
epoch 3, time: 5839.73
        train_loss: 2.90, norm: 1.7370, score: 58.02
lr: 0.0014
epoch 4, time: 5835.09
        train_loss: 2.75, norm: 1.3749, score: 60.45
lr: 0.0014
epoch 5, time: 5837.11
        train_loss: 2.64, norm: 1.2232, score: 62.33
lr: 0.0014
epoch 6, time: 5829.90
        train_loss: 2.54, norm: 1.1545, score: 63.88
lr: 0.0014
epoch 7, time: 5832.88
        train_loss: 2.46, norm: 1.1238, score: 65.32
lr: 0.0014
epoch 8, time: 5834.77
        train_loss: 2.39, norm: 1.1157, score: 66.59

Trouble creating ID.pkls

Hello :)

first of all thank you for sharing your repo!

I am having trouble creating these files:
indices_file = {
'train': 'data/train_imgid2idx.pkl',
'val': 'data/val_imgid2idx.pkl',
'test': 'data/test2015_imgid2idx.pkl'}
ids_file = {
'train': 'data/train_ids.pkl',
'val': 'data/val_ids.pkl',
'test': 'data/test2015_ids.pkl'}

because utils.py requires the .jpg images to build the indices, and they are not available at this point. Could you be so kind as to share the id .pkl files?

thank you and best regards
Max

Flickr30K evaluation?

It seems like the Flickr30K grounding task in the report is not included in the repo.
Am I missing something?

Compared models without using Visual Genome

Hi Kim:

Thanks for sharing your great work and elegant codes.

I have questions about your test-dev results. As your README.md indicates, the training includes the data-augmentation trick with Visual Genome. However, the compared models (Counter, Bottom-Up) in your paper did not use Visual Genome for training. That seems like an unfair comparison.

Have you trained the BAN model without Visual Genome? I think it would better verify your model's efficiency.

Ensemble details

Hi, thanks for the library.
Is it possible to share details of your ensemble method?

link no longer works

Dear authors:
The links to the image metadata and question answers no longer work. Could you provide them again?

error when using adaptive_detection_features_converter.py

While running adaptive_detection_features_converter.py on the TSV files, I am getting this error and can't resolve it. Any leads would be helpful. The error occurs when trying to decode the features/boxes from the TSV file.

File "tools/adaptive_detection_features_converter.py", line 156, in extract
bboxes = np.frombuffer(base64.decodestring(item['boxes']), dtype=np.float32).reshape((item['num_boxes'], -1))
File "/home/reddy/myvenv/lib/python3.6/base64.py", line 554, in decodestring
return decodebytes(s)
File "/home/reddy/myvenv/lib/python3.6/base64.py", line 546, in decodebytes
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

Memory error

Hi, I am trying to run your repository, but I keep getting the following error:

Namespace(batch_size=128, epochs=13, gamma=8, input=None, model='ban', num_hid=1280, op='c', output='saved_models/ban', seed=1204, tfidf=True, use_both=False, use_vg=False)
loading dictionary from data/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
  File "main.py", line 50, in <module>
    train_dset = VQAFeatureDataset('train', dictionary, adaptive=True)
  File "/home/michas/Desktop/codes/ban-vqa/dataset.py", line 234, in __init__
    self.features = np.array(hf.get('image_features'))
MemoryError

I suppose this is happening because the whole dataset is loaded into RAM as a numpy array (I have 32 GB). Can you suggest a solution?
Thanks
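
A common workaround for this kind of error (a hedged sketch, not an official fix from the repository) is to keep the HDF5 file open and index it lazily instead of materializing it with np.array(hf.get('image_features')):

import h5py

# Hedged workaround sketch: index the HDF5 dataset lazily so only the
# requested slices are read into memory.
hf = h5py.File('data/train.hdf5', 'r')   # file name is an assumption; use your feature file
features = hf['image_features']          # an h5py dataset; data stays on disk

row = features[0]                        # only this slice is read into memory
print(features.shape, row.shape)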

Evaluating accuracy on test?

When I run python3 test.py --label mytest, I get the warning 'RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().'. The code still completes, but the result evaluated on the VQA challenge is only 1% overall. I used your pretrained model and features.

Do you use the VG dataset to get the results on validation set

Hi Kim:
Thanks for your excellent work and code

Did you use the Visual Genome dataset for training to get the validation-set results listed in Table 1 of your paper? Since you compared with the Bottom-Up and Top-Down results, which used the VG dataset, I assume you also used VG + VQA 2.0 train to get the final validation results. Am I right?

Question

Hello guys,

Very nice piece of work.
I was wondering why you didn't use an einsum implementation of the bilinear attention to speed up training. [equation screenshot]
This equation is perfect for it. You should see a significant gain, and it would be nice for once to have highly optimized code available on GitHub.

Best,
T.C

error from tools/process.sh

I have downloaded everything listed in tools/download.sh.
Could you provide the missing data as well?
Thank you.

Traceback (most recent call last):
File "tools/adaptive_detection_features_converter.py", line 199, in
extract('train', infiles, args.task)
File "tools/adaptive_detection_features_converter.py", line 94, in extract
imgids = utils.load_imageid(path_imgs[split])
File "/home/sizhangyu/Documents/pytorch_code/ban-vqa/utils.py", line 47, in load_imageid
images = load_folder(folder, 'jpg')
File "/home/sizhangyu/Documents/pytorch_code/ban-vqa/utils.py", line 40, in load_folder
for f in sorted(os.listdir(folder)):
FileNotFoundError: [Errno 2] No such file or directory: 'data/train2014'

flickr 30k features download

Are the hdf5 files in the downloaded flickr30k_features.zip the ones used to reproduce the results? I don't see TSV files in flickr30k_features.zip, but I need the features and bounding boxes for the Flickr30k validation/test sets. The files in flickr30k_features.zip are confusing: for example, the val.hdf5 file contains (30722, 2048) features, but in adaptive_detection_features_converter.py, known_num_boxes for the validation set is 29906. So what are these 30722 features?

train36_imgid2idx.pkl file

Hi, thank you for sharing your code. I was wondering, what exactly does data/train36_imgid2idx.pkl contain?
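
Judging from how the *_imgid2idx.pkl files are used in dataset.py (img_id2val[img_id] in the tracebacks above), they appear to be plain Python dicts mapping COCO image ids to row indices in the corresponding HDF5 feature file. A quick, hypothetical way to inspect one:

import pickle

# Hypothetical inspection snippet; the path assumes the standard data/ layout.
with open('data/train36_imgid2idx.pkl', 'rb') as f:
    imgid2idx = pickle.load(f)

print(type(imgid2idx), len(imgid2idx))   # expected: a dict, one entry per image
some_id = next(iter(imgid2idx))
print(some_id, imgid2idx[some_id])       # COCO image id -> row index in the feature file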

Error in Flickr30k features

Dear authors,

I saw your previous answer, but I didn't have time to answer before the issue was closed.
I have tried two different Linux systems and also Windows, with both Chrome and Firefox. I can download the package but cannot unzip it: it gives an error with the train.hdf5 file, saying the file is corrupted. I also tried two different internet connections and downloaded the file several times, but the result is always the same.

Could you please check the train.hdf5 file?
Davide

Originally posted by @drigoni in #46 (comment)

Evaluating pretrained model

Hello,

I am trying to evaluate the pretrained model on the VQA dataset. If possible, I would like to ask you the following questions:

  1. I executed the command "python3.6 evaluate.py". However, in that case, the script returns the following error:
Evaluate a given model optimized by training split using validation split.
loading dictionary from data/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
  File "evaluate.py", line 47, in <module>
    model.load_state_dict(model_data.get('model_state', model_data))
  File "/home/claudio.greco/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
    .format(name))
KeyError: 'unexpected key "module.w_emb.emb_.weight" in state_dict'

Probably, this happens because the default parameters of the script do not match the ones of the pretrained model. Am I right?

  2. To solve problem (1), I executed the command "python3.6 evaluate.py --num_hid=1280 --op='c' --gamma=8". In this case it works, but the script returns "eval score: 82.23 (92.66)", which seems a bit too high to me. Which row and table in the paper should I compare this result to?

  3. I tried to evaluate the pretrained model on the test split of the VQA dataset by changing "eval_dset = VQAFeatureDataset('dev', dictionary, adaptive=True)" to "eval_dset = VQAFeatureDataset('test2015', dictionary, adaptive=True)" in the evaluate.py script. However, in that case, the script returns the following error:

Evaluate a given model optimized by training split using validation split.
loading dictionary from data/dictionary.pkl
loading features from h5 file
/mnt/8tera/claudio.greco/ban-vqa/language_model.py:95: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
  output, hidden = self.rnn(x, hidden)
Traceback (most recent call last):
  File "evaluate_new.py", line 51, in <module>
    eval_score, bound, entropy = evaluate(model, eval_loader)
  File "/mnt/8tera/claudio.greco/ban-vqa/train.py", line 121, in evaluate
    batch_score = compute_score_with_logits(pred, a.cuda()).sum()
  File "/mnt/8tera/claudio.greco/ban-vqa/train.py", line 26, in compute_score_with_logits
    one_hots.scatter_(1, logits.view(-1, 1), 1)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)

Do you know why this is happening?

Thank you very much!

tar cache.pkl.tgz error, when downloading Pickle caches for the pretrained model

Thanks a lot for sharing code!
After downloading cache.pkl.tgz and entering the following command:

tar xvf data/cache/cache.pkl.tgz -C data/cache/

I got:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Is there something wrong with the cache file on Google Drive?

I got an error with arguments

When I run main.py, I get an error:

main.py: error: unrecognized arguments: True True

Then I changed the command from

$ python3 main.py --use_both True --use_vg True

to

$ python3 main.py --use_both --use_vg
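
A likely explanation (an assumption about the argument parser, not a statement about the repository's exact code): if --use_both and --use_vg are declared with action='store_true', they take no value, so the trailing "True" tokens are rejected as unrecognized arguments. A minimal sketch of the two styles:

import argparse

def str2bool(v):
    # Accept "--flag True" / "--flag False" style values.
    return str(v).lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser()
parser.add_argument('--use_both', type=str2bool, default=False)  # value style: --use_both True
parser.add_argument('--use_vg', action='store_true')             # switch style: --use_vg

print(parser.parse_args(['--use_both', 'True', '--use_vg']))
# Namespace(use_both=True, use_vg=True)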

Which files are needed for inference only?

I only want to run inference with this model.

Is it possible to have only the pre-trained model file for inference?
If not, should I run both download.sh and download_data.sh for inference only?

Evaluate.py

When running evaluate.py with the pretrained model, is there a way to run the evaluation without needing a GPU/CUDA?
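
One common approach (an assumption here, not something the repository documents) is to load the checkpoint onto the CPU with map_location and build the model without the .cuda()/DataParallel calls; a minimal sketch:

import torch

# Hedged sketch: load a GPU-trained checkpoint on a CPU-only machine.
# The path matches the pretrained model location used earlier on this page.
checkpoint = torch.load('saved_models/ban/model_epoch12.pth',
                        map_location=torch.device('cpu'))
state_dict = checkpoint.get('model_state', checkpoint)  # mirrors the loading pattern quoted above
print(len(state_dict), 'tensors in the checkpoint')

# evaluate.py itself may still need small edits to skip .cuda() / DataParallel.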
