
Comments (10)

Cadene avatar Cadene commented on May 20, 2024

My server is made of 4 * GTX1070 (8GB each) + 64 GB RAM + 2 * Intel Xeon E5-2620V4 (2.1 GHz) + 1TB SSD PCIe + 1TB SSD SATA.

However, to train one model with attention while keeping data loading time small, you will need one PyTorch-compatible GPU, 3 threads, and one 500 GB SATA SSD devoted to storing data and only data (WARNING: not the OS).

With a Pascal GPU on VQA 1.0 (VQA 2.0 will be added soon; it has twice the number of question/answer pairs but the same images):

  • on train/val:
    • with no att model: 5 hours
    • with att model: 10 hours
  • on trainval/testdev:
    • with no att model: 10 hours
    • with att model: 20 hours
  • on trainval+visual_genome/testdev (visual genome will be added soon):
    • with no att model: 1 day
    • with att model: 2 days

from vqa.pytorch.

Cadene avatar Cadene commented on May 20, 2024

@ahmedmagdiosman I tried to load data from the SSD I use as boot drive, and got high data loading times when training models with attention (for instance, the OS may sometimes write to it or do other blocking work). In fact, you need to load data of dim (batch_size x 2048 x 14 x 14), which is really big. The models without attention (NoAtt), however, only need to load data of dim (batch_size x 2048). So that's ok :)
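To make the size gap concrete, here is a quick back-of-envelope sketch (my own illustration, assuming float32 features, i.e. 4 bytes per value):

```python
# Rough size comparison of the per-sample features discussed above,
# assuming float32 (4 bytes per value). Illustration only.

def feature_bytes(shape, dtype_bytes=4):
    """Size in bytes of one feature tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes

att = feature_bytes((2048, 14, 14))  # attention features: ~1.5 MB per sample
noatt = feature_bytes((2048,))       # NoAtt features: 8 KB per sample

# With a hypothetical batch_size of 128, each attention batch means
# ~196 MB of raw reads, which is why disk throughput matters so much.
batch_att_mb = 128 * att / 2**20
```

The attention features are 196x larger per sample than the NoAtt ones, which explains why only the attention models stress the disk.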

An SSD is not required to run the models with attention, but if you see high data loading times, be sure to use monitoring tools such as atop or htop to locate the bottleneck.

Lastly, I suspect that h5py/HDF5 is not well suited for this kind of read-intensive task. In fact, it seemed to work better in my old Torch7 code with torchnet.IndexedDataset. If I had the time, I would compare h5py/HDF5 and LMDB.
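The access pattern behind torchnet.IndexedDataset (an offset table plus one seek and one read per record) can be sketched with just the standard library. This is a toy illustration of the idea, not the actual torchnet file format:

```python
import os
import struct
import tempfile

# Toy indexed record file: a sidecar offset table lets us seek straight
# to any record, the access pattern that makes indexed flat files fast
# for random reads.

def write_indexed(path, records):
    offsets = []
    with open(path, "wb") as f:
        for rec in records:
            offsets.append(f.tell())
            f.write(struct.pack("<I", len(rec)))  # 4-byte length prefix
            f.write(rec)
    with open(path + ".idx", "wb") as f:
        f.write(struct.pack("<%dQ" % len(offsets), *offsets))

def read_record(path, i):
    with open(path + ".idx", "rb") as f:
        f.seek(i * 8)
        (offset,) = struct.unpack("<Q", f.read(8))
    with open(path, "rb") as f:
        f.seek(offset)  # one seek, one read: no library overhead
        (length,) = struct.unpack("<I", f.read(4))
        return f.read(length)

path = os.path.join(tempfile.mkdtemp(), "feats.bin")
write_indexed(path, [b"sample-%d" % i for i in range(100)])
print(read_record(path, 42))  # b'sample-42'
```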


ili3p avatar ili3p commented on May 20, 2024

In my experience, the best approach is to use the pretrained Caffe model, e.g. via the MCB code, and store the tensors as compressed numpy arrays. In that case it only takes 19 GB for the whole train set, so you can cache it in RAM.

For a Torch example, see github.com/ilija139/vqa-soft
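The compressed-array scheme described above can be sketched as follows (my own illustration; the file name and array keys are made up, and numpy is assumed):

```python
import os
import tempfile
import numpy as np

# Sketch of the caching scheme described above: save features once as a
# compressed .npz archive, then load the whole archive into RAM up front.

features = {"img_%d" % i: np.random.rand(2048).astype(np.float32)
            for i in range(10)}
path = os.path.join(tempfile.mkdtemp(), "train_feats.npz")
np.savez_compressed(path, **features)

# Load everything into a plain dict: afterwards, every lookup during
# training is a RAM access instead of a disk read.
with np.load(path) as archive:
    cache = {k: archive[k] for k in archive.files}
```

Compression helps most when features are sparse or post-ReLU (many zeros), which is what keeps a full train set small enough to fit in RAM.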


Cadene avatar Cadene commented on May 20, 2024

The VQA dataset is the most widely used dataset in the visual question answering community, as it has the largest volume of human-annotated open-ended answers. Other datasets such as DAQUAR, COCO-QA, or Visual7W are limited in size and annotation quality. These limitations make them less relevant than VQA for evaluating multimodal fusion models, so we do not provide an implementation for them (feel free to contact us if you need those datasets).

VQA 1.0 is made of several splits: train, val, and test-std (which includes test-dev).
The biggest models are trained on train + val as the training set, with the test-dev split used for validation (on the evaluation server).
Thus, for study purposes, the smallest datasets provided in this repo use the train split as the training set and the val split as the validation set.
You can train/eval a model on this split using the trainsplit: train option. See mutan_noatt_train.yaml
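The split arrangement described above can be summarized in a small helper. The function and return structure are mine, for illustration only, not from vqa.pytorch:

```python
# Hypothetical helper mirroring the split arrangement described above;
# the names are illustrative, not the repo's actual API.

def vqa_splits(trainsplit):
    if trainsplit == "train":
        # study setup: train on the train split, validate locally on val
        return {"train": ["train"], "eval": ["val"]}
    if trainsplit == "trainval":
        # full setup: train on train + val, validate on the eval server
        return {"train": ["train", "val"], "eval": ["test-dev"]}
    raise ValueError("unknown trainsplit: %s" % trainsplit)
```

Note that in the "trainval" setup there is no held-out local validation set, which is why the evaluation server is needed.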


SeekPoint avatar SeekPoint commented on May 20, 2024

Would you mind sharing your hardware configuration (memory/GPU) and how much time training takes? Thanks.


SeekPoint avatar SeekPoint commented on May 20, 2024

I guess the training time excludes generating the ResNet-152 features.


Cadene avatar Cadene commented on May 20, 2024

Generating the train features takes 30 minutes.

By the way, the features used in our paper are available at https://github.com/Cadene/vqa.pytorch#features


SeekPoint avatar SeekPoint commented on May 20, 2024

I was under the impression that VQA tasks always take weeks of training.
OK, I will try it later.


ahmedmagdiosman avatar ahmedmagdiosman commented on May 20, 2024

Hey,
why is the SSD required to store only data?
Thanks for providing your code!


ahmedmagdiosman avatar ahmedmagdiosman commented on May 20, 2024

@Cadene Thanks a lot!

I actually had some really slow loading times with the Torch7 code from the MLB paper. I suspect it's also the HDF5 format, since I didn't have a problem loading *.npy files in pycaffe with the original MCB code.

