
Comments (10)

Cadene avatar Cadene commented on May 20, 2024

My server is made of 4 * GTX1070 (8GB each) + 64 GB RAM + 2 * Intel Xeon E5-2620V4 (2.1 GHz) + 1TB SSD PCIe + 1TB SSD SATA.

However, to train one model with attention while keeping data loading time small, you will need one PyTorch-compatible GPU, 3 threads, and one 500 GB SATA SSD devoted to storing data and only data (WARNING: not the OS).

With a Pascal GPU on VQA 1.0 (VQA 2.0 will be added soon; it has twice the number of question/answer pairs but the same images):

  • on train/val:
    • with no att model: 5 hours
    • with att model: 10 hours
  • on trainval/testdev:
    • with no att model: 10 hours
    • with att model: 20 hours
  • on trainval+visual_genome/testdev (visual genome will be added soon):
    • with no att model: 1 day
    • with att model: 2 days

from vqa.pytorch.

Cadene avatar Cadene commented on May 20, 2024

@ahmedmagdiosman I tried to load data from the SSD I use as boot drive, and got high data loading times when training models with attention (for instance, the OS may sometimes write to it or do other blocking work). In fact, you need to load data of dim (batch_size x 2048 x 14 x 14), which is really big. The models without attention (NoAtt), however, only need to load data of dim (batch_size x 2048). So that's ok :)
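To make the size gap concrete, here is a quick back-of-envelope sketch (my own illustration, assuming float32 features, i.e. 4 bytes per value):

```python
# Rough size comparison of the per-sample features discussed above,
# assuming float32 (4 bytes per value). Illustration only.

def feature_bytes(shape, dtype_bytes=4):
    """Size in bytes of one feature tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes

att = feature_bytes((2048, 14, 14))  # attention features: ~1.5 MB per sample
noatt = feature_bytes((2048,))       # NoAtt features: 8 KB per sample

# With a hypothetical batch_size of 128, each attention batch means
# ~196 MB of raw reads, which is why disk throughput matters so much.
batch_att_mb = 128 * att / 2**20
```

The attention features are 196x larger per sample than the NoAtt ones, which explains why only the attention models stress the disk.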

An SSD is not required to run the models with attention, but if you see high data loading times, be sure to use monitoring tools such as atop or htop to locate the bottleneck.

Lastly, I suspect that h5py/HDF5 is not well suited for this kind of read-intensive task. In fact, it seemed to work better in my old Torch7 code with torchnet.IndexedDataset. If I had the time, I would compare h5py/HDF5 and LMDB.
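The access pattern behind torchnet.IndexedDataset (an offset table plus one seek and one read per record) can be sketched with just the standard library. This is a toy illustration of the idea, not the actual torchnet file format:

```python
import os
import struct
import tempfile

# Toy indexed record file: a sidecar offset table lets us seek straight
# to any record, the access pattern that makes indexed flat files fast
# for random reads.

def write_indexed(path, records):
    offsets = []
    with open(path, "wb") as f:
        for rec in records:
            offsets.append(f.tell())
            f.write(struct.pack("<I", len(rec)))  # 4-byte length prefix
            f.write(rec)
    with open(path + ".idx", "wb") as f:
        f.write(struct.pack("<%dQ" % len(offsets), *offsets))

def read_record(path, i):
    with open(path + ".idx", "rb") as f:
        f.seek(i * 8)
        (offset,) = struct.unpack("<Q", f.read(8))
    with open(path, "rb") as f:
        f.seek(offset)  # one seek, one read: no library overhead
        (length,) = struct.unpack("<I", f.read(4))
        return f.read(length)

path = os.path.join(tempfile.mkdtemp(), "feats.bin")
write_indexed(path, [b"sample-%d" % i for i in range(100)])
print(read_record(path, 42))  # b'sample-42'
```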


ili3p avatar ili3p commented on May 20, 2024

In my experience, the best approach is to use the pretrained Caffe model, e.g. via the MCB code, and store the tensors as compressed numpy arrays. In that case it only takes 19 GB for the whole train set, so you can cache it in RAM.

For a Torch example, see github.com/ilija139/vqa-soft
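The compressed-array scheme described above can be sketched as follows (my own illustration; the file name and array keys are made up, and numpy is assumed):

```python
import os
import tempfile
import numpy as np

# Sketch of the caching scheme described above: save features once as a
# compressed .npz archive, then load the whole archive into RAM up front.

features = {"img_%d" % i: np.random.rand(2048).astype(np.float32)
            for i in range(10)}
path = os.path.join(tempfile.mkdtemp(), "train_feats.npz")
np.savez_compressed(path, **features)

# Load everything into a plain dict: afterwards, every lookup during
# training is a RAM access instead of a disk read.
with np.load(path) as archive:
    cache = {k: archive[k] for k in archive.files}
```

Compression helps most when features are sparse or post-ReLU (many zeros), which is what keeps a full train set small enough to fit in RAM.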


Cadene avatar Cadene commented on May 20, 2024

The VQA dataset is the most widely used dataset in the visual question answering community, as it has the largest volume of human-annotated open-ended answers. Other datasets such as DAQUAR, COCO-QA, or Visual7W are limited in size and annotation quality. These limitations make them less relevant than VQA for evaluating multimodal fusion models, so we do not provide an implementation for them (feel free to contact us if you need those datasets).

VQA 1.0 is made of several splits: train, val, and test-std (which includes test-dev).
The biggest models are trained on train + val as the training set, with the test-dev split used for validation (on the evaluation server).
Thus, for study purposes, the smallest datasets provided in this repo use the train split as the training set and the val split as the validation set.
You can train/eval a model on this split using the trainsplit: train option. See mutan_noatt_train.yaml
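The split arrangement described above can be summarized in a small helper. The function and return structure are mine, for illustration only, not from vqa.pytorch:

```python
# Hypothetical helper mirroring the split arrangement described above;
# the names are illustrative, not the repo's actual API.

def vqa_splits(trainsplit):
    if trainsplit == "train":
        # study setup: train on the train split, validate locally on val
        return {"train": ["train"], "eval": ["val"]}
    if trainsplit == "trainval":
        # full setup: train on train + val, validate on the eval server
        return {"train": ["train", "val"], "eval": ["test-dev"]}
    raise ValueError("unknown trainsplit: %s" % trainsplit)
```

Note that in the "trainval" setup there is no held-out local validation set, which is why the evaluation server is needed.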


SeekPoint avatar SeekPoint commented on May 20, 2024

Would you mind sharing your hardware configuration (memory/GPU) and how much time training takes? Thanks.


SeekPoint avatar SeekPoint commented on May 20, 2024

I guess the training time excludes generating the ResNet-152 features.


Cadene avatar Cadene commented on May 20, 2024

Generating the train features takes 30 minutes.

By the way, the features used in our paper are available at https://github.com/Cadene/vqa.pytorch#features


SeekPoint avatar SeekPoint commented on May 20, 2024

I was under the impression that VQA tasks always take weeks of training.
OK, I will try it later.


ahmedmagdiosman avatar ahmedmagdiosman commented on May 20, 2024

Hey,
why is the SSD required to store only data?
Thanks for providing your code!


ahmedmagdiosman avatar ahmedmagdiosman commented on May 20, 2024

@Cadene Thanks a lot!

I actually had some really slow loading times with the Torch7 code from the MLB paper. I suspect it's also the HDF5 format, since I didn't have a problem loading *.npy files in pycaffe with the original MCB code.

