
e-bug / volta

[TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"

Home Page: https://aclanthology.org/2021.tacl-1.58/

License: MIT License

Python 49.77% Cuda 4.11% C++ 4.14% Shell 0.36% C 0.17% Makefile 0.02% CSS 0.09% HTML 0.14% Dockerfile 0.05% Jupyter Notebook 40.11% JavaScript 0.11% Jinja 0.01% Lua 0.22% MATLAB 0.60% Cython 0.11%

volta's People

Contributors

e-bug


volta's Issues

Is it possible to provide a pretrained LXMERT model and fine-tune it with your code on RefCOCO?

Hello.
I have pretrained two LXMERT models using the official LXMERT GitHub repository.
I want to evaluate my models on RefCOCO.
I was wondering if it is possible to use your implementation to fine-tune them on the RefCOCO task?

I do not want to change the pre-training, just to compare these two models on RefCOCO.
My models are stored in model_LXRT.pth files (same as LXMERT implementation).

Thanks!

Finetuning models on SNLI-VE

Hi,

To fine-tune the pre-trained (CTRL) models on the SNLI-VE dataset, is running the training script in the examples all I need to do?
Is my understanding correct, or are there other changes that need to be made?

Thanks!

ViLBERT pretrained weights not the same as official weights

Hello, I wanted to ask if you pretrained the ViLBERT weights yourself using train_concap.py. If the answer is yes, then I guess there is some mistake. I compared your pretrained weights with the ones from the Facebook GitHub (link here: https://github.com/facebookresearch/vilbert-multi-task, the weights are listed under "Visiolinguistic Pre-training and Multi Task Training"), and the weights are not the same.

What did I do?
I have run this code without being interested in the output as such.
python train_task.py --config_file config/vilbert_base.json --from_pretrained vilbert.bin --tasks_config_file config_tasks/vilbert_tasks.yml --task 16 --adam_epsilon 1e-6 --adam_betas 0.9 0.999 --adam_correct_bias --weight_decay 0.0001 --warmup_proportion 0.1 --clip_grad_norm 1.0 --output_dir checkpoints/foil/vilbert --logdir log/foil
I added this code (the line model = ... is already there, just so you know where I added the printing):

model = BertForVLTasks.from_pretrained(args.from_pretrained, config=config, task_cfg=task_cfg, task_ids=[task])
zero_shot = True
for name, param in model.named_parameters():
    if "bert.encoder.layer.12.attention_self.v_query.weight" in name:
        print(name)
        print(param)
    if "bert.encoder.layer.12.attention_self.v_query.bias" in name:
        print(name)
        print(param)

I did the same for the pretrained weights from the Facebook GitHub.

What was the output?

I understand that weights can differ after pretraining for many reasons, but here I observed that the attention biases are always non-zero in your weights and zero in the Facebook weights. The difference appears in the biases of all the attention_self query, key and value layers: yours are non-zero, while Facebook's are zero. I also tried different layer numbers (0, 14, ...), and I tried it for v_query and query, as well as key and value, and the result was always the same. Since both your train_concap and Facebook's train_concap start from bert_base_uncased, I guess there is some small detail wrong in your model. I didn't find the reason for this, but it looks to me like something should be setting the biases to zero, plus some other small change in the model (which makes sense to me, as in the attention mechanism, when we create K, V and Q, we only use weights and no bias). I also tried evaluating some tasks with these two sets of pretrained weights and they lead to different results, so something probably is different. Can you look at this? Thank you.
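For reference, a minimal sketch of the kind of comparison described above, assuming both checkpoints are plain PyTorch state dicts; the Facebook filename is a hypothetical placeholder, and this is an illustration rather than the report's exact script:

import torch

# Load the two checkpoints on CPU (assumed to be plain state dicts, not wrapped in another dict).
volta_sd = torch.load("vilbert.bin", map_location="cpu")
fb_sd = torch.load("facebook_vilbert.bin", map_location="cpu")  # hypothetical filename

def inspect_biases(state_dict, substring="attention_self"):
    # Print the norm of every bias whose name contains the given substring.
    for name, tensor in state_dict.items():
        if substring in name and name.endswith("bias"):
            print(f"{name}: norm={tensor.float().norm().item():.6f}")

inspect_biases(volta_sd)  # non-zero norms, according to the report above
inspect_biases(fb_sd)     # zero norms, according to the report above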

vg-gqa_boxes36.lmdb file

Hi,
Is it possible to make the "vg-gqa_boxes36.lmdb" file available?
It seems the h5 version is available to download!

Thanks

Plans to add ERNIE-ViL?

Dear Emanuele,

First of all, thanks for the great library; it has been a pleasure to use.
I was wondering whether there are currently any plans for adding ERNIE-ViL to Volta? The structured knowledge would be great for my work, but I would rather see it as part of a framework where it is easily comparable.

Thanks

Resume Training

Hi,

While resuming model training, the error below appears:

Resume training issue

Regards,

multiple GPUs

Hi,
Did you run the code on multiple GPUs? I am getting the following error: "Caught StopIteration in replica 0 on device 0."

Thanks

Unable to find fine-tuned checkpoints and features

I am trying to evaluate the ViLBERT model on the Flickr30k dataset using the script, but I am not sure where to download the fine-tuned checkpoints or the features from.

Can someone please help me with this?
Thanks!

On the visual token added to linguistic tokens in VLBertEmbeddings class

Hello.

I have a question about the VLBertEmbeddings class.

In its forward function, a global image feature is added to the linguistic tokens.
The last token in the vision sequence is used as the global image feature, like below:

text_visual_embeddings = final_feats[:, -1].repeat(1, seq_length).view(batch_size, seq_length, -1)

Using the last token seems reasonable for the original VLBert (vl-bert_base.json) because add_global_imgfeat is last,
but I think this should be the first token for the controlled VLBert (ctrl_vl-bert_base.json), whose add_global_imgfeat is first.
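
For illustration, a minimal sketch of the config-dependent indexing being suggested here; the attribute name add_global_imgfeat comes from the configs above, but the surrounding code is an assumption, not the repository's implementation:

# Hypothetical sketch: pick the global image feature according to the config,
# instead of hard-coding the last position.
if config.add_global_imgfeat == "first":
    global_feat = final_feats[:, 0]    # global token prepended to the vision sequence
else:  # "last"
    global_feat = final_feats[:, -1]   # global token appended to the vision sequence

text_visual_embeddings = global_feat.repeat(1, seq_length).view(batch_size, seq_length, -1)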

Is there any reason that the last token is always used in the class?

I'm sorry if I misunderstood the way the embeddings classes work.

Thanks.

LMDB file download

Currently, to download LMDB files, the link opens up the LMDB file and we download each piece of content separately (following this link). Can you share a link where we can directly download the LMDB file, as we aren't able to compress the contents back into LMDB format? @elliottd @e-bug

RuntimeError: CUDA error: no kernel image is available for execution on the device

Hi all,

I tried to run the code by using three different setups, but I always get the same error:

Traceback (most recent call last):
File "train_task.py", line 34, in <module>
  from volta.task_utils import LoadDataset, LoadLoss, ForwardModelsTrain, ForwardModelsVal
File "/data/volta/task_utils.py", line 19, in <module>
  from volta.datasets import DatasetMapTrain, DatasetMapEval
File "/data/volta/datasets/__init__.py", line 23, in <module>
  from .SVO_Probes_dataset import SVO_ProbesClassificationDataset
File "/data/volta/datasets/SVO_Probes_dataset.py", line 20, in <module>
  p = Pipeline(lang='english', gpu = True, cache_dir = './cache')
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/trankit/pipeline.py", line 85, in __init__
  self._embedding_layers.half()
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 757, in half
  return self._apply(lambda t: t.half() if t.is_floating_point() else t)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
  module._apply(fn)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
  module._apply(fn)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
  module._apply(fn)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 593, in _apply
  param_applied = fn(param)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 757, in <lambda>
  return self._apply(lambda t: t.half() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I followed the repository setup steps in the README file (plus, after the setup, I also needed to install nltk through Anaconda and trankit through pip). These setups were:

  • Run it on a VM with Ubuntu 22.04, NVIDIA RTX 3090, CUDA 12.1 and NVIDIA driver version 530.30.02
  • Run it on the same virtual machine, but inside a Docker container nvidia/cuda:10.1-devel-ubuntu18.04
  • Run it on the same virtual machine, but inside a Docker container pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel

These are the exact commands I executed on both the VM and inside the Docker containers:

conda create -n volta python=3.6
conda activate volta
pip install -r requirements.txt
conda install pytorch=1.4.0 torchvision cudatoolkit=10.1 -c pytorch  #Remove torchvision version as 0.5 is not available

apt install git
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir ./

cd ..
cd tools/refer; make

cd ../..
python setup.py develop

conda install nltk
pip install trankit

Then, I ran the following command and after that I received the error above:

python train_task.py \
        --config_file config/vilbert_base.json --from_pretrained ctrl_vilbert.bin \
        --tasks_config_file config_tasks/vilbert_tasks.yml --task 20 \
        --adam_epsilon 1e-6 --adam_betas 0.9 0.999 --weight_decay 0.01 --warmup_proportion 0.1 --clip_grad_norm 0.0 \
        --logdir logs/SVO-Probes/ --task_specific_tokens

Any suggestions?
Thanks!
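
For context, one thing worth checking in a setup like this is whether the installed PyTorch build ships CUDA kernels for the GPU's compute capability: an RTX 3090 is compute capability 8.6 (Ampere), while PyTorch 1.4 / CUDA 10.1 wheels predate Ampere. A small diagnostic sketch, not part of the original report:

import torch

# Minimal diagnostic: the "no kernel image" error usually means the PyTorch build
# was not compiled with kernels for this GPU's compute capability.
print("CUDA available:", torch.cuda.is_available())
print("Device name:", torch.cuda.get_device_name(0))
print("Compute capability:", torch.cuda.get_device_capability(0))  # (8, 6) for an RTX 3090
print("PyTorch CUDA version:", torch.version.cuda)                 # 10.1 here, which predates Ampere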

Missing file and unable to load pretrained model

While trying to get the code running, I faced 2 issues:

  1. Unable to find datasets/refcoco+_unc/annotations/cache/refcoco+_val_20_36.pkl. The link mentioned didn't have the cache folder, and currently I'm trying to run it using an empty file.
  2. When I'm trying to load pytorch_model_9.bin, it expects one of the pretrained models present in the dictionary in volta/encoders.py, e.g. bert-base-uncased, roberta, etc.
    Please help @elliottd @e-bug

visual location encoding in UNITER

Hi,

I notice that you simply project the object location here: https://github.com/e-bug/volta/blob/main/volta/embeddings.py#L495, and that you set the object location dimension to 5 here: https://github.com/e-bug/volta/blob/main/config/ctrl_uniter_base.json#L16

How exactly do you represent the location of the object? Chen et al. say they use a 7-dimensional vector: [x_1, y_1, x_2, y_2, w, h, w * h] (normalized top/left/bottom/right coordinates, width, height, and area). They hard-code it: https://github.com/ChenRocks/UNITER/blob/master/model/model.py#L254
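
For reference, a minimal sketch of the 7-dimensional location feature described by Chen et al., assuming pixel-space boxes and a known image size; this is an illustration, not the code of either repository:

import numpy as np

def uniter_style_location(box, img_w, img_h):
    # Build the 7-d location feature [x1, y1, x2, y2, w, h, w*h]
    # from a pixel-space box (x1, y1, x2, y2), normalized by image size.
    x1, y1, x2, y2 = box
    x1, x2 = x1 / img_w, x2 / img_w
    y1, y2 = y1 / img_h, y2 / img_h
    w, h = x2 - x1, y2 - y1
    return np.array([x1, y1, x2, y2, w, h, w * h], dtype=np.float32)

# Example: a 200x100 box in a 640x480 image.
print(uniter_style_location((100, 50, 300, 150), 640, 480))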

Bests,

Paul

Experienced system hanging?

Hi!

This is more a question about whether others have had a similar experience than a real issue report.

I've been experiencing the system hanging (not sure whether from the GPU, the dataloader, or something else) while fine-tuning a pre-trained model on, e.g., NLVR2.
It usually goes like this:
(1) it hangs at the first iteration of the first epoch and never proceeds, or
(2) it hangs at iteration n, where n is some multiple of the number of workers set in the starting script, and never proceeds.

When it hangs, CPU/GPU utilization drops to zero and the system seems to be doing nothing.
Have you had a similar experience? If so, any advice to work around it?
Thanks!

h5 files

"NB: I have noticed that uploading LMDB files made their size grow to the order of TBs. So, instead, I recently uploaded the H5 versions that can quickly be converted to LMDB locally using this script."

Can I check where I can find the H5 files?
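
For reference, a rough sketch of how H5 features could be converted to LMDB locally; the paths, key layout, and serialization are assumptions for illustration, and the conversion script mentioned in the README should be preferred:

import pickle

import h5py
import lmdb

# Hypothetical paths and key layout, for illustration only.
H5_PATH = "features.h5"
LMDB_PATH = "features.lmdb"

with h5py.File(H5_PATH, "r") as h5, lmdb.open(LMDB_PATH, map_size=1 << 40) as env:
    with env.begin(write=True) as txn:
        for image_id in h5.keys():
            # Assumes each image is an H5 group of arrays; store it as a pickled dict.
            record = {name: h5[image_id][name][()] for name in h5[image_id].keys()}
            txn.put(image_id.encode("utf-8"), pickle.dumps(record))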

Miscalculated GQA score?

Hi,
First - thank you for this great work and repo, it is extremely helpful!

I trained a model on GQA, and it looks like there's a mistake in the calculation of GQA_score:
--truth_file is testdev_balanced_questions.json (as used in test.sh) where each entry has "answer" (of type string) as the truth label, and the accuracy check is whether the prediction is contained in the truth label.

if pred in label:

This means that for a truth label of "woman", a prediction of either "man" or "woman" would get a full score.

According to the official GQA evaluation script, the accuracy check should be

if pred == label:

After making this change, I got a score 2.69 points lower.
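
For illustration, a minimal example of why substring matching inflates the score (hypothetical predictions):

label = "woman"

# Substring check, as in the current evaluation: both predictions count as correct.
print("man" in label, "woman" in label)    # True True

# Exact-match check, as in the official GQA evaluation: only "woman" counts.
print("man" == label, "woman" == label)    # False True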
