
antoyang / just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Home Page: https://arxiv.org/abs/2012.00451

License: Apache License 2.0

Python 17.11% Shell 0.04% HTML 2.79% Jupyter Notebook 80.06%
vqa visual-question-answering videoqa video-question-answering video-understanding question-generation weakly-supervised-learning vision-and-language pre-training multimodal-learning

just-ask's Issues

Query about iVQA features

Hello,
Thank you for your great work! I wanted to double-check whether this file contains the features for the iVQA dataset. I am attempting to fine-tune the cross-modal-trained FrozenBiLM on iVQA, and when I try to load the features there appears to be a corruption issue. Could you please let me know whether I am processing the data correctly?
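As a quick sanity check, independent of this repo (and assuming the features are pickled — adapt the loader for `.npy`/`.pth` files), one can try deserializing the file and report corruption instead of crashing:

```python
import pickle
import zlib

def check_feature_file(path):
    """Attempt to deserialize a pickled feature file; report corruption instead of crashing."""
    try:
        with open(path, "rb") as f:
            data = pickle.load(f)
        return True, f"loaded object of type {type(data).__name__}"
    except (pickle.UnpicklingError, EOFError, AttributeError, zlib.error) as e:
        return False, f"corrupt or incompatible file: {e}"
```

If this fails on the downloaded file but succeeds on a re-downloaded copy, the original download was truncated rather than the processing being wrong.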

Overfitting in finetuning

Hello,
I am trying to use your pretrained model to reproduce the results on MSVD-QA. I follow the same hyperparameters mentioned in the paper and use the ckpt_pt_howtovqa69m file to initialize the model. However, I observe overfitting from the early epochs (73.97% accuracy on the training set versus 41.79% on the validation set). I also tried retraining the model already fine-tuned on MSVD-QA on the same dataset, and performance dropped (30% after 20 epochs, after which it saturates)!
I searched for your loss and accuracy curves but could not find them. Would it be possible to share them here? Did you obtain similar results, and if so, do you know the origin of this problem? Thank you for your response.
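Not from the repo, but a generic patience-based early-stopping check is one common way to catch this kind of train/validation divergence before it wastes epochs (all names here are illustrative):

```python
class EarlyStopping:
    """Stop training when validation accuracy stops improving for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one epoch's validation accuracy; return True when training should stop."""
        if val_acc > self.best + self.min_delta:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Calling `step()` once per epoch and checkpointing only on improvement keeps the best validation model even if training accuracy keeps climbing.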

Download through quick start

Hi, thank you for sharing this wonderful work.
I ran into an issue while running your repo: when I use the .sh files in the download folder, the download through gdrive fails.
The error report is as follows:
Failed getting oauth client: Failed to exchange auth code for token: Post https://accounts.google.com/o/oauth2/token: dial tcp 142.251.43.13:443: i/o timeout (command: bash download/download_checkpoints.sh)
Hoping to see your reply. Thank you, and best wishes.
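Not an official workaround, but gdown is a common alternative when the gdrive OAuth flow times out, since it handles Google Drive's large-file confirmation page without an OAuth exchange. A sketch, with the file ID left as a placeholder to be copied from the download scripts:

```shell
# Assumption: gdown (pip install gdown) replaces gdrive/gshell here.
# drive_url builds the direct-download URL gdown expects from a Drive file ID.
drive_url() { printf 'https://drive.google.com/uc?id=%s' "$1"; }

# Usage (take the real file ID from download/download_checkpoints.sh):
# gdown "$(drive_url <FILE_ID>)" -O checkpoints.zip
```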

Some results of the VQA-T model are not reproducible

Hi, I ran the VQA-T model from scratch using the command

python main_videoqa.py --checkpoint_dir=ft<dataset> --dataset=<dataset> --lr=0.00001 \
--pretrain_path=<CKPT_PATH>

On MSRVTT-QA, MSVD-QA, Anet-QA, How2QA, and iVQA I got the following results: 40.2, 41.5, 33.8, 71.4, and 15.7, while the paper reports 39.6, 41.2, 36.8, 80.8, and 23.0. Do Anet-QA, How2QA, and iVQA use different hyperparameter settings?

Need verification code to download checkpoints and preprocessed features.

Thanks a lot for sharing great work!

I've run into a problem when downloading the pretrained checkpoints, preprocessed data, and features via the commands below:

bash download/download_checkpoints.sh <DEFAULT_CKPT_DIR>
bash download/download_downstream.sh <DEFAULT_DATASET_DIR>

They require a verification code that I cannot access.
Could you share the files another way?

Thank you again for your sharing.

Pre-training / fine-tuning

Is it possible to use the tool on our own videos and datasets? If so, in addition to the videos, what features are required for pre-training or fine-tuning?
I assume from your README that the features are extracted with the HowTo100M feature extractor (mixture of experts) based on that repository, in addition to the speech-to-text transcript. Correct me if I am wrong.

I want to test this system on my own videos to see how well it can handle explaining them, and to learn how to train it on my own data.

Please advise.
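For illustration only (this is not the repo's preprocessing code): pretraining pairs each speech-transcript segment with the video feature clips it overlaps in time. Assuming one feature vector per second, that alignment can be sketched as:

```python
def align_transcript(features_per_sec, segments):
    """Pair each speech segment with the per-second feature clips it spans.

    features_per_sec: list of per-second feature vectors for one video.
    segments: list of (start_sec, end_sec, text) ASR segments.
    Returns a list of (clips, text) pairs.
    """
    pairs = []
    for start, end, text in segments:
        clips = features_per_sec[int(start):int(end) + 1]
        pairs.append((clips, text))
    return pairs
```

So for custom videos, the two required inputs would be the extracted visual features and timestamped ASR transcripts; everything else is derived from those.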

Question about the preprocessing of the LSMDC-FiB dataset

Hi, I have read your paper "FrozenBiLM". I have several questions about the preprocessing of the LSMDC-FiB dataset. I noticed that some blanks cover only part of a word: for example, in "I went to the place w___e I live.", the answer would be "her". The semantic meaning of such a question is therefore destroyed. I am wondering how you treat these types of questions?

Thanks.
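Not from the authors' pipeline, but to make the distinction in the question concrete, a small sketch (all names hypothetical) that classifies whether a blank is embedded inside a word, which could be used to filter such examples:

```python
import re

# A partial-word blank has letters directly touching the underscores ("w___e"),
# unlike a whole-word blank surrounded by non-letters ("_____").
PARTIAL_BLANK = re.compile(r"[A-Za-z]+_+[A-Za-z]+|[A-Za-z]+_+|_+[A-Za-z]+")
WHOLE_BLANK = re.compile(r"(?<![A-Za-z])_+(?![A-Za-z])")

def blank_type(sentence):
    """Classify a fill-in-the-blank sentence as 'partial-word', 'whole-word', or 'none'."""
    if PARTIAL_BLANK.search(sentence):
        return "partial-word"
    if WHOLE_BLANK.search(sentence):
        return "whole-word"
    return "none"
```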

Testing

Hi,
After fine-tuning on the downstream VideoQA datasets, how is the model run on the test set?
I'm a little confused about this point.
Thanks

Google drive verification code

Hi,

I am running the script on a remote cluster that has no gshell module, so I am using gdrive instead. Could you send me the verification code for downloading the checkpoints and data from the Google Drive address?

email: [email protected]
