
antoyang / just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Home Page: https://arxiv.org/abs/2012.00451

License: Apache License 2.0

Python 17.11% Shell 0.04% HTML 2.79% Jupyter Notebook 80.06%
vqa visual-question-answering videoqa video-question-answering video-understanding question-generation weakly-supervised-learning vision-and-language pre-training multimodal-learning

just-ask's Issues

Query about iVQA features

Hello,
Thank you for your great work! I wanted to double-check whether this file contains the features for the iVQA dataset. I am attempting to fine-tune the cross-modal-trained FrozenBiLM on iVQA, and when I try to load the features there appears to be a corruption issue. Could you please let me know whether I am processing the data correctly?
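As a quick sanity check, independent of this repo (and assuming the features are pickled — adapt the loader for `.npy`/`.pth` files), one can try deserializing the file and report corruption instead of crashing:

```python
import pickle
import zlib

def check_feature_file(path):
    """Attempt to deserialize a pickled feature file; report corruption instead of crashing."""
    try:
        with open(path, "rb") as f:
            data = pickle.load(f)
        return True, f"loaded object of type {type(data).__name__}"
    except (pickle.UnpicklingError, EOFError, AttributeError, zlib.error) as e:
        return False, f"corrupt or incompatible file: {e}"
```

If this fails on the downloaded file but succeeds on a re-downloaded copy, the original download was truncated rather than the processing being wrong.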

Overfitting in finetuning

Hello,
I am trying to use your pretrained model to reproduce the results on MSVD-QA. I follow the same hyperparameters mentioned in the paper and use the ckpt_pt_howtovqa69m file to initialize the model. However, I observe overfitting from the early epochs (73.97% accuracy on the training set versus 41.79% on the validation set). I also tried retraining the model already fine-tuned on MSVD-QA on the same dataset, and performance dropped (30% after 20 epochs, after which it saturates)!
I searched for your loss and accuracy curves but could not find them. Would it be possible to share them here? Did you obtain similar results, and if so, do you know the origin of this problem? Thank you for your response.
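Not from the repo, but a generic patience-based early-stopping check is one common way to catch this kind of train/validation divergence before it wastes epochs (all names here are illustrative):

```python
class EarlyStopping:
    """Stop training when validation accuracy stops improving for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one epoch's validation accuracy; return True when training should stop."""
        if val_acc > self.best + self.min_delta:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Calling `step()` once per epoch and checkpointing only on improvement keeps the best validation model even if training accuracy keeps climbing.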

Download through quick start

Hi, thank you for sharing this wonderful work.
I ran into an issue while running your repo: when I use the .sh files in the download folder, the download through gdrive fails.
The error report is as follows:
Failed getting oauth client: Failed to exchange auth code for token: Post https://accounts.google.com/o/oauth2/token: dial tcp 142.251.43.13:443: i/o timeout (command: bash download/download_checkpoints.sh)
Hoping to see your reply. Thank you, and best wishes.
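Not an official workaround, but gdown is a common alternative when the gdrive OAuth flow times out, since it handles Google Drive's large-file confirmation page without an OAuth exchange. A sketch, with the file ID left as a placeholder to be copied from the download scripts:

```shell
# Assumption: gdown (pip install gdown) replaces gdrive/gshell here.
# drive_url builds the direct-download URL gdown expects from a Drive file ID.
drive_url() { printf 'https://drive.google.com/uc?id=%s' "$1"; }

# Usage (take the real file ID from download/download_checkpoints.sh):
# gdown "$(drive_url <FILE_ID>)" -O checkpoints.zip
```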

Some results of the VQA-T model are not reproducible

Hi, I ran the VQA-T model from scratch using the command

python main_videoqa.py --checkpoint_dir=ft<dataset> --dataset=<dataset> --lr=0.00001 \
--pretrain_path=<CKPT_PATH>

On MSRVTT-QA, MSVD-QA, Anet-QA, How2QA, and iVQA I got the following results: 40.2, 41.5, 33.8, 71.4, and 15.7, while the paper reports 39.6, 41.2, 36.8, 80.8, and 23.0. Do Anet-QA, How2QA, and iVQA use different hyperparameter settings?

Need verification code to download checkpoints and preprocessed features.

Thanks a lot for sharing great work!

I've run into a problem when downloading the pretrained checkpoints, preprocessed data, and features via the commands below:

bash download/download_checkpoints.sh <DEFAULT_CKPT_DIR>
bash download/download_downstream.sh <DEFAULT_DATASET_DIR>

They require a verification code that I cannot access.
Could you share the files another way?

Thank you again for your sharing.

Pre-training / fine-tuning

Is it possible to use the tool on our own videos and datasets? If so, in addition to the videos, what features are required for pre-training or fine-tuning?
I assume from your README that the features are extracted with the HowTo100M feature extractor (mixture of experts) based on that repository, in addition to the speech-to-text transcript. Correct me if I am wrong.

I want to test this system on my own videos to see how well it can handle explaining them, and to learn how to train it on my own data.

Please advise.
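For illustration only (this is not the repo's preprocessing code): pretraining pairs each speech-transcript segment with the video feature clips it overlaps in time. Assuming one feature vector per second, that alignment can be sketched as:

```python
def align_transcript(features_per_sec, segments):
    """Pair each speech segment with the per-second feature clips it spans.

    features_per_sec: list of per-second feature vectors for one video.
    segments: list of (start_sec, end_sec, text) ASR segments.
    Returns a list of (clips, text) pairs.
    """
    pairs = []
    for start, end, text in segments:
        clips = features_per_sec[int(start):int(end) + 1]
        pairs.append((clips, text))
    return pairs
```

So for custom videos, the two required inputs would be the extracted visual features and timestamped ASR transcripts; everything else is derived from those.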

Question about the preprocessing of the LSMDC-FiB dataset

Hi, I have read your paper "FrozenBiLM". I have several questions about the preprocessing of the LSMDC-FiB dataset. I noticed that some blanks cover only part of a word: for example, in "I went to the place w___e I live.", the answer would be "her". The semantic meaning of such a question is therefore destroyed. I am wondering how you treat these types of questions?

Thanks.
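Not from the authors' pipeline, but to make the distinction in the question concrete, a small sketch (all names hypothetical) that classifies whether a blank is embedded inside a word, which could be used to filter such examples:

```python
import re

# A partial-word blank has letters directly touching the underscores ("w___e"),
# unlike a whole-word blank surrounded by non-letters ("_____").
PARTIAL_BLANK = re.compile(r"[A-Za-z]+_+[A-Za-z]+|[A-Za-z]+_+|_+[A-Za-z]+")
WHOLE_BLANK = re.compile(r"(?<![A-Za-z])_+(?![A-Za-z])")

def blank_type(sentence):
    """Classify a fill-in-the-blank sentence as 'partial-word', 'whole-word', or 'none'."""
    if PARTIAL_BLANK.search(sentence):
        return "partial-word"
    if WHOLE_BLANK.search(sentence):
        return "whole-word"
    return "none"
```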

Testing

Hi,
After fine-tuning on the downstream VideoQA datasets, how is the model run on the test set?
I'm a little confused about this point.
Thanks

Google drive verification code

Hi,

I am running the script on a remote cluster that has no gshell module, so I am using gdrive instead. Could you send me the verification code for downloading the checkpoints and data from the Google Drive address?

email: [email protected]
