vast's People

Contributors

lihanddd, txh-mercury

vast's Issues

Error about finetune_qa_msvd task (missing key 'desc' or 'caption' in descs_qa_trainval.json)

03/26/2024 19:23:18 - INFO - main - load_from_pretrained: ./output/vast/pretrain_vast/ckpt/model_step_204994.pt
03/26/2024 19:23:18 - INFO - main - Load from pretrained dir ./output/vast/pretrain_vast
03/26/2024 19:23:19 - INFO - main - Unexpected keys ['vision_encoder.text.logit_scale']
03/26/2024 19:23:19 - INFO - main - missing_keys ['vision_encoder.logit_scale']
03/26/2024 19:23:20 - INFO - main - ==================learning_rate_settings==================

03/26/2024 19:23:20 - INFO - main - basic_lr : 1e-05
03/26/2024 19:23:20 - INFO - main - clip_lr_visual : 5e-07
03/26/2024 19:23:20 - INFO - main - clip_lr_visual_len : 245
03/26/2024 19:23:20 - INFO - main - new_lr : 0
03/26/2024 19:23:20 - INFO - main - new_params_name: []
0%| | 0/5670 [00:00<?, ?it/s]Traceback (most recent call last):
File "/mnt/workspace/Project/VideoLargeModel/VAST/./run.py", line 63, in <module>
main()
File "/mnt/workspace/Project/VideoLargeModel/VAST/./run.py", line 46, in main
train(model, optimizer, train_loader, val_loaders, args.run_cfg, start_step = start_step, verbose_time=False)
File "/mnt/workspace/Project/VideoLargeModel/VAST/utils/pipeline.py", line 35, in train
for step, (name, batch) in enumerate(train_loader):
File "/mnt/workspace/Project/VideoLargeModel/VAST/data/loader.py", line 101, in __iter__
self.preload(loader_it)
File "/mnt/workspace/Project/VideoLargeModel/VAST/data/loader.py", line 112, in preload
self.batch = next(it)
File "/mnt/workspace/Project/VideoLargeModel/VAST/data/loader.py", line 48, in __iter__
batch = next(iter_)
File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/_utils.py", line 644, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mnt/workspace/Project/VideoLargeModel/VAST/data/IndexAnno.py", line 69, in __getitem__
raw_captions = anno['desc'] if 'desc' in anno else anno['caption']
KeyError: 'caption'

I am trying the VQA task on the MSVD-QA dataset.
I ran the following command and hit the error above:

python3 -m torch.distributed.launch \
--nnodes 1 \
--node_rank 0 \
--nproc_per_node 4 \
--master_port 9834 \
./run.py \
--learning_rate 1e-5 \
--checkpointing true \
--first_eval false \
--config ./config/vast/finetune_cfg/VQA-msvd.json \
--pretrain_dir $output_dir \
--save_best true \
--output_dir $output_dir/downstream/VQA-msvd

I notice that AnnoIndexedDataset requires 'desc' or 'caption' in each annotation, but msvd/descs_cap_train.json does not contain either field. How can I fix this error? Thank you.
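Before patching the dataset class, it can help to confirm which fields the annotation file actually carries. A quick sketch (the commented path is hypothetical; point it at your own copy of the file):

```python
import json
from collections import Counter

def count_fields(annos):
    """Count how often each key appears across annotation records,
    to see whether 'desc'/'caption' is ever present."""
    return Counter(key for anno in annos for key in anno)

# Hypothetical usage:
# with open("datasets/annotation/msvd/descs_qa_trainval.json") as f:
#     print(count_fields(json.load(f)))
```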

How did you get the audio for "datasets/srcdata/msrvtt/audios"?

The original msrvtt folder structure is as follows:

msrvtt
├── annotation
│ ├── MSR_VTT.json
├── high-quality
│ ├── structured-symlinks
│ │ ├── jsfusion_val_caption_idx.pkl
│ │ ├── ... many other files....
├── structured-symlinks
│ ├── jsfusion_val_caption_idx.pkl
│ ├── ... many other files....
├── videos
│ ├── all
│ │ ├── video1.mp4
│ │ ├── ....
│ │ ├── video9999.mp4
│ ├── tmp
│ │ ├──MSRVTT.zip
│ ├── vids
│ │ ├──data
│ │ │ ├── MSRVTT.zip

However, there are no audio files for msrvtt.

  1. How did you extract the audio?
     Is there a specific recipe, e.g. bitrate, sample rate, audio channels, codec?
     Or is any kind of audio file valid?

  2. Is "datasets/srcdata/msrvtt/videos" == "msrvtt/videos/all"?
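The repo does not document the extraction settings, so the sketch below is only a guess at a reasonable default: strip the audio track with ffmpeg into mono WAV. The function names and the 16 kHz sample rate are my assumptions, not a documented VAST requirement.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(video_path, out_dir, sample_rate=16000):
    """Build an ffmpeg command extracting a mono WAV from one video.

    16 kHz mono is an assumption (a common input format for audio
    encoders), not VAST's documented recipe.
    """
    out_path = Path(out_dir) / (Path(video_path).stem + ".wav")
    return [
        "ffmpeg", "-y",
        "-i", str(video_path),
        "-vn",                      # drop the video stream
        "-ac", "1",                 # mono
        "-ar", str(sample_rate),    # resample
        str(out_path),
    ]

def extract_audio(video_path, out_dir):
    # Clips without an audio track make ffmpeg exit nonzero,
    # so you may want to catch CalledProcessError for those.
    subprocess.run(build_ffmpeg_cmd(video_path, out_dir), check=True)
```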

"/data/IndexAnno.py", "VQA-msrvtt.json", and "descs_qa_trainval.json"

Hi. I am trying to finetune on MSRVTT-QA.

However, it raises an error. I can modify the code to get rid of the error, but I am not sure I understand it correctly.

Line 68 of data/IndexAnno.py,
raw_captions = anno['desc'] if 'desc' in anno else anno['caption']
raises a KeyError in the MSRVTT-QA case, simply because descs_qa_trainval.json contains neither 'desc' nor 'caption':

{"video_id": "video0", "question": "who drives down the road in an audi?", "answer": "man", "subtitle": ""},
{"video_id": "video0", "question": "what is a man doing?", "answer": "show", "subtitle": ""},
{"video_id": "video0", "question": "what is a man silently narrates his experience doing?", "answer": "drive", "subtitle": ""},
{"video_id": "video0", "question": "what is a person doing?", "answer": "drive", "subtitle": ""},
{"video_id": "video0", "question": "what is a person doing?", "answer": "tell", "subtitle": ""},
{"video_id": "video0", "question": "what is guy doing?", "answer": "drive", "subtitle": ""},
{"video_id": "video0", "question": "what is man doing?", "answer": "talk", "subtitle": ""},
{"video_id": "video0", "question": "what is the man doing?", "answer": "drive", "subtitle": ""},
{"video_id": "video0", "question": "what is a man doing?", "answer": "drive", "subtitle": ""},
{"video_id": "video0", "question": "what is shown?", "answer": "car", "subtitle": ""},
{"video_id": "video0", "question": "what is dancing?", "answer": "group", "subtitle": ""},
{"video_id": "video0", "question": "who is driving?", "answer": "man", "subtitle": ""},
{"video_id": "video0", "question": "what is a man driving?", "answer": "car", "subtitle": ""}

Can I substitute 'subtitle' for 'desc'/'caption' in data/IndexAnno.py line 68?
I am not sure, since many 'subtitle' values are empty.
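To illustrate the substitution idea: a fallback chain over the available fields could look like the sketch below. This is only one possible patch, not the authors' intended fix; the likelier root cause is that the QA task is hitting a caption-reading code path, so treat the 'question' fallback as a guess.

```python
def read_caption(anno):
    """Pick a caption-like field from one annotation record.

    QA files carry 'question'/'answer' rather than 'desc'/'caption',
    and 'subtitle' is often an empty string, so empty values are
    skipped rather than returned. This is a sketch, not the repo's
    actual logic.
    """
    for key in ("desc", "caption", "subtitle"):
        if anno.get(key):              # skips missing AND empty values
            return anno[key]
    if "question" in anno:             # QA record: no caption exists
        return anno["question"]
    raise KeyError(f"no caption-like field in {sorted(anno)}")
```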

The overall pipeline implementations of caption generation for VAST-27M

Thank you for your great contributions!

As described above, I notice that only the trained video and audio captioners are provided in this repo.
Would the authors release the implementation of the LLM part and the overall scripts for caption generation?

Any reply will be sincerely appreciated.
Best regards,

Code release

Hello
When are you planning to release your code?

Inference code

Hello.
Thanks for the awesome work and for sharing the code.

Can you please share the inference/demo code?
Thanks

Code Release Please

Hello! I have been waiting for your code for so long. When are you planning to release it?

Memory usage during validation

Hi, when the validation set is large, GPU memory usage is much higher than during training, even with a small batch size. Can you please suggest where the issue is?
Thanks
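One common cause of validation OOMs is accumulating every batch's features on the GPU (e.g. for a full retrieval similarity matrix). A minimal sketch of the mitigation, with generic placeholder `model`/`loader` rather than VAST's actual evaluation interfaces:

```python
import torch

@torch.no_grad()
def encode_offloaded(model, loader, device="cpu"):
    """Encode a validation set while keeping GPU memory bounded.

    Moving each batch's output to CPU right away bounds GPU memory
    by a single batch; similarity matrices can then be computed on
    the CPU tensors (or in chunks).
    """
    model.eval()
    feats = []
    for batch in loader:
        feats.append(model(batch.to(device)).cpu())  # offload immediately
    return torch.cat(feats)
```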

Error while captioning using single processor

Hi, I got the following error while trying to run the code on a set of videos:
Traceback (most recent call last):
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/./run.py", line 65, in <module>
main()
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/./run.py", line 58, in main
test(model, val_loaders, args.run_cfg)
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/utils/pipeline.py", line 156, in test
eval_log = evaluate_fn(model, test_loader, run_cfg, global_step=0)
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/evaluation/evaluation_mm.py", line 25, in evaluate_mm
val_log = evaluate_single(model, loader, task.split('--')[0], run_cfg, global_step,task.split('--')[1])
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/MIST/envs/vast/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/evaluation/evaluation_mm.py", line 46, in evaluate_single
cap_dict = evaluate_cap(model, task, val_loader, run_cfg, global_step, dset_name)
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/MIST/envs/vast/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/evaluation/evaluation_mm.py", line 130, in evaluate_cap
for batch in eval_loader:
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/data/loader.py", line 103, in __iter__
self.preload(loader_it)
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/data/loader.py", line 116, in preload
self.batch = next(it)
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/MIST/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/MIST/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/MIST/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/MIST/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/administrator/1b402ff7-f596-4523-a6fd-4ccdd4432680/ego4d/VAST/data/IndexAnno.py", line 68, in __getitem__
raw_captions = anno['desc'] if 'desc' in anno else anno['caption']
KeyError: 'caption'

On closer inspection of the code, I found that the batch is not being created by the data loader when running in single-GPU mode. Basically it gets one loader in the prefetched loader and does not go further. Can anyone please help me solve this issue?
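For running caption inference on your own videos, the KeyError suggests the annotation file lacks a caption-like field. Based on the traceback, IndexAnno.py only checks `'desc' in anno`, so one workaround may be to give every clip an empty 'desc'. This is an inference from the error message, not a documented annotation schema:

```python
import json
from pathlib import Path

def write_dummy_annos(video_ids, out_path):
    """Write a minimal caption-style annotation file for custom clips.

    An empty 'desc' satisfies the `'desc' in anno` check seen in the
    traceback; whether downstream code tolerates empty captions for
    pure inference is an assumption worth verifying.
    """
    annos = [{"video_id": vid, "desc": ""} for vid in video_ids]
    Path(out_path).write_text(json.dumps(annos))
    return annos
```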

License?

In the code I can find several licenses (Apache, BSD-3, MIT, ...)
Where/What is the license of this repository?
Cheers

Nice work!

It's nice work. When will the code be released?

Problem running finetuning on TGIF

I get the following error when trying to finetune on TGIF:

/github/workspace/src/video/video_reader.cc:270: [/scratch-shared/scur1914/gifs/tumblr_nqjzxszVxD1uz6id5o1_500.gif] Failed to measure duration/frame-count due to broken metadata.

Should I transform the gifs to frames?
The config file for TGIF has the vision format set to video_rawvideo. I added the following to vision_mapper.py at line 138:

if not os.path.exists(video_path): video_path = video_path.replace('.mkv', '.gif')
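Instead of (or before) extracting frames, one workaround for decord's broken-metadata failure is to re-encode the GIFs as MP4. A sketch; the ffmpeg invocation and filter settings are my choice, not from the repo:

```python
import subprocess
from pathlib import Path

def mp4_target(gif_path, out_dir):
    """Destination path for the re-encoded clip."""
    return Path(out_dir) / (Path(gif_path).stem + ".mp4")

def gif_to_mp4(gif_path, out_dir):
    """Re-encode one GIF as MP4 so decord sees sane duration metadata.

    Re-muxing through ffmpeg is a workaround, not the authors' recipe;
    the pad filter rounds odd frame dimensions up, which H.264 requires.
    """
    out = mp4_target(gif_path, out_dir)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(gif_path),
         "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",
         "-pix_fmt", "yuv420p", str(out)],
        check=True,
    )
    return out
```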

Missing config files for pretrain

Hi, thanks for the great work. In pretrain_vast.json, "run_cfg" and "model_cfg" are set to "./config/default_run_cfg.json" and "./config/newvlp/default_model_cfg.json" respectively, but I could not find these two files in ./config. Are they the same as "./config/vast/default_run_cfg.json" and "./config/vast/default_model_cfg.json"?
