yehli / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

License: Other

Python 99.19% Shell 0.81%
image-captioning video-captioning vision-and-language pretraining cross-modal-retrieval visual-question-answering tden

xmodaler's People

Contributors

jianjieluo, winnechan, yehli

xmodaler's Issues

About the test code

The library you open-sourced is very well written, but I have a small question: is there no test code provided here?

Flickr 30k feature

Dear author,

I would like to know how to get the features needed for training the retrieval model on Flickr 30k. I have downloaded the mdb features from here, but how do I convert them to npy/npz files?

Thanks!

Question about losses

These days I'm trying to train the model with multiple losses, and I intend to assign weights to the losses, but I couldn't find where to set the weights. It seems your code can only add them up directly without weighting, so I really hope you can add this small feature to your code.
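A minimal sketch of the requested weighting, assuming losses_dict maps loss names to scalar tensors as in DefaultTrainer.run_step, and a hypothetical loss_weights mapping that is not an existing config entry:

    # Hypothetical helper: weighted sum instead of the current plain sum of losses.
    def weighted_total_loss(losses_dict, loss_weights):
        total = 0.0
        for name, value in losses_dict.items():
            # Losses without an explicit weight keep weight 1.0 (current behavior).
            total = total + loss_weights.get(name, 1.0) * value
        return total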

BTW, there's a tiny mistake in the DefaultTrainer class:

  • xmodaler/engine/defaults.py, line 523
    should be:
    self._write_metrics(losses_dict, data_time)
    instead of 'loss_dict'

OSError: [Errno 12] Cannot allocate memory

When I run the up-down config, training proceeds fine, but when I run the xlan config, errors occur during training.
I changed num_workers=0, but the error still exists.
Thanks for your reply.
Error:
eta: 3:10:12 iter: 800 total_loss: 2.673 time: 1.4086 data_time: 0.4650 lr: 2.5031e-05 max_mem: 2680M
Traceback (most recent call last):
File "train_net.py", line 71, in
args=(args,),
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/launch.py", line 83, in launch
daemon=False,
File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/launch.py", line 129, in _distributed_worker
main_func(*args)
File "/data1/wlx/project2021/xmodaler-master/train_net.py", line 59, in main
return trainer.train()
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/defaults.py", line 365, in train
super().train(self.start_iter, self.max_iter)
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/train_loop.py", line 152, in train
self.after_step()
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/train_loop.py", line 182, in after_step
h.after_step()
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/hooks.py", line 407, in after_step
self._do_eval(epoch)
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/hooks.py", line 372, in _do_eval
results = self._func(epoch)
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/defaults.py", line 324, in test_and_save_results
eval_results = self.test(self.cfg, self.model, self.test_data_loader, self.test_evaluator, epoch)
File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/defaults.py", line 478, in test
eval_res = evaluator.eval(results, epoch)
File "/data1/wlx/project2021/xmodaler-master/xmodaler/evaluation/coco_evaler.py", line 44, in eval
cocoEval.evaluate()
File "/data1/wlx/project2021/xmodaler-master/cococaption/pycocoevalcap/eval.py", line 38, in evaluate
gts = tokenizer.tokenize(gts)
File "/data1/wlx/project2021/xmodaler-master/cococaption/pycocoevalcap/tokenizer/ptbtokenizer.py", line 55, in tokenize
stdout=subprocess.PIPE)
File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/subprocess.py", line 1482, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

Predict on a given raw input

Dear contributors, is there a simple way I can directly predict results on my own customized input? For example, in video captioning, I want to generate a caption for my own video, which is in MP4 format. Is there a simple way to do it? Thanks.
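No official script for raw MP4 input is mentioned here; one possible pipeline is to extract per-frame CNN features yourself and feed them through the usual feature-based dataloader. A rough sketch, assuming a torchvision ResNet for the features (the file layout and keys xmodaler expects are assumptions to verify against its video datasets):

    import cv2
    import numpy as np
    import torch
    import torchvision.models as models

    # Uniformly sample frames from an MP4 and extract 2048-d pooled ResNet features.
    # ImageNet mean/std normalization is omitted here for brevity.
    def extract_video_feature(path, num_frames=20):
        cap = cv2.VideoCapture(path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        idxs = np.linspace(0, total - 1, num_frames).astype(int)
        backbone = models.resnet152(pretrained=True)
        backbone.fc = torch.nn.Identity()  # drop the classifier, keep pooled features
        backbone.eval()
        feats = []
        for i in idxs:
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
            ok, frame = cap.read()
            if not ok:
                continue
            x = cv2.resize(frame, (224, 224))[:, :, ::-1].copy()  # BGR -> RGB
            x = torch.from_numpy(x).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            with torch.no_grad():
                feats.append(backbone(x).squeeze(0).numpy())
        cap.release()
        return np.stack(feats)  # (num_frames, 2048); save with np.save for the loader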

About the extracted features

Hi,

Thank you very much for your great code. May I ask if you could provide the extracted and pre-processed features (the .npz files) for datasets such as COCO, VCR, and MSR-VTT?

Thank you very much!

Some errors caused by randomness in training

Hello, I use my own data for image captioning training on 4 GPUs with num_workers=0. Most of the time, training and validation run normally.

In rare cases, the following errors are encountered. We suspect they are caused by distributed training making the tags in the scoring process mismatch.

Hope to get your reply.

Thanks so much.

[screenshot of the error]

Evaluation

Hi
I am using the pretrained lstmA3 model. After loading the model successfully and obtaining the dataloader, I am trying to evaluate the model on MSCOCO. I have followed the steps mentioned in the documentation, but there seems to be some problem with it. Here is the code:

    from xmodaler.evaluation.coco_evaler import COCOEvaler

    evaluator = COCOEvaler(cfg, annfile="/content/captions_val2014.json", output_dir="/content/outputs")
    print(type(dataloader))
    model.eval()
    with torch.no_grad():
        for batch in dataloader:
            print(batch[0].keys())
            print(batch[0]["ATTRIBUTE"].shape)
            output = model(batch)
            break

[screenshot of the error]

I am able to obtain the dataloader from build_xmodaler_valtest_loader. A batch of this loader contains a list of dictionaries.

The features of the provided msvd dataset are incomplete

The MSVD dataset contains 1970 video clips, split into 1200 (train), 100 (val), and 670 (test).
However, in what you provided on Google Drive, the number of video features and annotations is only 1200 + 97 + 670, which misses 3 samples.

Moreover, the annotations of the train set are not provided, and we cannot get the mapping from original video id to new video id, e.g., -4wsuPCjDBc_5_15 to '0'.

Some question about lr and warm up

Hello, after I accidentally interrupted training, I used --resume to resume it, but I found that the learning rate was not restored. How can I correct this?
In addition, which parameters control when the warm-up ends?
Thanks a lot!
[screenshot]

Some questions about your CVPR 2022 paper

Dear author,
I have two questions about your latest CVPR 2022 paper (Comprehending and Ordering Semantics for Image Captioning).
[figure from the paper]
1. What do you mean by "slot" in Figure 2, and how does it work?
2. How can I get the "Training Sentences" in Figure 2? Is it your own dataset or the COCO dataset? Could you show me your preprocessing code?

Using my own dataset with CosNet

Hello! Thank you very much for your work. Some questions came up when training CosNet on a new dataset:
1. How was the CLIP_RN101_49 feature used in `feature` obtained? Can I produce it myself with CLIP's encoders?
2. In the CosNet pkl files, what do 'attr_pred', 'attr_labels', and 'missing_labels' mean? Do you have plans to upload the preprocessing code?
Thank you very much!

G_TOKENS_IDS

Hi,

I am trying to create a dataloader for a custom dataset and was able to build features using the bottom-up attention model. My current assumption is that G_TARGET_IDS are the token ids for the target text; however, I am unsure what G_TOKENS_IDS should be.
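In similar captioning codebases the two arrays are the same sentence offset by one step: the decoder input starts with a BOS token and the target is the sentence followed by EOS, padded where the loss should be ignored. A sketch of that convention (an assumption about xmodaler, worth verifying against xmodaler/datasets/images/mscoco.py):

    import numpy as np

    # Assumed convention: G_TOKENS_IDS = decoder input, G_TARGET_IDS = shifted target.
    # The bos_id/eos_id/pad_id values are hypothetical placeholders.
    def build_token_arrays(sentence_ids, max_len, bos_id=0, eos_id=0, pad_id=-1):
        tokens = np.zeros(max_len, dtype=np.int64)
        targets = np.full(max_len, pad_id, dtype=np.int64)
        n = min(len(sentence_ids), max_len - 1)
        tokens[0] = bos_id
        tokens[1:1 + n] = sentence_ids[:n]
        targets[:n] = sentence_ids[:n]
        targets[n] = eos_id  # learn to stop after the sentence
        return tokens, targets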

How to preprocess the annotations of given raw video

Dear author, I want to train your released model on other captioning datasets, but I only have the captions and video names of the raw videos. How can I generate the processed json and pickle files (i.e., captions_val.json, msrvtt_caption_anno_train.pkl)? Could you provide the official preprocessing code?
[screenshot]

TypeError: list indices must be integers or slices, not numpy.str_

When training with cosnet_rl.yaml, this error occurs:
......
(predictor): BasePredictor(
(logits): Linear(in_features=512, out_features=10200, bias=True)
(dropout): Dropout(p=0.5, inplace=False)
)
(greedy_decoder): GreedyDecoder()
(beam_searcher): BeamSearcher()
)
[09/08 16:48:14] xl.datasets.common INFO: Serializing 113287 elements to byte tensors and concatenating them all ...
[09/08 16:48:23] xl.datasets.common INFO: Serialized dataset takes 162.52 MiB
[09/08 16:48:23] xl.datasets.common INFO: Serializing 5000 elements to byte tensors and concatenating them all ...
[09/08 16:48:23] xl.datasets.common INFO: Serialized dataset takes 2.68 MiB
[09/08 16:48:23] xl.datasets.common INFO: Serializing 5000 elements to byte tensors and concatenating them all ...
[09/08 16:48:23] xl.datasets.common INFO: Serialized dataset takes 2.68 MiB
[09/08 16:48:39] fvcore.common.checkpoint INFO: [Checkpointer] Loading from /home/stormai/userfile/caoshan/code/xmodaler/ModelResult/xe/cosnet_xe.pth ...
[09/08 16:48:44] xl.engine.train_loop INFO: Starting training from iteration 0
[09/08 16:48:45] xl.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/engine/train_loop.py", line 151, in train
self.run_step()
File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/engine/rl_mean_trainer.py", line 47, in run_step
bs_rewards = self.scorer(bs_outputs_dict)
File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/scorer/base_scorer.py", line 66, in call
gts = [self.gts[i] for i in ids]
File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/scorer/base_scorer.py", line 66, in
gts = [self.gts[i] for i in ids]
TypeError: list indices must be integers or slices, not numpy.str_

I tried to fix this by forcing ids and gts to be converted to lists, but it doesn't work. Could you please help me?
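From the traceback, `ids` holds numpy strings while `self.gts` is a plain Python list, so indexing fails. One possible workaround at base_scorer.py line 66, assuming the ids are numeric strings (a guess from the traceback, not a confirmed fix):

    # Cast each id to int before using it as a list index.
    gts = [self.gts[int(i)] for i in ids]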

top k sentences

Hi, this is some really great work.
Is there a way to generate multiple sentence outputs (top-k results) for a given video clip with the MSR-VTT models? Thanks in advance.

During training, I got this RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

I used train_net.py, but some problems appeared. First I got:

    line 68, in forward
        wt.index_copy_(0, ind, torch.multinomial(prob_prev, 1).view(-1).index_select(0, ind))
    RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

but after I cleared the memory and re-ran train_net.py, it worked until the 10th epoch. In the 10th epoch, I got:

    File "D:\llk\xmodaler-master\xmodaler\modeling\meta_arch\rnn_att_enc_dec.py", line 68, in forward
        wt.index_copy_(0, ind, torch.multinomial(prob_prev, 1).view(-1).index_select(0, ind))
    RuntimeError: probability tensor contains either inf, nan or element < 0

Can somebody help me? 😥
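A common mitigation is to sanitize the distribution before sampling; a sketch of what line 68 could do instead (an assumption-level workaround, not the authors' fix; the root cause is usually NaN/inf logits from unstable training):

    import torch

    # Zero out NaN/inf entries and renormalize so multinomial gets a valid distribution.
    prob_prev = torch.nan_to_num(prob_prev, nan=0.0, posinf=0.0, neginf=0.0).clamp(min=0)
    prob_prev = prob_prev / prob_prev.sum(dim=-1, keepdim=True).clamp(min=1e-12)
    wt.index_copy_(0, ind, torch.multinomial(prob_prev, 1).view(-1).index_select(0, ind))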


Some questions about relation_file, attribute_file and gv_feat_file for COCO

Thank you for your contribution. I read the data code for COCO carefully, and I am a little confused.
As for ext_data, how can I get these data? Looking forward to your reply.

    ext_data = {
        "relation": _load_pkl_file(self.relation_file),
        "attribute": _load_pkl_file(self.attribute_file),
        "gv_feat": _load_pkl_file(self.gv_feat_file)
    }

mscoco_train_cider.pkl

Thank you for the awesome multi-modal code!
If I want to train on my own data, how can I generate mscoco_train_cider.pkl and mscoco_train_gts.pkl?
Look forward to hearing from you soon.
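The schemas are not documented here. From the traceback in the cosnet_rl issue above, BaseScorer indexes self.gts as a plain list by integer id; a sketch of building a gts pickle under that assumption (the per-entry format and tokenization are guesses to verify against xmodaler/scorer/base_scorer.py, and mscoco_train_cider.pkl additionally caches CIDEr n-gram statistics):

    import pickle

    # caps_by_id: hypothetical {int id: [reference caption strings]} for your data.
    gts = [[c.lower().strip().split() for c in caps_by_id[i]]
           for i in range(len(caps_by_id))]
    with open("my_train_gts.pkl", "wb") as f:
        pickle.dump(gts, f)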

MSRVTT input json format

Dear Author

Thanks for your great work! I have a question about this file:
https://github.com/YehLi/xmodaler/blob/master/tools/msrvtt_preprocess.py

I followed line 4 of the above file to build the JSON file, but encountered the following error:

parsed input parameters:
{
"input_json": "./ourds_xmodaler/ourds_description_xm.json",
"output_dir": "./ourds_xmodaler/preprocessed_files/",
"max_length": 40,
"word_count_threshold": 5
}
Traceback (most recent call last):
File "/Users/jackwu/PycharmProjects/nba_gametracker_scraper/msrvtt_preprocess.py", line 264, in
main(params)
File "/Users/jackwu/PycharmProjects/nba_gametracker_scraper/msrvtt_preprocess.py", line 224, in main
sentences, videos, vid2split = load_list(params['input_json'])
File "/Users/jackwu/PycharmProjects/nba_gametracker_scraper/msrvtt_preprocess.py", line 212, in load_list
data = json.load(open(f, 'r'))
IsADirectoryError: [Errno 21] Is a directory: '.'

Could you please specify the format for constructing the JSON file for a new dataset?

Thanks!!!


Some doubts regarding executing the scripts

I was following the instructions on this page about how to use the builtin MSCOCO dataset for image captioning. I have doubts about the following points:

  1. In which directory should I have my images and annotations?
  2. I was trying to run the tools/create_feats.py script to convert karpathy_train_resnet101_faster_rcnn_genome.tsv.0 into .npz format; however, I am running out of space on Colab Pro. What could be the reason for this, and is there any fix available?

Request for extracted features for MSVD and MSR-VTT

Dear authors,
Can you share your extracted features for MSVD and MSR-VTT, and the settings used to extract them?
Btw, I also want to extract these features from videos myself, so it would be great if you could share the extraction scripts in this repo!
Thanks!

How to extract a global video feature based on butd?

I notice that butd outputs an .npz file corresponding to a single image.
When I want to do video captioning with xmodaler, it requires a global video feature.

How do I build the final video feature from the butd outputs of multiple frames?

On the MSR-VTT dataset, I tried using the top-N objects voted across frames extracted uniformly from a video, but the captions are of poor quality: BLEU on the validation and test sets only reaches 0.6, and there are many <UNK> tokens in the captions.
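One common heuristic is to mean-pool the per-frame region features into a single vector; a sketch (a standard trick, not the authors' documented pipeline, and the "features" key is an assumption about the butd .npz layout):

    import glob
    import numpy as np

    # Average region features within each frame, then across frames.
    frame_files = sorted(glob.glob("frames_of_video_0/*.npz"))
    per_frame = [np.load(f)["features"].mean(axis=0) for f in frame_files]
    gv_feat = np.stack(per_frame).mean(axis=0)  # global video feature, e.g. (2048,)
    np.save("video_0_gv_feat.npy", gv_feat)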


No such file or directory: '../open_source_dataset/mscoco_dataset/features/up_down\\313724.npz'

python train_net.py --num-gpus 1 --config-file configs/image_caption/updown/updown.yaml


SCORER:
  CIDER_CACHED: ../open_source_dataset/mscoco_dataset/mscoco_train_cider.pkl
  EOS_ID: 0
  GT_PATH: ../open_source_dataset/mscoco_dataset/mscoco_train_gts.pkl
  NAME: BaseScorer
  TYPES: ['Cider']
  WEIGHTS: [1.0]
SEED: -1
SOLVER:
  ALPHA: 0.99
  AMSGRAD: False
  BASE_LR: 0.0005
  BETAS: [0.9, 0.999]
  BIAS_LR_FACTOR: 1.0
  CENTERED: False
  CHECKPOINT_PERIOD: 1
  DAMPENING: 0.0
  EPOCH: 30
  EPS: 1e-08
  EVAL_PERIOD: 1
  GRAD_CLIP: 0.1
  GRAD_CLIP_TYPE: value
  INITIAL_ACCUMULATOR_VALUE: 0.0
  LR_DECAY: 0.0
  MOMENTUM: 0.9
  NAME: Adam
  NESTEROV: 0.0
  NORM_TYPE: 2.0
  WEIGHT_DECAY: 0.0
  WEIGHT_DECAY_BIAS: 0.0
  WEIGHT_DECAY_NORM: 0.0
  WRITE_PERIOD: 20
VERSION: 1
[09/27 19:21:02 xmodaler]: Full config saved to ./output\config.yaml
[09/27 19:21:02 xl.utils.env]: Using a generated random seed 2862719
[09/27 19:21:04 xl.engine.defaults]: Model:
RnnAttEncoderDecoder(
  (token_embed): TokenBaseEmbedding(
    (embeddings): Embedding(10200, 1024)
    (embeddings_act): ReLU()
    (embeddings_dropout): Dropout(p=0.5, inplace=False)
  )
  (visual_embed): VisualBaseEmbedding(
    (embeddings): Linear(in_features=2048, out_features=1024, bias=True)
    (embeddings_act): ReLU()
    (embeddings_dropout): Dropout(p=0.5, inplace=False)
  )
  (encoder): UpDownEncoder()
  (decoder): UpDownDecoder(
    (lstm1): LSTMCell(3072, 1024)
    (lstm2): LSTMCell(2048, 1024)
    (att): BaseAttention(
      (w_h): Linear(in_features=1024, out_features=512, bias=False)
      (act): Tanh()
      (w_alpha): Linear(in_features=512, out_features=1, bias=False)
      (softmax): Softmax(dim=-1)
    )
    (p_att_feats): Linear(in_features=1024, out_features=512, bias=True)
  )
  (predictor): BasePredictor(
    (logits): Linear(in_features=1024, out_features=10200, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
  )
  (greedy_decoder): GreedyDecoder()
  (beam_searcher): BeamSearcher()
)
[09/27 19:21:05 xl.datasets.common]: Serializing 113287 elements to byte tensors and concatenating them all ...
[09/27 19:21:06 xl.datasets.common]: Serialized dataset takes 115.74 MiB
[09/27 19:21:06 xl.datasets.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[09/27 19:21:06 xl.datasets.common]: Serialized dataset takes 0.17 MiB
[09/27 19:21:06 xl.datasets.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[09/27 19:21:06 xl.datasets.common]: Serialized dataset takes 0.17 MiB
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
loading annotations into memory...
Done (t=0.07s)
creating index...
index created!
[09/27 19:21:16 fvcore.common.checkpoint]: No checkpoint found. Initializing model from scratch
[09/27 19:21:16 xl.engine.train_loop]: Starting training from iteration 0
ERROR [09/27 19:21:16 xl.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "D:\xmodaler\xmodaler\engine\train_loop.py", line 151, in train
    self.run_step()
  File "D:\xmodaler\xmodaler\engine\defaults.py", line 496, in run_step
    data = next(self._train_data_loader_iter)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\dataloader.py", line 1225, in _process_data
    data.reraise()
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\_utils.py", line 429, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\_utils\worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\xmodaler\xmodaler\datasets\common.py", line 42, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "D:\xmodaler\xmodaler\datasets\images\mscoco.py", line 103, in __call__
    content = read_np(feat_path)
  File "D:\xmodaler\xmodaler\functional\func_io.py", line 22, in read_np
    content = np.load(path)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\numpy\lib\npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '../open_source_dataset/mscoco_dataset/features/up_down\\369199.npz'

[09/27 19:21:16 xl.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[09/27 19:21:16 xl.utils.events]:  iter: 0    lr: N/A  max_mem: 204M
Traceback (most recent call last):
  File "train_net.py", line 68, in <module>
    args=(args,),
  File "D:\xmodaler\xmodaler\engine\launch.py", line 86, in launch
    main_func(*args)
  File "train_net.py", line 56, in main
    return trainer.train()
  File "D:\xmodaler\xmodaler\engine\defaults.py", line 365, in train
    super().train(self.start_iter, self.max_iter)
  File "D:\xmodaler\xmodaler\engine\train_loop.py", line 151, in train
    self.run_step()
  File "D:\xmodaler\xmodaler\engine\defaults.py", line 496, in run_step
    data = next(self._train_data_loader_iter)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\dataloader.py", line 1225, in _process_data
    data.reraise()
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\_utils.py", line 429, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\_utils\worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\xmodaler\xmodaler\datasets\common.py", line 42, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "D:\xmodaler\xmodaler\datasets\images\mscoco.py", line 103, in __call__
    content = read_np(feat_path)
  File "D:\xmodaler\xmodaler\functional\func_io.py", line 22, in read_np
    content = np.load(path)
  File "C:\ProgramData\Anaconda3\envs\xmodaler\lib\site-packages\numpy\lib\npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '../open_source_dataset/mscoco_dataset/features/up_down\\369199.npz'
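The doubled backslash is just Windows path joining; the real issue is a missing feature file. A quick sanity check that every annotated image id has a corresponding .npz (paths mirror the config above; the "image_id" key of the annotation pickle is an assumption):

    import os
    import pickle

    feat_dir = "../open_source_dataset/mscoco_dataset/features/up_down"
    with open("../open_source_dataset/mscoco_dataset/mscoco_caption_anno_train.pkl", "rb") as f:
        annos = pickle.load(f)

    # List every annotated image whose feature file was never extracted.
    missing = [a["image_id"] for a in annos
               if not os.path.exists(os.path.join(feat_dir, "{}.npz".format(a["image_id"])))]
    print(len(missing), "missing, e.g.:", missing[:10])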

Using different learning rates for different parameters

Is it possible to set different parameter sets and different learning rates for the optimizer?

Such as:

    backbone_params = list(map(id, model.backbone.parameters()))
    base_params = filter(lambda p: id(p) not in backbone_params, model.parameters())
    optimizer = optim.Adam([{'params': base_params},
                            {'params': model.backbone.parameters(), 'lr': 0.1 * cfg.learning_rate}],
                           lr=cfg.learning_rate)

If so, where should the changes be made? And is there any other more direct way to achieve this requirement?

Thanks a lot.
Waiting for your reply~

Download inquiry

Dear author, I want to train the updown model, but I can't find the .npy file. Where can I download it?


How do I run inference on my own dataset?

First of all, thank you for your work. I am trying to use the MSVD pretrained model weights to run video captioning inference on a dataset I prepared myself.

I first used resnet152 to extract high-dimensional features of my own dataset, then assembled them into npy files.
[screenshot]
Then, following caption_test.json of the MSVD dataset, I filled in the corresponding ids and file_names of my own dataset (since I only need inference and no evaluation, the annotations are empty).
[screenshot]

Then I put the correct paths into config/video_caption/msvd/base_caption.yaml, but now it fails with an error that the file 1297.npy does not exist.
So my question is: why, after I modified caption_test.json, does it still not read the data in my caption_test.json?

Problems when using my own dataset

Hello, I am a beginner. When using my own dataset, how do I generate a file like mscoco_caption_anno_train.pkl? train_net.py seems to require such a .pkl file.

Downloading the model

Hi. The instructions for downloading the model seem a little difficult to understand. What do I do with the .rpm file that gets downloaded?

Image to text search using clip

Hi, dear author, in your latest CVPR 2022 paper (Comprehending and Ordering Semantics for Image Captioning), how do you retrieve semantically similar sentences for the input image using the CLIP model? Can you give some tutorials?
Thanks a lot!
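A minimal sketch of image-to-text retrieval with OpenAI's clip package (a generic recipe using the paper's RN101 backbone, not necessarily the authors' exact procedure):

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("RN101", device=device)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
    sentences = ["a dog runs on the grass", "a man rides a horse"]  # candidate corpus
    text = clip.tokenize(sentences).to(device)

    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(text)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        sims = (img_f @ txt_f.T).squeeze(0)  # cosine similarity to each sentence

    # Keep the top-k most similar sentences for the image.
    top = sims.topk(k=min(2, len(sentences)))
    print([sentences[i] for i in top.indices.tolist()])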

Ensemble Test

Hi, I have been using xmodaler recently. How can I do a model-ensemble test for image captioning?
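Ensembling is not documented here; a generic sketch of averaging per-step word distributions across several trained models (an assumption-level recipe, not an xmodaler API; step_logits is a hypothetical per-model decoding hook):

    import torch

    # Average the next-word distributions of several models at each decode step.
    def ensemble_step_probs(models, step_inputs):
        probs = [torch.softmax(m.step_logits(step_inputs), dim=-1) for m in models]
        return torch.stack(probs).mean(dim=0)  # feed into greedy/beam search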

VCR / VQA features Inquire

How can I get the following two files?
eg:
../open_source_dataset/VCR/features/up_down
../open_source_dataset/VQA/features/up_down_fix100

Thanks!

Captions on New Dataset

I'm trying to get captions from a pre-trained model on a new dataset (CrisisMMD) and was wondering whether I need to extract the image features from the raw images first. It looks as though "kfg" utilizes features and their locations rather than raw images, as I saw in mscoco.py:
[screenshot of mscoco.py]

Could I use a model like detectron2 to get these features, if that is what is needed? Or what would be the optimal way of doing this? Thank you in advance!
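Yes, the builtin loader consumes precomputed region features, so any detector (detectron2 included) should work if you export one .npz per image in the layout mscoco.py reads; the key names below are assumptions to verify against that file and tools/create_feats.py:

    import numpy as np

    # Hypothetical per-image layout: region features plus their boxes,
    # saved under the image id that feat_path in mscoco.py will look up.
    np.savez_compressed(
        "features/up_down/123456.npz",
        features=np.random.randn(36, 2048).astype(np.float32),  # one row per region
        boxes=np.zeros((36, 4), dtype=np.float32),              # x1, y1, x2, y2
    )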

Preprocessing msrvtt dataset

Dear authors, I tried to use https://github.com/YehLi/xmodaler/blob/master/tools/msrvtt_preprocess.py to preprocess the MSR-VTT input json file videodatainfo_2016.json, which is given in your Google Drive. I expected the output files to be the same as the preprocessed MSR-VTT files in your Google Drive folder: https://drive.google.com/drive/folders/1U1692bJYZ6geqC-Kkn-TR8eBBD8tq93q. But in fact they differ in two ways.

First, the output test files I preprocessed (msrvtt_caption_anno_test.pkl and captions_test_cocostyle.json) are empty. Second, the sizes of the train and validation files I preprocessed differ from those in your Google Drive. For example, the msrvtt_caption_anno_train.pkl I generated from videodatainfo_2016.json with msrvtt_preprocess.py has length 130260, while the preprocessed msrvtt_caption_anno_train.pkl in your Google Drive has length 6513. So when I ran xmodaler on the MSR-VTT files preprocessed this way, I got the following error during validation:

Traceback (most recent call last):
File "./xmodaler/train_net.py", line 68, in
args=(args,),
File "/content/xmodaler/xmodaler/engine/launch.py", line 86, in launch
main_func(*args)
File "./xmodaler/train_net.py", line 56, in main
return trainer.train()
File "/content/xmodaler/xmodaler/engine/defaults.py", line 365, in train
super().train(self.start_iter, self.max_iter)
File "/content/xmodaler/xmodaler/engine/train_loop.py", line 152, in train
self.after_step()
File "/content/xmodaler/xmodaler/engine/train_loop.py", line 182, in after_step
h.after_step()
File "/content/xmodaler/xmodaler/engine/hooks.py", line 407, in after_step
self._do_eval(epoch)
File "/content/xmodaler/xmodaler/engine/hooks.py", line 372, in _do_eval
results = self._func(epoch)
File "/content/xmodaler/xmodaler/engine/defaults.py", line 328, in val_and_save_results
eval_results = self.test(self.cfg, self.model, self.val_data_loader, self.val_evaluator, epoch)
File "/content/xmodaler/xmodaler/engine/defaults.py", line 480, in test
eval_res = evaluator.eval(results, epoch)
File "/content/xmodaler/xmodaler/evaluation/coco_evaler.py", line 44, in eval
cocoEval.evaluate()
File "/usr/local/lib/python3.7/dist-packages/pycocoevalcap/eval.py", line 53, in evaluate
score, scores = scorer.compute_score(gts, res)
File "/usr/local/lib/python3.7/dist-packages/pycocoevalcap/bleu/bleu.py", line 33, in compute_score
assert(len(hypo) == 1)
AssertionError

I guess this error arises from these two files: videodatainfo_2016.json and msrvtt_preprocess.py. I would appreciate it if you could help me with this issue. Thank you very much!

Possible implementation in Google Colab.

I need to generate a caption for a video; what would be the easiest way to do that? I don't need to train the model, just use it to generate the caption.
I haven't found a working example in the documentation.
Would it be possible to provide a Colab notebook ready to use?
