Comments (8)

ArrowLuo commented on August 25, 2024

Hi @dawnlh, would you provide your log.txt here? I cannot locate the problem from the command alone.

from univl.

dawnlh commented on August 25, 2024

> Hi @dawnlh, would you provide your log.txt here? I cannot locate the problem from the command alone.

Thanks a lot! Here is the log file:

2021-05-25 11:15:57,643:INFO: Effective parameters:
2021-05-25 11:15:57,644:INFO:   <<< batch_size: 256
2021-05-25 11:15:57,644:INFO:   <<< batch_size_val: 32
2021-05-25 11:15:57,644:INFO:   <<< bert_model: bert-base-uncased
2021-05-25 11:15:57,644:INFO:   <<< cache_dir: 
2021-05-25 11:15:57,644:INFO:   <<< coef_lr: 0.1
2021-05-25 11:15:57,644:INFO:   <<< cross_model: cross-base
2021-05-25 11:15:57,644:INFO:   <<< cross_num_hidden_layers: 2
2021-05-25 11:15:57,644:INFO:   <<< data_path: data/msrvtt/MSRVTT_data.json
2021-05-25 11:15:57,644:INFO:   <<< datatype: msrvtt
2021-05-25 11:15:57,644:INFO:   <<< decoder_model: decoder-base
2021-05-25 11:15:57,644:INFO:   <<< decoder_num_hidden_layers: 3
2021-05-25 11:15:57,644:INFO:   <<< do_eval: True
2021-05-25 11:15:57,644:INFO:   <<< do_lower_case: True
2021-05-25 11:15:57,644:INFO:   <<< do_pretrain: False
2021-05-25 11:15:57,644:INFO:   <<< do_train: False
2021-05-25 11:15:57,644:INFO:   <<< epochs: 20
2021-05-25 11:15:57,644:INFO:   <<< feature_framerate: 1
2021-05-25 11:15:57,644:INFO:   <<< features_path: data/msrvtt/msrvtt_videos_features.pickle
2021-05-25 11:15:57,644:INFO:   <<< fp16: False
2021-05-25 11:15:57,644:INFO:   <<< fp16_opt_level: O1
2021-05-25 11:15:57,644:INFO:   <<< gradient_accumulation_steps: 1
2021-05-25 11:15:57,644:INFO:   <<< hard_negative_rate: 0.5
2021-05-25 11:15:57,644:INFO:   <<< init_model: weight/univl.pretrained.bin
2021-05-25 11:15:57,644:INFO:   <<< local_rank: 0
2021-05-25 11:15:57,644:INFO:   <<< lr: 0.0001
2021-05-25 11:15:57,644:INFO:   <<< lr_decay: 0.9
2021-05-25 11:15:57,644:INFO:   <<< margin: 0.1
2021-05-25 11:15:57,644:INFO:   <<< max_frames: 100
2021-05-25 11:15:57,644:INFO:   <<< max_words: 20
2021-05-25 11:15:57,644:INFO:   <<< min_time: 5.0
2021-05-25 11:15:57,645:INFO:   <<< n_display: 100
2021-05-25 11:15:57,645:INFO:   <<< n_gpu: 1
2021-05-25 11:15:57,645:INFO:   <<< n_pair: 1
2021-05-25 11:15:57,645:INFO:   <<< negative_weighting: 1
2021-05-25 11:15:57,645:INFO:   <<< num_thread_reader: 4
2021-05-25 11:15:57,645:INFO:   <<< output_dir: ckpts/ckpt_msrvtt_caption
2021-05-25 11:15:57,645:INFO:   <<< sampled_use_mil: False
2021-05-25 11:15:57,645:INFO:   <<< seed: 42
2021-05-25 11:15:57,645:INFO:   <<< stage_two: True
2021-05-25 11:15:57,645:INFO:   <<< task_type: caption
2021-05-25 11:15:57,645:INFO:   <<< text_num_hidden_layers: 12
2021-05-25 11:15:57,645:INFO:   <<< train_csv: data/youcookii_singlef_train.csv
2021-05-25 11:15:57,645:INFO:   <<< use_mil: False
2021-05-25 11:15:57,645:INFO:   <<< val_csv: data/msrvtt/MSRVTT_JSFUSION_test.csv
2021-05-25 11:15:57,645:INFO:   <<< video_dim: 1024
2021-05-25 11:15:57,645:INFO:   <<< visual_model: visual-base
2021-05-25 11:15:57,645:INFO:   <<< visual_num_hidden_layers: 6
2021-05-25 11:15:57,645:INFO:   <<< warmup_proportion: 0.1
2021-05-25 11:15:57,645:INFO:   <<< world_size: 1
2021-05-25 11:15:57,646:INFO: device: cuda:0 n_gpu: 1
2021-05-25 11:15:57,646:INFO: loading vocabulary file /data2/zzh/project/SCI_caption/UniVL/modules/bert-base-uncased/vocab.txt
2021-05-25 11:15:58,017:INFO: loading archive file /data2/zzh/project/SCI_caption/UniVL/modules/bert-base-uncased
2021-05-25 11:15:58,018:INFO: Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

2021-05-25 11:15:58,018:INFO: loading archive file /data2/zzh/project/SCI_caption/UniVL/modules/visual-base
2021-05-25 11:15:58,018:INFO: Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 1,
  "type_vocab_size": 2,
  "vocab_size": 1024
}

2021-05-25 11:15:58,018:INFO: Weight doesn't exsits. /data2/zzh/project/SCI_caption/UniVL/modules/visual-base/visual_pytorch_model.bin
2021-05-25 11:15:58,018:INFO: loading archive file /data2/zzh/project/SCI_caption/UniVL/modules/cross-base
2021-05-25 11:15:58,018:INFO: Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 1024,
  "num_attention_heads": 12,
  "num_hidden_layers": 2,
  "type_vocab_size": 2,
  "vocab_size": 768
}

2021-05-25 11:15:58,018:INFO: Weight doesn't exsits. /data2/zzh/project/SCI_caption/UniVL/modules/cross-base/cross_pytorch_model.bin
2021-05-25 11:15:58,018:INFO: loading archive file /data2/zzh/project/SCI_caption/UniVL/modules/decoder-base
2021-05-25 11:15:58,019:INFO: Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_target_embeddings": 512,
  "num_attention_heads": 12,
  "num_decoder_layers": 1,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

2021-05-25 11:15:58,019:INFO: Weight doesn't exsits. /data2/zzh/project/SCI_caption/UniVL/modules/decoder-base/decoder_pytorch_model.bin
2021-05-25 11:15:58,019:WARNING: Stage-One:False, Stage-Two:True
2021-05-25 11:15:58,019:WARNING: Set bert_config.num_hidden_layers: 12.
2021-05-25 11:15:59,122:WARNING: Set visual_config.num_hidden_layers: 6.
2021-05-25 11:15:59,591:WARNING: Set cross_config.num_hidden_layers: 2.
2021-05-25 11:15:59,763:WARNING: Set decoder_config.num_decoder_layers: 3.
2021-05-25 11:16:02,843:INFO: --------------------
2021-05-25 11:16:02,843:INFO: Weights from pretrained model not used in UniVL: 
   cls.predictions.bias
   cls.predictions.transform.dense.weight
   cls.predictions.transform.dense.bias
   cls.predictions.transform.LayerNorm.weight
   cls.predictions.transform.LayerNorm.bias
   cls.predictions.decoder.weight
   cls_visual.predictions.weight
   cls_visual.predictions.bias
   cls_visual.predictions.transform.dense.weight
   cls_visual.predictions.transform.dense.bias
   cls_visual.predictions.transform.LayerNorm.weight
   cls_visual.predictions.transform.LayerNorm.bias
   similarity_pooler.dense.weight
   similarity_pooler.dense.bias
2021-05-25 11:16:10,136:INFO: ***** Running test *****
2021-05-25 11:16:10,136:INFO:   Num examples = 2990
2021-05-25 11:16:10,136:INFO:   Batch size = 32
2021-05-25 11:16:10,136:INFO:   Num steps = 94
2021-05-25 11:23:31,867:INFO: >>>  BLEU_1: 0.1410, BLEU_2: 0.0450, BLEU_3: 0.0142, BLEU_4: 0.0052
2021-05-25 11:23:31,877:INFO: >>>  METEOR: 0.0684, ROUGE_L: 0.1229, CIDEr: 0.0045

ArrowLuo commented on August 25, 2024

Hi @dawnlh, I suppose you evaluated the pretrained weights (zero-shot) directly instead of fine-tuning them. You should fine-tune with --do_train first.
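For reference, a fine-tuning invocation might look like the sketch below, assembled from the effective parameters in the log above. The script name `main_task_caption.py` is an assumption, and flag spellings should be checked against the UniVL README's canonical command:

```shell
# Hypothetical fine-tuning command for MSRVTT captioning.
# Paths and values are taken from the effective-parameter dump in the log;
# verify the script name and flags against the UniVL README before running.
python main_task_caption.py \
  --do_train --do_lower_case \
  --bert_model bert-base-uncased \
  --init_model weight/univl.pretrained.bin \
  --data_path data/msrvtt/MSRVTT_data.json \
  --features_path data/msrvtt/msrvtt_videos_features.pickle \
  --val_csv data/msrvtt/MSRVTT_JSFUSION_test.csv \
  --datatype msrvtt --task_type caption --stage_two \
  --visual_num_hidden_layers 6 --decoder_num_hidden_layers 3 \
  --batch_size 256 --lr 1e-4 --epochs 20 \
  --output_dir ckpts/ckpt_msrvtt_caption
```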

dawnlh commented on August 25, 2024

> Hi @dawnlh, I suppose you evaluated the pretrained weights (zero-shot) directly instead of fine-tuning them. You should fine-tune with --do_train first.

Yes, I evaluated the pretrained weights (zero-shot) directly. I tried to fine-tune the model but failed due to limited GPU memory (even with batch_size set to 1). Can you estimate how much GPU memory is needed to fine-tune the model? Or would it be convenient for you to share the weights for the captioning task (no transcript)?
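Two parameters already visible in the effective-parameter dump above (`fp16` and `gradient_accumulation_steps`) are the usual levers for fitting BERT-style training into less memory; whether they suffice for this model is untested here, and the script name is again an assumption:

```shell
# Hypothetical memory-saving flags. --fp16 requires NVIDIA Apex at the
# listed opt level; gradient accumulation splits each optimizer step
# across several smaller forward/backward passes, reducing peak memory
# (in many BERT-style codebases batch_size is divided by
# gradient_accumulation_steps internally).
python main_task_caption.py --do_train \
  --batch_size 256 --gradient_accumulation_steps 32 \
  --fp16 --fp16_opt_level O1 \
  # ...remaining data/model flags as in the log above
```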

ArrowLuo commented on August 25, 2024

Hi @dawnlh. We fine-tuned the model on 4 Tesla V100 GPUs. I am sorry that we cannot provide the fine-tuned weights.

dawnlh commented on August 25, 2024

Okay, thanks anyway~ I'll try to work around the GPU memory limitation. Another question: could you provide some instructions or code on using the fine-tuned model for video captioning on self-captured videos? I mean the input video processing (how to extract the same features as for the training set to serve as the model input) and the output visualization.

ArrowLuo commented on August 25, 2024

More information about the feature extractor can be found in the README. The caption results are saved in --output_dir.
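For context, the log above suggests the features file is a pickle mapping video IDs to per-second feature arrays of dimension 1024 (`--features_path ...pickle`, `--video_dim 1024`, `--feature_framerate 1`). A minimal sketch of writing a compatible file for self-captured videos follows; the dict-of-arrays layout and the key naming are inferences from the log, not confirmed by the repository, and the real features must come from the extractor described in the README:

```python
import pickle
import numpy as np

def save_features(feature_dict, path):
    """Save a {video_id: [n_seconds, 1024] float32 array} mapping,
    matching the format implied by --features_path and --video_dim 1024."""
    with open(path, "wb") as f:
        pickle.dump(feature_dict, f)

# Dummy stand-in for real extractor output: one 1024-dim feature per
# second of video (--feature_framerate 1). A 30-second clip gives a
# (30, 1024) array; real values would come from the video feature model.
features = {"my_video_0001": np.random.rand(30, 1024).astype(np.float32)}
save_features(features, "my_videos_features.pickle")

with open("my_videos_features.pickle", "rb") as f:
    loaded = pickle.load(f)
print(loaded["my_video_0001"].shape)  # (30, 1024)
```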

dawnlh commented on August 25, 2024

> More information about the feature extractor can be found in the README. The caption results are saved in --output_dir.

Got it! Thank you very much for your patient replies.
