
Comments (3)

namisan commented on September 2, 2024

Please refer to: https://github.com/namisan/mt-dnn/blob/master/experiments/glue/run_glue_finetuning.sh

from mt-dnn.

namisan commented on September 2, 2024

Here is my script and log for ELECTRA. Hope it helps.
export CUDA_VISIBLE_DEVICES=0,1,2,3; bash experiments/glue/run_glue_finetuning.sh data/canonical_data/ electra base mnli 128 4

Training Log

02/12/2022 06:15:34 Launching the MT-DNN training
02/12/2022 06:15:34 Loading data/canonical_data//bert-base-uncased/mnli_train.json as task 0
02/12/2022 06:15:47 ####################
02/12/2022 06:15:47 {'log_file': 'glue_app/mnli/bert-base-uncased/mt-dnn-train.log', 'tensorboard': False, 'tensorboard_logdir': 'tensorboard_logdir', 'init_checkpoint': 'google/electra-base-discriminator', 'data_dir': 'data/canonical_data//bert-base-uncased', 'data_sort_on': False, 'name': 'farmer', 'task_def': 'experiments/glue/glue_task_def.yml', 'train_datasets': ['mnli'], 'test_datasets': ['mnli_matched', 'mnli_mismatched'], 'glue_format_on': False, 'mkd_opt': 0, 'do_padding': False, 'update_bert_opt': 0, 'multi_gpu_on': False, 'mem_cum_type': 'simple', 'answer_num_turn': 5, 'answer_mem_drop_p': 0.1, 'answer_att_hidden_size': 128, 'answer_att_type': 'bilinear', 'answer_rnn_type': 'gru', 'answer_sum_att_type': 'bilinear', 'answer_merge_opt': 1, 'answer_mem_type': 1, 'max_answer_len': 20, 'answer_dropout_p': 0.1, 'answer_weight_norm_on': False, 'dump_state_on': False, 'answer_opt': 1, 'pooler_actf': 'tanh', 'mtl_opt': 0, 'ratio': 0, 'mix_opt': 0, 'max_seq_len': 512, 'init_ratio': 1, 'encoder_type': 7, 'num_hidden_layers': -1, 'bert_model_type': 'bert-base-uncased', 'do_lower_case': False, 'masked_lm_prob': 0.15, 'short_seq_prob': 0.2, 'max_predictions_per_seq': 128, 'bin_on': False, 'bin_size': 64, 'bin_grow_ratio': 0.5, 'local_rank': 0, 'world_size': 4, 'master_addr': 'localhost', 'master_port': '6600', 'backend': 'nccl', 'cuda': True, 'log_per_updates': 500, 'save_per_updates': 10000, 'save_per_updates_on': False, 'epochs': 3, 'batch_size': 128, 'batch_size_eval': 8, 'optimizer': 'adamax', 'grad_clipping': 0, 'global_grad_clipping': 1.0, 'weight_decay': 0, 'learning_rate': 5e-05, 'momentum': 0, 'warmup': 0.1, 'warmup_schedule': 'warmup_linear', 'adam_eps': 1e-06, 'vb_dropout': True, 'dropout_p': 0.1, 'dropout_w': 0.0, 'bert_dropout_p': 0.1, 'model_ckpt': 'checkpoints/model_0.pt', 'resume': False, 'have_lr_scheduler': True, 'multi_step_lr': '10,20,30', 'lr_gamma': 0.5, 'scheduler_type': 'ms', 'output_dir': 'glue_app/mnli/bert-base-uncased', 'seed': 2018, 
'grad_accumulation_step': 1, 'fp16': False, 'fp16_opt_level': 'O1', 'adv_train': False, 'adv_opt': 0, 'adv_norm_level': 0, 'adv_p_norm': 'inf', 'adv_alpha': 1, 'adv_k': 1, 'adv_step_size': 1e-05, 'adv_noise_var': 1e-05, 'adv_epsilon': 1e-06, 'encode_mode': False, 'debug': False, 'transformer_cache': '.cache', 'rank': 0, 'task_def_list': [{'adv_loss': '<LossCriterion.SymKlCriterion: 7>', 'kd_loss': '<LossCriterion.MseCriterion: 1>', 'loss': '<LossCriterion.CeCriterion: 0>', 'dropout_p': '0.1', 'enable_san': 'False', 'split_names': "['train', 'matched_dev', 'mismatched_dev', 'matched_test', 'mismatched_test']", 'metric_meta': '(<Metric.ACC: 0>,)', 'task_type': '<TaskType.Classification: 1>', 'data_type': '<DataFormat.PremiseAndOneHypothesis: 2>', 'n_class': '3', 'label_vocab': '<data_utils.vocab.Vocabulary object at 0x7f3753a44668>', 'self': '{}', 'class': "<class 'experiments.exp_def.TaskDef'>"}]}
02/12/2022 06:15:47 ####################
02/12/2022 06:15:47 ############# Gradient Accumulation Info #############
02/12/2022 06:15:47 number of step: 9204
02/12/2022 06:15:47 number of grad grad_accumulation step: 1
02/12/2022 06:15:47 adjusted number of step: 9204
02/12/2022 06:15:47 ############# Gradient Accumulation Info #############
02/12/2022 06:15:57
02/12/2022 06:15:57 Total number of params: 109484547
02/12/2022 06:15:57 At epoch 0
02/12/2022 06:16:03 Task [ 0] updates[ 1] train loss[1.10139] remaining[4:41:51]
02/12/2022 06:18:17 Task [ 0] updates[ 500] train loss[0.81959] remaining[0:11:59]
02/12/2022 06:20:30 Task [ 0] updates[ 1000] train loss[0.65015] remaining[0:09:24]
02/12/2022 06:22:48 Task [ 0] updates[ 1500] train loss[0.57299] remaining[0:07:09]
02/12/2022 06:25:08 Task [ 0] updates[ 2000] train loss[0.53009] remaining[0:04:53]
02/12/2022 06:27:22 Task [ 0] updates[ 2500] train loss[0.49991] remaining[0:02:35]
02/12/2022 06:29:36 Task [ 0] updates[ 3000] train loss[0.47922] remaining[0:00:18]
02/12/2022 06:29:54 Evaluation
02/12/2022 06:30:13 Task mnli_matched -- epoch 0 -- Dev ACC: 87.937
02/12/2022 06:30:32 Task mnli_mismatched -- epoch 0 -- Dev ACC: 87.917
02/12/2022 06:30:32 Evaluation
02/12/2022 06:31:10 [new test scores at 0 saved.]
02/12/2022 06:31:13 At epoch 1
02/12/2022 06:33:20 Task [ 0] updates[ 3500] train loss[0.46232] remaining[0:12:59]
02/12/2022 06:35:34 Task [ 0] updates[ 4000] train loss[0.44551] remaining[0:09:59]
02/12/2022 06:37:47 Task [ 0] updates[ 4500] train loss[0.42968] remaining[0:07:30]
02/12/2022 06:40:03 Task [ 0] updates[ 5000] train loss[0.41715] remaining[0:05:12]
02/12/2022 06:42:22 Task [ 0] updates[ 5500] train loss[0.40635] remaining[0:02:55]
02/12/2022 06:44:36 Task [ 0] updates[ 6000] train loss[0.39754] remaining[0:00:37]
02/12/2022 06:45:13 Evaluation
02/12/2022 06:45:32 Task mnli_matched -- epoch 1 -- Dev ACC: 88.630
02/12/2022 06:45:52 Task mnli_mismatched -- epoch 1 -- Dev ACC: 88.548
02/12/2022 06:45:52 Evaluation
02/12/2022 06:46:30 [new test scores at 1 saved.]
02/12/2022 06:46:33 At epoch 2
02/12/2022 06:48:13 Task [ 0] updates[ 6500] train loss[0.38976] remaining[0:12:29]
02/12/2022 06:50:31 Task [ 0] updates[ 7000] train loss[0.38173] remaining[0:10:08]
02/12/2022 06:52:44 Task [ 0] updates[ 7500] train loss[0.37323] remaining[0:07:44]
02/12/2022 06:55:04 Task [ 0] updates[ 8000] train loss[0.36603] remaining[0:05:30]
02/12/2022 06:57:23 Task [ 0] updates[ 8500] train loss[0.35957] remaining[0:03:13]
02/12/2022 06:59:37 Task [ 0] updates[ 9000] train loss[0.35410] remaining[0:00:55]
02/12/2022 07:00:33 Evaluation
02/12/2022 07:00:51 Task mnli_matched -- epoch 2 -- Dev ACC: 88.762
02/12/2022 07:01:11 Task mnli_mismatched -- epoch 2 -- Dev ACC: 88.670
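
For anyone comparing runs against the log above, here is a minimal sketch that pulls the per-epoch dev accuracy out of an MT-DNN training log. It assumes the `Task <name> -- epoch <n> -- Dev ACC: <value>` line format shown above. The names `parse_dev_acc` and `best_epoch` are hypothetical helpers written for this illustration and are not part of the mt-dnn repository.

```python
import re

# Matches lines such as:
# "02/12/2022 06:30:13 Task mnli_matched -- epoch 0 -- Dev ACC: 87.937"
# (format assumed from the log above; other metrics would need another pattern)
DEV_ACC = re.compile(
    r"Task (?P<task>\S+) -- epoch (?P<epoch>\d+) -- Dev ACC: (?P<acc>\d+(?:\.\d+)?)"
)


def parse_dev_acc(log_text):
    """Return {task: {epoch: accuracy}} for every 'Dev ACC' line in log_text."""
    scores = {}
    for line in log_text.splitlines():
        match = DEV_ACC.search(line)
        if match:
            task = match["task"]
            scores.setdefault(task, {})[int(match["epoch"])] = float(match["acc"])
    return scores


def best_epoch(scores, task):
    """Return the (epoch, accuracy) pair with the highest accuracy for one task."""
    return max(scores[task].items(), key=lambda item: item[1])


# Evaluation lines copied from the log above, plus one training line that
# the pattern is expected to skip.
SAMPLE = """\
02/12/2022 06:29:36 Task [ 0] updates[ 3000] train loss[0.47922] remaining[0:00:18]
02/12/2022 06:30:13 Task mnli_matched -- epoch 0 -- Dev ACC: 87.937
02/12/2022 06:30:32 Task mnli_mismatched -- epoch 0 -- Dev ACC: 87.917
02/12/2022 06:45:32 Task mnli_matched -- epoch 1 -- Dev ACC: 88.630
02/12/2022 06:45:52 Task mnli_mismatched -- epoch 1 -- Dev ACC: 88.548
02/12/2022 07:00:51 Task mnli_matched -- epoch 2 -- Dev ACC: 88.762
02/12/2022 07:01:11 Task mnli_mismatched -- epoch 2 -- Dev ACC: 88.670
"""

if __name__ == "__main__":
    parsed = parse_dev_acc(SAMPLE)
    for task_name in sorted(parsed):
        epoch, acc = best_epoch(parsed, task_name)
        print(f"{task_name}: best dev ACC {acc:.3f} at epoch {epoch}")
```

To use it on a real run, read the log file (for example the `log_file` path printed in the config dump) into a string and pass it to `parse_dev_acc`.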


AJSVB commented on September 2, 2024

Dear namisan, could you also provide a working configuration for RoBERTa?
