
NumNet+

This is the official code repository for NumNet+ (https://leaderboard.allenai.org/drop/submission/bm60vq8f7g2p7t2ld0j0) and NumNet+ v2 (https://leaderboard.allenai.org/drop/submission/bmfuq9e0v32fq8pskug0). Our work builds on NumNet (https://github.com/ranqiu92/NumNet).

Framework

If you use the code, please cite the following paper:

@inproceedings{ran2019numnet,
  title={{NumNet}: Machine Reading Comprehension with Numerical Reasoning},
  author={Ran, Qiu and Lin, Yankai and Li, Peng and Zhou, Jie and Liu, Zhiyuan},
  booktitle={Proceedings of EMNLP},
  year={2019}
}

Requirements

pip install -r requirements.txt

Usage

Data and pretrained roberta-large preparation.

  • Download drop data.

    wget -O drop_dataset.zip https://s3-us-west-2.amazonaws.com/allennlp/datasets/drop/drop_dataset.zip

    unzip drop_dataset.zip

  • Download roberta model.

    cd drop_dataset && mkdir roberta.large && cd roberta.large

    wget -O pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-pytorch_model.bin

  • Download roberta config file.

    wget -O config.json https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json

    • Modify config.json from "output_hidden_states": false to "output_hidden_states": true (if the key is absent, add it; see the one-liner after this list).
  • Download roberta vocab files.

    wget -O vocab.json https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json

    wget -O merges.txt https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt
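
If the downloaded config.json has no output_hidden_states entry at all (a known issue; see "don't see any output_hidden_states variable in the config file" below), the key can simply be added, for example:

    python -c "import json; c = json.load(open('config.json')); c['output_hidden_states'] = True; json.dump(c, open('config.json', 'w'), indent=2)"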

Train
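
Training is launched via train.sh. The invocations below are the ones quoted verbatim in the issues further down this page (the first argument is presumably a random seed, followed by learning-rate and weight-decay hyperparameters; appending tag_mspan selects the v2 model):

    # NumNet+ (simple multi-span extraction)
    sh train.sh 345 5e-4 1.5e-5 5e-5 0.01

    # NumNet+ v2 (tag based multi-span extraction)
    sh train.sh 345 5e-4 1.5e-5 5e-5 0.01 tag_mspan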

Eval

  • Save your model as model.pt.

    • Simple multi-span extraction (NumNet+).

      sh eval.sh drop_dataset/drop_dataset_dev.json prediction.json

    • Tag-based multi-span extraction (NumNet+ v2).

      sh eval.sh drop_dataset/drop_dataset_dev.json prediction.json tag_mspan

  • Score the predictions against the gold answers.

    python drop_eval.py --gold_path drop_dataset/drop_dataset_dev.json --prediction_path prediction.json


Issues

reproducing your numbers

Thanks for making your code available online!

Using your suggested script,

> sh train.sh 345 5e-4 1.5e-5 5e-5 0.01 tag_mspan

I am getting the following scores:

07/09/2020 10:41:00 Eval 9536 examples, result in epoch 5, eval loss 215934.26463018617, eval em 0.78376677852349 eval f1 0.82169672818792.
07/09/2020 10:41:12 Best eval F1 0.82169672818792 at epoch 5
07/09/2020 10:41:12 done training in 76590 seconds!

I just wanted to make sure these numbers are expected, since the leaderboard numbers seem slightly higher.

Dealing with no answer?

Hi,

The DROP dataset is more like SQuAD 1.0, in that the model always tries to find an answer; SQuAD 2.0 addressed that problem.
Is there any way we can handle unanswerable questions for the DROP dataset?

Thanks
Mahesh

roberta config fail?

Hi, I followed the README.md but got the following error:

_proj_sequence_g2.ln.bias torch.Size([1024])
_proj_sequence_g2.fc2.weight torch.Size([1, 1024])
_proj_sequence_g2.fc2.bias torch.Size([1])
_proj_span_num.fc1.weight torch.Size([1024, 3072])
_proj_span_num.fc1.bias torch.Size([1024])
_proj_span_num.ln.weight torch.Size([1024])
_proj_span_num.ln.bias torch.Size([1024])
_proj_span_num.fc2.weight torch.Size([9, 1024])
_proj_span_num.fc2.bias torch.Size([9])
10/22/2019 03:58:45 Build optimizer etc...
10/22/2019 03:58:50 At epoch 1
Traceback (most recent call last):
  File "./roberta_gcn_cli.py", line 90, in <module>
    main()
  File "./roberta_gcn_cli.py", line 69, in main
    model.update(batch)
  File "/home/admin/torch/raw_roberta/numnet_plus/tools/model.py", line 47, in update
    output_dict = self.mnetwork(**tasks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/admin/torch/raw_roberta/numnet_plus/mspan_roberta_gcn/mspan_roberta_gcn.py", line 169, in forward
    sequence_output_list = [ item for item in outputs[2][-4:] ]
IndexError: tuple index out of range

Should we set output_hidden_states=true?

don't see any `output_hidden_states` variable in the config file

In the current version of the roberta config, I don't see any output_hidden_states variable.
Here is the config I see at https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json:

{
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

Could it be that the config file has changed?

GPU usage and training time

Hi.
Thanks for your great work.
I wonder which GPU you used, how much GPU memory it required, and how much time you spent on training.

Error in roberta_gcn_cli.py

Traceback (most recent call last):
  File "./roberta_gcn_cli.py", line 94, in <module>
    main()
  File "./roberta_gcn_cli.py", line 72, in main
    model.update(batch)
  File "/data/gaoshu562/DROP/numnet_plus-master/tools/model.py", line 47, in update
    output_dict = self.mnetwork(**tasks)
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 144, in forward
    return self.gather(outputs, self.output_device)
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 156, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 67, in gather
    return gather_map(outputs)
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 61, in gather_map
    for k in out))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 61, in <genexpr>
    for k in out))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 61, in gather_map
    for k in out))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 61, in <genexpr>
    for k in out))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 61, in gather_map
    for k in out))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 61, in <genexpr>
    for k in out))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/data/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration
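
For reference: nn.DataParallel's gather step can only recombine tensors across GPUs, so this TypeError usually means the model's output dict contains values that are not tensors (Python scalars, None, nested lists). A minimal sketch of the usual workaround, under that assumption (make_gatherable is a hypothetical helper, not part of this repo):

    import torch

    def make_gatherable(output_dict, device):
        # Hypothetical helper: DataParallel can only gather tensors, so wrap
        # plain Python numbers as 0-dim tensors and drop anything else.
        out = {}
        for key, value in output_dict.items():
            if torch.is_tensor(value):
                out[key] = value
            elif isinstance(value, (int, float)):
                out[key] = torch.tensor(value, device=device)
            # values that are None or lists of strings must be removed here
        return out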

Missing enhance code in repo.

Hi, thanks for open-sourcing this. Where can we find the enhancement code mentioned in your paper? Without the enhancement, is NumNet+ v2 unable to reproduce the leaderboard F1 of 85?

can't get EM=0.79 with the default parameter settings

I appreciate your great work!
But I can't get the result of EM=0.79 with the default setting sh train.sh 345 5e-4 1.5e-5 5e-5 0.01.
Limited by GPU memory, I set gradient_accumulation_steps=8, and the final result is eval em 0.7414010067114094, eval f1 0.7768875838926179.
Then I used the FP16 training in this code, which allowed gradient_accumulation_steps to be decreased to 4. The new FP16 result is eval em 0.7577600671140939, eval f1 0.7950440436241618.

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

01/11/2020 12:16:13 Loading data...
Load data from tmspan_cached_roberta_train.pkl.
Load data size 75943.
Load data from tmspan_cached_roberta_dev.pkl.
Load data size 9536.
01/11/2020 12:18:07 Num update steps 23732!
01/11/2020 12:18:07 Build bert model.
01/11/2020 12:18:18 Build Drop model.
gcn iteration_steps=3
01/11/2020 12:18:18 Build optimizer etc...
01/11/2020 12:18:23 At epoch 1
Traceback (most recent call last):
  File "./roberta_gcn_cli.py", line 103, in <module>
    main()
  File "./roberta_gcn_cli.py", line 82, in main
    model.update(batch)
  File "/home/srmishr1/numnet_plus/tools/model.py", line 47, in update
    output_dict = self.mnetwork(**tasks)
  File "/home/srmishr1/.conda/envs/numnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/srmishr1/numnet_plus/tag_mspan_robert_gcn/tag_mspan_roberta_gcn.py", line 232, in forward
    sequence_output_list[2] = self._gcn_enc(self._proj_ln(sequence_output_list[2] + gcn_info_vec))
  File "/home/srmishr1/.conda/envs/numnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/srmishr1/numnet_plus/mspan_roberta_gcn/util.py", line 22, in forward
    output, _ = self.enc_layer(input)
  File "/home/srmishr1/.conda/envs/numnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/srmishr1/.conda/envs/numnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

multi-GPU error

When I set gpu_num=2 in options.py and run the code, an error occurs.

IndexError: tuple index out of range

11/02/2020 07:49:01 Num update steps 379535!
11/02/2020 07:49:01 Build bert model.
11/02/2020 07:49:13 Build Drop model.
11/02/2020 07:49:14 Build optimizer etc...
11/02/2020 07:49:19 At epoch 1
Traceback (most recent call last):
  File "./roberta_gcn_cli.py", line 103, in <module>
    main()
  File "./roberta_gcn_cli.py", line 82, in main
    model.update(batch)
  File "/home/bssachde/numnet/numnet_plus/tools/model.py", line 47, in update
    output_dict = self.mnetwork(**tasks)
  File "/home/bssachde/.conda/envs/allennlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bssachde/numnet/numnet_plus/mspan_roberta_gcn/mspan_roberta_gcn.py", line 167, in forward
    sequence_output_list = [ item for item in outputs[2][-4:] ]
IndexError: tuple index out of range

I am getting the above error when using the numnet_plus code directly from GitHub, without any changes. As far as I remember, this kind of issue usually occurs when migrating between pytorch-pretrained-bert and transformers, but I am not able to figure out where to make the changes here. Could anyone help me out?
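
One hedged guess, given the pytorch-pretrained-bert/transformers migration mentioned above: newer transformers versions only return hidden states when the config asks for them, so outputs[2] does not exist by default and indexing it raises exactly this IndexError. A minimal sketch of the config-side fix, assuming the standard transformers API and the model directory from the README (how it is wired into this repo's model-building code may differ):

    from transformers import RobertaConfig, RobertaModel

    # Without this flag the output tuple has no hidden-states entry,
    # so outputs[2] raises IndexError: tuple index out of range.
    config = RobertaConfig.from_pretrained("drop_dataset/roberta.large")
    config.output_hidden_states = True
    bert = RobertaModel.from_pretrained("drop_dataset/roberta.large", config=config)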

Warning: masked_fill_

While training, I get this warning repeatedly until the job times out.

[W LegacyDefinitions.cpp:28] Warning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (function masked_fill__cuda)
This seems to be a version-mismatch issue.
How can I fix it?
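
The warning itself names the fix: masks passed to masked_fill_ should have dtype torch.bool rather than torch.uint8. A minimal sketch under that assumption (the tensor names are illustrative, not from this repo):

    import torch

    scores = torch.randn(2, 4)
    mask = torch.tensor([[1, 0, 1, 0], [0, 0, 1, 1]], dtype=torch.uint8)

    # Casting the uint8 mask to bool silences the deprecation warning.
    scores.masked_fill_(mask.bool(), float("-inf"))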

Got poor model performance with gradient_accumulation_steps=8

The model performance when training with gradient_accumulation_steps=8 is much worse than expected. Why? All hyperparameters are consistent with yours, except gradient_accumulation_steps, which is set to 8 for GPU-memory reasons.

train batch size=16
gradient_accumulation_steps=8

  • python drop_eval.py --gold_path drop_dataset/drop_dataset_dev.json --prediction_path prediction.json
    Exact-match accuracy 60.57
    F1 score 64.48
    60.57 & 64.48

date: 147 (1.54%)
Exact-match accuracy 27.211
F1 score 32.129
number: 5911 (61.99%)
Exact-match accuracy 65.353
F1 score 65.549
span: 3010 (31.56%)
Exact-match accuracy 59.336
F1 score 66.323
spans: 468 (4.91%)
Exact-match accuracy 18.590
F1 score 49.218

RuntimeError: CUDA error: device-side assert triggered

10/18/2021 03:06:23 Updates[ 0] train loss[nan] train em[0.00000] f1[0.00000] remaining[13:46:19]
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [38,0,0], thread: [459,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [39,0,0], thread: [255,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [46,0,0], thread: [365,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
Traceback (most recent call last):
  File "./roberta_gcn_cli.py", line 104, in <module>
    main()
  File "./roberta_gcn_cli.py", line 83, in main
    model.update(batch)
  File "/home/laic2021/fajinyaosu/numnet_plus-master/tools/model.py", line 47, in update
    output_dict = self.mnetwork(**tasks)
  File "/home/laic2021/anaconda3/envs/numnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/laic2021/fajinyaosu/numnet_plus-master/tag_mspan_robert_gcn/tag_mspan_roberta_gcn.py", line 472, in forward
    is_bio_mask)
  File "/home/laic2021/fajinyaosu/numnet_plus-master/tag_mspan_robert_gcn/multispan_heads.py", line 205, in log_likelihood
    if answer_as_text_to_disjoint_bios.sum() > 0:
RuntimeError: CUDA error: device-side assert triggered
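
Device-side asserts are reported asynchronously, so the Python traceback often points at a later, unrelated op. A common first debugging step (generic CUDA advice, not specific to this repo) is to rerun with synchronous kernel launches so the failing op, typically an out-of-range index, shows up directly:

    CUDA_LAUNCH_BLOCKING=1 sh train.sh 345 5e-4 1.5e-5 5e-5 0.01 tag_mspan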

CUDA memory error

Hi,

Great repo, thanks! When I tried to rerun the code, I hit a memory error. The error message looks like this:

RuntimeError: CUDA out of memory. Tried to allocate 56.00 MiB (GPU 0; 7.43 GiB total capacity; 6.59 GiB already allocated; 36.81 MiB free; 6.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


I use torch=1.10.0, allennlp=2.8.0 and run the model on Google Cloud Platform.

Even if I change the batch size (from the default 16 down to 1), the error message stays the same (no change in the amount of memory allocated). I also tried adding torch.cuda.empty_cache() at the beginning of each step, but it didn't work. I also tried different VM instances on GCP, but the same problem happens (with somewhat different memory numbers in the error messages). That's very strange.

Could anyone help me, or does anyone have ideas about what's going on?

Thanks!
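
The error message itself suggests one knob to try; a hedged sketch (the 128 MiB value is a guess to tune, and the setting is available in the torch 1.10 used here):

    # Allocator hint from the error message: cap split size to reduce fragmentation.
    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    sh train.sh 345 5e-4 1.5e-5 5e-5 0.01

Note also that roberta-large training commonly needs far more than the ~7.4 GiB this GPU reports, so a larger GPU, or gradient accumulation as discussed in the issues above, may be unavoidable.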
