
dfgn-pytorch's Introduction


A PyTorch implementation of our ACL 2019 paper (arXiv)

Dynamically Fused Graph Network for Multi-hop Reasoning
Yunxuan Xiao, Yanru Qu, Lin Qiu, Hao Zhou, Lei Li, Weinan Zhang, Yong Yu
Accepted by ACL 2019

This repo is still under construction. So far we have provided the core code of DFGN and pretrained checkpoints. Although the preprocessing part is not yet available, we provide processed data so that you can start training. Feel free to contact us if you have any questions.

Our results are published on the HotpotQA Leaderboard.


python 3, pytorch 0.4.1, boto3

To install pytorch 0.4.1, follow the instructions on this page. For example, to install with CUDA 9 via Anaconda:

conda install pytorch=0.4.1 cuda90 -c pytorch

Install boto3

pip install boto3

Download Data

Bert Models

First, download the pretrained BERT model and vocabulary and configure them. The download links are in pytorch_pretrained_bert/ row 40-51, and pytorch_pretrained_bert/ row 30-41. After downloading, replace the dict values there with your own local paths.
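As a rough sketch of the path replacement described above: the dictionary names PRETRAINED_MODEL_ARCHIVE_MAP and PRETRAINED_VOCAB_ARCHIVE_MAP, the model key, and the local paths below are all assumptions for illustration, so check the actual names in the files under pytorch_pretrained_bert/. The helper missing_paths is hypothetical and only reports configured paths that do not exist yet.

```python
import os

# Hypothetical local paths; replace them with wherever you saved the downloads.
PRETRAINED_MODEL_ARCHIVE_MAP = {
    "bert-base-uncased": "/path/to/bert-base-uncased.tar.gz",
}
PRETRAINED_VOCAB_ARCHIVE_MAP = {
    "bert-base-uncased": "/path/to/bert-base-uncased-vocab.txt",
}


def missing_paths(*archive_maps):
    """Return every configured path that does not exist on disk."""
    return [path
            for archive_map in archive_maps
            for path in archive_map.values()
            if not os.path.exists(path)]


# Report anything that still points at a placeholder location.
for path in missing_paths(PRETRAINED_MODEL_ARCHIVE_MAP, PRETRAINED_VOCAB_ARCHIVE_MAP):
    print("not found:", path)
```

Running a check like this before training can catch a forgotten path early, instead of failing partway through model loading.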

Released Checkpoints

We also release our pretrained model so that the results can be reproduced.

mkdir DFGN/ckpt
tar -xvzf ./DFGN-base.tar.gz -C DFGN/ckpt

Preprocessed Data

Next, download our preprocessed train and dev data for the HotpotQA distractor setting.

Extract all compressed files into the DFGN/data folder.

cd DFGN-pytorch/DFGN
mkdir data
tar -xvzf ./data.tar.gz -C data
cd data

You can also run the preprocessing yourself by following the instructions in the next section. The official HotpotQA data is available in


Previously, we provided only intermediate data files for training DFGN. We have now published the preprocessing code as well. The preprocessing phase consists of paragraph selection, named entity recognition, and graph construction.

First, download the model checkpoints and save them in ./work_dir

Then run the commands below, replacing ${TRAIN_FILE} and ${DEV_FILE} with the official train and dev files. All preprocessed files will be written to ./work_dir/dev and ./work_dir/train



Training a DFGN model requires at least 2 GPUs (one for BERT encoding, one for the DFGN model). To train with the default parameters:

CUDA_VISIBLE_DEVICES=0,1 python --name=YOUR_EXPNAME --q_update --q_attn --basicblock_trans --bfs_clf

If an out-of-memory (OOM) exception occurs, try a smaller batch size together with gradient_accumulate_step > 1.
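To see why a smaller batch size combined with accumulation can stand in for a larger batch, here is a minimal plain-Python sketch. It is not the repo's training loop: the scalar least-squares model and the function names are hypothetical, and it assumes a mean loss and equal-size micro-batches. Under those assumptions, averaging the micro-batch gradients reproduces the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of equal-size micro-batches (gradient accumulation)."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-size micro-batches"
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Divide by the number of accumulation steps so the sum is an average.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)              # one batch of 4 examples
accum = accumulated_grad(0.5, xs, ys, 2)  # two micro-batches of 2 examples
print(full, accum)
```

Only one micro-batch has to fit in memory at a time, which is why this trades speed for a lower peak memory footprint.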

Predictions and checkpoints for each epoch are stored in the ./output directory. Running the local evaluation script gives results like this:

best iter  em      f1      pr      re      sp_em   sp_f1   sp_pr   sp_re   jt_em   jt_f1   jt_pr   jt_re
epxx       0.5542  0.6909  0.7169  0.7039  0.5218  0.8196  0.8604  0.8098  0.3325  0.5942  0.6435  0.5993
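If you want to compare epochs programmatically, a result row like the one above can be parsed with a few lines of Python. This is a small sketch, not part of the repo: the helper names parse_row and best_epoch are hypothetical, and choosing jt_f1 as the selection metric is an assumption.

```python
COLUMNS = ["em", "f1", "pr", "re", "sp_em", "sp_f1", "sp_pr", "sp_re",
           "jt_em", "jt_f1", "jt_pr", "jt_re"]


def parse_row(line):
    """Split a row such as 'ep11 0.5195 ...' into (epoch_label, {metric: value})."""
    label, *values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d metrics, got %d" % (len(COLUMNS), len(values)))
    return label, dict(zip(COLUMNS, map(float, values)))


def best_epoch(lines, metric="jt_f1"):
    """Return the label of the row with the highest value of `metric`."""
    rows = [parse_row(line) for line in lines if line.strip()]
    return max(rows, key=lambda row: row[1][metric])[0]


label, metrics = parse_row(
    "epxx 0.5542 0.6909 0.7169 0.7039 0.5218 0.8196 0.8604 0.8098 "
    "0.3325 0.5942 0.6435 0.5993")
print(label, metrics["jt_f1"])
```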

Local Evaluation

There are two evaluation scripts here.

The first is the official evaluation script, which can evaluate a single prediction file.

python YOUR_PREDICTION data/hotpot_dev_distractor_v1.json

The second evaluates all predictions in a folder. For example, if your predictions are in output/submissions/YOUR_EXPNAME:

python output/submissions/YOUR_EXPNAME data/hotpot_dev_distractor_v1.json

Inference using our released model

python output/submissions/prediction.json data/hotpot_dev_distractor_v1.json

You should get results similar to these:

'em': 0.5567859554355166,
'f1': 0.693802079009206,
'prec': 0.7207548475981969,
'recall': 0.7048612545455903,
'sp_em': 0.5311276164753544,
'sp_f1': 0.8223151063056721,
'sp_prec': 0.865363493135274,
'sp_recall': 0.8101753962895138,
'joint_em': 0.337744767049291,
'joint_f1': 0.5989142669137962,
'joint_prec': 0.6510258098492401,
'joint_recall': 0.6003632270835144


dfgn-pytorch's Issues

The process keeps stopping, and I cannot solve this problem

Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.6/", line 916, in _bootstrap_inner
File "", line 32, in run
context_encoding = large_batch_encode(self.bert, batch, encoder_gpus, args.max_bert_size)
File "", line 78, in large_batch_encode
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/parallel/", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/lrh/DFGN-pytorch-2/DFGN/pytorch_pretrained_bert/", line 635, in forward
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/lrh/DFGN-pytorch-2/DFGN/pytorch_pretrained_bert/", line 334, in forward
hidden_states = layer_module(hidden_states, attention_mask)
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/lrh/DFGN-pytorch-2/DFGN/pytorch_pretrained_bert/", line 319, in forward
attention_output = self.attention(hidden_states, attention_mask)
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/lrh/DFGN-pytorch-2/DFGN/pytorch_pretrained_bert/", line 279, in forward
self_output = self.self(input_tensor, attention_mask)
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/lrh/DFGN-pytorch-2/DFGN/pytorch_pretrained_bert/", line 230, in forward
mixed_query_layer = self.query(hidden_states)
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/modules/", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "/home/lrh/.virtualenvs/lrh/lib/python3.6/site-packages/torch/nn/", line 1026, in linear
output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /pytorch/aten/src/THC/THCGeneral.cpp:333

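Question about paragraph selection

Hello, are the paragraphs fed to your selector network the 10 paragraphs already provided in the test set? If so, does that mean your method cannot reliably find the relevant paragraphs first in the fullwiki setting?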


1: read_hotpot_examples does not check the answer start and end positions, and the returned example has orig_answer_text="", an empty string. It seems this value should be populated.

Test on codalab

Hello, when testing on CodaLab with the number of GPUs set to 1, I get the error: Cannot assign enough resources: Requested more GPUs (1) than available (0 currently out of 1 on the machine). Did you run into this problem when testing, and how can it be solved?

The Requirement of CPU Memory

Hi, can you tell me how much CPU memory is required to reproduce your results? My machine has 32GB of CPU memory. When I try to train the model, it runs out of CPU memory while loading the third data file (train_graph.pkl.gz). Can you help me solve this? Thank you very much.









Questions about the code

sp_logits = self.sp_linear(sp_logits) # N x max_sent x 1
sp_logits_aux = Variable(, sp_logits.size(1), 1).zero_())
sp_prediction =[sp_logits_aux, sp_logits], dim=-1).contiguous() # N x max_sent x 2
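Hello, I would like to ask whether the second line is equivalent to torch.zeros_like(sp_logits), since Variable is rarely used as of pytorch 0.4. Also, what is the purpose of the concatenation in the third line, and why concatenate in this way? Thanks.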
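One possible reading of the concatenation asked about above, offered as an assumption rather than the authors' stated rationale: padding each supporting-fact logit x with a constant zero logit gives a two-class score per sentence, and a softmax over [0, x] assigns the positive class the probability sigmoid(x). The plain-Python check below verifies that identity numerically.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-3.0, 0.0, 1.5):
    # [zero "aux" logit, supporting-fact logit], mirroring the N x max_sent x 2 layout.
    two_class = softmax([0.0, x])
    print(x, two_class[1], sigmoid(x))
```

Under this reading, the two-column form would let a standard two-class cross-entropy loss be applied to what is effectively a binary decision per sentence.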

Some questions about the entity graph in the paper

  1. for every pair of entities that appear in the same sentence in C (sentence-level links); 2. for every pair of entities with the same mention text in C (context-level links); and 3. between a central entity node and other entities within the same paragraph (paragraph-level links).
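The three link rules quoted above can be sketched as a small adjacency-matrix builder. This is an illustration under stated assumptions, not the repo's graph construction code: the entity dictionary keys ('text', 'sent', 'para', 'central') are hypothetical, sentence ids are assumed to be unique across the whole context, and all three link types are collapsed into a single 0/1 matrix.

```python
from itertools import combinations


def build_entity_graph(entities):
    """Build a symmetric 0/1 adjacency matrix from the three link rules.

    Each entity is a dict with hypothetical keys:
    'text' (mention text), 'sent' (context-wide sentence id),
    'para' (paragraph id) and 'central' (True for a paragraph's central entity).
    """
    n = len(entities)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = entities[i], entities[j]
        sentence_link = a["sent"] == b["sent"]
        context_link = a["text"] == b["text"]
        paragraph_link = a["para"] == b["para"] and (a["central"] or b["central"])
        if sentence_link or context_link or paragraph_link:
            adj[i][j] = adj[j][i] = 1
    return adj


entities = [
    {"text": "Paris",  "sent": 0, "para": 0, "central": True},
    {"text": "France", "sent": 0, "para": 0, "central": False},
    {"text": "Seine",  "sent": 1, "para": 0, "central": False},
    {"text": "France", "sent": 2, "para": 1, "central": False},
    {"text": "Europe", "sent": 2, "para": 1, "central": False},
]
for row in build_entity_graph(entities):
    print(row)
```

In this toy context, "Paris" links to "France" through a shared sentence and to "Seine" through the paragraph-level rule, while the two "France" mentions are joined by the context-level rule.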

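Where the soft mask is applied

In the paper, after the soft mask mt is obtained, it is multiplied with the entity state to get Ẽ(t-1) (E with a tilde), and this Ẽ(t-1) is then used for GNN propagation.
However, in the code, GNN propagation uses the entity state that has not yet been multiplied by the soft mask (adj_mask in the code, computed at line 125 of layers.py); the GNN propagation code is at lines 127 to 145 of layers.py. Only after GNN propagation has finished is the soft mask used to update the entity state (lines 148 to 149 of layers.py). Is this order of computation a problem?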
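To make the ordering question concrete, here is a toy comparison of the two orders. It is not the code in layers.py: propagation is replaced by a plain adjacency-weighted sum over scalar entity states, which is an assumption made purely for illustration, and the function names are hypothetical.

```python
def propagate(adj, states):
    """Toy message passing: each node sums the states of its neighbours."""
    n = len(states)
    return [sum(adj[i][j] * states[j] for j in range(n)) for i in range(n)]


def mask_then_propagate(adj, states, mask):
    """Order described in the paper: mask the entity states, then propagate."""
    masked = [m * s for m, s in zip(mask, states)]
    return propagate(adj, masked)


def propagate_then_mask(adj, states, mask):
    """Order described in the question: propagate first, then apply the mask."""
    return [m * s for m, s in zip(mask, propagate(adj, states))]


adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
states = [1.0, 2.0, 3.0]
mask = [1.0, 0.0, 1.0]   # a mask that switches entity 1 off

print(mask_then_propagate(adj, states, mask))
print(propagate_then_mask(adj, states, mask))
```

In the first order a masked-out entity sends nothing to its neighbours but can still receive messages; in the second it still sends messages but its own updated state is zeroed, so the two orders generally give different results.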


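I reran the data preprocessing and got EM/F1: 31.47/58.96, significantly lower than the results in the paper. Running with the preprocessed data released by the authors gives results that basically match.


Update: training is unstable; rerunning it fixed the problem.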


CUDA_VISIBLE_DEVICES=0,1 python --name=YOUR_EXPNAME --q_update --q_attn --basicblock_trans --bfs_clf
loading data/dev_graph.pkl.gz
Traceback (most recent call last):
File "", line 215 in
File "/root/miniconda3/envs/myconda/lib/python3.6/site-packages/torch/nn/modules/", line 258, in cuda
return self._apply(lambda t: t.cuda(device))
File "/root/miniconda3/envs/myconda/lib/python3.6/site-packages/torch/nn/modules/", line 185, in _apply
File "/root/miniconda3/envs/myconda/lib/python3.6/site-packages/torch/nn/modules/", line 185, in _apply
File "/root/miniconda3/envs/myconda/lib/python3.6/site-packages/torch/nn/modules/", line 185, in _apply
[Previous line repeated 1 more time]
File "/root/miniconda3/envs/myconda/lib/python3.6/site-packages/torch/nn/modules/", line 112, in _apply
File "/root/miniconda3/envs/myconda/lib/python3.6/site-packages/torch/nn/modules/", line 105, in flatten_parameters
self.batch_first, bool(self.bidirection))

My environment: python 3.6, pytorch 0.4.1, cuda 9.2, cudnn 7.6.5

How many graph edge types are used in the implementation?

According to the original paper, the graph involves 3 types of edges. However, in the pre-processed pickle I found only one adjacency matrix per sample, and its maximum value is 1. Does this mean only ONE type of edge is used in the current version? Based on the implementation of GATSelfAttentionIntraMask in model/, should the values of the adjacency matrix be regarded as indicators of different edge types?

Inquiry about ‘nan’ in pickle file

Thanks for your excellent work. Some 'nan' values occur in the pkl file downloaded from your Google Drive, but the final result is as good as you reported. I ran some demos, extracted BERT embeddings, and found that the absolute values in each dimension are mostly under 2, with no 'nan'. How do those 'nan' values affect the whole network? Will they trigger gradient explosion?

With batchsize=8 and gradient_accumulate_step=2, model performance drops sharply at some epoch, e.g. epoch 16
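A quick way to locate such values is to scan the loaded object for NaN floats. The sketch below assumes the file is a gzip-compressed pickle (suggested by the .pkl.gz names used in this repo) and only inspects plain Python floats inside dicts, lists and tuples; values stored in NumPy arrays or tensors would need numpy.isnan or torch.isnan instead. Both helper names are hypothetical.

```python
import gzip
import math
import pickle


def load_gzip_pickle(path):
    """Load an object from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


def count_nans(obj):
    """Recursively count float NaN values inside nested dicts, lists and tuples."""
    if isinstance(obj, float):
        return 1 if math.isnan(obj) else 0
    if isinstance(obj, dict):
        return sum(count_nans(value) for value in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_nans(value) for value in obj)
    return 0  # other types (str, int, arrays, ...) are not inspected


sample = {"a": [1.0, float("nan")], "b": (float("nan"), 2), "c": "text"}
print(count_nans(sample))
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.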

ep11 0.5195 0.6570 0.6836 0.6691 0.4739 0.8070 0.8251 0.8201 0.2839 0.5560 0.5893 0.5750
ep12 0.5097 0.6451 0.6779 0.6520 0.4749 0.8023 0.8413 0.7968 0.2783 0.5428 0.5950 0.5458
ep13 0.5228 0.6593 0.6908 0.6678 0.4887 0.8089 0.8493 0.8016 0.2926 0.5593 0.6115 0.5624
ep14 0.5016 0.6384 0.6701 0.6471 0.4816 0.8009 0.8442 0.7907 0.2804 0.5390 0.5938 0.5396
ep15 0.4836 0.6173 0.6402 0.6341 0.4543 0.7882 0.8210 0.7894 0.2564 0.5140 0.5518 0.5294
ep16 0.0321 0.0326 0.0327 0.0330 0.0617 0.4420 0.5404 0.4246 0.0051 0.0193 0.0236 0.0182
ep17 0.4718 0.6070 0.6369 0.6188 0.4640 0.7954 0.8274 0.7955 0.2540 0.5077 0.5528 0.5177
ep18 0.0234 0.0614 0.0653 0.0884 0.0728 0.4275 0.5721 0.3758 0.0020 0.0311 0.0424 0.0374
ep19 0.0641 0.1104 0.1153 0.1312 0.0835 0.4011 0.5517 0.3408 0.0074 0.0566 0.0789 0.0568
ep20 0.1028 0.1568 0.1618 0.1744 0.1693 0.5481 0.6595 0.5117 0.0239 0.0990 0.1210 0.1038
ep21 0.0359 0.0678 0.0756 0.0782 0.4717 0.8048 0.8239 0.8183 0.0234 0.0583 0.0651 0.0684

Results are slightly lower than those in the paper

Thanks for your contributions. We trained the model with bert-base-uncased for 35 epochs and got the following results:
'em': 0.5504,
'f1': 0.6899,
'prec': 0.7157,
'recall': 0.7034,
'sp_em': 0.4847,
'sp_f1': 0.8015,
'sp_prec': 0.8283,
'sp_recall': 0.8111,
'joint_em': 0.3072,
'joint_f1': 0.5809,
'joint_prec': 0.6186,
'joint_recall': 0.6001
These are slightly lower than the results in the README and the paper.
We made no modifications to the project.
Are there any details we have overlooked?


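When will the data processing code be uploaded? Looking through the repo, there is a lot that I would have to preprocess myself. Also, are the other parts now in their final version?

input dataset

Why is the input data (HotpotQA) in CSV format when the original data is JSON? How should I convert it?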


| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 Tesla P100-PCIE... On | 00000000:00:06.0 Off | 0 |
| N/A 30C P0 28W / 250W | 11675MiB / 12198MiB | 0% Default |
| 1 Tesla P100-PCIE... On | 00000000:00:08.0 Off | 0 |
| N/A 30C P0 29W / 250W | 7823MiB / 12198MiB | 0% Default |

| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 21430 C python 11665MiB |
| 1 21430 C python 7813MiB |

Avg-LOSS0/batch/step: 6.6137880611419675
Avg-LOSS1/batch/step: 3.8737828445434572
Avg-LOSS2/batch/step: 0.00037345796823501587
Avg-LOSS3/batch/step: 1.449139289855957
Avg-LOSS4/batch/step: 1.2904924607276917
100%|█████████████████████████████████████████| 962/962 [19:47<00:00, 1.06it/s]
1%|▎ | 2/232 [00:02<04:24, 1.15s/it]
Exception in thread Thread-5:
Traceback (most recent call last):
File "/nesi/nobackup/uoa02874/anaconda3/lib/python3.7/", line 926, in bootstrap_inner
File "", line 59, in run
join(args.prediction_path, 'pred_epoch
File "", line 122, in predict
start, end, sp, Type, softmask, ent, yp1, yp2 = model(batch, return_yp=True)
File "/home/zden658/.local/lib/python3.7/site-packages/torch/nn/modules/", line 541, in call
result = self.forward(*input, **kwargs)
File "/scale_wlg_persistent/filesets/project/uoa02874/PycharmProjects/DFGN-pytorch-master/DFGN/model/", line 59, in forward
input_state, entity_state, softmask = self.basicblocks[l](input_state, query_vec, batch)
File "/home/zden658/.local/lib/python3.7/site-packages/torch/nn/modules/", line 541, in call
result = self.forward(*input, **kwargs)
File "/scale_wlg_persistent/filesets/project/uoa02874/PycharmProjects/DFGN-pytorch-master/DFGN/model/", line 245, in forward
entity_state = self.tok2ent(doc_state, entity_mapping, entity_length)
File "/home/zden658/.local/lib/python3.7/site-packages/torch/nn/modules/", line 541, in call
result = self.forward(*input, **kwargs)
File "/scale_wlg_persistent/filesets/project/uoa02874/PycharmProjects/DFGN-pytorch-master/DFGN/model/", line 46, in forward
entity_states = entity_mapping.unsqueeze(3) * doc_state.unsqueeze(1) # N x E x L x d
RuntimeError: CUDA out of memory. Tried to allocate 1.17 GiB (GPU 0; 11.91 GiB total capacity; 5.55 GiB already allocated; 523.38 MiB free; 5.15 GiB cached)
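The failing line in the traceback above materializes a dense N x E x L x d tensor, so its size grows with the product of all four dimensions. A back-of-the-envelope estimate can be sketched as follows; the concrete sizes are hypothetical and float32 storage (4 bytes per element) is an assumption.

```python
def tensor_gib(shape, bytes_per_element=4):
    """Size in GiB of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / 2 ** 30


# Hypothetical sizes: N examples, E entities, L tokens, d hidden units.
N, E, L, d = 8, 80, 512, 300
print(round(tensor_gib((N, E, L, d)), 2))
print(round(tensor_gib((N // 2, E, L, d)), 2))  # halving the batch halves the size
```

Because the size is linear in N, reducing the batch size used for prediction is the most direct way to shrink this intermediate tensor.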

Why are the results in the README lower than the results in the paper?

Hi, I am confused because the results in the README are lower than the results in the paper. Are there any differences between the released code and the code used for the paper's experiments? Looking forward to your reply.

'em': 0.5567859554355166,
'f1': 0.693802079009206,
'prec': 0.7207548475981969,
'recall': 0.7048612545455903,
'sp_em': 0.5311276164753544,
'sp_f1': 0.8223151063056721,
'sp_prec': 0.865363493135274,
'sp_recall': 0.8101753962895138,
'joint_em': 0.337744767049291,
'joint_f1': 0.5989142669137962,
'joint_prec': 0.6510258098492401,
'joint_recall': 0.6003632270835144


root@5df95ab01abf:/workspace/pythonprogram_zm/DFGN_0915/bert_ner# python3
Better speed can be achieved with apex installed from
loading data from: ../workdir/zm_out/dev_selected_paras1.json
58%|#########################################################################3 | 1330/2303 [20:45<16:18, 1.01s/it]Traceback (most recent call last):
File "", line 168, in
eval_para(model, eval_iter, eval_dataset.sent_id, args.output_path)
File "", line 82, in eval_para
for i, batch in enumerate(tqdm(iterator)):
File "/usr/local/lib/python3.5/dist-packages/tqdm/", line 1005, in iter
for obj in iterable:
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/", line 560, in next
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/", line 560, in
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/workspace/pythonprogram_zm/DFGN_0915/bert_ner/", line 103, in getitem
assert len(words)==sum(is_heads)

parser.add_argument('--ckpt_path', type=str, default='../workdir/')
parser.add_argument('--input_path', type=str, default='../workdir/zm_out/dev_selected_paras1.json')
parser.add_argument('--output_path', type=str, default='../workdir/zm_out/entities.json')
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--use_query', action='store_true')
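The assertion len(words)==sum(is_heads) in the traceback above checks that every input word contributes exactly one "head" sub-token. One way such a check can fail, shown here with a stand-in tokenizer rather than the real BERT WordPiece tokenizer, is a word that yields no sub-tokens at all (for example, one made only of invisible characters). Everything in this sketch, including toy_tokenize and head_flags, is hypothetical.

```python
def toy_tokenize(word):
    """Stand-in for a WordPiece tokenizer (hypothetical, for illustration only)."""
    cleaned = "".join(ch for ch in word if ch.isprintable())
    if not cleaned:
        return []  # nothing left: the word yields no sub-tokens
    # Split into chunks of at most 4 characters, marking continuations with '##'.
    chunks = [cleaned[i:i + 4] for i in range(0, len(cleaned), 4)]
    return [chunks[0]] + ["##" + chunk for chunk in chunks[1:]]


def head_flags(words):
    """Return (tokens, is_heads); is_heads is 1 for the first sub-token of a word."""
    tokens, is_heads = [], []
    for word in words:
        pieces = toy_tokenize(word)
        tokens.extend(pieces)
        if pieces:
            is_heads.extend([1] + [0] * (len(pieces) - 1))
    return tokens, is_heads


good_words = ["Multi", "hop", "reasoning"]
bad_words = ["hop", "\u200b"]  # the second "word" is a zero-width space

for words in (good_words, bad_words):
    tokens, is_heads = head_flags(words)
    print(tokens, is_heads, len(words) == sum(is_heads))
```

If this is the cause, filtering out words that tokenize to nothing before building is_heads would keep the two counts aligned; truncating long inputs is another possible source of a mismatch.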
