Global-to-Local Neural Networks for Document-Level Relation Extraction

Relation extraction (RE) aims to identify the semantic relations between named entities in text. In recent years, the task has been raised to the document level, which requires complex reasoning over entities and mentions throughout an entire document. In this paper, we propose a novel model for document-level RE that encodes the document information in terms of entity global and local representations as well as context relation representations. Entity global representations model the semantic information of all entities in the document, entity local representations aggregate the contextual information of multiple mentions of specific entities, and context relation representations encode the topic information of other relations. Experimental results demonstrate that our model achieves superior performance on two public datasets for document-level RE. It is particularly effective in extracting relations between entities that are far apart or that have multiple mentions.

Getting Started

Package Description

GLRE/
├─ configs/
    ├── cdr_basebert.yaml: config file for CDR dataset under "Train" setting
    ├── cdr_basebert_train+dev.yaml: config file for CDR dataset under "Train+Dev" setting
    ├── docred_basebert.yaml: config file for DocRED dataset under "Train" setting
├─ data/: raw data and preprocessed data about CDR and DocRED dataset
    ├── CDR/
    ├── DocRED/
├─ data_processing/: data preprocessing scripts
├─ results/: pre-trained models and results 
├─ scripts/: running scripts
├─ src/
    ├── data/: read data and convert to batch
    ├── models/: core module to implement GLRE
    ├── nnet/: sub-layers to implement GLRE
    ├── utils/: utility function
    ├── main.py

Dependencies

  • python (>=3.6)
  • pytorch (>=1.5)
  • numpy (>=1.13.3)
  • recordtype (>=1.3)
  • yamlordereddictloader (>=0.4.0)
  • tabulate (>=0.8.7)
  • transformers (>=2.8.0)
  • scipy (>=1.4.1)
  • scikit-learn (>=0.22.1)
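
The minimum versions listed above can be checked before running anything. The following is a minimal sketch, not part of the repository: it assumes Python 3.8+ (for `importlib.metadata`) and that the packages were installed under their usual pip distribution names (for example `torch` for pytorch). The helper names `version_tuple`, `meets_minimum` and `check_dependencies` are hypothetical.

```python
import re
from importlib import metadata  # standard library since Python 3.8

# Minimum versions from the dependency list, keyed by pip distribution name
# (the distribution names are an assumption, e.g. "torch" for pytorch).
MINIMUMS = {
    "torch": "1.5",
    "numpy": "1.13.3",
    "recordtype": "1.3",
    "yamlordereddictloader": "0.4.0",
    "tabulate": "0.8.7",
    "transformers": "2.8.0",
    "scipy": "1.4.1",
    "scikit-learn": "0.22.1",
}


def version_tuple(version):
    """Turn '1.13.3' or '1.5.0+cu101' into a tuple of ints, ignoring suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.5' compares equal to '1.5.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUMS):
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, required in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, required):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {required})"
    return report

# Usage: for name, status in check_dependencies().items(): print(name, status)
```

The comparison only looks at the leading numeric components, so pre-release tags and local build suffixes are ignored.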

Usage

Datasets & Pre-processing

Two datasets are used: CDR and DocRED. The raw data are in the data/CDR and data/DocRED directories, the pre-processing scripts are in the data_processing directory, and the pre-processed results are in data/CDR/processed and data/DocRED/processed. The pre-trained models are in the results directory.

Specifically, we pre-processed the CDR dataset following the procedure of the edge-oriented graph project:

Download the GENIA Tagger and Sentence Splitter:
$ cd data_processing
$ mkdir common && cd common
$ wget http://www.nactem.ac.uk/y-matsu/geniass/geniass-1.00.tar.gz && tar xvzf geniass-1.00.tar.gz
$ cd geniass/ && make && cd ..
$ git clone https://github.com/bornabesic/genia-tagger-py.git
$ cd genia-tagger-py 

Here, you should modify the Makefile inside genia-tagger-py and replace line 3 with `wget http://www.nactem.ac.uk/GENIA/tagger/geniatagger-3.0.2.tar.gz`
$ make
$ cd ../../
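
The manual Makefile edit described above can also be scripted. The sketch below is an illustration only: `replace_line` is a hypothetical helper, and it assumes that line 3 of the Makefile is a recipe line whose leading whitespace (a tab, in Makefile recipes) should be preserved.

```python
from pathlib import Path


def replace_line(path, line_number, new_text):
    """Replace the 1-indexed line `line_number` of `path` with `new_text`.

    The original line's leading whitespace and line ending are kept, so a
    tab-indented Makefile recipe line stays a valid recipe line.
    Returns the old line content (without its line ending).
    """
    path = Path(path)
    lines = path.read_text().splitlines(keepends=True)
    if not 1 <= line_number <= len(lines):
        raise IndexError(f"{path} has {len(lines)} lines, no line {line_number}")
    old = lines[line_number - 1]
    body = old.rstrip("\r\n")
    indent = body[: len(body) - len(body.lstrip())]
    ending = old[len(body):]
    lines[line_number - 1] = indent + new_text + ending
    path.write_text("".join(lines))
    return body

# Usage, run from data_processing/common before `make`:
# replace_line(
#     "genia-tagger-py/Makefile",
#     3,
#     "wget http://www.nactem.ac.uk/GENIA/tagger/geniatagger-3.0.2.tar.gz",
# )
```

It is worth printing the returned old line once to confirm that line 3 is indeed the download command before relying on the script.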

Before processing, the datasets must first be converted to the PubTator format. Then run the processing script as follows:
$ sh process_cdr.sh

Then, use the following command to pre-process the DocRED dataset:

python docRedProcess.py --input_file ../data/DocRED/train_annotated.json \
                   --output_file ../data/DocRED/processed/train_annotated.data

Train & Test

First, you should download biobert_base and bert_base from figshare and place them in the GLRE directory.

The default hyper-parameters are in the configs directory, and the training and testing scripts are in the scripts directory. The run_cdr_train+dev.py script corresponds to CDR under the "Train+Dev" setting.

python scripts/run_cdr.py
python scripts/run_cdr_train+dev.py
python scripts/run_docred.py
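
To run the three scripts above one after another and stop at the first failure, a small wrapper can be used. This is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical names, and it assumes the wrapper is started from the GLRE directory so that the relative script paths resolve.

```python
import subprocess
import sys

# The three entry points listed above, relative to the GLRE directory.
SCRIPTS = [
    "scripts/run_cdr.py",
    "scripts/run_cdr_train+dev.py",
    "scripts/run_docred.py",
]


def build_command(script, python=sys.executable):
    """Build the argument list that runs `script` with the current interpreter."""
    return [python, script]


def run_all(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script in order; a non-zero exit code aborts the sequence."""
    for script in scripts:
        command = build_command(script)
        print("Running:", " ".join(command))
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(command, check=True)

# Usage: run_all()  or, for a single setting, run_all(["scripts/run_cdr.py"])
```

Using `sys.executable` keeps the child processes in the same Python environment as the wrapper, which matters when the dependencies live in a virtual environment.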

Evaluation

For CDR, you can evaluate the results using the evaluation script as follows:

python utils/evaluate_cdr.py --gold ../data/CDR/processed/test.gold --pred ../results/cdr-dev/cdr_basebert_full/test.preds --label 1:CID:2
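
For readers who want to see what such an evaluation computes, the sketch below shows micro-averaged precision, recall and F1 over sets of predicted and gold relation facts. It is not the repository's evaluate_cdr.py: the function names are hypothetical, and the pipe-separated line format (`doc_id|arg1|arg2|label`) is an assumption made only for illustration.

```python
def parse_facts(lines, sep="|"):
    """Parse lines such as 'doc_id|arg1|arg2|label' into a set of tuples.

    The field layout is an assumption; blank lines are skipped.
    """
    facts = set()
    for line in lines:
        line = line.strip()
        if line:
            facts.add(tuple(line.split(sep)))
    return facts


def micro_prf(gold, pred):
    """Return (precision, recall, f1) for two collections of relation facts."""
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up facts: one of two predictions is correct,
# and one of three gold facts is recovered.
gold = parse_facts(["d1|a|b|CID", "d1|a|c|CID", "d2|x|y|CID"])
pred = parse_facts(["d1|a|b|CID", "d2|x|z|CID"])
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A prediction counts as correct only if the whole tuple matches a gold fact, so a right entity pair with the wrong label is scored as both a false positive and a false negative.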

For DocRED, you can submit the result.json file to CodaLab.

License

This project is licensed under the GPL License - see the LICENSE file for details.

Citation

If you use this work or code, please kindly cite the following paper:

@inproceedings{GLRE,
 author = {Difeng Wang and Wei Hu and Ermei Cao and Weijian Sun},
 title = {Global-to-Local Neural Networks for Document-Level Relation Extraction},
 booktitle = {EMNLP},
 year = {2020},
}

Contacts

If you have any questions, please feel free to contact Difeng Wang. We will reply as soon as possible.
