Information_Retrieval

Extracts relation triples directly from raw text. Runs on Linux.

Example

Demo

We have prepared a demo website where you can enter sentences and get relation triples back.

Simple Usage

python triple.py

or

import triple
triple.triple("Mr. Scheider played the police chief of a resort town menaced by a shark.")
# output: triples: [['Scheider', 'per:title', 'police chief', 11.785091400146484]]
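
The sample output above suggests each triple is a list of the form [subject, relation, object, score]. The sketch below is a minimal example of consuming that shape. The Triple record, the to_triples helper, and the min_score threshold of 5.0 are illustrative assumptions and not part of triple.py; the meaning and range of the score are not documented here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical record for one [subject, relation, object, score] item."""
    subject: str
    relation: str
    obj: str
    score: float


def to_triples(raw, min_score=0.0):
    """Wrap raw triple lists in Triple records and drop those scoring below min_score."""
    triples = [Triple(*item) for item in raw]
    return [t for t in triples if t.score >= min_score]


# The sample output shown above, used as input.
raw = [['Scheider', 'per:title', 'police chief', 11.785091400146484]]
for t in to_triples(raw, min_score=5.0):
    print(f"{t.subject} --{t.relation}--> {t.obj} (score {t.score:.2f})")
# prints: Scheider --per:title--> police chief (score 11.79)
```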

Prerequisites

Linux Shell

Java 1.8+ (Check with command: java -version) (Download Page)

Python 3

CUDA >= 9.0 (Check with command: nvcc --version) (Download Page)
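
If you prefer to check the prerequisites from Python, the following sketch may help. It only tests whether java and nvcc are on the PATH and whether the interpreter is Python 3; it does not verify the Java or CUDA version numbers, so the commands above remain the authoritative check. The check_prerequisites name is made up for this example.

```python
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True if it appears to be available."""
    return {
        "python3": sys.version_info.major == 3,
        "java": shutil.which("java") is not None,   # found on PATH, version not checked
        "nvcc": shutil.which("nvcc") is not None,   # found on PATH, version not checked
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```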

Installation

1. Install OpenNRE

Clone the repository from OpenNRE github page:

git clone https://github.com/thunlp/OpenNRE.git --depth 1

Copy the modified framework file into OpenNRE:

cp sentence_re.py OpenNRE/opennre/framework/sentence_re.py

Then install OpenNRE:

cd OpenNRE
pip install -r requirements.txt
python setup.py install 
cd ..

Download the pretrained files:

cd pretrain
bash download_bert.sh
cd ..

2. Install Stanfordcorenlp and NLTK

Install using pip:

pip install stanfordcorenlp

Download and unzip Stanford CoreNLP:

wget -c http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
rm stanford-corenlp-full-2018-02-27.zip
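
As a quick smoke test of the CoreNLP setup, the sketch below runs named entity recognition through the stanfordcorenlp wrapper and merges consecutive tokens with the same tag into spans. This is only an illustration and not necessarily how triple.py uses CoreNLP. The tag_sentence and group_entities helpers are hypothetical names, the directory is assumed to be the one unzipped above, and the tags in the usage example are hand-written rather than real CoreNLP output.

```python
def tag_sentence(sentence, corenlp_dir="stanford-corenlp-full-2018-02-27"):
    """Run CoreNLP NER via the stanfordcorenlp wrapper (starts a local Java server)."""
    from stanfordcorenlp import StanfordCoreNLP  # imported lazily; requires Java

    nlp = StanfordCoreNLP(corenlp_dir)
    try:
        return nlp.ner(sentence)  # list of (token, tag) pairs
    finally:
        nlp.close()  # always shut the Java server down


def group_entities(tagged):
    """Merge consecutive tokens that share a non-'O' tag into (text, tag) spans."""
    spans = []
    current_tokens, current_tag = [], None
    for token, tag in tagged:
        if tag == current_tag and tag != "O":
            current_tokens.append(token)
            continue
        if current_tokens:
            spans.append((" ".join(current_tokens), current_tag))
        current_tokens, current_tag = ([token], tag) if tag != "O" else ([], None)
    if current_tokens:
        spans.append((" ".join(current_tokens), current_tag))
    return spans


# Hand-written tags for illustration; real input would come from tag_sentence(...).
tagged = [("Mr.", "O"), ("Scheider", "PERSON"), ("played", "O"),
          ("the", "O"), ("police", "TITLE"), ("chief", "TITLE")]
print(group_entities(tagged))
# prints: [('Scheider', 'PERSON'), ('police chief', 'TITLE')]
```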

Then install NLTK:

pip install nltk

Download tokenizer:

python
import nltk
nltk.download('punkt')
exit()

3. Prepare Data

Because of the TACRED copyright, we cannot share the TACRED data or our crawled distant supervision data. Please download the data yourself.

Get the Stanford TACRED data (https://nlp.stanford.edu/projects/tacred/), put train.json, dev.json, and test.json under benchmark/tacred, then process the data:

cd ./benchmark/tacred
python data.py
cd ../../
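
The contents of data.py are not shown here. As a rough illustration of what such a preprocessing step might look like, the sketch below maps a TACRED example (fields token, subj_start, subj_end, obj_start, obj_end, relation, with inclusive span ends) to the one-JSON-object-per-line layout that OpenNRE datasets typically use (h and t entries with exclusive span ends). The function names and output path are assumptions, and the real script may do more or something different.

```python
import json


def tacred_to_opennre(example):
    """Map one TACRED example to an OpenNRE-style record.

    TACRED span ends are inclusive, so 1 is added to get exclusive ends.
    """
    return {
        "token": example["token"],
        "h": {"pos": [example["subj_start"], example["subj_end"] + 1]},
        "t": {"pos": [example["obj_start"], example["obj_end"] + 1]},
        "relation": example["relation"],
    }


def convert_file(src_path, dst_path):
    """Read a TACRED JSON array and write one JSON record per line."""
    with open(src_path, encoding="utf-8") as src:
        examples = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for example in examples:
            dst.write(json.dumps(tacred_to_opennre(example)) + "\n")


# Hand-built example in the TACRED field layout (not taken from the dataset).
example = {
    "token": ["Mr.", "Scheider", "played", "the", "police", "chief"],
    "subj_start": 1, "subj_end": 1,
    "obj_start": 4, "obj_end": 5,
    "relation": "per:title",
}
print(tacred_to_opennre(example)["t"]["pos"])
# prints: [4, 6]
```

A call such as convert_file("train.json", "train.txt") would then convert one split; the output file name here is hypothetical.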

To crawl distantly supervised data with the Google search engine, follow these steps:

1). Use the Google Search API to crawl articles based on distant supervision. Set your API key and filename in google_crawling.py first.

python ./Google_Crawler/google_crawling.py

2). Split the target sentences out of the articles. Set the filename in processing.py first.

python ./Google_Crawler/processing.py
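
To make the idea behind step 2 concrete: distant supervision usually assumes that a sentence mentioning both entities of a known pair expresses their relation. The sketch below keeps only such sentences from an article. It is not the code in processing.py; the helper names are hypothetical, the sample article is made up, and the regex splitter is deliberately naive (it would wrongly split after abbreviations such as "Mr.", which a tokenizer like NLTK punkt handles better).

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_sentences(article, head, tail):
    """Keep sentences that mention both entities (case-insensitive substring match)."""
    head_l, tail_l = head.lower(), tail.lower()
    return [s for s in split_sentences(article)
            if head_l in s.lower() and tail_l in s.lower()]


# Made-up article text for illustration.
article = ("Roy Scheider was an American actor. "
           "Scheider played a police chief in Jaws. "
           "The film was released in 1975.")
print(select_sentences(article, "Scheider", "police chief"))
# prints: ['Scheider played a police chief in Jaws.']
```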

Train Model (For Training Only)

Follow this step only if you want to train your own model.

Run the training script:

python train_tacred_bert_softmax.py

Adjust the batch size on line 44 to fit the GPU memory of your machine.

Usage

Download pretrained model file:

mkdir ckpt
cd ckpt
wget -c https://www.dropbox.com/s/7f70dy2vatmmly4/tacred_bert_softmax.pth.tar
cd ..

Then, simply run

python triple.py

to extract relation triples from raw sentences.

To run on a server:

python server.py --port 12345

Change the port as needed, then send a GET request to the server with the sentence in the content parameter, for example:

localhost:12345?content=Mr. Scheider played the police chief of a resort town menaced by a shark.
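
A sentence containing spaces needs to be URL-encoded before it is sent. The sketch below is one way to build and send the request from Python using only the standard library. It assumes the server speaks plain HTTP at the root path and returns a text body; the response format is not documented here, so the body is returned as-is. The build_url and query_server names are made up for this example.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_url(sentence, host="localhost", port=12345):
    """Build the GET URL, percent-encoding the sentence as the content parameter."""
    query = urlencode({"content": sentence}, quote_via=quote)  # spaces become %20
    return f"http://{host}:{port}/?{query}"


def query_server(sentence, host="localhost", port=12345, timeout=30):
    """Send the request and return the raw response body as text."""
    with urlopen(build_url(sentence, host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_url("Mr. Scheider played the police chief."))
# prints: http://localhost:12345/?content=Mr.%20Scheider%20played%20the%20police%20chief.
```

With the server running, query_server("Mr. Scheider played the police chief of a resort town menaced by a shark.") would return whatever the server responds with.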

Performance

Model                                                                           F1 score
BERT-OpenNRE (original TACRED data)                                             0.809
BERT-OpenNRE (crawled data based on distant supervision)                        0.72
BERT-OpenNRE (original TACRED data + crawled data based on distant supervision) 0.815
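
The table does not state how the F1 score is computed. TACRED evaluations commonly report micro-averaged F1 that ignores the no_relation class, so the sketch below shows that variant as an assumption rather than a description of this project's evaluation code. The labels in the usage example are hand-written.

```python
def micro_f1(gold, pred, negative="no_relation"):
    """Micro-averaged F1 over all relations except the negative class."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p and p != negative)
    predicted = sum(1 for p in pred if p != negative)  # positive predictions
    actual = sum(1 for g in gold if g != negative)     # positive gold labels
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hand-written labels: 2 correct positives, 3 predicted positives, 3 gold positives.
gold = ["per:title", "no_relation", "org:founded", "per:title"]
pred = ["per:title", "per:title", "no_relation", "per:title"]
print(round(micro_f1(gold, pred), 3))
# prints: 0.667
```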

Contributors

lintonylin, maggiem1n, www49195
