This repo is for Task 2 of the relation extraction (RE) problem.
Literature review of RE (done)
Notes on concepts and coding
Main entry point of the code:
python main.py
Requires the stanfordcorenlp package.
- PDF files from Mendeley database
- Sentence splitting
- Tokenizing
- POS tagging
- Named entity recognition
- Parsing
- Relation extraction

So far, steps 1 and 2 were completed by a previous project. We apply the stanfordcorenlp API for steps 3-6 and focus on RE, using rule-based approaches with manually defined rules. For testing, all sentences from one PDF file in the database are saved in 'sen_pdf1.txt' in this repo. Parsing results from steps 1-6 are passed as input to step 7.
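The stanfordcorenlp wrapper's `dependency_parse` returns `(relation, governor_index, dependent_index)` tuples with 1-based token indices and 0 for ROOT. A small helper like the hypothetical one below (the sample sentence and triples are made up for illustration) can turn that output into word-level triples for the rule-based step 7:

```python
def dep_word_triples(tokens, dep_parse):
    """Convert index-based dependency triples to (governor, relation, dependent) words."""
    triples = []
    for rel, gov, dep in dep_parse:
        gov_word = "ROOT" if gov == 0 else tokens[gov - 1]
        triples.append((gov_word, rel, tokens[dep - 1]))
    return triples

# Hand-made example mimicking parser output for "ProteinA activates ProteinB".
tokens = ["ProteinA", "activates", "ProteinB"]
dep_parse = [("ROOT", 0, 2), ("nsubj", 2, 1), ("obj", 2, 3)]
print(dep_word_triples(tokens, dep_parse))
# → [('ROOT', 'ROOT', 'activates'), ('activates', 'nsubj', 'ProteinA'), ('activates', 'obj', 'ProteinB')]
```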
A relation is said to be negated if no node in the candidate relation contains a number.
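A minimal sketch of this check, assuming the candidate relation is available as a list of (word, POS) pairs and numbers are identified by the CD tag; the tagged examples are made up:

```python
def is_negated(relation_nodes):
    """A candidate relation is treated as negated when none of its
    nodes carries a number (CD part-of-speech tag)."""
    return not any(pos == "CD" for _, pos in relation_nodes)

# Made-up POS-tagged candidates for illustration.
with_number = [("expression", "NN"), ("increased", "VBD"), ("2-fold", "CD")]
without_number = [("expression", "NN"), ("increased", "VBD")]
print(is_negated(with_number))     # → False
print(is_negated(without_number))  # → True
```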
- Effector of the relation: the named entity appearing first in the extracted relation, i.e. the one with the smaller sentence position
- The roles are switched if some form of passive construction is detected
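The two rules above can be sketched together, assuming Stanford-style dependency labels where `nsubjpass`/`auxpass` signal a passive construction; the entity positions and dependency triples below are made up:

```python
def assign_roles(entities, dependencies):
    """Pick effector/target by sentence position, then swap the roles
    if the dependencies contain a passive marker (nsubjpass/auxpass)."""
    # entities: list of (name, token_position); the first-appearing entity is the effector.
    ordered = sorted(entities, key=lambda e: e[1])
    effector, target = ordered[0][0], ordered[-1][0]
    is_passive = any(rel in ("nsubjpass", "auxpass") for rel, _, _ in dependencies)
    if is_passive:
        effector, target = target, effector
    return effector, target

# "ProteinB is activated by ProteinA" — made-up dependency triples.
deps = [("nsubjpass", 3, 1), ("auxpass", 3, 2), ("obl", 3, 5)]
print(assign_roles([("ProteinB", 1), ("ProteinA", 5)], deps))  # → ('ProteinA', 'ProteinB')
```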
Noun phrase chunks connected to each other by an 'and', 'or', 'nn', 'det', or 'dep' dependency form an enumeration. If a noun phrase chunk contains more than one protein name, these are considered to describe alternative agents/targets.
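The enumeration rule amounts to grouping chunks that are connected by one of the listed dependency labels. A minimal sketch, where chunks are represented by their head words and the links are made-up triples:

```python
from collections import defaultdict

ENUM_RELS = {"and", "or", "nn", "det", "dep"}  # labels from the rule above

def enumeration_groups(chunks, links):
    """Group noun phrase chunks connected by enumeration-forming dependencies.
    chunks: list of chunk ids; links: (relation, chunk_a, chunk_b) tuples."""
    graph = defaultdict(set)
    for rel, a, b in links:
        if rel in ENUM_RELS:
            graph[a].add(b)
            graph[b].add(a)
    seen, groups = set(), []
    for c in chunks:
        if c in seen:
            continue
        stack, group = [c], set()
        while stack:  # collect the connected component containing c
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node])
        seen |= group
        groups.append(sorted(group))
    return groups

# Made-up chunks: "ProteinA and ProteinB" enumerated, "ProteinC" separate.
print(enumeration_groups(["ProteinA", "ProteinB", "ProteinC"],
                         [("and", "ProteinA", "ProteinB")]))
# → [['ProteinA', 'ProteinB'], ['ProteinC']]
```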
The words contained in candidate relations are checked against a set of relation restriction terms.
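One way to sketch this check; the restriction vocabulary below is hypothetical (the real list would come from the manually defined rules), and here a candidate is kept only if it contains a known term:

```python
# Hypothetical restriction vocabulary for illustration.
RESTRICTION_TERMS = {"activates", "inhibits", "binds", "regulates"}

def passes_restriction(relation_words):
    """Keep a candidate relation only if at least one of its words
    is a known relation restriction term."""
    return any(w.lower() in RESTRICTION_TERMS for w in relation_words)

print(passes_restriction(["ProteinA", "activates", "ProteinB"]))  # → True
print(passes_restriction(["ProteinA", "near", "ProteinB"]))       # → False
```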
- Focus-domain corpora will be generated by:
  - scanning our database and counting noun frequencies
  - cross-checking with public corpora in this field
- We can add a filter in NER
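The noun-frequency scan can be sketched as a simple count over POS-tagged sentences (NN* tags mark nouns in the Penn Treebank tagset); the tagged sentences below are made up:

```python
from collections import Counter

def noun_frequencies(tagged_sentences):
    """Count noun occurrences (NN* tags) across POS-tagged sentences,
    as a first step toward a focus-domain word list."""
    counts = Counter()
    for sentence in tagged_sentences:
        for word, pos in sentence:
            if pos.startswith("NN"):
                counts[word.lower()] += 1
    return counts

# Made-up tagged sentences for illustration.
tagged = [
    [("Protein", "NN"), ("expression", "NN"), ("increased", "VBD")],
    [("protein", "NN"), ("levels", "NNS"), ("fell", "VBD")],
]
print(noun_frequencies(tagged).most_common(1))  # → [('protein', 2)]
```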
- Negation check
- Find the triple {NN VB CD} in a relation
- Need more rules
- Corpora
- Try dependency paths
- Train the parser?
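For the dependency-path item above, a common starting point is the shortest path between two entities in the dependency graph. A minimal BFS sketch over an undirected view of the graph, with made-up token indices:

```python
from collections import defaultdict, deque

def shortest_dep_path(dependencies, start, end):
    """BFS over an undirected view of the dependency graph to find the
    shortest path of token indices between two tokens."""
    graph = defaultdict(set)
    for _, gov, dep in dependencies:
        graph[gov].add(dep)
        graph[dep].add(gov)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in graph[path[-1]] - visited:
            visited.add(nxt)
            queue.append(path + [nxt])
    return None  # no path between the two tokens

# Made-up indices for "ProteinA(1) activates(2) ProteinB(3) strongly(4)".
deps = [("nsubj", 2, 1), ("obj", 2, 3), ("advmod", 2, 4)]
print(shortest_dep_path(deps, 1, 3))  # → [1, 2, 3]
```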