

Task 2 Project

This repo is for Task 2 of the relation extraction (RE) problem.

Files in the repo

review.md

Literature review of relation extraction (done).

Notes

Notes on concepts and coding.

main.py

Main entry point of the code.

TO RUN

python main.py

Requires stanfordcorenlp (a Python wrapper that also needs a local Stanford CoreNLP installation).

Pipeline

  1. PDF files from the Mendeley database
  2. Sentence splitting
  3. Tokenizing
  4. Part-of-speech (POS) tagging
  5. Named entity recognition (NER)
  6. Parsing
  7. Relation extraction (RE)

So far, steps 1 and 2 have been completed by a previous project. We use the stanfordcorenlp API for steps 3-6 and focus on RE. We take a rule-based approach with manually defined rules. For testing, all sentences from one PDF file in the database are saved in the repo as 'sen_pdf1.txt'. After steps 1-6, the parsing results are passed as input to step 7.
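A minimal sketch of how steps 3-6 might be driven from Python, assuming 'sen_pdf1.txt' holds one sentence per line. The helper names (load_sentences, annotate, annotate_all) are hypothetical; `nlp` is meant to be a stanfordcorenlp.StanfordCoreNLP instance, and only its word_tokenize, pos_tag, ner, parse, and dependency_parse methods are called.

```python
from typing import Dict, Iterable, List


def load_sentences(path: str) -> List[str]:
    """Read sentences from a file such as sen_pdf1.txt.

    Assumption (hypothetical format): one sentence per line; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def annotate(nlp, sentence: str) -> Dict[str, object]:
    """Run pipeline steps 3-6 on one sentence with a StanfordCoreNLP-like object."""
    return {
        "tokens": nlp.word_tokenize(sentence),   # step 3: tokenizing
        "pos": nlp.pos_tag(sentence),            # step 4: (word, POS tag) pairs
        "ner": nlp.ner(sentence),                # step 5: (word, NER tag) pairs
        "parse": nlp.parse(sentence),            # step 6: constituency parse string
        "deps": nlp.dependency_parse(sentence),  # step 6: (label, governor, dependent)
    }


def annotate_all(nlp, sentences: Iterable[str]) -> List[Dict[str, object]]:
    """Annotate every sentence; the results are the input of step 7 (RE)."""
    return [annotate(nlp, s) for s in sentences]
```

With the wrapper installed, one would create `nlp = StanfordCoreNLP(r'/path/to/stanford-corenlp')` (the path is a placeholder), call `annotate_all(nlp, load_sentences('sen_pdf1.txt'))`, and finish with `nlp.close()` to stop the background server. Passing `nlp` in, rather than constructing it inside the helpers, keeps them testable without a running CoreNLP server.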

RE

Negation check

A candidate relation is said to be negated if none of its nodes contains a number.
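One way to express the negation check, assuming each candidate relation is a list of (word, POS tag) pairs and that a node "contains a number" when it is tagged CD (cardinal number) or its word has a digit. Both the input format and the is_negated name are illustrative.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag), e.g. ("5", "CD")


def is_negated(relation: List[Token]) -> bool:
    """Return True if no node of the candidate relation contains a number.

    Assumption: a node contains a number when its tag is CD or its word has a digit.
    """
    for word, tag in relation:
        if tag == "CD" or any(ch.isdigit() for ch in word):
            return False  # found a numeric node, so the relation is kept
    return True
```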

Effector-effectee detection

  • Effector of the relation: the named entity that appears first in the extracted relation, i.e. the one with the smaller position in the sentence.
  • The roles are switched if some form of passive construction is detected.
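A sketch of effector-effectee assignment. It assumes each entity comes with its token position, treats the entity appearing last as the effectee (the text above only defines the effector), and approximates "some form of passive construction" by looking for passive dependency labels from Stanford Dependencies (nsubjpass, auxpass, csubjpass) or Universal Dependencies (nsubj:pass, aux:pass, csubj:pass). All names here are hypothetical.

```python
from typing import List, Optional, Tuple

Entity = Tuple[str, int]    # (entity text, token position in the sentence)
Dep = Tuple[str, int, int]  # (label, governor index, dependent index)

# Passive markers in Stanford Dependencies and in Universal Dependencies;
# which set appears depends on the CoreNLP version and settings.
PASSIVE_LABELS = {"nsubjpass", "auxpass", "csubjpass",
                  "nsubj:pass", "aux:pass", "csubj:pass"}


def is_passive(deps: List[Dep]) -> bool:
    """True if any dependency label marks a passive construction."""
    return any(label in PASSIVE_LABELS for label, _, _ in deps)


def assign_roles(entities: List[Entity], deps: List[Dep]) -> Optional[Tuple[str, str]]:
    """Return (effector, effectee), or None if fewer than two entities."""
    if len(entities) < 2:
        return None
    ordered = sorted(entities, key=lambda e: e[1])  # by sentence position
    effector, effectee = ordered[0][0], ordered[-1][0]
    if is_passive(deps):
        effector, effectee = effectee, effector  # passive: swap the roles
    return effector, effectee
```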

Enumeration resolution

Noun phrase chunks connected to each other by an and, or, nn, det, or dep dependency form an enumeration. If a noun phrase chunk contains more than one protein name, the names are considered to describe alternative agents/targets.
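Grouping chunks into enumerations is a connected-components problem, so a small union-find is enough. This sketch assumes the links between chunks have already been lifted from token-level dependencies to chunk indices (a hypothetical format) and accepts both plain and conjunction-style spellings of and/or. Each resulting group can then be treated as a set of alternative agents/targets.

```python
from typing import Dict, List, Tuple

# Labels that join NP chunks into one enumeration. Collapsed Stanford
# dependencies write coordination as conj_and / conj_or, Universal
# Dependencies as conj:and / conj:or; all spellings are accepted here.
ENUM_LABELS = {"and", "or", "conj_and", "conj_or", "conj:and", "conj:or",
               "nn", "det", "dep"}


def enumerations(chunks: List[str], links: List[Tuple[str, int, int]]) -> List[List[str]]:
    """Group NP chunks (by list index) that are linked by enumeration labels."""
    parent = list(range(len(chunks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for label, i, j in links:
        if label in ENUM_LABELS:
            parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[str]] = {}
    for idx, text in enumerate(chunks):
        groups.setdefault(find(idx), []).append(text)
    return list(groups.values())
```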

Restricting candidate relations to the focus domain

The words contained in candidate relations are checked against a set of relation restriction terms.

  • Focus-domain corpora will be generated by:
    • scanning our database and checking noun frequencies
    • checking against public corpora in this field
    • possibly adding a filter to the NER step
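A sketch of the noun-frequency idea: collect the most frequent nouns from tagged sentences as restriction terms, then keep only candidate relations that contain at least one of them. The top_k cut-off, lower-casing, and (word, POS tag) input format are assumptions; comparison with public corpora and the NER filter are not shown.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)


def build_restriction_terms(tagged_sentences: Iterable[List[Token]], top_k: int = 100) -> Set[str]:
    """Return the top_k most frequent nouns (NN, NNS, NNP, NNPS), lower-cased."""
    counts = Counter(
        word.lower()
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag.startswith("NN")
    )
    return {word for word, _ in counts.most_common(top_k)}


def in_focus_domain(relation: List[Token], terms: Set[str]) -> bool:
    """Keep a candidate relation only if one of its words is a restriction term."""
    return any(word.lower() in terms for word, _ in relation)
```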

Rules:

  • Negation check
  • Find the triple {NN VB CD} (noun, verb, cardinal number) in a relation
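A possible reading of the {NN VB CD} rule, assuming the three tags must appear left to right in the relation but not necessarily next to each other, and that any NN* or VB* tag counts as a noun or verb. The find_triple name and the (word, POS tag) input are illustrative.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

# Tag prefixes in the required order: NN* noun, VB* verb, CD cardinal number.
PATTERN = ("NN", "VB", "CD")


def find_triple(relation: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return the first (noun, verb, number) matched in order, or None."""
    found: List[str] = []
    for word, tag in relation:
        # Only look for the next tag in the pattern; earlier matches are fixed.
        if len(found) < len(PATTERN) and tag.startswith(PATTERN[len(found)]):
            found.append(word)
    if len(found) == len(PATTERN):
        return found[0], found[1], found[2]
    return None
```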

Problems & TODO

  1. Add more rules
  2. Build the focus-domain corpora
  3. Try dependency paths
  4. Train the parser?
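For the dependency-path item, a common starting point is the shortest path between two tokens in the dependency graph. This sketch assumes the (label, governor, dependent) triples that stanfordcorenlp's dependency_parse returns for a single sentence, with 1-based token indices and 0 for ROOT; the dependency_path name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Dep = Tuple[str, int, int]  # (label, governor index, dependent index); 0 = ROOT


def dependency_path(deps: List[Dep], start: int, end: int) -> Optional[List[int]]:
    """Shortest token-index path from start to end, treating edges as undirected."""
    graph: Dict[int, List[int]] = {}
    for _, gov, dep in deps:
        if gov == 0:  # skip the artificial ROOT edge
            continue
        graph.setdefault(gov, []).append(dep)
        graph.setdefault(dep, []).append(gov)

    previous: Dict[int, Optional[int]] = {start: None}
    queue = deque([start])
    while queue:  # breadth-first search
        node = queue.popleft()
        if node == end:
            path: List[int] = []
            cur: Optional[int] = node
            while cur is not None:  # walk back to start
                path.append(cur)
                cur = previous[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

For "Revenue increased by 12 percent", the path from "Revenue" to "12" runs through "increased" and "percent", which is the kind of connection the {NN VB CD} rule looks for.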


Contributors

lchengit
