Giter Club home page Giter Club logo

minhlab / ixa-pipe-ned Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ixa-ehu/ixa-pipe-ned

0.0 2.0 0.0 297 KB

This repository contains the Named Entity Disambiguation tool based on DBpedia Spotlight. Providing that a DBpedia Spotlight Rest server is running, the EHU-ned module will take KAF as input (containing <entities> elements) and perform Named Entity Disambiguation for your language of choice. Developed by IXA NLP Group (ixa.si.ehu.es).

ixa-pipe-ned's Introduction

ixa-pipe-ned

This repository contains the Named Entity Disambiguation tool based on DBpedia Spotlight. Providing that a DBpedia Spotlight Rest server for a given language is running, the ixa-pipe-ned module will take NAF or KAF as input (containing elements) and perform Named Entity Disambiguation for your language of choice.

Developed by IXA NLP Group (ixa.si.ehu.es) for the 7th Framework OpeNER and NewsReader European projects.

Contents

The contents of the repository are the following:

+ src/ source files of ixa-pipe-ned
+ pom.xml 
+ pom-naf.xml
+ README.md: This README

Installation Procedure

In a snapshot:

  1. Install dbpedia-spotlight
  2. Compile ixa-pipe-ned module with mvn clean package
  3. Start dbpedia-spotlight server
  4. cat ner.naf | ixa-pipe-ned/target/ixa-pipe-ned-1.0.jar -p $PORT_NUMBER

If you already have installed in your machine JDK7 and MAVEN 3, please go to step 3 directly. Otherwise, follow the detailed steps:

1. Install JDK 1.7

If you do not install JDK 1.7 in a default location, you will probably need to configure the PATH in .bashrc or .bash_profile:

export JAVA_HOME=/yourpath/local/java17
export PATH=${JAVA_HOME}/bin:${PATH}

If you use tcsh you will need to specify it in your .login as follows:

setenv JAVA_HOME /usr/java/java17
setenv PATH ${JAVA_HOME}/bin:${PATH}

If you re-login into your shell and run the command

java -version

You should now see that your jdk is 1.7

2. Install MAVEN 3

Download MAVEN 3 from

wget http://ftp.udc.es/apache/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz

Now you need to configure the PATH. For Bash Shell:

export MAVEN_HOME=/home/ragerri/local/apache-maven-3.0.4
export PATH=${MAVEN_HOME}/bin:${PATH}

For tcsh shell:

setenv MAVEN3_HOME ~/local/apache-maven-3.0.4
setenv PATH ${MAVEN3}/bin:{PATH}

If you re-login into your shell and run the command

mvn -version

You should see reference to the MAVEN version you have just installed plus the JDK 7 that is using.

3. Download statistical backend - dbpedia spotlight

Downloand from http://spotlight.sztaki.hu/downloads/

  • dbpedia-spotlight-0.7.jar
  • German model: de.tar.gz
  • English model: en_2+2.tar.gz
  • Spanish model: es.tar.gz
  • French model: fr.tar.gz
  • Italian model: it.tar.gz
  • Dutch model: nl.tar.gz

Decompressed the language models

  • tar xvf $lang.tar.gz

Install dbpedia-spotlight

  • go to the directory the dbpedia-spotlight.jar is located
  • execute: mvn install:install-file -Dfile=dbpedia-spotlight-0.7.jar -DgroupId=ixa -DartifactId=dbpedia-spotlight -Dversion=0.7 -Dpackaging=jar -DgeneratePom=true This command will install dbpedia-spotlight jar as a local maven repository

Start the application

  • java -jar dbpedia-spotlight-0.7.jar $lang http://localhost:$port/rest

    note: When working with English, the $lang variable refers to en_2+2

4. Download the ixa-pipe-ned repository

git clone [email protected]:ixa-ehu/ixa-pipe-ned.git

5. Install ixa-pipe-ned

Install the ixa-pipe-ned module

mvn clean package

This command will create a ixa-pipe-ned/target directory containing the ixa-pipe-ned-1.0.jar binary with all dependencies included.

6. ixa-pipe-ned USAGE

The ixa-pipe-ned-1.0.jar requires a NAF document containing elements as standard input and provides Named Entity Disambiguation as standard output. It also requires the port number as argument. The port numbers assigned to each language are the following:

- de: 2010
- en: 2020
- es: 2030
- fr: 2040
- it: 2050
- nl: 2060

**Once you have a DBpedia Spotlight Rest server running you can send queries to it via the ixa-pipe-ned module as follows:

cat ner.naf | java -jar ixa-pipe-ned-1.0.jar -p $PORT_NUMBER

** The default option chooses one dbpedia-entry for each entity. It is also possible to return a ranked list of candidates for each entity:

cat ner.naf | java -jar ixa-pipe-ned-1.0.jar -p $PORT_NUMBER -e candidates

7. ixa-pipe-ned SPECIAL USAGE

When the language is other than English, the module offers an additional feature. It is possible to set the corresponding English entry. To execute this option:

cat ner.naf | java -jar ixa-pipe-ned-1.0.jar -p $PORT_NUMBER -i $INDEX -n $NAME

$INDEX is the path of the 'database' created by MapDB (http://www.mapdb.org/)
$NAME is the name of the HashMap that the module uses

So far, you can download wikipedia-db.tar.gz package, which contains the required resources for Spanish. In this particular distribution, the $INDEX is 'wikipedia-db' and the $NAME is 'esEn':

cat ner.naf | java -jar ixa-pipe-ned-1.0.jar -p 2030 -i wikipedia-db -n esEn

It is possible to use new Maps, if they are created using MapDB. The two elements of the HashMap have to be Strings, delimited by a tab.

For more options running ixa-pipe-ned

java -jar ixa-pipe-ned-1.0.jar -h

Contact information

Rodrigo Agerri and Itziar Aldabe
{rodrigo.agerri,itziar.aldabe}@ehu.es
IXA NLP Group
University of the Basque Country (UPV/EHU)
E-20018 Donostia-San Sebastián

ixa-pipe-ned's People

Contributors

ialdabe avatar ragerri avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.