Scalable Interoperable Annotation Server (SIA)

Project description

SIA is an annotation service built for the BioCreative V.5 BeCalm TIPS task. Mutation mentions are annotated using SETH, microRNA mentions using mirNer, and disease mentions using a dictionary lookup. Results are returned in JSON according to these definitions.

Citation

To cite SIA, please use the following reference:

@Article{Kirschnick2018,
  title     = {{SIA:} a scalable interoperable annotation server for biomedical named entities},
  author    = {Johannes Kirschnick and Philippe Thomas and Roland Roller and Leonhard Hennig},
  journal   = {Journal of Cheminformatics},
  volume    = {10},
  number    = {1},
  pages     = {63:1--63:7},
  year      = {2018},
  month     = {Dec},
  url       = {https://doi.org/10.1186/s13321-018-0319-2},
  doi       = {10.1186/s13321-018-0319-2}
}

A PDF version of the paper is freely available here

Getting Started

Note

The system uses RabbitMQ for load balancing, so make sure it is running locally before starting the application. Refer to how to install RabbitMQ for help.

If you want to skip the RabbitMQ installation, you can start it via Maven for convenience (this might not work on your machine):

./mvnw rabbitmq:start

Check http://localhost:15672/ for the management interface, default login: guest/guest

To tear down RabbitMQ afterwards, issue:

./mvnw rabbitmq:stop

To start the system in development mode, issue:

./mvnw spring-boot:run

This starts the backend without submitting results to the TIPS server; instead, results are printed to the console. The server listens on port 8080 by default.

getAnnotation

Issue the following curl request to trigger a new annotation request with a sample payload

curl -vX POST http://localhost:8080/call -d @src/test/resources/samplepayloadGetannotations.json --header "Content-Type: application/json"

and watch the console for results.
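The same request can also be issued from Java. The following is a minimal sketch using the JDK's built-in HTTP client (Java 11 or later); the class name SiaCallClient and the buildRequest helper are made up for illustration and are not part of SIA. It assumes the server runs on localhost:8080 and that the sample payload file from the curl command exists relative to the working directory.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical client class for illustration; not part of SIA.
class SiaCallClient {

    // Build the same POST request that the curl command issues against /call.
    static HttpRequest buildRequest(String baseUrl, String jsonPayload) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/call"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Payload path taken from the curl command; adjust it to your checkout.
        String payload = Files.readString(
                Path.of("src/test/resources/samplepayloadGetannotations.json"));
        HttpRequest request = buildRequest("http://localhost:8080", payload);

        // Send the request and print the HTTP status code of the response.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
    }
}
```

In development mode the annotations themselves still appear on the server console, so the client only needs to confirm that the request was accepted.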

getStatus

To trigger a get status report, use the following curl request

curl -vX POST http://localhost:8080/call -d @src/test/resources/sampleplayloadGetStatus.json --header "Content-Type: application/json"

Adding custom annotators

To extend SIA for additional Named Entity Recognition tools you have to:

  • Implement your annotator. Consult the examples in the corresponding package for implementation details. Afterwards, for correct message routing, define the input channel. Input channels can be named freely, but we recommend using the name of the annotator. For example:

@Transformer(inputChannel = "yourAnnotator")

This annotation, placed on the annotator, declares that inputs arrive on the yourAnnotator channel. Internally, channels are mapped to queues automatically.

  • Add your annotator as recipient in FlowHandler and define the set of PredictionType your annotator responds to accordingly.

For example:

.recipientMessageSelector("yourAnnotator", message -> headerContains(message, CHEMICAL) && enabledAnnotators.yourAnnotator)

Here, yourAnnotator has to match the transformer's inputChannel definition. The selector specifies that all requests that need to be tagged with CHEMICAL are sent to the yourAnnotator channel. headerContains(message, CHEMICAL) is a helper method that checks whether the header field called types contains the enum value CHEMICAL. The header is populated automatically from the request message, which lists the requested annotator types.

  • Furthermore, enabledAnnotators is an injected configuration bean that specifies which annotators are enabled.

Simply add a new boolean property named yourAnnotator to the class to control whether your annotator is enabled. Check application.properties.
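The selector logic described above can be sketched in plain Java without any Spring dependencies. Everything in this block is a stand-in for illustration: the RoutingSketch class, the reduced PredictionType enum, and the use of a plain Map for the message headers are assumptions, and SIA's real headerContains operates on a message object rather than a map.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical stand-ins for illustration; SIA's real classes may differ.
class RoutingSketch {

    // Only the types mentioned in this README; the real enum likely has more.
    enum PredictionType { CHEMICAL, MUTATION }

    // Mirrors the described helper: true if the "types" header contains the type.
    static boolean headerContains(Map<String, Object> headers, PredictionType type) {
        Object types = headers.get("types");
        return types instanceof Collection && ((Collection<?>) types).contains(type);
    }

    // Combined selector: the type was requested AND the annotator is enabled.
    static boolean routesToYourAnnotator(Map<String, Object> headers,
                                         boolean yourAnnotatorEnabled) {
        return headerContains(headers, PredictionType.CHEMICAL) && yourAnnotatorEnabled;
    }

    public static void main(String[] args) {
        Map<String, Object> headers =
                Map.of("types", EnumSet.of(PredictionType.CHEMICAL));
        System.out.println(routesToYourAnnotator(headers, true));  // prints true
        System.out.println(routesToYourAnnotator(headers, false)); // prints false
    }
}
```

A request reaches the annotator only when both conditions hold, which is why a correctly routed annotator still stays silent if its boolean property is left at false.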

Available Annotators

BannerNER

BANNER is a named entity recognition system, primarily intended for biomedical text.

http://banner.sourceforge.net/

DiseasesNER

DiseasesNER uses a large dictionary of disease mentions.

Linnaeus

Species name recognition and normalization software.

http://linnaeus.sourceforge.net/

MirNER

mirNer is a simple regex-based tool to detect microRNA mentions in text, following the microRNA definition of Victor Ambros et al. (2003). A uniform system for microRNA annotation. RNA 2003 9(3):277-279.

https://github.com/Erechtheus/mirNer

SETH

SNP Extraction Tool for Human Variations.

SETH is software that performs named entity recognition (NER) of genetic variants (with an emphasis on single nucleotide polymorphisms (SNPs) and other short sequence variations) from natural language texts.

https://rockt.github.io/SETH/

ChemSpot (external)

ChemSpot is a named entity recognition tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and IUPAC entities.

https://www.informatik.hu-berlin.de/de/forschung/gebiete/wbi/resources/chemspot/chemspot

DNorm (external)

DNorm is an automated method for determining which diseases are mentioned in biomedical text, the task of disease normalization. Diseases have a central role in many lines of biomedical research, making this task important for many lines of inquiry, including etiology (e.g. gene-disease relationships) and clinical aspects (e.g. diagnosis, prevention, and treatment).

https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/DNorm.html

External annotators

DNorm and ChemSpot are integrated out of process, so you need to start them before you can use them. Communication is handled via a dedicated queue for each annotator.

  • Start DNorm

    ./mvnw -f tools/dnorm/pom.xml -DskipTests package
    java -Xmx8g -jar tools/dnorm/target/dnorm-0.0.1-SNAPSHOT.jar
    
  • Start ChemSpot

    ./mvnw -f tools/chemspot/pom.xml package
    java -Xmx16g -jar tools/chemspot/target/chemspot-0.0.1-SNAPSHOT.jar
    

Tagging PubMed Dumps

You can tag PubMed articles from ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ by putting them into the directory tools/pubmedcache.

Configure which annotators to use by creating an application.properties file in the current directory and adding the annotators you want. Then start any external annotators that you want to use.

If you don't customize the annotators, the following default configuration is applied:

sia.annotators.banner=false
sia.annotators.diseaseNer=false
sia.annotators.mirNer=false
sia.annotators.linnaeus=false
sia.annotators.seth=true

# external
sia.annotators.dnorm=false
sia.annotators.chemspot=false
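To illustrate how a custom application.properties interacts with these defaults, here is a small sketch that overlays user-supplied properties on the default values listed above. The AnnotatorConfigSketch class and its resolve method are hypothetical; SIA presumably binds these properties through Spring Boot rather than parsing them by hand, so treat this only as a model of the resulting on/off values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; not part of SIA.
class AnnotatorConfigSketch {

    // Defaults copied from the configuration listing above.
    static final Map<String, Boolean> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("banner", false);
        DEFAULTS.put("diseaseNer", false);
        DEFAULTS.put("mirNer", false);
        DEFAULTS.put("linnaeus", false);
        DEFAULTS.put("seth", true);
        DEFAULTS.put("dnorm", false);     // external
        DEFAULTS.put("chemspot", false);  // external
    }

    // Overlay the given properties text on the defaults, key by key.
    static Map<String, Boolean> resolve(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        Map<String, Boolean> enabled = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> entry : DEFAULTS.entrySet()) {
            String value = props.getProperty("sia.annotators." + entry.getKey());
            enabled.put(entry.getKey(),
                    value == null ? entry.getValue() : Boolean.parseBoolean(value.trim()));
        }
        return enabled;
    }

    public static void main(String[] args) throws IOException {
        // Example override: switch SETH off and mirNer on.
        String custom = "sia.annotators.seth=false\nsia.annotators.mirNer=true\n";
        System.out.println(resolve(custom));
    }
}
```

Keys that are absent from the custom file keep their default, so a file with a single line is enough to toggle one annotator.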

Finally, start the SiaPubmedAnnotator class with the driver and backend profiles enabled. The driver profile ensures that output is collected into the directory annotated, while the backend profile ensures that the internal annotators are started as well.

./mvnw -DskipTests package
java -cp target/sia-0.0.1-SNAPSHOT.jar \
     -Dloader.main=de.dfki.nlp.SiaPubmedAnnotator \
     org.springframework.boot.loader.PropertiesLauncher \
     --spring.profiles.active=backend,driver

Example output

$ ls -lh annotated
1.0K Jun 28 23:15 annotation-results_2018-06-28_11-15-07.json 
$ head annotated/a*
{"predictionResults":[{"document_id":"10022392","section":"A","init":1085,"end":1090,"score":1.0,"annotated_text":"T337A","type":"MUTATION"} ....
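The fields of one entry in predictionResults can be modeled as a small value class. The sketch below is an illustration only: the class name PredictionResultSketch is made up, and the offset convention (init inclusive, end exclusive) is an inference from the single sample above, where end minus init equals the length of annotated_text. It is not a documented guarantee.

```java
// Hypothetical value class mirroring one entry of "predictionResults".
final class PredictionResultSketch {
    final String documentId;
    final String section;
    final int init;
    final int end;
    final double score;
    final String annotatedText;
    final String type;

    PredictionResultSketch(String documentId, String section, int init, int end,
                           double score, String annotatedText, String type) {
        this.documentId = documentId;
        this.section = section;
        this.init = init;
        this.end = end;
        this.score = score;
        this.annotatedText = annotatedText;
        this.type = type;
    }

    // Assumption: init is inclusive and end is exclusive, as the sample suggests.
    boolean offsetsMatchText() {
        return end - init == annotatedText.length();
    }

    // Cut the mention out of the section text using the same offset convention.
    String extractFrom(String sectionText) {
        return sectionText.substring(init, end);
    }

    public static void main(String[] args) {
        // Values copied from the sample output above.
        PredictionResultSketch sample = new PredictionResultSketch(
                "10022392", "A", 1085, 1090, 1.0, "T337A", "MUTATION");
        System.out.println(sample.offsetsMatchText()); // prints true
    }
}
```

A check like offsetsMatchText can serve as a quick sanity test when post-processing the annotated files, provided the offset assumption holds for your data.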
