mtl-bioinformatics-2016's Introduction

README

This repository contains the models and supplementary data for the paper A Neural Network Multi-Task Learning Approach to Biomedical Named Entity Recognition by Gamal Crichton, Sampo Pyysalo, Billy Chiu and Anna Korhonen.

The supplementary data can be found in the file Additional file 1.pdf.

The corpora used for the experiments (which can be re-distributed) are in the data folder.
Note: The re-distribution status of the BioCreative IV Chemical and Drug (BC4CHEMD) named entity recognition task corpus is unclear, but the corpus can be publicly accessed at http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/.

The models can be found in the models folder.

There are several files in the models folder:

  • baseline.py: The MLP model used as a baseline for the experiments.

    Example Usage: python baseline.py 'path/to/dataset' 'path/to/vectorfile'

  • baseline_config.py: The configurable variables and their values for the MLP baseline model (baseline.py).

  • config.py: The configurable variables and their values for the convolutional models.

  • MT-dependent.py: The multi-task Dependent Model.

    Example usage: python MT-dependent.py 'path/to/data-files' 'dataset-1,...,dataset-n' 'path/to/vectorfile'

  • multi-output_MT.py: The multi-output multi-task model.

    Example usage: python multi-output_MT.py 'path/to/data-files' 'dataset-1,...,dataset-n' 'path/to/vectorfile'

  • multi-output_MT-var-dataset.py: The model used in the multi-task experiments which investigated the effect of multi-task learning on datasets of various sizes.
    Use the --percent-keep option to set how much of the training data to keep, chosen at random, for the dataset whose size you wish to vary (0.5 in the example below). That dataset must be the first one specified; all other datasets are trained on their full training data.

    Example usage: python multi-output_MT-var-dataset.py --percent-keep 0.5 'path/to/data-files' 'path/to/reduced-dataset,path/to/whole-dataset' 'path/to/vectorfile'

  • single_task.py: The single task model.

    Example usage: python single_task.py 'path/to/dataset' 'path/to/vectorfile'
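
The --percent-keep option of multi-output_MT-var-dataset.py keeps a random share of one dataset's training examples. The sketch below illustrates that idea only; it is not the code from the repository. The function name keep_fraction, the seed parameter, and the treatment of the value as a fraction between 0 and 1 (as in the 0.5 example above) are assumptions made for illustration.

```python
import random


def keep_fraction(examples, percent_keep, seed=0):
    """Return a random subset holding roughly `percent_keep` of `examples`.

    Hypothetical helper: `percent_keep` is read as a fraction in (0, 1],
    matching the `--percent-keep 0.5` usage example. The original order of
    the kept examples is preserved.
    """
    if not 0.0 < percent_keep <= 1.0:
        raise ValueError("percent_keep must be in the interval (0, 1]")
    if not examples:
        return []
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    n_keep = max(1, int(round(len(examples) * percent_keep)))
    kept_indices = sorted(rng.sample(range(len(examples)), n_keep))
    return [examples[i] for i in kept_indices]


if __name__ == "__main__":
    sentences = ["sentence-%d" % i for i in range(10)]
    print(keep_fraction(sentences, 0.5))
```

Only the first dataset would be passed through such a helper; the remaining datasets would be used in full.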

Note: The experiments in the paper applied the Viterbi algorithm to the outputs. Use the --viterbi flag to replicate this.
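
For readers unfamiliar with Viterbi decoding, the following is a minimal, generic sketch of how it picks the best label sequence from per-token scores. It is not the implementation behind the --viterbi flag; the function name, the log-score inputs, and the three-label scheme (O, B, I) used in the example are assumptions made for illustration.

```python
NEG_INF = float("-inf")


def viterbi(emissions, transitions, start=None):
    """Return the highest-scoring label index sequence.

    emissions[t][k]   : log-score of label k at token t
    transitions[j][k] : log-score of moving from label j to label k
    start[k]          : log-score of starting in label k (defaults to 0)
    """
    n_labels = len(transitions)
    if not emissions:
        return []
    if start is None:
        start = [0.0] * n_labels

    # Best score of any path ending in each label at the first token.
    scores = [start[k] + emissions[0][k] for k in range(n_labels)]
    backpointers = []

    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for k in range(n_labels):
            # Best previous label for arriving at label k.
            best_prev = max(range(n_labels),
                            key=lambda j: scores[j] + transitions[j][k])
            new_scores.append(scores[best_prev] + transitions[best_prev][k]
                              + emissions[t][k])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Follow the backpointers from the best final label.
    best = max(range(n_labels), key=lambda k: scores[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


if __name__ == "__main__":
    O, B, I = 0, 1, 2
    transitions = [[0.0, 0.0, NEG_INF],  # O -> I is not allowed
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    start = [0.0, 0.0, NEG_INF]          # a sequence cannot start with I
    emissions = [[-0.5, -1.0, -3.0],
                 [-3.0, -2.0, -0.2]]
    print(viterbi(emissions, transitions, start))
```

In this toy example, picking the best label for each token independently would give O followed by I, which is not a valid sequence under the assumed constraints; the decoder returns B followed by I instead.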

License

The code is provided under the MIT license and the other materials under Creative Commons Attribution 4.0.
