What is LOCALIZER?

LOCALIZER is a machine learning method for predicting the subcellular localization of both plant proteins and pathogen effectors in the plant cell. It can currently predict localization to chloroplasts and mitochondria using transit peptide prediction and to nuclei using a collection of nuclear localization signals (NLSs).

You can submit your proteins of interest to the webserver at http://localizer.csiro.au/ or install it locally. All training and evaluation data can be found here.

Installing LOCALIZER

LOCALIZER is written in Python and uses pepstats from the EMBOSS software and the WEKA 3.6 software. It also requires Perl and BioPython. From version 1.0.5 onwards, LOCALIZER uses Python 3.

To get LOCALIZER to work on your local machine, you need to install EMBOSS and WEKA from source. Both are provided in the LOCALIZER distribution to ensure that compatible versions are used.

  1. Download the latest release from this GitHub repo (alternatively, clone the GitHub repo instead of downloading a release).

  2. Unpack LOCALIZER in your desired location and make sure it has permission to execute:

tar xvf LOCALIZER-1.0.5.tar.gz
chmod -R 755 LOCALIZER-1.0.5/
cd LOCALIZER-1.0.5
  3. For the EMBOSS installation, switch to the Scripts directory, then unpack, configure and make. Alternatively, if you are on a computer cluster where EMBOSS is already installed, you can set the variable PEPSTATS_PATH in the LOCALIZER.py script to the EMBOSS directory that contains pepstats on that machine.
cd Scripts
tar xvf emboss-latest.tar.gz
cd EMBOSS-6.5.7/
./configure
make
cd ../ 
  4. For WEKA, simply unzip the file weka-3-6-12.zip
unzip weka-3-6-12.zip

If you are having trouble installing EMBOSS, please see here for help. If you are having trouble installing WEKA, please see here for help.

  5. Test if LOCALIZER is working
python LOCALIZER.py -e -i Effector_Testing.fasta
  6. Problems?

If you are getting an error message like 'ImportError: No module named Bio', you need to install BioPython on your computer. See here for help. For example, you can try and run:

pip install biopython

Note also that Perl must be installed on your computer to run NLStradamus.
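
The two prerequisites mentioned above can be checked before running LOCALIZER. The following is a minimal sketch and not part of the LOCALIZER distribution; the function name check_prerequisites is hypothetical, and it assumes that Perl is expected under the command name perl on your PATH.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict mapping each prerequisite to True if it was found.

    Hypothetical helper for illustration; LOCALIZER itself does not ship it.
    """
    return {
        # BioPython is imported as the 'Bio' package.
        "biopython": importlib.util.find_spec("Bio") is not None,
        # Assumption: NLStradamus is run with the 'perl' found on the PATH.
        "perl": shutil.which("perl") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If biopython is reported as missing, the pip command shown above is one way to install it.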

Running LOCALIZER on plant data

For plant protein localization prediction, submit full-length sequences and run LOCALIZER in 'plant mode' (option -p). Do not submit short sequence fragments, because LOCALIZER expects full protein sequences.

python LOCALIZER.py -p -i Plant_Testing.fasta

LOCALIZER will then search for transit peptides in the N-terminus and for nuclear localization signals in the sequence.

Running LOCALIZER on effector data

For effector protein localization prediction, submit full-length sequences and run LOCALIZER in 'effector mode' (option -e). Do not submit short sequence fragments, because LOCALIZER expects full protein sequences.

It is recommended to first use tools such as SignalP or Phobius to predict whether a protein is likely to be secreted and to obtain the mature sequence without the signal peptide. Alternatively, provide the full sequences and let LOCALIZER delete the first 20 amino acids as the putative signal peptide region.

python LOCALIZER.py -e -i Effector_Testing.fasta

You can set how LOCALIZER treats the signal peptide region with these options:

    -M      : in effector mode, do not remove the signal peptide. Use this if you are providing mature effector sequences.
    -S <x>  : in effector mode, remove the signal peptide by deleting the first x amino acids (default: 20).
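
The effect of these two options on a submitted sequence can be illustrated with a short Python sketch. This is not LOCALIZER's own code; the function prepare_effector_sequence and its parameter names are hypothetical and only mirror the behaviour described for -M and -S above.

```python
def prepare_effector_sequence(sequence, signal_peptide_length=20, is_mature=False):
    """Return the sequence that would be analysed in effector mode.

    is_mature=True mirrors -M: the sequence is left unchanged.
    signal_peptide_length mirrors -S <x>: the first x residues are deleted
    (default 20). Hypothetical helper, for illustration only.
    """
    if signal_peptide_length < 0:
        raise ValueError("signal_peptide_length must be non-negative")
    if is_mature:
        return sequence
    return sequence[signal_peptide_length:]


# A made-up 27-residue sequence: 20 residues followed by KRKRLLL.
example = "M" + "A" * 19 + "KRKRLLL"
default_result = prepare_effector_sequence(example)                     # like the default
mature_result = prepare_effector_sequence(example, is_mature=True)      # like -M
custom_result = prepare_effector_sequence(example, signal_peptide_length=22)  # like -S 22
```

With the default, the first 20 residues are dropped; with is_mature=True the sequence is passed through untouched, which is the appropriate choice if SignalP or Phobius has already been used to obtain mature sequences.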

LOCALIZER output format

Run this to get a feel for the output format:

python LOCALIZER.py -e -i Effector_Testing.fasta

# -----------------
# LOCALIZER 1.0.5 Predictions (-e mode)
# -----------------
Identifier      Chloroplast             Mitochondria            Nucleus
CRN15           -                       -                       Y (KRKR)
Ecp2            -                       -                       -
AVR-Pii         -                       -                       -
ToxA            Y (0.877 | 62-130)      -                       -
--------------------------------------
--------------------------------------
# Proteins analyzed: 4 from file: Effector_Testing.fasta

# Number of proteins with cTP: 1 (25.0%)
# Number of proteins with cTP & possible mTP: 0 (0.0%)
# Number of proteins with cTP & NLS: 0 (0.0%)
# Number of proteins with cTP & possible mTP & NLS: 0 (0.0%)
# Number of proteins with mTP: 0 (0.0%)
# Number of proteins with mTP & possible cTP: 0 (0.0%)
# Number of proteins with mTP & NLS: 0 (0.0%)
# Number of proteins with mTP & possible cTP & NLS: 0 (0.0%)
# Number of proteins with NLS and no transit peptides: 1 (25.0%)
--------------------------------------
--------------------------------------
# Summary statistics

# Number of proteins with chloroplast localization (cTP, cTP & possible mTP, cTP & NLS, cTP & possible mTP & NLS): 1 (25.0%)
# Number of proteins with mitochondrial localization (mTP, mTP & possible cTP, mTP & NLS, mTP & possible cTP & NLS): 0 (0.0%)
# Number of proteins with nuclear localization and no transit peptides: 1 (25.0%)
# Number of proteins with nuclear localization and with transit peptides: 0 (0.0%)
--------------------------------------
--------------------------------------

LOCALIZER returns its output as shown in the example above. First, a summary table shows the predictions (chloroplast, mitochondria or nucleus) for each submitted protein. If a transit peptide is predicted, its start and end positions in the submitted sequence are shown alongside the probability. In this example, ToxA has a predicted chloroplast transit peptide with probability 0.877 at positions 62-130 in its sequence. LOCALIZER does not return a probability for nuclear localization, because it uses a simple NLS search. In this example, LOCALIZER found an NLS in CRN15, namely the sequence KRKR.
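
For downstream analysis, the rows of the summary table can be parsed with a few regular expressions. The sketch below is only an illustration based on the example output shown here. It assumes that columns are separated by a tab or by two or more whitespace characters and that cells follow the patterns 'Y (probability | start-end)', 'Y (NLS)' or '-'. The function names are hypothetical.

```python
import re

# Assumption: columns are separated by a tab or by runs of 2+ whitespace characters.
COLUMN_SEPARATOR = re.compile(r"\s{2,}|\t")
# Transit peptide cell, e.g. "Y (0.877 | 62-130)".
TRANSIT_PEPTIDE = re.compile(r"Y \(([0-9.]+) \| (\d+)-(\d+)\)")
# Nucleus cell, e.g. "Y (KRKR)"; the text in parentheses is kept as-is.
NLS = re.compile(r"Y \((.+)\)")


def parse_transit_peptide(cell):
    """Return probability, start and end for a transit peptide cell, or None for '-'."""
    match = TRANSIT_PEPTIDE.fullmatch(cell)
    if match is None:
        return None
    return {
        "probability": float(match.group(1)),
        "start": int(match.group(2)),
        "end": int(match.group(3)),
    }


def parse_prediction_row(line):
    """Parse one data row of the summary table into a dict."""
    identifier, chloroplast, mitochondria, nucleus = COLUMN_SEPARATOR.split(line.strip())
    nls_match = NLS.fullmatch(nucleus)
    return {
        "identifier": identifier,
        "chloroplast": parse_transit_peptide(chloroplast),
        "mitochondria": parse_transit_peptide(mitochondria),
        "nls": nls_match.group(1) if nls_match else None,
    }


toxa = parse_prediction_row("ToxA            Y (0.877 | 62-130)      -                       -")
crn15 = parse_prediction_row("CRN15           -                       -                       Y (KRKR)")
```

Header lines, comment lines starting with '#' and the dashed separator lines are not handled here and would need to be skipped before calling parse_prediction_row.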

In the summary statistics, we count LOCALIZER predictions that are 'chloroplast', 'chloroplast and possible mitochondrial', 'chloroplast and nucleus' and 'chloroplast and possible mitochondrial and nucleus' as chloroplast predictions (the same strategy applies to mitochondrial predictions). A protein that carries a predicted transit peptide together with a predicted NLS might have experimental evidence for only one of those locations, due to the technical hurdles of recognizing dual targeting, and should thus not necessarily be counted as a false positive prediction. However, in the LOCALIZER paper, a protein was counted as a nucleus prediction only if it had the category 'nucleus', to avoid assigning a protein to multiple predictions in the evaluation, and this is what we recommend.
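
The counting strategy described above can be sketched in Python as follows. This is an illustration rather than LOCALIZER's implementation. It assumes that each protein has already been assigned one category label in the style of the statistics lines ('cTP', 'cTP & possible mTP', 'mTP & NLS', 'NLS', and so on); the label 'none' for proteins without any prediction and the function name summarize are hypothetical.

```python
from collections import Counter

SUMMARY_KEYS = (
    "chloroplast",
    "mitochondria",
    "nucleus_only",
    "nucleus_with_transit_peptide",
)


def summarize(categories):
    """Return {summary key: (count, percentage)} for a list of category labels."""
    total = len(categories)
    counts = Counter()
    for category in categories:
        # Any category led by cTP counts as chloroplast, any led by mTP as mitochondria.
        if category.startswith("cTP"):
            counts["chloroplast"] += 1
        elif category.startswith("mTP"):
            counts["mitochondria"] += 1
        # An NLS alone is 'nucleus'; an NLS next to a transit peptide is counted separately.
        if category == "NLS":
            counts["nucleus_only"] += 1
        elif category.endswith("NLS"):
            counts["nucleus_with_transit_peptide"] += 1
    return {
        key: (counts[key], round(100.0 * counts[key] / total, 1) if total else 0.0)
        for key in SUMMARY_KEYS
    }


# Categories for CRN15, Ecp2, AVR-Pii and ToxA from the example output.
example_summary = summarize(["NLS", "none", "none", "cTP"])
```

For the four example proteins this gives one chloroplast prediction (25.0%) and one nucleus prediction without transit peptides (25.0%), in line with the summary statistics in the example output.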

Citation for LOCALIZER:

Sperschneider, J., Catanzariti, A., DeBoer, K. et al. LOCALIZER: subcellular localization prediction of both plant and effector proteins in the plant cell. Sci Rep 7, 44598 (2017) doi:10.1038/srep44598
