Bio-InterProScanWrapper

Wrapper around InterProScan

License: GPL v3

Introduction

This is a wrapper around InterProScan. It takes in a FASTA file of proteins, splits them up into smaller chunks, processes them with individual instances of iprscan and then sticks it all back together again. It can run in parallelised mode on a single host or over LSF.
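The splitting step can be pictured with a small shell sketch. This is only an illustration of chunking a FASTA file by sequence count, not the wrapper's own implementation; the function name split_fasta, the chunk_N.faa file names and the per-chunk size are all assumptions made for the example.

```shell
# Split a protein FASTA file into chunks of at most N sequences each.
# split_fasta is a hypothetical helper for illustration only.
split_fasta() {
    input=$1       # FASTA file to split
    per_chunk=$2   # maximum number of sequences per chunk
    out_dir=$3     # directory that receives chunk_1.faa, chunk_2.faa, ...
    mkdir -p "$out_dir"
    awk -v n="$per_chunk" -v dir="$out_dir" '
        /^>/ {
            # Start a new chunk file every n header lines.
            if (count % n == 0) {
                if (file != "") close(file)
                file = sprintf("%s/chunk_%d.faa", dir, count / n + 1)
            }
            count++
        }
        # Write every line that belongs to a sequence into the current chunk.
        file != "" { print > file }
    ' "$input"
}
```

For example, `split_fasta proteins.faa 100 chunks` would write chunks/chunk_1.faa, chunks/chunk_2.faa and so on, each of which could then be handed to a separate InterProScan instance.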

Features

  • Annotates using InterProScan 5.
  • Intermediate files are cleaned up as soon as they are no longer needed.
  • Creates a GFF3 file with the input sequences at the end.

Installation

Bio-InterProScanWrapper has many dependencies, including InterProScan 5. Please look at the Dockerfile if you wish to install it from scratch.

A Docker image of Bio-InterProScanWrapper is provided as sangerpathogens/interproscan, and this should be the preferred way to run it.

Running

Because this wrapper is intended to run InterProScan on compute clusters, the data directory and the interproscan.properties file of the InterProScan distribution are not included in the image. Instead, they are soft-linked.

Downloading the interproscan data

Use download_db.sh to download the data.

download_db.sh -v <version> -o <output directory>

This will download the data into the subdirectory <output directory>/interproscan-<version>/data.

The output directory can be specified in the environment variable INTERPROSCAN_DATA_DIR.
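As a rough sketch of how the resulting data path fits together, the helper below composes <output directory>/interproscan-<version>/data and falls back to INTERPROSCAN_DATA_DIR when no output directory is passed. The function name interproscan_data_path is hypothetical, and the fallback reflects one reading of the sentence above rather than documented behaviour of download_db.sh.

```shell
# Compose the data directory that download_db.sh is described as creating:
#   <output directory>/interproscan-<version>/data
# interproscan_data_path is a hypothetical helper, not part of the wrapper.
interproscan_data_path() {
    version=${1:-}
    # Assumption: INTERPROSCAN_DATA_DIR stands in for the output directory.
    out_dir=${2:-${INTERPROSCAN_DATA_DIR:-}}
    if [ -z "$version" ] || [ -z "$out_dir" ]; then
        echo "usage: interproscan_data_path <version> [output directory]" >&2
        return 1
    fi
    # Strip one trailing slash so the joined path has no double slash.
    printf '%s/interproscan-%s/data\n' "${out_dir%/}" "$version"
}
```

The printed path is the directory that would later be mounted into the container as the data directory.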

Set up interproscan.properties

InterProScan requires specific settings in the file interproscan.properties. A base version is included in the InterProScan download.

Download the gene ontology

The gene ontology file go-basic.obo can be downloaded from geneontology.org. Once downloaded, specify its location in the environment variable GO_OBO:

export GO_OBO=/path/to/go-basic.obo
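A quick check before starting a long run can catch a missing or mistyped path early. The function below is a minimal sketch under the assumption that GO_OBO should point at a readable file; check_go_obo is a hypothetical name and is not provided by the wrapper.

```shell
# Verify that GO_OBO points at a readable file before starting a run.
# check_go_obo is a hypothetical helper, not part of the wrapper.
check_go_obo() {
    if [ -z "${GO_OBO:-}" ]; then
        echo "GO_OBO is not set" >&2
        return 1
    fi
    if [ ! -r "$GO_OBO" ]; then
        echo "GO_OBO does not point to a readable file: $GO_OBO" >&2
        return 1
    fi
}
```

It could be used as `check_go_obo && farm_interproscan -a proteins.faa`.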

Running the container

The data directory should be mounted as /interproscan/data. The directory containing interproscan.properties should be mounted as /interproscan/config.

To run interproscan in Docker:

docker run -v /path/to/config:/interproscan/config -v /path/to/data:/interproscan/data -v <other volume like current dir> -it sangerpathogens/interproscan:<version desired> interproscan.sh

To run farm_interproscan in docker:

docker run -v /path/to/config:/interproscan/config -v /path/to/data:/interproscan/data -v <other volume like current dir> -it sangerpathogens/interproscan:<version desired> farm_interproscan
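Since the two docker run commands above differ only in the tool they start, they can be folded into one small shell function. This is a sketch, not something shipped with the image: run_in_container and the variables CONFIG_DIR, DATA_DIR, IMAGE_TAG and DRY_RUN are hypothetical names, and mounting the current directory at the same path inside the container is just one choice for the "other volume".

```shell
# Run a tool from the image with the config and data mounts described above.
# run_in_container, CONFIG_DIR, DATA_DIR, IMAGE_TAG and DRY_RUN are
# hypothetical names used only in this sketch.
run_in_container() {
    tool=${1:?usage: run_in_container <tool> [args...]}
    shift
    # Rebuild the argument list as the full docker command line.
    set -- docker run \
        -v "${CONFIG_DIR:?}:/interproscan/config" \
        -v "${DATA_DIR:?}:/interproscan/data" \
        -v "$PWD:$PWD" -w "$PWD" \
        -it "sangerpathogens/interproscan:${IMAGE_TAG:?}" \
        "$tool" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

With CONFIG_DIR, DATA_DIR and IMAGE_TAG exported, `run_in_container farm_interproscan -a proteins.faa` starts the wrapper, and setting DRY_RUN=1 prints the command line for inspection first.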

LSF

The LSF executables need to be made available inside the container to use farm_interproscan on LSF.

Usage

Usage: farm_interproscan [options]
Run InterProScan on the farm. It is limited to using 400 CPUs at once on the farm.

# Run InterProScan using LSF
farm_interproscan -a proteins.faa

# Provide an output file name
farm_interproscan -a proteins.faa -o output.gff

# Create 200 jobs at a time, writing out intermediate results to a file
farm_interproscan -a proteins.faa -p 200

# Run on a single host (no LSF). '-p x' needs x*2 CPUs and x*2GB of RAM to be available
farm_interproscan -a proteins.faa --no_lsf -p 10

# Run InterProScan using LSF with GFF input (standard genetic code for translation)
farm_interproscan -a annotation.gff -g

# Run InterProScan using LSF with GFF input (bacterial code for translation)
farm_interproscan -a annotation.gff -g -c 11

# This help message
farm_interproscan -h
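The single-host rule quoted in the usage text ('-p x' needs x*2 CPUs and x*2GB of RAM) can be turned around to pick a value for -p. The sketch below does only that arithmetic; max_parallel is a hypothetical helper, and the CPU and RAM figures are supplied by the caller rather than detected.

```shell
# Largest '-p' value a single host can support, using the rule from the
# usage text: '-p x' needs x*2 CPUs and x*2 GB of RAM.
# max_parallel is a hypothetical helper, not part of the wrapper.
max_parallel() {
    cpus=$1     # CPUs available on the host
    ram_gb=$2   # RAM available on the host, in whole GB
    by_cpu=$((cpus / 2))
    by_ram=$((ram_gb / 2))
    # The tighter of the two limits wins.
    if [ "$by_cpu" -lt "$by_ram" ]; then
        echo "$by_cpu"
    else
        echo "$by_ram"
    fi
}
```

On a host with 20 CPUs and 64 GB of RAM this gives 10, matching the `--no_lsf -p 10` example: `farm_interproscan -a proteins.faa --no_lsf -p "$(max_parallel 20 64)"`.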

Building and unit testing

Building

Bio-InterProScanWrapper is built using dzil:

dzil authordeps | cpanm
dzil listdeps | cpanm
dzil build

Testing

The tests can be run with dzil from the top level directory:

dzil test

License

Bio-InterProScanWrapper is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page.

Further Information

InterPro: https://www.ebi.ac.uk/interpro/about/interpro/
InterProScan wiki: https://github.com/ebi-pf-team/interproscan/wiki
geneontology.org

Contributors

andrewjpage, aslett1, bewt85, lfulcrum, oliver-lorenz-dev, seretol, ssjunnebo, vaofford
