Welcome to Biopet

Introduction

Biopet (Bio Pipeline Execution Toolkit) is the main pipeline development framework of the LUMC Sequencing Analysis Support Core team. It contains our main pipelines and some of the command line tools we develop in-house. It is meant to be used on the main SHARK computing cluster. Usage outside of SHARK is technically possible, but it may require some adjustments.

Full documentation is here: Biopet documentation

Quick Start

Running Biopet in the SHARK cluster

Biopet is available as a JAR package in SHARK. The easiest way to start using it is to activate the biopet environment module, which sets useful aliases and environment variables:

$ module load biopet/v0.8.0

With each Biopet release, an accompanying environment module is also released. The module name follows the release version, so for release 0.8.0 the module to load is biopet/v0.8.0, as in the command above.

After loading the module, you can access the biopet package by simply typing biopet:

$ biopet

This will show you a list of tools and pipelines that you can use straight away. You can also execute biopet pipeline to show only the available pipelines, or biopet tool to show only the tools. Be aware that biopet is actually a shell function that calls java on the system-wide Biopet JAR file:

$ java -jar <path/to/current/biopet/release.jar>

The actual path will vary from version to version, which is controlled by which module you loaded.
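To make the relationship between the shell function and the JAR concrete, here is a minimal sketch of what such a function could look like. It is an illustration only: the variable name BIOPET_JAR and the placeholder path are assumptions, not the names used by the real SHARK module.

```shell
# Hypothetical sketch: BIOPET_JAR is an assumed variable name, not necessarily
# the one set by the real module. The placeholder path stands in for whatever
# release JAR the loaded module points to.
BIOPET_JAR="${BIOPET_JAR:-path/to/current/biopet/release.jar}"

# Forward all arguments unchanged to the JAR, so that
# "biopet pipeline" becomes "java -jar <jar> pipeline".
biopet() {
    java -jar "$BIOPET_JAR" "$@"
}
```

With a wrapper of this shape, loading a different module version only needs to change the JAR path; the commands you type stay the same.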

Almost all of the pipelines have a common usage pattern with a similar set of flags, for example:

$ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2

The command above will do a dry run of a pipeline using a config file, as if the command were submitted to the SHARK cluster (the -qsub flag) in the BWA parallel environment (the -jobParaEnv BWA flag). We also set the maximum number of retries for failing jobs to two (via the -retry 2 flag). Doing a dry run is a good idea to ensure that your real run proceeds smoothly. It may not catch all the errors, but if the dry run fails you can be sure that the real run will never succeed.

If the dry run proceeds without problems, you can then do the real run by using the -run flag:

$ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2 -run
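The dry-run-then-real-run sequence above can be chained in a small helper, sketched below. The function name run_pipeline is made up for this example, and the sketch assumes that biopet returns a non-zero exit status when a dry run fails, which is not confirmed here.

```shell
# Hypothetical helper (run_pipeline is not part of Biopet).
# Assumption: biopet exits with a non-zero status when the dry run fails.
run_pipeline() {
    pipeline="$1"   # pipeline name
    config="$2"     # path to the JSON config file

    # Dry run first: same flags as the real run, but without -run.
    if ! biopet pipeline "$pipeline" -config "$config" -qsub -jobParaEnv BWA -retry 2; then
        echo "Dry run failed; skipping the real run." >&2
        return 1
    fi

    # The dry run passed, so repeat the command with -run added.
    biopet pipeline "$pipeline" -config "$config" -qsub -jobParaEnv BWA -retry 2 -run
}
```

It would be called as run_pipeline <pipeline_name> <path/to/config.json>, after the biopet module has been loaded in the same shell.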

It is usually a good idea to do the real run using screen or nohup to prevent the job from terminating when you log out of SHARK. In practice, running biopet directly is also fine. Keep in mind that each pipeline has its own expected config layout. You can check out more about the general structure of our config files here. For the specific structure that each pipeline accepts, please consult the respective pipeline page.

Testing

Our code is tested on our local Jenkins installation for every change, using a Jenkinsfile in our repository.

Contributing to Biopet

Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.5 release.

We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any other kind of fix! The main language we use is Scala, though the repository also contains a small amount of Python and R. Our main code repository is located at https://github.com/biopet/biopet, along with our issue tracker.

For more information please go to our Developer documentation

About

Go to the about page

License

See: License
