BSF Python Library

Introduction

The Biomedical Sequencing Facility (BSF) is part of the joint Genomics Core Facility of the Medical University of Vienna (MUW) and the Research Center for Molecular Medicine (CeMM) of the Austrian Academy of Sciences (OeAW). The BSF is Austria’s first technology platform dedicated to next-generation sequencing (NGS) in biomedicine and is expected to play a catalyzing role in the development of genomic medicine in Vienna and Austria.

This Python library and the accompanying scripts are used for day-to-day analysis of next-generation sequencing data sets. The library consists of two main components.

The Analysis class and its subclasses implement the logic required for submitting processes on a cluster login node.

The Runnable class and its subclasses implement the logic required to run processes on a cluster compute node via the common bsf_run_runnable script.
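The split between a submitting side and a running side can be illustrated with a minimal sketch. It is not the BSF Python API: the class names SketchAnalysis and SketchRunnable, the JSON file format, and the placeholder command are all assumptions made for illustration. The login-node side only writes job descriptions to disk, and the compute-node side loads one description and executes it.

```python
import json
import subprocess
import sys
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SketchRunnable:
    """Hypothetical stand-in: a named list of commands for a compute node."""

    name: str
    commands: list = field(default_factory=list)

    def save(self, directory: Path) -> Path:
        """Write this job description to a JSON file and return its path."""
        path = directory / f"{self.name}.json"
        path.write_text(json.dumps(asdict(self)))
        return path

    @classmethod
    def load(cls, path: Path) -> "SketchRunnable":
        """Restore a job description that was written by save()."""
        return cls(**json.loads(path.read_text()))

    def run(self) -> list:
        """Run each command in order and return the exit codes."""
        return [subprocess.run(cmd, check=False).returncode for cmd in self.commands]


class SketchAnalysis:
    """Hypothetical stand-in: prepares one job description per sample."""

    def __init__(self, work_dir: Path):
        self.work_dir = work_dir

    def submit(self, sample_names):
        """Write one SketchRunnable per sample; a real system would also queue it."""
        paths = []
        for sample in sample_names:
            runnable = SketchRunnable(
                name=f"process_{sample}",
                # Placeholder command; a real job would call an aligner or similar.
                commands=[[sys.executable, "-c", f"print('processing {sample}')"]],
            )
            paths.append(runnable.save(self.work_dir))
        return paths


# Login-node side: describe the work.
work_dir = Path(tempfile.mkdtemp())
paths = SketchAnalysis(work_dir).submit(["sample_a", "sample_b"])

# Compute-node side: load each description and execute it.
exit_codes = [SketchRunnable.load(path).run() for path in paths]
```

Separating the description of a job from its execution means the submitting process does not have to stay alive while the cluster works through the queue.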

BSF Python General Configuration File

General settings for the BSF Python library are configured via a ${HOME}/.bsfpython.ini file in the user's home directory. This file is site-specific and its information allows for automatic discovery of raw (e.g., Illumina run folders) and pre-processed (e.g., de-multiplexed lanes and samples) NGS data. A template file template_bsfpython.ini can be found in the doc subdirectory. The template documents the configuration options and provides, as far as possible, sensible default settings. Copy it to ${HOME}/.bsfpython.ini and edit the copy to match the local site.
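The copy-then-edit step can be sketched with the Python standard library. This is an illustration only, not code from BSF Python: the helper names ensure_user_config and load_config are hypothetical, and the section and option names in the demonstration are invented, since the real ones are documented in template_bsfpython.ini.

```python
import configparser
import shutil
import tempfile
from pathlib import Path


def ensure_user_config(template: Path, target: Path) -> bool:
    """Copy the template to the target unless the target already exists.

    Returns True if a copy was made, so an edited file is never overwritten.
    """
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return True


def load_config(path: Path) -> configparser.ConfigParser:
    """Parse an INI file, failing loudly if the file is missing."""
    parser = configparser.ConfigParser()
    with path.open() as handle:
        parser.read_file(handle)
    return parser


# Demonstration in a temporary directory with an invented section and option.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "template_bsfpython.ini"
    template.write_text("[example_section]\nexample_option = /path/to/data\n")
    target = Path(tmp) / "home" / ".bsfpython.ini"

    copied_first = ensure_user_config(template, target)
    copied_again = ensure_user_config(template, target)
    config = load_config(target)
    example_value = config.get("example_section", "example_option")
```

In real use the target would be Path.home() / ".bsfpython.ini" and the template would be the file in the doc subdirectory of a BSF Python checkout.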

Analysis

The BSF Analysis is central to the BSF pipeline infrastructure. It encapsulates both the logic and the data for a multistep analysis procedure. Specific Analysis objects, tailored to recurring tasks, are available.

Analysis Configuration File

BSF Analysis objects are initialised and configured via INI configuration files. Templates for these files are also provided in the doc subdirectory; they document the configuration options and provide, as far as possible, sensible default settings. Generally, only a few configuration options need filling in. Most importantly, the locations of the sample annotation sheet and, depending on the analysis type, the sample comparison sheet need to be specified.

Sample Annotation Sheet

A sample annotation sheet specifies the file system location of NGS reads. For data pre-processed via Illumina CASAVA, a hierarchy of run folders, projects, samples, and paired reads can be discovered automatically. Additional reads can be linked into the system by specifying their exact file system path. Sample annotation sheets also define sample groups that are available to the analysis. A sheet contains the following fields:

- Type (CASAVA or External)
- ProcessRunFolder
- Project
- Sample
- Reads1
- File1
- Reads2
- File2
- Group
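As a rough illustration of how a sheet with the fields listed above could be read and grouped, the sketch below uses the csv module. Several things are assumptions rather than facts about BSF Python: that the sheet is comma-separated with a header row, the helper names read_sheet and samples_by_group, and every value in the example rows, including the guess that Reads1 holds a name and File1 a path.

```python
import csv
import io
from collections import defaultdict

# Field names taken from the list above.
FIELDS = [
    "Type", "ProcessRunFolder", "Project", "Sample",
    "Reads1", "File1", "Reads2", "File2", "Group",
]

# Invented example rows; the paths and names are placeholders.
EXAMPLE_SHEET = """\
Type,ProcessRunFolder,Project,Sample,Reads1,File1,Reads2,File2,Group
External,,,sample_a,reads_a,/data/a_R1.fastq.gz,,,control
External,,,sample_b,reads_b,/data/b_R1.fastq.gz,,,treated
External,,,sample_c,reads_c,/data/c_R1.fastq.gz,,,treated
"""


def read_sheet(handle):
    """Read all rows and check that every expected column is present."""
    reader = csv.DictReader(handle)
    missing = [name for name in FIELDS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def samples_by_group(rows):
    """Map each Group value to its sample names, in order of first appearance."""
    groups = defaultdict(list)
    for row in rows:
        if row["Sample"] not in groups[row["Group"]]:
            groups[row["Group"]].append(row["Sample"])
    return dict(groups)


rows = read_sheet(io.StringIO(EXAMPLE_SHEET))
groups = samples_by_group(rows)
```

Validating the header before reading any rows gives an early, specific error when a column is misspelled or missing.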

Analyses

ChIPSeq

The ChIPSeq analysis aligns each BSF Sample object to the genome sequence via BWA. Regions of interest are then defined by means of the MACS2 peak caller.

In the context of the ChIPSeq analysis, BSF Paired Reads objects of BSF Sample objects are aligned as a pool.

RNAseq DESeq

Please see the RNAseq DESeq document.

RNASeq Tuxedo

The RNA-Seq pipeline is based on the Tuxedo suite. NGS reads are aligned with TopHat2, an aligner that implements a splice site model and uses a reference transcriptome as its base. TopHat2 is itself based on the Bowtie2 short read aligner.

In the context of the Tuxedo analysis, BSF Sample objects are aligned and assembled into transcriptomes individually. Depending on the group_replicates configuration option, the BSF Paired Reads objects of a BSF Sample object are either processed individually by TopHat2 and Cufflinks or pooled before alignment. The resulting transcriptome assemblies are then merged via Cuffmerge. Cuffdiff then compares the merged assemblies on the basis of the BAM alignments produced by TopHat2.
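The effect of pooling versus individual processing can be sketched as a small planning function. This is not the library's implementation: the function name plan_alignment_units, the unit naming scheme, and the assumption that a true group_replicates value means "pool" are all illustrative guesses.

```python
def plan_alignment_units(samples, group_replicates):
    """Return a mapping of alignment unit name to the paired reads it contains.

    samples maps a sample name to a list of paired-reads names.
    Assumption: a true group_replicates pools all paired reads of a sample
    into one unit; a false value yields one unit per paired-reads entry.
    """
    units = {}
    for sample_name, paired_reads in samples.items():
        if group_replicates:
            units[sample_name] = list(paired_reads)
        else:
            for reads_name in paired_reads:
                units[f"{sample_name}_{reads_name}"] = [reads_name]
    return units


# Invented example: one sample with two replicates, one with a single entry.
samples = {"sample_a": ["rep1", "rep2"], "sample_b": ["rep1"]}

pooled = plan_alignment_units(samples, group_replicates=True)
individual = plan_alignment_units(samples, group_replicates=False)
```

With pooling, the example produces two alignment units, one per sample; without it, three units, one per paired-reads entry. Each unit would then correspond to one alignment and one assembly before the merge step.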

Genetic Variant Calling

Dependencies

ChIP-seq

References

Licence

Copyright 2013 - 2020 Michael K. Schuster

Biomedical Sequencing Facility (BSF), part of the joint Genomics Core Facility of the Medical University of Vienna (MUW) and the Research Center for Molecular Medicine (CeMM) of the Austrian Academy of Sciences (OeAW).

This file is part of BSF Python.

BSF Python is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

BSF Python is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with BSF Python. If not, see http://www.gnu.org/licenses/.
