V-pipe

V-pipe is a workflow designed for analysis of next generation sequencing (NGS) data from viral pathogens. It produces a number of results in a curated format.

License: Apache-2.0

Quick start

Instructions to type in a shell

  1. Install miniconda3

Linux

To obtain the installer for Linux, use the following:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Then install Miniconda:

sh Miniconda3-latest-Linux-x86_64.sh

MacOS

To obtain the installer for macOS, you can download it manually or use curl:

curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

Then install Miniconda:

sh Miniconda3-latest-MacOSX-x86_64.sh

  2. Create a conda virtual environment

conda create -n V-pipe -c conda-forge -c bioconda python=3.8 snakemake-minimal=5.14.0
conda activate V-pipe

Make sure to run conda activate V-pipe every time you want to run V-pipe.

  3. Get V-pipe

git clone https://github.com/cbg-ethz/V-pipe.git /path/to/V-pipe
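
Before moving on, it can help to confirm that the commands used in the steps above are reachable from the current shell. The following is a minimal sketch and is not part of V-pipe; the function name `check_tools` is hypothetical, and it relies only on the standard `command -v` shell builtin.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with V-pipe): report which of the
# given commands can or cannot be found on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one command was not found.
    return "$missing"
}

# Example: the commands used in the quick start.
# Run this after activating the V-pipe conda environment.
check_tools conda git snakemake || echo "Install the missing tools before continuing."
```

The function returns a non-zero status when any command is missing, so it can also be used as a guard at the top of a wrapper script.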

Running V-pipe

First, open a terminal and change into the working directory where the input files are stored (i.e., the reference and the sequencing reads). We use a two-level directory hierarchy and expect the sequencing reads in a folder named raw_data. To initialize a project, run:

/path/to/V-pipe/init_project.sh

Before actually running the pipeline, we advise checking whether the output files can be created from the inputs, using the --dryrun option.

./vpipe --dryrun

Further details can be found in the wiki pages.
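
As an illustration of the two-level hierarchy mentioned above, the sketch below builds an example layout in a scratch directory. The top-level folder name `samples`, the first-level name `patient1`, the date-like second level `20200101`, and the read file names are all assumptions made for this example; consult the wiki pages for the layout and naming that V-pipe actually expects.

```shell
#!/bin/sh
# Illustrative layout only: every name below is an assumption made for
# this example, not a documented V-pipe requirement.
workdir=$(mktemp -d)

# Two levels (here: sample, then date), with the reads in a raw_data folder.
mkdir -p "$workdir/samples/patient1/20200101/raw_data"

# Empty placeholder files standing in for real paired-end FASTQ files.
: > "$workdir/samples/patient1/20200101/raw_data/reads_R1.fastq"
: > "$workdir/samples/patient1/20200101/raw_data/reads_R2.fastq"

# Print every raw_data folder found exactly two levels below samples/.
for dir in "$workdir"/samples/*/*/raw_data; do
    if [ -d "$dir" ]; then
        echo "$dir"
    fi
done
```

In a real project the layout would live in the working directory from which the initialization script and ./vpipe are run, rather than in a temporary directory.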

Dependencies

  • conda

    Conda is a cross-platform package management system and an environment manager application.

  • Snakemake

    Snakemake is the central workflow and dependency manager of V-pipe. It determines the order in which individual tools are invoked and checks that programs do not exit unexpectedly.

  • VICUNA

    VICUNA is de novo assembly software designed for populations with high mutation rates. It is used to build an initial reference for mapping reads with the ngshmmalign aligner when a references/cohort_consensus.fasta file is not provided. Further details can be found in the wiki pages.
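
The condition described for VICUNA can be pictured as a simple file check. This is a sketch of the decision only, not V-pipe's actual logic (which is handled through Snakemake); the function name `choose_initial_reference` and the labels it prints are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration: decide where the initial reference comes from,
# based on whether references/cohort_consensus.fasta exists in a project.
choose_initial_reference() {
    project_dir=$1
    if [ -f "$project_dir/references/cohort_consensus.fasta" ]; then
        echo "provided"   # a cohort consensus was supplied by the user
    else
        echo "de-novo"    # none supplied; one would be assembled de novo
    fi
}

# Check the current working directory.
choose_initial_reference .
```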

Computational tools

Other dependencies are managed by using isolated conda environments per rule, and below we list some of the computational tools integrated in V-pipe:

  • PRINSEQ

    Trimming and clipping of reads is performed by PRINSEQ. It is currently the most versatile raw read processor with many customization options.

  • VICUNA

    VICUNA is a de novo assembler designed for generating rough reference contigs from viral NGS data. It can deal with the heterogeneity inherent in such data, including high single-base heterogeneity and structural variants.

  • ngshmmalign

    We align the curated NGS data using our custom aligner ngshmmalign, which takes structural variants into account. It produces multiple consensus sequences that include either majority bases or ambiguous bases.

  • bwa

    In order to detect specific cross-contaminations with other probes, the Burrows-Wheeler aligner is used. It quickly yields estimates for foreign genomic material in an experiment.

  • MAFFT

    To standardise multiple samples to the same reference genome (say HXB2 for HIV-1), the multiple sequence aligner MAFFT is employed. The multiple sequence alignment helps in determining regions of low conservation and thus makes standardisation of alignments more robust.

  • Samtools

    The Swiss Army knife of alignment postprocessing and diagnostics.

  • SmallGenomeUtilities

    We perform genomic liftovers to standardised reference genomes using our in-house Python library of utilities for rewriting alignments.

  • ShoRAH

    ShoRAH performs SNV calling and local haplotype reconstruction using Bayesian clustering.

  • HaploClique and SAVAGE

    We use HaploClique or SAVAGE to perform global haplotype reconstruction for heterogeneous viral populations by using an overlap graph.

Preprint

Posada-Céspedes S., Seifert D., Topolsky I., Metzner K.J., and Beerenwinkel N. V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput sequencing data. doi:10.1101/2020.06.09.142919

NOTE: During the Gibbs sampling performed by ShoRAH, several clusters may generate the same haplotype representative. Such collisions result in inflated posterior values. The averaging of haplotype abundances across iterations can also be affected by floating-point precision problems. Fortunately, ShoRAH also reports the number of reads assigned to each haplotype per iteration, which we use to correct the aforementioned quantities in post-processing. We are currently implementing the changes required to resolve these issues in future releases of ShoRAH.

Contributions

Contact

We encourage users to use the issue tracker. For further enquiries, you can also contact the V-pipe Dev Team [email protected].
