StrainEst

StrainEst is a novel, reference-based method that uses the Single Nucleotide Variants (SNV) profiles of the available genomes of selected species to determine the number and identity of coexisting strains and their relative abundances in mixed metagenomic samples.

Using Docker (that is, on MS Windows, Mac OS X and Linux!)

The easiest way to run StrainEst is through Docker. Docker works similarly to a virtual machine image, providing a container in which all the software has already been installed, configured and tested.

Install Docker for Linux, Mac OS X or Windows.
Download the latest version of StrainEst:
```
docker pull compmetagen/strainest
```
Run an instance of the image, mounting the host working directory (e.g. /Users/davide/strainest) on to the container working directory /strainest:
```
docker run --rm -t -i -v /Users/davide/strainest:/strainest -w /strainest compmetagen/strainest /bin/bash
```
You need to write something like -v //c/Users/davide/strainest:/strainest if you are in Windows or -v /home/davide/micca:/micca in Linux. The --rm option automatically removes the container when it exits.

Now you can use strainest:

root@68f6784e1101:/micca# strainest --help

Sickle, Bowtie2 and samtools are preinstalled in the Docker image.

Install from sources on Ubuntu >= 12.04 and Debian >=7

We suggest to install the following packages through the package manager:

sudo apt-get update
sudo apt-get install build-essential \
    pkg-config \
    python2.7 \
    python-dev \
    python-pip \
    python-numpy \
    python-scipy \
    python-matplotlib \
    gcc \
    gfortran \
    libblas-dev \
    liblapack-dev \
    libfreetype6 libfreetype6-dev \
    libpng-dev \
    liblzma-dev \
    libbz2-dev

Then, upgrade pip and install the following packages:

sudo pip install --upgrade pip
pip install 'Click>=5.1' 'pandas' 'pysam>=0.12' 'scikit-learn>=0.16.1,<0.20' 'biopython>=1.50'

Download the latest version from https://github.com/compmetagen/strainest/releases and complete the installation:

tar -zxvf strainest-X.Y.Z.tar.gz
cd strainest-X.Y.Z
sudo python setup.py install

Usage

Predict strain profiles

This tutorial requires Sickle (https://github.com/najoshi/sickle), Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) and samtools (http://samtools.sourceforge.net/) to be installed in your system.

Download the example data (Illumina paired-end reads):

wget ftp://ftp.fmach.it/metagenomics/strainest/example/reads.tar.gz
tar zxvf reads.tar.gz

Now the raw reads will be quality trimmed (e.g. using sickle):

sickle pe -f reads1.fastq -r reads2.fastq -t sanger -o \
    reads1.trim.fastq -p reads2.trim.fastq -s reads.singles.fastq -q 20

Given the species of interest (e.g. P. acnes), download and untar the precomputed Bowtie reference database available at ftp://ftp.fmach.it/metagenomics/strainest/ref/ (e.g. pacnes.tar.gz):

wget ftp://ftp.fmach.it/metagenomics/strainest/ref/pacnes.tar.gz
tar zxvf pacnes.tar.gz

The Bowtie2 database is available in the P_acnes/bowtie directory. At this point we can align the metagenome against the database:

bowtie2 --very-fast --no-unal -x P_acnes/bowtie/align -1 reads1.trim.fastq \
    -2 reads2.trim.fastq -S reads.sam

Now we can sort and index the BAM file:

samtools view -b reads.sam > reads.bam
samtools sort reads.bam -o reads.sorted.bam
samtools index reads.sorted.bam

Finally, run the strainest est command to predict the strain abundances:

strainest est P_acnes/snp_clust.dgrp reads.sorted.bam outputdir

In the output directory we can find:

abund.txt: the predicted abundances for each reference genome;
max_ident.txt: for each reference genome, the percentage of alleles that are present in the metagenome;
info.txt: information about the prediction, including the prediction Pearson R;
counts.txt: number of counts for each SNV position/base pairs;
mse.pdf: Lasso cross-validation plot as a function of the shrinkage coefficient.

(Optional) Build a custom reference SNV profile

For a detailed description of the methods to build a custom reference database see the Methods section of the paper.

Reference databases for a number of species are available at ftp://ftp.fmach.it/metagenomics/strainest/ref2/

Each tar.gz archive contains the following files:

assembly_list.txt: tab separated table containing a list of the sequences that we have used with basic info. The column “Rep” tells you if it is one of the representative strains and the column ClustID tells you in which cluster it is.
log.txt: basic info on the clustering procedure, i.e number of clusters, number of SNVs and % of core genome
map_align.fasta: contains the reference sequences for read alignment. Use this file to build the bowtie or bwa database. For all species we used the default, i.e. 10 assemblies (if available)
snv.txt: contains the snv matrix for the "strainest est” command.

The list of species for which pre-built databases are available is given below, with the number of genomes used to build the database, the number of representative SNV profiles, the number of SNVs and the percentage of the reference genome that is common to all genomes (core)

Species	# of genomes	# of repr	# of SNVs	% of core
Bacteroides dorei	13	10	35506	46.580
Bacteroides eggerthii	4	3	24686	70.537
Bacteroides fragilis	124	49	176883	21.561
Bacteroides intestinalis	6	4	205290	27.325
Bacteroides ovatus	17	11	220623	46.294
Bacteroides thetaiotaomicron	17	14	370873	49.412
Bacteroides uniformis	23	16	97560	48.607
Bacteroides vulgatus	17	16	32365	40.987
Bifidobacterium adolescentis	26	24	43230	52.126
Bifidobacterium bifidum	36	24	41129	61.021
Bifidobacterium catenulatum	5	2	25621	83.984
Bifidobacterium longum	114	84	76932	34.662
Bifidobacterium pseudocatenulatum	17	13	74842	66.678
Clostridium clostridioforme	16	9	233709	43.286
Collinsella aerofaciens	5	5	184750	66.271
Dorea longicatena	12	9	104873	32.200
Escherichia coli	10321	433	124403	10.071
Eubacterium siraeum	5	4	90669	75.687
Parabacteroides distasonis	17	11	117483	58.044
Staphylococcus epidermidis	512	85	34349	19.165
Streptococcus salivarius	45	32	246121	58.651

zpnew / strainest Goto Github PK

strainest's Introduction

StrainEst

Using Docker (that is, on MS Windows, Mac OS X and Linux!)

Install from sources on Ubuntu >= 12.04 and Debian >=7

Usage

Predict strain profiles

(Optional) Build a custom reference SNV profile

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent