Giter Club home page Giter Club logo

metarna-seq's Introduction

metaRNA-seq

Custom scripts for creating, annotating, and determining differential gene expression in multispecies bacterial metagenomes

Copyright (c) 2013 Peter A. Jorth, Keith H. Turner, Pinar Gumus, Nejat Nizam, Nurcan Buduneli, Marvin Whiteley

The scripts contained herein can be used to assemble .gff and .fna files representing a multispecies bacterial "metaorganism" for bioinformatic analyses. Brief descriptions of the scripts follow, but also see the scripts themselves and the usage instructions on the command line for information about how to use them. Please direct any questions on their use or construction to Keith H. Turner (khturner at utexas.edu).

GenomeMerge.pl

This program takes a tab-separated list of bacterial genome directory names (find at ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/) and the associated accession numbers and joins the .fna and the .gff files to make new .fna and .gff files specifying one "metaorganism" encompassing all of them.

usage: ./GenomeMerge.pl (genomes).txt (out_pfx)

Arguments:

(genomes).txt - A tab-separated list of directory names and accession numbers

(out_pfx) - The prefix you want used for the output files

Dependencies:

-Net::FTP (CPAN)

PullEC.pl

This program takes a GFF file from GenomeMerge.pl on STDIN, searches http://www.kegg.jp for the locus_tag and returns a new GFF file with EC and Kegg pathway information on STDOUT.

usage: ./PullEC.pl < (in).gff > (out.gff)

Dependencies:

-wget (Unix)

HOMDpull.sh

This script takes a .gbk file from a dynamically-annotated genome on the Human Oral Microbiome Database (http://www.homd.org/index.php?&name=seqDownload&type=G) with prefix (prefix), generates a .fna file of the DNA sequence and a .gff file with EC number (obtained from the associated _swisstopmatch.xls file) and KEGG pathway annotations (using the same prefix provided).

usage: ./HOMDpull.sh (prefix)

Requires: (prefix).gbk and (prefix)_swisstopmatch.xls from HOMD

Dependencies:

-bp_seqconvert.pl (BioPerl)

-HOMD_GenomeMerge.pl (this package)

HOMD_GenomeMerge.pl

This is a suite of utilities used in a pipeline to merge genomes from the Human Oral Microbiome Database (http://www.homd.org).

Usage: ./HOMD_GenomeMerge.pl (command) [arg]

(command) is a required argument that has one of the following values:

GFF_Cleaner - takes a GFF file generated from a GenBank file from the HOMD by bp_genbank2gff on STDIN and parses and relabels the 9th field to match the GFF file notation as in GenBank proper on STDOUT.

Get_EC - takes an (ID, SwissProt Accession No.) pair on STDIN, searches UniProt for the accession number and returns (ID, EC number) on STDOUT (provide genome prefix at [arg]).

EC_Pathways - takes an (ID, EC number) pair on STDIN, searches http://www.kegg.jp for the EC number and returns (ID, EC number, Kegg pathway information) on STDOUT (provide genome prefix at [arg]).

Tag_GFF - reads (ID, EC number, Kegg pathway information) from file [arg] and prints (EC number, Kegg pathway information) on the correct line of the GFF file read from STDIN on STDOUT.

GFF_Sort - takes a GFF file generated by our HOMD annotation pipeline on STDIN, sorts it by (contig)-(position) and returns it on STDOUT.

Dependencies:

-URI::Escape (CPAN)

-wget (Unix)

ECcounter.pl

This is a perl script that takes a tab-delimited table with EC numbers in the first column and read counts in the following 6 columns. The table needs to be sorted by EC number. The script then scans through each row in the table and sums the total read count for each unique EC number for each replicate.

Usage: ./ECcounter.pl (ECcount_file)

MapCount_RNASeq.sh

This UNIX shell script uses Bowtie 2.0 to map metaRNA-seq reads to a metagenome, and counts the number of reads mapping to each feature in the metagenome using HTSeq.

Usage: ./MapCount_RNASeq.sh (in_file) (out_pfx) (assembly) (threads(x))

in_file Path of the trimmed input fastq file.

out_pfx Desired prefix of output files.

assembly Prefix for assembly.

threads 1 = 1 processing thread; 2 = 2 processing threads, etc.

Requires: a fastq formatted in_file, a metagenome fasta file indexed by Bowtie 2.0, and a metagenome gff annotation file.

The following variable needs to be defined in the user .profile for proper execution:

METARNASEQDIR=/path/to/metagenome

where metagenome is the directory containing the gff file and the Bowtie indices.

Dependencies:

-Bowtie 2.0 (http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.1.0/)

-htseq-count (https://pypi.python.org/pypi/HTSeq)

calcRNASeqPaired.sh

This shell script joins read count files produced by MapCount_RNASeq.sh and then determines differentially expressed genes in the metagenome using a paired EdgeR analysis. The end results are a table with differentially expressed genes, a mutlidimensional scaling plot showing relatedness of samples, and a plot showing the goodness of fit for the variance estimation.

Usage: ./calcRNASeqPaired.sh [-o ] [-c ] [-x <#>] [-t ] [-y <#>] ...

Requires: Pairwise_edgeR.R must be present in the directory defined by the user .profile for proper execution:

METARNASEQDIR=/path/to/metagenome

Dependencies:

-EdgeR

-Pairwise_edgeR.R (this package)

PairwiseEdgeR.sh

This shell script reads in a tab-delimited table with genes or EC numbers in the first column, followed by raw read counts per gene/EC in the following columns. Then the script determines differentially expressed genes in the metagenome using a paired EdgeR analysis. The end results are a table with differentially expressed genes, a mutlidimensional scaling plot showing relatedness of samples, and a plot showing the goodness of fit for the variance estimation.

Usage: ./PairwiseEdgeR.sh [-o ] [-c ] [-x <#>] [-t ] [-y <#>] ...

Requires: Pairwise_edgeR.R must be present in the directory defined by the user .profile for proper execution:

METARNASEQDIR=/path/to/metagenome

Dependencies:

-EdgeR

-Pairwise_edgeR.R (this package)

metarna-seq's People

Contributors

pjorth avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.