outbryk's Introduction

outbryk

Repository for the Cortex-based pipeline we are setting up to enter the "2015 Rapid NGS Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens" (the ASM NGS challenge).

"We" = Henk denBakker, Zam Iqbal, Phelim Bradley, Jen Gardy, Rachel Norris, Sarah Walker, Derrick Crook, Tim Peto

Given a new set of fastq files (called the "challenge" for the moment), Outbryk will:

  1. Complete the Cortex independent workflow with challenge + selected background, producing a single VCF
  2. Build a tree and look for clusters
  3. For each cluster, use MASH to choose a closer reference genome (https://github.com/marbl/Mash, http://mash.readthedocs.org/en/latest/index.html)
  4. Now use the Cortex joint workflow (i.e. compare samples directly, with no reference involved), using this closer reference for coordinates
  5. See if this new callset provides better resolution of the cluster
  6. Study contig sharing, phage presence and indel presence within clusters
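
The MASH step (choosing a closer reference for each cluster) could be sketched roughly as below. This is not Outbryk's own code. It only assumes that `mash dist` has already been run against a set of candidate references and that its tab-separated output (reference ID, query ID, distance, p-value, shared hashes) is available as text, with one query per cluster. The function name and the file names in the example are hypothetical.

```python
import csv
import io


def closest_reference(mash_dist_output: str) -> dict:
    """For each query, return (reference, distance) with the smallest Mash distance.

    Expects the tab-separated columns written by `mash dist`:
    reference-ID, query-ID, distance, p-value, shared-hashes.
    """
    best = {}
    reader = csv.reader(io.StringIO(mash_dist_output), delimiter="\t")
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or malformed lines
        reference, query, distance = row[0], row[1], float(row[2])
        # Keep the reference with the lowest distance seen so far for this query.
        if query not in best or distance < best[query][1]:
            best[query] = (reference, distance)
    return best


# Hypothetical example: two candidate references scored against one cluster.
example = (
    "refA.fasta\tcluster1.fastq\t0.0123\t0\t620/1000\n"
    "refB.fasta\tcluster1.fastq\t0.0011\t0\t955/1000\n"
)
print(closest_reference(example))
```

How a cluster is turned into a single query (a representative sample, pooled reads, or an assembly) is left open here, since the list above does not specify it.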

Outbryk is not ready for outside use, but feel free to take a look and trial it if you wish. We need to make it more robust before we push it out, but the ASM NGS challenge has been a very useful trial.

outbryk's People

Contributors

hcdenbakker, zamiqbal, phelimb, iqbal-lab

outbryk's Issues

Prophage file redundancy

The current sets of prophages are probably highly redundant. We should use some kind of CD-HIT-like clustering procedure to get a set of unique prophage sequences.
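
A minimal sketch of what such a clustering pass might look like. It is greedy and incremental in the spirit of CD-HIT (longest sequence first, each sequence joins the first representative it is similar enough to), but it swaps in k-mer Jaccard similarity for real sequence identity and ignores reverse complements. The function names, the threshold and `k` are illustrative assumptions, not tuned values; for real data CD-HIT itself would be the obvious choice.

```python
def kmers(seq: str, k: int = 8) -> set:
    """Return the set of k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs: dict, threshold: float = 0.9, k: int = 8) -> dict:
    """Greedy incremental clustering of named sequences.

    Sequences are visited longest first. Each one joins the first existing
    representative whose k-mer Jaccard similarity is >= threshold; otherwise
    it becomes a new representative.
    Returns {representative name: [member names]}.
    """
    clusters = {}
    rep_kmers = {}
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        ks = kmers(seqs[name], k)
        for rep, rk in rep_kmers.items():
            union = ks | rk
            if union and len(ks & rk) / len(union) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
            rep_kmers[name] = ks
    return clusters


# Hypothetical toy input: p2 duplicates p1, p3 is unrelated.
prophages = {
    "p1": "ACGTACGTTGCAAGCTTAGGCTAACGT",
    "p2": "ACGTACGTTGCAAGCTTAGGCTAACGT",
    "p3": "TTTTTTTTGGGGGGGGCCCCCCCCAAAA",
}
print(sorted(greedy_cluster(prophages)))  # names of the representatives
```

The keys of the returned dictionary are the non-redundant representatives that would be kept.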

Add script to pull out clade-specific indels and supernodes

Henk has warned that it can be hard to separate outbreak samples from other environmental samples in Listeria monocytogenes: the mutation rate is low, so there may be very few SNPs.
However, Jen Gardy pointed out this very interesting paper,
http://jcm.asm.org/content/early/2015/08/20/JCM.00202-15.full.pdf
which says that you can use insertions/deletions and mobile elements to improve resolution.

Basically, we might be able to
a) look at the indels in the VCF and see which samples have them, to pull apart clusters
b) use --pan_genome_matrix on all supernodes in a cluster, and see if there are big contigs that split the cluster
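
Option a) could be sketched along these lines. This is an illustration rather than the script the issue asks for: it assumes a standard multi-sample VCF with a `GT` entry in the FORMAT column, treats any ALT allele whose length differs from REF as an indel, and reports which samples carry it. The function name and the sample names in the example are hypothetical.

```python
def indel_carriers(vcf_text: str) -> dict:
    """Map each indel site (chrom, pos, ref, alt) to the set of samples carrying it."""
    samples = []
    carriers = {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # meta-information or blank line
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # 1-based indices of ALT alleles that change length; skip symbolic alleles.
        indel_idx = {
            i + 1
            for i, a in enumerate(alt.split(","))
            if not a.startswith("<") and a not in ("*", ".") and len(a) != len(ref)
        }
        fmt = fields[8].split(":")
        if not indel_idx or "GT" not in fmt:
            continue
        gt_i = fmt.index("GT")
        have = set()
        for sample, col in zip(samples, fields[9:]):
            alleles = col.split(":")[gt_i].replace("|", "/").split("/")
            if any(a.isdigit() and int(a) in indel_idx for a in alleles):
                have.add(sample)
        carriers[(chrom, pos, ref, alt)] = have
    return carriers


# Hypothetical three-sample VCF: one SNP (ignored) and one deletion.
example_vcf = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t1/1\t0/0\t1/1",
    "chr1\t250\t.\tAT\tA\t.\tPASS\t.\tGT:COV\t1/1:0,12\t0/0:10,0\t./.:0,0",
])
print(indel_carriers(example_vcf))
```

Sites whose carrier set is a proper, non-empty subset of a cluster's samples are the ones that might pull the cluster apart.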
