mate-pair QC
Version 0.1
2015-04-24

Disclaimer

This pipeline is made available with no warranty of usefulness of any kind. It was put together for quality control of mate-pair library conformation and insert sizes, for use in scaffolding assemblies. It relies on valuable tools developed by other groups (see 'Requires' below).

mate_pair-QC

Quality trim mate pairs, bin them by conformation (mate-pair, paired-end or unknown: MP/PE/Unkn), align them to a reference, then estimate insert size.

Overview:

  1. Trim for quality
  2. Separate mate-pairs based on adapter presence and position
  3. Index reference with bwa
  4. Align mate-pairs against reference
  5. Evaluate your mate-pair libraries

Requires the following:
Trimmomatic: http://www.usadellab.org/cms/?page=trimmomatic
NxTrim: https://github.com/sequencing/NxTrim
bwa: http://bio-bwa.sourceforge.net
samtools: http://samtools.sourceforge.net
picard tools: http://broadinstitute.github.io/picard/

General comments

Put the raw mate-pair data (*fastq.gz) in 02_raw_data and run all jobs from the main directory. The job files are specific to Katak at IBIS, but can be adapted to other servers with minor editing.

Trim for quality

Generates a 'paired' and a 'single' fastq file for each mate-pair library.
Requires Trimmomatic.

Edit 01_scripts/01_trimming.sh by giving the path to trimmomatic.
Run locally:

01_scripts/01_trimming.sh

Run on Katak:

qsub 01_scripts/jobs/01_trimming_job.sh
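
As a rough illustration of what the trimming step involves for one library, the sketch below prints (rather than runs) a Trimmomatic paired-end command. The jar path, the 03_trimmed output folder, the _R1/_R2 file naming and the trimming parameters are all assumptions made for this example; they are not taken from 01_scripts/01_trimming.sh.

```shell
#!/usr/bin/env bash
# Hypothetical values: adjust to your installation and file naming.
TRIMMOMATIC_JAR="/path/to/trimmomatic.jar"   # assumed jar location
RAW_DIR="02_raw_data"                        # raw data folder named in this README
OUT_DIR="03_trimmed"                         # assumed output folder name

# Print the Trimmomatic PE command for one library, given its R1 file.
# Assumes the mate file has the same name with _R1 replaced by _R2.
trim_cmd() {
    local r1="$1"
    local r2 base
    r2="$(printf '%s\n' "$r1" | sed 's/_R1/_R2/')"
    base="$(basename "$r1" .fastq.gz | sed 's/_R1//')"
    echo java -jar "$TRIMMOMATIC_JAR" PE -phred33 \
        "$r1" "$r2" \
        "$OUT_DIR/${base}_R1.paired.fastq.gz" "$OUT_DIR/${base}_R1.single.fastq.gz" \
        "$OUT_DIR/${base}_R2.paired.fastq.gz" "$OUT_DIR/${base}_R2.single.fastq.gz" \
        SLIDINGWINDOW:4:20 MINLEN:36
}

# Dry run: show the command for every R1 file found in the raw data folder.
for r1 in "$RAW_DIR"/*_R1*.fastq.gz; do
    [ -e "$r1" ] || continue   # skip when the glob matches nothing
    trim_cmd "$r1"
done
```

Printing the command first makes it easy to check the file pairing before committing to a long run; removing the echo would execute it.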

Separate mate-pairs based on adapter presence and position

Bin the quality-trimmed paired mate-pairs by presence of the adapter.
Requires NxTrim.

Mate pairs can be in mate-pair conformation, paired-end conformation (i.e. shadow library) or unknown. NxTrim sorts each mate pair into one of these categories based on the presence of the connecting adapter in the forward or reverse read.

Edit 01_scripts/02_NxTrim_binning.sh by giving the path to nxtrim.

Locally:

01_scripts/02_NxTrim_binning.sh

On Katak:

qsub 01_scripts/jobs/02_NxTrim_binning_job.sh

Mate pairs are normally in RF (reverse-forward) orientation, but NxTrim reverse-complements them and outputs them in FR orientation, ready for alignment.
This step writes *.mp.fastq.gz files to 04_binned_mps.
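
For orientation, a minimal sketch of an NxTrim call for one library is shown below as a dry run. It uses the -1, -2 and -O options from the NxTrim documentation; the input file names, the 03_trimmed folder and the lib208 prefix are placeholders, and the actual call in 01_scripts/02_NxTrim_binning.sh may differ.

```shell
#!/usr/bin/env bash
NXTRIM="nxtrim"   # assumed to be on PATH; otherwise give the full path

# Print the NxTrim command for one library.
# $1 = forward reads, $2 = reverse reads, $3 = output prefix
nxtrim_cmd() {
    local r1="$1" r2="$2" prefix="$3"
    echo "$NXTRIM" -1 "$r1" -2 "$r2" -O "$prefix"
}

# Placeholder file names; with this prefix the mate-pair bin would be
# written as 04_binned_mps/lib208.mp.fastq.gz
nxtrim_cmd 03_trimmed/lib208_R1.paired.fastq.gz \
           03_trimmed/lib208_R2.paired.fastq.gz \
           04_binned_mps/lib208
```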

Index reference with bwa

Note: this only needs to be done once per reference.
Requires bwa.

Locally: ensure the reference is in fasta format (not compressed), and replace REFERENCE with the path to the reference you want to align against:

bwa index REFERENCE

On Katak:

qsub 01_scripts/jobs/03a_indexRef_job.sh
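
Because indexing is only needed once, one option is to guard the call so it is skipped when the index files are already present. The sketch below checks for the five files that bwa index writes next to the reference (.amb, .ann, .bwt, .pac, .sa); the reference path is a placeholder and the helper name is_indexed is made up for this example.

```shell
#!/usr/bin/env bash
REFERENCE="${REFERENCE:-/path/to/reference.fasta}"   # placeholder path

# Succeeds (exit 0) only when all five bwa index files exist next to the reference.
is_indexed() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        [ -e "$ref.$ext" ] || return 1
    done
    return 0
}

if is_indexed "$REFERENCE"; then
    echo "index found for $REFERENCE, skipping"
elif command -v bwa >/dev/null 2>&1 && [ -f "$REFERENCE" ]; then
    bwa index "$REFERENCE"
else
    # bwa or the reference is missing: only report what would be run
    echo "would run: bwa index $REFERENCE"
fi
```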

Align mate-pairs against reference

Requires bwa and samtools.

Edit 01_scripts/03b_BWAaln.sh by giving the path and name of each sample and by setting the read group (RG) entries. For each RG, replace the short library identifier (lib208 in this example), e.g. RG[1]='@RG\tID:lib208\tSM:lib208\tPL:Illumina'

Locally:

01_scripts/03b_BWAaln.sh

On Katak:

qsub 01_scripts/jobs/03b_BWAaln_job.sh
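
When there are several libraries, the RG entries can be generated rather than typed by hand. This is only a sketch of that idea: the helper make_rg and the second library name are invented for illustration, and the script itself may set up its RG array differently. The \t is kept as a literal backslash followed by t, which is the form bwa accepts for read-group strings on the command line.

```shell
#!/usr/bin/env bash
# Build a read-group header line, in the form shown above, for one library ID.
make_rg() {
    local lib="$1"
    # '\\t' in the printf format prints a literal backslash followed by t
    printf '@RG\\tID:%s\\tSM:%s\\tPL:Illumina\n' "$lib" "$lib"
}

# Fill an RG array, indexed from 1, for several libraries.
libs=(lib208 lib209)   # lib209 is a made-up second library
declare -a RG
i=1
for lib in "${libs[@]}"; do
    RG[$i]="$(make_rg "$lib")"
    i=$((i + 1))
done

printf '%s\n' "${RG[@]}"
```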

Evaluate your mate-pair libraries

Estimate the number of mapped reads using:

samtools flagstat <your.bam>
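
To compare several libraries at once, the mapped-read count can be pulled out of the flagstat report for each BAM file. The sketch below assumes the BAM files sit in a folder called 05_mapped, which is a guess at the layout, and uses a made-up helper, mapped_reads, that reads flagstat text on standard input.

```shell
#!/usr/bin/env bash
# Extract the mapped-read count from `samtools flagstat` text read on stdin.
# Matches the line of the form "<passed> + <failed> mapped (...)".
mapped_reads() {
    awk '/^[0-9]+ \+ [0-9]+ mapped \(/ { print $1; exit }'
}

# Report mapped reads for each BAM file in a folder (folder name is an assumption).
BAM_DIR="05_mapped"
for bam in "$BAM_DIR"/*.bam; do
    [ -e "$bam" ] || continue   # skip when the glob matches nothing
    printf '%s\t%s\n' "$bam" "$(samtools flagstat "$bam" | mapped_reads)"
done
```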

Estimate insert size

Requires picard tools.

Locally:

./01_scripts/04_estInsertSizes.sh

On Katak:

qsub 01_scripts/jobs/04_estInsertSizes_job.sh
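
For reference, insert sizes can be estimated with Picard's CollectInsertSizeMetrics tool. The dry-run sketch below prints such a command for one BAM file; the jar path, the 05_mapped folder and the output file names are placeholders, and whether 04_estInsertSizes.sh calls Picard in exactly this way is an assumption.

```shell
#!/usr/bin/env bash
PICARD_JAR="/path/to/picard.jar"   # assumed jar location

# Print a CollectInsertSizeMetrics command for one BAM file.
# Output names are derived from the BAM name and are only a suggestion.
insert_size_cmd() {
    local bam="$1"
    local prefix="${bam%.bam}"
    echo java -jar "$PICARD_JAR" CollectInsertSizeMetrics \
        "I=$bam" \
        "O=$prefix.insert_size_metrics.txt" \
        "H=$prefix.insert_size_hist.pdf"
}

insert_size_cmd 05_mapped/lib208.bam   # placeholder BAM path
```

The metrics text file reports summary statistics such as the median insert size, and the PDF holds the histogram, which is usually the quickest way to see whether a library matches its expected insert size.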

Enjoy! Hopefully your insert sizes are as you expect.

Contributors

bensutherland

mate-pair_qc's Issues

README needs to be improved

Need to expand the README to give a walkthrough explaining how to use the pipeline. Also what are the required programs.

Index only once

For bwa index (only necessary once), this could be separated from the alignment script to prevent repeated unnecessary indexing.
