GBS QC Nextflow Pipeline for farm5

About

This pipeline provides QC information for lanes of Group B Strep (GBS) sequences that have been imported on farm5 and QC-ed, assembled and mapped on pf. For each lane it reports:

  • Relative abundance of GBS reads from Kraken
  • Number of contigs
  • GC content
  • Genome length
  • Coverage breadth
  • Coverage depth
  • Percentage HET SNPs out of total SNPs

Installation

  1. Clone the pipeline into the directory where you keep your software or pipelines:
git clone https://github.com/sanger-bentley-group/GBS_QC_nf.git

Usage

  1. Go into pipeline directory
cd GBS_QC_nf
  2. Load nextflow module
module load nextflow
  3. Run QC analysis using bsub:
bsub -o gbs_qc.%J.out -e gbs_qc.%J.err -R"select[mem>4000] rusage[mem=4000]" -M4000 'nextflow run main.nf --qc_reports_directory /path/to/gbs_qc_reports --lanes /path/to/gbs_lanes.txt'

Change:

  • /path/to/gbs_lanes.txt to the path of a text file listing your lanes, one per line (the lanes must already be imported and accessible via pf), e.g.
20280_5#1
20280_5#10
20280_5#100
20280_5#101
20280_5#102
20280_5#103
20280_5#104
20280_5#105
20280_5#106
20280_5#107
  • /path/to/gbs_qc_reports to the directory where the reports should be written. (Default is the current directory)
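
Before submitting a job, it can help to check that the lanes file is well formed. The sketch below is not part of the pipeline: the helper name check_lanes is hypothetical, and the lane ID pattern (digits, underscore, digits, #, digits) is an assumption inferred only from the example IDs above, so adjust it if your lane IDs look different.

```python
import re

# Assumed lane ID shape, inferred from the examples above: <run>_<lane>#<tag>
LANE_PATTERN = re.compile(r"^\d+_\d+#\d+$")


def check_lanes(lines):
    """Split an iterable of text lines into (valid, invalid) lane IDs."""
    valid, invalid = [], []
    for line in lines:
        lane = line.strip()
        if not lane:
            continue  # ignore blank lines
        if LANE_PATTERN.match(lane):
            valid.append(lane)
        else:
            invalid.append(lane)
    return valid, invalid


if __name__ == "__main__":
    # In practice, pass an open file handle: check_lanes(open("gbs_lanes.txt"))
    valid, invalid = check_lanes(["20280_5#1\n", "20280_5#10\n", "\n", "bad lane\n"])
    print(f"{len(valid)} valid lane(s), {len(invalid)} suspicious line(s): {invalid}")
```

Any line reported as suspicious is worth checking by hand before running the pipeline.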

Output

You should get two tab-delimited output reports, qc_report_summary.txt and qc_report_complete.txt, in the --qc_reports_directory you specified. qc_report_summary.txt gives the lane_id and overall PASS/FAIL status of each lane. qc_report_complete.txt gives the PASS/FAIL status of each individual QC check.
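
As a rough illustration of working with the summary report, the sketch below lists the lanes that failed. It assumes the file has a header row and that the status column is named status; only lane_id is named in this README, so status is a guess, and failed_lanes is a hypothetical helper. Check the header of your own report and adjust the column names.

```python
import csv
import io


def failed_lanes(handle, id_col="lane_id", status_col="status"):
    """Return lane IDs whose status is FAIL in a tab-delimited report.

    status_col is an assumed column name; check your report's header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_col] for row in reader if row[status_col] == "FAIL"]


if __name__ == "__main__":
    # Made-up report content, for illustration only
    sample = io.StringIO("lane_id\tstatus\n20280_5#1\tPASS\n20280_5#10\tFAIL\n")
    print(failed_lanes(sample))
```

For a real report, open the file instead of using io.StringIO, e.g. with open("qc_report_summary.txt", newline="") as handle.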

Missing Data

If qc_report_summary.txt contains empty values, the likely cause depends on the column:

  • rel_abundance: these lanes may not have been imported, or may not have been imported properly with a Kraken report. Solution: Contact [email protected] to import those lanes again.
  • contig_no, gc_content or genome_len: these lanes may not have been assembled, or may not have been assembled properly. Solution: Check the status of the assemblies using the pf status command. If the status is -, contact [email protected] to assemble those lanes. If it is Failed/Running/Pending, ask path-help to re-trigger the assemblies (although Failed assemblies can suggest a problem with the read coverage).
  • cov_breadth or cov_depth: these values were not calculated in pf. Solution: Contact [email protected] to ask why these values are not available for these lanes in pf data -s.
  • HET_SNPs: these lanes may not have had SNPs called. Solution: Check the status of the SNP calling using the pf status command. If the status is -, contact [email protected] to call SNPs. If it is Failed/Running/Pending, ask path-help to re-trigger SNP calling.
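
To see which lanes are affected, a report can be scanned for empty values. This is a minimal sketch, not part of the pipeline: missing_values is a hypothetical helper, and it assumes the report has a header row whose column names match the ones listed above.

```python
import csv
import io

# Column names taken from the list above; assumed to match the report header
QC_COLUMNS = [
    "rel_abundance", "contig_no", "gc_content", "genome_len",
    "cov_breadth", "cov_depth", "HET_SNPs",
]


def missing_values(handle, id_col="lane_id"):
    """Map each lane ID to the QC columns that are empty for it.

    A column that is absent from the file is also reported as missing.
    """
    missing = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        empty = [col for col in QC_COLUMNS if not (row.get(col) or "").strip()]
        if empty:
            missing[row[id_col]] = empty
    return missing


if __name__ == "__main__":
    # Made-up values, for illustration only
    sample = io.StringIO(
        "lane_id\trel_abundance\tcontig_no\tgc_content\tgenome_len\tcov_breadth\tcov_depth\tHET_SNPs\n"
        "20280_5#1\t92.4\t120\t35.1\t2100000\t95.2\t80.1\t3\n"
        "20280_5#10\t\t120\t35.1\t2100000\t\t\t3\n"
    )
    for lane, cols in missing_values(sample).items():
        print(lane, "is missing:", ", ".join(cols))
```

The column names in the output point to the matching bullet above.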

Additional options

--rel_abund_threshold           Pass read QC if rel_abundance is > rel_abund_threshold. (Default: 70)
--species                       Species of interest. (Default: 'Streptococcus agalactiae')
--contig_no_threshold           Pass contig number QC if contig_no < contig_no_threshold. (Default: 500)
--assembler                     Assemblies of interest e.g. velvet or spades. (Default: spades)
--gc_content_lower_threshold    GC content must be >= gc_content_lower_threshold to pass. (Default: 32)
--gc_content_higher_threshold   GC content must be <= gc_content_higher_threshold to pass. (Default: 38)
--genome_len_lower_threshold    Genome length/total number of bases > genome_len_lower_threshold to pass. (Default: 1450000)
--genome_len_higher_threshold   Genome length/total number of bases < genome_len_higher_threshold to pass. (Default: 2800000)
--cov_depth_threshold           Genome depth of coverage > cov_depth_threshold to pass. (Default: 20)
--cov_breadth_threshold         Genome breadth of coverage > cov_breadth_threshold to pass. (Default: 70)
--het_snps_threshold            Number of HET SNPs <= het_snps_threshold to pass. (Default: 20)
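
The comparisons above differ in whether the boundary value itself passes (> and < are strict, >= and <= are inclusive). The sketch below restates them in Python with the documented defaults. It is an illustration of the rules as described in this README, not the pipeline's own code: qc_checks and the metric keys are hypothetical names.

```python
# Defaults copied from the option list above
DEFAULTS = {
    "rel_abund_threshold": 70,
    "contig_no_threshold": 500,
    "gc_content_lower_threshold": 32,
    "gc_content_higher_threshold": 38,
    "genome_len_lower_threshold": 1450000,
    "genome_len_higher_threshold": 2800000,
    "cov_depth_threshold": 20,
    "cov_breadth_threshold": 70,
    "het_snps_threshold": 20,
}


def qc_checks(m, t=DEFAULTS):
    """Return a dict of check name -> bool for one lane's metrics."""
    return {
        # Strict comparisons: a value equal to the threshold fails
        "rel_abundance": m["rel_abundance"] > t["rel_abund_threshold"],
        "contig_no": m["contig_no"] < t["contig_no_threshold"],
        "genome_len": t["genome_len_lower_threshold"] < m["genome_len"]
        < t["genome_len_higher_threshold"],
        "cov_depth": m["cov_depth"] > t["cov_depth_threshold"],
        "cov_breadth": m["cov_breadth"] > t["cov_breadth_threshold"],
        # Inclusive comparisons: a value equal to the threshold passes
        "gc_content": t["gc_content_lower_threshold"] <= m["gc_content"]
        <= t["gc_content_higher_threshold"],
        "HET_SNPs": m["HET_SNPs"] <= t["het_snps_threshold"],
    }


if __name__ == "__main__":
    # Made-up metrics, for illustration only
    lane = {"rel_abundance": 70, "contig_no": 120, "gc_content": 32,
            "genome_len": 2100000, "cov_depth": 80.1, "cov_breadth": 95.2,
            "HET_SNPs": 20}
    for name, passed in qc_checks(lane).items():
        print(name, "PASS" if passed else "FAIL")
```

In this example rel_abundance fails because 70 is not strictly greater than the default threshold, while gc_content of 32 and HET_SNPs of 20 pass because their bounds are inclusive.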

The methods

The methods used for finding relative abundance from Kraken, coverage breadth, coverage depth and percentage HET SNPs out of total SNPs are described here (Sanger access only).

For developers

To run Python unit tests:

pytest tests

To test this pipeline on the farm:

module load nextflow/20.10.0-5430
bsub -G <YOUR GROUP> -o gbs_qc.o -e gbs_qc.e -R"select[mem>4000] rusage[mem=4000]" -M4000 'nextflow run main.nf --qc_reports_directory gbs_qc_report --lanes tests/test_data/test_lanes.txt'
