nanopore_LSK109_assembly

Nanopore assembly pipeline for LSK109 chemistry and r9 series flowcells

This is a Nanopore assembly pipeline aimed at reads generated with LSK109 chemistry on R9 series flowcells.

The pipeline does some basic QC by removing control DNA, which is sometimes added to a run to debug potential problems but should not end up in the final assembly.

Assembly uses two different assemblers, Flye and nextDenovo. Both are very fast and have different strengths. I have found that the nextDenovo assembly is better overall, with fewer contigs, but it tends to trim telomeres and sometimes loses the mitogenome. Flye, however, is great at maintaining telomeres, and the mitogenome tends to fall out as a single, non-concatenated contig (other assemblers create tandem copies because they don't expect a circular sequence). Canu is used to generate corrected reads, which I use to manually check and curate the assemblies in Geneious.

Currently this pipeline is optimised to run on a Nimbus instance with 16 cores and 64 GB of RAM.

Input

The pipeline requires you to basecall your raw fast5 or pod5 files with dorado and then compress the reads with gzip. The files need to be named "SampleID.fastq.gz", where 'SampleID' is whatever you want to call your sample. This ID is used throughout the pipeline to name files.
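The naming convention above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: `sample_id` and `check_reads_dir` are hypothetical helper names, and the sketch assumes the sample ID is simply the file name with the `.fastq.gz` suffix removed.

```shell
# sample_id: hypothetical helper that illustrates the naming convention only.
# Strips the directory and the ".fastq.gz" suffix from a path.
sample_id() {
  basename "$1" .fastq.gz
}

# check_reads_dir: hypothetical helper. Prints the sample ID of every
# correctly named file in a folder and reports anything else on stderr.
# Returns non-zero if at least one file does not end in ".fastq.gz".
check_reads_dir() {
  bad=0
  for f in "$1"/*; do
    [ -e "$f" ] || continue            # empty folder: the glob did not match
    case "$f" in
      *.fastq.gz) printf 'OK\t%s\n' "$(sample_id "$f")" ;;
      *)          printf 'BAD\t%s\n' "$f" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Running `check_reads_dir reads/` lists one `OK` line per sample. Under the assumption above, a file called `sampleID.dorado.fastq.gz` would get the ID `sampleID.dorado`, so rename files first if you want a shorter ID.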

Basecall with Dorado or Guppy:

If you are re-basecalling old fast5 or pod5 data generated with LSK109 chemistry on an R9 series flowcell:

  • First download the latest model. This is the last available model for LSK109 chemistry on R9 flowcells:
dorado download --model [email protected]
  • Then run basecalling:
dorado basecaller [email protected] pod5s/ --emit-fastq > sampleID.dorado.fastq && \
gzip -9 sampleID.dorado.fastq

If you have the compute resources available, you can try to correct the reads with the new dorado correct module. The input reads must not be gzipped for this step:

dorado correct sampleID.dorado.fastq > sampleID.dorado.corrected.fasta

Running the pipeline

nextflow run jwdebler/nanopore_LSK109_assembly -resume -latest -profile docker,nimbus --reads "reads/"
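To see how the options from the Parameters section combine with the command above, the sketch below prints a full command line without running it. `pipeline_cmd` is a hypothetical helper, not something the pipeline provides. The read filter and medaka values are the documented defaults, written out explicitly.

```shell
# pipeline_cmd: hypothetical helper that only PRINTS a command line.
#   $1 = reads folder, $2 = genome size, $3 = output directory
# --medakaModel, --minlen and --quality are set to the documented defaults.
pipeline_cmd() {
  printf '%s' 'nextflow run jwdebler/nanopore_LSK109_assembly -resume -latest -profile docker,nimbus'
  printf ' --reads "%s" --genomeSize %s --medakaModel r941_min_sup_g507' "$1" "$2"
  printf ' --minlen 1000 --quality 10 --outdir "%s"\n' "$3"
}

# Preview the command for the default folder layout.
pipeline_cmd reads/ 42m assembly
```

After checking the printed line, copy it into the terminal to start the run. Swap `docker,nimbus` for another profile combination if you are not on a Nimbus instance.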

Profiles

We have a few profiles available to customise how the pipeline will run.

  • nimbus sets the canu assembler to use 15 CPUs and 60GB RAM.
  • zeus sets the canu assembler to use 14 CPUs and 64GB RAM, and sets some cluster specific options to use the slurm based scheduler at Pawsey.
  • docker and docker_sudo set the pipeline to use Docker containers. docker_sudo is identical to docker except that Docker is run as root (required for some Docker installations).

Parameters


    --reads <glob>
        Required
        A folder containing one file per sample.
        The basename of the file is used as the sample ID.
       
        Example of file names: `Sample1.fastq.gz`, `Sample2.fastq.gz`.
        (Default: a folder called `reads/`)

    --genomeSize <glob>
        not required
        Size of genome, for example "42m" (Default: 42m)

    --medakaModel <glob>
        not required
        Which basecaller model was used?
        r941_min_sup_g507 (kit109, sup)
        (Default: r941_min_sup_g507)

    --minlen
        Min read length to keep for assembly
        (Default: 1000)

    --quality
        Min read q-score to keep for read filtering
        (Default: 10)

    --outdir <path>
        The directory to store the results in.
        (Default: `assembly`)
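To make the meaning of `--minlen` concrete, here is a minimal sketch of a length filter on FASTQ records. It is an illustration only and is not the pipeline's own filtering step: `filter_minlen` is a hypothetical helper, it assumes standard four-line FASTQ records, and it ignores the q-score threshold set by `--quality`.

```shell
# filter_minlen: hypothetical helper, NOT the pipeline's filtering step.
# Reads four-line FASTQ records on stdin and prints only those whose
# sequence is at least $1 bases long (the idea behind --minlen).
filter_minlen() {
  awk -v min="$1" '
    NR % 4 == 1 { header = $0 }     # line 1 of a record: read name
    NR % 4 == 2 { seq = $0 }        # line 2: sequence
    NR % 4 == 3 { plus = $0 }       # line 3: separator
    NR % 4 == 0 {                   # line 4: qualities, record complete
      if (length(seq) >= min) print header "\n" seq "\n" plus "\n" $0
    }
  '
}
```

For example, `gzip -dc Sample1.fastq.gz | filter_minlen 1000 | gzip > Sample1.min1000.fastq.gz` would keep reads of 1000 bases or more, assuming `Sample1.fastq.gz` exists.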
