IndexReferenceFasta-nf


Description

This is a flexible pipeline for generating common reference genome index files for WGS data analysis. IndexReferenceFasta-nf is a Nextflow (DSL2) pipeline that runs the following tools as containerised software, using either Docker or Singularity:

  • Samtools faidx
  • BWA index
  • GATK CreateSequenceDictionary
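
For orientation, the three steps correspond roughly to the stand-alone commands printed by the sketch below. This is not the pipeline's actual process code: the `bwa index -a bwtsw` form is taken from the error log in the Nimbus issue further down, while the Samtools and GATK command lines, the output name for the dictionary, and the helper name `print_index_commands` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the stand-alone commands that roughly
# correspond to each pipeline step for a given reference FASTA.
print_index_commands() {
  local ref="$1"
  # Samtools writes <ref>.fai next to the FASTA.
  printf 'samtools faidx %s\n' "$ref"
  # BWA writes <ref>.amb, .ann, .bwt, .pac and .sa.
  printf 'bwa index -a bwtsw %s\n' "$ref"
  # GATK writes a sequence dictionary; here the FASTA extension is swapped for .dict.
  printf 'gatk CreateSequenceDictionary -R %s -O %s.dict\n' "$ref" "${ref%.*}"
}

print_index_commands ref.fasta
```

Running the commands by hand requires the three tools on your PATH; the pipeline avoids that by running each one in a container.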

Diagram

User guide

1. Set up

Clone this repository by running:

git clone https://github.com/Sydney-Informatics-Hub/IndexReferenceFasta-nf.git
cd IndexReferenceFasta-nf

2. Generate indexes

Users can specify which index files to create using the --bwa and/or --gatk flags. The BWA and GATK indexes are optional, while the Samtools index is always generated. Run the pipeline with:

nextflow run main.nf --ref /path/to/ref.fasta --bwa --gatk -profile <nimbus/gadi/standard>

If you are running the pipeline on NCI Gadi or Pawsey's Nimbus cloud, specify this with the -profile flag at runtime. The gadi profile runs containers with Singularity, and the nimbus profile runs them with Docker.

Standard

To run the pipeline on your own system, you will need to have Nextflow installed. You can adjust the standard.config configuration file to suit your system; it currently runs containers with Singularity. You can test your customised config file using the test fasta available in testData.

To run the pipeline with the standard.config, run the following:

nextflow run main.nf --ref /path/to/ref.fasta --bwa --gatk -profile standard 

NCI Gadi HPC

To run the pipeline at NCI Gadi, first load the Gadi-specific Nextflow installation:

module load nextflow

Then run the pipeline:

nextflow run main.nf --ref /path/to/ref.fasta --bwa --gatk -profile gadi --whoami <us1111> --pbs_account <aa00>

Pawsey Nimbus cloud

To run the pipeline at Pawsey's Nimbus cloud:

nextflow run main.nf --ref /path/to/ref.fasta --bwa --gatk -profile nimbus

Infrastructure-specific config files can be found in config/.

Benchmarking

Human hg38 reference assembly @ Pawsey's Nimbus (NCPU/task = 1)

| task_id | hash | native_id | name | status | exit | submit | duration | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 27/33fffc | 131621 | samtools_index | COMPLETED | 0 | 55:44.9 | 12.2s | 12s | 99.20% | 6.3 MB | 11.8 MB | 3 GB | 19.1 KB |
| 1 | 80/f03e46 | 131999 | gatk_index | COMPLETED | 0 | 55:46.7 | 22.6s | 22.3s | 231.90% | 3.8 GB | 37.1 GB | 3.1 GB | 726 KB |
| 2 | ea/e29535 | 131594 | bwa_index | COMPLETED | 0 | 55:44.9 | 1h 50m 16s | 1h 50m 15s | 99.50% | 4.5 GB | 4.5 GB | 12.1 GB | 8.2 GB |
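
The benchmark columns match the default fields of a Nextflow trace file, so a comparable table can presumably be produced on your own system by adding `-with-trace` to the run command. The sketch below pulls the task name, duration and peak memory out of such a file. It assumes a tab-separated file with a header row; `summarise_trace` and `trace.txt` are illustrative names, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print name, duration and peak_rss for each task in a
# Nextflow trace file (tab-separated, first line is the header).
summarise_trace() {
  awk -F '\t' '
    NR == 1 {
      # Map column names to positions so the field order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    { printf "%s\t%s\t%s\n", $(col["name"]), $(col["duration"]), $(col["peak_rss"]) }
  ' "$1"
}

# Example usage (requires Nextflow and a real reference):
#   nextflow run main.nf --ref /path/to/ref.fasta --bwa --gatk -profile nimbus -with-trace trace.txt
#   summarise_trace trace.txt
```

Looking columns up by header name rather than by fixed position keeps the helper working if the trace fields are customised in the Nextflow config.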

Workflow summaries

Metadata

| metadata field | workflow_name / workflow_version |
|---|---|
| Version | 1.0 |
| Maturity | stable |
| Creators | Georgie Samaha |
| Source | NA |
| License | GPL-3.0 license |
| Workflow manager | Nextflow |
| Container | None |
| Install method | NA |
| GitHub | Sydney-Informatics-Hub/IndexReferenceFasta-nf |
| bio.tools | NA |
| BioContainers | NA |
| bioconda | NA |

Component tools

  • samtools/1.15.1
  • gatk/4.3.0.0
  • bwa/0.7.17

Required (minimum) inputs/parameters

  • A reference genome file in fasta format.

Additional notes

Help/FAQ/Troubleshooting

  • A subset fasta file for testing is available in testData/
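
After a test run it can be handy to confirm that every index was actually written. The sketch below is one way to do that, under two assumptions that are not confirmed by this README: that the index files end up next to the FASTA, and that they use each tool's default naming (`.fai` for Samtools, `.amb/.ann/.bwt/.pac/.sa` for BWA, and a `.dict` file replacing the FASTA extension for GATK). `check_index_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list any expected index files that are missing for a
# given FASTA. Returns 0 if all are present and 1 otherwise.
# Assumes the indexes sit beside the FASTA with default tool naming.
check_index_files() {
  local ref="$1" missing=0 f
  for f in "$ref.fai" "$ref.amb" "$ref.ann" "$ref.bwt" "$ref.pac" "$ref.sa" "${ref%.*}.dict"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, with the path of the test FASTA filled in:
#   check_index_files /path/to/test.fasta && echo "all indexes present"
```

If the pipeline publishes its results to a separate output directory, point the helper at the copy of the FASTA path used there, or adjust the file list accordingly.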

Acknowledgements/citations/credits

Authors

  • Georgie Samaha (Sydney Informatics Hub, University of Sydney)

Acknowledgements

Cite us to support us!

Acknowledgements (and co-authorship, where appropriate) are an important way for us to demonstrate the value we bring to your research. Your research outcomes are vital for ongoing funding of the Sydney Informatics Hub and national compute facilities. We suggest including the following acknowledgement in any publications that follow from this work:

The authors acknowledge the technical assistance provided by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney and the Australian BioCommons which is enabled by NCRIS via Bioplatforms Australia.

indexreferencefasta-nf's Issues

Add small input dataset for testing

Hi @georgiesamaha ,

may I suggest you include a small testing dataset in the repo?

It does not have to produce any meaningful output; it just needs to allow the whole pipeline to be run to the end using a provided, pre-canned script.
Ideally it would be small enough to run in, say, under 5 minutes, to allow quick testing of both 1. the pipeline's logical correctness and 2. new deployments to infrastructures and platforms.

Nimbus config broken

Docker runOptions needs fixing in nimbus.config.

Running:

nextflow run main.nf --ref /data/RefGenomes/hg38.fa --bwa --samtools -profile nimbus

Gives error:

Error executing process > 'bwa_index'

Caused by:
  Process `bwa_index` terminated with an error exit status (1)

Command executed:

  bwa index             -a bwtsw                "/data/testData/testHg38.fa"

Command exit status:
  1

Command output:
  (empty)

Command error:
  .command.run: line 275: ${params.work_dir}:${params.work_dir}: bad substitution

Work dir:
  /data/IndexReferenceFasta-nf/work/46/62269b9e8cf0e83354606c54a2e51a

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
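
A note on the error message: `bad substitution` is what bash reports when it is asked to expand `${params.work_dir}` itself, because a dot is not valid in a shell variable name. That suggests, although the issue does not confirm it, that the Groovy expression in runOptions reached `.command.run` without being filled in by Nextflow, for example because it sat inside a single-quoted string (Groovy only interpolates `${...}` in double-quoted strings). The snippet below reproduces only the shell-side symptom; `/data/work` and the `-v` mount form are made-up stand-ins.

```shell
#!/usr/bin/env bash
# Reproduce the symptom: bash cannot expand a variable name containing dots.
# The single quotes stop the outer shell from touching the ${...} text.
err="$(bash -c 'echo "${params.work_dir}:${params.work_dir}"' 2>&1)" || true
echo "$err"   # the message ends with "bad substitution"

# What the option would look like once the value has been filled in.
# work_dir is a hypothetical stand-in for the resolved params.work_dir.
work_dir=/data/work
echo "-v ${work_dir}:${work_dir}"
```

Checking the offending line in `.command.run` in the work directory shows whether the literal `${params.work_dir}` text is present there.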
