
rnaseq-nf's Introduction

RNAseq-NF pipeline

A basic pipeline for quantification of genomic features from short read data implemented with Nextflow.

Requirements

  • Unix-like operating system (Linux, macOS, etc)
  • Java 11
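A quick way to check the Java requirement before installing Nextflow is sketched below. It assumes that "Java 11" means version 11 or newer; `java_major` is a hypothetical helper name, not part of the pipeline.

```shell
# Print the major version for a Java version string.
# Handles both the legacy scheme ("1.8.0_292" -> 8) and the current one ("11.0.20" -> 11).
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: strip the leading "1."
  esac
  printf '%s\n' "${v%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; the version is the first quoted field.
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$raw")
  if [ -n "$major" ] && [ "$major" -ge 11 ] 2>/dev/null; then
    echo "Java $major found"
  else
    echo "Java 11 or later not detected (reported version: '${raw}')"
  fi
else
  echo "java not found on PATH"
fi
```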

Quickstart

  1. If you don't have it already, install Docker on your computer. Read more here.

  2. Install Nextflow (version 23.10.0 or later):

     curl -s https://get.nextflow.io | bash
    
  3. Launch the pipeline execution:

     ./nextflow run nextflow-io/rnaseq-nf -with-docker
    
  4. When the execution completes, open the report generated at the following path in your browser:

     results/multiqc_report.html 
    

You can see an example report at the following link.

Note: the very first time you execute it, it will take a few minutes to download the pipeline from this GitHub repository and the associated Docker images needed to execute the pipeline.

Pipeline flowchart

Here is a visual representation of the design of the RNASeq-NF pipeline, generated using the visualization functionality of Nextflow.

%%{init: { 'theme': 'forest' } }%%
flowchart TD
    p0((Channel.fromFilePairs))
    p1(( ))
    p2[RNASEQ:INDEX]
    p3[RNASEQ:FASTQC]
    p4[RNASEQ:QUANT]
    p5([concat])
    p6([collect])
    p7(( ))
    p8[MULTIQC]
    p9(( ))
    p0 -->|read_pairs_ch| p3
    p1 -->|transcriptome| p2
    p2 --> p4
    p3 --> p5
    p0 -->|read_pairs_ch| p4
    p4 -->|pair_id| p5
    p5 --> p6
    p6 -->|$out0| p8
    p7 -->|config| p8
    p8 --> p9

Cluster support

RNASeq-NF execution relies on the Nextflow framework, which provides an abstraction between the pipeline's functional logic and the underlying processing system.

This allows the pipeline to run on a single computer or on an HPC cluster without modification.

Currently the following resource manager platforms are supported:

  • Univa Grid Engine (UGE)
  • Platform LSF
  • SLURM
  • PBS/Torque

By default, the pipeline is parallelized by spawning multiple threads on the machine where the script is launched.

To submit the execution to a UGE cluster, create a file named nextflow.config in the directory where the pipeline will be executed, with the following content:

process {
  executor='uge'
  queue='<queue name>'
}

To learn more about the available settings and the configuration file, read the Nextflow documentation.
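The same pattern should carry over to the other resource managers listed above. As a minimal sketch, the commands below write an equivalent nextflow.config for SLURM into the current directory. The `QUEUE_NAME` variable is an assumption made for this example, and the queue value stays a placeholder until you set it. Note that this overwrites any nextflow.config already present.

```shell
# Write a nextflow.config for SLURM in the current (launch) directory.
# QUEUE_NAME is an illustrative variable; if unset, the placeholder is kept
# and must be replaced with a partition available on your cluster.
queue_name="${QUEUE_NAME:-<queue name>}"

cat > nextflow.config <<EOF
process {
  executor = 'slurm'
  queue = '${queue_name}'
}
EOF

# Show the generated file for review before launching the pipeline.
cat nextflow.config
```

With this file in place, the pipeline is launched with the same `./nextflow run nextflow-io/rnaseq-nf` command as before, and Nextflow submits each task to the scheduler instead of running it locally.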

Components

RNASeq-NF uses the following software components and tools:

rnaseq-nf's People

Contributors

abhi18av, evanfloden, jordeu, marcodelapierre, molecules, pditommaso


rnaseq-nf's Issues

Error GCP/S tutorial

I have been trying to follow the GCP/S tutorial to run this proof-of-concept pipeline (https://cloud.google.com/batch/docs/nextflow#command-line) using the gcb profile. My GCP credentials appear to be working, but I am getting the error below. I would appreciate any help.

amnah_cchmc@cloudshell:~/rnaseq-nf (pmod-410207)$ ../nextflow run nextflow-io/rnaseq-nf -profile gcb 

Output error:

N E X T F L O W  ~  version 23.04.1
Launching `https://github.com/nextflow-io/rnaseq-nf` [stoic_kare] DSL2 - revision: 8253a586cc [master]
Jan 07, 2024 9:45:35 PM com.google.auth.oauth2.DefaultCredentialsProvider warnAboutProblematicCredentials
WARNING: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/.
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: gs://rnaseq-nf/data/ggal/transcript.fa
 reads        : gs://rnaseq-nf/data/ggal/gut_{1,2}.fq
 outdir       : results
 
Uploading local `bin` scripts folder to gs://gcpbucket_toy/testres/tmp/9a/b274c07f0819f8241f8408664ff288/bin
executor >  google-batch (2)
[ba/0829c0] process > RNASEQ:INDEX (transcript)     [100%] 1 of 1 ✔
[a6/bad50e] process > RNASEQ:FASTQC (FASTQC on gut) [  0%] 0 of 1
[-        ] process > RNASEQ:QUANT                  -
[-        ] process > MULTIQC                       -
ERROR ~ Error executing process > 'RNASEQ:QUANT (1)'

Caused by:
  Oops.. something went wrong while creating task 'RNASEQ:QUANT' unique id -- Offending keys: [
 - type=java.util.UUID value=12d80190-0c61-4bc8-a7d3-3e24a9340bfd, 
 - type=java.lang.String value=RNASEQ:QUANT, 
 - type=java.lang.String value="""
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
executor >  google-batch (2)
[ba/0829c0] process > RNASEQ:INDEX (transcript)     [100%] 1 of 1 ✔
[-        ] process > RNASEQ:FASTQC (FASTQC on gut) [  0%] 0 of -1 ✔
[-        ] process > RNASEQ:QUANT                  -
[-        ] process > MULTIQC                       -
ERROR ~ Error executing process > 'RNASEQ:QUANT (1)'

Caused by:
  Oops.. something went wrong while creating task 'RNASEQ:QUANT' unique id -- Offending keys: [
 - type=java.util.UUID value=12d80190-0c61-4bc8-a7d3-3e24a9340bfd, 
 - type=java.lang.String value=RNASEQ:QUANT, 
 - type=java.lang.String value="""
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
, 
 - type=java.lang.String value=quay.io/nextflow/rnaseq-nf:v1.2.1, 
 - type=java.lang.String value=index, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/testres/ba/0829c0f5dfe57924224b6e20581701/index, storePath:/testres/ba/0829c0f5dfe57924224b6e20581701/index, stageName:index)], 
 - type=java.lang.String value=pair_id, 
 - type=java.lang.String value=gut, 
 - type=java.lang.String value=reads, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/data/ggal/gut_1.fq, storePath:/data/ggal/gut_1.fq, stageName:gut_1.fq), FileHolder(sourceObj:/data/ggal/gut_2.fq, storePath:/data/ggal/gut_2.fq, stageName:gut_2.fq)], 
 - type=java.lang.String value=$, 
 - type=java.lang.Boolean value=true, 
 - type=java.util.HashMap$EntrySet value=[task.cpus=null]]


 -- Check '.nextflow.log' file for details

`kuberun` doesn't put data into the right place

Hello,
Hope you are doing well. I am testing nextflow on a kubernetes cluster, and am getting errors. It seems that the test data that is bundled with the repo is not getting copied to the pod. Do you know what may be happening?
Thank you!
Warmest,
Olga

The nextflow.config file:

k8s {
   namespace = 'default'
   serviceAccount = 'nextflow'
   storageClaimName = 'nextflow-pvc'
}

process.scratch = true

nextflow kuberun command and output

(nf-core)
 ✘  Wed  1 Jun - 22:30  ~ 
 olgabot@ip-172-31-9-54  nextflow -c ./nextflow.config kuberun nextflow-io/rnaseq-nf -v nextflow-pvc:/mnt/nextflow -profile docker
zsh: correct 'docker' to '.docker' [nyae]? n
Pod started: peaceful-mccarthy
N E X T F L O W  ~  version 22.04.0
Launching `https://github.com/nextflow-io/rnaseq-nf` [peaceful-mccarthy] DSL2 - revision: 37c5039435 [master]
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: null/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : null/data/ggal/ggal_gut_{1,2}.fq
 outdir       : results

No files match pattern `ggal_gut_{1,2}.fq` at path: null/data/ggal/


Oops .. something went wrong

salmon quant for paired-end reads?

Hi,

The command for salmon quantification in this pipeline is specifying single-end reads:
salmon quant --threads $task.cpus --libType=U -i index -r $reads -o $pair_id

But the pipeline seems to take paired-end reads.
params.reads = "$baseDir/data/ggal/*_{1,2}.fq"

Can you clarify please, which is it?

From the salmon manual:

-r [ --unmatedReads ] arg   List of files containing unmated reads (e.g. single-end reads)
-1 [ --mates1 ] arg         File containing the #1 mates  
-2 [ --mates2 ] arg         File containing the #2 mates

Thanks
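To make the distinction in the manual excerpt concrete, here is a small sketch that picks the salmon read flags from the number of FASTQ files supplied: `-r` for one file, `-1`/`-2` for a mate pair. `salmon_read_args` is a hypothetical helper written for this illustration and is not part of the pipeline.

```shell
# Print the salmon read arguments for one (single-end) or two (paired-end) FASTQ files.
salmon_read_args() {
  case $# in
    1) printf '%s\n' "-r $1" ;;          # unmated (single-end) reads
    2) printf '%s\n' "-1 $1 -2 $2" ;;    # mate 1 and mate 2 of a pair
    *) echo "expected 1 or 2 read files, got $#" >&2; return 1 ;;
  esac
}

# The paired-end test files matched by params.reads would yield the mate flags.
salmon_read_args ggal_gut_1.fq ggal_gut_2.fq
```

For a pattern such as `*_{1,2}.fq`, each sample contributes two files, so the two-file branch is the one that applies.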

Running on M1 macs

This is a long shot, but I'll try anyway. I'm trying to run this with Docker on M1 Mac. Here's the error:

➜  nextflow nextflow run nextflow-io/rnaseq-nf -with-docker
N E X T F L O W  ~  version 21.04.1
Launching `nextflow-io/rnaseq-nf` [tender_volhard] - revision: 1f5a9060aa [master]
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /Users/f6v/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : /Users/f6v/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/*_{1,2}.fq
 outdir       : results
 
executor >  local (3)
[78/8554c0] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [  0%] 0 of 1
executor >  local (3)
[-        ] process > RNASEQ:INDEX (ggal_1_48850000_49020000) -
[8b/e55b25] process > RNASEQ:FASTQC (FASTQC on ggal_liver)    [100%] 1 of 1, failed: 1
[-        ] process > RNASEQ:QUANT                            -
[-        ] process > MULTIQC                                 -
Error executing process > 'RNASEQ:FASTQC (FASTQC on ggal_gut)'

Caused by:
  Process `RNASEQ:FASTQC (FASTQC on ggal_gut)` terminated with an error exit status (139)

Command executed:

  fastqc.sh "ggal_gut" "ggal_gut_1.fq ggal_gut_2.fq"

Command exit status:
  139

Command output:
  (empty)

Command error:
  WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
  /home/micromamba/.bashrc: line 50: PS1: unbound variable
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  /Users/f6v/.nextflow/assets/nextflow-io/rnaseq-nf/bin/fastqc.sh: line 6:    59 Segmentation fault      fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}

Work dir:
  /Users/f6v/dev/nextflow/work/49/1b3cb66e3a0f16b96b37a01649e29b

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I'm new to Docker on ARM, but it seems like a wrong image is pulled? The base image should be available for arm64, should I force it somehow?

Thanks!
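The warning in the log suggests that the image was pulled for linux/amd64 while the host is linux/arm64/v8, so the container runs under emulation, and the segmentation fault is reported by qemu. As a rough sketch for checking which case applies on a given machine (`is_arm_host` is a hypothetical helper, and the amd64 platform is taken from the warning above):

```shell
# Return success when the given machine name denotes an ARM host, i.e. an
# amd64-only image would have to run under emulation there.
is_arm_host() {
  case "$1" in
    arm64|aarch64) return 0 ;;
    *) return 1 ;;
  esac
}

host_arch=$(uname -m)
if is_arm_host "$host_arch"; then
  echo "host is $host_arch: linux/amd64 images will run under emulation"
else
  echo "host is $host_arch: not an ARM host"
fi
```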

Need to build new container image

Hi @pditommaso , we need to build a new version 1.3 and push it to quay.io.

Version 1.2 is broken due to the Dockerfile update with micromamba, which initially had no PATH update.

I have already bumped the version here: https://github.com/nextflow-io/rnaseq-nf/blob/master/docker/Makefile

So it is just a matter of running the Makefile; if you want to give me admin access to quay.io/nextflow I can do it (I am marcodelapierre there as well), or otherwise I will leave it to you

Error while running nf file with docker image.

Command
./nextflow run rna_script.nf -with-docker a40008e6257f

Error
Error executing process > 'buildIndex (1)'

Caused by:
Process buildIndex (1) terminated with an error exit status (127)

Command executed:

bowtie2-build /home/srinka/VirTect2/human_reference/GRCh37.primary_assembly.genome.fa genome.index

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: bowtie2-build: command not found
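Exit status 127 is the status a POSIX shell returns when a command cannot be found, which matches the error message: bowtie2-build does not appear to be on the PATH inside the container. The sketch below reproduces that status and shows one way to check for a tool; `require_tool` is a hypothetical helper, and the commented docker command is only a suggestion that assumes the image provides `sh`.

```shell
# Succeed only if the named tool can be resolved on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing tool: $1" >&2
  return 1
}

# A POSIX shell reports 127 when a command cannot be found,
# which is the status Nextflow surfaced in the report above.
status=0
sh -c 'no-such-command-xyz' 2>/dev/null || status=$?
echo "exit status: $status"

# Inside the image, the same check could be run as (image ID taken from the report):
#   docker run --rm a40008e6257f sh -c 'command -v bowtie2-build'
```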

Use modules or not

Hi there,

I have a general question about whether I should use modules for my project.

I was re-directed from the blog post. I like the code in the module branch. But I found the master branch does not use modules. I wonder in which situations using modules is recommended. Thanks!

-Mark

Add Azure batch example to the config

Since Azure Batch support is relatively new and not as well documented as AWS and Google, it would be very helpful to add an Azure Batch example to the config file.

Issue with multiqc installation with micromamba

When installing with the newer micromamba-based image, the image does not work.

The installation itself succeeds, but then any multiqc command gives an error of missing the typing_extensions python module. If I manually install it in the image, it then complains about imp, which I am unable to fix.

A quick look at the internet suggests that imp has been deprecated for a while, and hence has probably been removed in newer Pythons. importlib should be used instead.

So I think:

  • either we manage to use an older python with micromamba, or
  • if only we knew the multiqc developer, who might look into this ... ... @ewels ? 😊
