IPSA-nf

An Integrative Pipeline for Splicing Analyses (IPSA) written in the Nextflow DSL.

The pipeline performs the following splicing analyses:

  • Quantification of splice junctions and splice sites
  • Calculation of exon-centric and intron-centric splicing metrics
  • Identification of micro-exons

Quickstart

Install nextflow with the following command:

curl -fsSL get.nextflow.io | bash

Pull the docker image:

docker pull guigolab/ipsa-nf@sha256:29750072f2b42ee8ea094a331f53bc5906183591a71ed26619ea76b12b6be3ed

Launch the test pipeline with the following command:

./nextflow run guigolab/ipsa-nf

Pipeline usage

Launching the pipeline with the --help parameter shows the help message:

nextflow run guigolab/ipsa-nf --help
N E X T F L O W  ~  version 0.24.4
Launching `guigolab/ipsa-nf` [dreamy_aryabhata] - revision: v4.0

I P S A ~ Integrative Pipeline for Splicing Analyses
----------------------------------------------------
Run IPSA on a set of data.

Usage: 
    ipsa-nf [options]

Options:
--index INDEX_FILE        the index file in TSV format
--genome GENOME_FILE      the genome file in FASTA format
--annot ANNOTATION_FILE   the annotation file in gtf format
--merge MERGE             prefix for merged output files (default: all)
--dir DIRECTORY           the output directory
--sjcount-params PARAMS   additional `sjcount` paramters
--margin MARGIN           margin for aggregate (default: 5)
--mincount MIN_COUNT      minimum number of counts for denominators when calculationg fractions (default: 10)
--deltaSS DELTA           distance threshold for splice sites (default: 10)
--entropy ENTROPY         entropy lower threshold (default: 1.5)
--status STATUS           annotation status lower threshold (default: 0)
--microexons              include microexons, default=false
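
As an illustration of how these options combine, a run on your own data might look like the sketch below. The file names (index.tsv, genome.fa, annotation.gtf) and the results directory are placeholders, not files shipped with the pipeline. The flags are taken from the help message above.

```shell
#!/usr/bin/env bash
# Placeholder inputs: replace these paths with your own files.
args=(
    --index index.tsv        # TSV index describing the samples
    --genome genome.fa       # genome in FASTA format
    --annot annotation.gtf   # annotation in GTF format
    --dir results            # output directory
    --mincount 10            # same as the documented default, shown for illustration
    --microexons             # also run the microexon analysis
)

# Dry run: print the command. Remove "echo" to actually launch the pipeline.
echo nextflow run guigolab/ipsa-nf "${args[@]}"
```

Keeping the options in an array makes it easy to comment individual flags in or out between runs.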

Input format

IPSA-nf takes as input a tab-separated file describing the input data. The file must be specified with the --index parameter. Each line of the index file contains the following fields:

  1. sample identifier
  2. path to the bam file to be processed
  3. library type:
    • Single-End
    • Paired-End
  4. strandedness of the data:
    • NONE for unstranded
    • SENSE/ANTISENSE for Single-End stranded
    • MATE1_SENSE/MATE2_SENSE for Paired-End stranded

Here is an index file example:

E14_rep1	E14AlnRep1.sub.bam	Paired-End	MATE2_SENSE
E14_rep2	E14AlnRep2.sub.bam	Paired-End	MATE2_SENSE
E18_rep1	E18AlnRep1.sub.bam	Paired-End	MATE2_SENSE
E18_rep2	E18AlnRep2.sub.bam	Paired-End	MATE2_SENSE
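
Because a malformed index only surfaces once the pipeline is running, it can help to check the file beforehand. The following is a minimal sketch, not part of IPSA-nf: check_index is a hypothetical helper. It assumes that NONE is valid for both library types and that the stranded values are tied to the library type exactly as listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with IPSA-nf): verify that every line of
# an index file has four tab-separated fields with the documented values.
# Prints one message per problem and returns non-zero if any line is invalid.
check_index() {
    awk -F'\t' '
        NF != 4 {
            printf "line %d: expected 4 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        $3 != "Single-End" && $3 != "Paired-End" {
            printf "line %d: unknown library type \"%s\"\n", NR, $3
            bad = 1
            next
        }
        $3 == "Single-End" && $4 !~ /^(NONE|SENSE|ANTISENSE)$/ {
            printf "line %d: strandedness \"%s\" does not match Single-End\n", NR, $4
            bad = 1
        }
        $3 == "Paired-End" && $4 !~ /^(NONE|MATE1_SENSE|MATE2_SENSE)$/ {
            printf "line %d: strandedness \"%s\" does not match Paired-End\n", NR, $4
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

After sourcing the function, check_index index.tsv prints nothing and returns 0 for a well-formed file. The check does not verify that the BAM paths exist.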

Pipeline results

Analysis results are saved to the folder specified with the --dir parameter. By default, this is the data directory within the current working folder.

Output files are organized into folders corresponding to the different pipeline endpoints:

  • A01 - splice junctions and splice sites counts
  • A02 - aggregated splice junctions and splice sites counts
  • A03 - aggregated junctions with annotation status and splice site nucleotides
  • A04 - aggregated junctions with selected strand and constrained splice sites
  • A06 - aggregated counts filtered by annotation status and entropy
  • A07 - splicing indices
  • E06 - BED files with splicing indices from A06

And if --microexons is used:

  • D01 - aggregated 2-splits
  • D02 - aggregated constrained 2-splits
  • D06 - extracted microexons from constrained 2-splits

Requirements

IPSA-nf is configured to run using the Docker container engine by default. See the included Dockerfile for the configuration details.

To run the pipeline with Docker, the following dependencies have to be met:

The pipeline can also be used without Docker by installing the following software components on your system (natively or by using Environment Modules):

ipsa-nf's Issues

Use also .fasta genome reference files

@emi80, currently only .fa files are recognized. However, many users use the .fasta extension for their genome files. It would be useful to add .fasta to the default extensions for the genome reference.

Documentation missing

In the pipeline results section there is no mention of the merged output, i.e. all.A.psi.tsv, all.A.cosi.tsv, etc.

It would also be nice to have a short description of how the pipeline works. Maybe you can just add a link to this pdf from ipsa-full. It helped me understand the output of the pipeline a bit better.

aggregate.awk command not recognized

Hi dear,

my working directory where I run IPSA pipeline is located directly on my home directory.
While trying to run the IPSA pipeline I got the following error:
command run : ./nextflow run guigolab/ipsa-nf

Success : false
workDir : /home/jelhasnaoui/IPSA/work
exit status : 127
Error report: Error executing process > 'aggregateSSJ (E14_rep2.A02.ssj)'
Caused by:
Process aggregateSSJ (E14_rep2.A02.ssj) terminated with an error exit status (127)
Command executed:
aggregate.awk -v degree=1 -v readLength= -v margin=5 -v prefix= -v logfile=E14_rep2.A02.ssj.log E14_rep2.A01.ssj.tsv > E14_rep2.A02.ssj.tsv
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 2: aggregate.awk: command not found
Work dir:
/home/jelhasnaoui/IPSA/work/93/81fb6b38edf80890ad1b622455d148
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
WARN: Killing pending tasks (7)
Error executing process > 'aggregateSSJ (E14_rep2.A02.ssj)'
Caused by:
Process aggregateSSJ (E14_rep2.A02.ssj) terminated with an error exit status (127)
Command executed:
aggregate.awk -v degree=1 -v readLength= -v margin=5 -v prefix= -v logfile=E14_rep2.A02.ssj.log E14_rep2.A01.ssj.tsv > E14_rep2.A02.ssj.tsv
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 2: aggregate.awk: command not found
Work dir:
/home/jelhasnaoui/IPSA/work/93/81fb6b38edf80890ad1b622455d148
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

The error says that the aggregate.awk command is not found, as if the program cannot recognize it.

Could you please help me to fix this issue?
Thank you in advance!

Best regards,
Jamal.

Perl utils cannot be found when running the pipeline outside of the user home directory

Dear,

Thank you for sharing and making publicly available the great tool IPSA.

While running the test pipeline with the following command:
./nextflow run guigolab/ipsa-nf

I got the following error:
exit status : 2
Error report: Error executing process > 'annotate (E14_rep1.A03.ssj)'

Caused by:
Process annotate (E14_rep1.A03.ssj) terminated with an error exit status (2)

Command executed:

annotate.pl -annot annotation.gfx -dbx genome.dbx -idx genome.idx -deltaSS 10 -in E14_rep1.A02.ssj.tsv > E14_rep1.A03.ssj.tsv

Command exit status:
2

Command output:
(empty)

Command error:
Can't locate Perl/utils.pm in @INC (you may need to install the Perl::utils module) (@INC contains: /home/jelhasnaoui/.nextflow/assets/guigolab/ipsa-nf /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.20.2 /usr/local/share/perl/5.20.2 /usr/lib/x86_64-linux-gnu/perl5/5.20 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.20 /usr/share/perl/5.20 /usr/local/lib/site_perl .) at /home/jelhasnaoui/.nextflow/assets/guigolab/ipsa-nf/bin/annotate.pl line 2.
BEGIN failed--compilation aborted at /home/jelhasnaoui/.nextflow/assets/guigolab/ipsa-nf/bin/annotate.pl line 2.

Work dir:
/vol3/Reference/OriginalData/DeBortoli/03_2016_RNASeq_Noncoding/pipelines/IPSA/work/98/a1b7a3951b7f892b36677c61b4337b

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option -resume
WARN: Killing pending tasks (3)
ERROR ~ Error executing process > 'annotate (E14_rep1.A03.ssj)'

Caused by:
Process annotate (E14_rep1.A03.ssj) terminated with an error exit status (2)

Command executed:

annotate.pl -annot annotation.gfx -dbx genome.dbx -idx genome.idx -deltaSS 10 -in E14_rep1.A02.ssj.tsv > E14_rep1.A03.ssj.tsv

Command exit status:
2

Command output:
(empty)

Command error:
Can't locate Perl/utils.pm in @INC (you may need to install the Perl::utils module) (@INC contains: /home/jelhasnaoui/.nextflow/assets/guigolab/ipsa-nf /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.20.2 /usr/local/share/perl/5.20.2 /usr/lib/x86_64-linux-gnu/perl5/5.20 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.20 /usr/share/perl/5.20 /usr/local/lib/site_perl .) at /home/jelhasnaoui/.nextflow/assets/guigolab/ipsa-nf/bin/annotate.pl line 2.
BEGIN failed--compilation aborted at /home/jelhasnaoui/.nextflow/assets/guigolab/ipsa-nf/bin/annotate.pl line 2.

Work dir:
/vol3/Reference/OriginalData/DeBortoli/03_2016_RNASeq_Noncoding/pipelines/IPSA/work/98/a1b7a3951b7f892b36677c61b4337b

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option -resume

-- Check '.nextflow.log' file for details

I checked the location of the utils.pm file and it's located at :
/home/jelhasnaoui/.nextflow/assets/guigolab/ipsa-nf/Perl/utils.pm

I think that the problem is related to the location of the utils.pm file at the annotate process step. Could you please help me to fix this bug?
Thank you so much in advance!
Best regards,
Jamal.

Precondition Failed Error on GCP

I'm not sure if this is an issue with ipsa-nf or nextflow in general, so please redirect me if there's a better place to get help...

I have been trying to run this pipeline on Google Cloud and it keeps failing with a 412 "Precondition Failed" error, which is thrown after the first successful completion of an sjcount process. See the attached log file (nextflow.log), line 785.

It seems there is some issue copying the .ssc and .ssj files from the work directory to the A01 folder in the results directory. The pipeline runs fine for me on the included test dataset, so if you know of any issue that may be causing this problem, I would be very appreciative!

Thanks for your help!
