pigx's Introduction

What is PiGx?

PiGx is a collection of genomics pipelines. More information can be found on the PiGx website.

It includes the following pipelines:

  • PiGx BSseq for raw fastq read data from bisulfite sequencing experiments

  • PiGx RNAseq for RNAseq samples

  • PiGx scRNAseq for single-cell Drop-seq analysis

  • PiGx ChIPseq for reads from ChIPseq experiments

  • PiGx CRISPR (work in progress) for the analysis of sequence mutations in CRISPR-Cas9 targeted amplicon sequencing data

All pipelines are easily configured with a sample sheet (in CSV format) and a descriptive settings file (in YAML format). For more detailed information, see the README.md file for each of the pipelines in the pipelines directory.

Publication

Wurmus R, Uyar B, Osberg B, Franke V, Gosdschan A, Wreczycka K, Ronen J, Akalin A. PiGx: Reproducible genomics analysis pipelines with GNU Guix. Gigascience. 2018 Oct 2. doi: 10.1093/gigascience/giy123. PubMed PMID: 30277498.

Getting started

To run PiGx on your experimental data, describe your samples in a CSV file (sample_sheet.csv), provide a settings.yaml file to override the defaults, and select the pipeline.

To generate a settings file template for any pipeline:

pigx [pipeline] --init=settings

To generate a sample sheet template for any pipeline:

pigx [pipeline] --init=sample-sheet

Here's a simple example to run the RNAseq pipeline:

pigx rnaseq my-sample-sheet.csv --settings my-settings.yaml

To see all available options, run pigx --help.

Install

Pre-built binaries for PiGx are available through GNU Guix, the functional package manager for reproducible, user-controlled software management. Install the complete pipeline bundle with the following command:

guix install pigx

If you want to install PiGx from source, please make sure that all required dependencies are installed and then follow the common GNU build system steps after unpacking the latest release tarball:

./configure --prefix=/some/where
make install

You can enable or disable each of the pipelines with the --enable-PIPELINE and --disable-PIPELINE arguments to the configure script. PIPELINE is one of bsseq, rnaseq, scrnaseq, chipseq, and crispr. For more options run ./configure --help.

License

All PiGx pipelines are free software: you can redistribute PiGx and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

See LICENSE for the full license text.

pigx's People

Contributors

al2na, alexg9010, blosberg, borauyar, chaoran-chen, dohmjan, egeulgen, frenkiboy, jonathanronen, katwre, mirifax, rcuadrat, rekado, smoe, vicfabienne

pigx's Issues

Speed up the DAG building process

For the ChIPseq pipeline I had the feeling that building the DAG took way too long for a dry run (around 250 jobs in total), so I searched around a bit and found that this is not only the case for me; it seems to be a general issue for larger numbers of jobs (https://bitbucket.org/snakemake/snakemake/issues/534/dryrun-takes-a-long-time-for-complex).
One thing that we (as developers) could improve is to use fewer wildcards and instead reuse the outputs of dependent rules (https://bitbucket.org/snakemake/snakemake/issues/745/building-dag-of-jobs-slower).
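
A minimal sketch of that suggestion, using hypothetical rule names, paths, and shell commands (this is not actual PiGx code): reference the upstream rule's output via rules.<name>.output instead of re-matching another wildcard pattern in the downstream rule.

    rule trim:
        input: "Raw/{sample}.fq.gz"
        output: "Trimmed/{sample}_R.fastq.gz"
        shell: "trim_tool {input} > {output}"

    rule align:
        # Reuse the output of the dependent rule instead of spelling out
        # another wildcard pattern that Snakemake has to match again.
        input: rules.trim.output
        output: "Mapped/{sample}.bam"
        shell: "align_tool {input} > {output}"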

This is something that definitely is not urgent, but we might want to change this at some point.

documentation for development

Detailed documentation for development that tracks the most recent changes - i.e., when a .in file appears, we should know where it is used, why, and when it appeared.

Ignore R_LIBS

Konsta reported this on the mailing list.

We are nulling R_LIBS_USER to keep R from loading user packages, but when users have set R_LIBS, R will still happily load all the wrong packages. All pipelines should also null out the R_LIBS environment variable before proceeding.

That's a simple change in all of the snakemake wrapper scripts.
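
A minimal sketch of that change, assuming the wrapper launches R through subprocess (the function below is illustrative, not the actual PiGx wrapper code):

    import os
    import subprocess

    def run_rscript(script, *args):
        env = dict(os.environ)
        env["R_LIBS_USER"] = ""   # already nulled today
        env["R_LIBS"] = ""        # proposed addition, so user libraries are never picked up
        subprocess.check_call(["Rscript", script, *args], env=env)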

Add website

Add a simple website for advertising the features of the pigx pipelines.

refactoring .in files

Develop a system without the .in files - i.e. have a local config/settings file which defines which files need to be changed during the configuration steps (this would also enable much easier documentation of such files). The reason for this is that the .in suffix makes development much harder, especially when it appears on scripts.

Add support for additional arguments to qsub

Hi Guys,

As a feature for advanced users, I would like to add the possibility of passing arguments to qsub other than the default ones.

For this we need to adjust our driver scripts at these lines (example for rnaseq):

    qsub = "qsub -V -l h_stack={cluster.h_stack}  -l h_vmem={cluster.MEM} %s -b y -pe smp {cluster.nthreads} -cwd" % contact_email_string
    if config['execution']['cluster']['args']:
        qsub += " " + config['execution']['cluster']['args']
    command += [
        "--cluster-config={}".format(cluster_config_file),
        "--cluster={}".format(qsub),
        "--latency-wait={}".format(config['execution']['cluster']['missing-file-timeout'])
    ]

The additional arguments are passed in the settings file:

execution:
  submit-to-cluster: yes
  jobs: 6
  nice: 19
  cluster:
    missing-file-timeout: 120
    memory: 8G
    stack: 128M
    queue: all
    contact-email: alexander.gosdschan@mdc-berlin.de
    args: '-l h_rt=0:0:10'
  rules:
    __default__:
      threads: 1

on verbosity

In some of the pipelines we already have some verbosity options included; see BIMSBbioinfo/pigx_bsseq#107.
There we added the snakemake options --printshellcmds and --verbose directly as arguments.
As Rekado pointed out, there are different levels of verbosity; a nice example is shown here of how it could be done and what the levels mean.

Maybe at some point we could decide on the different levels,
but I would like to already propose a structure, with lower numbers meaning lower verbosity:

  1. quiet: do not print anything to the screen (maybe redirect all output into a log file)
  2. using the fmt() function from bsseq to print a helpful message
  3. normal snakemake output, if no message is set:
    e.g.
    input: /home/agosdsc/pigx/pigx_chipseq/Tests/in/ChIP.fq.gz
    output: /home/agosdsc/pigx/pigx_chipseq/Tests/out_cluster/Trimmed/Trim_Galore/ChIP1/ChIP1_R.fastq.gz
    log: /home/agosdsc/pigx/pigx_chipseq/Tests/out_cluster/Log/trim_galore_ChIP1.log
    jobid: 3
    wildcards: sample=ChIP1
  4. --printshellcmds
  5. --verbose
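
A sketch of how a driver script could map such a level onto Snakemake arguments (the function itself is hypothetical; --quiet, --printshellcmds, and --verbose are existing Snakemake flags):

    def snakemake_args_for_verbosity(level):
        # Levels 2 and 3 would be the default output, with or without fmt()
        # messages, and need no extra flags here.
        args = []
        if level <= 1:
            args.append("--quiet")           # level 1: keep the screen quiet
        if level >= 4:
            args.append("--printshellcmds")  # level 4
        if level >= 5:
            args.append("--verbose")         # level 5
        return args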

docker image not working

Hi, it seems that the pigx image on Docker Hub doesn't work anymore. Could you please help me check it? Thanks.

$ sudo docker pull bimsbbioinfo/pigx
Pulling repository bimsbbioinfo/pigx
FATA[0001] Could not reach any registry endpoint

broken links

The links following »It includes the following pipelines« refer to GitHub, but the targets are missing.

Move the config validation from Pipeline Script to Driver Script

Right now all of us have the validate_input() call at the top of the Pipeline Scripts, such that Snakemake is already invoked even though we do not know whether we have proper input.

It might be better to put this after the generate_config() step in the driver script, so that we fail before any further step is taken.
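
A sketch of the proposed ordering; generate_config() and validate_input() come from this issue, while the surrounding main() and the other helpers are assumptions:

    def main():
        args = parse_arguments()         # hypothetical: read CLI options
        config = generate_config(args)   # merge the settings with the defaults first
        validate_input(config)           # fail here, before Snakemake is invoked
        run_snakemake(config)            # only reached with validated input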

Add descriptions to the document on Authorea

Altuna shared the manuscript on Authorea.

We are supposed to include descriptions for our respective sections (bs-seq, rnaseq, ChIPseq, scRNAseq). I'm currently going through iterations on the bs-seq section, and when Altuna and I are both happy with it, I'll send out a notification to people on the other pipelines to use that section as a model for their own sections. Will update soon. In the meantime, each pipeline should have a nominee to write this section (added as an "assignee" to this issue).

RNASeq pipeline without running STAR?

Hi there!

I have been trying to set up the RNAseq pipeline to run without using STAR for indexing, and so far I have not been able to figure it out.

The main reason is that for my purposes I just need to use Salmon.
Additionally, when using STAR the pipeline always runs into RAM issues, and I have tried all the possible parameters to limit the amount of memory used. I have 32 GB of RAM, which is not sufficient (mouse as the model organism). Also, I cannot use any other machine or cluster with higher RAM capacity.

Given this, I am wondering the following:

  1. Is it possible to disable STAR? If so, how can it be done?
  2. Will disabling STAR affect the outputs generated by the pipeline?

Thank you.
Paulo

Control verbosity

By default the pipelines really shouldn't print all that much. We could filter the output and print only a progress report; using terminal control codes, we could do this on the same line. Filtering would have to stop upon encountering an error, and for the onsuccess printing of generated files.
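
A sketch of that idea, assuming we wrap Snakemake's output stream (the progress pattern and everything else here are illustrative):

    import re
    import sys

    def filter_output(lines):
        passthrough = False
        for line in lines:
            if passthrough or "Error" in line:
                passthrough = True   # stop filtering once an error shows up
                sys.stdout.write(line)
                continue
            m = re.search(r"(\d+) of (\d+) steps", line)
            if m:
                # \r rewrites the same terminal line with the progress report.
                sys.stdout.write("\rprogress: {} of {} steps".format(*m.groups()))
                sys.stdout.flush()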

Ignore .Renviron

The pipeline reads the .Renviron files in the current directory.

This can mess up R dependencies.
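
A minimal sketch of one way to handle this in the wrapper, assuming R is started via Rscript (the wrapper shape is an assumption): pass --no-environ so R does not read the user environment file.

    import subprocess

    def run_rscript_clean(script, *args):
        # --no-environ keeps R from reading .Renviron, so the pipeline's
        # R dependencies cannot be overridden from the working directory.
        subprocess.check_call(["Rscript", "--no-environ", script, *args])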
