pigx's Introduction

What is PiGx?

PiGx is a collection of genomics pipelines. More information can be found on the PiGx website.

It includes the following pipelines:

  • PiGx BSseq for raw fastq read data from bisulfite sequencing experiments

  • PiGx RNAseq for RNAseq samples

  • PiGx scRNAseq for single-cell Drop-seq analysis

  • PiGx ChIPseq for reads from ChIPseq experiments

  • PiGx CRISPR (work in progress) for the analysis of sequence mutations in CRISPR-Cas9 targeted amplicon sequencing data

All pipelines are easily configured with a sample sheet (in CSV format) and a descriptive settings file (in YAML format). For more detailed information, see the README.md file for each of the pipelines in the pipelines directory.

Publication

Wurmus R, Uyar B, Osberg B, Franke V, Gosdschan A, Wreczycka K, Ronen J, Akalin A. PiGx: Reproducible genomics analysis pipelines with GNU Guix. Gigascience. 2018 Oct 2. doi: 10.1093/gigascience/giy123. PubMed PMID: 30277498.

Getting started

To run PiGx on your experimental data, describe your samples in a CSV file (sample_sheet.csv), provide a settings.yaml file to override the defaults, and select the pipeline.

To generate a settings file template for any pipeline:

pigx [pipeline] --init=settings

To generate a sample sheet template for any pipeline:

pigx [pipeline] --init=sample-sheet

Here's a simple example to run the RNAseq pipeline:

pigx rnaseq my-sample-sheet.csv --settings my-settings.yaml

To see all available options, run pigx --help.

Install

Pre-built binaries for PiGx are available through GNU Guix, the functional package manager for reproducible, user-controlled software management. Install the complete pipeline bundle with the following command:

guix install pigx

If you want to install PiGx from source, please make sure that all required dependencies are installed and then follow the common GNU build system steps after unpacking the latest release tarball:

./configure --prefix=/some/where
make install

You can enable or disable each of the pipelines with the --enable-PIPELINE and --disable-PIPELINE arguments to the configure script. PIPELINE is one of bsseq, rnaseq, scrnaseq, chipseq, and crispr. For more options run ./configure --help.

License

All PiGx pipelines are free software: you can redistribute PiGx and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

See LICENSE for the full license text.

pigx's People

Contributors

al2na, alexg9010, blosberg, borauyar, chaoran-chen, dohmjan, egeulgen, frenkiboy, jonathanronen, katwre, mirifax, rcuadrat, rekado, smoe, vicfabienne

pigx's Issues

Speed up the DAG building process

For the ChIPseq pipeline I had the feeling that building the DAG took way too long for a dry run (around 250 jobs in total), so I searched around a bit and found that this is not only the case for me; it seems to be a general issue for larger numbers of jobs (https://bitbucket.org/snakemake/snakemake/issues/534/dryrun-takes-a-long-time-for-complex).
One thing that we (as developers) could improve is to use fewer wildcards and instead reuse the outputs of dependent rules (https://bitbucket.org/snakemake/snakemake/issues/745/building-dag-of-jobs-slower).
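
A minimal sketch of that suggestion, using hypothetical rule names, paths, and shell commands (this is not actual PiGx code): reference the upstream rule's output via rules.<name>.output instead of re-matching another wildcard pattern in the downstream rule.

    rule trim:
        input: "Raw/{sample}.fq.gz"
        output: "Trimmed/{sample}_R.fastq.gz"
        shell: "trim_tool {input} > {output}"

    rule align:
        # Reuse the output of the dependent rule instead of spelling out
        # another wildcard pattern that Snakemake has to match again.
        input: rules.trim.output
        output: "Mapped/{sample}.bam"
        shell: "align_tool {input} > {output}"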

This is something that definitely is not urgent, but we might want to change this at some point.

documentation for development

Detailed documentation for development that tracks the most recent changes - i.e., when a .in file appears, we should know where it is used, why, and when it appeared.

Ignore R_LIBS

Konsta reported this on the mailing list.

We are nulling R_LIBS_USER to keep R from loading user packages, but when users have set R_LIBS, R will still happily load all the wrong packages. All pipelines should also null out the R_LIBS environment variable before proceeding.

That's a simple change in all of the snakemake wrapper scripts.
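
A minimal sketch of that change, assuming the wrapper launches R through subprocess (the function below is illustrative, not the actual PiGx wrapper code):

    import os
    import subprocess

    def run_rscript(script, *args):
        env = dict(os.environ)
        env["R_LIBS_USER"] = ""   # already nulled today
        env["R_LIBS"] = ""        # proposed addition, so user libraries are never picked up
        subprocess.check_call(["Rscript", script, *args], env=env)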

Add website

Add a simple website for advertising the features of the pigx pipelines.

refactoring .in files

Develop a system without the .in files - i.e. have a local config/settings file which defines which files need to be changed during the configuration steps (this would also enable much easier documentation of such files). The reason for this is that the .in suffix makes development much harder, especially when it appears on scripts.

Add support for additional arguments to qsub

Hi Guys,

As a feature for advanced users, I would like to add the possibility of passing arguments to qsub other than the default ones.

For this we need to adjust our driver scripts at these lines (example for rnaseq):

    qsub = "qsub -V -l h_stack={cluster.h_stack}  -l h_vmem={cluster.MEM} %s -b y -pe smp {cluster.nthreads} -cwd" % contact_email_string
    if config['execution']['cluster']['args']:
        qsub += " " + config['execution']['cluster']['args']
    command += [
        "--cluster-config={}".format(cluster_config_file),
        "--cluster={}".format(qsub),
        "--latency-wait={}".format(config['execution']['cluster']['missing-file-timeout'])
    ]

The additional arguments are passed in the settings file:

execution:
  submit-to-cluster: yes
  jobs: 6
  nice: 19
  cluster:
    missing-file-timeout: 120
    memory: 8G
    stack: 128M
    queue: all
    contact-email: alexander.gosdschan@mdc-berlin.de
    args: '-l h_rt=0:0:10'
  rules:
    __default__:
      threads: 1

on verbosity

In some of the pipelines we already have some verbosity options included; see BIMSBbioinfo/pigx_bsseq#107.
There we added the snakemake options --printshellcmds and --verbose directly as arguments.
As Rekado pointed out, there are different levels of verbosity; a nice example is shown here of how it could be done and what the levels mean.

Maybe at some point we could decide on the different levels,
but I would like to already propose a structure, with lower numbers meaning lower verbosity:

  1. quiet: do not print anything to the screen (maybe redirect all output into a log file)
  2. using the fmt() function from bsseq to print a helpful message
  3. normal snakemake output, if no message is set:
    e.g.
    input: /home/agosdsc/pigx/pigx_chipseq/Tests/in/ChIP.fq.gz
    output: /home/agosdsc/pigx/pigx_chipseq/Tests/out_cluster/Trimmed/Trim_Galore/ChIP1/ChIP1_R.fastq.gz
    log: /home/agosdsc/pigx/pigx_chipseq/Tests/out_cluster/Log/trim_galore_ChIP1.log
    jobid: 3
    wildcards: sample=ChIP1
  4. --printshellcmds
  5. --verbose
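
A sketch of how a driver script could map such a level onto Snakemake arguments (the function itself is hypothetical; --quiet, --printshellcmds, and --verbose are existing Snakemake flags):

    def snakemake_args_for_verbosity(level):
        # Levels 2 and 3 would be the default output, with or without fmt()
        # messages, and need no extra flags here.
        args = []
        if level <= 1:
            args.append("--quiet")           # level 1: keep the screen quiet
        if level >= 4:
            args.append("--printshellcmds")  # level 4
        if level >= 5:
            args.append("--verbose")         # level 5
        return args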

docker image not working

Hi, it seems that the pigx image on Docker Hub doesn't work anymore. Could you please help me check it? Thanks.

$ sudo docker pull bimsbbioinfo/pigx
Pulling repository bimsbbioinfo/pigx
FATA[0001] Could not reach any registry endpoint

broken links

The links following »It includes the following pipelines« refer to GitHub, but the targets are missing.

Move the config validation from Pipeline Script to Driver Script

Right now all of us have the validate_input() call at the top of the Pipeline Scripts, such that Snakemake is already invoked even though we do not know whether we have proper input.

It might be better to put this after the generate_config() step in the driver script, so that we fail before any further step is taken.
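
A sketch of the proposed ordering; generate_config() and validate_input() come from this issue, while the surrounding main() and the other helpers are assumptions:

    def main():
        args = parse_arguments()         # hypothetical: read CLI options
        config = generate_config(args)   # merge the settings with the defaults first
        validate_input(config)           # fail here, before Snakemake is invoked
        run_snakemake(config)            # only reached with validated input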

Add descriptions to the document on Authorea

Altuna shared the manuscript on Authorea.

We are supposed to include descriptions for our respective sections (bs-seq, rnaseq, ChIPseq, scRNAseq). I'm currently going through iterations on the bs-seq section, and when Altuna and I are both happy with it, I'll send out a notification to people on the other pipelines to use that section as a model for their own sections. Will update soon. In the meantime, each pipeline should have a nominee to write this section (added as an "assignee" to this issue).

RNASeq pipeline without running STAR?

Hi there!

I have been trying to set up the RNAseq pipeline to run without using STAR for indexing, and so far I have not been able to figure it out.

The main reason is that for my purposes I just need to use Salmon.
Additionally, when using STAR the pipeline always runs into RAM issues, and I have tried all the possible parameters to limit the amount of memory used. I have 32 GB of RAM, which is not sufficient (mouse as the model organism). Also, I cannot use any other machine or cluster with higher RAM capacity.

Given this, I am wondering the following:

  1. Is it possible to disable STAR? If so, how can it be done?
  2. Will disabling STAR affect the outputs generated by the pipeline?

Thank you.
Paulo

Control verbosity

By default the pipelines really shouldn't print all that much. We could filter the output and print only a progress report; using terminal control codes, we could do this on the same line. Filtering would have to stop upon encountering an error, and for the onsuccess printing of generated files.
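
A sketch of that idea, assuming we wrap Snakemake's output stream (the progress pattern and everything else here are illustrative):

    import re
    import sys

    def filter_output(lines):
        passthrough = False
        for line in lines:
            if passthrough or "Error" in line:
                passthrough = True   # stop filtering once an error shows up
                sys.stdout.write(line)
                continue
            m = re.search(r"(\d+) of (\d+) steps", line)
            if m:
                # \r rewrites the same terminal line with the progress report.
                sys.stdout.write("\rprogress: {} of {} steps".format(*m.groups()))
                sys.stdout.flush()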

Ignore .Renviron

The pipeline reads the .Renviron files in the current directory.

This can mess up R dependencies.
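
A minimal sketch of one way to handle this in the wrapper, assuming R is started via Rscript (the wrapper shape is an assumption): pass --no-environ so R does not read the user environment file.

    import subprocess

    def run_rscript_clean(script, *args):
        # --no-environ keeps R from reading .Renviron, so the pipeline's
        # R dependencies cannot be overridden from the working directory.
        subprocess.check_call(["Rscript", "--no-environ", script, *args])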
