deeptools's Introduction

deepTools

User-friendly tools for exploring deep-sequencing data

deepTools addresses the challenge of handling the large amounts of data that are now routinely generated by DNA sequencing centers. deepTools contains useful modules to process mapped-read data for multiple quality checks and to create normalized coverage files in the standard bedGraph and bigWig formats, which allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and to functionally annotate the genome.

For support or questions, please post to Biostars. For bug reports and feature requests, please open an issue on GitHub.

Citation:

Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research. 2016 Apr 13:gkw257.

Documentation:

Our documentation contains more details on the individual tool scopes and usages and an introduction to our deepTools Galaxy web server including step-by-step protocols.

Please see also the FAQ, which we update regularly. Our Gallery may give you some more ideas about the scope of deepTools.

For more specific troubleshooting, feedback, and tool suggestions, please post to Biostars.


Installation

deepTools are available for:

  • Command line usage (via pip / conda / github)
  • Integration into Galaxy servers (via toolshed/API/web-browser)

There are many easy ways to install deepTools. More details can be found here.

In Brief:

Install through pypi

$ pip install deeptools

Install via conda

$ conda install -c bioconda deeptools

Install by cloning the repository

$ git clone https://github.com/deeptools/deepTools
$ cd deepTools
$ pip install .

Galaxy Installation

deepTools can be easily integrated into Galaxy. Please see the installation instructions in our documentation for further details.

Note: From version 2.3 onwards, deepTools supports Python 3.


This tool suite is developed by the Bioinformatics Facility at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg.

Documentation | deepTools Galaxy | FAQ

deeptools's People

Contributors

adrn-s, asrichter, bgruening, daler, dependabot[bot], dpryan79, drakeeee, fidelram, friedue, icebert, joseespinosa, kilpert, leilyr, lldelisle, martenson, mblue9, mvdbeek, nehhen, opplatek, pavanvidem, sarah-peter, simon-coetzee, sklasfeld, smoe, steffenheyne, t-heide, thomasmanke, vivekbhr, vreuter, warddeb

deeptools's Issues

bamCorrelate problem with option --region

If there are not enough reads in the given region, bamCorrelate reports an error:
File "bamCorrelate", line 447, in main
    num_reads_per_bin[:, col])[0]
ValueError: setting an array element with a sequence.
It would be good to print a warning message saying that there is not enough data in the selected region instead of this error.

setup.py is broken

With the current setup.py script it is not possible to use:

  • python setup.py --home /foo/bar or
  • python setup.py --install-lib

Galaxy script for renaming chromosomes

I have not figured out a quick and painless way to remove or add "chr" for chromosome names with the tools that are present currently. Perhaps we should add a small python script.
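
A minimal sketch of what such a script could look like, assuming tab-separated BED-like input where the chromosome name is the first column. The function names `toggle_chr_prefix` and `rename_bed_lines` are hypothetical and not part of deepTools. Special cases such as `MT` versus `chrM` are deliberately not handled here.

```python
def toggle_chr_prefix(name, add):
    """Return `name` with a leading 'chr' added (add=True) or removed (add=False)."""
    if add:
        return name if name.startswith("chr") else "chr" + name
    return name[3:] if name.startswith("chr") else name


def rename_bed_lines(lines, add):
    """Yield BED lines with the first column renamed.

    Empty lines, comments and track/browser header lines pass through unchanged.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        # Only the first tab-separated field (the chromosome) is touched.
        chrom, sep, rest = line.partition("\t")
        yield toggle_chr_prefix(chrom, add) + sep + rest
```

For a file on disk, the generator can be fed an open file handle and its output written line by line to a new file.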

bamCorrelate issue: error: argument --numberOfProcessors/-p: lot and a 2 other issues

Dear developers,

  • I came across an issue in bamCorrelate. For some reason, I was unable to get it to run.
    It always reports the error:

"bamCorrelate bins: error: argument --numberOfProcessors/-p: lot is not a valid number of processors"

Specifying the number of processors did not make any difference.

  • Also, I noticed differences in the arguments that bamCorrelate takes between the example run:
    "An example usage is: bamCorrelate bins -b treatment.bam input.bam -plot
    correlation.png -f 200 -method pearson" and the list of arguments from the help menu.
  • Also, python-2.7 $KH/bin/deepTools/bin/bamCorrelate -h is not detailed enough to know the parameters for bamCorrelate

Sorry for all that,
Hope that it will help though and many thanks for the tool,

P.K

Galaxy: update of the explanations within Galaxy?

Currently, the wiki pages are a bit more elaborate than the explanations that users see when they select a tool within Galaxy.
My questions:

  • Should we include the long descriptions? (I lean towards no)
  • Should we include the links to the wiki pages (I lean towards yes)
  • Where could I modify this information? If we eventually migrate the wiki once more to the deepTools website, all these links will need to be updated. I could do it if I knew where to look.

Galaxy: correctGCbias output for bigWig and bedGraph not working

with correctGCBias 1.5.6-45-g7190dd0 I get the following message when trying to output a bedgraph file:

format: bedgraph, database: dm3
applying correction
genome partition size for multiprocessing: 362078
using region 4
sh: open: No such file or directory

It's a warning, but the file is empty.

with output = bigWig I get a similar message:

end (1321) before start (1321484) line 23342 of /tmp/tmplJWLcp

Make deepTools tolerant towards chromosome naming (chr1 vs. 1)

This is related to the issue I first raised for the Galaxy implementation, but Fidel and I agreed that it would be meaningful to make deepTools capable of processing BAM and BED files even if their chromosome naming conventions do not match 100%. After all, IGV and other tools can do it, too.

Discrepancies should still be reported so that the user is aware that the chromosomes are labelled differently, but they should not break the code.
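
One way this tolerance could work is sketched below, assuming the only difference between the two naming conventions is a leading "chr". The function name `match_chromosome_names` is hypothetical and does not describe how deepTools actually resolves names.

```python
import warnings


def match_chromosome_names(bam_names, bed_names):
    """Map each BED chromosome name to the BAM name that differs at most by a 'chr' prefix.

    Mismatches are reported as warnings instead of raising an error.
    """
    def strip(name):
        return name[3:] if name.startswith("chr") else name

    by_stripped = {strip(name): name for name in bam_names}
    mapping = {}
    for name in bed_names:
        target = by_stripped.get(strip(name))
        if target is None:
            warnings.warn("chromosome '{0}' not found in the BAM file".format(name))
            continue
        if target != name:
            warnings.warn("chromosome '{0}' matched to '{1}'".format(name, target))
        mapping[name] = target
    return mapping
```

Names without a counterpart are left out of the mapping, so the caller can decide whether to skip those regions.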

bamCoverage mappingQualityFilter not working as expected

from Fidel:
The problem with the normalization (either RPKM or normalizeTo1X) is that it is based on the total number of mapped reads and not on the total number of reads of quality > minMappingQuality.

That means that people still need to filter the bamFile before running bamCoverage (which is contrary to our intention)

bamCompare should not be heavily affected by this.

script for turning help texts into markdown for wiki

One wiki page should contain all the options for all the programs.

Every time the help within the Python scripts is updated, one should just run this script, which should generate a wiki-compatible markdown page.
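
A rough sketch of such a script, assuming each tool builds its command line with Python's standard `argparse` module and can hand over its parser object. `help_to_markdown` and the `demo` parser are hypothetical stand-ins, not deepTools code.

```python
import argparse


def help_to_markdown(parsers):
    """Render the help text of each (name, parser) pair as one markdown page."""
    sections = []
    for name, parser in parsers:
        # format_help() returns the same text that -h prints.
        help_text = parser.format_help().rstrip()
        # Indent by four spaces so markdown renders the help verbatim.
        indented = "\n".join("    " + line for line in help_text.splitlines())
        sections.append("## {0}\n\n{1}\n".format(name, indented))
    return "\n".join(sections)


# Hypothetical stand-in for a real tool's parser.
demo = argparse.ArgumentParser(prog="bamCorrelate", description="demo parser")
demo.add_argument("--numberOfProcessors", "-p", type=int, default=1,
                  help="number of processors to use")
page = help_to_markdown([("bamCorrelate", demo)])
```

Writing `page` to a file after every help-text change would keep the wiki page in step with the scripts.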

commandLine: BED-output of clustered heatmap cannot be directly re-used with computeMatrix

The issue is caused by the lack of # lines in the output of heatmapper nowadays. To make heatmapper etc. work with Galaxy, Bjoern added the feature that the cluster ID is indicated in the 7th column of a bed-file, so that users can easily separate the file into separate data sets and supply them individually to computeMatrix. On the command line, computeMatrix expects just one BED file where the groups are separated by #. That means that currently the user needs to know that he has to turn the file with the format:

chr1 10 12 Cluster1
chr1 20  22 Cluster2

into a file like this:

chr1 10 12
# cluster1
chr1 20  22
# cluster2 

Perhaps we can come up with a more elegant solution in the future.
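
Until then, the conversion could be scripted roughly as follows. This is a sketch under two assumptions: the cluster label is in the last column (the `cluster_col` parameter covers other positions), and each `#` line closes the group it names, as in the example above. The function name `bed_clusters_to_groups` is hypothetical.

```python
def bed_clusters_to_groups(lines, cluster_col=-1):
    """Convert BED lines carrying a cluster label column into '#'-separated groups."""
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        fields = line.split()
        if not fields or line.startswith("#"):
            continue
        label = fields[cluster_col]
        del fields[cluster_col]  # drop the label column from the region itself
        groups.setdefault(label, []).append("\t".join(fields))
    out = []
    for label, regions in groups.items():
        out.extend(regions)
        out.append("# " + label)  # the label line follows its regions
    return out
```

Regions of the same cluster are collected together even if they are not adjacent in the input, and clusters appear in order of first occurrence.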

--quiet option for computeMatrix does not keep quiet

Hi guys,

I appreciate that computeMatrix is telling me all the problems that it encounters (I know, it's a tough job, poor little bugger), but I would appreciate it even more if it would stop littering my stderr when I set the --quiet option. In short: I don't think the -q option works completely properly :)

documentation fixes

I saw several small mistakes and inconsistencies with the command-line interface of heatmapper in the supplement of the paper. I tried to fix most of them. If anyone could have a brief look, that would be nice. I hope we can update the paper during proofreading.

Simplify the installation

The installation is not as easy as it could be. In particular, more verbose warnings and errors would help the user.

from distutils import spawn
spawn.find_executable('samtools')

That can be used to determine binaries at runtime and fail with a meaningful and helpful message.
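
On Python 3, the same check can be written with `shutil.which` from the standard library, since `distutils` is deprecated and was removed in Python 3.12. The sketch below assumes Python 3.3 or later. The helper name `require_executable` and the wording of the message are illustrative only.

```python
import shutil


def require_executable(name):
    """Return the full path to `name`, or raise an error with an actionable message."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            "Required program '{0}' was not found on PATH. "
            "Please install it or add its directory to PATH.".format(name)
        )
    return path
```

Calling `require_executable("samtools")` at startup would then fail early with a readable message rather than with an obscure error later on.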

bamCompare: using --verbose with --scaleFactors throws an error

Hi,
I came across the following bug in bamCompare. Specifying the "--scaleFactors" in the command line in the --verbose mode throws the error below:
Hope this will help,
P.

Traceback (most recent call last):
  File "/g/furlong1/khoueiry/bin/deepTools/bin/bamCompare", line 315, in <module>
    main(args)
  File "/g/furlong1/khoueiry/bin/deepTools/bin/bamCompare", line 279, in main
    "RPKM is {0}".format(scaleFactor)
UnboundLocalError: local variable 'scaleFactor' referenced before assignment

add names of regions from BED file to bamCorrelate --outRawCounts

At the moment, --outRawCounts returns a matrix without row names. If a BED file is given to bamCorrelate, it makes sense to output the names of the regions in the BED file as the row names. This matrix could then be plugged directly into DESeq and other downstream applications that require an unnormalized read count matrix.
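
A small sketch of the idea, assuming the count matrix has one row per BED region in the same order as the BED file. The function names are hypothetical, and falling back to `chrom:start-end` when the BED file has no name column is an assumption made for this example.

```python
def region_names(bed_lines):
    """Return one name per BED region: column 4 if present, else 'chrom:start-end'."""
    names = []
    for line in bed_lines:
        fields = line.split()
        if not fields or line.startswith(("#", "track", "browser")):
            continue
        if len(fields) >= 4:
            names.append(fields[3])
        else:
            names.append("{0}:{1}-{2}".format(*fields[:3]))
    return names


def label_counts(bed_lines, counts):
    """Prefix each row of `counts` with the name of the matching BED region."""
    names = region_names(bed_lines)
    if len(names) != len(counts):
        raise ValueError("number of regions and number of count rows differ")
    return ["\t".join([name] + [str(c) for c in row])
            for name, row in zip(names, counts)]
```

The resulting tab-separated rows can be written out under a header line of sample names.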

Galaxy: labeling of effective genome size selections is inconsistent

another thing pointed out by our proof-readers: the way users are asked to supply genome or effective genome sizes is currently not consistent between the different Galaxy tools

bamCompare

when "normalize to 1x sequencing depth" is selected, the following header and an empty field appear:

Report normalized coverage to 1x sequenceing depth:

--> this entry should probably be named "Effective genome size" instead

bamCompare, bamCoverage

There's an explanation for the genome size entry:

Enter the genome size to normalize the reads counts. Sequencing depth is defined as the total number of mapped reads * fragment length / effective genome size. To use this option, the effective genome size has to be given. Common values are: mm9: 2150570000, hg19:2451960000, dm3:121400000 and ce10:93260000.

This should read instead:

Enter the effective genome size to normalize the reads counts (the part of the genome that can be mapped, i.e. without undefined bases and highly repetitive regions). Sequencing depth is defined as the total number of mapped reads * fragment length / effective genome size. Common values are: mm9: 2150570000, hg19:2451960000, dm3:121400000 and ce10:93260000.
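
The definition above can be checked with a short calculation. This is only an illustration of the stated formula. The function names are hypothetical, and taking the 1x scaling factor as the reciprocal of the sequencing depth is an assumption of this sketch.

```python
def sequencing_depth(mapped_reads, fragment_length, effective_genome_size):
    """Sequencing depth = mapped reads * fragment length / effective genome size."""
    if effective_genome_size <= 0:
        raise ValueError("effective genome size must be positive")
    return mapped_reads * fragment_length / effective_genome_size


def scale_factor_to_1x(mapped_reads, fragment_length, effective_genome_size):
    """Factor that rescales coverage to 1x depth (assumed here to be 1 / depth)."""
    return 1.0 / sequencing_depth(mapped_reads, fragment_length, effective_genome_size)


# Example: 10 million mapped reads, 200 bp fragments, dm3 effective genome size.
depth = sequencing_depth(10_000_000, 200, 121_400_000)
factor = scale_factor_to_1x(10_000_000, 200, 121_400_000)
```

With these example numbers the depth comes out at roughly 16.5x, so every coverage value would be multiplied by about 0.06 to reach 1x.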

in correctGCbias and computeGCbias
there is a very elaborate description of the effective genome size including a drop down box (while in the other tools, it's an empty field where the user has to enter the value himself)

I think we should make it consistent. Actually, I like the way it's done for the GCbias tools better than the empty field. The empty field, however, is consistent with how MACS has it...

move the manual pages to a proper wiki

I'm aiming at something like this:
https://github.com/snowplow/snowplow/wiki

Structure should be something along these lines:
  0. Installation (this should be kept in the README.md so that it's still the first thing people see)
  1. How we use deepTools
  2. Documentation of the tools (QC, normalization, visualization)
  3. Recipes (Galaxy and command line)
  4. FAQ (Galaxy and command line)

Suggestions welcome.
Cheers,
Friederike

Galaxy should be much more file-format-restrictive when showing the selection of available files

For example, for computeMatrix only true BED or INTERVAL files should be shown in the drop down menu for the regions while only true BIGWIG files should be shown in the drop down menu for the scores.

At the moment, almost all files in a history tend to be shown, very often with the comment ("as BED" or "as BIGWIG") which, in most cases, will not work and will lead to confusion.

Is this something that needs to be changed every time the tools are updated or (re-)installed in Galaxy? And where does it need to be changed?

The effective genome size numbers for bamCoverage and bamCorrelate are for uniquely mapped reads

The effective genome size (mappable portion of a genome) numbers for bamCoverage and bamCorrelate are for uniquely mapped reads. However, nowadays it is common to have larger effective genome sizes when using random mapping of multi-reads. Furthermore, the mappable portion of the genome changes with the read length used.

Maybe we should add all options to the tools or point to Table 2 of this paper: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0030377

Implement alternative coverage measure (distinct reads)

From: https://www.biostars.org/p/101413/

I'm quite interested in whether it is possible to integrate side tools into your framework. When working with exome data, I came across an alternative coverage measure that uses distinct reads, i.e. reads that cover a given position and have different offsets. It is also referred to as molecular coverage. The inspiration comes from this paper http://genomebiology.com/content/12/1/R6, Fig1. Such measure could be more prone to sequencing artifacts. If you're interested, a basic implementation using the Picard API is here https://github.com/mikessh/exome-misc/blob/master/src/molcount/MolCountExome.groovy and compiled as a jar here https://github.com/mikessh/exome-misc/releases/download/v1.0.0/exome-tools-v1.0.0.jar.

FAQ: Error with UCSCtool bedGraph to bigWig

Now that we have deepTools being tolerant towards chromosome naming, I totally forgot that other tools are not that forgiving. I just put the error here so I eventually make an FAQ entry out of it:

BedGraph to bigWig on data 76
An error occurred with this dataset: 2L is not found in chromosome sizes file

Galaxy: add abbreviation behind the name of each tool

And another thing pointed out by innocent users: they would like the abbreviations that we use in the overview table to be shown after each tool name, i.e.:

  • bamCorrelate (QC) correlates pairs..
  • bamFingerprint (QC) plots profiles...
  • computeGCbias (QC) ...
  • correctGCbias (N) ...
  • bamCoverage (N) ...
  • bamCompare (N) ...
  • computeMatrix (V) ...
  • heatmapper (V) ...
  • profiler (V)

make terms in wiki/help content with command line version AND galaxy consistent

perhaps a glossary would be meaningful? Fabian reports that he's struggling with the terms and them not being consistently used...(which is most likely very true)

Any opinion on whether one should use the term NGS or HTS? I tried to use HTS because Thomas preferred it, but I have the feeling that NGS is much more commonly used.

last, but not least: ANY idea on how to unify the terms in the wiki, the help texts in the python scripts and the galaxy help texts most efficiently? perhaps a "help-text-a-thon" is needed where we edit all 3 texts simultaneously? I honestly cannot bring myself to go through these massive texts all on my own, but I think it would tremendously decrease the frustration potential if we did it
