
hicmatrix's Introduction

deepTools


User-friendly tools for exploring deep-sequencing data

deepTools addresses the challenge of handling the large amounts of data now routinely generated by DNA sequencing centers. It contains modules to process mapped reads for multiple quality checks and to create normalized coverage files in the standard bedGraph and bigWig formats, which allow comparisons between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and to functionally annotate the genome.

For support or questions please post to Biostars. For bug reports and feature requests please open an issue on GitHub.

Citation:

Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research. 2016 Apr 13:gkw257.

Documentation:

Our documentation contains more details on the individual tool scopes and usages and an introduction to our deepTools Galaxy web server including step-by-step protocols.

Please see also the FAQ, which we update regularly. Our Gallery may give you some more ideas about the scope of deepTools.

For more specific troubleshooting, feedback, and tool suggestions, please post to Biostars.


Installation

deepTools is available for:

  • Command line usage (via pip / conda / github)
  • Integration into Galaxy servers (via toolshed/API/web-browser)

There are many easy ways to install deepTools. More details can be found here.

In Brief:

Install via PyPI

$ pip install deeptools

Install via conda

$ conda install -c bioconda deeptools

Install by cloning the repository

$ git clone https://github.com/deeptools/deepTools
$ cd deepTools
$ pip install .
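
To check which version ended up installed (a quick sanity check; conda users can run conda list deeptools instead):

$ pip show deeptools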

Galaxy Installation

deepTools can be easily integrated into Galaxy. Please see the installation instructions in our documentation for further details.

Note: From version 2.3 onwards, deepTools supports Python 3.


This tool suite is developed by the Bioinformatics Facility at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg.

Documentation | deepTools Galaxy | FAQ

hicmatrix's Issues

reorderChromosomes

I suspect that reorderChromosomes generates bins of negative length. I noticed this issue when running hicPlotDistVsCounts on matrices whose chromosomes had been reordered with hicAdjustMatrix keep.

Question about part of the code

Hi,
Could you tell me what this is supposed to do? (This is in the save method.)

if self.scaleToOriginalRange:
    min_value = self.matrix.data.min()
    max_value = self.matrix.data.max()
    desired_range_difference = max_value - min_value
    self.matrix.data = (self.matrix.data - self.minValue)
    self.matrix.data /= (self.maxValue - self.minValue)
    self.matrix.data *= desired_range_difference
    self.matrix.data += min_value
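
Read mechanically, the snippet maps values from the [self.minValue, self.maxValue] interval linearly onto the data's current [min, max] range. A toy illustration with made-up stand-ins for the attributes:

import numpy as np

data = np.array([2.0, 4.0, 6.0])    # current values: min 2, max 6
minValue, maxValue = 0.0, 10.0      # stand-ins for self.minValue / self.maxValue

rescaled = (data - minValue) / (maxValue - minValue) * (data.max() - data.min()) + data.min()
# rescaled -> [2.8, 3.6, 4.4]: a value of 0 maps to 2, a value of 10 maps to 6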

unexpected behaviour in fit_cut_intervals

In [1]: from hicmatrix import HiCMatrix
INFO:numexpr.utils:NumExpr defaulting to 4 threads.

In [2]: cut_intervals = [('a', 0, 10, 1), ('a', 10, 20, 1), ('b', 0, 10, 1), ('c', 0, 10, 1)]

In [3]: HiCMatrix.hiCMatrix.fit_cut_intervals(cut_intervals)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-3-1da7828147de> in <module>()
----> 1 HiCMatrix.hiCMatrix.fit_cut_intervals(cut_intervals)

/home/ldelisle/.conda/envs/hicexplorer_dev/lib/python3.6/site-packages/hicmatrix/HiCMatrix.py in fit_cut_intervals(cut_intervals)
    344                 resi = [-1 * (start_x % m), -start_x % m]
    345                 return start_x + resi[np.argmin(np.abs(resi))]
--> 346             start = [snap_nearest_multiple(x, median) for x in start]
    347             end = [snap_nearest_multiple(x, median) for x in end]
    348             cut_intervals = list(zip(chrom, start, end, extra))

/home/ldelisle/.conda/envs/hicexplorer_dev/lib/python3.6/site-packages/hicmatrix/HiCMatrix.py in <listcomp>(.0)
    344                 resi = [-1 * (start_x % m), -start_x % m]
    345                 return start_x + resi[np.argmin(np.abs(resi))]
--> 346             start = [snap_nearest_multiple(x, median) for x in start]
    347             end = [snap_nearest_multiple(x, median) for x in end]
    348             cut_intervals = list(zip(chrom, start, end, extra))

/home/ldelisle/.conda/envs/hicexplorer_dev/lib/python3.6/site-packages/hicmatrix/HiCMatrix.py in snap_nearest_multiple(start_x, m)
    342             # of the median
    343             def snap_nearest_multiple(start_x, m):
--> 344                 resi = [-1 * (start_x % m), -start_x % m]
    345                 return start_x + resi[np.argmin(np.abs(resi))]
    346             start = [snap_nearest_multiple(x, median) for x in start]

ZeroDivisionError: integer division or modulo by zero

I know it is quite unlikely to have so many small chromosomes compared to the number of large chromosomes but...
I think the issue is:

median = int(np.median(np.diff(start)))

Because in this case the diff is taken across chromosome boundaries and yields negative values and zeros, so the median ends up being 0 (hence the modulo by zero):

In [7]: np.diff(start)
Out[7]: array([ 10, -10,   0])

The best would be to compute the diff only between starts belonging to the same chromosome.
Something like:

from collections import Counter

chrom, start, end, extra = zip(*cut_intervals)
int(np.median(np.concatenate(
    [np.diff([start for chro, start, end, extra in cut_intervals if chro == cur_chrom])
     for cur_chrom, nb in Counter(chrom).items() if nb > 1])))

Tell me if it seems a good idea.
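
For readability, the same idea could be written as a small helper (a sketch, untested against the actual fit_cut_intervals code; it assumes at least one chromosome has more than one bin):

import numpy as np
from collections import Counter

def per_chromosome_median_bin_size(cut_intervals):
    # Compute start-position differences per chromosome and take the overall
    # median, so starts of different chromosomes are never subtracted from
    # each other (which is what produces the zero median above).
    counts = Counter(c for c, s, e, x in cut_intervals)
    diffs = [np.diff([s for c, s, e, x in cut_intervals if c == cur_chrom])
             for cur_chrom, n in counts.items() if n > 1]
    return int(np.median(np.concatenate(diffs)))

cut_intervals = [('a', 0, 10, 1), ('a', 10, 20, 1), ('b', 0, 10, 1), ('c', 0, 10, 1)]
print(per_chromosome_median_bin_size(cut_intervals))  # 10 instead of 0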

In addition, I don't fully understand: is cut_intervals supposed to be exhaustive, i.e. continuous? For example, is this valid?

cut_intervals = [('a', 0, 10, 1), ('a', 10, 20, 1), ('b', 0, 10, 1), ('c', 0, 10, 1), ('c', 20, 30, 1), ('d', 0, 10, 1)]

('c', 10, 20, 1) is not present.

no nan_bins update in fillLowerTriangle

  • I did not manage to fix the other bug with the nan_bins.
    Here is an example with one interaction (one_interaction_4chr.cool):
    chr1 10000 chr1 200000
    with bins of 50 kb, so there is one interaction between bin 0 and bin 3.
    With cool.load()
    I get a matrix with a 1 in matrix[0][3], so indices contains 3 and nan_bins contains 0, 1, 2, 4, ...
    In HiCMatrix, cool.load() is called first, then fillLowerTriangle().
    After this operation, indices becomes 0 and 3, which is perfect, but nan_bins is not adjusted.
    For the moment, it looks like cool is the only format that supports nan.
    Should I modify fillLowerTriangle?
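
One possible way to keep nan_bins consistent after the matrix has been symmetrized (a sketch assuming a scipy.sparse matrix; this is not the current HiCMatrix code):

import numpy as np
from scipy.sparse import csr_matrix

def recompute_nan_bins(matrix):
    # After fillLowerTriangle the matrix is symmetric, so a bin with no stored
    # entries in its column (equivalently, its row) never interacts at all.
    entries_per_bin = matrix.getnnz(axis=0)
    return np.flatnonzero(entries_per_bin == 0)

# One-interaction example from above: bins 0 and 3 interact, 5 bins in total.
m = csr_matrix(([1, 1], ([0, 3], [3, 0])), shape=(5, 5))
print(recompute_nan_bins(m))  # [1 2 4]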

using numpy.string_ for info dict generation prevents generated cooler files from usage with HiGlass

I am using HiCExplorer for downstream analysis of our Hi-C data. To view the results I wanted to use HiGlass, which uses the multicooler format to display the data at different resolutions. HiCExplorer provides a nice utility for conversion between the h5 and cooler formats. However, when trying to view the results in HiGlass I get an import error. In brief, my script does the following:

hicConvertMatrix -m sample_100kb.h5 --inputFormat h5 --outputFormat cool -o sample_100kb.cool
cooler zoomify -r 500000,1000000 -o sample.mcool sample_100kb.cool

After investigating this issue I found that the way the conversion is implemented in the hicmatrix library contains a bug. In particular, what HiGlass seems to do when importing a dataset is to read the metadata of the cooler containers in the multicooler file as JSON. While this works fine for the coarser resolutions generated with cooler zoomify, the metadata of the cool file generated with hicConvertMatrix cannot be read, since it contains binary objects. A quick check with cooler attrs gives:

'@attrs':
      bin-size: 100000
      bin-type: fixed
      creation-date: 2020-04-09 10:42:50.675971
      format: !!binary |
        SERGNTo6Q29vbGVy
      format-url: !!binary |
        aHR0cHM6Ly9naXRodWIuY29tL21pcm55bGFiL2Nvb2xlcg==
      format-version: 3
      generated-by: !!binary |
        SGlDTWF0cml4LTEx
      generated-by-cooler-lib: !!binary |
        Y29vbGVyLTAuOC41
      genome-assembly: unknown
      metadata: {}
      nbins: 24723
      nchroms: 19
      nnz: 141621591
      storage-mode: symmetric-upper
      sum: 21271216939042.75
      tool-url: !!binary |
        aHR0cHM6Ly9naXRodWIuY29tL2RlZXB0b29scy9IaUNNYXRyaXg= 

These values cannot be interpreted during the JSON file generation and therefore the import to HiGlass fails.

A quick look in the cool.py file of the hicmatrix library reveals the source of this. On lines 364-397 the info dictionary of the new cooler file is generated, and string conversion there is explicitly handled by numpy.string_. However, the HDF5 library seems unable to understand this datatype and converts it to a binary object. Replacing numpy.string_ with the native Python str function resolves the problem, and a quick check with cooler attrs then gives:

'@attrs':
  bin-size: 100000
  bin-type: fixed
  creation-date: 2020-04-09 12:57:38.542590
  format: HDF5::Cooler
  format-url: https://github.com/mirnylab/cooler
  format-version: 3
  generated-by: HiCMatrix-11
  generated-by-cooler-lib: cooler-0.8.5
  genome-assembly: unknown
  metadata: {}
  nbins: 24723
  nchroms: 19
  nnz: 141621591
  storage-mode: symmetric-upper
  sum: 21271216939042.75
  tool-url: https://github.com/deeptools/HiCMatrix

I therefore propose to replace numpy.string_ with str to ensure compatibility with HiGlass.
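
A small demonstration of why the proposed change matters (the key names follow the attribute dump above; the actual code in cool.py is of course more involved):

import json
import numpy as np

# numpy.string_ (an alias of numpy.bytes_ in older NumPy versions) stores the
# value as bytes, which HDF5 keeps as a binary object and json cannot serialize:
info_binary = {"format": np.bytes_("HDF5::Cooler")}
# json.dumps(info_binary)  # -> TypeError: not JSON serializable

# Plain Python str round-trips cleanly:
info_plain = {"format": str("HDF5::Cooler")}
print(json.dumps(info_plain))  # {"format": "HDF5::Cooler"}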

Release 16 on pypi?

Hi,

it looks like the latest available release on PyPI is 15. Any chance to make 16 available there too? Thanks.

MatrixFileHandler.load() crashes interpreter quite brutally

I was debugging a HiCExplorer run using the current Bioconda image, and there the interpreter quits with exit code 1.
That seems less than ideal:

Python 3.6.7 | packaged by conda-forge | (default, Nov  6 2019, 16:19:42)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from hicmatrix.lib.matrixFileHandler import MatrixFileHandler
>>> m = MatrixFileHandler('h5', '0_matrix.h5')
INFO:numexpr.utils:NumExpr defaulting to 3 threads.
>>> m.load()
INFO:hicmatrix.lib.h5:No h5 file. Please check parameters concerning the file type!
bash-4.2#

I think this is because of

... calling sys.exit shouldn't be done in a library, as there's no way to handle this gracefully. Additionally, the helpful info message should be an error message, so that there is something one can work with.
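
A minimal sketch of the suggested pattern (the function name, check, and message are illustrative, not the actual HiCMatrix code):

import logging

log = logging.getLogger(__name__)

def load_h5(matrix_file):
    # Log at error level and raise, instead of calling sys.exit(1), so that
    # callers (and interactive sessions) can handle the failure gracefully.
    if not str(matrix_file).endswith(".h5"):
        log.error("No h5 file. Please check parameters concerning the file type!")
        raise ValueError(f"{matrix_file} does not look like an h5 matrix")
    ...  # the actual loading would happen here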

Last bin with data of cool file is always nan

Hi,
In the commit b8c2749

            nan_bins = np.array(range(shape))
            nan_bins = np.setxor1d(nan_bins, matrix.indices)

            i = 0
            while i < len(nan_bins):
                if nan_bins[i] >= shape:
                    break
                i += 1
            nan_bins = nan_bins[:i]

was changed to

            nan_bins = np.arange(shape)
            nan_bins = np.setdiff1d(nan_bins, matrix.indices[:-1])

which means that the last column that has data is now always nan, whereas before that was not the case.
Do you remember why?
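
A toy example reproducing the difference (five bins, data stored for bins 0 and 3, as in the one-interaction case above):

import numpy as np

shape = 5
indices = np.array([0, 3])                            # bins that actually carry data

old = np.setxor1d(np.arange(shape), indices)          # [1 2 4]
new = np.setdiff1d(np.arange(shape), indices[:-1])    # [1 2 3 4]: bin 3 is wrongly flagged as nan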

surprising behaviour with cool load

Hi,
I noticed two surprising behaviours, or at least I think so:

  1. When a bin has a weight of nan, the weight is transformed to 1 with:

    correction_factors = convertNansToOnes(np.array(correction_factors_data_frame.values).flatten())

    But this bin is then not put back into nan_bins (see the toy illustration after this list).

  2. There is a section to rescale matrices which would have a strange range:

    # check if max smaller one or if not same mangnitude
    if max_value < 1 or (np.absolute(int(math.log10(max_value)) - int(math.log10(self.maxValue))) > 1):

    The problem is that depending on whether you load the matrix with self.chrnameList is None or with self.chrnameList is not None, you may get different outputs... Do you think there could be a parameter that allows keeping the values as they were in the cool file, without scaling them?
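
A toy illustration of point 1 (convertNansToOnes is paraphrased here; the real implementation may differ):

import numpy as np

weights = np.array([0.8, np.nan, 1.2])                  # correction factors read from the cool file
corrected = np.where(np.isnan(weights), 1.0, weights)   # presumably what convertNansToOnes does
# corrected -> [0.8, 1.0, 1.2]; bin 1 had a nan weight, but nothing adds it to nan_bins afterwards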

pyGenomeTracks error

In trying to run

pyGenomeTracks --help

I get the following error:

ImportError: No module named hicmatrix

I am not sure why it was not installed when I installed pyGenomeTracks initially. Is there a way for me to add this?

Thanks.
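
Since hicmatrix is published on PyPI (see the release question above), installing it into the same environment should resolve the import error, assuming pip manages that environment:

$ pip install hicmatrix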

use of 'non-standard dependency specifiers' in setup.py and requirements.txt

Hi,
when using pip 23.2.1 together with an existing hicmatrix 15 installation I get the following warnings:

DEPRECATION: hicmatrix 15 has a non-standard dependency specifier numpy>=1.16.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of hicmatrix or contact the author to suggest that they release a version with a conforming dependency specifier. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: hicmatrix 15 has a non-standard dependency specifier scipy>=1.2.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of hicmatrix or contact the author to suggest that they release a version with a conforming dependency specifier. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: hicmatrix 15 has a non-standard dependency specifier tables>=3.5.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of hicmatrix or contact the author to suggest that they release a version with a conforming dependency specifier. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: hicmatrix 15 has a non-standard dependency specifier pandas>=0.25.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of hicmatrix or contact the author to suggest that they release a version with a conforming dependency specifier. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: hicmatrix 15 has a non-standard dependency specifier intervaltree>=3.0.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of hicmatrix or contact the author to suggest that they release a version with a conforming dependency specifier. Discussion can be found at https://github.com/pypa/pip/issues/12063

Further on, trying an install with:

#> pip install git+https://github.com/deeptools/HiCMatrix.git@340de4b136a52730d95f5087f0c0f63d9b089242

unhappily fails with:

      wheel.vendored.packaging.requirements.InvalidRequirement: Expected end or semicolon (after version specifier)
          scipy>=1.2.*
               ~~~~~^
      [end of output]

By removing the .* in the files mentioned above, the install succeeded.
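
For reference, the conforming specifiers (with the trailing .* removed as described) would look like this in setup.py / requirements.txt (a sketch of the relevant part only):

install_requires = [
    "numpy>=1.16",
    "scipy>=1.2",
    "tables>=3.5",
    "pandas>=0.25",
    "intervaltree>=3.0",
]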

Best,
Thomas

ModuleNotFoundError: No module named 'importlib.metadata'

In the latest version (17.1), I use from importlib.metadata import version. In Python 3.8 and later this is included in CPython, but in 3.7 it requires the importlib-metadata backport.
If you are using Python 3.7, you need to install importlib-metadata the same way you installed HiCMatrix.
@bgruening, do you know if I should add importlib-metadata as a requirement? Or is there a way to add importlib-metadata as a requirement only for Python 3.7, both on PyPI and in the conda recipe?
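
For the pip side, one possible answer is a PEP 508 environment marker, which restricts the backport to interpreters that lack importlib.metadata (a sketch; the conda recipe would need its own handling):

# setup.py sketch: install the backport only on Python < 3.8
install_requires = [
    'importlib-metadata; python_version < "3.8"',
]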
