Giter Club home page Giter Club logo

mustache's Introduction

Mustache PyPI Latest Release License: MIT DOI

Mustache (Multi-scale Detection of Chromatin Loops from Hi-C and Micro-C Maps using Scale-Space Representation) is a tool by Abbas Roayaei Ardakany, Halil Tuvan Gezer, Stefano Lonardi and Ferhat Ay ([email protected]).

Mustache is a tool for multi-scale detection of chromatin loops from Hi-C and Micro-C contact maps in high resolutions (10kbp all the way to 500bp and even more). Mustache uses recent technical advances in scale-space theory in Computer Vision to detect chromatin loops caused by interaction of DNA segments with a variable size. Here is an example of Mustache loops detected for HFFc6 Micro-C in 1kb resolution (loops are enlarged):

For more information, please read the full paper in Genome Biology. You can also download and visualize our loop calls on Epigenome Browser as a Custom Track Hub using JSON files in the WashU-output folder.

Release notes corresponding to version 1.2.0 (July 16, 2021)

We added differential loop detection to mustache. You can find the differential loops between two contact maps (together with their corresponding loops) by running the following command:

python3 ./mustache/mustache/diff_mustache.py -f1 data1.hic -f2 data2.hic -pt 0.05 -pt2 0.1 -o output -r 10000 -st 0.8

where "pt2" specifies the fdr threshold for finding differential loops. This command will output 4 different files:

  1. output.loop1: loops found in data1.hic
  2. output.loop2: loops found in data2.hic
  3. output.diffloop1: loops that are present in data1.hic and weakened/disappeared in data2.hic
  4. output.diffloop2: loops that are present in data2.hic and weakened/disappeared in data1.hic

Installation

For convenience, we provide several ways to install Mustache.

Conda

Conda is the recommended way of running Mustache as it will take care of the dependencies.

Suggested way to install conda is to use the installer that is appropriate for your system from the Miniconda page.

Make sure your "conda" command specifically calls the executable under the miniconda distribution (e.g., ~/miniconda3/condabin/conda).

If "conda activate" command gives an error when you run it the first time then you will have to run "conda init bash" once.

git clone https://github.com/ay-lab/mustache
conda env create -f ./mustache/environment.yml
conda activate mustache

and then run one of these three commands:

1) python -m mustache  -f ./mustache/data/chr21_5kb.RAWobserved -b ./mustache/data/chr21_5kb.KRnorm -ch 21 -r 5kb -o chr21_out5.tsv -pt 0.1 -st 0.8
2) python3 ./mustache/mustache/mustache.py  -f ./mustache/data/chr21_5kb.RAWobserved -b ./mustache/data/chr21_5kb.KRnorm -ch 21 -r 5kb -o chr21_out5.tsv -pt 0.1 -st 0.8
3) ./mustache/mustache/mustache.py  -f ./mustache/data/chr21_5kb.RAWobserved -b ./mustache/data/chr21_5kb.KRnorm -ch 21 -r 5kb -o chr21_out5.tsv -pt 0.1 -st 0.8

Docker

We have a Docker container that allows running Mustache out of the box. You can mount the necessary input and output locations and run Mustache as follows.

docker run -it aylab/mustache
mustache -f /mustache/data/chr21_5kb.RAWobserved -b /mustache/data/chr21_5kb.KRnorm -ch 21 -r 5kb -o ./chr21_out5.tsv -pt 0.1 -st 0.8

PIP

pip3 install mustache-hic

Github

Make sure you have Python >=3.6 installed, along with all the dependencies listed.

git clone https://github.com/ay-lab/mustache
cd mustache
./mustache/mustache.py ...arguments

Dependencies

Mustache uses these Python packages: Check here for a list of dependency versions that we know are working with Mustache.

  1. python >= 3.6
  2. numpy
  3. pandas
  4. matplotlib
  5. seaborn
  6. scipy
  7. statsmodels
  8. pathlib
  9. cooler
  10. hic-straw

Examples

Example 1: Running Mustache with a contact map and a normalization/bias vector

  • Run Mustache on provided example data for chromosome 21 of HMEC cell line from Rao et al. (selected due to file size restrictions) with KR normalization in 5kb resolution as follows.
mustache -f ./data/chr21_5kb.RAWobserved -b ./data/chr21_5kb.KRnorm -ch 21 -r 5kb -pt 0.1 -o chr21_out.tsv -st 0.8

where -f is the raw contact map, -b is the bias (normalization vector) file, -ch is the subject chromosome, -r is the resolution, and -o is the output file.

Example 2: Running Mustache with a .hic file

mustache -f ./4DNFIPC7P27B.hic -ch 1 2 X -r 1kb -pt 0.01 -o hic_out.tsv

where -f is our input file, -ch is the subject chromosome, -r is the resolution, and -o is the output file.

Example 3: Running Mustache with a .cool file

wget ftp://cooler.csail.mit.edu/coolers/hg19/Rao2014-GM12878-MboI-allreps-filtered.5kb.cool
mustache -f ./Rao2014-GM12878-MboI-allreps-filtered.5kb.cool -ch chr12 chr19 -r 5kb -pt 0.05 -o cooler_out.tsv
OR
mustache -f ./Rao2014-GM12878-MboI-allreps-filtered.5kb.cool -r 5kb -pt 0.05 -o cooler_out.tsv

where -f is our input file, -ch is the subject chromosome, -r is the resolution, and -o is the output file. If you don't specify the chromosome (-ch) for a .[m]cool or .hic files mustache will run on all chromosomes and outputs the results in the output file specified by -o.

Parameters

Short Long Meaning
Required Parameters
-f --file Location of contact map. (See below for format.)
-r --resolution Resolution of the provided contact map.
-o --outfile Name of the output file.
-ch --chromosome List of the chromosome names you want to run mustache on. If not specified for .[m]cool or .hic formats mustache will automatically run on all chromosomes.
Optional Parameters
-b --biases Location of biases (normalization) file for contact map (required only for text format).
-p --processes Number of parallel processes to run. Default is 4. Increasing this will also increase the memory usage.
-pt --pThreshold P-Value threshold for an interaction to be reported in the final output file. Default is 0.1
-sz --sigmaZero Sigma0 parameter for Mustache. Default is experimentally chosen for 5Kb resolution.
-st --sparsityThreshold The sparsity threshold mustache uses tp filter out loops in sparse regions. Default value is 0.88.
-norm --normalization For .[m]cool or .hic files, you can specify what normalization mustache should use (-norm KR). For .[m]cool files, by default, mustache assumes that bias values are stored in the 'weight' column when this parameter is not specified. For .hic format, the default is 'KR' normalization if not specified.
-V --version Shows the version of the tool.

Tips

  • For sparser datasets use smaller sparsity thresholds , e.g., -st 0.7 (default=0.88).
  • For very high resolutions (e.g., 1kb) use:
    • smaller sparsity thresholds , e.g., -st 0.7
    • less stringnet q-value thresholds, e.g., -pt 0.1

Input Formats

Input map can be one of the following types.

1. Text format (contact counts file + bias file)

Similar to Hi-C analysis tools previously developed by our lab (Selfish and FitHiC), we allow a simple, readable textual input format for Mustache.

To use this input mode, we require a contact map and a bias/normalization vector file.

1a. Contact map files need to have the following format. They must not have a header. The values must be separated by a tab.
Chromosome 1 Midpoint 1 Chromosome 2 Midpoint 2 Contact Count
chr1 5000 chr1 65000 438
chr1 5000 chr1 85000 12
... ... ... ... ...
1b. Bias files need to have the following format. They must not have a header. Bias file must use the same midpoint format as the contact maps.

Bias file is a list of normalization factors. This means contact counts will be divided by their corresponding factors.

Chromosome Midpoint Factor
chr1 5000 NaN
chr1 10000 1.12
chr1 15000 0.1

2. Juicer .hic Files

Mustache uses Juicer's straw tool to read .hic files.

3. Cooler .cool, and .mcool Files

Mustache uses Cooler package to read .cool, and .mcool files.

Output format

Output of Mustache is a TSV file and is formatted as follows

| Bin 1 Chromosome | Bin 1 Start | Bin 1 End | Bin 2 Chromosome | Bin 2 Start | Bin 2 End | FDR | Mustache Scale for this Detection |

Citation

If you use Mustache in your work, please cite our paper:

Roayaei Ardakany, A., Gezer, H.T., Lonardi, S. et al. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol 21, 256 (2020). https://doi.org/10.1186/s13059-020-02167-0

Contact

For problems about installation, technical questions, parameter settings, interpretation of the results and other help please email Abbas Roayaei Ardakany or Ferhat Ay ([email protected], [email protected]).

mustache's People

Contributors

ay-lab avatar roayaei avatar sa501428 avatar tuvangezer avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

mustache's Issues

mustache outputs on .cool file only chrX

Hi, new bug to report, mustache only output chrX when run on a cool (or mcool) file with all chromosome, i think it rewrite over the output for every chromosome instead of appending results to the file. Should be an easy fix i think...

Problem:
mustache only output chrX when processing .cool file

Expected behavior:
All chromosomes loop are written to a single file

How to replicate:

Download test data and pull updated docker image:

wget https://dovetail-public.s3-us-west-2.amazonaws.com/publicData/mustacheEx/test_xlarge.cool
docker pull mblanche/mustache

Running only on chr1 reports loop on that chromosome:

docker run -i -v ${PWD}:/mnt mblanche/mustache mustache -p 24 -f /mnt/test_xlarge.cool -r 1000 -ch chr1 -o /mnt/test_chr1.tsv

Output produced

$ cat test_chr1.tsv 
chr1	27703000	27704000	chr1	27719000	27720000	0.058730870088279145	2.111212657236631
chr1	12123000	12124000	chr1	12209000	12210000	4.781826487842977e-05	2.111212657236631
chr1	7667000	7668000	chr1	7750000	7751000	3.352190756977791e-06	2.111212657236631
chr1	8027000	8028000	chr1	8313000	8314000	8.84808224321354e-05	2.599207668339954
chr1	9287000	9288000	chr1	9492000	9493000	0.000389912598719869	2.111212657236631
chr1	9290000	9291000	chr1	9457000	9458000	0.0001244653836892512	4.222425314473262
chr1	31461000	31462000	chr1	31548000	31549000	0.03206428934260979	4.222425314473262
chr1	40847000	40848000	chr1	40952000	40953000	0.054851710059485304	2.111212657236631
chr1	42913000	42914000	chr1	43073000	43074000	7.436971843910456e-06	3.2
chr1	46037000	46038000	chr1	46180000	46181000	0.02078321219285685	2.599207668339954
chr1	46038000	46039000	chr1	46171000	46172000	0.04800154402043615	3.2
chr1	46477000	46478000	chr1	46528000	46529000	0.04800154402043615	3.2
chr1	45358000	45359000	chr1	45491000	45492000	0.00904187572043084	2.599207668339954
chr1	48109000	48110000	chr1	48201000	48202000	0.0001698415416786503	2.111212657236631
chr1	48144000	48145000	chr1	48159000	48160000	0.18259572493930876	2.599207668339954
chr1	51897000	51898000	chr1	52002000	52003000	4.820291779283892e-06	2.111212657236631
chr1	53279000	53280000	chr1	53392000	53393000	0.00020036054520844448	2.111212657236631
chr1	54949000	54950000	chr1	54996000	54997000	0.12706120965450662	3.6758347359905126
chr1	55141000	55142000	chr1	55276000	55277000	0.0001074960142544712	2.599207668339954
chr1	60110000	60111000	chr1	60124000	60125000	0.19322931371207366	2.111212657236631
chr1	60446000	60447000	chr1	60459000	60460000	0.18772677728337867	2.599207668339954
chr1	62176000	62177000	chr1	62407000	62408000	1.645776177316982e-05	5.198415336679908
chr1	62180000	62181000	chr1	62407000	62408000	0.09162950944878309	2.111212657236631
chr1	59880000	59881000	chr1	59942000	59943000	0.019402142871441708	2.111212657236631
chr1	63616000	63617000	chr1	64245000	64246000	1.1532565734051481e-05	4.525483399593905
chr1	65959000	65960000	chr1	66243000	66244000	6.364387955537509e-07	5.571523605095194
chr1	66253000	66254000	chr1	66447000	66448000	2.2407036802674085e-05	3.6758347359905126
chr1	67532000	67533000	chr1	67761000	67762000	0.0004813658877778864	3.2
chr1	68117000	68118000	chr1	68332000	68333000	0.05540207118585941	2.599207668339954
chr1	68119000	68120000	chr1	68344000	68345000	2.3734640783201755e-05	5.571523605095194
chr1	68120000	68121000	chr1	68322000	68323000	0.003970168735433349	4.222425314473262
chr1	70132000	70133000	chr1	70510000	70511000	0.020736325968217128	3.2
chr1	84142000	84143000	chr1	84433000	84434000	6.501898864508249e-05	3.2
chr1	78131000	78132000	chr1	78615000	78616000	0.005649500976740371	2.111212657236631
chr1	89801000	89802000	chr1	89989000	89990000	0.0489313992913291	2.599207668339954
chr1	89804000	89805000	chr1	89984000	89985000	0.020683880990569985	3.2
chr1	92859000	92860000	chr1	92993000	92994000	4.20230147606393e-05	3.6758347359905126
chr1	94062000	94063000	chr1	94336000	94337000	0.025447951344766226	3.2
chr1	94532000	94533000	chr1	94803000	94804000	0.01121793676525708	2.111212657236631
chr1	94853000	94854000	chr1	95046000	95047000	0.06889366121647796	2.111212657236631
chr1	94857000	94858000	chr1	95040000	95041000	0.06410723481868681	2.111212657236631
chr1	91530000	91531000	chr1	91927000	91928000	0.0013394100846255328	2.599207668339954
chr1	110207000	110208000	chr1	110312000	110313000	0.000573967156147992	3.2
chr1	110214000	110215000	chr1	110311000	110312000	0.026663498059293307	2.111212657236631
chr1	110522000	110523000	chr1	110613000	110614000	0.08918673928601688	3.2
chr1	110631000	110632000	chr1	110763000	110764000	0.17129192353496844	2.111212657236631
chr1	100464000	100465000	chr1	100866000	100867000	1.1728536570831949e-06	6.4
chr1	101311000	101312000	chr1	101356000	101357000	0.07105042602875537	3.2
chr1	101662000	101663000	chr1	101678000	101679000	0.12616200620090073	2.599207668339954
chr1	107782000	107783000	chr1	108126000	108127000	1.6692939353024627e-06	3.2
chr1	109278000	109279000	chr1	109396000	109397000	0.0017137333362626796	3.6758347359905126
chr1	109777000	109778000	chr1	109983000	109984000	1.5112811002637727e-09	5.198415336679908
chr1	115140000	115141000	chr1	115465000	115466000	0.06180791321841151	2.111212657236631
chr1	112394000	112395000	chr1	112467000	112468000	0.016755969346429117	2.111212657236631
chr1	116080000	116081000	chr1	116094000	116095000	0.19923356135035813	2.111212657236631
chr1	116174000	116175000	chr1	116190000	116191000	0.19923356135035813	2.111212657236631
chr1	116383000	116384000	chr1	116398000	116399000	0.19923356135035813	2.111212657236631
chr1	116397000	116398000	chr1	116410000	116411000	0.19923356135035813	2.111212657236631
chr1	116453000	116454000	chr1	116468000	116469000	0.19923356135035813	3.2
chr1	116507000	116508000	chr1	116521000	116522000	0.19923356135035813	2.111212657236631
chr1	116697000	116698000	chr1	116709000	116710000	0.19937031255857887	2.111212657236631
chr1	116775000	116776000	chr1	116786000	116787000	0.19923356135035813	2.111212657236631
chr1	116792000	116793000	chr1	116809000	116810000	0.19923356135035813	3.2
chr1	116802000	116803000	chr1	116815000	116816000	0.19923356135035813	2.599207668339954
chr1	116847000	116848000	chr1	116860000	116861000	0.19923356135035813	2.111212657236631
chr1	116934000	116935000	chr1	116947000	116948000	0.19923356135035813	2.111212657236631
chr1	116942000	116943000	chr1	116956000	116957000	0.19923356135035813	2.111212657236631
chr1	117031000	117032000	chr1	117047000	117048000	0.19923356135035813	2.111212657236631
chr1	117062000	117063000	chr1	117073000	117074000	0.19923356135035813	2.111212657236631
chr1	117575000	117576000	chr1	117589000	117590000	0.19923356135035813	2.111212657236631
chr1	117677000	117678000	chr1	117691000	117692000	0.1962550806915287	2.111212657236631
chr1	117799000	117800000	chr1	117811000	117812000	0.19923356135035813	2.111212657236631
chr1	118024000	118025000	chr1	118035000	118036000	0.16941306547965643	2.111212657236631
chr1	118103000	118104000	chr1	118119000	118120000	0.16941306547965643	2.599207668339954
chr1	118173000	118174000	chr1	118185000	118186000	0.16941306547965643	2.111212657236631
chr1	118228000	118229000	chr1	118245000	118246000	0.16941306547965643	2.599207668339954
chr1	118429000	118430000	chr1	118443000	118444000	0.16941306547965643	2.111212657236631
chr1	118740000	118741000	chr1	118754000	118755000	0.19321452944499526	2.599207668339954
chr1	118753000	118754000	chr1	118768000	118769000	0.16941306547965643	2.111212657236631
chr1	118887000	118888000	chr1	118900000	118901000	0.16995479083299161	2.111212657236631
chr1	119287000	119288000	chr1	119368000	119369000	0.04793456568179239	2.599207668339954
chr1	119313000	119314000	chr1	119324000	119325000	0.19521665624843593	2.111212657236631
chr1	119415000	119416000	chr1	119433000	119434000	0.19096932254801227	2.111212657236631
chr1	119428000	119429000	chr1	119443000	119444000	0.1739600891041998	2.599207668339954
chr1	119638000	119639000	chr1	119650000	119651000	0.16941306547965643	2.111212657236631
chr1	119798000	119799000	chr1	119809000	119810000	0.17336543518345282	2.111212657236631
chr1	119925000	119926000	chr1	119938000	119939000	0.16941306547965643	2.111212657236631
chr1	145980000	145981000	chr1	146008000	146009000	0.1493799977290164	2.111212657236631
chr1	147173000	147174000	chr1	147242000	147243000	0.0014374088844748556	3.6758347359905126
chr1	147279000	147280000	chr1	147294000	147295000	0.16074546655054636	2.111212657236631
chr1	147749000	147750000	chr1	147762000	147763000	0.17706219066026252	2.111212657236631
chr1	150132000	150133000	chr1	150145000	150146000	0.17095137350687847	2.111212657236631
chr1	151773000	151774000	chr1	151787000	151788000	0.19215158024191575	2.111212657236631
chr1	155091000	155092000	chr1	155168000	155169000	0.159480897728622	3.6758347359905126
chr1	156147000	156148000	chr1	156225000	156226000	0.022761547603978145	2.111212657236631
chr1	156384000	156385000	chr1	156445000	156446000	0.0011295368832320851	3.2
chr1	157090000	157091000	chr1	157156000	157157000	3.066429862474962e-05	2.111212657236631
chr1	157372000	157373000	chr1	157489000	157490000	0.0010367252425936464	2.111212657236631
chr1	159936000	159937000	chr1	160006000	160007000	3.2301757546537857e-06	2.111212657236631
chr1	160834000	160835000	chr1	160950000	160951000	0.1638784122342846	3.6758347359905126
chr1	152094000	152095000	chr1	152107000	152108000	0.19899840408801758	2.111212657236631
chr1	152152000	152153000	chr1	152171000	152172000	0.19899840408801758	2.111212657236631
chr1	152155000	152156000	chr1	152176000	152177000	0.19899840408801758	3.6758347359905126
chr1	152190000	152191000	chr1	152226000	152227000	0.003975386275613224	3.6758347359905126
chr1	152204000	152205000	chr1	152222000	152223000	0.19899840408801758	2.599207668339954
chr1	152381000	152382000	chr1	152395000	152396000	0.19899840408801758	2.599207668339954
chr1	152407000	152408000	chr1	152421000	152422000	0.19899840408801758	2.111212657236631
chr1	152495000	152496000	chr1	152522000	152523000	0.19899840408801758	2.599207668339954
chr1	152622000	152623000	chr1	152635000	152636000	0.19899840408801758	2.111212657236631
chr1	152885000	152886000	chr1	152897000	152898000	0.19899840408801758	2.111212657236631
chr1	152976000	152977000	chr1	152989000	152990000	0.19899840408801758	2.599207668339954
chr1	153137000	153138000	chr1	153148000	153149000	0.19899840408801758	2.111212657236631
chr1	153193000	153194000	chr1	153208000	153209000	0.19899840408801758	2.111212657236631
chr1	153323000	153324000	chr1	153336000	153337000	0.19899840408801758	2.111212657236631
chr1	153368000	153369000	chr1	153392000	153393000	0.19899840408801758	2.111212657236631
chr1	153379000	153380000	chr1	153392000	153393000	0.19899840408801758	2.599207668339954
chr1	153422000	153423000	chr1	153432000	153433000	0.19899840408801758	2.111212657236631
chr1	153493000	153494000	chr1	153505000	153506000	0.19899840408801758	2.111212657236631
chr1	153547000	153548000	chr1	153557000	153558000	0.19899840408801758	2.111212657236631
chr1	153592000	153593000	chr1	153604000	153605000	0.19899840408801758	2.111212657236631
chr1	161791000	161792000	chr1	162036000	162037000	1.0885198708621147e-05	3.6758347359905126
chr1	162061000	162062000	chr1	162278000	162279000	0.15508693523878875	3.2
chr1	157960000	157961000	chr1	158123000	158124000	0.04737669503652403	2.111212657236631
chr1	159161000	159162000	chr1	159197000	159198000	0.010113988257080172	2.111212657236631
chr1	159800000	159801000	chr1	159889000	159890000	0.00881285199960709	3.2
chr1	165257000	165258000	chr1	165369000	165370000	8.81618996950162e-05	2.599207668339954
chr1	165393000	165394000	chr1	165592000	165593000	0.00031088958677380485	2.111212657236631
chr1	165405000	165406000	chr1	165593000	165594000	6.305914034276938e-07	4.222425314473262
chr1	165718000	165719000	chr1	165913000	165914000	0.003620934403087639	2.111212657236631
chr1	166910000	166911000	chr1	167058000	167059000	6.5118657133211855e-09	2.111212657236631
chr1	167599000	167600000	chr1	167687000	167688000	0.000209991103172702	3.6758347359905126
chr1	167750000	167751000	chr1	167762000	167763000	0.1868950141611309	2.111212657236631
chr1	167766000	167767000	chr1	167781000	167782000	0.19713697399408275	2.599207668339954
chr1	171801000	171802000	chr1	171983000	171984000	0.0002542333738390079	3.6758347359905126
chr1	172089000	172090000	chr1	172400000	172401000	0.0005344045801587427	2.111212657236631
chr1	174259000	174260000	chr1	174744000	174745000	2.4663086833243142e-05	3.6758347359905126
chr1	174260000	174261000	chr1	174729000	174730000	0.001507400028620598	3.6758347359905126
chr1	175148000	175149000	chr1	175318000	175319000	0.0019521859855946566	2.599207668339954
chr1	175153000	175154000	chr1	175318000	175319000	4.575040346566084e-09	3.6758347359905126
chr1	175160000	175161000	chr1	175319000	175320000	0.10002513427290345	2.111212657236631
chr1	176897000	176898000	chr1	176947000	176948000	0.18626177715466105	2.599207668339954
chr1	179251000	179252000	chr1	179368000	179369000	0.025714179576586393	2.111212657236631
chr1	179706000	179707000	chr1	179810000	179811000	0.002013988370637626	3.6758347359905126
chr1	179713000	179714000	chr1	179808000	179809000	0.12458831803921386	2.111212657236631
chr1	180520000	180521000	chr1	180863000	180864000	0.014601457922536008	1.8379173679952563
chr1	181190000	181191000	chr1	181424000	181425000	1.3643419727316086e-11	3.2
chr1	182084000	182085000	chr1	182395000	182396000	0.00010733820514275961	2.111212657236631
chr1	182600000	182601000	chr1	182785000	182786000	1.2857381825881475e-11	4.222425314473262
chr1	182603000	182604000	chr1	182777000	182778000	0.08392572736344449	3.2
chr1	184805000	184806000	chr1	184983000	184984000	8.433664298183412e-06	3.6758347359905126
chr1	184810000	184811000	chr1	184977000	184978000	0.008857891861481781	3.2
chr1	192805000	192806000	chr1	193019000	193020000	2.3579275532092936e-06	2.111212657236631
chr1	201999000	202000000	chr1	202119000	202120000	0.005113037648700969	3.2
chr1	202009000	202010000	chr1	202118000	202119000	6.503982801175923e-05	2.599207668339954
chr1	203274000	203275000	chr1	203337000	203338000	0.012061607774007943	3.2
chr1	203340000	203341000	chr1	203489000	203490000	0.003242391194976324	2.111212657236631
chr1	206490000	206491000	chr1	206586000	206587000	0.0070460226706296125	3.2
chr1	206634000	206635000	chr1	206664000	206665000	0.0206846870313096	3.2
chr1	207892000	207893000	chr1	207962000	207963000	0.13916041594010792	2.111212657236631
chr1	207898000	207899000	chr1	207961000	207962000	0.0038976396497772403	3.6758347359905126
chr1	203862000	203863000	chr1	204129000	204130000	0.0002882858020399337	2.599207668339954
chr1	203870000	203871000	chr1	204129000	204130000	0.002608785377341026	2.599207668339954
chr1	204445000	204446000	chr1	204493000	204494000	0.009430864309606022	2.111212657236631
chr1	205275000	205276000	chr1	205320000	205321000	0.00043075823283678094	3.2
chr1	205920000	205921000	chr1	205977000	205978000	0.0017241187746095802	3.2
chr1	200280000	200281000	chr1	200491000	200492000	0.07015248203348275	3.2
chr1	200972000	200973000	chr1	201112000	201113000	0.07015248203348275	2.111212657236631
chr1	201254000	201255000	chr1	201347000	201348000	0.18544232297948585	2.599207668339954
chr1	201254000	201255000	chr1	201361000	201362000	3.694537342724402e-09	2.111212657236631
chr1	201260000	201261000	chr1	201361000	201362000	0.1618378121038908	2.111212657236631
chr1	201710000	201711000	chr1	201793000	201794000	0.00010206870962106707	3.6758347359905126
chr1	201895000	201896000	chr1	201973000	201974000	0.13245335541422565	3.6758347359905126
chr1	209576000	209577000	chr1	209692000	209693000	0.01754647125738526	2.111212657236631
chr1	209861000	209862000	chr1	209989000	209990000	0.002801819785666071	3.6758347359905126
chr1	212068000	212069000	chr1	212288000	212289000	0.00440795717013448	3.6758347359905126
chr1	210143000	210144000	chr1	210311000	210312000	0.006009394886068442	2.599207668339954
chr1	211578000	211579000	chr1	211652000	211653000	0.027186325463972226	2.111212657236631
chr1	214214000	214215000	chr1	214330000	214331000	2.4916954259879276e-06	3.2
chr1	214262000	214263000	chr1	214328000	214329000	0.018937397406556755	3.6758347359905126
chr1	224513000	224514000	chr1	224854000	224855000	0.022073104537229164	2.111212657236631
chr1	223712000	223713000	chr1	223749000	223750000	0.019826464204297276	3.2
chr1	236108000	236109000	chr1	236491000	236492000	0.006056305533131967	2.599207668339954
chr1	236112000	236113000	chr1	236491000	236492000	0.015366736230005756	2.599207668339954
chr1	240026000	240027000	chr1	240149000	240150000	1.1829884760561349e-05	2.111212657236631
chr1	244602000	244603000	chr1	244796000	244797000	0.00033434594839602355	2.599207668339954
chr1	248134000	248135000	chr1	248150000	248151000	0.16454941793616168	2.599207668339954

Running on all chromosomes only reports loops for chromosome X:

docker run -i -v ${PWD}:/mnt mblanche/mustache mustache -p 24 -f /mnt/test_xlarge.cool -r 1000 -o /mnt/test_all.tsv

Output Generated:

cat test_all.tsv 
chrX	9995000	9996000	chrX	10118000	10119000	1.1212605821597776e-06	2.599207668339954
chrX	100685000	100686000	chrX	100768000	100769000	2.022866812723123e-06	3.6758347359905126

Loop calling on a very small chromosome

Hello,

I'm interested in running mustache on Drosophila Hi-C data. Chromosome 4 in Drosophila is very small, only 1.3 Mb, and mustache fails on this chromosome with the error below.

Is it possible to get this to work with different parameters? What would you suggest?

The size of the region it's trying to extract seems to depend on the resolution, as it was [0, 8000000) when I tried this at 4 kb resolution, otherwise my first thought would have been to try changing the distance limit to less than the chromosome size.

Command I used:
mustache -f data/hic/cooler_files/wt_Rep1_2kb.mcool -o data/hic/loops/wt_Rep1_2kb_4_mustache_loops_default.tsv -ch 4 -r 2kb -p 4

Error:

The distance limit is set to 2Mbp
Reading contact map...
Traceback (most recent call last):
  File "/home/research/vaquerizas/liz/project/.venv/lib/python3.8/site-packages/mustache/mustache.py", line 464, in read_mcooler
    temp = clr.matrix(balance=True,sparse=True).fetch( (chr1, int(start), int(end)))
  File "/home/research/vaquerizas/liz/project/.venv/lib/python3.8/site-packages/cooler/core.py", line 573, in fetch
    i0, i1, j0, j1 = self._fetch(*args, **kwargs)
  File "/home/research/vaquerizas/liz/project/.venv/lib/python3.8/site-packages/cooler/api.py", line 384, in _fetch
    region1 = parse_region(region, self._chromsizes)
  File "/home/research/vaquerizas/liz/project/.venv/lib/python3.8/site-packages/cooler/util.py", line 181, in parse_region
    raise ValueError("Genomic region out of bounds: [{}, {})".format(start, end))
ValueError: Genomic region out of bounds: [0, 4000000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/research/vaquerizas/liz/project/.venv/bin/mustache", line 8, in <module>
    sys.exit(main())
  File "/home/research/vaquerizas/liz/project/.venv/lib/python3.8/site-packages/mustache/mustache.py", line 904, in main
    o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
  File "/home/research/vaquerizas/liz/project/.venv/lib/python3.8/site-packages/mustache/mustache.py", line 773, in regulator
    x, y, v = read_mcooler(f, distance_in_bp, chromosome,chromosome2, res)
  File "/home/research/vaquerizas/liz/project/.venv/lib/python3.8/site-packages/mustache/mustache.py", line 493, in read_mcooler
    raise NameError('Reading from the file failed!')
NameError: Reading from the file failed!

I installed mustache today, if the version is relevant (it might be nice to be able to access the version number with mustache --version and/or print it in the log files?)

Differential loop calling with .cool rather than .hic

Hello, is the differential loop tool intended to work with .cool files or only .hic? I ask because when I input .cool files, I get this error:

Traceback (most recent call last):
  File "./mustache/mustache/diff_mustache.py", line 1342, in <module>
    main()
  File "./mustache/mustache/diff_mustache.py", line 1270, in main
    o = regulator(f1, f2, args.norm_method, CHRM_SIZE, args.outdir,
  File "./mustache/mustache/diff_mustache.py", line 1037, in regulator
    x1, y1, v1, res = read_cooler(f1, distance_in_bp, chromosome,chromosome2, norm_method)
NameError: name 'read_cooler' is not defined

Text format input, chromosome naming mustache internal inconsistent with input file

Hi,

Thank you for developing such a great tool for loop calling.

I used "text format" as input which chromosome naming follows "chrn", e.g. "chr1", pattern. However, to run mustache, -ch parameter has to set as -ch 1 rather than -ch chr1. And the output bedpe loop result also using "1" as chromosome name which is inconsistence with the input. Will that be a quick fix that I can modify in the source code?

Thank you,
Hanbin

.hic -ch error/Docker update?

Hi @aylab,

I am trying to run mustache right now from a .hic file and it keeps throwing errors due to options not recognized (-st), and -chr needing to have at least one argument. I would like to run for all chromosomes. I tried running -ch all and -ch and I tried inputting multiple chromosomes in different ways (with/without commas and chr) and mustache is unable to proceed.
See examples of error messages below:

usage: mustache [-h] [-f F_PATH] -o OUTDIR -r RESOLUTION [-bed BED] [-m MAT] [-b BIASFILE] [-pt PT] [-sz S_Z] [-oc OCTAVES] [-i S] [-p NPROCESSES] -ch CHROMOSOME [-v VERBOSE] mustache: error: unrecognized arguments: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X

usage: mustache [-h] [-f F_PATH] -o OUTDIR -r RESOLUTION [-bed BED] [-m MAT] [-b BIASFILE] [-pt PT] [-sz S_Z] [-oc OCTAVES] [-i S] [-p NPROCESSES] -ch CHROMOSOME [-v VERBOSE] mustache: error: unrecognized arguments: chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX

I think this might be due to the version of mustache in the image versus the current available. Are there any plans to update the Dockerfile/Docker Image with the latest version of mustache? It would be greatly appreciated!!

Sincerely,
Jaclyn

First 20MB of called loops missing for each chromosome.

Hi,

I am testing out Mustache for several hic samples and found when checking the loops on Juicer that there is no loops called for the first 20MB of all chromosomes. This is the case for all hic normalizations and all .hic files I've tried. I noticed the first segment of genome for each chromosome is 0 to 20000000, is this related?

Thanks

Below is output for chr1 for one of my samples.

The distance limit is set to 2000000bp
Reading contact map...
0 20000000
18000000 38000000
36000000 56000000
54000000 74000000
72000000 92000000
90000000 110000000
108000000 128000000
126000000 146000000
144000000 164000000
162000000 182000000
180000000 200000000
198000000 218000000
216000000 236000000
234000000 248956421
Normalizing contact map...
Loop calling...
Starting block 1/14...
Block 1 done.
Starting block 8/14...
Block 8 done.
Starting block 7/14...
Block 7 done.
Starting block 5/14...
Block 5 done.
Starting block 6/14...
Block 6 done.
Starting block 4/14...
Block 4 done.
Starting block 2/14...
Block 2 done.
Starting block 9/14...
Block 9 done.
Starting block 3/14...
Block 3 done.
Starting block 10/14...
Block 10 done.
Starting block 11/14...
Block 11 done.
Starting block 13/14...
Block 13 done.
Starting block 14/14...
Block 14 done.
Starting block 12/14...
Block 12 done.
1471 loops found for chrmosome=chr1, fdr<0.2 in 64.72sec

Mustache crashes on hic file

Hi, I'm trying to run mustache using a hic file given to me by a collaborator. I installed mustache using conda and was able to run the tests successfully. However, when I run my command (see below) I get the following error (see further below). I've had them re-generate the hi-c file and still get the same error. Any guidance would be greatly appreciated. Thank you!

Command:

mustache -p 64 -f mysample.hic -d 500000 -r 10000 -o mysample_mustache_test1 2>mysample_mustache_test1.err &

Error:

Traceback (most recent call last):
File "/home/cfeigin/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 396, in read_hic_file
temp = straw.straw("KR", f, str(chr1)+":"+str(int(start))+":"+str(int(end)), str(chr2)+":"+str(int(start))+":"+str(int(end)), "BP", res)
File "/home/cfeigin/.conda/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 511, in straw
myFilePos=list1[0]
TypeError: 'int' object is not subscriptable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cfeigin/.conda/envs/mustache/bin/mustache", line 8, in
sys.exit(main())
File "/home/cfeigin/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 1137, in main
o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
File "/home/cfeigin/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 948, in regulator
x, y, v = read_hic_file(f, norm_method, CHRM_SIZE, distance_in_bp, chromosome,chromosome2, res)
File "/home/cfeigin/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 426, in read_hic_file
temp = straw.straw("VC", f, str(chr1)+":"+str(int(start))+":"+str(int(end)), str(chr2)+":"+str(int(start))+":"+str(int(end)), "BP", res)
File "/home/cfeigin/.conda/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 511, in straw
myFilePos=list1[0]
TypeError: 'int' object is not subscriptable

Large loops

Hi,
I love mustache, it is very good at detecting loops. I have an issue with large loops. The detection limit is set to 4Mb, whereas I can see in my HiC map (2kb resolution) a number of loops which are larger than this. Is there a way to avoid the 4Mb size limit (let's say to put it to chromosome size)? Thanks for your answer! Peter

mustache crashing on human chrY in newly created .cool file

Hi, I'm running mustache how of a freshly made docker image where I installed only mustache using pip3. Mustache runs fine expect for chrY. Any chance you could fix that issue? It's a problem for what we try to achieve.

Here's a report with an example crashing.

Problem:
mustache is crashing when processing chrY

How to reproduce:

wget https://dovetail-public.s3-us-west-2.amazonaws.com/publicData/mustacheEx/test.cool
docker  run -v ${PWD}:/mnt -i mblanche/mustache mustache -p 24 -f /mnt/test.cool  -r 5kb -ch chrY -o test_chrY.tsv

Error generated:

The distance limit is set to 2Mbp
Reading contact map...
0 4000000.0
2000000.0 6000000.0
4000000.0 8000000.0
6000000.0 10000000.0
8000000.0 12000000.0
10000000.0 14000000.0
12000000.0 16000000.0
14000000.0 18000000.0
16000000.0 20000000.0
18000000.0 22000000.0
22000000.0 26000000.0
24000000.0 28000000.0
24000000.0 28000000.0
26000000.0 30000000.0
30000000.0 34000000.0
32000000.0 36000000.0
34000000.0 38000000.0
36000000.0 40000000.0
38000000.0 42000000.0
40000000.0 44000000.0
42000000.0 46000000.0
44000000.0 48000000.0
46000000.0 50000000.0
48000000.0 52000000.0
50000000.0 54000000.0
52000000.0 56000000.0
54000000.0 58000000.0
Traceback (most recent call last):
  File "/usr/local/bin/mustache", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 966, in main
    o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 795, in regulator
    x, y, v, res = read_cooler(f, distance_in_bp, chromosome,chromosome2)
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 396, in read_cooler
    temp = clr.matrix(balance=True,sparse=True).fetch( (chr1, int(start), int(end)))
  File "/usr/local/lib/python3.8/dist-packages/cooler/core.py", line 573, in fetch
    i0, i1, j0, j1 = self._fetch(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/cooler/api.py", line 384, in _fetch
    region1 = parse_region(region, self._chromsizes)
  File "/usr/local/lib/python3.8/dist-packages/cooler/util.py", line 181, in parse_region
    raise ValueError("Genomic region out of bounds: [{}, {})".format(start, end))
ValueError: Genomic region out of bounds: [54000000, 58000000)

Dockerfile use to generate the image container:

FROM ubuntu:20.04

USER root

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update -y\
    && DEBIAN_FRONTEND=noninteractive \
    && apt-get install -y \
    python3 \
    python3-pip \
    && pip3 install \
    mustache-hic \
    && apt-get clean \
    && apt-get purge \
    && rm -rf /var/lib/apt/lists/* /tmp/*

Differential option

Hi, can you explain how the differential loop calling works? Does it just call loops separately on both datasets or is there some merging procedure to make sure it doesnt call a loop in condition 1 and the same loop, two pixels, over in condition 2?

Additionally, does it account for read depth differences in the two Hi-C files? It should do this when calculating O/E, correct?

Strange error from diff-mustache

Hi, I am running mustache with the following parameters and I get this error:

Traceback (most recent call last):
File "/Genomics/grid/users/gdolsten/.conda/envs/mustache/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Genomics/grid/users/gdolsten/.conda/envs/mustache/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "./github_mustache/mustache/diff_mustache.py", line 1230, in process_block
loops1, diff_loops1, loops2, diff_loops2 = diff_mustache(
ValueError: not enough values to unpack (expected 4, got 0)

Do you know what the source of this is?

Different number of loops when I'm using diff_mustache.py

Hi!
I’m using the new differential tool detection of mustache and I have a question regarding the results.
I’m doing pairwise analysis of 3 samples : e.g A, B and C

When I’m doing A vs B and I'm obtaining 6,090 loops for sample A
But, when I’m doing the A vs C and I'm obtaining 7,548 loops for sample A

I would expect that the number of loops is the same, or maybe I’m missing something?

The parameters that I’m using are: -r 10000 -st 0.8 -norm KR -pt 0.05 -pt 2 0.1

TypeError on FitHiC input

Ran mustache on a sample using .hic format, worked fine for 5kb, 10kb, and 50kb resolution. The .hic file didn't have 25kb resolution, so I used HiC-Pro's hicpro2fithic.py to convert to FitHiC format. Error log follows:

/home/ochapman/miniconda3/envs/mustache/bin/mustache:8: DtypeWarning: Columns (0) have mixed types.Specify dtype option on import or set low_memory=False.
  sys.exit(main())


The distance limit is set to 2Mbp
Reading contact map...
Traceback (most recent call last):
  File "/home/ochapman/miniconda3/envs/mustache/bin/mustache", line 8, in <module>
    sys.exit(main())
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 904, in main
    o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 775, in regulator
    x, y, v = read_pd(f, distance_in_bp, bias, chromosome, res)
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 249, in read_pd
    df = df[np.vectorize(is_chr)(df[0], chromosome)]
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2091, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2161, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2121, in _get_ufunc_and_otypes
    outputs = func(*inputs)
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 185, in is_chr
    return str(c) in re.findall("[1-9][0-9]*", s)
  File "/home/ochapman/miniconda3/envs/mustache/lib/python3.8/re.py", line 239, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

Input files:

(base) [ochapman@comet-14-01 input_25000]$ head fithic.interactionCounts
1       12500   1       187500  2
1       12500   1       54462500        1
1       12500   2       112500  1
1       12500   2       113587500       1
1       12500   2       194012500       1
1       12500   3       192862500       1
1       12500   4       487500  1
1       12500   6       115062500       1
1       12500   7       73687500        1
1       12500   7       112387500       1
(base) [ochapman@comet-14-01 input_25000]$ head fithic.biases
1       12500   0.0443452857034
1       37500   0.0180297825985
1       62500   0.250643862343
1       87500   0.248515390376
1       112500  0.316972901384
1       137500  0.177332034246
1       162500  -1
1       187500  0.212476913977
1       212500  0.0426961031691
1       237500  -1

Interchromosomal analysis

Thank you for developing this great tool. I am currently doing interchromosomal analysis using a command:

$ mustache -f MiC-5_merged.mapped.gg7w.chr.contact_map.hic -ch chr14 -ch2 chr17 -r 10kb -o result_trans-loop_14+17_10kb.tsv

But it keeps showing:
NameError: name 'inter_normalize_map' is not defined

I tried different parameters and looked into the .py file:
else:
n1 = max(x) + 1
n2 = max(y) + 1
inter_normalize_map(v)

I also tried to use .cool file but it showed different errors...
I don't know how to resolve the issue. Please give me a hint.

Many thanks,

Ya-Chen


# the full output of using a .hic file is as the following:

$ mustache -f MiC-5_merged.mapped.gg7w.chr.contact_map.hic -ch chr14 -ch2 chr17 -r 10kb -o result_trans-loop_14+17_10kb.tsv

The distance limit is set to 2000000bp
Reading contact map...
0 15370873
HiC version: 8
Normalizing contact map...
Traceback (most recent call last):
File "/home/galaxy/anaconda2/envs/mustache/bin/mustache", line 8, in
sys.exit(main())
File "/home/galaxy/anaconda2/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 1137, in main
o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
File "/home/galaxy/anaconda2/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 1009, in regulator
inter_normalize_map(v)
NameError: name 'inter_normalize_map' is not defined


# the full output of using a .cool file is as the following:

$ mustache -f MiC-5_merged.mapped.gg6ens.matrix_1kb.cool -ch 14 -ch2 17 -r 10kb -p 8 -o mic_our_trans-loop_MiC-5_ch14+17_10kb_p0.1.tsv

The distance limit is set to 2000000bp
Reading contact map...
Your cooler data resolution is 1000
Traceback (most recent call last):
File "/home/galaxy/anaconda2/envs/mustache/bin/mustache", line 8, in
sys.exit(main())
File "/home/galaxy/anaconda2/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 1137, in main
o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
File "/home/galaxy/anaconda2/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 950, in regulator
x, y, v, res = read_cooler(f, distance_in_bp, chromosome,chromosome2, norm_method)
File "/home/galaxy/anaconda2/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 540, in read_cooler
result = clr.matrix(balance=True,sparse=True).fetch(chr1, chr2)
File "/home/galaxy/.local/lib/python3.8/site-packages/cooler/core.py", line 574, in fetch
return self._slice(self.field, i0, i1, j0, j1)
File "/home/galaxy/.local/lib/python3.8/site-packages/cooler/api.py", line 362, in _slice
return matrix(
File "/home/galaxy/.local/lib/python3.8/site-packages/cooler/api.py", line 684, in matrix
raise ValueError(
ValueError: No column 'bins/weight'found. Use cooler.balance_cooler to calculate balancing weights or set balance=False.

Running out of memory

Hi,
Thanks for the tool! I tried running it on a Micro-C dataset, at 500bp resolution, but run into problems with a huge amount of memory requested (#SBATCH --mem=240000 --cpus-per-task=24)
"""
Reading contact map...
Traceback (most recent call last):
File "./mustache/mustache/mustache.py", line 613, in
main()
File "./mustache/mustache/mustache.py", line 591, in main
o = regulator(f, args.outdir,
File "./mustache/mustache/mustache.py", line 507, in regulator
c, n = read_cooler(f, chromosome)
File "./mustache/mustache/mustache.py", line 252, in read_cooler
result = clr.matrix(balance=True).fetch(chr)
File "/Genomics/grid/users/xbing/miniconda3/envs/mustache/lib/python3.8/site-packages/cooler/core.py", line 574, i
n fetch
return self._slice(self.field, i0, i1, j0, j1)
File "/Genomics/grid/users/xbing/miniconda3/envs/mustache/lib/python3.8/site-packages/cooler/api.py", line 362, in
_slice
return matrix(
File "/Genomics/grid/users/xbing/miniconda3/envs/mustache/lib/python3.8/site-packages/cooler/api.py", line 746, in
matrix
arr = arr * np.outer(bias1, bias2)
File "<array_function internals>", line 5, in outer
File "/Genomics/grid/users/xbing/miniconda3/envs/mustache/lib/python3.8/site-packages/numpy/core/numeric.py", line
906, in outer
return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
MemoryError: Unable to allocate 412. GiB for an array with shape (235138, 235138) and data type float64
"""
What can you suggest I do?

"max() arg is an empty sequence" error

Hello,
We are trying the tool on some data in the lab using a *.hic file as the input. We ran it through all chromosomes (1-22), and all expect chr1 worked fine. We got the following error running it on chr1:


The distance limit is set to 2Mbp
Reading contact map...
HiC version: 8
Normalizing contact map...
Traceback (most recent call last):
File "/usr/local/bin/mustache", line 10, in
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/mustache/mustache.py", line 904, in main
octaves=args.octaves)
File "/usr/local/lib/python3.6/site-packages/mustache/mustache.py", line 768, in regulator
n = max(max(x), max(y)) + 1
ValueError: max() arg is an empty sequence

Any thoughts? Like I said, all the other chromosomes were fine, and this was specific to chr1. Thanks.
-Jesse

Diff_mustache error in regulator function

In the latest version of diff_mustache, I receive the following fatal error:

There is no contact in chrmosome 21 to work on.
Traceback (most recent call last):
  File "Tools/mustache/mustache/diff_mustache.py", line 1342, in <module>
    main()
  File "Tools/mustache/mustache/diff_mustache.py", line 1270, in main
    o = regulator(f1, f2, args.norm_method, CHRM_SIZE, args.outdir,
  File "Tools/mustache/mustache/diff_mustache.py", line 1044, in regulator
    x2, y2, v2 = read_hic_file(f2, norm_method, CHRM_SIZE, distance_in_bp, chromosome,chromosome2, res)
ValueError: too many values to unpack (expected 3)

I thought the issue might have be at line 474 where the function read_hic_file returns 4 values instead of 3 if len(val)==0

if len(val)==0:
        print(f'There is no contact in chrmosome {chr1} to work on.')
        return [],[],[],res
    else:
        val[np.isnan(val)] = 0

I tried manually removing ",res" from line 474. After this change, the error at line 1044 in regulator is fixed but I get the following error and the program does not complete the run:

There is no contact in chrmosome 21 to work on.
Normalizing contact map...
Traceback (most recent call last):
  File "Tools/mustache/mustache/diff_mustache.py", line 1342, in <module>
    main()
  File "Tools/mustache/mustache/diff_mustache.py", line 1270, in main
    o = regulator(f1, f2, args.norm_method, CHRM_SIZE, args.outdir,
  File "Tools/mustache/mustache/diff_mustache.py", line 1063, in regulator
    n2 = max(max(x2), max(y2)) + 1
ValueError: max() arg is an empty sequence

TypeError: nan_to_num() got an unexpected keyword argument 'neginf'

Hi there,
Trying to process fairly deep .hic data set. Got default test analysis to work but when submitting this command to my cluster the pipeline crashes:
mustache.py -f $(pwd)/E1.inter_1.hic -ch 1 -r 5kb -pt 0.01 -o $(pwd)/E1.VC_SQRT.q1.r10kb.p0.1.out.tsv >$(pwd)/mustache/E1/E1.VC_SQRT.q1.r10kb.p0.1.log 2>&1

The distance limit is set to 2Mbp
Reading contact map...
0 10000000
8000000 18000000
16000000 26000000
24000000 34000000
32000000 42000000
40000000 50000000
48000000 58000000
56000000 66000000
64000000 74000000
72000000 82000000
80000000 90000000
88000000 98000000
96000000 106000000
104000000 114000000
112000000 122000000
120000000 130000000
128000000 138000000
136000000 146000000
144000000 154000000
152000000 162000000
160000000 170000000
168000000 178000000
176000000 186000000
184000000 194000000
192000000 202000000
200000000 210000000
208000000 218000000
216000000 226000000
224000000 234000000
232000000 242000000
240000000 248956421
Normalizing contact map...
Traceback (most recent call last):
File "/home/frank/tools/mustache/mustache/mustache.py", line 1178, in
main()
File "/home/frank/tools/mustache/mustache/mustache.py", line 1148, in main
octaves=args.octaves)
File "/home/frank/tools/mustache/mustache/mustache.py", line 961, in regulator
pval_weights = normalize_sparse(x, y, v, res, distance_in_px)
File "/home/frank/tools/mustache/mustache/mustache.py", line 720, in normalize_sparse
neginf=std2, posinf=std2, nan=std2)
TypeError: nan_to_num() got an unexpected keyword argument 'neginf'

Any help would be greatly appreciated.

F

Mustache and ChIA-PET experiments

Thanks for the Mustache tool. It's extremely fast and well written.

I have a following question: can I use Mustache with ChIA-PET experiments? I tried to run it on CTCF-targeted ChIA-PET on GM12878 with the following parameters:

pip install mustache-hic cooler
wget https://data.4dnucleome.org/files-processed/4DNFIMH3J7RW/@@download/4DNFIMH3J7RW.mcool
cooler balance --nproc 16 4DNFIMH3J7RW.mcool::/resolutions/5000

mustache -f 4DNFIMH3J7RW.mcool -r 5kb -o mustache_output_5kb.tsv -ch chr1 -st 0.6 -pt 0.3 -p 16

but got only about 200 chromatin loops detected in chr1.

The HiCPeaks reports aprox. 2400 loops in chr1 while run:

pip install cooler statsmodels sklearn matplotlib hicpeaks
wget https://data.4dnucleome.org/files-processed/4DNFIMH3J7RW/@@download/4DNFIMH3J7RW.mcool
cooler balance --nproc 16 4DNFIMH3J7RW.mcool::/resolutions/5000
pyHICCUPS -O pyhiccups_5kb_chr1.tsv -p 4DNFIMH3J7RW.mcool::/resolutions/5000 --pw 4 --ww 7 --only-anchors --nproc 16 -C 1

Is such difference by design? Should use different parameters while analyzing ChIA-PET experiments?

Resolution of calling loops

Hello,
Thanks for developing this good tool to call chromatin loops. Our HiC sample just has ~150million valid pairs, do you think mustache can be used to call loops? and what resolution should be used?
Best,

[bug] analysis of front end of chr skipped at diffrent bin sizes

When I analyze juicer .hic files with vanilla settings at different resolutions, the beginning of the chromosome is not analyzed at 2000 fold resolution distance. 20kb rsolution skips the first 40MB even though it set -d 2000000 for each analysis.

image

/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 20000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r20k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 10000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r10k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 5000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r5k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 3000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r3k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 2000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r2k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes

differential loop analysis

Dear developers of Mustache,

thank you for this interesting tool. I would like to run differential analysis on 2 hic samples but I tend to get error:

$ mustache -f1 sample1.hic -f2 sample2.hic -r 25000 -pt 0.05 -pt2 0.05 -o output
usage: mustache [-h] [-f F_PATH] [-d DISTFILTER] -o OUTDIR -r RESOLUTION [-bed BED] [-m MAT] [-b BIASFILE] [-cz CHRSIZE_FILE] [-norm NORM_METHOD] [-st ST] [-pt PT]
                [-sz S_Z] [-oc OCTAVES] [-i S] [-p NPROCESSES] [-ch CHROMOSOME [CHROMOSOME ...]] [-ch2 CHROMOSOME2 [CHROMOSOME2 ...]] [-v VERBOSE]
mustache: error: argument -p/--processes: invalid int value: 't2'

Can you please help me to debug?

Thank you very much.

Best,
Tereza

Different output using hic format (v1.2.7)

Hi,

I'm running mustache v1.2.7 and I'm having a similar problem as the one in #26. I'm using the same dataset from 4DN (4DNFIPC7P27B.hic & 4DNFI9FVHJZQ.mcool) with the same options (for the sake of speed, I'm now using only chr18):

python -m mustache -f 4DNFIPC7P27B.hic -r 5kb -ch 18 -v VERBOSE -norm KR -pt 0.01
python -m mustache -f DNFI9FVHJZQ.mcool -r 5kb -ch chr18 -v VERBOSE -norm KR -pt 0.01

When using mustache v1.2.0, I'm getting the same results as mentioned in #26. That is, 836 loops using .mcool file; and 838 loops using .hic file. However, if I use the version 1.2.0 and the same commands, I get a different number of loops if the input is the .hic file: 722 loops (with the .mcool file I obtain the same number of loops as in v1.2.0: 836).

I tried converting the .hic file to .cool using hicConvertFormat from the HiCExplorer tool:

hicConvertFormat -m 4DNFIPC7P27B.hic -o 4DNFIPC7P27B_converted.cool --inputFormat hic --outputFormat cool -r 5000

And then I ran mustache v1.2.7 with the converted matrix (4DNFIPC7P27B_converted_5000.cool):

python -m mustache -f 4DNFIPC7P27B_converted_5000.cool -r 5kb -ch 18 -v VERBOSE -norm KR -pt 0.01

And I got 838 loops (that is, the same number of loops as with v1.2.0 and the .hic file).

So, I wonder if there may be a bug in the analysis of .hic files in this new version of mustache.

Many thanks!

Paula

How to compare loops of Hi-C from different conditions?

Hi:
Thanks for your great works! I am wondering how to compare the loops quantitatively?For example, if we got two Hi-C samples under different conditions.
Option 1: we can call loops for A and B Hi-C separately. Then we can merged the LoopA and LoopB (sometimes, loops results differ too much which is inconsistent with hic results in juicebox ). Nest step, how should we calculate the relative contact intensity for each loops in LoopA+LoopB. Just DEG analysis in RNA-Seq, can we identify those gain/loss loops in different conditions?
Option 2: we can merge the two hic files and call loops in an artificial merged hic samples. Then key question is also how to calculate the contact intensity for each loops and found those significantly differed?
In summary, how to call loops for two or more Hi-Cs and how to compare them?
Thank you for your time.
Best wishes!
Linhua

issue related to .hic with kr normalization

hi,

Thanks a lot for providing such a useful tool for detection of loops as well as diff-loops. I realized that the default normalization method for .hic file is KR. Are different samples (.hic files) with KR normalization comparable?

no module named 'straw'

Hi,

Recently I decided to use this package to identify chromatin loops, and when I run this command ( python3 ./mustache/mustache/mustache.py -f ./mustache/data/chr21_5kb.RAWobserved -b ./mustache/data/chr21_5kb.KRnorm -ch 21 -r 5kb -o chr21_out5.tsv -pt 0.1 -st 0.8 ), an error (no module named straw) appeared. Then I installed the hic-straw through pip, however, this error was not solved.

Could you please tell me how to solve this issue?

Error: Couldn't find specified contact files

Hi,

I ran the following code:
mustache -f raw contatc map path -m raw matrix from HiC-Pro -norm ICE -bed raw bias file from HiC-Pro

I got the Error: Couldn't find specified contact files

do you know how to deal with this issue?
Best,

Out dated conda environment file

Hi mustache team,

I noticed in a previous issue. People complained about No module named 'straw'. It was because they installed newer version of hic-straw (typically greater than 1.3) which changed import straw to import hic-straw. You have quickly fixed the issue by changing the python import code. However, in your environment.yml, it still requires hic-straw==0.6.0 and only import straw works. Would you please update your environment file so that when people do conda install, they don't run into ModuleNotFoundError?

Different outputs using cooler and hic inputs with the same option.

Hi:
Thanks for this great tool. We have tested mustache.py for a hic from plants. We adjusted distance by adding "distFilter = 3000000" at line 883 based on #10. Here we got a hic file called Ctrl.hic. I tested mustache from two different input formats: hic and cooler. cooler file was generated by hic2cool and balanced by cooler balance. However, results differ a lot as shown below. Using hic file, we got 269 loops, While using cooler file, we got 29 loops.

The detailed process:

hic2cool convert -r 10000 Ctrl.hic temp_Ctrl_10K.cool
cooler balance temp_Ctrl_10K.cool

mustache.py -f temp_Ctrl_10K.cool -ch 1 -r 10kb -o TEST_Ctrl_cooler_outK10_Chr1.tsv -d 10000000

############
10000000
############
Reading contact map...
0 30427671
10000000.0 30427671
20000000.0 30427671
30000000.0 30427671
16.04 seconds to read the contact map
Normalizing the contact map...
5.92 seconds to run the sparse normalization
Loop calling...
Starting block 1/3...
Starting block 2/3...
Starting block 3/3...
Block 1 done.
Block 2 done.
Block 3 done.
29 loops found for chrmosome=1, fdr<0.2 in 30.57sec

mustache.py -f Ctrl.hic -ch 1 -r 10kb -o TEST_Ctrl_hic_outK10_Chr1.tsv -d 10000000

############
10000000
############
Reading contact map...
HiC version:  8
8.47 seconds to read the contact map
Normalizing the contact map...
6.53 seconds to run the sparse normalization
Loop calling...
Starting block 1/3...
Starting block 2/3...
Starting block 3/3...
Block 1 done.
Block 2 done.
Block 3 done.
269 loops found for chrmosome=1, fdr<0.2 in 26.19sec

Thanks again!
Linhua

Mustache crashes on chrY using mcool files (cool files are fine now!)

Real sorry, keep finding small bugs... I guess you'll need to make the changes you made for the cool file also work for mcool file

Problem
Mustache crashes with IndexError: list index out of range error when processing chrY in .mcool file (similar to the previous issue #17 )

How to reproduce it:
Download test data set and pull latest docker image:

wget https://dovetail-public.s3-us-west-2.amazonaws.com/publicData/mustacheEx/test_small.mcool
docker pull mblanche/mustache

Run mustache 1.1.2 from Docker image:

docker run -i -v ${PWD}:/mnt mblanche/mustache mustache -p 24 -f /mnt/test_small.mcool -o test.tsv -ch chrY -r 1000

Error Produced:

The distance limit is set to 2000000bp
Reading contact map...
0 4000000
2000000 6000000
4000000 8000000
6000000 10000000
8000000 12000000
10000000 14000000
12000000 16000000
14000000 18000000
16000000 20000000
18000000 22000000
20000000 24000000
22000000 26000000
24000000 28000000
26000000 30000000
28000000 32000000
30000000 34000000
32000000 36000000
34000000 38000000
36000000 40000000
38000000 42000000
40000000 44000000
42000000 46000000
44000000 48000000
46000000 50000000
48000000 52000000
50000000 54000000
52000000 56000000
54000000 57227414
Traceback (most recent call last):
  File "/usr/local/bin/mustache", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 1112, in main
    o = regulator(f, args.norm_method, args.cooler_do_balance, CHRM_SIZE, args.outdir,
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 930, in regulator
    x, y, v = read_mcooler(f, distance_in_bp, chromosome,chromosome2, res, cooler_do_balance)
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 623, in read_mcooler
    x = np.array(result[0])
IndexError: list index out of range

Expected behavior:
Mustache should complete with exit 0 producing a .tsv file with loops or only header if no loops are found

Diff_mustache vs regular mustache

Will diff_mustache identify the exact same loops as regular mustache on the two separate data sets, or will diff mustache detect new loops that regular mustache cannot?

diff_mustache.py crashing on input file format

I'm trying to run diff_mustache.py on two hic files with the example command line, and getting the following error:

The distance limit is set to 2000000bp
Traceback (most recent call last):
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 1438, in <module>
    main()
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 1323, in main
    chrs, resolutions, masterindex, genome, metadata = read_header(hic)
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 277, in read_header
    key = readcstr(req)
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 237, in readcstr
    return buf.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 0: invalid start byte

Looks like the hic format is wrong. I tried to use the allValidPairs output of HiC-Pro, converted with Juicer. Should I use something else?

Genomic region out of bounds

Hi,

I'm trying to use mustache to call for significant loops on a ICE balanced contact matrix from a capture HiC experiment targeting a 3Mb locus. I installed mustache throught conda. Here's the command I used :

mustache -f capHiC_iced_5kb.cool -ch chrX -r 5kb -pt 0.05 -cz hg38chrom.sizes -o out_capHiC_iced_5kb.tsv

and the error message I get :

The distance limit is set to 2Mbp
Reading contact map...
8000000 18000000
16000000 26000000
24000000 34000000
32000000 42000000
40000000 50000000
48000000 58000000
56000000 66000000
64000000 74000000
64000000 74000000
72000000 82000000
88000000 98000000
96000000 106000000
104000000 114000000
112000000 122000000
120000000 130000000
128000000 138000000
136000000 146000000
144000000 154000000
152000000 162000000
Traceback (most recent call last):
File "/home/emmanuel/miniconda3/envs/mustache/bin/mustache", line 8, in
sys.exit(main())
File "/home/emmanuel/miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 966, in main
o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
File "/home/emmanuel/miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 795, in regulator
x, y, v, res = read_cooler(f, distance_in_bp, chromosome,chromosome2)
File "/home/emmanuel/miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 396, in read_cooler
temp = clr.matrix(balance=True,sparse=True).fetch( (chr1, int(start), int(end)))
File "/home/emmanuel/miniconda3/envs/mustache/lib/python3.8/site-packages/cooler/core.py", line 573, in fetch
i0, i1, j0, j1 = self._fetch(*args, **kwargs)
File "/home/emmanuel/miniconda3/envs/mustache/lib/python3.8/site-packages/cooler/api.py", line 384, in _fetch
region1 = parse_region(region, self._chromsizes)
File "/home/emmanuel/miniconda3/envs/mustache/lib/python3.8/site-packages/cooler/util.py", line 181, in parse_region
raise ValueError("Genomic region out of bounds: [{}, {})".format(start, end))
ValueError: Genomic region out of bounds: [152000000, 162000000)

The way I understand it, is that mustache finds X-linked data beyond the annotated size of the chrX (156040895 bp) but I check with cooler dump there is contact frequencies for my captured locus and only for this locus. Do you have a more precise idea of what's going on and how to solve it ?

Cheers,

Emmanuel

-ch parameter

Is there any way to run it for all chromosomes? What inputs can one give to this parameter?

how to generate the z-scaore?

by the follwing command, the figure I generate has no z-score. wonder how to calculate the z-score or something I miss?
eg. mustache -f A.hic -r 500kb -pt 0.1 -st 0.3 -norm KR -o A.tsv

mustache.py does not take .hic file as input

I'm trying to run mustache.py with my .hic files, but I am getting the following error:

mustache.py  -f ./micro-c/hic_out_test_length/hic_results/50bp_allValidPairs.hic -ch 1 -r 5kb -o chr1_out5.tsv -pt 0.1 -st 0.8

The distance limit is set to 2Mbp

Reading contact map...
Traceback (most recent call last):
  File "./mustache/mustache/mustache.py", line 1178, in <module>
    main()
  File "./mustache/mustache/mustache.py", line 1135, in main
    o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
  File "./mustache/mustache/mustache.py", line 946, in regulator
    x, y, v = read_hic_file(f, norm_method, CHRM_SIZE, distance_in_bp, chromosome,chromosome2, res)
  File "./mustache/mustache/mustache.py", line 378, in read_hic_file
    chrs, resolutions, masterindex, genome, metadata = read_header(hic)
  File "./mustache/mustache/mustache.py", line 250, in read_header
    key = readcstr(req)
  File "./mustache/mustache/mustache.py", line 210, in readcstr
    return buf.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte

I tried my .hic files from the allValidPairs.file from HiC-Pro, converted with Juicer Tools Version 2.13.06.

I have no problem using the /mustache/data/chr21_5kb.RAWobserved file and the .hic format file for HFFc6 Micro-C from [4D Nucleome Data Portal 4DNFIPC7P27B.hic.

What could I do?

Weight Column Error For Cooler Matrices

Hi,

I'm trying to use mustache to call loops using cooler matrices, and I'm getting an error message.

File "/projects/ps-yeolab4/t_cell_p01/home/sjquon/mustache/mustache/mustache.py", line 1000, in
main()
File "/projects/ps-yeolab4/t_cell_p01/home/sjquon/mustache/mustache/mustache.py", line 966, in main
o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
File "/projects/ps-yeolab4/t_cell_p01/home/sjquon/mustache/mustache/mustache.py", line 795, in regulator
x, y, v, res = read_cooler(f, distance_in_bp, chromosome,chromosome2)
File "/projects/ps-yeolab4/t_cell_p01/home/sjquon/mustache/mustache/mustache.py", line 396, in read_cooler
temp = clr.matrix(balance=True,sparse=True).fetch( (chr1, int(start), int(end)))
File "/home/sjquon/anaconda3/envs/mustache/lib/python3.8/site-packages/cooler/core.py", line 574, in fetch
return self._slice(self.field, i0, i1, j0, j1)
File "/home/sjquon/anaconda3/envs/mustache/lib/python3.8/site-packages/cooler/api.py", line 362, in _slice
return matrix(
File "/home/sjquon/anaconda3/envs/mustache/lib/python3.8/site-packages/cooler/api.py", line 684, in matrix
raise ValueError(
ValueError: No column 'bins/weight'found. Use cooler.balance_cooler to calculate balancing weights or set balance=False.

These matrices are already normalized, but they were converted post normalization to a cooler format, so I don't want to add another normalization on top of it to add the column. Is there an argument in the mustache command to set the balance to false?

TypeError: 'int' object is not subscriptable

Hi,
I am running Mustache:
$ mustache -f inter_1to10kb.hic -r 10kb -st 0.7 -pt 0.1 -o10kb_loops.tsv

....after running smoothly over the chromosomes, I am getting an error. I am still getting some loops called, but it seems incomplete
......
Reading contact map...
0 32268
HiC version: 8
Normalizing contact map...
Loop calling...
Starting block 1/1...
Block 1 done.
0 loops found for chrmosome=HIC_SCAFFOLD_74, fdr<0.1 in 108.90sec
Reading contact map...
0 6124
HiC version: 8
File doesn't have the given chr_chr map

File did not contain KR normalization vectors for one or both chromosomes at 10000 BP

HiC version: 8
File doesn't have the given chr_chr map

File did not contain VC normalization vectors for one or both chromosomes at 10000 BP

Traceback (most recent call last):
File "/home/pavlan/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 396, in read_hic_file
temp = straw.straw("KR", f, str(chr1)+":"+str(int(start))+":"+str(int(end)), str(chr2)+":"+str(int(start))+":"+str(int(end)), "BP", res)
File "/home/pavlan/.conda/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 511, in straw
myFilePos=list1[0]
TypeError: 'int' object is not subscriptable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/pavlan/.conda/envs/mustache/bin/mustache", line 8, in
sys.exit(main())
File "/home/pavlan/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 1137, in main
o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
File "/home/pavlan/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 948, in regulator
x, y, v = read_hic_file(f, norm_method, CHRM_SIZE, distance_in_bp, chromosome,chromosome2, res)
File "/home/pavlan/.conda/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 426, in read_hic_file
temp = straw.straw("VC", f, str(chr1)+":"+str(int(start))+":"+str(int(end)), str(chr2)+":"+str(int(start))+":"+str(int(end)), "BP", res)
File "/home/pavlan/.conda/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 511, in straw
myFilePos=list1[0]
TypeError: 'int' object is not subscriptable

issue using diff_mustache with .mcools

Thank you for a great software.

I am unable to run diff_mustache.py with .mcool files as input (no problem running mustache.py on the same files)

python3 ./mustache/mustache/diff_mustache.py -f1 file1.mcool -f2 file2.mcool -pt 0.1 -pt2 0.1 -o output -r 10000 -st 0.8

gives me

Reading contact map...
Traceback (most recent call last):
  File "./mustache/diff_mustache.py", line 910, in <module>
    main()
  File "./mustache/diff_mustache.py", line 838, in main
    o = regulator(f1, f2, args.norm_method, CHRM_SIZE, args.outdir,
  File "./mustache/diff_mustache.py", line 607, in regulator
    x1, y1, v1 = read_mcooler(f1, distance_in_bp, chromosome,chromosome2, res, norm_method)
NameError: name 'read_mcooler' is not defined

do I need to declare the .mcool files differently?
Many thanks in advance!

Different output using hic and cool format

Hi,
I'm using mustache (v1.2.0) to call HFFc6 Micro-C loops and I ran into the same issue mentioned in #20. I downloaded the updated hic and cool data from 4DN(4DNFIPC7P27B.hic & 4DNFI9FVHJZQ.mcool). I used the same options (-r 5kb -pt 0.01) but then it reported 28,006 and 36,553 loops from .mcool and .hic file, respectively. As mentioned in https://data.4dnucleome.org/resources/data-analysis/hi_c-processing-pipeline, the two files are generated from the same pairs file as input filtered contact list.

So which loop list should I choose?

Thank you so much for your help!

generate this errors, dont know if it's my .hic file has some problem

the test data is okay
for my own .hic data this is what i got

The distance limit is set to 2000000bp
Reading contact map...
0 4000000
HiC version: 8
HiC version: 8
Traceback (most recent call last):
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 396, in read_hic_file
temp = straw.straw("KR", f, str(chr1)+":"+str(int(start))+":"+str(int(end)), str(chr2)+":"+str(int(start))+":"+str(int(end)), "BP", res)
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 510, in straw
list1 = readFooter(req, c1, c2, norm, unit, binsize)
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 158, in readFooter
nExpectedValues = struct.unpack('<i',req.read(4))[0]
struct.error: unpack requires a buffer of 4 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/snow/anaconda3/envs/mustache/bin/mustache", line 11, in
sys.exit(main())
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 1137, in main
o = regulator(f, args.norm_method, CHRM_SIZE, args.outdir,
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 948, in regulator
x, y, v = read_hic_file(f, norm_method, CHRM_SIZE, distance_in_bp, chromosome,chromosome2, res)
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py", line 426, in read_hic_file
temp = straw.straw("VC", f, str(chr1)+":"+str(int(start))+":"+str(int(end)), str(chr2)+":"+str(int(start))+":"+str(int(end)), "BP", res)
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 510, in straw
list1 = readFooter(req, c1, c2, norm, unit, binsize)
File "/home/snow/anaconda3/envs/mustache/lib/python3.8/site-packages/straw/straw.py", line 158, in readFooter
nExpectedValues = struct.unpack('<i',req.read(4))[0]
struct.error: unpack requires a buffer of 4 bytes

How to adjust genomic distance by -d option?

Hi, thanks for the great tools! I tried mustache to call loops from high resolution Hi-C datasets of a species with a compact genome (~ 180 Mb). My loops of interest are relative long (1M - 10M).Since limited loops can be found near the diagonal, I am more interested in long-range chromatin interactions. So I want to extend the distance of two loci to be larger. Mustache can found loops < 2 Mb. I changed the -d option. But it always return with 2Mb. I also checked the code. I don't understand why the distance filter are designed in this way?

    distFilter = parseBP(args.distFilter)#change
    if not distFilter: 
        if 200*res >= 2000000:
            distFilter = 200*res
            print("The distance limit is set to {}bp".format(200*res))
        elif 2000*res <= 2000000:
            distFilter = 2000*res
            print("The distance limit is set to {}bp".format(2000*res))
        else:
            distFilter = 2000000
            print("The distance limit is set to 2Mbp")			
    elif distFilter < 200*res:
        print("The distance limit is set to {}bp".format(200*res))
        distFilter = 200*res
    elif distFilter > 2000*res:
        print("The distance limit is set to {}bp".format(2000*res))
        distFilter = 2000*res
    elif distFilter > 2000000:
        distFilter = 2000000
        print("The distance limit is set to 2Mbp")

Parameters

Thank you for putting out this cool tool! In your pre-print it was mentioned that:

5kb resolution ... we employed Mustache using two octaves with default parameters (σ0 = 1.6 and s = 10)

But no explicit statement was made about the parameters applied to 10kb resolution data. Could you please share if you used the same ones? Also, several parameters were said to be 'experimentally chosen for 5Kb resolution', would it be possible to learn some more detail about the process to potentially replicate the results?

Set distance filter

It looks like Mustache has a parameter to change the maximum allowable distance between loops. This is not on the github README but it is on the command line help summary:

-d DISTFILTER, --distance DISTFILTER
REQUIRED: Maximum distance (in bp) allowed between loop loci

Based on Mustache output, I don't think the -d parameter is working...
Input:
mustache -f ./MB268.allValidPairs.hic -r 10kb -o MB268-mustache-10kb.hg38.bedpe -ch 1 -d 11000000
Output:

The distance limit is set to 2Mbp
Reading contact map...
HiC version:  8
Normalizing contact map...

Calculation of distance limit?

Since I specified -d 250000000, I can see that the actual max distance metric is 2000 fold higher than my bin size (6Mb, 10Mb, 20Mb for 3k, 5k 10k bin size). Is that hard coded or does that take available RAM into consideration? I am evaluating mustache for shorter loops anyways, so no biggie, just curious.

resolution

hi
your method looks useful. but i have a question aboult the resolution whether be set 500bp when i run mustache with hic data that is 1kb resolution?

i will appreciate if you can reply me quickly

mustache crashing on normalization step for small .cool files

Hi, new problem found that makes using mustache in a production environment challenging. This is more a request than a bug report I think. Anyhow, would help us tremendously if this would be addressed.

thanks a lot

Problem:
mustache do not terminate gracefully on small .cool file during normalization step

Requested behavior:
It would be optimal for mustache to gracefully exit with code 0 but produce some messages on the stderr reporting it's inability to finish normalization. Additionally, a .tsv file with no reported loop would be produce.

How to replicate:

Download test data and pull updated docker image:

wget https://dovetail-public.s3-us-west-2.amazonaws.com/publicData/mustacheEx/test_large.cool
wget https://dovetail-public.s3-us-west-2.amazonaws.com/publicData/mustacheEx/test_small.cool
docker pull mblanche/mustache

Running on large cool file terminates gracefully:

$ docker  run -v ${PWD}:/mnt -i mblanche/mustache mustache -p 24 -f /mnt/test_large.cool  -r 5kb -ch chr21 -o test_chr21_large.tsv


The distance limit is set to 2Mbp
Reading contact map...
0 4000000
2000000 6000000
4000000 8000000
...
Block 19 done.
Starting block 15/22...
Block 15 done.
0 loops found for chrmosome=chr21, fdr<0.2 in 90.09sec

Running on smaller subset cool file crashes at the normalization step and do not exit gracefully:

$ docker  run -v ${PWD}:/mnt -i mblanche/mustache mustache -p 24 -f /mnt/test_small.cool  -r 5kb -ch chr21 -o test_chr21_small.tsv


The distance limit is set to 2Mbp
Reading contact map...
0 4000000
2000000 6000000
4000000 8000000
6000000 10000000
8000000 12000000
10000000 14000000
12000000 16000000
14000000 18000000
16000000 20000000
18000000 22000000
20000000 24000000
22000000 26000000
24000000 28000000
26000000 30000000
28000000 32000000
30000000 34000000
32000000 36000000
34000000 38000000
36000000 40000000
38000000 42000000
40000000 44000000
42000000 46000000
44000000 46709982
Normalizing contact map...
Traceback (most recent call last):
  File "/usr/local/bin/mustache", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 1078, in main
    o = regulator(f, args.norm_method, args.cooler_do_balance, CHRM_SIZE, args.outdir,
  File "/usr/local/lib/python3.8/dist-packages/mustache/mustache.py", line 906, in regulator
    n = max(max(x), max(y)) + 1
ValueError: max() arg is an empty sequence

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.