xiaotaowang / hicpeaks

A Python implementation for BH-FDR and HiCCUPS

License: GNU General Public License v3.0

Python 100.00%

python hi-c chromatin peaks loops contact-matrix bioinformatics genomics cooler

hicpeaks's Issues

Calling peaks genome-wide

Hi,
Is there a way to call peaks genome-wide rather than chromosome by chromosome? I guess I could make multiple chrxx_chrxx.txt files and then concatenate all the calls, but I was hoping for a more streamlined way of doing this.

toCooler

Hi,
Thank you for your work.
When I use toCooler to convert the matrix obtained from HiC-Pro into cool format, an error occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/toCooler", line 131, in <module>
    run()
  File "/opt/conda/bin/toCooler", line 112, in run
    from hicpeaks.utilities import Genome, balance
  File "/opt/conda/lib/python2.7/site-packages/hicpeaks/utilities.py", line 13, in <module>
    from cooler.io import create, parse_cooler_uri, CoolerMerger
ImportError: cannot import name parse_cooler_uri

So I edited utilities.py as follows:

from cooler.util import binnify, parse_cooler_uri
from cooler.io import create
from cooler.reduce import CoolerMerger

With this change the import succeeded. Please check and confirm.

ValueError: Offset 1687 (index 1687) out of bounds

In pyHICCUPS, I get the following error:

ValueError: Offset 1687 (index 1687) out of bounds

It occurs in sparse.diags in the following code chunk (around line 135). Do you know what causes it?

H = Lib.matrix(balance=False, sparse=True).fetch(key)
cHeatMap = Lib.matrix(balance=True, sparse=True).fetch(key)
# Customize Sparse Matrix ...
chromLen = H.shape[0]
num = args.maxapart // resolution + args.maxww + 1
Diags = [H.diagonal(i) for i in np.arange(num)]
M = sparse.diags(Diags, np.arange(num), format='csr')
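
A possible cause, offered as a guess rather than a confirmed diagnosis: sparse.diags raises this error when an offset is larger than the matrix dimension, which would happen here if the chromosome has fewer bins than num. The sketch below caps the number of diagonals at the matrix size. banded_upper is a hypothetical helper name used only for illustration; it is not part of hicpeaks.

```python
import numpy as np
from scipy import sparse


def banded_upper(H, num):
    """Keep only the first `num` upper diagonals of a square sparse matrix H."""
    # Assumption: capping at the matrix size avoids requesting offsets
    # that do not exist on a short chromosome.
    num = min(num, H.shape[0])
    offsets = np.arange(num)
    diags = [H.diagonal(int(i)) for i in offsets]
    return sparse.diags(diags, offsets, format='csr')


# Toy example: a 5-bin "chromosome" with a requested band of 8 diagonals.
H = sparse.csr_matrix(np.arange(1, 26, dtype=float).reshape(5, 5))
M = banded_upper(H, 8)
```

In the snippet from pyHICCUPS, the equivalent change would be to replace num with min(num, chromLen) before building Diags, if the diagnosis above is right.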

Inter-chromosome peaks/loops

Hi XiaoTao,

It looks like your pyHICCUPS implementation does not support inter-chromosomal loops. Is there a quick fix to the code to output inter-chromosomal loops, or have you already implemented it?

Thanks!

pyHICCUPS error?

Hello,
When I run the provided example to call loops with the command below:
python ../scripts/pyHICCUPS -O K562-MboI-HICCUPS-loops.txt -p K562-MboI-parts.cool::40000 --pw 1 --ww 3
an error occurred:

root INFO @ 09/03/18 13:59:59: Loading Hi-C data ...
root INFO @ 09/03/18 13:59:59: Calling Peaks ...
root INFO @ 09/03/18 13:59:59: Chromosome 21 ...
Traceback (most recent call last):
  File "../scripts/pyHICCUPS", line 522, in <module>
    run()
  File "../scripts/pyHICCUPS", line 181, in run
    results = map_(worker, Params)
  File "../scripts/pyHICCUPS", line 130, in worker
    Diags = [H.diagonal(i) for i in np.arange(num)]
TypeError: diagonal() takes exactly 1 argument (2 given)

Could you help me?
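
One possible explanation, stated as an assumption: the TypeError suggests that the installed SciPy is an older release whose sparse diagonal() method does not accept an offset (as far as I know, the k argument was added in SciPy 1.0), so upgrading SciPy may be enough. If upgrading is not an option, a fallback along these lines could be used. kth_diagonal and diagonal_from_coo are hypothetical helper names, not functions from hicpeaks.

```python
import numpy as np
from scipy import sparse


def diagonal_from_coo(H, k):
    """k-th upper diagonal (k >= 0) of a square sparse matrix, via COO."""
    coo = H.tocoo()
    out = np.zeros(max(H.shape[0] - k, 0), dtype=coo.dtype)
    # Entries on the k-th diagonal satisfy col - row == k.
    mask = (coo.col - coo.row) == k
    # add.at sums duplicate entries instead of overwriting them.
    np.add.at(out, coo.row[mask], coo.data[mask])
    return out


def kth_diagonal(H, k):
    """Use H.diagonal(k) when available, else fall back to the COO version."""
    try:
        return H.diagonal(k)
    except TypeError:
        # Assumed to be an older SciPy where diagonal() takes no offset.
        return diagonal_from_coo(H, k)


H = sparse.csr_matrix(np.arange(1, 17, dtype=float).reshape(4, 4))
diags = [kth_diagonal(H, k) for k in range(4)]
```

With such a helper, the failing line would become a list comprehension over kth_diagonal(H, i) instead of H.diagonal(i).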

Output coordinates question

Hi, thank you very much for this tool, I have tried to use it and was pleasantly surprised - very easy to use and fast!

However, I have a small question about the output coordinates of pyHICCUPS. What do they correspond to? What is the difference between loc_1 and centroid_x? Sometimes they are the same, and sometimes they are not. And how is radius determined?

Thank you,
Ilya

toCooler error because of input txt file

Hello Xiaotao,
I am a postdoc at HZAU and am currently learning Hi-C data analysis. Recently I have been using HiCPeaks to convert the raw matrix generated by HiC-Pro into a cool file, and I have run into some problems that I can't solve.
Following your guidelines, I tried to extract the interaction data for chr01 from the raw matrix HPC9_150000.matrix. According to the file HPC9_150000_abs.bed, chr01 is binned into 754 windows, so I generated a file with the command
awk '$1<=754&&$2<=754{print}' HPC9_150000.matrix >1_1.txt

head -5 HPC9_150000.matrix
1 1 1599
1 2 577
1 3 117
1 4 103
1 5 68

head -5 HPC9_150000_abs.bed
Chr01 0 150000 1
Chr01 150000 300000 2
Chr01 300000 450000 3
Chr01 450000 600000 4
Chr01 600000 750000 5

Then I ran toCooler with the command
toCooler -O HPC9_1.cool -d datasets --nproc 1 --chromsizes-file Ga_1.chromsizes &
It failed with the error "IndexError: index 754 is out of bounds for axis 0 with size 754":

  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/hicpeaks-0.3.4-py3.6.egg/EGG-INFO/scripts/toCooler", line 128, in run
    balance(cooler_uri, nproc=args.nproc)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/hicpeaks-0.3.4-py3.6.egg/hicpeaks/utilities.py", line 417, in balance
    map=map_)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/balance.py", line 332, in balance_cooler
    .reduce(add, np.zeros(n_bins))
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/tools.py", line 244, in reduce
    return reduce(binop, iter(self.run()), init)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/tools.py", line 54, in apply_pipeline
    data = func(chunk, data)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/balance.py", line 46, in _zero_trans
    mask = chrom_ids[pixels['bin1_id']] != chrom_ids[pixels['bin2_id']]
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2149, in __getitem__
    values=self._codes[key], dtype=self.dtype, fastpath=True
IndexError: index 754 is out of bounds for axis 0 with size 754

I noticed that the values in the first two columns of the input file 1_1.txt apparently have to be smaller than the number of bins for the chromosome (754), rather than equal to or larger than it.

When I tried to analyze chr02, I used the command
awk '$1>=755&&$1<=1415&&$2>=755&&$2<=1415{print}' HPC9_150000.matrix >2_2.txt
I replaced 1_1.txt with 2_2.txt in the ./150K/ directory, and it produced a similar error: "IndexError: index 755 is out of bounds for axis 0 with size 661", where 661 is the number of bins in chr02.
How should I prepare the input file correctly?

By the way, should I prepare the chr_chr.txt files for all the chromosomes one by one?
Should I put all these chr_chr.txt files in the same ./150K/ directory?

I hope you can reply. Thank you so much!
You can also reply by email if that is more convenient.

Best wishes.
Pengcheng
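
The two IndexErrors above suggest (this is an inference from the tracebacks, not something confirmed by the maintainer) that toCooler expects bin indices that are 0-based and counted from the start of each chromosome, whereas HiC-Pro writes 1-based, genome-wide bin ids. Under that assumption, a conversion could be sketched as follows. read_bin_table and split_intra are hypothetical helper names, and the input layouts are taken from the head output shown in this issue.

```python
from collections import defaultdict


def read_bin_table(bed_lines):
    """Map each genome-wide HiC-Pro bin id to (chrom, 0-based index in chrom)."""
    first_bin = {}   # chrom -> smallest genome-wide bin id on that chrom
    bin_chrom = {}   # genome-wide bin id -> chrom
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        chrom, bin_id = fields[0], int(fields[3])
        bin_chrom[bin_id] = chrom
        first_bin[chrom] = min(bin_id, first_bin.get(chrom, bin_id))
    return {b: (c, b - first_bin[c]) for b, c in bin_chrom.items()}


def split_intra(bed_lines, matrix_lines):
    """Group intra-chromosomal contacts by chromosome with rebased indices."""
    lookup = read_bin_table(bed_lines)
    per_chrom = defaultdict(list)
    for line in matrix_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        chrom1, i = lookup[int(fields[0])]
        chrom2, j = lookup[int(fields[1])]
        if chrom1 == chrom2:
            # The count is kept as text so its original format is preserved.
            per_chrom[chrom1].append((i, j, fields[2]))
    return dict(per_chrom)
```

Each list returned by split_intra could then be written as three tab-separated columns to the file for that chromosome (1_1.txt, 2_2.txt, and so on), which would also avoid running a separate awk command per chromosome.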

pyBHFDR can't find weight column

pyBHFDR -O K562-MboI-BHFDR-loops.txt -p Ga.40000.a.cool::40000 -C 4 --pw 1 --ww 3

root INFO @ 10/08/23 15:58:09: Python Version: 3.6.15
root INFO @ 10/08/23 15:58:09:

ARGUMENT LIST:

Output file = K562-MboI-BHFDR-loops.txt

Cooler URI = Ga.40000.cool::40000

Chromosomes = ['4']

Peak window width = 1

Donut width = 3

Maximum donut width = 10

Significant Level = 0.05

Maximum Genomic distance = 2000000

Weight column name = weight

Number of Processes = 1

root INFO @ 10/08/23 15:58:10: Loading Hi-C data ...
root INFO @ 10/08/23 15:58:10: Calling Peaks ...
Traceback (most recent call last):
  File "/data/Software/miniconda3/envs/TADlib/bin/pyBHFDR", line 185, in <module>
    run()
  File "/data/Software/miniconda3/envs/TADlib/bin/pyBHFDR", line 169, in run
    for key, pixel_table in results:
  File "/data/Software/miniconda3/envs/TADlib/bin/pyBHFDR", line 116, in worker
    cHeatMap = Lib.matrix(balance=args.clr_weight_name, sparse=True).fetch(key)
  File "/data/Software/miniconda3/envs/TADlib/lib/python3.6/site-packages/cooler/core/_selectors.py", line 150, in fetch
    return self._slice(self.field, i0, i1, j0, j1)
  File "/data/Software/miniconda3/envs/TADlib/lib/python3.6/site-packages/cooler/api.py", line 384, in _slice
    self._is_symm_upper,
  File "/data/Software/miniconda3/envs/TADlib/lib/python3.6/site-packages/cooler/api.py", line 710, in matrix
    + "calculate balancing weights or set balance=False."
ValueError: No column 'bins/weight'found. Use cooler.balance_cooler to calculate balancing weights or set balance=False.

toCool

toCooler -O Ga.40000.a.cool -d datasets --chromsizes-file aa.leng --no-balance --nproc 1

root INFO @ 10/08/23 16:06:11: Python Version: 3.6.15
root INFO @ 10/08/23 16:06:11:

ARGUMENT LIST:

Output cooler path = Ga.40000.a.cool

Hi-C datasets = {40000: '/gpfs/Project/wangzw_Project/TDA_analysis/test/HiCPeaks/40K'}

Chromosomes = ['#', 'X']

Include trans-chromosomal data = False

Genome Assembly = None

Chromosome size file = aa.leng

Number of processes = 1

Log file name = tocooler.log

hicpeaks.utilities INFO @ 10/08/23 16:06:12: Read chromosome sizes from /gpfs/Project/wangzw_Project/TDA_analysis/test/HiCPeaks/aa.leng
hicpeaks.utilities INFO @ 10/08/23 16:06:12: Done
hicpeaks.utilities INFO @ 10/08/23 16:06:12: Extract and save data into cooler format for each resolution ...
hicpeaks.utilities INFO @ 10/08/23 16:06:12: Current resolution: 40000bp
hicpeaks.utilities INFO @ 10/08/23 16:06:12: Generate bin table ...

First, I used toCooler to convert the txt files to cool format.
Then I called peaks with pyBHFDR, which produced an error message that I don't know how to resolve.
Thanks.

APA score

Hi Xiaotao,

Another wonderful tool! Thank you so much.

Could HiCPeaks return a score value when running an APA analysis with apa-analysis?

Thank you,
Pinpin
