raphael-group / hatchet

HATCHet (Holistic Allele-specific Tumor Copy-number Heterogeneity) is an algorithm that infers allele and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient, and that leverages the relationships between clones in these samples.

License: BSD 3-Clause "New" or "Revised" License


hatchet's People

Contributors

balabanmetin, brian-arnold, mmyers1, simozacca, tyamaguchi-ucla, vineetbansal


hatchet's Issues

Bioconda update for v0.3.2

Hi,

I checked bioconda and HATCHet still seems to be on v0.3.1. Can you please update it? I would like to run some tests using the bug-fixed version.

Thanks.

Edit: Actually, that might just be the "Versions:" section that doesn't have v0.3.2 listed. Is it just missing there?

Specified normal BAM file does not exist / XDIR question!

Hi! I keep getting "The specified normal BAM file does not exist" and I can't figure out what is going wrong. Could it be an issue with my XDIR designation? I have my three BAM files listed in the script (normal, tumor, recurrent tumor) and I can't get past the binBAM step (it seems).
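In the meantime, a quick way to rule out path problems (a minimal sketch; the paths are hypothetical, not taken from the HATCHet script):

```shell
# Print whether each BAM path the script references actually exists and is
# readable; relative paths are resolved against the working directory where
# the command runs, so launch this from the same directory as the pipeline.
for f in /path/to/normal.bam /path/to/tumor.bam /path/to/recurrent.bam; do
    if [ -r "$f" ]; then
        echo "found: $f"
    else
        echo "MISSING: $f"
    fi
done
```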

Any advice here?

Thank you!!

Samtools: libcrypto.so.1.0.0: cannot open shared object file

Samtools was not being installed properly using conda:

/usr/local/bin/samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

I believe this issue was addressed in #13958, but none of the fixes seemed to work for me except changing the order of the channels in my Dockerfile:

RUN conda create -qy -p /usr/local \
    -c conda-forge \
    -c bioconda \
    -c defaults \
    hatchet==${HATCHET_VERSION}

The documentation states:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
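Note that `conda config --add channels` prepends each channel, so running the three documented commands in that order leaves conda-forge with the highest priority — the same effective order as the Dockerfile fix above. A minimal sketch of the resulting order (simulated with a plain list, not a real conda call):

```shell
# `--add` puts each new channel at the top of the list.
channels="defaults"
for c in bioconda conda-forge; do
    channels="$c $channels"
done
echo "$channels"   # conda-forge bioconda defaults
```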

Please let me know if there's a different way to approach this. Thanks.

Edit: When using hatchet run hatchet.ini, in addition to reference.dict, I also needed the .bai files for each BAM file in the same directory. This wasn't mentioned in the script README, so it might help to clarify it there.
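A quick pre-flight check along these lines might save others the same debugging (a minimal sketch; the file names are hypothetical, and `samtools` is assumed to be on the PATH for the suggested fix):

```shell
# Confirm each BAM has a samtools index next to it, accepting either the
# sample.bam.bai or the sample.bai naming convention.
check_index() {
    bam="$1"
    if [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ]; then
        echo "ok: ${bam}"
    else
        echo "missing index: ${bam} (fix with: samtools index ${bam})"
    fi
}
for bam in normal.bam tumor.bam recurrent.bam; do
    check_index "$bam"
done
```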

segfault (possibly due to BNPY)

Hi,

I am trying to test HATCHet; however, after installing Gurobi and bnpy-dev, the first step fails with a segmentation fault. I don't see a core dump being created in the pwd.

demo-WES]$ ${CLUBB} demo-wes.bb -by ${BNPY} -o demo-wes.seg -O demo-wes.bbc -e 12 -tB 0.03 -tR 0.15 -d 0.08
[2019-Apr-23 14:02:41]# Parsing and checking input arguments
[2019-Apr-23 14:02:41]# Reading the combined BB file
[2019-Apr-23 14:02:41]# Format data to cluster
[2019-Apr-23 14:02:41]# Clustering bins by RD and BAF across tumor samples
[2019-Apr-23 14:02:41]## Loading BNPY
Segmentation fault (core dumped)

It fails in a similar fashion with the WGS demo as well.

I have bnpy-dev from Bitbucket, and Gurobi 8.1.1.

Thanks!

deBAF error

Hi!

When doing the preprocessing steps I experience an error in the deBAF step.

I was running the following:

#BIN
python -m hatchet binBAM -N ${NORMAL} -T ${BAMS} -S ${ALLNAMES} \
    -b 200kb -g ${REF} -j ${J} \
    -q 20 -O ${BIN}normal.1bed -o ${BIN}bulk.1bed -v &> ${BIN}bins.log

#deBAF
python -m hatchet deBAF -N ${NORMAL} -T ${BAMS} -S ${ALLNAMES} \
    -r ${REF} -j ${J} -q 20 -Q 20 -U 20 -c 4 \
    -C 300 -O ${BAF}normal.1bed -o ${BAF}bulk.1bed -v \
    -bt $BCF -st $BCF -L $SNPSTAB \
    &> ${BAF}bafs.log

The log gives me the following:

[2021-Mar-17 23:15:04]# Counting SNPs alleles from the matched-normal sample
[2021-Mar-17 23:15:05]AlleleCounter-1 starts on LMU01_NG for chr1
[2021-Mar-17 23:15:05]AlleleCounter-2 starts on LMU01_NG for chr2
[2021-Mar-17 23:15:05]AlleleCounter-3 starts on LMU01_NG for chr3
[2021-Mar-17 23:15:05]AlleleCounter-4 starts on LMU01_NG for chr4
[2021-Mar-17 23:15:05]AlleleCounter-5 starts on LMU01_NG for chr5
[2021-Mar-17 23:15:05]AlleleCounter-6 starts on LMU01_NG for chr6
[2021-Mar-17 23:15:05]AlleleCounter-7 starts on LMU01_NG for chr7
[2021-Mar-17 23:15:05]AlleleCounter-8 starts on LMU01_NG for chr8
[2021-Mar-17 23:15:05]AlleleCounter-9 starts on LMU01_NG for chr9
[2021-Mar-17 23:15:05]AlleleCounter-10 starts on LMU01_NG for chr10
[2021-Mar-17 23:15:05]AlleleCounter-11 starts on LMU01_NG for chr11
[2021-Mar-17 23:15:05]AlleleCounter-12 starts on LMU01_NG for chr12
[2021-Mar-17 23:15:05]AlleleCounter-13 starts on LMU01_NG for chr13
[2021-Mar-17 23:15:05]AlleleCounter-14 starts on LMU01_NG for chr14
[2021-Mar-17 23:15:05]AlleleCounter-15 starts on LMU01_NG for chr15
[2021-Mar-17 23:15:05]AlleleCounter-16 starts on LMU01_NG for chr16
[2021-Mar-17 23:15:05]AlleleCounter-17 starts on LMU01_NG for chr17
[2021-Mar-17 23:15:05]AlleleCounter-18 starts on LMU01_NG for chr18
[2021-Mar-17 23:15:05]AlleleCounter-19 starts on LMU01_NG for chr19
[2021-Mar-17 23:15:05]AlleleCounter-20 starts on LMU01_NG for chr20
[2021-Mar-17 23:15:05]AlleleCounter-21 starts on LMU01_NG for chr21
[2021-Mar-17 23:15:05]AlleleCounter-22 starts on LMU01_NG for chr22
Progress: |----------------------------------------| 0.0% Complete
Process AlleleCounter-19:
Traceback (most recent call last):
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/site-packages/hatchet/utils/deBAF.py", line 179, in run
snps = self.countAlleles(bamfile=next_task[0], samplename=next_task[1], chromosome=next_task[2])
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/site-packages/hatchet/utils/deBAF.py", line 198, in countAlleles
raise ValueError(sp.error('Allele counting failed on {} of {}, please check errors in {}!').format(chromosome, samplename, errname))
ValueError: Allele counting failed on chr19 of LMU01_NG, please check errors in ./LMU01_NG_chr19_bcftools.log!
[The same traceback repeats for each of the remaining AlleleCounter processes (chr1–chr18 and chr20–chr22), each ending with:
ValueError: Allele counting failed on chrN of LMU01_NG, please check errors in ./LMU01_NG_chrN_bcftools.log!]
Traceback (most recent call last):
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/site-packages/hatchet/main.py", line 45, in <module>
globals()[command](...)
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/site-packages/hatchet/utils/deBAF.py", line 25, in main
snps = counting(bcftools=args["bcftools"], reference=args["reference"], samples=[args["normal"]], chromosomes=args["chromosomes"], num_workers=args["j"],
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/site-packages/hatchet/utils/deBAF.py", line 131, in counting
tasks.join()
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/multiprocessing/queues.py", line 326, in join
self._cond.wait()
File "/home/zyto/unger/anaconda3/envs/hatchet/lib/python3.8/multiprocessing/synchronize.py", line 261, in wait
return self._wait_semaphore.acquire(True, timeout)

bcftools.log gives me:

[E::hts_open_format] Failed to open file "[]" : No such file or directory
bcftools mpileup: Could not read file "[]"
Failed to read from standard input: unknown file type

As input I used the mapped BAMs of a normal tissue and of the primary and relapse tumors of the same patient. For the SNP list, I imported Homo_sapiens_assembly38.dbsnp138.vcf from the GATK repository into R using vcfR and wrote a tab-delimited text file with two columns (chromosome and position), without a header.
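For reference, the same two-column, headerless SNP list can be produced directly from the VCF without going through R (a sketch using two made-up sample records; with a real dbSNP VCF, `bcftools query -f '%CHROM\t%POS\n'` should give the same result):

```shell
# Build a tiny stand-in VCF and extract CHROM + POS, skipping all header
# lines; the output is tab-delimited with no header row.
printf '##fileformat=VCFv4.2\n' > snps_sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> snps_sample.vcf
printf 'chr1\t10177\trs367896724\tA\tAC\t.\t.\t.\n' >> snps_sample.vcf
printf 'chr2\t10352\trs555500075\tT\tTA\t.\t.\t.\n' >> snps_sample.vcf
awk 'BEGIN{OFS="\t"} !/^#/ {print $1, $2}' snps_sample.vcf > snp_positions.tsv
cat snp_positions.tsv
```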

Any suggestions what might cause these errors?

Issues with Installation Question

I have been stuck installing HATCHet at the `pip install .` step. I've acquired the Gurobi license and installed it appropriately, and I'm not sure why I can't get past these issues (attached). I am running pip install from the root directory of hatchet. Is this an issue with file paths? Have you seen this before? Thanks so much!!
[attached screenshots: Screen Shot 2021-02-09 at 2.41.21 PM, Screen Shot 2021-02-09 at 2.41.31 PM]

solve step fails with GRBException

Hi,

I have followed the workflow described originally in the bash script and then in the md file.

Everything seems to work until the solve step, which fails after a few seconds with this message (I shortened it a bit, but can add the whole thing if that is helpful):

# Finding the neutral diploid/tetraploid cluster
## Cluster selected as neutral (diploid/tetraploid) is 29
# Running diploid
## Running diploid with 2 clones
### Running command: /home/shollizeck/hatchet/lib/python2.7/site-packages/hatchet-0.1.0-py2.7-linux-x86_64.egg/hatchet/solve /home/shollizeck/test_hatchet//bbc/bulk -f  -e 6 -j 2 -p 400 -u 0.03 -r 12 -M 2 -v 2 -c 29:1:1 -n 2 -o /home/shollizeck/test_hatchet//results/results.diploid.n2
## M:	2
## mR:	0.08
## mB:	0.04
## m:	None
## clonal:	None
## tR:	0.08
## seg:	/home/shollizeck/test_hatchet//bbc/bulk.seg
## tetraploid:	False
## diploid:	False
## eT:	12
## r:	12
## tB:	0.04
## u:	0.03
## x:	/home/shollizeck/test_hatchet//results
## f:	None
## d:	None
## g:	0.35
## solver:	/home/shollizeck/hatchet/lib/python2.7/site-packages/hatchet-0.1.0-py2.7-linux-x86_64.egg/hatchet/solve
## ln:	2
## eD:	6
## j:	2
## bbc:	/home/shollizeck/test_hatchet//bbc/bulk.bbc
## ts:	0.008
## p:	400
## s:	None
## un:	8
## limit:	0.6
## v:	3
## input:	/home/shollizeck/test_hatchet//bbc/bulk
## td:	0.1
## ampdel:	True
## tc:	1
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/shollizeck/hatchet/lib/python2.7/site-packages/hatchet-0.1.0-py2.7-linux-x86_64.egg/hatchet/__main__.py", line 45, in <module>
    solve([solve_bin] + args)
  File "/home/shollizeck/hatchet/lib/python2.7/site-packages/hatchet-0.1.0-py2.7-linux-x86_64.egg/hatchet/bin/HATCHet.py", line 189, in main
    diploidObjs = runningDiploid(neutral=neutral, args=args)
  File "/home/shollizeck/hatchet/lib/python2.7/site-packages/hatchet-0.1.0-py2.7-linux-x86_64.egg/hatchet/bin/HATCHet.py", line 378, in runningDiploid
    results.append((n , execute(args, basecmd, n, outprefix), outprefix))
  File "/home/shollizeck/hatchet/lib/python2.7/site-packages/hatchet-0.1.0-py2.7-linux-x86_64.egg/hatchet/bin/HATCHet.py", line 641, in execute
    raise RuntimeError(error("The following command failed: \n\t\t{}\nwith {}\n".format(cmd, buffer)))
RuntimeError: The following command failed: 
		/home/shollizeck/hatchet/lib/python2.7/site-packages/hatchet-0.1.0-py2.7-linux-x86_64.egg/hatchet/solve /home/shollizeck/test_hatchet//bbc/bulk -f  -e 6 -j 2 -p 400 -u 0.03 -r 12 -M 2 -v 2 -c 29:1:1 -n 2 -o /home/shollizeck/test_hatchet//results/results.diploid.n2
with ['\x1b[95m\x1b[1m[21:06:31]### Parsing and checking input arguments\t\x1b[0m', '\x1b[92m[21:06:31]## \tInput prefix:  /home/shollizeck/test_hatchet//bbc/bulk', 'Input SEG:  /home/shollizeck/test_hatchet//bbc/bulk.seg', 'Input BBC:  /home/shollizeck/test_hatchet//bbc/bulk.bbc', 'Number of clones:  2', 'Clonal copy numbers:  { 29 [Cluster] : 1|1 [CN] }', 'Help message:  0', 'Maximum number of copy-number states:  -1', 'Maximum integer copy number:  6', 'Number of jobs:  2', 'Number of seeds:  400', 'Minimum tumor-clone threshold:  0.03', 'Maximum resident memory:  -1', 'Time limit:  -1', 'Maximum number of iteratios:  10', 'Random seed:  12', 'Solving mode:  Coordinate-descent only', 'Verbose:  2', 'Output prefix:  /home/shollizeck/test_hatchet//results/results.diploid.n2', 'Diploid threshold:  0.1', 'Base:  1', 'Force amp-del:  1\t\x1b[0m', '\x1b[95m\x1b[1m[21:06:31]### Reading the input SEG file\t\x1b[0m', '\x1b[95m\x1b[1m[21:06:31]### Scale the read-depth ratios into fractional copy numbers using the provided copy numbers\t\x1b[0m', '\x1b[95m\x1b[1m[21:06:31]### Compute allele-specific fractional copy numbers using BAF\t\x1b[0m', '\x1b[95m\x1b[1m[21:06:31]### Starting coordinate descent algorithm on 400 seeds\t\x1b[0m', '\x1b[92m[21:06:31]## Coordinate Descence {\t\x1b[0m', "terminate called after throwing an instance of 'GRBException'", '', '', ... (hundreds of empty strings omitted)]
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

I just don't even know where I could do something differently, so any help/fix would be amazing.

Cheers,
Sebastian

ArgParsing Bug

Line 387 in ArgParsing.py should be `sp.error(...)`, not `error(...)`.

binBam error

Dear Hatchet Team,
when I was running binBam.py I got the following error:
File "genericpath.py", line 99, in _splitext
sepIndex = p.rfind(sep)
AttributeError: 'NoneType' object has no attribute 'rfind'
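For reference, this AttributeError is what `os.path.splitext`'s legacy path (`genericpath._splitext`) raises when it receives `None` instead of a path string, which usually means a required file argument was left unset. A minimal defensive sketch (the helper name is hypothetical, not part of HATCHet):

```python
import os

def safe_splitext(path):
    # os.path.splitext cannot handle None ('NoneType' object has no
    # attribute 'rfind'); fail early with a clearer message instead.
    if path is None:
        raise ValueError("path is None -- was a required argument left unset?")
    return os.path.splitext(path)

print(safe_splitext("tumor.1bed"))  # ('tumor', '.1bed')
```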

Any help would be greatly appreciated!

How to pick the right copy number solution

How reliable is the copy-number solution chosen by HATCHet? Are there any suggested ways of confirming that it is the correct one, rather than a different solution with a larger number of subclones? Is it important that the maximum copy number across bins stays below a specific value? I noticed that this value varies among the different solutions.

I understand that the optimal solution is chosen by maximizing the score (the second derivative of the objective function). However, the chosen solution seems to have found a dominant subclone which it declares to be present in all samples, rather than a higher-order solution with greater heterogeneity. Should this be used as a criterion as well?
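For intuition, the second-derivative (elbow) criterion mentioned above can be sketched as follows; the objective values here are made up for illustration, and the selection rule is a simplification of what HATCHet actually does:

```python
# Toy elbow selection: score each candidate clone number n by the discrete
# second difference f(n-1) - 2*f(n) + f(n+1) of the objective values.
objective = {2: 120.0, 3: 60.0, 4: 45.0, 5: 41.0, 6: 40.0}  # made-up values

def elbow(obj):
    ns = sorted(obj)
    scores = {ns[i]: obj[ns[i - 1]] - 2 * obj[ns[i]] + obj[ns[i + 1]]
              for i in range(1, len(ns) - 1)}
    return max(scores, key=scores.get)

print(elbow(objective))  # 3 -- the largest drop in improvement occurs there
```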

Thank you

Impossible to specify --regions in count-alleles

Hi, I'm trying to run hatchet on WES data, but there doesn't seem to be a way to provide the --regions option to count-alleles.

In the arg-parsing code, the --snps option is required:

 parser.add_argument("-L","--snps", required=True, type=str, nargs='+', help="List of SNPs to consider in the normal sample")

but the parser exits if --regions is also specified.

if args.snps != None and args.regions != None:
      raise ValueError(sp.error("Both SNP list and genomic regions have been provided, please provide only one of these!"))

I copied the above code from master but I'm having the issue on Bioconda hatchet=0.3.2.

Possible to convert BAF to 0-1 scale?

Hi,

I would like to compare results from HATCHet to those that I got from other methods (SNP arrays). However, HATCHet gives BAF values from 0-0.5, which I assume is the ratio of the major allele to the minor allele, whereas SNP arrays usually output BAF between 0 and 1.
Is there a way to convert these values? E.g., can I just take the ratios in the baf/tumor.bed file and then take the mean across a segment of 50k?
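For what it's worth, HATCHet's 0-0.5 values are mirrored BAFs (as I understand it, the smaller of the two allele fractions), so without phasing information each value can only be mapped back to an unordered pair on the 0-1 scale. A hedged sketch:

```python
# A mirrored BAF b in [0, 0.5] corresponds to the unordered pair {b, 1 - b}
# on the 0-1 scale; which allele is "B" cannot be recovered without phasing.
def unmirror(baf):
    assert 0.0 <= baf <= 0.5
    return (baf, 1.0 - baf)

print(unmirror(0.25))  # (0.25, 0.75)
```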
Thanks in advance!

Mentioning CRAM support in doc

Hi,

This is actually not really an issue, but I wanted to let you know that I tried HATCHet on CRAM files and it worked great for me. I suspect that, because the only steps involving the BAM files are actually samtools commands, it will be transparent to users as long as the reference for decoding the CRAM files can be found automatically by samtools (via the md5, the path in the header, or the appropriate environment variable).
It would be helpful to mention CRAM support somewhere in the docs, so people feel confident using this increasingly popular format.

Thanks again for the great tool!

Nicolas

clubb error

Hi,
Thanks for the wonderful tool.
I am handling WES multi-region samples (4-5 samples per patient, including 1 normal sample); average coverage is 180x. I got the error below while running the script shown beneath it.
I tried to solve it myself by looking into the code, but I couldn't work out what was causing the problem.

[error]
[2021-Apr-06 12:14:20] python3 -m hatchet cluBB bb/bulk.bb -o bbc/bulk.seg -O bbc/bulk.bbc -e 11150 -d 0.1 -tR 0.15 -tB 0.04 -u 20 -dR 0.002 -dB 0.002
[2021-Apr-06 12:14:22]# Parsing and checking input arguments
[2021-Apr-06 12:14:22]# Reading the combined BB file
[2021-Apr-06 12:14:22]# Format data to cluster
[2021-Apr-06 12:14:22]# Bootstrap each bin for clustering
[2021-Apr-06 12:14:24]# Clustering bins by RD and BAF across tumor samples
[2021-Apr-06 12:14:25]## Clustering with K=50 and c=0.02...
[2021-Apr-06 18:44:38]# Refining clustering using given tolerances
Traceback (most recent call last):
  File "/opt/School/python/3.8.1/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/School/python/3.8.1/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/School/python/3.8.1/lib/python3.8/site-packages/hatchet/__main__.py", line 45, in <module>
    globals()[command](args)
  File "/opt/School/python/3.8.1/lib/python3.8/site-packages/hatchet/utils/cluBB.py", line 34, in main
    clusterAssignments, numClusters = refineClustering(combo=combo, assign=clusterAssignments, assignidx=bintoidx, samples=samples, rdtol=args['rdtol'], baftol=args['baftol'])
  File "/opt/School/python/3.8.1/lib/python3.8/site-packages/hatchet/utils/cluBB.py", line 204, in refineClustering
    assert -1 not in set(newassign)
AssertionError

[setting]
REF=$REF
LIST="/data/public/dbSNP/b154/GRCh38/GCF_000001405.38.re.vcf.gz"
REF_VERS="hg38"
CHR_NOTATION=true
XDIR=
NORMAL="$Aligned_Path$NORMAL"
BAMS="$Aligned_Path$BAM1 $Aligned_Path$BAM2 $Aligned_Path$BAM3 $Aligned_Path$BAM4"
NAMES="$NAME1 $NAME2 $NAME3 $NAME4"
ALLNAMES="$NORMAL_NAME $NAME1 $NAME2 $NAME3 $NAME4"
J=22
MINREADS=20
MAXREADS=1000
BIN="250kb" #200kb
PHASE="None"

[script]
python3 -m hatchet binBAM -N ${NORMAL} -T ${BAMS} -S ${ALLNAMES} -b ${BIN} -g ${REF} -j ${J} -O ${RDR}normal.1bed -o ${RDR}tumor.1bed -t ${RDR}total.tsv |& tee ${RDR}bins.log
python3 -m hatchet SNPCaller -N ${NORMAL} -r ${REF} -j ${J} -c ${MINREADS} -C ${MAXREADS} -R ${LIST} -o ${SNP} |& tee ${BAF}bafs.log
python3 -m hatchet deBAF -st ${SAM} -bt ${BCF} -N ${NORMAL} -T ${BAMS} -S ${ALLNAMES} -r ${REF} -j ${J} -L ${SNP}*.vcf.gz -c ${MINREADS} -C ${MAXREADS} -O ${BAF}normal.1bed -o ${BAF}tumor.1bed |& tee ${BAF}bafs.log
python3 -m hatchet comBBo -c ${RDR}normal.1bed -C ${RDR}tumor.1bed -B ${BAF}tumor.1bed -t ${RDR}total.tsv -p ${PHASE} -e ${RANDOM} > ${BB}bulk.bb
python3 -m hatchet cluBB ${BB}bulk.bb -o ${BBC}bulk.seg -O ${BBC}bulk.bbc -e ${RANDOM} -d 0.1 -tR 0.15 -tB 0.04 -u 20 -dR 0.002 -dB 0.002
cd ${PLO}
python3 -m hatchet BBot -c RD --figsize 6,3 ../${BBC}bulk.bbc
python3 -m hatchet BBot -c CRD --figsize 6,3 ../${BBC}bulk.bbc
python3 -m hatchet BBot -c BAF --figsize 6,3 ../${BBC}bulk.bbc
python3 -m hatchet BBot -c BB ../${BBC}bulk.bbc
python3 -m hatchet BBot -c CBB ../${BBC}bulk.bbc -tS 0.005
#python3 -m hatchet BBot -c CBB ../${BBC}bulk.bbc --colwrap 3 -tS 0.005
cd ../${RES}
python3 -m hatchet solve -i ../${BBC}bulk -n2,6 -p 400 -u 0.06 -eD 6 -eT 12 -g 0.35 -l 0.5 -j ${J} -r ${RANDOM} &> >(tee >(grep -v Progess > hatchet.log))
cd ../${SUM}
python3 -m hatchet BBeval ../${RES}/best.bbc.ucn -rC 10 -rG 1

Thanks for any advice in advance ,
Jiho

RuntimeError

Hello,
I have encountered a RuntimeError while running the pipeline; I believe it occurs in the compute-cn or plot-cn step, as no results are written.
The error is raised at line 644 of HATCHet.py with the following message:

## Cluster selected as neutral (diploid/tetraploid) is 9
# Running diploid
## Running diploid with 2 clones
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/hatchet/__main__.py", line 69, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/hatchet/__main__.py", line 63, in main
    globals()[command](args)
  File "/opt/conda/lib/python3.8/site-packages/hatchet/utils/run.py", line 151, in main
    hatchet_main(args=[
  File "/opt/conda/lib/python3.8/site-packages/hatchet/bin/HATCHet.py", line 192, in main
    diploidObjs = runningDiploid(neutral=neutral, args=args)
  File "/opt/conda/lib/python3.8/site-packages/hatchet/bin/HATCHet.py", line 381, in runningDiploid
    results.append((n , execute(args, basecmd, n, outprefix), outprefix))
  File "/opt/conda/lib/python3.8/site-packages/hatchet/bin/HATCHet.py", line 644, in execute
    raise RuntimeError(error("The following command failed: \n\t\t{}\nwith {}\n".format(cmd, buffer)))
RuntimeError: The following command failed: 
                /opt/conda/lib/python3.8/site-packages/hatchet/solve /scratch/887692/hatchet/output/bbc/bulk -f  -e 6 -j 4 -p 400 -u 0.03 -M 2 -v 2 -c 9:1:1 -n 2 -o /scratch/887692/hatchet/output/results/results.diploid.n2
with ['\x1b[95m\x1b[1m[10:48:58]### Parsing and checking input arguments\t\x1b[0m', '\x1b[92m[10:48:58]## \tInput prefix:  /scratch/887692/hatchet/output/bbc/bulk', 'Input SEG:  /scratch/887692/hatchet/output/bbc/bulk.seg', 'Input BBC:  /scratch/887692/hatchet/output/bbc/bulk.bbc', 'Number of clones:  2', 'Clonal copy numbers:  { 9 [Cluster] : 1|1 [CN] }', 'Help message:  0', 'Maximum number of copy-number states:  -1', 'Maximum integer copy number:  6', 'Number of jobs:  4', 'Number of seeds:  400', 'Minimum tumor-clone threshold:  0.03', 'Maximum resident memory:  -1', 'Time limit:  -1', 'Maximum number of iteratios:  10', 'Random seed:  -1', 'Solving mode:  Coordinate-descent only', 'Verbose:  2', 'Output prefix:  /scratch/887692/hatchet/output/results/results.diploid.n2', 'Diploid threshold:  0.1', 'Base:  1', 'Force amp-del:  1\t\x1b[0m', '\x1b[95m\x1b[1m[10:48:58]### Reading the input SEG file\t\x1b[0m', '\x1b[95m\x1b[1m[10:48:58]### Scale the read-depth ratios into fractional copy numbers using the provided copy numbers\t\x1b[0m', '\x1b[95m\x1b[1m[10:48:58]### Compute allele-specific fractional copy numbers using BAF\t\x1b[0m', '\x1b[95m\x1b[1m[10:48:58]### Starting coordinate descent algorithm on 400 seeds\t\x1b[0m', '\x1b[92m[10:48:58]## Coordinate Descence {\t\x1b[0m', "terminate called after throwing an instance of 'GRBException'", '']

I am using the latest version HATCHet 0.3.3 and the reference genome is HG19 / GRCh37.

Is there a possible fix for this?
Best wishes
Tim

Could the outcome from HATCHet link to MACHINA?

Hi, thanks for the nice software and the detailed, kind introduction. (I love the tutorial!!)

In the original paper for HATCHet, Figure 7, it seemed that inferring clonal seeding is possible via the MACHINA algorithm using the output of HATCHet.
https://www.biorxiv.org/content/10.1101/496174v1.full.pdf

I could run MACHINA with the output from PyClone, but I don't know how to link the output from HATCHet to MACHINA.

If it is possible, could you add to the HATCHet introduction some details on how to infer clonal seeding via MACHINA from the output of HATCHet?

Or, in the paper, were the results from MACHINA and HATCHet simply compared on the same data, rather than HATCHet feeding into MACHINA?

Thanks!

==== updated ====
I received the comments below from Gryte Satas. They are helpful, and I'll try this approach.

Thanks all !!

However, one could run MACHINA in pmh_tr mode, which takes as input a phylogeny and infers a seeding history while resolving polytomies in the input phylogeny. In order to do this, one must first infer a phylogeny using the HATCHet copy-number clones. This can be done with our Copy-Number Tree algorithm (https://github.com/raphael-group/CNT-ILP). The resulting phylogeny can then be used as input for MACHINA as described in the pmh_tr section of the documentation.

Are non-human reference genomes supported?

I am attempting to run the example script (https://github.com/raphael-group/hatchet/blob/master/script/runHATCHet.sh) on pig data. In the deBAF step, I am getting the error: The given reference cannot be used because the chromosome names are inconsistent!

The command being used is

python /home/jiaqiwu6/hatchet/utils/deBAF.py -N /scratch/data/oncopig/kidney_RG.dedup.bam -T /scratch/data/oncopig/tumor1_RG.dedup.bam /scratch/data/oncopig/tumor2_RG.dedup.bam /scratch/data/oncopig/tumor3_RG.dedup.bam /scratch/data/oncopig/tumor4_RG.dedup.bam /scratch/data/oncopig/tumor5_RG.dedup.bam /scratch/data/oncopig/cell_line_RG.dedup.bam -S Normal tumor1 tumor2 tumor3 tumor4 tumor5 tumor0 -r /scratch/data/oncopig/ref/sus11.1.fa -j 22 -q 20 -Q 20 -U 20 -c 4 -C 300 -O /scratch/data/oncopig/hatchet_script/baf/normal.baf -o /scratch/data/oncopig/hatchet_script/baf/bulk.baf -v

I wonder if this is because I used -g hg19 in the binBAM step. Are non-human reference genomes supported (if so, where can I specify this)? Alternatively, can we exclude certain chromosomes from the computation entirely?

Thanks in advance.

cluBB bnpy version issue

Hello dear developers,

I've identified an issue while trying to run \time -v python2 -m hatchet cluBB ${BB}bulk.bb -o ${BBC}bulk.seg -O ${BBC}bulk.bbc -e ${RANDOM} -tB 0.04 -tR 0.15 -d 0.08

I've followed the instructions in the tutorial and receive the following error:

[2021-Jan-11 16:50:50] time -v python2 -m hatchet cluBB /cluster/projects/kridelgroup/RAP_ANALYSIS/ANALYSIS/Hatchet/p001/bb/bulk.bb -o /cluster/projects/kridelgroup/RAP_ANALYSIS/ANALYSIS/Hatchet/p001/bbc/bulk.seg -O /cluster/projects/kridelgroup/RAP_ANALYSIS/ANALYSIS/Hatchet/p001/bbc/bulk.bbc -e 28080 -tB 0.04 -tR 0.15 -d 0.08
[2021-Jan-11 16:52:15]# Parsing and checking input arguments
[2021-Jan-11 16:52:15]# Reading the combined BB file
[2021-Jan-11 16:52:16]# Format data to cluster
[2021-Jan-11 16:52:16]# Clustering bins by RD and BAF across tumor samples
[2021-Jan-11 16:52:16]## Loading BNPY
[2021-Jan-11 16:52:24]## Clustering...
Traceback (most recent call last):
  File "/cluster/tools/software/python/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/cluster/tools/software/python/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/cluster/home/kisaev/python2env/lib/python2.7/site-packages/hatchet/__main__.py", line 43, in <module>
    globals()[command](args)
  File "/cluster/home/kisaev/python2env/lib/python2.7/site-packages/hatchet/utils/cluBB.py", line 29, in main
    mus, sigmas, clusterAssignments, numPoints, numClusters = cluster(points=points, output=args["outsegments"], samples=samples, clouds=clouds, K=args["initclusters"], sf=args["tuning"], restarts=args['restarts'])
  File "/cluster/home/kisaev/python2env/lib/python2.7/site-packages/hatchet/utils/cluBB.py", line 151, in cluster
    hmodel, Info = bnpy.Run.run(Data, 'DPMixtureModel', 'DiagGauss', 'memoVB', nLap=100, nTask=restarts, K=K, moves='birth,merge', ECovMat='eye', sF=sf, doWriteStdOut=False)
AttributeError: 'module' object has no attribute 'Run'
Command exited with non-zero status 255

This is due to the change in bnpy function naming shown at this commit:
bnpy/bnpy@79062d9#diff-ba6a27f08b567cf80a70b88a32847ce969311c9ca20180b95dfed4420f9fc402

Because bnpy does not tag releases, it is currently impossible to pin it at the version that would allow me to run your tool.

Run has now been renamed to Runner, which causes the error when I run cluBB.
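Until the dependency can be pinned, a compatibility shim on the caller's side could dispatch to whichever entry point the installed bnpy exposes. A sketch (using a stand-in object here, since bnpy may not be installed):

```python
from types import SimpleNamespace

# Stand-in for an installed bnpy that only exposes the new name "Runner".
bnpy = SimpleNamespace(Runner=SimpleNamespace(run=lambda *a, **kw: "hmodel"))

# Prefer the old entry point (bnpy.Run.run), fall back to the new one
# (bnpy.Runner.run) introduced by the renaming commit mentioned above.
runner = getattr(bnpy, "Run", None) or getattr(bnpy, "Runner")
print(runner.run())  # hmodel
```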


In addition, I believe a few spots in your documentation are outdated, leading to conflicting commands that require passing a -by flag which is no longer valid:

  1. https://github.com/raphael-group/hatchet/blob/master/doc/doc_clubb.md
  2. https://github.com/raphael-group/hatchet/blob/master/doc/doc_runhatchet.md#clubb

Thank you for all your work maintaining this tool
Cheers!
-Karin

docker or conda version of hatchet

Hi, is there a Docker or conda version of HATCHet? If so, is there some documentation on how to use HATCHet in that setting?
Thanks a lot!

Color Palette Issue

Hi Simone,

I pulled the most recent release and am running some additional samples, but I'm now running into an issue with seaborn/matplotlib.

I keep getting this error thrown:

Traceback (most recent call last):
  File "/home/kinnamam/miniconda3/envs/hatch/hatchet/utils/BBot.py", line 464, in <module>
    main()
  File "/home/kinnamam/miniconda3/envs/hatch/hatchet/utils/BBot.py", line 74, in main
    clubb(bbc, clusters, args, out)
  File "/home/kinnamam/miniconda3/envs/hatch/hatchet/utils/BBot.py", line 257, in clubb
    g = sns.lmplot(data=df, x=lx, y=ly, hue=lh, hue_order=order, palette=args['cmap'], fit_reg=False, size=figsize[0], aspect=figsize[1], scatter_kws={"s":s}, legend=False, col=g, col_wrap=args['colwrap'])
  File "/home/kinnamam/.local/lib/python2.7/site-packages/seaborn/regression.py", line 588, in lmplot
    legend_out=legend_out)
  File "/home/kinnamam/.local/lib/python2.7/site-packages/seaborn/axisgrid.py", line 253, in __init__
    colors = self._get_palette(data, hue, hue_order, palette)
  File "/home/kinnamam/.local/lib/python2.7/site-packages/seaborn/axisgrid.py", line 174, in _get_palette
    colors = color_palette(palette, n_colors)
  File "/home/kinnamam/.local/lib/python2.7/site-packages/seaborn/palettes.py", line 235, in color_palette
    raise ValueError("%s is not a valid palette name" % palette)
ValueError: tab20 is not a valid palette name

This might be a seaborn/matplotlib version issue; prior to pulling the latest code I didn't have this problem.
Thanks,
Michael

Specified reference genome does not exist

Hi,

I was trying to run hatchet on my laptop, but I kept hitting this error:

[2021-May-24 19:22:11]# Parsing and checking input arguments
Traceback (most recent call last):
  File "/usr/local/bin/hatchet", line 33, in <module>
    sys.exit(load_entry_point('hatchet==0.3.1', 'console_scripts', 'hatchet')())
  File "/usr/local/lib/python3.9/site-packages/hatchet/__main__.py", line 63, in main
    globals()[command](args)
  File "/usr/local/lib/python3.9/site-packages/hatchet/utils/run.py", line 44, in main
    count_reads(
  File "/usr/local/lib/python3.9/site-packages/hatchet/utils/count_reads.py", line 16, in main
    args = ap.parse_count_reads_arguments(args)
  File "/usr/local/lib/python3.9/site-packages/hatchet/utils/ArgParsing.py", line 271, in parse_count_reads_arguments
    raise ValueError(sp.error("The specified reference genome does not exist!"))
ValueError: The specified reference genome does not exist!

I specified the paths for the reference, normal, and tumor BAMs correctly in hatchet.ini, and they were all in the same directory, but the reference wasn't being passed to the count_reads function. I looked into run.py and noticed the genome argument was missing; with the fix applied, the call reads:

count_reads(
            args=[
                '-N', config.run.normal,
                '-g', config.run.reference,
                '-T'

After adding in '-g', config.run.reference, it ran as expected. Was this only a problem on my end?

Bioconda version does not support SNPCaller

The current version of HATCHet in bioconda does not support the command SNPCaller and results in the following message:

> python -m hatchet SNPCaller
The following commands are supported: binBAM deBAF comBBo cluBB BBot solve BBeval

The regions provided for chromosome are non-disjoint or a region start is greater than corresponding region end

Hi, thanks for the software. I've been trying to run this on whole-exome samples with an exome BED file for hg38. There's currently a bug in the BED-file verification: if you use a BED file with the "chr" naming convention (GRCh37 and hg38), the coordinates are read in as strings, so the sort in the parseRegions function of ArgParsing.py does not work properly. Because the keys are sorted as strings rather than numbers, this throws the error:

The regions provided for chromosome are non-disjoint or a region start is greater than corresponding region end

To solve this, I changed the append function from:
res[chro].append((split[1], split[2]))

to:
res[chro].append((int(split[1]), int(split[2])))
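The underlying pitfall is easy to reproduce: lexicographic sorting of coordinate strings puts "10000" before "2000", which can make correctly ordered regions look non-disjoint. A minimal illustration:

```python
# Coordinates read as strings sort lexicographically, not numerically.
starts = ["2000", "10000", "500"]
print(sorted(starts))                  # ['10000', '2000', '500']
print(sorted(int(s) for s in starts))  # [500, 2000, 10000]
```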

And it seems to be running fine now. I will update further if there's more issues!

help for result interpretation

Hi,

I don't know if you want this on GitHub at all, but I have some questions about how to interpret the outputs, because it's not 100% clear to me from the manual.

First of all, my data looks far more fragmented at first glance than your example data. I attached the copy-number plot so you can see what I mean:
intratumor-clones-totalcn.pdf
Would you rerun the analysis with a bigger bin size or change some parameter to discourage the fragmentation, or do you think this is real signal?
The sequencing shown here is from 8 tumour samples at 160x depth.

Following from that, I am not sure what the proportional plots mean. Is that the fraction of bins in that area that have that copy number? So if a region had sections with the hypothetical CNs of 5, 2, 2, 2, 4, you would have 60%: 2, 20%: 4, and 20%: 5, or am I missing something there?

And finally, what is the difference between the proportional plot and the mixture plot? Shouldn't they be basically the same?
I would love to gain some insights into all of this.
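If that reading is right (this only checks the questioner's arithmetic, not HATCHet's actual definition of the plot), the fractions work out as stated:

```python
# Fraction of bins per copy number for the hypothetical CNs 5, 2, 2, 2, 4.
from collections import Counter

cns = [5, 2, 2, 2, 4]
frac = {cn: count / len(cns) for cn, count in Counter(cns).items()}
print(frac)  # {5: 0.2, 2: 0.6, 4: 0.2}
```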

Some additional questions:

I also realised that my samples are classified as tetraploid:

### SAMPLE: 26 -- PURITY: 0.707019 -- PLOIDY: 3.12095595261 -- CLASSIFICATION: TETRAPLOID
### SAMPLE: 41 -- PURITY: 0.2971019 -- PLOIDY: 3.26569623284 -- CLASSIFICATION: TETRAPLOID
### SAMPLE: 57 -- PURITY: 0.712802 -- PLOIDY: 3.12713806854 -- CLASSIFICATION: TETRAPLOID
### SAMPLE: 55 -- PURITY: 0.3873721 -- PLOIDY: 3.12282241734 -- CLASSIFICATION: TETRAPLOID
### SAMPLE: 47 -- PURITY: 0.4824277 -- PLOIDY: 3.1278291538 -- CLASSIFICATION: TETRAPLOID
### SAMPLE: 59 -- PURITY: 0.793201 -- PLOIDY: 3.25531305957 -- CLASSIFICATION: TETRAPLOID
### SAMPLE: 31 -- PURITY: 0.174453 -- PLOIDY: 3.86276716895 -- CLASSIFICATION: TETRAPLOID
### SAMPLE: 11 -- PURITY: 0.784387 -- PLOIDY: 3.10640781323 -- CLASSIFICATION: TETRAPLOID

But the sample is a human cancer, so it should originally have been diploid; what should I take away from this?

Also, other CN-calling tools I have used reported possible alternate solutions for purity and ploidy, which obviously affect the calling quite heavily. I do have histological information about the cellularity of the sample, so I would love to see whether there might be another solution for a sample that is closer to what we already know about it.

GRBException Solve Step Failure

Hi! I have a WLS Gurobi license and am wondering how to work around the errors I am getting at the solve step, where it throws a GRBException. I've set the GUROBI_HOME and GRB_LICENSE_FILE variables and have verified that my license is active… do you have any advice on this front? I had previously (with older code?) gotten the WES demo to run all the way through, but I am once again getting this GRBException error.

Thanks so much!

Reference Index File Name

Looks like HATCHet requires the reference index file to end in .dict. This doesn't follow the standard convention of .fai (or, sometimes, .tbi) for index files. Is there a need in your pipeline for the .dict name? faidx automatically adds the .fai extension.

Sex chromosomes with labels 'chrX' and 'chrY' not reported

Hi,

I am running HATCHet fine, but the sex chromosomes are not reported. My normal and tumor BAMs and reference use the nomenclature "chrX" and "chrY" for the sex chromosomes, and I am not using any BED file.
From the logs, the sex chromosomes are excluded right away by binBAM.py, so I assume that at the beginning of binBAM.py, ArgParsing.py calls the function extractChromosomes, which does not find chrX and chrY. From my understanding of the function, this is because extractChromosomes only looks for numerical chromosomes (chr1 to chr22 here):

hatchet/utils/ArgParsing.py

Lines 423 to 427 in 0e626b0

for i in range(1, 23):
    if str(i) in normal_sq:
        no_chrm.add(str(i))
    elif "chr" + str(i) in normal_sq:
        chrm.add("chr" + str(i))
In that case, is it possible to add chrX and chrY to the list of chromosomes that can be detected? I don't think that just naming my sex chromosomes "23" would be a good idea, since I would then lose the distinction between X and Y. Providing a BED file might solve the issue, but I feel the automatic detection should be able to find them too.
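A sketch of what an extended detection loop could look like, assuming `normal_sq` is the set of sequence names from the BAM header (this is an illustration, not HATCHet's actual code):

```python
# Extend the numeric-chromosome loop so chrX/chrY are also detected.
normal_sq = {"chr1", "chr2", "chrX", "chrY"}  # hypothetical header names
chrm = set()
for name in [str(i) for i in range(1, 23)] + ["X", "Y"]:
    if "chr" + name in normal_sq:
        chrm.add("chr" + name)
print(sorted(chrm))  # ['chr1', 'chr2', 'chrX', 'chrY']
```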

Thanks a lot!

Nicolas

cluBB: `division by zero` error

I'm encountering the following issue with some of our samples:

  Traceback (most recent call last):
    File "/opt/gridware/depots/1a8f5697/el7/pkg/apps/python3/3.8.1/gcc-4.8.5/lib/python3.8/runpy.py", line 193, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/opt/gridware/depots/1a8f5697/el7/pkg/apps/python3/3.8.1/gcc-4.8.5/lib/python3.8/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/opt/apps/pkg/apps/hatchet/0.2.11/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/__main__.py", line 45, in <module>
      globals()[command](args)
    File "/opt/apps/pkg/apps/hatchet/0.2.11/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/cluBB.py", line 49, in main
      segments = segmentBins(bb=combo, clusters=clusters, samples=samples)
    File "/opt/apps/pkg/apps/hatchet/0.2.11/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/cluBB.py", line 228, in segmentBins
      return minSegmentBins(sbb, nbins, rd, nsnps, cov, clusters, samples)
    File "/opt/apps/pkg/apps/hatchet/0.2.11/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/cluBB.py", line 234, in minSegmentBins
      mean = {cluster : {sample : float(alpha[cluster][sample]) / float(alpha[cluster][sample]+beta[cluster][sample]) for sample in samples} for cluster in clusters}
    File "/opt/apps/pkg/apps/hatchet/0.2.11/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/cluBB.py", line 234, in <dictcomp>
      mean = {cluster : {sample : float(alpha[cluster][sample]) / float(alpha[cluster][sample]+beta[cluster][sample]) for sample in samples} for cluster in clusters}
    File "/opt/apps/pkg/apps/hatchet/0.2.11/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/cluBB.py", line 234, in <dictcomp>
      mean = {cluster : {sample : float(alpha[cluster][sample]) / float(alpha[cluster][sample]+beta[cluster][sample]) for sample in samples} for cluster in clusters}
  ZeroDivisionError: float division by zero

I had a dig around the bulk.bb input file, which has a few lines like this one:

chr1    2700000 2750000 tumour   1.0423660310815477      1       40.0    0       0       0.5

Might this be the problem?
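That looks plausible: the failing expression averages BAFs as alpha / (alpha + beta), and a bin like the one above carries 0 SNPs, so a cluster made only of such bins gives alpha + beta == 0. A minimal reconstruction (variable names taken from the traceback; the aggregation itself is simplified):

```python
# When every bin in a cluster has zero SNP reads, both allele counts are 0
# and the BAF mean alpha / (alpha + beta) divides by zero.
alpha, beta = 0, 0
try:
    mean = alpha / float(alpha + beta)
except ZeroDivisionError as e:
    print(e)  # float division by zero
```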

I'm using release version 0.2.11. I haven't tried the 0.3 releases yet; is it likely to be fixed there?

Samples in BBC files does not match the ones in SEG file

Hi,

I'm trying to run the full HATCHet script using three tumor samples (L01, L02, and P01), but I'm getting an error in the compute-cn step:

Traceback (most recent call last):
  File "/mnt/exports/shared/home/panand/miniconda3/envs/hatchet/bin/hatchet", line 33, in <module>
    sys.exit(load_entry_point('hatchet==0.3.2', 'console_scripts', 'hatchet')())
  File "/mnt/exports/shared/home/panand/miniconda3/envs/hatchet/lib/python3.9/site-packages/hatchet/__main__.py", line 64, in main
    globals()[command](args)
  File "/mnt/exports/shared/home/panand/miniconda3/envs/hatchet/lib/python3.9/site-packages/hatchet/utils/run.py", line 149, in main
    hatchet_main(args=[
  File "/mnt/exports/shared/home/panand/miniconda3/envs/hatchet/lib/python3.9/site-packages/hatchet/bin/HATCHet.py", line 181, in main
    assert bsamples == ssamples, error("Samples in BBC files does not match the ones in SEG file!")
AssertionError: Samples in BBC files does not match the ones in SEG file!

The input log for compute-cn looks like this:

## solver:	/mnt/exports/shared/home/panand/miniconda3/envs/hatchet/lib/python3.9/site-packages/hatchet/solve
## input:	output/bbc/bulk
## seg:	output/bbc/bulk.seg
## bbc:	output/bbc/bulk.bbc
## ln:	2
## un:	6
## clonal:	None
## ampdel:	True
## d:	None
## eD:	6
## eT:	12
## ts:	0.008
## tc:	1
## td:	0.01
## tR:	0.08
## tB:	0.04
## mR:	0.08
## mB:	0.04
## limit:	0.6
## g:	0.35
## p:	400
## j:	72
## r:	None
## s:	None
## m:	None
## u:	0.03
## f:	None
## M:	2
## x:	output/results
## diploid:	False
## tetraploid:	False
## v:	2

I tried printing out the contents of bsamples and ssamples right before the assertion during the run, and saw that ssamples was empty:

bsamples: {'L01', 'L02', 'P01'}
ssamples: set()

But when I run compute-cn separately using:

hatchet compute-cn -i output/bbc/bulk -x output/results/ -n 2,6 -p 400 -u 0.03 -eD 6 -eT 12 -g 0.35 -l 0.6

it works as expected.

I'm unsure where to look for this. Any help would be appreciated, thanks.
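In case it helps debugging, the mismatch can be reproduced outside HATCHet with a quick column check. The column positions below (SAMPLE as the 4th column of bulk.bbc and the 2nd column of bulk.seg) are my assumption from the files I have, so please verify against yours:

```python
def samples_in(lines, col):
    """Collect the set of sample names from whitespace-delimited lines,
    taking them from 1-based column `col` and skipping '#' header lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        if len(fields) >= col:
            names.add(fields[col - 1])
    return names

# Assumed layouts (check against your own files):
# with open("output/bbc/bulk.bbc") as f:
#     bbc = samples_in(f, col=4)
# with open("output/bbc/bulk.seg") as f:
#     seg = samples_in(f, col=2)
# print("only in BBC:", bbc - seg, " only in SEG:", seg - bbc)
```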

Interpretation of results and integration with SNVs

Hi again, I successfully ran your tool on WGS data from a patient with two tumour samples from two distinct spatial regions. My full data set contains 20 samples for this patient, but for now I am just testing the tool on two samples. I have been reading your manuscript in more detail and am interested in the integration with VAFs from SNVs. I would like to try merging the HATCHet results with my mutations so that I can run them through PyClone, for example.

I was wondering if you could please suggest the correct way to do this. In the results file 'best.seg.ucn', I see, for each segment in the genome, the major and minor allele copy-number status for each clone, along with the abundance of each clone in a given sample. I am just having a hard time working out how to correctly match a given SNV in my data with the appropriate clone results.
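In case it helps anyone thinking about the same thing, the lookup itself is just an interval match: find the best.seg.ucn row whose chromosome and segment range contain the SNV position, then read off that row's clone copy numbers and proportions. A rough sketch, assuming half-open, non-overlapping segments per chromosome (the payloads here are made up, not real HATCHet output):

```python
from bisect import bisect_right

def assign_snvs(segments, snvs):
    """Map each SNV (chrom, pos) to the segment containing it.

    segments: list of (chrom, start, end, payload) with half-open
    [start, end) intervals, assumed non-overlapping per chromosome.
    Returns {(chrom, pos): payload or None}.
    """
    by_chrom = {}
    for chrom, start, end, payload in segments:
        by_chrom.setdefault(chrom, []).append((start, end, payload))
    for chrom in by_chrom:
        by_chrom[chrom].sort()
    result = {}
    for chrom, pos in snvs:
        hit = None
        segs = by_chrom.get(chrom, [])
        starts = [s for s, _, _ in segs]
        i = bisect_right(starts, pos) - 1  # rightmost segment starting at or before pos
        if i >= 0 and segs[i][0] <= pos < segs[i][1]:
            hit = segs[i][2]
        result[(chrom, pos)] = hit
    return result
```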

Thank you in advance!

Karin

deBAF required options inconsistent

Hi, thanks for this software!

I'm trying to run the deBAF step of the workflow. I provided --regions, as my data are exome-based, and got an error stating that the --snps option is required - I don't think this is documented here.

I made a snps file by extracting the chromosomes and positions from a gnomAD VCF (no header; it's not completely clear what kind of file deBAF expects), but parse_baf_arguments errors with

ValueError: Both SNP list and genomic regions have been provided, please provide only one of these!

For exome data, should I subset the list of SNPs using my exon regions manually, given that I can't pass both to the tool?
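In case it's useful, subsetting manually looks straightforward; here's a sketch of intersecting a SNP position list with BED-style exon regions. The coordinate conventions (BED 0-based half-open, SNPs 1-based) are my assumption, so please check them against your files:

```python
def filter_snps(snps, regions):
    """Keep SNPs (chrom, 1-based pos) that fall inside BED-style regions
    (chrom, 0-based start, end). Linear scan per chromosome; fine for
    exome-sized region lists."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, pos in snps:
        for start, end in by_chrom.get(chrom, []):
            # BED is 0-based half-open; a 1-based pos p is inside iff start < p <= end
            if start < pos <= end:
                kept.append((chrom, pos))
                break
    return kept
```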

Python 3 support???

Since Python 2 is gone, will there be support for Python 3? I have started to make the changes for Python 3 support, but I did so using a clone of master, and I see a CI branch that appears to be active. Is someone already working on this? I don't want to duplicate efforts.

Library normalization

Does this software perform library normalization before or after computing the LogR value? If not, would it affect the clustering result, since the software uses both LogR and BAF values to cluster bins?
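For context, by "library normalization" I mean the standard approach where each bin count is divided by its library's total read count before taking the tumor/normal ratio, so that overall sequencing depth cancels out. A toy illustration of that formula (I don't know whether HATCHet does exactly this internally):

```python
def rdr(tumor_count, tumor_total, normal_count, normal_total):
    """Read-depth ratio of a bin with library-size normalization:
    (tumor reads in bin / tumor library size) /
    (normal reads in bin / normal library size)."""
    return (tumor_count / tumor_total) / (normal_count / normal_total)

# Example: tumor sequenced twice as deep overall; without dividing by
# library size the raw per-bin ratio 200/100 would be inflated by 2x.
```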

installation requires specific versions (which are not specified)

Hi,

now that you have changed to a setup.py, I ran into some issues when trying to install: most packages need a specific version to work with your tool, otherwise it fails, but the setup process actually installs the newest available version instead.

I had to manually install a few of the dependencies to make it work:

scikit-learn==0.20
zipp==1.2
sphinx==1.8.3
ipython==5.1
joblib==0.10

It also required Cython, and to call the program with the python -m hatchet notation I additionally had to install 'configparser'.

It would be good if you could add these as specific requirements; otherwise it's quite a pain to actually install.

Cheers,
Sebastian

AttributeError: 'module' object has no attribute 'MOVBBirthMergeAlg'

This seems to be caused by a deprecated algorithm used in utils/cluBB.py:

hmodel, Info = bnpy.Run.run(Data, 'DPMixtureModel', 'DiagGauss', 'moVB', nLap=100, nTask=1, K=K, moves='birth,merge', targetMaxSize=500, ECovMat='eye', mergeStartLap=10, sF=sf, doWriteStdOut=False)

As it seems it has been noted before in the context of THetA:
bnpy/bnpy#14 (comment)

Changing the algorithm from 'moVB' to 'memoVB' fixes the error - but is it methodologically correct?

Cheers!
Harry

None potential neutral cluster found with given parameters!

Hello,
I am trying to run Hatchet solve on the bulk.bbc and bulk.seg.

The code I am trying to execute is: python2 /hatchet/bin/HATCHet.py /hatchet/build/solve -i bulk -n2,8 -p 400 -v 3 -r ${RANDOM} &> >(tee >(grep -v Progress > hatchet.log))

The error I keep getting is:
ValueError: None potential neutral cluster found with given parameters!

Is this problem due to previous steps (e.g. cluBB.py) or to the parameters being fed into the HATCHet solver?

Any help would be appreciated,
Aaron

Question about bin start-end used for samtools invocation

I notice that in the BAMBinning.py script, the chromosome range is split up into bins (of size -b as passed in to the script). Assuming this is 50kb, and -q is 11, the script seems to be invoking samtools repeatedly like so:

  • samtools view input.bam -c -q 11 chr11:0-50000
  • samtools view input.bam -c -q 11 chr11:50000-100000
    ...

My understanding is that samtools uses 1-based indexing, with both boundaries inclusive. Although passing in a 0 instead of a 1 in the first case seems to work (and gives identical results to the 1 case), this seems to be something that is handled as a special case by samtools, but fails when one tries to use 0 in the underlying htslib library (as I found when trying to use pysam). So perhaps this may need fixing?

Also, creating the ranges like above means that position 50000 is repeated between the first invocation and the second. Maybe this is intentional (I don't know much about the underlying science), in which case I can emulate this 1-position overlap in the pysam case too.

Overall, I'm finding that using pysam (a thin Python wrapper around the underlying htslib library) instead of invoking samtools through popen reduces runtime on a ~8GB BAM (I'm using SRR5906250.bam) from 220s to just 40s, while producing identical .bin files, so I think this might be worth pursuing.
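For reference, generating non-overlapping 1-based regions in the format samtools expects could look like this — a sketch of the proposed fix, not the script's current behavior:

```python
def make_bins(chrom, length, size):
    """Yield 1-based, inclusive, non-overlapping region strings
    (chr:1-50000, chr:50001-100000, ...) covering positions 1..length."""
    start = 1
    while start <= length:
        end = min(start + size - 1, length)  # clamp the last bin to the chromosome end
        yield f"{chrom}:{start}-{end}"
        start = end + 1
```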

ZeroDivisionError in plot-cn

In one of my samples, I get this error when trying to plot:

# Plotting reduced-clone profiles in ./intratumor-profilesreduced.pdf
m/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py:704: UserWarning: Attempting to set identical left == right == 0 results in singular transformations; automatically expanding.
  ax.set_xlim(0, max_dependent_coord * 1.05)
# Plotting reduced mixtures in ./intratumor-mixtures.pdf
Traceback (most recent call last):
  File "/opt/gridware/depots/1a8f5697/el7/pkg/apps/python3/3.8.1/gcc-4.8.5/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/gridware/depots/1a8f5697/el7/pkg/apps/python3/3.8.1/gcc-4.8.5/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/__main__.py", line 69, in <module>
    sys.exit(main())
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/__main__.py", line 63, in main
    globals()[command](args)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/plot_cn.py", line 159, in main
    single(tumors[list(tumors)[0]], clones[list(clones)[0]], props[list(props)[0]], base[list(base)[0]], args)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/plot_cn.py", line 212, in single
    gridmixtures(tumor, base, clones, props, args, out)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/plot_cn.py", line 536, in gridmixtures
    g = sns.clustermap(**para)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py", line 1408, in clustermap
    return plotter.plot(metric=metric, method=method,
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py", line 1221, in plot
    self.plot_dendrograms(row_cluster, col_cluster, metric, method,
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py", line 1066, in plot_dendrograms
    self.dendrogram_row = dendrogram(
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py", line 774, in dendrogram
    plotter = _DendrogramPlotter(data, linkage=linkage, axis=axis,
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py", line 584, in __init__
    self.linkage = self.calculated_linkage
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py", line 651, in calculated_linkage
    return self._calculate_linkage_scipy()
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/seaborn/matrix.py", line 619, in _calculate_linkage_scipy
    linkage = hierarchy.linkage(self.array, method=self.method,
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 1060, in linkage
    y = distance.pdist(y, metric)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/scipy/spatial/distance.py", line 2072, in pdist
    dm[k] = metric(X[i], X[j], **kwargs)
  File "/opt/apps/pkg/apps/hatchet/0.3.1/gcc-4.8.5+python3-3.8.1+gurobi-9.1.2+samtools-1.9+bcftools-1.9/lib/python3.8/site-packages/hatchet/utils/plot_cn.py", line 939, in similaritysample
    return float(sum((bothamp(u[i], v[i]) or bothdel(u[i], v[i])) and u[i] != 0 and v[i] != 0 for i in range(len(u)))) / float(sum(u[i] != 0 or v[i] != 0 for i in range(len(u))))
ZeroDivisionError: float division by zero

The input for that HATCHet run was two tumour samples vs. one control. It seems the tumour samples had low purity and/or no CNAs, i.e. this might just be a bad sample. Still, I guess one should guard against crashes even when the input is poor quality.
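As a concrete example of the kind of guard I mean, the failing similarity metric could fall back to a defined value when neither profile has any non-neutral entries. This is a sketch against hypothetical inputs (entries encoded as deviations from neutral, 0 meaning neutral), not a patch for plot_cn.py:

```python
def similarity(u, v):
    """Fraction of positions where both profiles deviate from neutral (0)
    in the same direction, among positions where either deviates.
    Returns 0.0 when both profiles are entirely neutral, instead of
    dividing by zero."""
    both_same_sign = sum(1 for a, b in zip(u, v)
                         if a != 0 and b != 0 and (a > 0) == (b > 0))
    either_nonzero = sum(1 for a, b in zip(u, v) if a != 0 or b != 0)
    return both_same_sign / either_nonzero if either_nonzero else 0.0
```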

Let me know if you need the .ucn files for debugging.

multiple -i arguments found whilst running hatchet.py

Hi,

I am trying to run hatchet in my dataset but i got an error whilst running HATCHet.py. The error trace is attached. Would you have an idea what I am doing wrong?

Thank you so much,
Gunes Gundem

/Users/gundemg/Documents/projects/neuroblastoma/triple_callers/error_trace.txt

terminate called after throwing an instance of 'GRBException

While running the software using the ini profile, I ran into the following problems:

Traceback (most recent call last):
  File "/usr/local/bin/hatchet", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/hatchet/__main__.py", line 63, in main
    globals()[command](args)
  File "/usr/local/lib/python3.8/site-packages/hatchet/utils/run.py", line 149, in main
    hatchet_main(args=[
  File "/usr/local/lib/python3.8/site-packages/hatchet/bin/HATCHet.py", line 192, in main
    diploidObjs = runningDiploid(neutral=neutral, args=args)
  File "/usr/local/lib/python3.8/site-packages/hatchet/bin/HATCHet.py", line 381, in runningDiploid
    results.append((n , execute(args, basecmd, n, outprefix), outprefix))
  File "/usr/local/lib/python3.8/site-packages/hatchet/bin/HATCHet.py", line 644, in execute
    raise RuntimeError(error("The following command failed: \n\t\t{}\nwith {}\n".format(cmd, buffer)))
RuntimeError: The following command failed: 
                /usr/local/lib/python3.8/site-packages/hatchet/solve output1/bbc/bulk -f  -e 6 -j 64 -p 400 -u 0.03 -M 2 -v 2 -c 5:1:1 -n 2 -o output1/results/results.diploid.n2
with ['\x1b[95m\x1b[1m[04:46:16]### Parsing and checking input arguments\t\x1b[0m', '\x1b[92m[04:46:16]## \tInput prefix:  output1/bbc/bulk', 'Input SEG:  output1/bbc/bulk.seg', 'Input BBC:  output1/bbc/bulk.bbc', 'Number of clones:  2', 'Clonal copy numbers:  { 5 [Cluster] : 1|1 [CN] }', 'Help message:  0', 'Maximum number of copy-number states:  -1', 'Maximum integer copy number:  6', 'Number of jobs:  64', 'Number of seeds:  400', 'Minimum tumor-clone threshold:  0.03', 'Maximum resident memory:  -1', 'Time limit:  -1', 'Maximum number of iteratios:  10', 'Random seed:  -1', 'Solving mode:  Coordinate-descent only', 'Verbose:  2', 'Output prefix:  output1/results/results.diploid.n2', 'Diploid threshold:  0.1', 'Base:  1', 'Force amp-del:  1\t\x1b[0m', '\x1b[95m\x1b[1m[04:46:16]### Reading the input SEG file\t\x1b[0m', '\x1b[95m\x1b[1m[04:46:16]### Scale the read-depth ratios into fractional copy numbers using the provided copy numbers\t\x1b[0m', '\x1b[95m\x1b[1m[04:46:16]### Compute allele-specific fractional copy numbers using BAF\t\x1b[0m', '\x1b[95m\x1b[1m[04:46:16]### Starting coordinate descent algorithm on 400 seeds\t\x1b[0m', '\x1b[92m[04:46:16]## Coordinate Descence {\t\x1b[0m', "terminate called after throwing an instance of 'GRBException'", '']

The log shows that processing stops at the "Running diploid" step. Here is my ini profile:

[run]
# What individual steps of HATCHet should we run in the pipeline?
# Valid values are True or False
count_reads = True
genotype_snps = True
count_alleles = True
combine_counts = True
cluster_bins = True
plot_bins = True
compute_cn = True
plot_cn = True

# Path to reference genome
# Make sure you have also generated the reference dictionary as /path/to/reference.dict
reference = "/data/reference/hg38.fa"
normal = "/data/SAMN05341176/bam/SRR3943943.sorted.marked.bqsr.bam"
bams = "/data/SAMN05341176/bam/SRR3943944.sorted.marked.bqsr.bam"
samples = "SAMN05341176"

# Output path of the run script
output = "output1/"

# How many cores to use for the end-end pipeline?
# This parameter, if specified, will override corresponding 'processes' parameters in individual <step> sections below.
processes = 64

[count_reads]
# Bin size for calculating RDR and BAF
size = 50kb

[genotype_snps]
# Reference version used to select list of known germline SNPs;
# Possible values are "hg19" or "hg38", or leave blank "" if you wish for all positions to be genotyped by bcftools
reference_version = "hg38"
# Does your reference name chromosomes with "chr" prefix?; True or False
chr_notation = True

# Use 8 for WGS with >30x and 20 for WES with ~100x
mincov = 20
# Use 300 for WGS with >30x and Use 1000 for WES with ~100x
maxcov = 1000
# Path to SNP list
#   If blank, HATCHet selects a list of known germline SNPs based on <run.reference_version> and <run.chr_notation>
#   If not, please provide full path to a locally stored list (.vcf.gz) here.
snps = "/bundle/hg38/dbsnp_146.hg38.vcf.gz"

[count_alleles]
# Use 8 for WGS with >30x and 20 for WES with ~100x
mincov = 20
# Use 300 for WGS with >30x and Use 1000 for WES with ~100x
maxcov = 1000

[combine_counts]
# Haplotype block size  used for combining SNPs
blocklength = 50kb
# Path to phased file; leave as "None" to run hatchet without phasing
phase = "None"

[cluster_bins]
diploidbaf = 0.08
tolerancerdr = 0.15
tolerancebaf = 0.04

[plot_bins]
sizethreshold = 0.01
figsize = "6,3"

[compute_cn]
clones = 2,6
seeds = 400
minprop = 0.03
diploidcmax = 6
tetraploidcmax = 12
ghostprop = 0.35
limitinc = 0.6

Known samples with low purity - resolution of clone proportions

I was wondering about the case where we have a known set of samples (J) with adequate purity and a known set of samples (K) with low purity. If we first solve for the clonal profiles using just samples J, would it be possible to later calculate the proportions of clones 1, 2, 3, etc. for samples K?
