nservant / hic-pro


HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

License: Other

Python 45.32% Shell 31.76% R 7.62% C++ 11.49% Makefile 2.63% Dockerfile 0.34% Singularity 0.83%


hic-pro's Issues

--help option should "work" (and output the same thing as -h)

Checking python libraries

Hi,
I recently ran into an issue with a scipy version that is "too up to date". I am using the 0.18.1 developer version, which causes an error in scripts/install/check_pythonlib.py because vcmp() cannot handle the developer tag suffix in the version string. I know it's a minor issue, but the error message is cryptic if you aren't a Python programmer, and it should be easy to catch and fix.

Best,
TK
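A minimal sketch of a version comparison that tolerates suffixes such as ".dev0" or "rc1". This is not the actual vcmp() from check_pythonlib.py; the function names `version_tuple` and `is_at_least` are hypothetical, and the sketch deliberately treats a development or release-candidate build as equal to the release it is based on.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as ints.

    Parsing stops at the first dot-separated piece that does not start
    with a digit, so "0.18.1.dev0+abc" gives (0, 18, 1).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed, required):
    """True when the installed version is greater than or equal to the required one."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.18" and "0.18.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

A library that has no version attribute at all could be handled separately, for example by reading it with `getattr(module, "__version__", None)` and skipping the comparison when the result is None.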

cleaning

Improve tmp file cleaning. See mapping steps

Feature request: output matrices with uneven bins

Hi there,

Thank you very much for this pipeline! I'd like to create matrices with uneven size bins, and it is currently not possible. More precisely, I'd like to create matrices with exactly one RE site per bin.

Cheers,
Nelle
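One possible reading of "exactly one RE site per bin" is to place each bin boundary halfway between consecutive restriction sites. The sketch below illustrates that idea only; it is not an existing HiC-Pro feature, the function name `bins_one_site_each` is hypothetical, and it assumes 0-based, strictly increasing site positions on a single chromosome.

```python
def bins_one_site_each(cut_sites, chrom_length):
    """Return half-open (start, end) bins, each containing exactly one cut site.

    cut_sites must be strictly increasing 0-based positions, all smaller
    than chrom_length.
    """
    if not cut_sites:
        return []
    bounds = [0]
    for left, right in zip(cut_sites, cut_sites[1:]):
        # The boundary falls strictly after `left` and at or before `right`.
        bounds.append((left + right) // 2 + 1)
    bounds.append(chrom_length)
    return list(zip(bounds[:-1], bounds[1:]))
```

With sites at 10, 20 and 50 on a chromosome of length 100, this yields three bins of unequal size, one per site.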

Makefile

Hello,

in the Makefile at the line:

iced: $(SOURCES)/ice_mod
(cp $(SOURCES)/ice_mod/iced/scripts/ice ${SCRIPTS}; cd $(SOURCES)/ice_mod/; python setup.py install --user;)

It would be better to have a makefile that uses the python set in config-install.txt

When I used the local version of python and the relative libraries, I had an error during installation. The way to solve it was giving the full path of python in the Makefile

Thanks

Zhan

HiC scaffolding

Hi,
Just curious, is it possible to use HiC-Pro for contig scaffolding?
Thanks!

installing issue - dist.py

When I run make CONFIG_SYS=config-install.txt install, I get the following error message:

/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'include_package_data'

and then these errors:
Warning: Assuming default configuration (iced/utils/{setup_utils,setup}.py was not found)Appending iced.utils configuration to iced
Ignoring attempt to set 'name' (from 'iced' to 'iced.utils')
Appending iced.datasets configuration to iced
Ignoring attempt to set 'name' (from 'iced' to 'iced.datasets')
non-existing path in 'iced/io': '../src/cblas'
non-existing path in 'iced/io': '../src/cblas'
Appending iced.io configuration to iced
Ignoring attempt to set 'name' (from 'iced' to 'iced.io')
non-existing path in 'iced': '../src/cblas'
non-existing path in 'iced': '../src/cblas'
Appending iced configuration to
Ignoring attempt to set 'name' (from '' to 'iced')
running install
error: [Errno 13] Permission denied: '/home/neuertlab/.local/lib/python2.7'
Makefile:57: recipe for target 'iced' failed
make: *** [iced] Error 1

Could you advise? I have already made sure that all the dependencies are installed.

errors when job array is large

Hey Nicolas,

I'm noticing an issue that I sometimes have when I run Step1 of the pipeline with large datasets (which have large job arrays, with 100s of jobs). Sometimes a few of these array jobs fail to complete successfully, but when I go back and re-execute the commands on the input data used in that specific array job, they complete successfully. Here is an example of one of these errors (I wish I had more examples to show, but in the past I have simply rerun the pipeline, and everything seemed to go to completion):

Thu Mar 3 14:48:45 EST 2016
Bowtie2 alignment step1 ...

/gpfs/fs0/data/lab/Tony/tools/HiC-Pro_2.7.2b/scripts/bowtie_wrap.sh -c /data/lab/Tony/HiC/folder/config-hicpro.txt -u >> hicpro.log

Thu Mar 3 15:00:56 EST 2016
Bowtie2 alignment step2 ...

/gpfs/fs0/data/lab/Tony/tools/HiC-Pro_2.7.2b/scripts/bowtie_wrap.sh -c /data/lab/Tony/HiC/folder/config-hicpro.txt -l >> hicpro.log

Thu Mar 3 15:05:29 EST 2016
Combine both alignment ...
/gpfs/fs0/data/lab/Tony/tools/HiC-Pro_2.7.2b/scripts/bowtie_combine.sh -c /data/lab/Tony/HiC/folder/config-hicpro.txt >> hicpro.log

[pretty_header] invalid header

Thu Mar 3 15:06:21 EST 2016
Bowtie2 mapping statistics for R1 and R2 tags ...
/gpfs/fs0/data/lab/Tony/tools/HiC-Pro_2.7.2b/scripts/mapping_stat.sh -c /data/lab/Tony/HiC/folder/config-hicpro.txt >> hicpro.log
[E::hts_open] fail to open file 'bowtie_results/bwt2/REP2/AMD240_S8_L001_ad_R1_hg19.bwt2merged.bam'

samtools: failed to open "bowtie_results/bwt2/REP2/AMD240_S8_L001_ad_R1_hg19.bwt2merged.bam" for reading: No such file or directory

Thu Mar 3 15:06:42 EST 2016
Pairing of R1 and R2 tags ...
/gpfs/fs0/data/lab/Tony/tools/HiC-Pro_2.7.2b/scripts/bowtie_pairing.sh -c /data/lab/Tony/HiC/folder/config-hicpro.txt >> hicpro.log
Traceback (most recent call last):
File "/gpfs/fs0/data/lab/Tony/tools/HiC-Pro_2.7.2b/scripts/mergeSAM.py", line 216, in <module>
with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2:
File "pysam/calignmentfile.pyx", line 318, in pysam.calignmentfile.AlignmentFile.__cinit__ (pysam/calignmentfile.c:4730)
File "pysam/calignmentfile.pyx", line 534, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:7261)
IOError: file bowtie_results/bwt2/REP2/AMD240_S8_L001_ad_R1_hg19.bwt2merged.bam not found
make: *** [bowtie_pairing] Error 1

ENH: have an easy way to identify which exact command failed

I ran HiCPro parallel on a new human dataset, and some jobs failed, at different places in the pipeline. In order to better debug, it would be nice to identify exactly which step failed, and on which file.
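A minimal sketch of the kind of reporting requested here: run the steps in order and return the name, exit code and stderr of the first one that fails. This is not how HiC-Pro drives its steps (it uses make and shell scripts); the function name `run_steps` and the step names in the demo are hypothetical.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns None when every step succeeds, otherwise a tuple
    (step name, return code, captured stderr).
    """
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            return name, result.returncode, result.stderr.strip()
    return None


if __name__ == "__main__":
    # Stand-in commands; a real run would call the pipeline scripts instead.
    demo = [
        ("mapping", [sys.executable, "-c", "pass"]),
        ("pairing", [sys.executable, "-c", "import sys; sys.exit(3)"]),
    ]
    print(run_steps(demo))
```

Including the input file in each step name would also answer the "on which file" part of the request.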

Typo in Quick Start Guide and Feature Request

I came across this typo in the example section of the quick start guide. Maybe it is from an earlier version, but the final normalization step points to a script that doesn't exist in 2.7.7.

lundi 2 mars 2015, 17:03:57 (UTC+0100)
Run ICE Normalization ...
normContactMaps.sh -c /bioinfo/users/nservant/projects_dev/HiC-Pro/config_test.txt >> hicpro_IRM90_rep1_split.log

I think it should be running the ice_norm.sh from the scripts directory.

Secondly, I was wondering if you had any plans to include A/B Compartment calling from a PCA analysis eigenvector. This is similar to the doCisPCADomains function in hiclib (http://mirnylab.bitbucket.org/hiclib/binneddata.html#hiclib.binnedData.binnedData.doCisPCADomains).

Thanks for publishing such a well-written and documented pipeline! Really great stuff!

rm tmp and sam files by default

Change Makefile to remove temp files and sam files after running the complete workflow

some questions about the testdata

Hi! If I want to run the software on my own data, must the data contain 4 files, such as SRR400264_01_R1.fastq.gz, SRR400264_01_R2.fastq.gz, SRR400264_00_R1.fastq.gz and SRR400264_00_R2.fastq.gz?

HiC-Pro crashes in case of annotation unconformity

The Bowtie2 indexes for mm9 seem to use 'chrMT' instead of 'chrM' as provided in the annotation file. If so, HiC-Pro crashes during read filtering:

jeudi 12 mars 2015, 16:10:57 (UTC+0100)
Assign alignments to HindIII sites ...
/bioinfo/users/nservant/Apps/HiC-Pro_v2.4.0/scripts/mapped_2hic_fragments.sh -c /data/tmp/HiC_babraham/config-hicpro.txt >> TetMyc
Traceback (most recent call last):
File "/bioinfo/users/nservant/Apps/HiC-Pro_v2.4.0/scripts/mapped_2hic_fragments.py", line 405, in <module>
r1_resfrag = getOverlappingRestrictionFragment(resFrag, r1_chrom, r1)
File "/bioinfo/users/nservant/Apps/HiC-Pro_v2.4.0/scripts/mapped_2hic_fragments.py", line 142, in getOverlappingRestrictionFragment
resfrag = resFrag[chrom].find(pos, pos+1)
KeyError: 'chrMT'
make: *** [mapped_2hic_fragments] Erreur 1
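A minimal sketch of a lookup that returns None instead of raising KeyError when a chromosome is missing from the annotation. The traceback shows that the real script queries an interval structure with `resFrag[chrom].find(pos, pos+1)`; the sketch replaces that with a plain list of tuples to stay self-contained, and the function name `find_fragment` is hypothetical.

```python
def find_fragment(fragments_by_chrom, chrom, pos):
    """Return the name of the fragment overlapping pos, or None.

    fragments_by_chrom maps a chromosome name to a list of
    (start, end, name) tuples with half-open coordinates.
    """
    fragments = fragments_by_chrom.get(chrom)
    if fragments is None:
        # Chromosome absent from the annotation (e.g. 'chrMT' vs 'chrM'):
        # let the caller count and skip the read instead of crashing.
        return None
    for start, end, name in fragments:
        if start <= pos < end:
            return name
    return None
```

Reporting how many reads were skipped per unknown chromosome would make the naming mismatch visible in the logs.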

Default installation path

I think the default installation path should be ~/.local/bin, as this is the convention on Linux.

QC

A couple of quality control metrics are already available in HiC-Pro, but more could be added:

Fraction of duplicates is not easily available
Proximity to 5' - and 3' restriction fragment site
Percentage of contacts of long-range and short-range distances
...

raw data directory needs to be specified twice ?

First on the command line, then in the configuration file. Why?

-s and -p conflict

Option -s cannot be applied in parallel mode

Compatibility with Homer

Hi Nicolas,

Hope you are doing well!

I was wondering whether you have suggestions on how to modify the output of HiC-Pro to make it compatible with Homer downstream analysis.

Thanks!

samtools-1.3 sort bug

Hi,

I had an error while running the pipeline during the BAM file pairing step. More specifically, I got the "Forward and reverse reads not paired. Check that BAM files are sorted." error from the mergeSAM.py script.

Tracking it down I realised that the 'samtools sort' output was never stored. In bowtie_combine.sh samtools sort is called by:

cmd="${SAMTOOLS_PATH}/samtools sort -@ ${N_CPU} -n ${BOWTIE2_FINAL_OUTPUT_DIR}/${prefix}.bwt2merged.bam ${BOWTIE2_FINAL_OUTPUT_DIR}/${prefix}.bwt2merged.sorted"

In samtools 1.3, "The obsolete samtools sort in.bam out.prefix usage has been removed. If you are still using -f, -o, or out.prefix, convert to use -T PREFIX and/or -o FILE instead."

Consider switching the sort call by specifying the output name with the -o flag.

Thank you for a great pipeline. :-)

Regards,
Nikos
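A minimal sketch of the suggested change, written as a helper that builds the argument list for the newer `samtools sort` syntax with an explicit `-o` output file and `-T` temporary prefix. The helper name `samtools_sort_by_name_argv` and the file names in the example are hypothetical; the flags themselves (`-@`, `-n`, `-T`, `-o`) are the ones named in the samtools 1.3 release note quoted above or already used in the existing command.

```python
def samtools_sort_by_name_argv(samtools, in_bam, out_bam, n_cpu, tmp_prefix):
    """Build a name-sort command using -o FILE and -T PREFIX.

    The list can be passed to subprocess.run(); nothing is executed here.
    """
    return [
        samtools, "sort",
        "-@", str(n_cpu),   # extra sorting threads
        "-n",               # sort by read name
        "-T", tmp_prefix,   # prefix for temporary files
        "-o", out_bam,      # explicit output file
        in_bam,
    ]
```

For example, `samtools_sort_by_name_argv("samtools", "x.bwt2merged.bam", "x.bwt2merged.sorted.bam", 4, "x.tmp")` gives a command whose output name no longer depends on the removed `out.prefix` form.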

Comparing Heatmaps

Hi,
If I want to compare two Hi-C heatmaps, which is better: subtracting one raw heatmap from another raw heatmap, or subtracting one iced heatmap from another iced heatmap?

Thank you very much!

change CUT_SITE_5OVER in configuration file

as this is not really the cutting site, but the ligation motif which is expected

Annotations directory

We should be able to specify the annotations directory.

(I might do a PR for this issue.)

ICE normalization failure due to conflict between iced/io and standard io package

It appears that, in my installation of HiC-Pro / iced, problems arise when I attempt to run ICE normalization because of a naming conflict between the standard io Python package (used by tempfile) and the io package that comes with iced. This leads to errors that can be rectified by changing the order of imports, which is a sign of something strange going on in the code.

For instance, after installing HiC-Pro, I can't import iced directly:

(venv)kmn7@loge:~/park/hicpro/HiC-Pro_2.7.7$ python
Python 2.7.6 (default, Aug 3 2015, 17:43:52)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import iced
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kmn7/venv/lib/python2.7/site-packages/iced/__init__.py", line 1, in <module>
from . import normalization
File "/home/kmn7/venv/lib/python2.7/site-packages/iced/normalization.py", line 1, in <module>
import numpy as np
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/__init__.py", line 180, in <module>
from . import add_newdocs
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
from .type_check import *
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/core/__init__.py", line 57, in <module>
from numpy.testing import Tester
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/testing/__init__.py", line 14, in <module>
from .utils import *
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/testing/utils.py", line 15, in <module>
from tempfile import mkdtemp
File "/opt/python-2.7.6/lib/python2.7/tempfile.py", line 32, in <module>
import io as _io
File "/home/kmn7/venv/lib/python2.7/site-packages/io/__init__.py", line 1, in <module>
from .fastio import loadtxt, savetxt
File "__init__.pxd", line 155, in init iced.io.fastio (iced/io/fastio_.c:4789)
AttributeError: 'module' object has no attribute 'dtype'

Or even tempfile:

(venv)kmn7@loge:~/park/hicpro/HiC-Pro_2.7.7$ python
Python 2.7.6 (default, Aug 3 2015, 17:43:52)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import tempfile
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/python-2.7.6/lib/python2.7/tempfile.py", line 32, in <module>
import io as _io
File "/home/kmn7/venv/lib/python2.7/site-packages/io/__init__.py", line 1, in <module>
from .fastio import loadtxt, savetxt
File "__init__.pxd", line 155, in init iced.io.fastio_ (iced/io/fastio_.c:4789)
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/__init__.py", line 180, in <module>
from . import add_newdocs
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
from .type_check import *
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/core/__init__.py", line 57, in <module>
from numpy.testing import Tester
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/testing/__init__.py", line 14, in <module>
from .utils import *
File "/opt/python-2.7.6/lib/python2.7/site-packages/numpy/testing/utils.py", line 15, in <module>
from tempfile import mkdtemp
ImportError: cannot import name mkdtemp

This is presumably because tempfile requires the standard io package (https://docs.python.org/2/library/io.html); we can confirm this by running:

import sys
sys.path.insert(0,'/opt/python-2.7.6/lib/python2.7')
import tempfile

But if I import io first, then I can import iced and tempfile without any problems:

import io
io.__file__
'/home/kmn7/venv/lib/python2.7/site-packages/io/__init__.pyc'
import tempfile
import iced

Might renaming the io package be a quick fix for this anomaly?

Thank you for your help!

Best,
Chris Nam
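A small sketch of how a naming clash like this could be caught before a package is installed: compare the top-level names a distribution would install against the standard library module names. It relies on `sys.stdlib_module_names`, which only exists in Python 3.10 and later, so a tiny fallback set is used on older interpreters. The function name `colliding_names` is hypothetical, and the sketch does not reproduce the Python 2.7 setup from the report.

```python
import sys

# sys.stdlib_module_names is available from Python 3.10; the fallback set
# is only a small illustrative subset for older interpreters.
STDLIB_NAMES = getattr(
    sys, "stdlib_module_names", frozenset({"io", "os", "sys", "tempfile"})
)


def colliding_names(top_level_packages):
    """Return the top-level package names that shadow a standard library module."""
    return sorted(name for name in top_level_packages if name in STDLIB_NAMES)
```

In the situation described above, a top-level `io` directory in site-packages would be flagged, which points to the install layout (iced's `io` subpackage ending up at the top level) rather than to the import order.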

how to generate the annotation file of Arabidopsis_thaliana?

Hi
I'm interested in your software, and I have some Arabidopsis thaliana data, but there is no annotation file for it in the software's annotation directory. How can I get the annotation file for Arabidopsis thaliana?
Can I use bowtie2, followed by samtools and bedtools, to generate the Arabidopsis_thaliana.bed file?
Thanks a lot
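For a genome that is not shipped with the pipeline, one annotation input that can be derived directly from the reference FASTA is a table of chromosome sizes. The sketch below shows only that part, under the assumption that such a table is what is needed; it does not produce the restriction fragment .bed file, and the function name `chrom_sizes` is hypothetical.

```python
def chrom_sizes(fasta_lines):
    """Return {sequence name: length} from the lines of a FASTA file.

    The sequence name is the first whitespace-delimited word after '>'.
    Header lines are assumed to contain a name.
    """
    sizes = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes
```

Writing each entry as `name<TAB>size` gives a two-column text file; the chromosome names must match those in the Bowtie2 index.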

[bam_header_read] EOF marker is absent. The input is probably truncated.

I am trying the test data. It gives the following error at the mapping stage:
[bam_header_read] EOF marker is absent. The input is probably truncated.

I tried mapping the reads while leaving out --rg-id BMG --rg SM:${prefix}. In that case, using samtools to convert SAM to BAM does not give me any problem.

ICE Normalization failed

Hi,

I ran the pipeline and it failed at ICE Normalization step with this error message "cannot find ice". Can you fix?

Thanks,
Duy

Error in "Combine both alignment" step

Test dataset - no annotation

For the test dataset, it would be nice if the bowtie2 index was also downloaded so that if you're not working with human you don't have to set up for human and build an hg19 index just to test that the software works.

There is also no config-test.txt file in the test_data directory, or in the ../test_data directory. One can be generated from that in the ./HiC-Pro_2.7.1/ directory, but it would be nice if the test was self-sufficient, so that users could actually test the tool without having to edit files and search for things.

build_maps issue

In some cases the --process option crashed because of a division by zero.
The current version fixes this issue by removing the option.

Add citation in the readme.

Multiple Pairs

Hi,
When I used mergeSAM.py to merge the two mapped read files and get the pairing statistics, the results showed that 80% of my pairs are multiple pairs, which is not reasonable. I checked the script and found that the "read.is_unique" function only works for Bowtie2-mapped files. Since I aligned the reads with SNAP rather than Bowtie2, is there another option for identifying multiple pairs? Thanks a lot!

Best

python dependencies

The documentation could be updated to list more specific Python module dependencies, i.e. including minimum version numbers. I ran into issues with an outdated pysam (.AlignmentFile() missing). Furthermore, the iced module is required.
My current pip freeze (working on the test data):

bx-python==0.7.3
iced==0.2.2
numpy==1.11.0
pysam==0.8.4
scipy==0.17.1

Pairing of R1 and R2 tags Error

The last command run in the main log file is running mergeSAM.py and the error it produced is:

Forward and reverse reads not paired. Check that BAM files are sorted.

The mergeSAM.log file last line says:

Forward and reverse reads not paired. Check that BAM files are sorted.

Using samtools Version: 1.3 (using htslib 1.3) and HiC-Pro 2.7.4b

Bowtie2 global alignment error - file path specification bug

It seems there may be a bug in the way folder paths are defined in bowtie_wrap.sh. Specifically, the R1 and probably R2 variables

During reads mapping, specifically when global_align() is called, several files are created, including ones of the form: x_R1_mm9.bwt2glob.unmap.fastq. All of those files seem to be created without error.

The next task seems to be to generate files of the form R1_mm9.bwt2glob.bam, and this is where the error occurs.

Looking at the error log, the culprit is the following command:

/home/ubuntu/HCP/bowtie2-2.2.4/bowtie2 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder --un bowtie_results/bwt2_global/fastq/00_SRR1300754_R1_mm9.bwt2glob.unmap.fastq --rg-id BMG --rg SM:00_SRR1300754_R1 --phred33-quals -p 8 -x /home/ubuntu/mm9_bowtie2_index/mm9 -U rawdata/fastq/00_SRR1300754_R1.fastq 2> logs/fastq/bowtie_00_SRR1300754_R1_global_mm9.log | /usr/bin/samtools view -F 4 -bS - > bowtie_results/bwt2_global/fastq/00_SRR1300754_R1_mm9.bwt2glob.bam

I noticed that the folder path that comes after the -U option doesn't exist, and that all of the .fastq files that are being referenced are actually in the /rawdata main dir.

When I manually ran the same command but set the -U option to rawdata/00_SRR1300754_R1.fastq, the files of the form mm9.bwt2glob.bam were successfully created.

This leads me to suspect that somewhere along the line, there is a file path specification bug that is causing this error.

Problem creating the torque script when filename contains several PAIR1_EXT

To reproduce the bug:
create datafiles of following format:
00_all_1.fastq 00_all_2.fastq 01_all_1.fastq 01_all_2.fastq 02_all_1.fastq 02_all_2.fastq 03_all_1.fastq 03_all_2.fastq 04_all_1.fastq 04_all_2.fastq 05_all_1.fastq 05_all_2.fastq

and set
PAIR1_EXT = 1
PAIR2_EXT = 2

=> it fails.
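One plausible cause (an assumption, since the script that writes the torque file is not shown here) is that the pair extension is substituted everywhere in the file name rather than only where it marks the mate. A minimal sketch that replaces only the last occurrence, with a hypothetical function name `mate_filename`:

```python
def mate_filename(r1_name, pair1_ext, pair2_ext):
    """Derive the R2 file name by replacing only the LAST occurrence of pair1_ext."""
    head, found, tail = r1_name.rpartition(pair1_ext)
    if not found:
        raise ValueError("%r does not contain %r" % (r1_name, pair1_ext))
    return head + pair2_ext + tail


# A global replacement changes the sample prefix as well:
# "01_all_1.fastq".replace("1", "2") gives "02_all_2.fastq".
```

With PAIR1_EXT = 1 and PAIR2_EXT = 2, `mate_filename("01_all_1.fastq", "1", "2")` returns "01_all_2.fastq". Using more specific extensions such as _1 and _2 (if the file names allow it) also reduces the ambiguity.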

Support for raw sequences coming in one file

It appears that HiC-Pro only supports reading raw sequences in which the reads (e.g. R1, R2) come in separate files; are there any plans to support cases in which the sequences from the left and right are combined and in one file or is there some option I'm missing?
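Until such support exists, a combined file can be split beforehand. The sketch below assumes an interleaved FASTQ layout (four lines per record, R1 and R2 records alternating), which is only one of the ways mates can share a file; the function name `split_interleaved` is hypothetical.

```python
def split_interleaved(lines):
    """Split the lines of an interleaved FASTQ into (R1 lines, R2 lines).

    Assumes four lines per record and alternating R1/R2 records.
    """
    if len(lines) % 8 != 0:
        raise ValueError("expected an even number of 4-line FASTQ records")
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    r1 = [line for record in records[0::2] for line in record]
    r2 = [line for record in records[1::2] for line in record]
    return r1, r2
```

For real data the same logic would be applied while streaming the file, rather than holding all lines in memory.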

some questions about the bowtie_pairing error

Hi
When I try to run HiC-Pro on my own Arabidopsis thaliana TAIR10 data, I get the following error messages:
Sun Jul 24 11:06:44 CST 2016
Pairing of R1 and R2 tags ...
/public/home/rqzheng/bin/HiC-Pro_2.7.8/scripts/bowtie_pairing.sh -c /public/home/rqzheng/test/hicprotest/config_test_latest.txt >> hicpro.log
make: *** [bowtie_pairing] Error 1
How can I solve this problem?
Thanks a lot!

"error: can't combine user with prefix, exec_prefix/home, or install_(plat)base"

I cloned the latest master (v2.7.8).

When I do make install, I get:

[eamorr@login1(eamorr) HiC-Pro]$ make install
(g++ -Wall -O2 -std=c++0x -o build_matrix /opt/apps/HiC-Pro/scripts/src/build_matrix.cpp; mv build_matrix /opt/apps/HiC-Pro/scripts)
(g++ -Wall -O2 -std=c++0x -o cutsite_trimming /opt/apps/HiC-Pro/scripts/src/cutsite_trimming.cpp; mv cutsite_trimming /opt/apps/HiC-Pro/scripts)
(cp /opt/apps/HiC-Pro/scripts/src/ice_mod/iced/scripts/ice /opt/apps/HiC-Pro/scripts; cd /opt/apps/HiC-Pro/scripts/src/ice_mod/; /opt/gridware/pkg/el6/apps/python/2.7.8/gcc-4.4.7/bin/python setup.py install --user;)
/opt/gridware/pkg/el6/apps/python/2.7.8/gcc-4.4.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'include_package_data'
  warnings.warn(msg)
/opt/gridware/pkg/el6/apps/python/2.7.8/gcc-4.4.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
  warnings.warn(msg)
Warning: Assuming default configuration (iced/utils/{setup_utils,setup}.py was not found)Appending iced.utils configuration to iced
Ignoring attempt to set 'name' (from 'iced' to 'iced.utils')
Appending iced.datasets configuration to iced
Ignoring attempt to set 'name' (from 'iced' to 'iced.datasets')
non-existing path in 'iced/io': '../src/cblas'
non-existing path in 'iced/io': '../src/cblas'
Appending iced.io configuration to iced
Ignoring attempt to set 'name' (from 'iced' to 'iced.io')
non-existing path in 'iced': '../src/cblas'
non-existing path in 'iced': '../src/cblas'
Appending iced configuration to
Ignoring attempt to set 'name' (from '' to 'iced')
running install
error: can't combine user with prefix, exec_prefix/home, or install_(plat)base
make: *** [iced] Error 1

I'm on Linux x86_64 CentOS 6.

Any ideas? Thank you kindly,

Check restriction fragment file is available

Check if the specified restriction fragment is available.
In the current version, if the file does not exist it automatically runs the DNase mode ...
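A minimal sketch of the requested check: fall back to the DNase mode only when no fragment file is configured at all, and fail loudly when a file is configured but missing. The function name `choose_mode` and the mode labels are hypothetical; the `exists` parameter is there only so the logic can be exercised without touching the file system.

```python
import os


def choose_mode(fragment_file, exists=os.path.isfile):
    """Return 'dnase' when no fragment file is configured, 'digestion' otherwise.

    Raises IOError when a fragment file is configured but cannot be found,
    instead of silently switching to the DNase mode.
    """
    if not fragment_file:
        return "dnase"
    if not exists(fragment_file):
        raise IOError("restriction fragment file not found: %s" % fragment_file)
    return "digestion"
```

This way a typo in the configured path stops the run immediately with a clear message.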

installation error

The following error occurred during installation:

cp -Ri /u/home/galaxy/collaboratory/apps/HiC-Pro /u/home/galaxy/collaboratory/apps/hic-pro /HiC-Pro_2.7.6
cp: target `/HiC-Pro_2.7.6' is not a directory
make: *** [cp] Error 1

This is resolved by removing the space at the end of PATH in Prefix in the config-install.txt
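A small sketch of how trailing whitespace in configuration values could be neutralized when the file is read. It assumes simple `KEY = value` lines and uses a hypothetical `PREFIX` key and function name `parse_config`; the actual installer reads config-install.txt through make, so this only illustrates the idea.

```python
def parse_config(lines):
    """Parse 'KEY = value' lines, stripping whitespace around keys and values.

    Blank lines, lines starting with '#', and lines without '=' are ignored.
    """
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

With this, `PREFIX = /opt/apps/hic-pro ` (note the trailing space) and `PREFIX = /opt/apps/hic-pro` yield the same value, so the later `cp` target is not split into two arguments.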

potential bug: RM_MULTI in bowtie_pairing.sh

Hi there,

First of all very nice pipeline, enjoying working with it!

One potential bug I recognized today:

In the bowtie_pairing.sh script, the option "-m" for mergeSAM.py is set if RM_MULTI == 0.
RM_MULTI is referred to as removing multiple aligned reads in the documentation, but the option "-m" in mergeSAM.py actually enables the reporting of multiple alignments.

This shouldn't make any difference when running Bowtie2 with default parameters, but it may lead to incorrect behavior when multiple alignments are actually allowed.

Potentially the same issue with RM_SINGLETON.

Easy to work around, but probably worth adjusting.

Cheers

Reads pairing - reads not sorted

make: *** [bowtie_pairing] Error 1
Forward and reverse reads not paired. Check that BAM files are sorted.
It appears that the read names have /1 and /3 suffixes, referring to R1 and R2.
Pipelines usually ignore anything after the "/", as it may carry this kind of information.
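A minimal sketch of the normalization described above: drop a trailing "/" followed by digits before comparing the names of the two mates. The function name `base_read_name` is hypothetical, and whether mergeSAM.py compares names this way is not shown here.

```python
import re

# A trailing "/<digits>" mate marker such as "/1", "/2" or "/3".
_MATE_SUFFIX = re.compile(r"/\d+$")


def base_read_name(name):
    """Return the read name without a trailing '/<digits>' mate marker."""
    return _MATE_SUFFIX.sub("", name)
```

Two mates named "HWI:1:7/1" and "HWI:1:7/3" then compare as equal, while names without a suffix are left untouched.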

Conversion to full matrix

It would be great to have an option to convert the triplet sparse format to full matrix format that can eventually be compatible with downstream analysis tools such as TAD callers from Dixon et al and Crane et al.
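A minimal sketch of such a conversion in plain Python. It assumes that each line of the sparse file holds "bin_i bin_j value" separated by whitespace, that bin indices are 1-based, and that only one triangle of the symmetric matrix is stored; those layout details are assumptions to check against the actual output files. The function name `triplets_to_dense` is hypothetical.

```python
def triplets_to_dense(lines, n_bins):
    """Build a symmetric n_bins x n_bins matrix from 'i j value' triplets.

    Indices in the input are assumed to be 1-based.
    """
    dense = [[0.0] * n_bins for _ in range(n_bins)]
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        i, j = int(fields[0]) - 1, int(fields[1]) - 1
        value = float(fields[2])
        dense[i][j] = value
        dense[j][i] = value  # mirror into the other triangle
    return dense
```

A dense genome-wide matrix grows with the square of the number of bins, so at high resolution it is usually more practical to convert one chromosome at a time.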

Cryptic error when the R1/R2 tags are not set properly in parallel mode

HiC-Pro then fails to write the torque scripts with the following error:
make: *** [make_torque_script] Error 1

Error in the merge_valid_interactions.sh script while running the test dataset

I am trying to use HiC-Pro and running into an error on the test dataset itself. The software compiles properly and aligns the test reads properly, but gives an error at the stage of merging multiple files from the same sample.

Merge multiple files from the same sample ...
/usr/local/bin/HiC-Pro_2.6.0/scripts/merge_valid_interactions.sh -c /Users/chinmayshukla/Downloads/config_v2.5.1_orlatter.txt >> hicpro_test.log
Exit: Error in input type.'.fastq|.bam|.validPairs|.matrix' files are expected.
make: *** [merge_valid_interactions] Error 1

Any help would be appreciated!

Newer version of bx-python does not have a `__version__` attribute.

Hi,

The newer version of the bx-python (e.g. 0.7.3) does not have a __version__ attribute, and the check_pythonlib.py fails. I commented out the vcmp lines, and kept the except ImportError part, and it seems to have worked.

bug in rawdata organization

There is a bug when the rawdata folder contains a sample folder, as expected, together with some fastq.gz files at the top level.
In this case, the rawdata folder itself is treated as a sample.
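A minimal sketch of a pre-flight check for this situation: list the sample folders and any FASTQ files lying directly in the rawdata folder, so that a mixed layout can be reported instead of misread. The function name `check_rawdata_layout` and the list of suffixes are assumptions.

```python
import os
import tempfile

FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def check_rawdata_layout(rawdata_dir):
    """Return (sample folders, stray FASTQ files) found directly in rawdata_dir."""
    samples, stray = [], []
    for name in sorted(os.listdir(rawdata_dir)):
        path = os.path.join(rawdata_dir, name)
        if os.path.isdir(path):
            samples.append(name)
        elif name.endswith(FASTQ_SUFFIXES):
            stray.append(name)
    return samples, stray


if __name__ == "__main__":
    # Build a throwaway layout with one sample folder and one stray file.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sampleA"))
        open(os.path.join(root, "extra_R1.fastq.gz"), "w").close()
        print(check_rawdata_layout(root))
```

When both lists are non-empty, the run could stop with a message naming the stray files.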

"organism" in the config file should probably be renamed "reference_genome"

as it is not the organism but the reference genome name (hg19 or TAIR10)

parallel mode suggestion

Hello,

I thought I'd offer a suggestion for the parallel mode of the software. It would be useful to have Step2 of the pipeline executed as a job array, so that analyses of different samples can be run concurrently on a cluster. I figure this would be useful for users analyzing many samples, especially if the samples are of significant sequencing depth.

Thanks,
Tony

hicpro2juicebox.sh

Hi,
I would very much like to load my HiC-Pro output into Juicebox. Since I am using a custom genome from my department, it is none of the defaults "hg18, hg19, hg38, dMel, mm9, mm10, anasPlat1, bTaurus3, canFam3, equCab2, galGal4, Pf3D7, sacCer3, sCerS288c, susScr3, or TAIR10" listed by hicpro2juicebox.sh.
So my question is: where does the -g GENOME get sourced from, what exactly is the genome (fasta / bw2 index / .sizes?), and is there a way or workaround to get my output into the script?

Thanks,
TK