nservant / cancer-hic-norm

Normalization of cancer Hi-C data

R 82.09% Shell 5.20% Python 12.70%

cancer-hic-norm's Issues

cnv source

Hi, I am interested in applying LOIC and CAIC balancing to my Hi-C data. Can I use copy number information from long-read CNV analysis instead of extracting it from the Hi-C data?
Thank you

changes for droso

Hi Nicolas,

Thank you for the response.

I was able to get LOIC working using the existing code in the GitHub repository.

My reference organism is Drosophila, so I had to modify the code a bit to get it working.

I have some notes on the modifications, in case you would like to incorporate them into the package to make it more general for other users.

cghseg, a CRAN R package used by the run_seg function in lib_cnv_hic, is no longer available on CRAN; users need to download it from the CRAN archive.

ChrNumeric, a function in the GLAD package that converts non-numeric chromosome names to numeric ones, only accepts human or mouse chromosome names. I had to replace the function within the GLAD namespace to make it work with Drosophila.

When running cnv_ice.py, the script expected symmetric matrices, with i and j as numeric names.
> However, the examples in the HiC-Pro manual (https://github.com/nservant/HiC-Pro/blob/master/doc/MANUAL.md) suggest that the names of the genome intervals should be in character format. So, after running cnv_ice.py for the first time, I had to reconvert and reimport my Hi-C data.
> The importC function in annotate_hicdata.R forces symmetry on the HiTC object. It would be better if the symmetric matrix were written to disk after import, so that users can pass the same matrix to cnv_ice.py.
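
As a rough illustration of the last point, the sketch below mirrors an upper-triangular triplet list (the three-column `bin_i bin_j count` layout that HiC-Pro writes) into a full symmetric one and writes it back out. The names `symmetrize_triplets` and `write_triplets` and the sample values are hypothetical; this is not code from cancer-hic-norm or HiTC.

```python
def symmetrize_triplets(triplets):
    """Mirror (i, j, count) entries so that (j, i, count) is also present.

    Assumes integer bin indices and at most one entry per unordered pair.
    """
    full = {}
    for i, j, count in triplets:
        full[(i, j)] = count
        full[(j, i)] = count  # same key again when i == j, so no duplicate
    return [(i, j, c) for (i, j), c in sorted(full.items())]


def write_triplets(triplets, path):
    """Write one tab-separated `i j count` line per entry."""
    with open(path, "w") as handle:
        for i, j, count in triplets:
            handle.write(f"{i}\t{j}\t{count}\n")


# Hypothetical upper-triangular input with numeric bin indices.
upper = [(1, 1, 10.0), (1, 2, 4.0), (2, 3, 7.0)]
symmetric = symmetrize_triplets(upper)
```

Calling `write_triplets(symmetric, "sample_symmetric.matrix")` (a made-up file name) would then store the mirrored matrix for later use.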

Within the Python package, at line 74 of iced/normalization/__init__.py, the code assumes that the supplied CNV bias vector contains missing values (0). If it does not, the package fails at X.sum() because subsequent calls operate on an empty array.
> It would be better to add a check such as if any(rows_to_remove): at line 75 to catch this case before performing the operation.
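
A minimal sketch of the kind of guard meant here, assuming `rows_to_remove` is a boolean mask over bins. The helper `drop_zero_bias_bins` and its dense-matrix input are hypothetical and do not reproduce the actual code in iced.

```python
import numpy as np


def drop_zero_bias_bins(X, bias):
    """Remove bins whose bias is 0, but only when such bins exist.

    Hypothetical helper: X is a dense square contact matrix and bias a
    1-D vector with one entry per bin.
    """
    X = np.asarray(X, dtype=float)
    bias = np.asarray(bias, dtype=float)
    rows_to_remove = bias == 0  # boolean mask, True for missing bins

    # Guard: skip the filtering entirely when no bin is flagged.
    if rows_to_remove.any():
        keep = ~rows_to_remove
        X = X[np.ix_(keep, keep)]
        bias = bias[keep]
    return X, bias


X = np.arange(9, dtype=float).reshape(3, 3)
X_all, bias_all = drop_zero_bias_bins(X, [1.0, 2.0, 3.0])  # nothing to drop
X_cut, bias_cut = drop_zero_bias_bins(X, [1.0, 0.0, 3.0])  # middle bin dropped
```

If `rows_to_remove` were an array of indices instead of a mask, the built-in `any()` would miss the case where the only flagged bin is index 0, so a size check such as `rows_to_remove.size > 0` would be the safer test.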

rs.seg.gr$cnv from segment_hic_data.R and interpreting final output bed file

Hello,

I am trying to use cancer-hic-norm to infer copy number information from the Hi-C data. For the time being, I am just trying to infer the copy number data and not normalize my Hi-C data according to it.

In the script cancer-hic-norm-master/cnv_from_hic/segment_hic_data.R, the part that plots p2 builds the following data frame:

dat <- data.frame(chr=as.vector(seqnames(rs.seg.gr)), pos = xpos, counts.cor = rs.seg.gr$counts.cor, smt = rs.seg.gr$smt, cn=rs.seg.gr$cnv)

However, this fails for me because rs.seg.gr has no column called cnv. If I remove cn=rs.seg.gr$cnv from the command, the script runs fine. The output file seems to use only the smt column, so I assume this is okay, but I want to make sure I'm not discarding critical information. At which step is the cnv column supposed to be added to the rs.seg.gr object?

Also, I wasn't sure how to interpret the final output BED file, which has copy number values in the 4th column. Is this a log2 ratio relative to the average number of reads per bin across the genome? For example, if the 4th column is 2 for a bin of interest, does that mean the bin has twice as many reads as the genome average after normalization and smoothing?
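
To make the two possible readings concrete, here is a small arithmetic sketch. It only shows what a value of 2 would mean under each interpretation; which one applies to the output of segment_hic_data.R is exactly what is being asked, and the variable names are made up for illustration.

```python
import math

value = 2.0  # example value from the 4th column of the BED file

# Reading 1: the value is a log2 ratio against the genome-wide average.
fold_if_log2 = 2.0 ** value

# Reading 2: the value is already a linear ratio.
fold_if_linear = value

# A two-fold enrichment corresponds to a log2 ratio of 1, not 2.
log2_of_twofold = math.log2(2.0)
```

Under the log2 reading a value of 2 would be a four-fold enrichment, whereas under the linear reading it would be two-fold.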

Thank you!

CAIC normalisation very slow for high resolution

Do you have any recommendations for speeding up the CAIC normalisation for high resolution matrices?

I have called the CNVs and am supplying the seg.bed file along with the abs.bed and .matrix files to the ice_cnv.py script.

It works well and quite fast at 1 Mb and 500 kb, but at 100 kb it is almost prohibitively slow. I would like to run the normalisation at 40 kb, but that would be impossible at the current speed.

LOIC runs pretty fast; the rate-limiting step is estimating the CNV bias.
