nservant / cancer-hic-norm
Normalization of cancer Hi-C data
Hi, I am interested in applying LOIC and CAIC balancing to my Hi-C data. Can I use copy number information from a long-read CNV analysis instead of extracting it from the Hi-C data?
Thank you
Hi Nicolas,
Thank you for the response.
I was able to get LOIC working using the existing codebase in the GitHub repository.
My reference organism is Drosophila, so I had to modify the code a bit to get it working.
I have some notes on the modifications, in case you would like to incorporate them into the package to make it more general for other users.
cghseg, a CRAN R package used by the run_seg function in lib_cnv_hic, is no longer available on CRAN; users need to download it from the CRAN archive.
ChrNumeric, a function in the GLAD package that converts non-numeric chr names to numeric ones, only accepts human or mouse chromosome names. I had to replace the function within the GLAD namespace to make it work with Drosophila.
When running cnv_ice.py, the script expected symmetric matrices, with i and j being numeric names.
> However, the examples provided with HiC-Pro (https://github.com/nservant/HiC-Pro/blob/master/doc/MANUAL.md) suggest that the names of the genome intervals should be in character format. So, after running cnv_ice.py for the first time, I had to reconvert and reimport my Hi-C data.
> The importC function in annotate_hicdata.R forces symmetry on the HiTC object. It would be better if the symmetric matrix were written to disk after importing, so that users can pass the same matrix on to cnv_ice.py.
Within the Python package, at line 74 of iced/normalization/__init__.py, it is assumed that the CNV bias vector provided contains missing values (0). If it does not, the package fails at X.sum() because subsequent calls are made on an empty array.
> It would be better to add an if any(rows_to_remove): check at line 75 to catch this case before doing the operation.
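The guard suggested in the last note could look roughly like the sketch below. It is not the code from the iced package: the helper name drop_zero_bias_bins and its return values are hypothetical, and it assumes a dense NumPy contact matrix and a bias vector in which 0 marks a missing bin.

```python
import numpy as np


def drop_zero_bias_bins(X, cnv_bias):
    """Hypothetical helper (not part of iced): drop bins whose CNV bias is 0."""
    cnv_bias = np.asarray(cnv_bias, dtype=float)
    rows_to_remove = cnv_bias == 0  # assumption: 0 encodes a missing value

    if not rows_to_remove.any():
        # No missing bins: skip the filtering step entirely so that
        # later calls never operate on an unintended empty selection.
        return X, cnv_bias, rows_to_remove

    keep = ~rows_to_remove
    # Remove the flagged bins from both rows and columns.
    return X[np.ix_(keep, keep)], cnv_bias[keep], rows_to_remove


X = np.arange(9, dtype=float).reshape(3, 3)
full, _, _ = drop_zero_bias_bins(X, [1.0, 2.0, 1.0])     # nothing to remove
reduced, _, _ = drop_zero_bias_bins(X, [1.0, 0.0, 1.0])  # middle bin removed
```

With a bias vector that has no zeros, the matrix is returned with its original shape; with one zero, the corresponding row and column are dropped.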
Hello,
I am trying to use cancer-hic-norm to infer copy number information from the Hi-C data. For the time being, I am just trying to infer the copy number data and not normalize my Hi-C data according to it.
For the script cancer-hic-norm-master/cnv_from_hic/segment_hic_data.R, when it comes to the part for plotting p2, it generates the following data frame.
dat <- data.frame(chr=as.vector(seqnames(rs.seg.gr)), pos = xpos, counts.cor = rs.seg.gr$counts.cor, smt = rs.seg.gr$smt, cn=rs.seg.gr$cnv)
However, this results in an error for me because rs.seg.gr does not have a column called cnv. If I remove this part from the command (cn=rs.seg.gr$cnv), the script runs fine. The output file seems to use only the smt column, so I think this should be okay, but I wanted to make sure I'm not discarding critical information. At which step is the cnv column supposed to be added to the rs.seg.gr object?
Also, I wasn't sure how to interpret the final output bed file, which has copy number values in the 4th column. Is this a log2 ratio relative to the average number of reads per bin across the genome? For example, if the 4th column is 2 for a bin of interest, does that mean the bin has twice as many reads as the genome average, after appropriate normalization and smoothing of the signal?
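For reference, the two readings of a value of 2 differ, and the thread does not say which scale the 4th column uses. The sketch below only illustrates the arithmetic under each assumption; the function names are hypothetical.

```python
import math


def fold_change_from_log2(log2_ratio):
    """Convert a log2 ratio to a plain fold change relative to the baseline."""
    return 2.0 ** log2_ratio


def log2_from_fold_change(fold_change):
    """Convert a plain fold change to a log2 ratio."""
    return math.log2(fold_change)


# If the column were a log2 ratio, a value of 2 would mean 4x the baseline.
as_log2 = fold_change_from_log2(2.0)

# If the column were a plain ratio, 2 would mean 2x the baseline,
# which corresponds to a log2 ratio of 1.
as_plain = log2_from_fold_change(2.0)
```

So "twice as many reads" matches a plain ratio of 2 or a log2 ratio of 1, but not a log2 ratio of 2.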
Thank you!
Do you have any recommendations for speeding up the CAIC normalisation for high resolution matrices?
I have called the CNVs and am supplying the seg.bed file along with the abs.bed and .matrix files to the ice_cnv.py script.
It works great and quite fast for 1Mb and 500Kb, but when I get down to 100Kb it is almost prohibitively slow. I would like to run the normalisation at 40Kb, but it would be impossible at the current speed.
LOIC runs pretty fast; the rate-limiting step is estimating the CNV bias.