Comments (6)
Chromap does not remove PCR duplicates in Hi-C through the preset option. You can find the information at the bottom of the manual page here: https://zhanghaowen.com/chromap/chromap.html .
If you want to remove the duplicates, you can add the option "--remove-pcr-duplicates".
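For anyone scripting this, here is a minimal sketch of assembling such a command in Python. The helper name `chromap_hic_command` and the file names are placeholders for illustration only; the flags follow the usage shown in the chromap manual, which you should check against your installed version.

```python
import shlex


def chromap_hic_command(index, reference, reads1, reads2, output, dedup=True):
    """Assemble a chromap Hi-C mapping command as a list of arguments.

    Hypothetical helper: it only builds the argument list and does not run chromap.
    """
    cmd = [
        "chromap", "--preset", "hic",
        "-x", index,        # chromap index
        "-r", reference,    # reference FASTA
        "-1", reads1,       # read 1 FASTQ
        "-2", reads2,       # read 2 FASTQ
        "-o", output,       # output file
    ]
    if dedup:
        # The Hi-C preset does not deduplicate, so the flag is added explicitly.
        cmd.append("--remove-pcr-duplicates")
    return cmd


# Placeholder file names; replace with your own paths.
cmd = chromap_hic_command("index", "ref.fa", "read1.fq", "read2.fq", "aln.pairs")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.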
from chromap.
Great, thank you so much!
I just want to mention one of the main reasons we did not make PCR deduplication the default behavior. Deduplication is based on alignment coordinates, and we do not keep track of the internal alignment breakpoint (ligation site) in Hi-C data. So if two read pairs have the same endpoints but different ligation sites, one of them will still be removed in the deduplication step.
Thank you for your detailed explanation. That makes sense. I just want to clarify: are only read pairs in which both ends are duplicated considered duplicates in chromap? Thanks!
Just following up to see if we can get an answer to @jimwry's previous question. I'm working on integrating chromap with https://github.com/c-zhou/yahs. Thanks!
Sorry, I missed that question. To answer it: yes, we only consider the endpoints in deduplication for read pairs in the pairs output format. For single-end reads, it is only the start site on the reference genome.
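To make the behavior concrete, here is a rough sketch of endpoint-based deduplication as described in this thread. It is not chromap's actual implementation: the `Pair` record, its `ligation_site` field, and the function names are hypothetical, and the real tool may use additional information (for example strand) that is not discussed here.

```python
from collections import namedtuple

# Hypothetical record; ligation_site is carried along only to show that it is ignored.
Pair = namedtuple("Pair", "chrom1 pos1 chrom2 pos2 ligation_site")


def pair_key(pair):
    """Key for a read pair: both endpoints only, the ligation site is not part of it."""
    return (pair.chrom1, pair.pos1, pair.chrom2, pair.pos2)


def single_end_key(read):
    """Key for a single-end read: (chromosome, start site on the reference)."""
    chrom, start = read
    return (chrom, start)


def dedup(records, key):
    """Keep the first record seen for each key and drop later ones."""
    seen = set()
    kept = []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            kept.append(record)
    return kept


pairs = [
    Pair("chr1", 100, "chr1", 5000, 150),
    Pair("chr1", 100, "chr1", 5000, 180),  # same endpoints, different ligation site: dropped
    Pair("chr1", 100, "chr1", 6000, 150),  # only one end matches: kept
]
kept_pairs = dedup(pairs, pair_key)

reads = [("chr1", 100), ("chr1", 100), ("chr1", 101)]
kept_reads = dedup(reads, single_end_key)
```

In this toy input, the second pair is dropped even though its ligation site differs from the first, which is the caveat raised earlier in the thread, while the third pair survives because only one of its ends matches.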