Comments (9)
Here is an example of A line:
A utg000001l 2093 - SRR11606870.1250244 0 16611 id:i:1250243 HG:A:a
A utg000001l 3572 - SRR11606870.2648803 0 17169 id:i:2648802 HG:A:a
For SRR11606870.1250244, 'utg000001l' is the unitig name; 2093 and (3572 - 1) are the start and end positions in utg000001l; '-' is the direction in utg000001l; and 'HG:A:a' is the haplotype label of SRR11606870.1250244, which is only useful for haplotype-resolved assembly such as trio-binning.
As for CIGAR, I personally think most reads are mapped exactly to the unitig, since all reads have been corrected. I'm not sure if this assumption is enough for your project...
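As a rough illustration of the field layout described above, here is a minimal Python sketch that splits one A line into named fields. The class and function names (`ALine`, `parse_a_line`) are hypothetical and not part of hifiasm. The meaning of the sixth column (0 in the example) as a start coordinate on the read is an assumption, and the seventh column is treated as the read length, as stated later in this thread.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ALine:
    unitig: str           # unitig name, e.g. "utg000001l"
    utg_start: int        # start position of the read on the unitig
    strand: str           # "+" or "-": direction of the read in the unitig
    read_name: str        # read name, e.g. "SRR11606870.1250244"
    read_start: int       # ASSUMPTION: start coordinate on the read (0 in the example)
    read_len: int         # read length, per the replies in this thread
    tags: Dict[str, str]  # optional tags such as id:i:... and HG:A:...


def parse_a_line(line: str) -> ALine:
    """Parse a single whitespace- or tab-separated A line (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "A":
        raise ValueError(f"not an A line: {line!r}")
    tags = {}
    for tag in fields[7:]:
        # Tags look like KEY:TYPE:VALUE, e.g. "HG:A:a"; keep only key and value.
        key, _type, value = tag.split(":", 2)
        tags[key] = value
    return ALine(
        unitig=fields[1],
        utg_start=int(fields[2]),
        strand=fields[3],
        read_name=fields[4],
        read_start=int(fields[5]),
        read_len=int(fields[6]),
        tags=tags,
    )


example = "A utg000001l 2093 - SRR11606870.1250244 0 16611 id:i:1250243 HG:A:a"
rec = parse_a_line(example)
print(rec.unitig, rec.utg_start, rec.strand, rec.read_name, rec.read_len, rec.tags)
```

Using `str.split()` without arguments accepts both tab-separated GFA lines and the space-separated form pasted in this thread.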
from hifiasm.
Thanks for the quick response - just clarifying, there are 2 A lines above. It looks like they refer to different reads. I assumed that 2093 and 3572 are the start coordinates of alignment on the unitig and 0 is the start coordinate on the read in each line, and 16611 and 17169 are either the end coordinates or the alignment lengths?
I agree the reads should match the contigs almost exactly - for now it is useful enough for me just to pull the coordinates. That said, if it would be easy to add a CIGAR, or even better a PAF-style cs tag, that would be really helpful, since I assume there will be some bubbles that are popped, correct? Or would the popped reads just get discarded?
It is technically difficult to output CIGAR because a small fraction of reads are not mapped exactly. We have no plan to deal with those. In addition, everything on the A-line is derived from corrected, not raw reads. CIGARs from corrected reads won't be useful to you anyway.
Or would the popped reads just get discarded?
Popped reads and contained reads are discarded. If you want to recruit reads in a region, you have to redo read alignment against the contigs.
Ok, thank you anyway!
I agree. Another possible solution is to jointly check p_ctg, a_ctg and r_utg. It will be more accurate in extreme cases, but may not be as easy as direct alignment.
I assumed that 2093 and 3572 are the start coordinates of alignment on the unitig
Yes
and 0 is the start coordinate on the read in each line and 16611 and 17169 are either the end coordinates or the alignment lengths?
No, 16611 and 17169 are just the read lengths, not alignment lengths.
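To make the coordinate arithmetic from this thread concrete, here is a small Python sketch. It derives the stretch of the unitig attributed to each read (its own start up to the next read's start minus 1, as in the first reply) and, separately, an approximate footprint of the whole read on the unitig. The footprint assumes the corrected read maps exactly and that its end is inclusive in the same coordinate system as the start; both are assumptions, not something hifiasm reports. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (read_name, unitig_start, read_length), taken from the A lines of ONE unitig
Placement = Tuple[str, int, int]


def contributed_intervals(
    placements: List[Placement],
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (read_name, start, end) where end = next read's start - 1.

    The last read has no successor, so its end is left as None; the unitig
    length would be needed to fill it in.
    """
    ordered = sorted(placements, key=lambda p: p[1])
    result = []
    for i, (name, start, _read_len) in enumerate(ordered):
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else None
        result.append((name, start, end))
    return result


def approximate_footprint(start: int, read_len: int) -> Tuple[int, int]:
    """ASSUMPTION: the read maps exactly, so it covers read_len unitig bases."""
    return start, start + read_len - 1


placements = [
    ("SRR11606870.1250244", 2093, 16611),
    ("SRR11606870.2648803", 3572, 17169),
]
intervals = contributed_intervals(placements)
footprint = approximate_footprint(2093, 16611)
print(intervals)
print(footprint)
```

Because popped and contained reads do not appear in the A lines, intervals derived this way only cover reads kept in the layout; recruiting all reads in a region still requires aligning the reads against the contigs.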
Hi,
I wonder what the number 1250243 after "id:i:" in "id:i:1250243" means. Thanks!
It is the read ID, which is only useful for hifiasm itself.
I see. Thanks!