I am wondering about the gfa output - is there any documentation about the lines begin

GFA `A` lines about hifiasm HOT 9 CLOSED

Comments (9)

chhylp123 commented on May 26, 2024

Here are two example A lines:

A	utg000001l	2093	-	SRR11606870.1250244	0	16611	id:i:1250243	HG:A:a
A	utg000001l	3572	-	SRR11606870.2648803	0	17169	id:i:2648802	HG:A:a

For SRR11606870.1250244, 'utg000001l' is the unitig name, 2093 and (3572 - 1) are its start and end positions in utg000001l, '-' is its direction in utg000001l, and 'HG:A:a' is its haplotype label, which is only useful for haplotype-resolved assembly such as trio binning.

As for the CIGAR, I personally think most reads map exactly to the unitig, since all reads have been corrected. I'm not sure whether this assumption is sufficient for your project...
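
To make the column layout described above concrete, here is a minimal Python sketch that splits one A line into named fields. The class name `ALine`, the function `parse_a_line`, and every field name are hypothetical labels chosen for illustration, not hifiasm terminology. The meaning of the sixth column is an assumption (it is not confirmed in this thread), and the seventh column is treated as the read length.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ALine:
    """One hifiasm GFA A line; field names are illustrative, not official."""
    unitig: str        # unitig name, e.g. utg000001l
    start: int         # start position of the read on the unitig
    strand: str        # direction of the read on the unitig ('+' or '-')
    read_name: str     # read name, e.g. SRR11606870.1250244
    read_start: int    # sixth column; ASSUMED to be a start on the read
    read_length: int   # seventh column; the read length
    tags: Dict[str, str] = field(default_factory=dict)


def parse_a_line(line: str) -> ALine:
    """Split a tab-separated A line into an ALine record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 7 or cols[0] != "A":
        raise ValueError(f"not an A line: {line!r}")
    tags = {}
    for tag in cols[7:]:
        # Optional tags look like NAME:TYPE:VALUE, e.g. id:i:1250243
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    return ALine(cols[1], int(cols[2]), cols[3], cols[4],
                 int(cols[5]), int(cols[6]), tags)


example = "A\tutg000001l\t2093\t-\tSRR11606870.1250244\t0\t16611\tid:i:1250243\tHG:A:a"
record = parse_a_line(example)
print(record.unitig, record.start, record.strand, record.read_length, record.tags["HG"])
```

Tag values are kept as strings here so that the sketch makes no claim about how each tag should be interpreted.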

apregier commented on May 26, 2024

Thanks for the quick response - just clarifying, there are 2 A lines above. It looks like they refer to different reads. I assumed that 2093 and 3572 are the start coordinates of alignment on the unitig and 0 is the start coordinate on the read in each line, and 16611 and 17169 are either the end coordinates or the alignment lengths?

I agree the reads should match the contigs almost exactly. For now it is useful enough for me to just pull the coordinates, although if it would be easy to add a CIGAR, or even better a PAF-style cs tag, that would be really helpful, since I assume some bubbles will be popped, correct? Or would the popped reads just get discarded?

lh3 commented on May 26, 2024

It is technically difficult to output CIGAR because a small fraction of reads are not mapped exactly. We have no plan to deal with those. In addition, everything on the A-line is derived from corrected, not raw reads. CIGARs from corrected reads won't be useful to you anyway.

> Or would the popped reads just get discarded?

Popped reads and contained reads are discarded. If you want to recruit reads in a region, you have to redo read alignment against the contigs.

apregier commented on May 26, 2024

Ok, thank you anyway!

chhylp123 commented on May 26, 2024

I agree. Another possible solution is to jointly check p_ctg, a_ctg and r_utg. It will be more accurate in extreme cases, but may not be as easy as direct alignment.

chhylp123 commented on May 26, 2024

> I assumed that 2093 and 3572 are the start coordinates of alignment on the unitig

Yes.

> and 0 is the start coordinate on the read in each line and 16611 and 17169 are either the end coordinates or the alignment lengths?

No, 16611 and 17169 are just the read lengths, not alignment lengths.
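
As a rough illustration of how the coordinates discussed in this thread fit together, the sketch below derives a (start, end) segment for each read from the A lines of a single unitig, taking each read's end as the next read's start minus 1, as in the "(3572 - 1)" example. The function name `layout_segments` and the tuple layout are hypothetical. Leaving the last read's end as `None` is an assumption, since the thread does not say how it is defined.

```python
from typing import List, Optional, Tuple

# (read_name, start_on_unitig, end_on_unitig_or_None, read_length)
Segment = Tuple[str, int, Optional[int], int]


def layout_segments(a_lines: List[str]) -> List[Segment]:
    """Return one segment per A line, assuming all lines share ONE unitig."""
    rows = []
    for line in a_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] != "A":
            continue  # ignore records that are not A lines
        # column 3 = start on unitig, column 5 = read name, column 7 = read length
        rows.append((cols[4], int(cols[2]), int(cols[6])))
    rows.sort(key=lambda r: r[1])  # order reads by start position on the unitig

    segments: List[Segment] = []
    for i, (name, start, read_len) in enumerate(rows):
        # End of this read's segment = next read's start - 1 (None for the last read)
        end = rows[i + 1][1] - 1 if i + 1 < len(rows) else None
        segments.append((name, start, end, read_len))
    return segments


lines = [
    "A\tutg000001l\t2093\t-\tSRR11606870.1250244\t0\t16611\tid:i:1250243\tHG:A:a",
    "A\tutg000001l\t3572\t-\tSRR11606870.2648803\t0\t17169\tid:i:2648802\tHG:A:a",
]
segments = layout_segments(lines)
for seg in segments:
    print(seg)
```

For the first read, the derived segment covers 1479 positions (2093 to 3571) while the seventh column is 16611, which is consistent with that column being the read length and not an alignment length.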

zhenzhenyang-psu commented on May 26, 2024

Hi,
I wonder what the number 1250243 in "id:i:1250243" means. Thanks!

chhylp123 commented on May 26, 2024

It is the read ID, which is only useful for hifiasm itself.

zhenzhenyang-psu commented on May 26, 2024

I see. Thanks!
