
pmelsted commented on September 24, 2024

There is no fixed ordering so a parser will have to accept any possible order.

The only exception would perhaps be the header line; it would be sort of rude not to have it as the first line, but that's not in the spec yet.

from gfa-spec.

edawson commented on September 24, 2024

I guess I was more concerned about output, so that's fine then. It would indeed be a bit coarse to put a header line anywhere but the top of the file.

Thanks!

sjackman commented on September 24, 2024

ABySS requires that the segments incident to a link be given before the link that refers to those segments. I'd like to see that be a requirement of the spec. My preference is that the records are grouped together by type in the order H, S, L then P. I haven't thought about the other record types. It would have been nice if the record types were in alphabetical order. Oh well.

pmelsted commented on September 24, 2024

I think it makes sense to not allow links to refer to segments that haven't been defined yet, and similarly for paths. A stronger version of this is to specify everything in a strict order: H, S, L, P.

It makes the parsing less painful and shouldn't be a burden on producing the output.

ekg commented on September 24, 2024

But to enforce it in the schema seems overly heavy handed. In any case we
can fix the ordering via a sort. I would suggest we leave it unspecified
what the order might be.
On Feb 19, 2016 9:05 PM, "Pall Melsted" wrote:

I think it makes sense to not allow links to refer to paths that haven't
been defined yet, similarly for paths. A stronger version of this is to
specify everything in a strict order, H, S, L P.

It makes the parsing less painful and shouldn't be a burden on producing
the output.



sjackman commented on September 24, 2024

I propose that

  1. The definition of segments should precede the definition of the links that are incident to those segments.
  2. The definition of segments and links should precede the definition of paths that reference those segments and links.
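As a rough illustration of what the two rules would mean for a tool, here is a minimal sketch of a checker that reports records referring to segments not yet defined. The function name `check_order` is hypothetical. It assumes tab-delimited GFA1 lines with the segment name in column 2 of S lines, the two segment names in columns 2 and 4 of L lines, and a comma-separated list of oriented segment names in column 3 of P lines. It only checks paths against segments, not against links.

```python
def check_order(lines):
    """Return (line_number, message) pairs for records that refer to a
    segment whose S line has not appeared yet."""
    seen_segments = set()
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "S":
            seen_segments.add(fields[1])
        elif record_type == "L":
            # Columns 2 and 4 hold the two segments incident to the link.
            for name in (fields[1], fields[3]):
                if name not in seen_segments:
                    problems.append(
                        (lineno, f"link refers to segment {name!r} before its S line")
                    )
        elif record_type == "P":
            # Each step is a segment name followed by one orientation character.
            for step in fields[2].split(","):
                name = step[:-1]
                if name not in seen_segments:
                    problems.append(
                        (lineno, f"path refers to segment {name!r} before its S line")
                    )
    return problems
```

A tool could treat a non-empty result as a warning rather than an error if the ordering ends up being a recommendation instead of a requirement.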

Anyone disagree?

ekg commented on September 24, 2024

I dissent. I think we should not impose an ordering in the spec. It will
only cause pain and has no effect on the semantic content of what is
transmitted.

sjackman commented on September 24, 2024

Okay. How about adding it to the spec as a should rather than a must, as a recommendation rather than a requirement? (as per https://www.ietf.org/rfc/rfc2119.txt)

sjackman commented on September 24, 2024

Even when it doesn't affect the semantic content, I think making the format as easy to parse as possible for implementers is an important consideration.

edawson commented on September 24, 2024

I think it's certainly more convenient to parse things if we ensure that one segment of a link is given before the link (e.g. a link's source is always given before that link). But I'm not sure I like requiring both ends to be defined beforehand. This does get complicated with inversions. This advantage goes away if we parse things to a map beforehand anyway.

I'm strongly opposed to the HSLP ordering. While it contains all the elements and is easy to parse by machines, to me it's harder for a human being to parse. It destroys the "graphiness" of GFA by decomposing the graph into loose sets of elements. It's nice to find a segment in the file and immediately see if it's highly connected based on how many link lines immediately follow its definition and whether it is on any paths.

edawson commented on September 24, 2024

Also sorting on the source nodes of elements (S, C, L, P lines) is nice - it's easy enough to enforce alphanumeric sorting and it might simplify random access.

sjackman commented on September 24, 2024

SAM allows specifying in the header whether the file is ordered or not. That makes sense here too.

edawson commented on September 24, 2024

Good point - if there's an equivalent to samtools sort for GFA (and a well-defined sorting order), it seems reasonable to allow both.

sjackman commented on September 24, 2024

There is no gfatools sort yet, but I think we'll need one for certain for random access.

edawson commented on September 24, 2024

I'm on it - I'll push something here today. Unfortunately my javascript is too rusty to push it to gfatools proper.

sjackman commented on September 24, 2024

I regularly sort SAM files with the UNIX sort utility. It would be nice if GFA could be sorted with the UNIX sort utility. It's unfortunate that there's not a clear way to specify the sort order of the record types to UNIX sort.

sjackman commented on September 24, 2024

You could do a hacky thing where you change H to 0, S to 1, L to 2, etc, then sort, then convert back, but that hackery kind of defeats the benefit of using an off the shelf tool.
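For comparison, the same rank-then-sort idea can be sketched in a small script that uses a sort key and never rewrites the record type. This is only an illustration: the name `sort_gfa` is hypothetical, the ranks given to C and P are assumptions (only H, S and L are ranked above), and lines are assumed to be tab-delimited with an ID in the second column.

```python
# Rank of each record type; H, S, L follow the mapping described above,
# while the positions of C and P are assumptions made for this sketch.
RANK = {"H": 0, "S": 1, "L": 2, "C": 3, "P": 4}


def sort_gfa(lines):
    """Sort GFA lines by record-type rank, then by the second column
    compared as a plain string."""

    def key(line):
        fields = line.rstrip("\n").split("\t")
        # Unknown record types sort after all known ones.
        rank = RANK.get(fields[0], len(RANK))
        second = fields[1] if len(fields) > 1 else ""
        return (rank, second)

    return sorted(lines, key=key)
```

Because `sorted` is stable, records with equal keys keep their input order.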

ekg commented on September 24, 2024

I have typically sorted on the second column with sort -k2. This yields an
output in which local regions of the graph tend to occur together.
Although, I tend to use numerical IDs, which probably changes things.

sjackman commented on September 24, 2024

I also tend to use numerical IDs, but for simplicity I think the specified ordering should be ASCII ordered. You can always pad with 0s at the left to get consistent numerical ordering.
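A small sketch of the padding point, using a hypothetical helper `pad_ids` and made-up IDs: sorted as plain strings, "10" comes before "9", but once every ID is left-padded with zeros to the same width, string order and numeric order agree.

```python
def pad_ids(ids):
    """Left-pad numeric ID strings with zeros to a common width."""
    width = max(len(i) for i in ids)
    return [i.zfill(width) for i in ids]


ids = ["10", "9", "100", "2"]

# Plain string order does not match numeric order.
string_order = sorted(ids)

# After padding, string order matches numeric order.
padded_order = sorted(pad_ids(ids))
```

Here `string_order` is `["10", "100", "2", "9"]`, while `padded_order` is `["002", "009", "010", "100"]`.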

edawson commented on September 24, 2024

There's now a gfa_sort utility in gfakluge.

Alphanumeric sort on seg IDs, non-blocked:
./gfa_sort -i reads.gfa or ./gfa_sort --gfa-file reads.gfa

Blocked (HSLC) format and alphanumeric sort in each block:
./gfa_sort -i reads.gfa -b or ./gfa_sort --gfa-file reads.gfa --block-order

Block-ordered input/output is in the core lib now too.

lh3 commented on September 24, 2024

I also prefer NOT to impose ordering, except the header line. Having S lines above L lines does make programming easier, but not having this requirement is not too bad.

ggonnella commented on September 24, 2024

A general strict ordering may not be necessary, but I think we should still mandate that the header be at the top. For this reason I opened a different issue on that.

edawson commented on September 24, 2024

It should be possible to sort GFA without knowing the spec version; then the header lines should float to the top, and it should be easy to determine the VN from there. That being said, I agree with the idea posed in #39 that it is nice to have the H VN as the first line in the file.

ggonnella commented on September 24, 2024

Yes, of course it is still possible to look for the VN in the file before really parsing it, but I think this should not be required. For the current files, I can parse the file in one pass (by introducing some "virtual" lines, such as S lines which I expect but which have not come up yet), but if I have to look for the VN first, this will not be possible anymore.
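One way to picture the one-pass approach with "virtual" lines is the sketch below. It is not the actual parser being described, just an illustration under assumptions: the name `parse_one_pass` is hypothetical, only S and L lines are handled, and a segment seen in a link before its S line is stored as a placeholder with value `None` until the real S line arrives.

```python
def parse_one_pass(lines):
    """Parse S and L lines in a single pass, creating placeholder
    segments for names that links mention before their S line."""
    segments = {}  # segment name -> sequence, or None for a placeholder
    links = []     # (from_name, from_orient, to_name, to_orient)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # A real S line fills in (or replaces) any placeholder.
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            for name in (fields[1], fields[3]):
                # Create a placeholder only if the segment is unknown so far.
                segments.setdefault(name, None)
            links.append((fields[1], fields[2], fields[3], fields[4]))
    # Placeholders still empty at the end were never defined in the file.
    missing = sorted(name for name, seq in segments.items() if seq is None)
    return segments, links, missing
```

Any names left in `missing` after the pass point to links whose segments never appeared, which a parser could report once at the end of the file.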

stale commented on September 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
