Comments (25)
There is no fixed ordering, so a parser has to accept any possible order.
The only exception would perhaps be the header line: it would be sort of rude not to have it as the first line, but that's not in the spec yet.
from gfa-spec.
I guess I was more concerned about output, so that's fine then. It would indeed be a bit coarse to put a header line anywhere but the top of the file.
Thanks!
ABySS requires that the segments incident to a link be given before the link that refers to those segments. I'd like to see that be a requirement of the spec. My preference is that the records are grouped together by type in the order H, S, L, then P. I haven't thought about the other record types. It would have been nice if the record types were in alphabetical order. Oh well.
I think it makes sense to not allow links to refer to segments that haven't been defined yet, and similarly for paths. A stronger version of this is to specify everything in a strict order: H, S, L, P.
It makes the parsing less painful and shouldn't be a burden on producing the output.
But to enforce it in the schema seems overly heavy-handed. In any case we can fix the ordering via a sort. I would suggest we leave it unspecified what the order might be.
(In reply to the comment above, posted by Pall Melsted on Feb 19, 2016, 9:05 PM.)
I propose that
- The definition of segments should precede the definition of the links that are incident to those segments.
- The definition of segments and links should precede the definition of paths that reference those segments and links.
Anyone disagree?
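The proposed rule can be checked mechanically. The following is a minimal sketch, assuming GFA1 tab-separated records where an L line names its two segments in columns 2 and 4 and a P line lists oriented segment names (such as `1+,2-`) in column 3. The function name `check_definition_order` is hypothetical, and the sketch only checks segment references, not whether the links used by a path were defined before it.

```python
def check_definition_order(lines):
    """Report records that refer to a segment not yet defined by an S line.

    Returns a list of (line_number, message) tuples; an empty list means
    every L and P record only refers to segments defined earlier.
    """
    segments = set()
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "S":
            segments.add(fields[1])
        elif record_type == "L":
            # Columns 2 and 4 hold the names of the two linked segments.
            for name in (fields[1], fields[3]):
                if name not in segments:
                    problems.append((lineno, f"link refers to undefined segment {name}"))
        elif record_type == "P":
            # Each step is a segment name followed by '+' or '-'.
            for step in fields[2].split(","):
                name = step[:-1]
                if name not in segments:
                    problems.append((lineno, f"path refers to undefined segment {name}"))
    return problems
```

A checker like this works the same whether the rule ends up as a requirement or only a recommendation: a tool could reject the file in the first case and merely warn in the second.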
I dissent. I think we should not impose an ordering in the spec. It will
only cause pain and has no effect on the semantic content of what is
transmitted.
Okay. How about adding it to the spec as a should rather than a must, as a recommendation rather than a requirement? (as per https://www.ietf.org/rfc/rfc2119.txt)
Even when it doesn't affect the semantic content, I think making the format as easy to parse as possible for implementers is an important consideration.
I think it's certainly more convenient to parse things if we ensure that one segment of a link is given before the link (e.g. a link's source is always given before that link). But I'm not sure I like requiring both ends to be defined beforehand. This does get complicated with inversions. This advantage goes away if we parse things to a map beforehand anyway.
I'm strongly opposed to the HSLP ordering. While it contains all the elements and is easy to parse by machines, to me it's harder for a human being to parse. It destroys the "graphiness" of GFA by decomposing the graph into loose sets of elements. It's nice to find a segment in the file and immediately see if it's highly connected based on how many link lines immediately follow its definition and whether it is on any paths.
Also sorting on the source nodes of elements (S, C, L, P lines) is nice - it's easy enough to enforce alphanumeric sorting and it might simplify random access.
SAM allows specifying in the header whether the file is ordered or not. That makes sense here too.
Good point - if there's an equivalent to samtools sort for GFA (and a well-defined sorting order), it seems reasonable to allow both.
There is no gfatools sort yet, but I think we'll certainly need one for random access.
I'm on it - I'll push something here today. Unfortunately my javascript is too rusty to push it to gfatools proper.
I regularly sort SAM files with the UNIX sort utility. It would be nice if GFA could be sorted with the UNIX sort utility. It's unfortunate that there's not a clear way to specify the sort order of the record types to UNIX sort.
You could do a hacky thing where you change H to 0, S to 1, L to 2, etc., then sort, then convert back, but that hackery kind of defeats the benefit of using an off-the-shelf tool.
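For comparison, the same rank-then-sort idea is short when written as a script rather than as a pipeline around UNIX sort. This is only a sketch: the `RANK` table and the function name `sort_blocked` are illustrative assumptions, the H, S, L, P order is the one floated earlier in this thread, and record types missing from the table (C lines, for example) are simply placed last.

```python
# Hypothetical rank table following the H, S, L, P order discussed above.
RANK = {"H": 0, "S": 1, "L": 2, "P": 3}


def sort_blocked(lines):
    """Sort GFA lines by record type rank, then by the second column (ASCII order)."""
    def key(line):
        fields = line.rstrip("\n").split("\t")
        rank = RANK.get(fields[0], len(RANK))  # unknown record types sort last
        name = fields[1] if len(fields) > 1 else ""
        return (rank, name)

    # sorted() is stable, so lines with equal keys keep their input order.
    return sorted(lines, key=key)
```

The key function plays the role of the temporary 0/1/2 relabelling, so nothing has to be converted back afterwards.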
I have typically sorted on the second column with sort -k2. This yields an
output in which local regions of the graph tend to occur together.
Although, I tend to use numerical IDs, which probably changes things.
I also tend to use numerical IDs, but for simplicity I think the specified ordering should be ASCII ordered. You can always pad with 0s at the left to get consistent numerical ordering.
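A small illustration of the padding point, with made-up IDs; the helper name `pad_ids` is hypothetical. Plain ASCII sorting puts "10" before "9", while left-padding every ID to the same width makes ASCII order agree with numeric order.

```python
def pad_ids(ids):
    """Left-pad numeric ID strings with zeros to a common width."""
    width = max(len(i) for i in ids)
    return [i.zfill(width) for i in ids]


ids = ["10", "9", "100", "1"]

ascii_order = sorted(ids)            # ['1', '10', '100', '9']
padded_order = sorted(pad_ids(ids))  # ['001', '009', '010', '100']
```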
There's now a gfa_sort utility in gfakluge.
Alphanumeric sort on seg IDs, non-blocked:
./gfa_sort -i reads.gfa
or ./gfa_sort --gfa-file reads.gfa
Blocked (HSLC) format and alphanumeric sort in each block:
./gfa_sort -i reads.gfa -b
or ./gfa_sort --gfa-file reads.gfa --block-order
Block-ordered input/output is in the core lib now too.
I also prefer NOT to impose ordering, except for the header line. Having S lines above L lines does make programming easier, but not having this requirement is not too bad.
A general strict ordering may not be necessary, but I think we should still mandate that the header be at the top. For this reason I opened a different issue on that.
It should be possible to sort GFA without knowing the spec version; then the header lines should float to the top, and it should be easy to determine the VN from there. That being said, I agree with the idea posed in #39 that it is nice to have the H VN as the first line in the file.
Yes, of course it is still possible to look for the VN in the file before really parsing it, but I think this should not be required. For the current files, I can parse the file in one pass (by introducing some "virtual" lines, such as S lines which I expect but which have not come up yet), but if I have to look for the VN first, this will no longer be possible.
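The "virtual line" approach could look roughly like the sketch below. It is not taken from any existing parser: the function name `parse_one_pass` and the use of `None` as a placeholder are assumptions, and only S and L records in GFA1 column layout are handled. A link that mentions an unseen segment creates a placeholder entry, which a later S line fills in.

```python
def parse_one_pass(lines):
    """Parse S and L records in a single pass, in any order.

    Returns (segments, links, unresolved), where segments maps a name to its
    sequence (None while only a placeholder exists), links is a list of
    (from, from_orient, to, to_orient) tuples, and unresolved lists the
    segments that were referenced but never defined.
    """
    segments = {}
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # A real definition replaces any placeholder created earlier.
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            for name in (fields[1], fields[3]):
                # Create a "virtual" segment if this name has not been seen yet.
                segments.setdefault(name, None)
            links.append((fields[1], fields[2], fields[3], fields[4]))
    unresolved = sorted(name for name, seq in segments.items() if seq is None)
    return segments, links, unresolved
```

Anything still listed in `unresolved` at the end of the file was referenced but never defined, which is the point at which a parser can report an error.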
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.