Comments (19)
from gfa-spec.
@ekg @ggonnella @jts @lh3 @pb-jchin @rchikhi @richarddurbin @rrwick @sjackman @thegenemyers
Your vote on this proposal is appreciated.
from gfa-spec.
This text was added to address Erik's use case of walks. His use case specifically included sorting the path records by their components, so I believe this text does not address it and should therefore be removed.
Erik's use case should, I feel, be addressed with a new walk record type. Heng Li proposed such a record at #47 (comment). I suggest we define a walk record type, with input from Erik, in a future backward-compatible revision of GFA (e.g. GFA 2.1) that specifically addresses his use case.
from gfa-spec.
I vote in favour of this proposal!
from gfa-spec.
I am not against your proposal of deleting the quoted text (i.e. I am rather neutral on this).
As GFA2 allows for user-specific line types (a core parser ignores any line that does not start with one of the standard record type specifiers), we should decide whether the W line should be an application-specific line or whether it should really be included in a future specification.
from gfa-spec.
I'm inclined to defer that question to a future date. More immediately I think it's quite important that we resolve that the meaning of a GFA file should not depend on the order of its lines. The walk record could start as application specific, and later be proposed using a pull request for adoption by the standard. The benefit of an extensible standard is that we don't have to nail down every possible application right away.
from gfa-spec.
Gene,
Using your 'blue' example, if this is the full graph structure...
...then this path is V3 -> V4 -> V5 -> V1 -> V2 -> V3:
O blue V3 V4 V5
O blue V1 V2 V3
...and this path is V1 -> V2 -> V3 -> V3 -> V4 -> V5:
O blue V1 V2 V3
O blue V3 V4 V5
Both paths are valid, so it seems like the definition does depend on the line order. I guess I'm unsure of what you mean by 'work just fine'. Am I missing something?
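To make the order dependence concrete, here is a minimal sketch that assumes same-named O-lines are simply concatenated in file order. The function name `concatenate_group_lines` is hypothetical, and the whitespace-separated syntax mirrors the simplified examples in this thread rather than the full GFA2 grammar.

```python
def concatenate_group_lines(lines):
    """Join the segment lists of same-named O-lines in file order.

    Assumed semantics for illustration only; not taken from the spec.
    """
    paths = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "O":
            continue  # ignore every other record type
        name, segments = fields[1], fields[2:]
        paths.setdefault(name, []).extend(segments)
    return paths


order_a = ["O blue V3 V4 V5", "O blue V1 V2 V3"]
order_b = ["O blue V1 V2 V3", "O blue V3 V4 V5"]

print(" -> ".join(concatenate_group_lines(order_a)["blue"]))
print(" -> ".join(concatenate_group_lines(order_b)["blue"]))
```

The same two lines yield two different walks depending only on which one comes first in the file.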
from gfa-spec.
In graphs that have been disambiguated using paired-end (or other long-range information), paths may contain cycles. A typical example is an untangled repeat. The paths of vertices A X B Y C and A Y B X C are both valid and quite different. If the path record is recorded in five lines, then disturbing the order of those lines (by for example sorting) changes the path.
Just as it's possible to concatenate two FASTA files, I think it should be possible to concatenate two GFA files to get the union of those two graphs. If those two files unexpectedly share a path ID and the user doesn't realize, I think it should give an error of the duplicated path ID, not silently concatenate the two paths into a single path. Imagine if FASTA implementations silently concatenated any two sequences with the same ID.
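A rough sketch of the behaviour I have in mind, assuming each group must be defined on a single line: a reader that raises an error on a repeated group ID instead of silently joining the lines. `DuplicatePathError` and `read_paths_strict` are hypothetical names, and the parsing is simplified to the whitespace-separated form used in this thread.

```python
class DuplicatePathError(ValueError):
    """Raised when two O/U-lines carry the same ID (hypothetical error type)."""


def read_paths_strict(lines):
    """Collect O/U-lines by ID, refusing to merge lines that share an ID."""
    paths = {}
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] not in ("O", "U"):
            continue  # not a group record
        name = fields[1]
        if name in paths:
            raise DuplicatePathError(
                f"line {number}: group ID {name!r} is already defined"
            )
        paths[name] = fields[2:]
    return paths


# Two files that happen to reuse the ID "blue", concatenated together.
file_one = ["O blue A X B Y C"]
file_two = ["O blue A Y B X C"]

try:
    read_paths_strict(file_one + file_two)
except DuplicatePathError as error:
    print("rejected:", error)
```

With this behaviour, concatenating two files gives either the union of the two graphs or a clear error, never a silently lengthened path.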
from gfa-spec.
The discussion boils down to whether the following lines
O blue A B
O blue B C
would encode the path A -> B -> B -> C (1) or the path A -> B -> C (2).
The spec seems to leave some room for interpretation here.
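The two readings can be sketched as two join functions. Both names (`join_repeating`, `join_merging`) are hypothetical, and the rule in the second one (consecutive lines must share their junction segment, which is then written once) is an assumption made for illustration, not text from the spec.

```python
def join_repeating(chunks):
    """Interpretation (1): plain concatenation, junction segments repeat."""
    path = []
    for chunk in chunks:
        path.extend(chunk)
    return path


def join_merging(chunks):
    """Interpretation (2): consecutive lines overlap in one shared segment."""
    path = []
    for chunk in chunks:
        if not path:
            path.extend(chunk)
        elif chunk and path[-1] == chunk[0]:
            path.extend(chunk[1:])  # keep the shared segment only once
        else:
            raise ValueError("consecutive lines do not share a segment")
    return path


blue = [["A", "B"], ["B", "C"]]
print(join_repeating(blue))
print(join_merging(blue))
```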
from gfa-spec.
Anyhow I vote in favor of the proposal. It wouldn't be very elegant if the order of the lines inside a GFA2 file mattered just because of those O/U-lines.
from gfa-spec.
I understand better now, but even if separate lines of a path must share a segment (as Gene and Rayan said), the line order can still matter.
Here's the simplest case I can think of:
This path is V1 -> V2 -> V1 -> V3 -> V1:
O blue V1 V2 V1
O blue V1 V3 V1
And this path is V1 -> V3 -> V1 -> V2 -> V1:
O blue V1 V3 V1
O blue V1 V2 V1
Complexities like this feel awkward, so I'm still in favour of the proposal to require groups to be defined on one line.
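As a quick check of that case, the sketch below tries every ordering of a set of lines under the assumed shared-segment rule and collects the distinct paths that result. The helper names `merge_chunks` and `distinct_paths` are made up for this example.

```python
from itertools import permutations


def merge_chunks(chunks):
    """Join chunks that overlap in one segment; return None if a junction fails."""
    path = list(chunks[0])
    for chunk in chunks[1:]:
        if path[-1] != chunk[0]:
            return None  # this ordering violates the shared-segment rule
        path.extend(chunk[1:])
    return path


def distinct_paths(chunks):
    """Return every distinct valid path over all orderings of the lines."""
    results = set()
    for ordering in permutations(chunks):
        path = merge_chunks(ordering)
        if path is not None:
            results.add(tuple(path))
    return results


lines = [("V1", "V2", "V1"), ("V1", "V3", "V1")]
for path in sorted(distinct_paths(lines)):
    print(" -> ".join(path))
```

Both orderings pass the shared-segment check here, yet they describe different walks, so the rule alone does not make the file order-independent.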
from gfa-spec.
My count of the votes is
Strike the text: @rchikhi @rrwick @sjackman
Keep the text: @pb-jchin @thegenemyers
Abstain: @ggonnella
@lh3 @richarddurbin Do you wish to vote on this proposal?
Unless anyone else speaks up, the proposal to strike the text is accepted. We can of course address this use case again in a future proposal.
from gfa-spec.
I'm fine with moving the multiline path discussion to a non-formal descriptive text. A multiline path could be useful, but I think its format needs more discussion.
from gfa-spec.
I've added the following text:
Note: It was discussed whether U/O-lines with the same name could be considered to be concatenated together in the order in which they appear (see #54 and #47). This multi-line path format was not included in the current version of this specification, but if people want to explore use of this structure, they can do so using a different single letter record code.
from gfa-spec.
If we allow a multiline path at some point, one could encode the order with an optional tag, included only in multiline paths. I do not see any disadvantages in that case, other than some more validation checks (is the tag included in all lines with the same ID? are the tag values all different? is the path resulting from the concatenation a valid one?).
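A minimal sketch of the first two checks, assuming a purely hypothetical tag `pn:i:<integer>` (no such tag is defined anywhere) and that all lines passed in share one ID. The third check, whether the concatenated path is valid in the graph, needs the edge records and is left out here.

```python
import re

# Hypothetical order tag "pn:i:<integer>"; the tag name is invented for this sketch.
ORDER_TAG = re.compile(r"pn:i:(\d+)")


def assemble_ordered_path(lines):
    """Rebuild a multiline path from lines of one ID, ordered by the tag value."""
    chunks = {}
    for line in lines:
        refs, positions = [], []
        for field in line.split()[2:]:
            match = ORDER_TAG.fullmatch(field)
            if match:
                positions.append(int(match.group(1)))
            else:
                refs.append(field)
        if len(positions) != 1:
            raise ValueError(f"expected exactly one order tag in {line!r}")
        if positions[0] in chunks:
            raise ValueError(f"order value {positions[0]} is used twice")
        chunks[positions[0]] = refs
    path = []
    for position in sorted(chunks):
        path.extend(chunks[position])
    return path


shuffled = ["O blue V3 V4 V5 pn:i:2", "O blue V1 V2 V3 pn:i:1"]
print(assemble_ordered_path(shuffled))
```

Because the order lives in the tag, sorting or shuffling the lines no longer changes the reconstructed path.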
from gfa-spec.