richarddurbin commented on June 22, 2024

from gfa-spec.

sjackman commented on June 22, 2024

@ekg @ggonnella @jts @lh3 @pb-jchin @rchikhi @richarddurbin @rrwick @sjackman @thegenemyers
Your vote on this proposal is appreciated.

sjackman commented on June 22, 2024

This text was added to address Erik's use case of walks. His use case specifically involved sorting the path records by their components, so I believe this text does not actually address it and should be removed.

I feel Erik's use case should be addressed with a new walk record type. Heng Li proposed such a record at #47 (comment). I suggest we define a walk record type in a future backward-compatible revision of GFA (e.g. GFA 2.1), with input from Erik, so that it specifically addresses his use case.

rrwick commented on June 22, 2024

I vote in favour of this proposal!

ggonnella commented on June 22, 2024

I am not against your proposal to delete the quoted text (i.e. I am rather neutral on this).

Since GFA2 allows user-specific line types (a core parser ignores any line that does not start with one of the standard record type codes), we should decide whether the W line should remain an application-specific line or really be included in a future version of the specification.
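As a rough illustration of what "ignored by a core parser" could look like in practice, here is a minimal Python sketch that keeps the standard GFA2 record codes (H, S, F, E, G, O, U) and sets every other line aside uninterpreted. It assumes tab-separated fields; the function name `split_records` and the layout of the W line are hypothetical, not part of any spec.

```python
# Record codes treated as core GFA2 here: header, segment, fragment,
# edge, gap, ordered group, unordered group.
STANDARD_CODES = {"H", "S", "F", "E", "G", "O", "U"}


def split_records(lines):
    """Return (core, extension) records; extension lines are kept aside, not interpreted."""
    core, extension = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no records
        fields = line.split("\t")
        if fields[0] in STANDARD_CODES:
            core.append(fields)
        else:
            extension.append(fields)  # e.g. an application-specific W line
    return core, extension


example = [
    "H\tVN:Z:2.0",
    "S\tA\t10\t*",
    "W\tblue\tA+ B+",  # hypothetical application-specific walk line
]
core, extension = split_records(example)
print([r[0] for r in core])  # ['H', 'S']
print(extension)             # [['W', 'blue', 'A+ B+']]
```

Keeping the unknown lines (rather than discarding them) would let an application-aware tool pick up its own records from the same pass.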

sjackman commented on June 22, 2024

I'm inclined to defer that question. More immediately, I think it's quite important that we settle that the meaning of a GFA file should not depend on the order of its lines. The walk record could start as application-specific and later be proposed, via a pull request, for adoption into the standard. The benefit of an extensible standard is that we don't have to nail down every possible application right away.

thegenemyers commented on June 22, 2024

rrwick commented on June 22, 2024

Gene,

Using your 'blue' example, if this is the full graph structure...

[Screenshot: full graph structure for the 'blue' example]

...then this path is V3 -> V4 -> V5 -> V1 -> V2 -> V3:

O	blue	V3 V4 V5
O	blue	V1 V2 V3

...and this path is V1 -> V2 -> V3 -> V3 -> V4 -> V5:

O	blue	V1 V2 V3
O	blue	V3 V4 V5

Both paths are valid, so it seems like the definition does depend on the line order. I guess I'm unsure of what you mean by 'work just fine'. Am I missing something?
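To make the ambiguity concrete, here is a minimal Python sketch of the multi-line reading under discussion, assuming O-lines that share a name are simply concatenated in file order. The function name `concat_paths` is hypothetical, and orientation signs are omitted, as in the example above.

```python
from collections import defaultdict


def concat_paths(lines):
    """Hypothetical multi-line semantics: append O-line members in file order, grouped by name."""
    paths = defaultdict(list)
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "O":
            paths[fields[1]].extend(fields[2].split())
    return dict(paths)


order_a = ["O\tblue\tV3 V4 V5", "O\tblue\tV1 V2 V3"]
order_b = list(reversed(order_a))  # same lines, opposite order
print(" -> ".join(concat_paths(order_a)["blue"]))  # V3 -> V4 -> V5 -> V1 -> V2 -> V3
print(" -> ".join(concat_paths(order_b)["blue"]))  # V1 -> V2 -> V3 -> V3 -> V4 -> V5
```

The same two lines yield two different paths, which is exactly the line-order dependence in question.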

thegenemyers commented on June 22, 2024

sjackman commented on June 22, 2024

In graphs that have been disambiguated using paired-end (or other long-range) information, paths may contain cycles. A typical example is an untangled repeat: the vertex paths A X B Y C and A Y B X C are both valid and quite different. If a path record is split across five lines, then disturbing the order of those lines (for example, by sorting) changes the path.
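By contrast, when each path is written on a single line, the parsed result no longer depends on line order. A minimal Python sketch, assuming tab-separated O-lines with space-separated members; the path IDs p1 and p2 and the function name `parse_single_line_paths` are hypothetical, and duplicate IDs are not checked here.

```python
def parse_single_line_paths(lines):
    """Each O-line holds a complete path, so the result is keyed by path ID alone."""
    paths = {}
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "O":
            paths[fields[1]] = tuple(fields[2].split())
    return paths


# The two untangled-repeat paths, each stored on its own line.
lines = ["O\tp1\tA X B Y C", "O\tp2\tA Y B X C"]
forward = parse_single_line_paths(lines)
shuffled = parse_single_line_paths(list(reversed(lines)))
print(forward == shuffled)  # True
```

Sorting or shuffling whole lines cannot mix the two paths, because no path is spread over more than one line.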

Just as it's possible to concatenate two FASTA files, I think it should be possible to concatenate two GFA files to get the union of the two graphs. If the two files unexpectedly share a path ID and the user doesn't realize it, I think parsing should fail with a duplicate path ID error, not silently concatenate the two paths into a single path. Imagine if FASTA implementations silently concatenated any two sequences with the same ID.
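A minimal Python sketch of the behaviour argued for here: a loader that rejects a repeated path ID instead of merging the records. It assumes single-line, tab-separated O-records and treats `*` as an unnamed group (as GFA2 allows for optional IDs); `load_paths` and the example files are hypothetical.

```python
def load_paths(lines):
    """Parse single-line O-records, rejecting a repeated path ID instead of merging it."""
    paths = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "O":
            continue
        path_id = fields[1]
        if path_id == "*":
            continue  # unnamed groups cannot collide (and are not kept in this sketch)
        if path_id in paths:
            raise ValueError(f"line {lineno}: duplicate path ID {path_id!r}")
        paths[path_id] = fields[2].split()
    return paths


file_a = ["S\tA\t100\t*", "S\tB\t80\t*", "O\tblue\tA B"]
file_b = ["S\tC\t50\t*", "O\tblue\tB C"]
try:
    load_paths(file_a + file_b)  # like `cat a.gfa b.gfa`
except ValueError as err:
    print(err)  # line 5: duplicate path ID 'blue'
```

Each file loads cleanly on its own; only their concatenation trips the check, which is the FASTA-like behaviour described above.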

thegenemyers commented on June 22, 2024

rchikhi commented on June 22, 2024

The discussion boils down to whether the following lines

O blue A B
O blue B C

would encode the path A -> B -> B -> C (1) or the path A -> B -> C (2).
The spec seems to leave some room for interpretation here.
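The two readings can be written down side by side. A minimal Python sketch, where each O-line's members are a "chunk"; the function names are hypothetical, and reading (2) assumes each chunk must begin with the previous chunk's last segment.

```python
def join_plain(chunks):
    """Reading (1): concatenate chunks as-is, so a shared B appears twice."""
    return [seg for chunk in chunks for seg in chunk]


def join_overlapping(chunks):
    """Reading (2): each chunk starts with the previous chunk's last segment, which is kept once."""
    path = list(chunks[0])
    for chunk in chunks[1:]:
        if chunk[0] != path[-1]:
            raise ValueError(f"chunk {chunk} does not start at {path[-1]}")
        path.extend(chunk[1:])
    return path


chunks = [["A", "B"], ["B", "C"]]
print(" -> ".join(join_plain(chunks)))        # A -> B -> B -> C
print(" -> ".join(join_overlapping(chunks)))  # A -> B -> C
```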

rchikhi commented on June 22, 2024

Anyhow I vote in favor of the proposal. It wouldn't be very elegant if the order of the lines inside a GFA2 file mattered just because of those O/U-lines.

rrwick commented on June 22, 2024

I understand better now, but even if separate lines of a path must share a segment (as Gene and Rayan said), the line order can still matter.

Here's the simplest case I can think of:
[Screenshot: graph for the simplest case, with V2 and V3 each connected to V1 on both sides]

This path is V1 -> V2 -> V1 -> V3 -> V1:

O	blue	V1 V2 V1
O	blue	V1 V3 V1

And this path is V1 -> V3 -> V1 -> V2 -> V1:

O	blue	V1 V3 V1
O	blue	V1 V2 V1

Complexities like this feel awkward, so I'm still in favour of the proposal to require groups to be defined on one line.
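One way to check whether a set of lines is ambiguous under the shared-segment rule is to try every ordering and collect the distinct paths that satisfy it. A minimal Python sketch (the name `valid_joins` is hypothetical); it enumerates all permutations, so it is only practical for a handful of lines.

```python
from itertools import permutations


def valid_joins(chunks):
    """Return every distinct path obtainable by ordering chunks so consecutive ones share an endpoint segment."""
    results = set()
    for order in permutations(chunks):
        path = list(order[0])
        for chunk in order[1:]:
            if chunk[0] != path[-1]:
                break  # this ordering violates the shared-segment rule
            path.extend(chunk[1:])
        else:
            results.add(tuple(path))
    return results


chunks = [("V1", "V2", "V1"), ("V1", "V3", "V1")]
for path in sorted(valid_joins(chunks)):
    print(" -> ".join(path))
# V1 -> V2 -> V1 -> V3 -> V1
# V1 -> V3 -> V1 -> V2 -> V1
```

More than one result means the rule alone does not fix the path, so line order would still decide it.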

sjackman commented on June 22, 2024

My count of the votes is:
Strike the text: @rchikhi @rrwick @sjackman
Keep the text: @pb-jchin @thegenemyers
Abstain: @ggonnella
@lh3 @richarddurbin Do you wish to vote on this proposal?
Unless anyone else speaks up, the proposal to strike the text is accepted. We can of course address this use case again in a future proposal.

sjackman commented on June 22, 2024

I'm fine with moving the multiline path discussion to a non-formal descriptive text. A multiline path could be useful, but I think its format needs more discussion.

sjackman commented on June 22, 2024

I've added the following text:

Note: It was discussed whether U/O-lines with the same name could be considered to be concatenated together in the order in which they appear (see #54 and #47). This multi-line path format was not included in the current version of this specification, but if people want to explore use of this structure, they can do so using a different single-letter record code.

thegenemyers commented on June 22, 2024

ggonnella commented on June 22, 2024

If we allow multiline paths at some point, one could encode the order with an optional tag, included only in multiline paths. I do not see any disadvantages in that case, other than some additional validation checks (is the tag included in all lines with the same ID? are the tag values all different? is the path resulting from the concatenation valid?).
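A minimal Python sketch of those three checks, assuming a hypothetical integer tag `or:i:N` that appears only on lines of multiline paths, unsigned segment names, and a hypothetical set of `(from, to)` edges standing in for the graph. None of these names come from the GFA2 spec.

```python
from collections import defaultdict

ORDER_TAG = "or"  # hypothetical tag name; not defined by GFA2


def parse_tags(fields):
    """Parse TAG:TYPE:VALUE fields into {tag: (type, value)}."""
    tags = {}
    for field in fields:
        name, typ, value = field.split(":", 2)
        tags[name] = (typ, value)
    return tags


def assemble_paths(o_lines, edges):
    groups = defaultdict(list)  # path ID -> [(order value or None, members)]
    for line in o_lines:
        fields = line.split("\t")
        tags = parse_tags(fields[3:])
        rank = int(tags[ORDER_TAG][1]) if ORDER_TAG in tags else None
        groups[fields[1]].append((rank, fields[2].split()))

    paths = {}
    for path_id, chunks in groups.items():
        if len(chunks) > 1:
            ranks = [rank for rank, _ in chunks]
            # Check 1: every line sharing this ID carries the order tag.
            if None in ranks:
                raise ValueError(f"{path_id}: a line is missing the {ORDER_TAG} tag")
            # Check 2: the order values are all different.
            if len(set(ranks)) != len(ranks):
                raise ValueError(f"{path_id}: repeated {ORDER_TAG} value")
            chunks = sorted(chunks, key=lambda chunk: chunk[0])
        path = [seg for _, members in chunks for seg in members]
        # Check 3: the concatenated path is valid, i.e. each step follows an edge.
        for a, b in zip(path, path[1:]):
            if (a, b) not in edges:
                raise ValueError(f"{path_id}: no edge from {a} to {b}")
        paths[path_id] = path
    return paths


edges = {("A", "B"), ("B", "C"), ("C", "D")}
lines = ["O\tblue\tC D\tor:i:2", "O\tblue\tA B\tor:i:1"]
print(assemble_paths(lines, edges))  # {'blue': ['A', 'B', 'C', 'D']}
```

Because the order comes from the tag rather than from line position, the lines can be sorted or shuffled without changing the assembled path.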
