Comments (11)
I'm confused as to how DNA can become reversed without being complemented. Can you give me an example?
from gfa-spec.
I am not completely clear how this is possible either.
That said, we should not limit ourselves to things that we know are biologically normal. That would be like writing a FASTA spec that says only biologically viable DNA sequences can be represented.
I don't think there is any problem for inversions if each node is like a virtual pair of nodes connected by a hidden link (we could imagine that this link carries the sequence label of the node). Then edges always come from one end and go to another.
You can also represent the deletion of a node without referring to its neighbors, which is very useful.
I also can't see how this would present a problem for other uses. The simpler and more general we keep things, the fewer constraints the various uses will need to work around.
However, if links have only + and - versions, as they do now, then we can't convey enough information to represent this.
Graphviz has dot format, which is generally able to represent any graph you can think up. It is also quite simple to make simple graphs. We should aim for this level of generality.
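The "virtual pair of nodes" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything defined by the spec: the function name `link_to_sides` and the labels "start" and "end" are hypothetical, and it assumes a `+` segment is left through its end and entered through its start, with the roles swapped for `-`.

```python
def link_to_sides(from_seg, from_orient, to_seg, to_orient):
    """Translate an oriented link into the pair of node ends it joins.

    Assumption (not from the spec text): a '+' segment is left through
    its end and entered through its start; a '-' segment is traversed
    backwards, so those roles swap.
    """
    leave = "end" if from_orient == "+" else "start"
    enter = "start" if to_orient == "+" else "end"
    return (from_seg, leave), (to_seg, enter)


print(link_to_sides("0", "+", "1", "+"))  # (('0', 'end'), ('1', 'start'))
print(link_to_sides("0", "+", "1", "-"))  # (('0', 'end'), ('1', 'end'))
```

Under this reading every edge leaves one end of a node and arrives at one end of another, which is the property described above.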
from gfa-spec.
These inversions happen when the molecular machinery goes wrong and it's usually bad for you.
I don't see a clean way of representing this without adding a new operator.
We could use ~ (tilde) to denote non-complemented, so ~+ and ~- would mean ... ugh.
Do you have a pointer to the ga4gh graph discussion about this?
from gfa-spec.
I do not believe that it is possible by natural mutation to reverse a DNA sequence without also complementing it. It is not helpful to design a file format to handle cases that are not physically possible.
from gfa-spec.
I've never heard of reverse-non-complement in vivo. Chemically it makes no sense, since it would require breaking each individual 5' to 3' bond and flipping it. The only time I've ever seen it used is as a control when searching for low-information regions within repetitive sequences.
As a technical artifact that arises in silico, though, it is easy to see.
from gfa-spec.
From this diagram, http://ghr.nlm.nih.gov/handbook/illustrations/inversion, it looks like the region will be reverse-complemented.
from gfa-spec.
Yes, correct. Reverse and complemented. The - value of the orientation field indicates reversed and complemented.
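To make the distinction in this thread concrete, here is a small sketch contrasting a plain reversal with a reverse complement. The helper names `reverse` and `reverse_complement` and the sample sequence are illustrative only, and the complement table assumes an unambiguous ACGT alphabet.

```python
# Complement table for unambiguous DNA bases (assumption: input is ACGT only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq):
    """Reverse the sequence without complementing it."""
    return seq[::-1]


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


seq = "AACG"
print(reverse(seq))             # GCAA
print(reverse_complement(seq))  # CGTT
```

Only the second operation corresponds to the - orientation described here.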
from gfa-spec.
I disagree that we should only design for things that are physically possible. The graphs we are all working with have no natural chemical basis. No genome will ever look like an overlap or de Bruijn graph, so a design rule of this type would preclude everything we are doing. Maybe I am taking the metaphor too far though :)
The use case that makes a lot of sense to me is describing the deletion of an entire node. If we cannot describe which end edges go from and to, then this cannot be done in a node-local sense. You would need to add edges between the inbound and outbound nodes where an intermediary has been deleted and a path that skips it is required.
As for representing non-complemented inversions, it seems correct that another operator would be required to clarify this. I guess an extension of the cigar concept would be sufficient? The reason for not duplicating these as reverse-complemented sequences is to enable unambiguous alignment to and annotation of the graph. With minor extensions to the exchange format, the inversion can be encoded in the graph without duplication.
@adamnovak, @benedictpaten, and @haussler have been strong proponents of this idea and maybe could better clarify what I am describing.
On Jul 29, 2015 12:21 AM, "Shaun Jackman" wrote:
Yes, correct. Reverse and complemented. The - value of the orientation field indicates reversed and complemented.
from gfa-spec.
You would need to add edges between the inbound and outbound nodes where an intermediary has been deleted and a path that skips it is required.
Yes, that's correct. A deletion is represented like so:
Path 11 is AAACCCATA
Path 12 is AAAATA
S 0 AAA
S 1 CCC
S 2 ATA
L 0 + 1 + 0M
L 0 + 2 + 0M
L 1 + 2 + 0M
P 11 0+,1+,2+ 0M,0M,0M
P 12 0+,2+ 0M,0M
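As a rough check of the deletion example, the sketch below spells each path and tests whether every consecutive pair of steps is backed by a link. It is a minimal illustration, not a GFA parser: the helper names are hypothetical, fields are split on any whitespace, all overlaps are assumed to be 0M, only forward (+) steps are handled, and a link is only matched in the direction it is written.

```python
GRAPH = """\
S 0 AAA
S 1 CCC
S 2 ATA
L 0 + 1 + 0M
L 0 + 2 + 0M
L 1 + 2 + 0M
"""


def parse_graph(text):
    """Collect segment sequences and links from S and L lines."""
    segments, links = {}, set()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # (from, from_orient, to, to_orient); the overlap is ignored.
            links.add(tuple(fields[1:5]))
    return segments, links


def parse_steps(step_field):
    """Split '0+,2+' into [('0', '+'), ('2', '+')]."""
    return [(s[:-1], s[-1]) for s in step_field.split(",")]


def is_supported(links, steps):
    """True if each consecutive pair of steps appears as a link."""
    return all((a, ao, b, bo) in links
               for (a, ao), (b, bo) in zip(steps, steps[1:]))


def spell(segments, steps):
    """Concatenate segment sequences (assumes '+' steps and 0M overlaps)."""
    return "".join(segments[name] for name, _ in steps)


segments, links = parse_graph(GRAPH)
path_11 = parse_steps("0+,1+,2+")
path_12 = parse_steps("0+,2+")
print(spell(segments, path_11), is_supported(links, path_11))  # AAACCCATA True
print(spell(segments, path_12), is_supported(links, path_12))  # AAAATA True

# Without the bypass link from 0 to 2, the deletion path has no support.
without_bypass = links - {("0", "+", "2", "+")}
print(is_supported(without_bypass, path_12))  # False
```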
from gfa-spec.
I disagree that we should only design for things that are physically possible.
Biology has enough weirdness as it is. Let's prioritize first handling the cases that are physically possible.
from gfa-spec.
Similarly, an (RC) inversion can be represented directly:
S 0 AAA
S 1 CCC
S 2 ATA
L 0 + 1 + 0M
L 0 + 1 - 0M
L 1 + 2 + 0M
L 1 - 2 + 0M
P 11 0+,1+,2+ 0M,0M,0M
P 12 0+,1-,2+ 0M,0M,0M
I think the case you are thinking of, where intermediate nodes have to be added, does happen in de Bruijn graphs, but since you can specify 0M as the overlap you don't need them here.
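The two paths in the inversion example can be spelled out with a short sketch like the one below. The helper names `oriented` and `spell_path` are hypothetical, and the code assumes 0M overlaps and an ACGT-only alphabet; a - step is read as the reverse complement of the segment.

```python
# Complement table for unambiguous DNA bases (assumption: ACGT only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEGMENTS = {"0": "AAA", "1": "CCC", "2": "ATA"}


def oriented(seq, orient):
    """Return seq as-is for '+', or its reverse complement for '-'."""
    return seq if orient == "+" else seq.translate(COMPLEMENT)[::-1]


def spell_path(segments, step_field):
    """Spell a path such as '0+,1-,2+', assuming every overlap is 0M."""
    return "".join(oriented(segments[step[:-1]], step[-1])
                   for step in step_field.split(","))


print(spell_path(SEGMENTS, "0+,1+,2+"))  # AAACCCATA
print(spell_path(SEGMENTS, "0+,1-,2+"))  # AAAGGGATA
```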
from gfa-spec.