Comments (9)
I'm referring to: https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/references.avdl#L48
from ga4gh-schemas.
The use of this flag is common. The chrY sequence in GRCh37 contains the PARs, but the chrY used by the 1000g has the PARs hard masked. It is not the official chrY and does not have an accession number. For this 1000g sequence, isDerived should be set. Also in GRCh37, there is a single 'R' on one chromosome. Some versions of GRCh37 have it converted to N, so the md5 will be different. In the version of GRCh38 for mapping purposes, multiple chromosomes have some centromeric regions hard masked. Some cancer groups also hard mask wrong regions in the reference genome. These are all derived sequences without official accession numbers.
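As a rough illustration of why a hard-masked copy counts as a derived sequence, the sketch below masks part of a toy sequence and compares checksums. The helper names (`md5_of`, `hard_mask`, `divergence`) and the toy sequence are made up for this example, and it assumes that the md5 is taken over the upper-cased bases and that sourceDivergence means the fraction of mismatching bases; neither assumption is confirmed in this thread.

```python
import hashlib


def md5_of(seq: str) -> str:
    """MD5 hex digest of the upper-cased sequence (assumed convention)."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def hard_mask(seq: str, start: int, end: int) -> str:
    """Replace the half-open interval [start, end) with N."""
    return seq[:start] + "N" * (end - start) + seq[end:]


def divergence(source: str, derived: str) -> float:
    """Fraction of positions that differ between equal-length sequences."""
    if not source or len(source) != len(derived):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(source, derived))
    return mismatches / len(source)


# Toy 20-base "official" sequence; not real genome data.
original = "ACGTACGTACGTACGTACGT"
masked = hard_mask(original, 4, 9)  # mask 5 of the 20 bases

# The masked copy has a different checksum, so it cannot share the
# official sequence's identity and would be flagged as derived.
is_derived = md5_of(masked) != md5_of(original)
source_divergence = divergence(original, masked)
```

Here `is_derived` comes out true and `source_divergence` is 0.25, since 5 of the 20 bases were replaced.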
I now think I understand the fields, but they are confusing as currently documented and it's not clear how a new user would make use of them. I think they need improved documentation, or else need to be reworked. I'm not sure which, because I don't exactly understand the role/semantics of derived References in a ga4gh repository.
The main thing I'm missing is the motivation. Is the primary purpose of these derived references to allow comparison between datasets which are aligned against an original reference and a derived reference? Presumably the user may consider sourceDivergence when deciding to allow this. Or is the purpose to avoid fetching bases for a derived reference, if you had the bases for the original? Or is this just for provenance?
Depending on the answer above, would it provide any value to have derivedFromReferenceId, which points to the parent reference if any, in place of isDerived?
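To make the question concrete, here is a minimal sketch of what a parent pointer could look like next to the boolean flag. This is not the ga4gh schema: the `Reference` class, the `derived_from_reference_id` field, the `parent_of` helper, and the ids and divergence value are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reference:
    """Hypothetical, simplified stand-in for a reference record."""
    id: str
    name: str
    is_derived: bool = False
    source_divergence: Optional[float] = None
    # Proposed alternative to the boolean: id of the parent, if any.
    derived_from_reference_id: Optional[str] = None


def parent_of(ref: Reference, by_id: Dict[str, Reference]) -> Optional[Reference]:
    """Return the parent reference, or None for a non-derived reference."""
    if ref.derived_from_reference_id is None:
        return None
    return by_id.get(ref.derived_from_reference_id)


# Illustrative records only; ids and the divergence value are invented.
official = Reference(id="ref-1", name="chrY")
masked = Reference(
    id="ref-2",
    name="chrY",
    is_derived=True,
    source_divergence=0.05,
    derived_from_reference_id="ref-1",
)
by_id = {r.id: r for r in (official, masked)}
parent = parent_of(masked, by_id)
```

With a pointer like this, the boolean becomes redundant (a reference is derived exactly when the pointer is set), but it only works when the parent is itself stored in the same repository.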
Great, this explains things much more. I will file a pull request to improve the documentation.
On Aug 21, 2014 6:50 PM, "Heng Li" [email protected] wrote:
The use of this flag is common. The chrY sequence in GRCh37 contains the
PARs, but the chrY used by the 1000g has PARs hard masked. It is not the
official chrY and does not have an accession number. For this 1000g
sequence, isDerived should be set. Also in GRCh37, there is a single 'R'
on one chromosome. Some versions of GRCh37 have it converted to N. The md5
will be different. In the version of GRCh38 for the mapping purposes,
multiple chromosomes have some centromeric regions hard masked. Some cancer
groups also hard mask wrong regions in the reference genome. These are all
derived sequences without official accession numbers.
Purpose: data mapped to different derived versions of the same sourceAccession are allowed to be jointly retrieved.
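One way to picture joint retrieval is to group references by the accession they were derived from, so that a query against one version can be widened to its siblings. The sketch below assumes each record is a plain dict with hypothetical keys `id`, `source_accession`, and `is_derived`; the accession strings are placeholders, not real accession numbers, and `group_by_source_accession` is an illustrative helper rather than part of any ga4gh API.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_source_accession(references: List[dict]) -> Dict[str, List[str]]:
    """Map each source accession to the ids of all references sharing it."""
    groups = defaultdict(list)
    for ref in references:
        accession = ref.get("source_accession")
        if accession is not None:  # references without a source stay ungrouped
            groups[accession].append(ref["id"])
    return dict(groups)


# Placeholder records: "ACC_Y" and "ACC_1" are not real accessions.
references = [
    {"id": "official-chrY", "source_accession": "ACC_Y", "is_derived": False},
    {"id": "par-masked-chrY", "source_accession": "ACC_Y", "is_derived": True},
    {"id": "official-chr1", "source_accession": "ACC_1", "is_derived": False},
]

joint = group_by_source_accession(references)
```

Data mapped to either `official-chrY` or `par-masked-chrY` would then fall into the same group and could be fetched together, with sourceDivergence available to the caller for deciding whether mixing the two is acceptable.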
@ekg any update on a PR?
Did a PR happen for this one? Is this issue ready to close?
(Trying to get us ready for our v0.5 cleanup release!)
No PR and no recent comments. Closing.
Thanks @lh3 for pointing this out. This description needs to be added to the ga4gh documentation, not left in a ticket.