schemablocks's Introduction

Preview of GA4GH SchemaBlocks -> Now retired; new developments at schemablocks.org

A graph showing recommended basic objects and their relationships. The example attributes are placeholders for elements defined in the general schema description.

Please note: The content of this repository is being re-"invented" at schemablocks.org as a wider GA4GH cross-workstream initiative. This site should be considered a prototype for the new project.

This repository contains schema "blocks" for the GA4GH project, in a collaborative effort between members of the Clinical and Phenotypic Data Capture (GA4GH::CP), Genomic Knowledge Standards (GA4GH::GKS), and Discovery work streams.

Such blocks can be

  • object prototypes
  • object relations
  • documentation of data formats and standards
  • ... and probably other "things" related to the building of APIs and resources related to GA4GH

The project does not intend to build a monolithic API, but rather to help exchange usable components for creating implementations.

Currently, this site contains only skeleton schema elements, derived from the original, then-monolithic GA4GH schema.

The primary documents are in the yaml directory, with JSON versions and examples extracted from them. The "readable" documentation is also created from the YAML files and can be accessed through the links below.

  • common (raw) object classes, which are used in the schemas themselves
  • biosample (raw) Most relevant "bio"data (such as diagnoses, phenotypes ...) is stored in the biosample object.
  • individual (raw) The individual object contains information which pertains to the whole biological entity biosamples are derived from (e.g. sex, heritable phenotypes...).

The "genomic" parts of the schema recommendations do not yet represent authoritative recommendations of the GA4GH::GKS group, but rather reflect extended versions of the original, VCF-derived GA4GH schema. Examples of current use of this schema can be found in the arraymap.org and Beacon+ projects.

  • variant (raw) The variant object includes attributes and examples for both structural (DUP, DEL ...) and precise genome variants.
  • callset (raw) The callset object is for technical data and series information (e.g. the platform and analysis methods used). It is not strictly needed for querying combined variant + biosample aspects, since in the current implementation the variant object contains a reference to the biosample it was derived from.
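
The reference chain described above (variant to biosample, biosample to individual) can be sketched roughly as follows. This is only an illustration, not the schema itself: `individual_id` is the attribute discussed in the biosample definition, while `biosample_id`, `variant_type` and the helper `variants_for_individual` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Biosample:
    id: str
    individual_id: str  # reference to the individual the sample was derived from


@dataclass
class Variant:
    id: str
    biosample_id: str   # hypothetical attribute name: reference to the source biosample
    variant_type: str   # e.g. "DUP" or "DEL"


def variants_for_individual(individual_id, biosamples, variants):
    """Collect variants of all biosamples derived from one individual.

    No callset object is needed here, because each variant
    points directly at its biosample.
    """
    sample_ids = {b.id for b in biosamples if b.individual_id == individual_id}
    return [v for v in variants if v.biosample_id in sample_ids]
```

A query that combines variant and biosample aspects then only has to follow these two references.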

schemablocks's People

Contributors

mbaudis

schemablocks's Issues

callset vs dataset

Could we have a dataset object of which a callset would be a specific instantiation?

Status of CURIEs in SchemaBlocks

The docs mention use of CURIEs for ontology classes, but there are some invalid CURIEs used in this context, e.g

In other cases, the CURIE has been lowercased from its standard form:

CURIEs/URIs are case-sensitive, so it's important to use the standard form. I realize there is inconsistency between OBO and identifiers.org, with the former using NCIT and the latter using ncit, but nobody lowercases the leading c in the local fragment portion.

Also, why not extend the use of CURIEs for all identifiers, rather than just ontology classes?
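
A case-sensitive check along these lines could look like the sketch below. It is a minimal illustration under stated assumptions: the regular expressions, the `KNOWN_PREFIXES` table and the function `check_curie` are hypothetical and not part of SchemaBlocks, and the identifier used in the usage note is only an illustrative value.

```python
import re

# Rough CURIE shape "prefix:local" (an assumption for this sketch,
# not the full CURIE grammar).
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9_.]*):(?P<local>\S+)$")

# Hypothetical registry: standard-form prefix -> pattern for the local part.
KNOWN_PREFIXES = {
    "NCIT": re.compile(r"^C\d+$"),  # local part keeps its uppercase "C"
}


def check_curie(curie):
    """Return "ok" or a short reason why the CURIE is rejected.

    All comparisons are case-sensitive on purpose.
    """
    match = CURIE_PATTERN.match(curie)
    if match is None:
        return "malformed"
    rule = KNOWN_PREFIXES.get(match.group("prefix"))
    if rule is None:
        return "unknown prefix"
    if not rule.match(match.group("local")):
        return "bad local id"
    return "ok"
```

With this table, `check_curie("NCIT:C3262")` is accepted, while a fully lowercased `ncit:c3262` is rejected at the prefix lookup and `NCIT:c3262` at the local part.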

Individual definition

At https://github.com/ga4gh-metadata/schemas/blob/master/main/yaml/biosample.yaml#L16 the definition of individual_id reads: "In a complete data model "individual_id" represents the identifier of this biosample in the "individuals" collection."

I don't quite understand this. If we think in terms of donor_id (e.g. the initial patient the sample was taken from) and biosamples_id (e.g. SAME1234567 in EMBL-EBI BioSamples), is this meant to be the donor_id?

Trying to align with Phenopackets, https://github.com/phenopackets/phenopacket-schema/blob/master/src/main/proto/org/phenopackets/schema/v1/core/base.proto#L108, where individual is defined as "An individual (or subject) typically corresponds to an individual human or another organism. FHIR mapping: Patient (https://www.hl7.org/fhir/patient.html)."

Data Use should be Ontology_class Array

In DUO, you are able to add secondary and primary use cases.
Also, you might want to add two or more data use cases (cancer and diabetes research, for example).

If a biosample can have only one data use code, should a biosample have more than one schemablock? That's a data management headache. Having a set of data use codes instead of one might solve it.
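
As a rough sketch of this suggestion, a biosample could carry a list of ontology-class entries instead of a single one. The attribute name `data_use_conditions`, the helper functions and the `EXAMPLE:` identifiers are placeholders made up for illustration; they are not real DUO codes or existing schema attributes.

```python
def permitted_uses(biosample):
    """Return the set of data use ids attached to a biosample (may be empty)."""
    return {term["id"] for term in biosample.get("data_use_conditions", [])}


def allows(biosample, requested_use_id):
    """True if the requested data use is among the biosample's data use codes."""
    return requested_use_id in permitted_uses(biosample)


# One biosample, several data use codes: no duplicated records needed.
biosample = {
    "id": "bs-1",
    "data_use_conditions": [
        {"id": "EXAMPLE:0001", "label": "cancer research"},
        {"id": "EXAMPLE:0002", "label": "diabetes research"},
    ],
}
```

Each entry keeps the id/label shape of an ontology class, so primary and secondary uses can sit side by side in one record.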

The "$ref" notion

IMO, "$ref" is good for defining a neat structure, but not convenient to implement. At the implementation level, I would recommend using a URL as a plain string.
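
To make the difference concrete, here is a small sketch that turns every `{"$ref": ...}` object into a plain URL string. The function `flatten_refs` and the example.org URL are hypothetical; this shows one possible reading of the proposal, not an agreed convention.

```python
def flatten_refs(node):
    """Recursively replace objects of the form {"$ref": url} with the bare url string."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return node["$ref"]
        return {key: flatten_refs(value) for key, value in node.items()}
    if isinstance(node, list):
        return [flatten_refs(item) for item in node]
    return node


# Schema-style fragment using "$ref" (hypothetical URL).
fragment = {
    "individual": {"$ref": "https://example.org/schemas/individual.json"},
    "tags": ["a", "b"],
}

flattened = flatten_refs(fragment)
```

After flattening, `individual` holds the URL directly, which is simpler to store and compare, at the cost of losing the marker that tells a schema tool the value is a reference.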
