biolink-model's People

Contributors

actions-user, andrewsu, balhoff, bbopjenkins, beasleyjonm, caseyta, cbizon, cmungall, colleenxu, deepakunni3, dependabot[bot], diatomsrcool, evandietzmorris, gaurav, gloriachin, hsolbrig, karafecho, kevinschaper, kevinxin90, kshefchek, lhannest, mbrush, nicholsn, nlharris, richardbruskiewich, sierra-moxon, vemonet, vincentvialard, yaphetkg, yarikoptic

biolink-model's Issues

Add evidence slot to the top level 'association'

At present the slots tied to the top level association type in the BLM include the following:

-  slots:
    - association type
    - subject
    - negated
    - relation
    - object
    - qualifiers
    - publications
    - provided by

It seems a slot to capture evidence should also be included here. There is a has_evidence predicate in the BLM, but it is not clear whether it is meant to capture ECO codes, actual evidence data, or both. The range of the has_evidence slot is 'evidence instance', and I'm not sure what this is. For now it is probably simplest to allow the has_evidence slot to capture either ECO codes or actual evidence. Thoughts @cmungall? If agreed, I can update the documentation in the BLM yaml.

show mandated prefixes for translator

From Eric D in gdoc

id [required]: MUST be a CURIE, MUST use translator-mandated prefix <- where is the list of these

currently in yaml, need to better expose
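
A minimal sketch of how a consumer might check the rule quoted above once the prefix list is exposed. The prefix set, the function name check_id, and the CURIE pattern are all assumptions made for illustration; the real list of mandated prefixes is whatever the model YAML defines.

```python
import re

# Hypothetical allow-list for illustration only; the actual mandated
# prefixes are defined in the model YAML and are not reproduced here.
MANDATED_PREFIXES = {"HGNC", "MONDO", "CHEBI"}

# A simplified CURIE shape: prefix, colon, non-empty local part without spaces.
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z][\w.-]*):(?P<local>\S+)$")


def check_id(identifier, allowed=MANDATED_PREFIXES):
    """Return a list of problems with the identifier; empty means it passes."""
    match = CURIE_PATTERN.match(identifier)
    if match is None:
        return [f"{identifier!r} is not a CURIE"]
    prefix = match.group("prefix")
    if prefix not in allowed:
        return [f"prefix {prefix!r} is not in the mandated list"]
    return []
```

Returning a list of problems rather than raising makes it easy to collect every violation across a whole node file in one pass.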

biological process and molecular activity

BLM contains concepts for "biological process" and "molecular activity". It also contains entries that are unions of other entries. Can we add an entry that is "biological process and molecular activity"? We have found that grouping those two concepts together is useful in building COPs. The biolink function returning "function" returns both of these types, so it would also be useful in annotating that service.

UPHENO mapping for phenotypic feature

Phenotypic feature is currently mapped to UPHENO:0000001, but I think that's a mistake. UPHENO:0000001 is a property "has phenotype affecting". Probably the desired mapping is UPHENO:0001001

Questions about provided_by and is_defined_by slots

Creating ticket here to continue discussion in comments of the TKG Spec here, concerning labels of the following edge properties:

  • is_defined_by [required]: A CURIE/URI for the translator group that made the KG
  • provided_by [required]: A CURIE prefix, e.g. Pharos, MGI, Monarch. The group that curated/asserted the edge.

Comment in the gdoc was to consider names for these properties again - e.g. switching the labels so that is_defined_by describes the primary source that originally defined or asserted the claim, and provided_by describes the Translator group that provided the association to the uber graph.

A follow up comment asked if it made any sense to make these more fine-grained? We have an edge property "source" where we store the specific function in our code that produced the edge. But I could imagine putting a curie prefix on that so that is_defined_by becomes something like GAMMA:uberongraph.get_anatomy_by_cell_graph. Ditto for provided_by. Currently we have a property called URL where we put the url we retrieved the info from (for url-derived edges)

And a final comment offered: say you have a relation provided by DrugBank, which in turn obtained it from a publication. Who should be credited in the provided_by field?

Clarify predicate v relation distinction and change field name to 'edge label'

In Translator we have the notion of minimal and maximal predicates. We had previously mapped minimal to 'predicate' and maximal to 'relation' but this is confusing.

The proposal is to keep relation as the true relationship type, at arbitrary specificity, using a CURIE if available.

For the 'min predicate', use edge_label. This is analogous to node labels in Neo4J. It is a human-readable snake_case grouping for the relation.
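
To make the distinction concrete, here is a rough sketch of an edge record that carries both fields. The Edge class, its field names other than edge_label and relation, and the EX: placeholder CURIEs are assumptions for illustration, not part of the model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject_id: str
    edge_label: str  # minimal predicate: snake_case grouping label
    relation: str    # maximal predicate: specific relationship type, a CURIE if available
    object_id: str


def group_by_edge_label(edges):
    """Group edges by their coarse edge_label, much like querying by label in Neo4J."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[edge.edge_label].append(edge)
    return dict(grouped)


# Placeholder identifiers; EX: is not a real prefix.
example_edges = [
    Edge("EX:gene1", "affects", "EX:increases_expression_of", "EX:gene2"),
    Edge("EX:gene1", "affects", "EX:decreases_expression_of", "EX:gene3"),
    Edge("EX:cell1", "part_of", "EX:part_of", "EX:tissue1"),
]
```

Two edges with different relation values can share one edge_label, so a query at the coarse level still finds both while the specific relation is preserved on each edge.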

Clarify relationship with / map to DATS

https://github.com/biocaddie/WG3-MetadataSpecifications

Broadly speaking, these two are orthogonal and complementary. There are some linkage points, e.g

But the focus of DATS is on the intrinsic properties of the entities rather than linkages between the entities. We should write documentation that clarifies this, and map entity types where required.

Also: clarify relationship to bioschemas (#3)

Are the associations in the BioLink model going to distinguish experimentally validated results from ML prediction results?

Hi,

I'm wondering whether the associations in the BioLink model are going to distinguish experimentally validated results from computationally predicted results.

One example would be the 'ChemicalToGeneAssociation' (http://bioentity.io/vocab/ChemicalToGeneAssociation).

There are two available API endpoints:

  1. https://www.ebi.ac.uk/chembl/api/data/target_prediction
  2. http://www.dgidb.org/api/v2/interactions.json?drugs={drugname}

Both contain information about chemical-to-gene associations, but the first returns computationally predicted results, while the second returns experimentally validated results.

Is the BioLink model going to regard these two cases as the same association or not?

Thanks!

Define standard serializations

todo: document this. Notes below

The biolink model is intended to be independent of any one serialization format or database technology. This adds an extra layer of abstraction when using for data exchange.

There are a few orthogonal choices here for exchanging links/associations

  1. Use the generic Association class or a subclass, e.g. G2T
  2. JSON or RDF/graph format
  3. Which evidence model to mix in

For RDF exchange, the reality is that there are multiple reification standards. The reference one for us is OBAN, so we could define this as core and provide RDF shapes to check it.

But for many users JSON is most convenient, so the generic association class in the json-schema is best for general exchange.

Semantic Type for Reactome Complex

Hi,

This question specifically regards the BioLink semantic types. The example is a Reactome complex, e.g. R-HSA-5674003. A Reactome complex might be a combination of proteins, chemical compounds, etc. How would the BioLink model assign semantic types to such biological entities, which could potentially be a mixture of multiple different biological entities?

Thanks!

New predicate proposals for Translator Min Predicate Set

Alignment of additional knowledge sources (beyond the original 5 reasoner KGs that informed the initial iteration of the ~40 predicate set here) has suggested ~20 additional predicates to add.

We would add these to the biolink-model.yaml file, alongside the initial set of predicates that have already been added. Here, predicates that are part of the minimal Translator set will be flagged using the 'subset' slot with the value "translator_minimal". This will allow consumers of the yaml to find the set of slots in this standard, and also enable derivation of a biolink-github.io web page that presents the hierarchy of only these predicates (like this one for blm types).
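
As a rough illustration of how a consumer could pull out the flagged predicates, the sketch below filters slots by subset. The dictionary stands in for the parsed YAML; its exact layout (a top-level "slots" mapping whose entries carry a "subset" list) and the helper name slots_in_subset are assumptions.

```python
# Stand-in for the parsed biolink-model.yaml. The layout is assumed for
# illustration and may differ from the real file.
model = {
    "slots": {
        "interacts_with": {"subset": ["translator_minimal"]},
        "molecularly_interacts_with": {"subset": ["translator_minimal"]},
        "has_evidence": {},  # not part of the minimal predicate set
    }
}


def slots_in_subset(model, subset_name):
    """Return the sorted names of all slots flagged with the given subset."""
    return sorted(
        name
        for name, slot in model.get("slots", {}).items()
        if subset_name in slot.get("subset", [])
    )


minimal_predicates = slots_in_subset(model, "translator_minimal")
```

The same filter could drive generation of a web page that lists only the minimal predicates.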

The hierarchy below presents the proposed new predicates (in bold) in the context of the hierarchy of predicates in the existing minimal set. Parentheticals explain the meaning and/or source requiring each new term.


  • interacts_with (grouping term for interaction predicates)

    • directly_interacts_with
      • molecularly_interacts_with
    • genetically_interacts_with (gene - gene, for BioGrid via Monarch)
  • coexists_with

    • co-localizes_with (gene/product -gene/product, for QuickGO via Gamma)
    • in_pathway_with
    • in_complex_with
    • in_cell_population_with
  • affects

    • regulates
      • positively_regulates
      • negatively_regulates
    • has_affected_sequence_feature
    • disrupts
    • treats
  • participates_in

    • input_of
    • output_of
  • has_participant

    • has_input
    • has_output
  • overlaps (new for Monarch - make parent of part of and has part)

    • part_of
    • has_part
  • is_homologous_to

    • is_paralogous_to (Monarch)
    • is_orthologous_to (Monarch)
    • is_xenologous_to (Monarch)
  • affects_risk_for

    • prevents
  • contributes_to

    • causes
  • is_correlated_with

    • has_biomarker
    • is_biomarker_for
  • expressed_in

  • expresses (anatomy to gene, inverse of expressed_in, from HetNet via Gamma)

  • occurs_in (GO/QuickGO)

  • is_located_in (Wikidata)

  • is_location_of (SemMedDB)

  • is_model_of (for Monarch / MODs)

  • derives_from (for Monarch)

  • produces (for WD - between producing entity/agent and the product or material produced)

  • enables (for Monarch/GO, and WD/GO)

  • same_as (for WD exact match, and owl same_as in Monarch, etc)

  • in_taxon (Monarch, Wikidata)

  • has_gene_product

  • has_phenotype

  • manifestation_of

  • treated_by

  • precedes

  • derives_into

  • subclass_of


If there are questions about the meaning, utility, or name of any of the newly proposed predicates, comment here, or create a new ticket if you anticipate prolonged debate.

Also, note that predicates for gene-disease associations are not included here; they are addressed in ticket #52.

Why not identifiers?

My understanding is that the BLM will not have URI or curie style identifiers for its elements. Can somebody explain why? The advantage of having identifiers in my mind is that we're no longer stuck with particular labels. If we decide that "molecular activity" should be called "molecular function" to bring it in line with GO, then we can do that with impunity because the identifier would not change.

Note that I am not suggesting that the BLM has to have all BLM:00001 type identifiers. I think it would be entirely reasonable to use identifiers from other systems (like RO or SIO or whatever is appropriate for a given identity but choosing a single best identifier for each concept).

Inverse predicates (and their use in Translator KGs)

Different knowledge sources often assert associations in different directions. For example, the gene expressed_in anatomy, vs anatomy expresses gene.

On the 4-30 KG Standardization call, it was proposed that rather than enforcing such associations to always be made in one direction in KGs, such that only a single predicate is needed, we would allow assertion in either direction, and create the inverse predicates. A has_inverse slot in the biolink model will be used to indicate inverse predicates, and allow normalization to one direction when required.

The convention here will be to add the inverse_of statement in the BLM only on the predicate representing the 'canonical' direction (which will need to be decided for each such pair of predicates). In this way we mark the canonical direction that is preferred for normalization.
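
A minimal sketch of what normalization to the canonical direction could look like. Treating expressed_in as the canonical member of the pair is an assumption made only for this example, as are the function name normalize and the EX: placeholder identifiers.

```python
# Canonical predicate -> its inverse. Which direction is canonical is an
# assumption here; the proposal leaves that decision open for each pair.
INVERSE_OF = {"expressed_in": "expresses"}

# Reverse lookup: inverse predicate -> canonical predicate.
TO_CANONICAL = {inverse: canonical for canonical, inverse in INVERSE_OF.items()}


def normalize(subject, predicate, obj):
    """Rewrite an edge asserted in the inverse direction into the canonical one."""
    if predicate in TO_CANONICAL:
        # Swap subject and object and replace the predicate with its canonical form.
        return (obj, TO_CANONICAL[predicate], subject)
    return (subject, predicate, obj)
```

Because the inverse is declared only on the canonical predicate, the lookup table is unambiguous: every inverse maps back to exactly one preferred direction.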

Looking for feedback on this proposal before it gets implemented in the biolink-model.yaml file.

Wikidata predicate relations

The Semantic Medline Database and Wikidata use a set of predicates which should perhaps be added to the Biolink Model (perhaps, via the Translator predicate harmonization effort?). Here is the list of interest (Wikidata wd: curies given):

[
{
"id": "wd:P3356",
"name": "positive diagnostic predictor",
"definition": ""
},
{
"id": "wd:P129",
"name": "physically interacts with (in molecular biology)",
"definition": ""
},
{
"id": "wd:P279",
"name": "subclass of",
"definition": ""
},
{
"id": "wd:P276",
"name": "location",
"definition": ""
},
{
"id": "wd:P1557",
"name": "manifestation of",
"definition": ""
},
{
"id": "wd:P361",
"name": "part of",
"definition": ""
},
{
"id": "wd:P156",
"name": "followed by",
"definition": ""
},
{
"id": "wd:P1056",
"name": "product",
"definition": ""
},
{
"id": "wd:P2888",
"name": "exact match",
"definition": ""
},
{
"id": "wd:P2175",
"name": "medical condition treated",
"definition": ""
},
{
"id": "wd:P2283",
"name": "uses",
"definition": ""
},
{
"id": "wd:P1542",
"name": "cause of",
"definition": ""
},
{
"id": "wd:property_id",
"name": "",
"definition": ""
},
{
"id": "kb:P2176",
"name": "drug used for treatment",
"definition": ""
},
{
"id": "wd:P703",
"name": "found in taxon",
"definition": ""
},
{
"id": "wd:P688",
"name": "encodes",
"definition": ""
},
{
"id": "wd:P684",
"name": "ortholog",
"definition": ""
},
{
"id": "wd:P682",
"name": "biological process",
"definition": ""
},
{
"id": "wd:P681",
"name": "cell component",
"definition": ""
},
{
"id": "wd:P680",
"name": "molecular function",
"definition": ""
},
{
"id": "wd:P3433",
"name": "biological variant of",
"definition": ""
},
{
"id": "wd:P31",
"name": "",
"definition": ""
},
{
"id": "wd:P2293",
"name": "genetic association",
"definition": ""
},
{
"id": "wd:P1552",
"name": "has quality",
"definition": ""
},
{
"id": "wd:P128",
"name": "regulates (molecular biology)",
"definition": ""
}
]

Predicates for gene-condition associations

In recent Translator Knowledge Graph (tkg) standardization calls we reviewed different approaches for creating predicates linking genes directly to disease. The proposal below would create a set of predicates for connecting genes to diseases separate from those used to connect variants to conditions. See column G in the spreadsheet here, starting at row 14, to get a sense of the Reasoner-requested predicates that informed the predicates in this proposal.

Approach 1

The proposed predicates below describe important ways genes are related to conditions, as informed by predicates used in one or more Reasoner/Translator KGs. In reality most are shortcuts for the fact that a variant or a product of the gene is related to the condition in the indicated way. These 'shortcuts' are needed because many KSs and KGs don’t represent gene variants or products, and wish only to associate genes directly to a condition to which their variants or products contribute.

gene_associated_with_condition   
      gene_mutations*_contribute_to                        
            gene_mutations*_causal_for                
            gene_mutations*_affect_risk_for                
      gene_regulation_correlates_with                        
      gene_activity_contributes_to        
      gene_product_is_therapeutic_target_for             
  
*for labels, instead of 'gene_mutations' consider  'gene_alterations'?, 'gene_variants'?, 'gene_alleles'?

Requirements for many of these predicates come from Team IR and their GNBR resource - again for specifics see column G in the spreadsheet here, starting at row 14. Team Xray would likely use the generic top level predicate here to map to their 'gene_associated_with' predicate, but may also have a use case for the gene_mutation_contributes_to_condition predicate. Monarch would likely also use the gene_mutations_contribute_to predicate as their gene-disease associations are inferred across causal variants.

In addition to the predicates above, the following would be created to link things like variants or exposures to conditions. These would be used in KGs where the variant/allele is represented. Here we propose using relatively generic predicates such as 'causes' instead of 'causes_condition', which are not specific for variant as domain and condition as range. These predicates describe direct causal/correlation relationships - as it is the variant that is indeed doing the causing or correlating.

contributes_to
     causes
correlated_with
     biomarker_for
affects_risk_for

Pros of this Approach

  • semantically correct and clear predicates with simple mappings to KG predicates in use by reasoners.
  • predicates provide precise semantics for traversal and reasoning.
  • the gene-condition relations above could all be derived from the variant-condition associations through inference (e.g. property chains), provided that the variants are connected to the genes they affect. This would allow interoperation of KGs that do and don’t capture this more normalized/granular pattern.
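
The property-chain idea in the last point can be sketched roughly as follows. The mapping from variant-level to gene-level predicates uses the labels proposed above with the footnote asterisk dropped; that pairing, the function name, and the EX: identifiers are assumptions for illustration.

```python
from collections import defaultdict

# Assumed pairing of variant-level predicates with the gene-level shortcuts.
GENE_LEVEL_PREDICATE = {
    "contributes_to": "gene_mutations_contribute_to",
    "causes": "gene_mutations_causal_for",
    "affects_risk_for": "gene_mutations_affect_risk_for",
}


def infer_gene_condition_edges(variant_gene_pairs, variant_condition_edges):
    """Derive gene-condition edges by chaining variant->gene with variant->condition."""
    genes_by_variant = defaultdict(set)
    for variant, gene in variant_gene_pairs:
        genes_by_variant[variant].add(gene)

    inferred = set()
    for variant, predicate, condition in variant_condition_edges:
        gene_predicate = GENE_LEVEL_PREDICATE.get(predicate)
        if gene_predicate is None:
            continue  # no gene-level shortcut defined for this predicate
        for gene in genes_by_variant.get(variant, ()):
            inferred.add((gene, gene_predicate, condition))
    return sorted(inferred)
```

A KG that stores variants can run this kind of inference to expose the same gene-level edges that a KG without variants asserts directly.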

Cons of this Approach

  • use of separate predicates for genes and variants results in larger number of predicates (but really only three are 'duplicated' in the proposal above).

An orthogonal consideration for this approach concerns the granularity of the proposed gene-condition predicates: within this approach of using separate predicates for gene-condition vs variant-condition relationships, perhaps we can merge some predicates where the distinction between them is not required at the level of the minimal spec.

CURIE identifiers for biolink model entries

Although the name: fields of the model are both unique and human friendly, there are still likely to be some programmatic instances in which a more concisely specified globally unique CURIE will be more convenient and efficient than multi-word, space delimited, variable length (and sometimes long) Biolink Model ontology term names. For example, when encoding Biolink Model specific semantic data types and predicates as input parameters to API services or as the field values of JSON outputs, CURIEs are probably more concise but easily looked up by programs using an indexed Biolink model read into memory.

inconsistencies introduced by mappings should be surfaced when running validation on the model [was: Curies are mapped onto multiple categories]

SIO:010004 maps to both molecular entity and chemical substance
SIO:010450 maps to both transcript and RNA product
SIO:010046 maps to anatomical entity, macromolecular complex, and gross anatomical structure
GENO:0000512 maps to both allele and sequence variant
WD:Q4936952 maps to both anatomical entity and gross anatomical structure

Mappings should be functions: each curie should map onto a single category. We could set up Travis CI to check these sorts of things on every pull request.
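
A check of this kind could be as small as the sketch below, which a CI job might run on each pull request. The mappings dictionary is a hand-written stand-in built from a few of the cases listed above plus one unambiguous placeholder entry; how mappings are actually stored in the YAML, and the function name, are assumptions.

```python
from collections import defaultdict

# Stand-in for mappings extracted from the model: category -> mapped CURIEs.
mappings = {
    "molecular entity": ["SIO:010004"],
    "chemical substance": ["SIO:010004"],
    "transcript": ["SIO:010450"],
    "RNA product": ["SIO:010450"],
    "allele": ["GENO:0000512"],
    "sequence variant": ["GENO:0000512"],
    "example category": ["EX:0000001"],  # placeholder entry with a unique mapping
}


def find_ambiguous_curies(mappings):
    """Return each CURIE that maps to more than one category, with those categories."""
    categories_by_curie = defaultdict(set)
    for category, curies in mappings.items():
        for curie in curies:
            categories_by_curie[curie].add(category)
    return {
        curie: sorted(categories)
        for curie, categories in categories_by_curie.items()
        if len(categories) > 1
    }


ambiguous = find_ambiguous_curies(mappings)
```

A validation step would fail the build whenever the returned dictionary is non-empty, printing the offending CURIEs and categories.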

literature co-occurrence slot

We'd like to store edges in a knowledge graph indicating that two entities are both mentioned in a particular article (or articles). Appearing together in the same abstract is interesting, but I don't think it's really appropriate to say that the two entities are "associated" or "interacting", so I'd like a slot that is used specifically for literature co-occurrence.
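
For illustration, such edges might be derived along these lines. The edge_label value co_occurs_in_literature_with is a hypothetical name for the requested slot, and the input shape (article identifier mapped to the entities mentioned in it) and all identifiers are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical label for the requested slot; the real name is still to be decided.
CO_OCCURRENCE_LABEL = "co_occurs_in_literature_with"


def cooccurrence_edges(mentions_by_article):
    """Build one edge per entity pair, listing the articles that mention both."""
    articles_by_pair = defaultdict(set)
    for article, entities in mentions_by_article.items():
        # Sort and deduplicate so each unordered pair is recorded in one direction only.
        for first, second in combinations(sorted(set(entities)), 2):
            articles_by_pair[(first, second)].add(article)
    return [
        {
            "subject": first,
            "edge_label": CO_OCCURRENCE_LABEL,
            "object": second,
            "publications": sorted(articles),
        }
        for (first, second), articles in sorted(articles_by_pair.items())
    ]
```

Keeping the supporting articles on the edge lets a consumer weigh a pair by how many publications mention both entities without implying any biological association.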

Proposed label changes for a few Translator min predicates

Proposing a few minor changes to labels used for min predicates - mainly to maintain internal consistency of naming principles and assure clear distinction between terms as new predicates get added. And in some cases simply shorten/simplify labels where feasible.

  1. directly_interacts_with -> physically_interacts_with (to better distinguish it from its new sibling 'genetically_interacts_with')
  2. has_affected_sequence_feature -> affects_sequence_feature (a bit cleaner, with a shorter label), or just eliminate this and use the more generic 'affects' predicate for these edges

Questions for G2P schema - relationship type, nested schemas

In the G2P schema the range for relationship is RelationshipType. However, RelationshipType does not contain any fields/slots, where I would expect at least id and label. Or am I misinterpreting the spec?

Should certain fields contain nested schemas, e.g.

publications = fields.Nested(PublicationSchema, many=True)

instead of

publications = fields.Str()

Does provider need fields? https://biolink.github.io/biolink-model/docs/Provider.html
