biothings / pending.api

Set of standalone APIs built with the BioThings SDK for the Translator Project

License: Apache License 2.0

Python 67.59% HTML 21.95% JavaScript 8.49% Dockerfile 0.24% Shell 0.05% Smarty 0.30% Jupyter Notebook 1.37%

api bioinformatics biothings translator webservice

pending.api's Introduction

This repository maintains a set of biomedical knowledgebase APIs built with the BioThings SDK. These APIs are either the Knowledgebase APIs built for the Translator Project or the "pending" APIs to be integrated into the official BioThings APIs (e.g. MyGene.info, MyVariant.info, MyChem.info, MyDisease.info).

The Translator-associated knowledgebase APIs are hosted at https://biothings.ncats.io.

Additional APIs are hosted at https://pending.biothings.io.

Knowledgebase APIs for the Translator Project

Each knowledgebase API is created as a "data plugin" (see examples under the plugins folder). The BioThings SDK package then processes the data plugin and turns it into a hosted "BioThings API". You can follow the data plugin tutorial for more details.

How to add a new API

Our internal development team will handle the process of adding a new data plugin and deploying it as a new API. For our internal developers, please follow this documentation.

How to update data for an existing API

For external collaborators who have submitted their "data plugins" as new APIs, you can follow this workflow to request an update of your data:

https://github.com/biothings/biothings_explorer/blob/main/docs/README-maintaining-a-data-source.md

The documentation is maintained in the biothings_explorer repository, as each knowledgebase API will be integrated into the BioThings Explorer application to become a standard Translator KP (Knowledge Provider) API.

pending.api's People

Contributors

polyg314 erikyao nikkibytes kannabhargav aojesanmi bettyli037 chevvak2 mostafa5000700 cwoskoski pahmadi8740

pending.api's Issues

update SemMedDB APIs

splitting out SemMedDB from the broader issue of updating APIs on pending #25

In addition to updating to the latest data files, we will remove the logic to convert to biolink model from the parser, leaving that to be done in the smartAPI mapping.

update text-mining targeted association API

Data Plugin: https://github.com/UCDenver-ccp/text_mining_targeted_association
API: https://biothings.ncats.io/text_mining_targeted_association

Animation errors in the index page when switched to Vuex 3.6.2

Related issue: #50

We have switched to Vuex 3.6.2 so some animation code may not work anymore. A downgrade to an older v3 version could be a quick fix.

update pending dgidb API

BTE is using https://pending.biothings.io/dgidb quite a bit, and I believe it hasn't been updated since it was made (in 2018/2019?).

We are interested in updating BOTH the parser + data for this API. This API is using the biolink-model predicates in the association.edge-label field....so we may want to remove this or map the dgidb relations to the most recent version of the biolink-model.

New EBI G2P gene2phenotype API

EBI gene2phenotype (https://www.ebi.ac.uk/gene2phenotype)

EBI gene2phenotype (or G2P) is a knowledge source providing human-curated gene-to-disease associations with a focus on cancer and developmental disease areas. Each G2P entry associates an allelic requirement and a mutational consequence at a defined locus with a disease entity. A confidence level and evidence link are assigned to each entry. The data file is available to download at:

https://www.ebi.ac.uk/gene2phenotype/downloads.

Columbia Open Health Data

source: https://u.pcloud.link/publink/show?code=XZ3ibtkZlJt9yHczhp5NjlGkui6E4uvTXL7y

The link contains a zip file; we only need to parse the "paired_concept_counts_associations" file.

type: EHR records

Manual dumper

First generate a static JSON/pickle file to store xrefs for all omop IDs (primary id used in omop)
Use this endpoint: http://cohd.smart-api.info/#/OMOP/xrefFromOMOP
may also consider this one: http://cohd.smart-api.info/#/OMOP/mapToStandardConceptID

Need to create two APIs based on this "paired_concept_counts_associations" file.

First API

Retrieves observed clinical frequencies of all pairs of concepts given a concept id.
sort the result, and keep only the first 100
Similar to this endpoint: http://cohd.smart-api.info/#/Clinical%20Frequencies/associatedConceptFreq
need to add xrefs field for both the primary concept(_id) as well as all associated concepts (outputs)

{
  "_id": "...",
  "xrefs": {
      "umls": ...,
      "mesh": ...,
  },
  "results": [
    {
      "associated_concept_id": 2213216,
      "associated_concept_name": "Cytopathology, selective cellular enhancement technique with interpretation (eg, liquid based slide preparation method), except cervical or vaginal",
      "associated_domain_id": "Measurement",
      "concept_count": 330,
      "concept_frequency": 0.0001843131625848748,
      "concept_id": 192855,
      "dataset_id": 1
    },
    {
      "associated_concept_id": 4214956,
      "associated_concept_name": "History of clinical finding in subject",
      "associated_domain_id": "Observation",
      "concept_count": 329,
      "concept_frequency": 0.00018375463784976913,
      "concept_id": 192855,
      "dataset_id": 1
    }
   ]
}
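
The grouping, sorting, and truncation steps for the first API could be sketched as follows. This is only an illustration: the function name `build_concept_docs`, the `xrefs_by_omop` lookup (standing in for the static JSON/pickle xrefs file), and the choice to sort by descending `concept_count` are all assumptions, since the issue only says to "sort the result".

```python
from collections import defaultdict

MAX_RESULTS = 100  # "keep only the first 100"


def build_concept_docs(rows, xrefs_by_omop):
    """Group paired-concept rows by concept_id and keep the top associations.

    rows: iterable of dicts shaped like the "results" entries above.
    xrefs_by_omop: hypothetical precomputed mapping
        {omop_id: {"umls": ..., "mesh": ...}}.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["concept_id"]].append(row)

    for concept_id, results in grouped.items():
        # Assumption: rank by descending concept_count.
        results.sort(key=lambda r: r["concept_count"], reverse=True)
        top = []
        for r in results[:MAX_RESULTS]:
            r = dict(r)  # copy so the input rows are not mutated
            r["xrefs"] = xrefs_by_omop.get(r["associated_concept_id"], {})
            top.append(r)
        yield {
            "_id": str(concept_id),
            "xrefs": xrefs_by_omop.get(concept_id, {}),
            "results": top,
        }
```

Concepts without a known xref simply get an empty `xrefs` object in this sketch; whether to omit the field instead is a design choice left open here.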

Second API

Retrieves observed clinical frequencies of a pair of concepts.
Similar to this endpoint: http://cohd.smart-api.info/#/Clinical%20Frequencies/pairedConceptFreq
need to add xrefs for both concept1 and concept2
use form concept1-concept2 to represent _id

{
  "_id": "concept1-concept2",
  "concept1": {
     "omop": "....",
     "xrefs": {
         "mesh": ...,
         "umls": ....
     }
  },
  "concept2": {
     "omop": "....",
     "xrefs": {
         "mesh": ...,
         "umls": ....
     }
  },
  "results": [
    {
      "concept_count": 10,
      "concept_frequency": 0.000005585247351056813,
      "concept_id_1": 192855,
      "concept_id_2": 2008271
    }
  ]
}
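
A minimal sketch of building one such pair document is shown below. The function name `build_pair_doc` and the `xrefs_by_omop` lookup are hypothetical; the row keys follow the example above.

```python
def build_pair_doc(row, xrefs_by_omop):
    """Build a document for one concept pair.

    row: dict with concept_id_1, concept_id_2, concept_count, concept_frequency.
    xrefs_by_omop: hypothetical mapping {omop_id: {"umls": ..., "mesh": ...}}.
    """
    c1, c2 = row["concept_id_1"], row["concept_id_2"]
    return {
        # "_id" uses the concept1-concept2 form requested above.
        "_id": f"{c1}-{c2}",
        "concept1": {"omop": str(c1), "xrefs": xrefs_by_omop.get(c1, {})},
        "concept2": {"omop": str(c2), "xrefs": xrefs_by_omop.get(c2, {})},
        "results": [dict(row)],
    }
```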

New API for BioStacks

BioStacks is a new source for a text-mined heterogeneous association network from Dr. Larry Hunter's group. We need to work with his team when it's ready.

fix slug for mrcoc API to "cooccurrence"

Currently, http://pending.biothings.io/mrcoc generates example URLs like this:

http://pending.biothings.io/mrcoc/coocurence/D001055-D003397

These give a 404 error because the correct URL should be this (notice spelling change in coocurence / cooccurence )

http://pending.biothings.io/mrcoc/cooccurence/D001055-D003397

But as long as we are fixing things, we should ideally use the correct spelling, cooccurrence:

http://pending.biothings.io/mrcoc/cooccurrence/D001055-D003397

new API for functional gene sets from MSigDB

http://software.broadinstitute.org/gsea/msigdb/index.jsp

Switch the rendering for hostname "biothings.ci.transltr.io"

Currently we have two site renderings, "pending" and "ncats". They differ in aesthetics but share the same backend.

Previously, the hostname-to-rendering mapping was:

hostname	site rendering
biothings.ncats.io	"ncats"
pending.biothings.io	"pending"

Now we have a new hostname, biothings.ci.transltr.io, which should use "ncats" (temporarily):

hostname	site rendering
biothings.ncats.io	"ncats"
biothings.ci.transltr.io	"ncats"
pending.biothings.io	"pending"

Related module: web/handlers/__init__.py
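
As an illustration only, the mapping in the table could be expressed as a simple lookup. The names `HOST_RENDERINGS`, `DEFAULT_RENDERING`, and `rendering_for_host` are hypothetical and are not taken from web/handlers/__init__.py; the fallback for unknown hosts is also an assumption.

```python
# Hostname-to-rendering table from this issue.
HOST_RENDERINGS = {
    "biothings.ncats.io": "ncats",
    "biothings.ci.transltr.io": "ncats",  # temporary, per this issue
    "pending.biothings.io": "pending",
}

# Assumption: unknown hostnames fall back to the "pending" rendering.
DEFAULT_RENDERING = "pending"


def rendering_for_host(host):
    """Return the site rendering for a Host header value such as 'example.org:443'."""
    hostname = host.split(":", 1)[0].lower()  # drop any port, ignore case
    return HOST_RENDERINGS.get(hostname, DEFAULT_RENDERING)
```

Keeping the table in one dictionary means that reverting the temporary entry later is a one-line change.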

new API for Text Mining Provider

Dumper

Create a dumper looking for newest release in this folder (currently only one release)

Parser

primary source file
use column "subject" as the primary id
use column "edge_label" as the root key
rows with same "subject" field should appear in the same record
rows with the same "subject" and "edge_label" should be in the same list as the value for that "edge_label"
reference secondary source file based on evidence id

Example output

{
    "_id": "PR:000010159",
    "expressed_in": [
        {
            "uberon": "UBERON:0002355",
            "relation": "RO:0002206",
            "association_type": "GeneToExpressionSiteAssociation",
            "evidence": {
                "sentence": "...",
                "pmc": "PMC324396"
            }
        },
        {
            "uberon": "UBERON:0003066",
            "relation": "RO:0002206",
            "association_type": "GeneToExpressionSiteAssociation",
            "evidence": {
                "sentence": "...",
                "pmc": "PMC324396"
            }
        }
    ]
}
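
The grouping rules above (one record per "subject", one list per "edge_label") could be sketched like this. The function name `group_rows`, the `evidence_id` column name, and the `evidence_by_id` lookup built from the secondary file are assumptions; the real column names are not shown in this issue.

```python
from collections import defaultdict


def group_rows(rows, evidence_by_id=None):
    """Merge rows sharing a subject into one document, keyed by edge_label.

    rows: iterable of dicts read from the primary source file.
    evidence_by_id: hypothetical mapping from evidence id to an evidence dict
        loaded from the secondary source file.
    """
    evidence_by_id = evidence_by_id or {}
    docs = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Everything except the grouping keys becomes part of the list entry.
        entry = {
            k: v for k, v in row.items()
            if k not in ("subject", "edge_label", "evidence_id")
        }
        evidence = evidence_by_id.get(row.get("evidence_id"))
        if evidence is not None:
            entry["evidence"] = evidence
        docs[row["subject"]][row["edge_label"]].append(entry)

    for subject, by_label in docs.items():
        yield {"_id": subject, **by_label}
```

This version holds all rows in memory; if the source file is sorted by subject, a streaming variant would be possible.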

update ontology apis?

However, we may be able to annotate ontology lookup service (website) (smartapi entry) to retrieve this kind of information?

The following pending BioThings APIs are used by BTE (but aren't called upon often):

Gene Ontology Biological Process API
Gene Ontology Cellular Component API
Gene Ontology Molecular Activity API
Human Phenotype Ontology API
UBERON Ontology API

New data source for ICD and CPT codes from CMS

It would be useful to get relationships between medical procedures (expressed as CPT codes) and diseases (expressed as ICD-10 codes) from a more authoritative source than semmeddb. I found that many of these links are available from the Centers for Medicare & Medicaid Services (CMS). For example, the CMS article on Cataract Extraction (https://www.cms.gov/medicare-coverage-database/view/article.aspx?articleid=56453) has sections for both ICD-10 codes and CPT codes.

To get this information in a structured format, they provide database dumps on their downloads page. Specifically, the file for Current Articles appears to be a zip file of CSV exports of some database tables. So if we search article_x_icd10_covered.csv for articles related to H25.89 ("Other age-related cataract"), we get several articles including 56453 (the article linked above):

$ gawks '$3=="\"H25.89\""' article_x_icd10_covered.csv
"56453","8","H25.89","11","1","N","2021-12-29 15:58:03","8401","Other age-related cataract","N"
"56453","8","H25.89","11","3","N","2021-12-29 15:58:04","8401","Other age-related cataract","N"
"56544","19","H25.89","7","1","N","2019-12-19 13:07:44","8233","Other age-related cataract","N"
"56544","19","H25.89","7","2","N","2019-12-19 13:07:44","8233","Other age-related cataract","Y"
"56549","12","H25.89","7","1","N","2019-09-11 17:42:02","8233","Other age-related cataract","N"
"56613","13","H25.89","11","1","N","2021-12-29 16:38:58","8401","Other age-related cataract","N"
"56615","31","H25.89","11","1","N","2021-10-29 12:40:48","8401","Other age-related cataract","N"
"56615","31","H25.89","11","2","N","2021-10-29 12:40:48","8401","Other age-related cataract","N"
"57068","8","H25.89","10","1","N","2021-04-19 17:55:50","8375","Other age-related cataract","N"
"57070","8","H25.89","10","1","N","2021-05-17 17:57:02","8375","Other age-related cataract","N"
"57195","10","H25.89","11","1","N","2021-12-15 14:54:37","8401","Other age-related cataract","N"
"57196","8","H25.89","11","1","N","2021-12-15 14:55:24","8401","Other age-related cataract","N"
"57483","10","H25.89","11","1","N","2021-09-20 19:54:07","8401","Other age-related cataract","N"
"57637","6","H25.89","9","1","N","2020-09-25 18:40:59","8375","Other age-related cataract","N"
"58590","9","H25.89","9","1","N","2021-01-08 19:53:39","8375","Other age-related cataract","N"
"58590","9","H25.89","9","2","N","2021-01-08 19:53:39","8375","Other age-related cataract","N"
"58591","5","H25.89","9","1","N","2021-01-08 19:54:27","8375","Other age-related cataract","N"
"58591","5","H25.89","9","2","N","2021-01-08 19:54:27","8375","Other age-related cataract","N"
"58592","14","H25.89","11","1","N","2021-10-29 12:41:22","8401","Other age-related cataract","N"
"58592","14","H25.89","11","2","N","2021-10-29 12:41:23","8401","Other age-related cataract","N"

If, in turn, we look for CPT codes associated with article 56453 in article_x_hcpc_code.csv, we get relevant procedures (including the positive controls listed):

$ gawks '$1=="\"56453\""' article_x_hcpc_code.csv
"56453","8","66840","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; ASPIRATION TECHNIQUE, 1 OR MORE STAGES","Removal of lens material"
"56453","8","66850","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; PHACOFRAGMENTATION TECHNIQUE (MECHANICAL OR ULTRASONIC) (EG, PHACOEMULSIFICATION), WITH ASPIRATION","Removal of lens material"
"56453","8","66852","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; PARS PLANA APPROACH, WITH OR WITHOUT VITRECTOMY","Removal of lens material"
"56453","8","66920","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; INTRACAPSULAR","Extraction of lens"
"56453","8","66930","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; INTRACAPSULAR, FOR DISLOCATED LENS","Extraction of lens"
"56453","8","66940","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; EXTRACAPSULAR (OTHER THAN 66840, 66850, 66852)","Extraction of lens"
"56453","8","66982","83","1","N","2021-12-29 15:58:03","EXTRACAPSULAR CATARACT REMOVAL WITH INSERTION OF INTRAOCULAR LENS PROSTHESIS (1-STAGE PROCEDURE), MANUAL OR MECHANICAL TECHNIQUE (EG, IRRIGATION AND ASPIRATION OR PHACOEMULSIFICATION), COMPLEX, REQUIRING DEVICES OR TECHNIQUES NOT GENERALLY USED IN ROUTINE CATARACT SURGERY (EG, IRIS EXPANSION DEVICE, SUTURE SUPPORT FOR INTRAOCULAR LENS, OR PRIMARY POSTERIOR CAPSULORRHEXIS) OR PERFORMED ON PATIENTS IN THE AMBLYOGENIC DEVELOPMENTAL STAGE; WITHOUT ENDOSCOPIC CYCLOPHOTOCOAGULATION","Xcapsl ctrc rmvl cplx wo ecp"
"56453","8","66983","83","1","N","2021-12-29 15:58:03","INTRACAPSULAR CATARACT EXTRACTION WITH INSERTION OF INTRAOCULAR LENS PROSTHESIS (1 STAGE PROCEDURE)","Cataract surg w/iol 1 stage"
"56453","8","66984","83","1","N","2021-12-29 15:58:03","EXTRACAPSULAR CATARACT REMOVAL WITH INSERTION OF INTRAOCULAR LENS PROSTHESIS (1 STAGE PROCEDURE), MANUAL OR MECHANICAL TECHNIQUE (EG, IRRIGATION AND ASPIRATION OR PHACOEMULSIFICATION); WITHOUT ENDOSCOPIC CYCLOPHOTOCOAGULATION","Xcapsl ctrc rmvl w/o ecp"

Note that this information appears to be limited to articles with article_type equal to 6 (Billing and Coding), and there appear to be ~1500 articles of this type:

$ gawks '$3==6{print $1}' article_x_contractor.csv | sort -u | wc -l
1568

For Translator, it would be useful to be able to search by ICD-10 code and retrieve a list of CPT codes with the article_id as the source (and vice versa).
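
A rough sketch of the ICD-10 to CPT lookup, joining the two CSV files on the article id, is given below. It assumes the column positions visible in the excerpts above (column 1 is the article id, column 3 is the code) and does no special header handling; the function name `icd10_to_cpt` is hypothetical.

```python
import csv
from collections import defaultdict


def icd10_to_cpt(icd10_lines, hcpc_lines):
    """Map each ICD-10 code to {cpt_code: set of article ids}.

    Both arguments are iterables of CSV lines (open file objects work),
    read from article_x_icd10_covered.csv and article_x_hcpc_code.csv.
    """
    # Index CPT/HCPCS codes by article id.
    cpt_by_article = defaultdict(set)
    for row in csv.reader(hcpc_lines):
        if len(row) >= 3:
            cpt_by_article[row[0]].add(row[2])

    # Join on article id, keeping the article as the source of each link.
    links = defaultdict(lambda: defaultdict(set))
    for row in csv.reader(icd10_lines):
        if len(row) < 3:
            continue
        article_id, icd10 = row[0], row[2]
        for cpt in cpt_by_article.get(article_id, ()):
            links[icd10][cpt].add(article_id)
    return links
```

The reverse lookup (CPT to ICD-10) can be obtained by swapping the roles of the two files.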

Text here is mostly copied from my notes on NCATSTranslator/testing#171...

Remove `fire` API from configuration

It's obsolete and can be removed

Update `text_mining_cooccurrence_kp` source

From Edgar Gatica:

I have finished making my changes to the cooccurrence text mining KP, which can be found here: https://github.com/UCDenver-ccp/text_mining_cooccurrence_kp
Unfortunately there isn’t currently any data in the prod DB for cooccurrence, so the files that dumper would copy and parser would read don’t yet exist. Hopefully that will be coming soon.
Relatedly, Bill asked me if I could get it to update more frequently, as he expects the databases to update daily once everything is running correctly. I modified the “dumper.schedule” attribute in the manifest to attempt to schedule it for the beginning of every day, rather than once weekly. I assumed it was syntax like cron, but if it isn’t (or if I need to do something else to change the schedule) please let me know.

New API for PhenoScanner

Genotype-phenotype associations from PhenoScanner. Need to request permission.

BioMuta parser error

With release from 2018-10-25, parser gives this error:

Not Adequate key:value or value format: Natural_Variant_Annotation:In 3MC1) (PubMed:26419238.

add "code" link to semmeddb api

I was expecting a link to the parser code in the header, similar to how we have it for other APIs

Update / fix MGIGene2Phenotype API

Tasks:

Fork the parser (rename to use biothings repo): https://github.com/kevinxin90/mgi_gene2phenotype
update the parser and data for the API https://pending.biothings.io/mgigene2phenotype/
- fix this: the mgi.has_homolog field either has the value of "null" OR has the NCBIGene ID for the mouse gene (compare this record to this info). These are "equivalent ids", not IDs for two different genes with the homolog relationship.
- Instead, the NCBIGene ID for the mouse gene should be in a field like xrefs.ncbigene

new API for SNPedia

https://www.snpedia.com/

license: CC-BY-NC-SA

bulk download instruction:
https://www.snpedia.com/index.php/Bulk

customize metadata tags for each API

Minor issue, but right now, the meta tags for each pending API page are the same as the home page. For example, title is "Translator KP APIs" for all APIs on biothings.ncats.io. For https://biothings.ncats.io/dgidb, perhaps change the title to "DGIdb API | Translator" or something similar? Description can be "API for DGIdb hosted by the BioThings Project, made for National Center for Advancing Translational Sciences (NCATS) Biomedical Data Translator Program". Would be helpful for SEO presumably (as well as my browser tab management)...

New API for LINCS

LINCS provides drug-protein target associations.

missing "code" metadata for semmeddb API

https://biothings.ncats.io/semmeddb/metadata

does not include a "code" section, which causes this API page to miss the "code" link.

v.s. other pending APIs, like this one:

https://biothings.ncats.io/semmed_anatomy/metadata at https://biothings.ncats.io/semmed_anatomy

Both are manifest based data plugins:

https://github.com/r76941156/semmed_parser
v.s.
https://github.com/kevinxin90/semmedana

Old cord_* APIs can be removed

The 10 cord_* APIs can be removed as they contain outdated data at this point. Users should use the text_mining_co_occurrence_kp API instead.

cord_protein
cord_anatomy
cord_bp
cord_cc
cord_cell
cord_chemical
cord_disease
cord_gene
cord_genomic_entity
cord_ma

new API for CTD

Heterogeneous association network from CTD

May need to split into multiple APIs based on the entity-types.

update EBI gene2phenotype api?

BTE uses the pending BioThings API EBI gene2phenotype (infrequently).

This issue is to track whether we decide to update this api or use an external api that has this data (EBI likely has an API to retrieve this data...)

update pending apis used by BTE

It would be useful to update the pending APIs that are being used by BTE, since some may be several years old and there are newer versions of the data.

This may involve updating the parsers.

I notice that DISEASES is a resource that seems to update weekly, which perhaps implies an automated weekly update process may be nice.

The APIs that are currently being used (using the names from here):

hpo
uberon
semmed
semmed_anatomy
semmedbp
semmedchemical
semmedgene
semmedphenotype
DISEASES
ebigene2phenotype
mgigene2phenotype
go_bp
go_mf
go_cc
dgidb

create new API for gene-disease associations from AGR

The Alliance for Genome Resources (AGR) maintains a list of gene-disease associations for seven organisms (including human, mouse, rat, fly, worm, zebrafish, and yeast) at https://www.alliancegenome.org/downloads. Files can be downloaded in either JSON or TSV. I don't think these data have been imported into any other biothings API. They are candidates for inclusion in mygene.info and mydisease.info, but as a starting point we should create a pending API. This would be incredibly useful for Translator...

New API for cancer-associated mutations from OncoKB

http://oncokb.org/

Create endpoint for GNBR

I'd like to incorporate the GNBR graph into a pending API: https://zenodo.org/record/1495808#.XYJeuChKibg. It would be similar in scope and process to semmeddb, I think.

Prioritization would be heavily influenced by @kevinxin90 based on how much he thinks it would add to BioThings explorer. Might be worth some initial exploration of the data before deciding. Not all pairs of entity types are available (eg no drug-drug relationships).

Ticket based on a discussion with @jakelever at the ncats Seattle hackathon. (if I remember Jake's estimates correctly, millions of edges for ~100k nodes.)

Update metadata for older pending APIs

denovodb is an example:

https://pending.biothings.io/denovodb/metadata

v.s. the latest one like phewas:

https://pending.biothings.io/phewas/metadata

The metadata structure has been updated. biothings_client does not work with the old pending APIs now (ref: biothings/biothings_client.py#6).

New API for Integrated Dietary Supplement Knowledge Base (iDISK)

Given the emphasis on rare diseases in Translator, and given that rare diseases often have a metabolic origin, and given that the bar for trying an off-label treatment for a pharmaceutical compound in a human patient is relatively high, it would be useful to have an API resource that links dietary supplements to other biomedical entities. The Integrated Dietary Supplement Knowledge Base (iDISK) appears to be an excellent resource, and it would be very useful for Translator to create a new API for this.

iDISK encompasses a terminology of 4208 DS ingredient concepts, which are linked via 6 relationship types to 495 drugs, 776 diseases, 985 symptoms, 605 therapeutic classes, 17 system organ classes, and 137 568 DS products. iDISK also contains 7 concept attribute types and 3 relationship attribute types. Evaluation of the data extraction and integration process showed average errors of 0.3%, 2.6%, and 0.4% for concepts, relationships and attributes, respectively

data: https://conservancy.umn.edu/handle/11299/204783 (in a UMLS-like format, or in a neo4j dump)
publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7075538/

Generated queries should skip invalid values

Values such as "Object" or "undefined"

New API for DISEASES

DISEASES (http://diseases.jensenlab.org/)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. It also provides confidence scores across supporting evidence which can be used to further compare or filter the different types and sources of evidence. Two data files (associations with full evidence and filtered list of unique associations) are available at https://diseases.jensenlab.org/Downloads.

DGIdb API

Source file: http://dgidb.org/data/interactions.tsv

Represent each line of the tsv file as a JSON document, hash all values in the line and use it as the _id for the document.

Output (use the first row as an example):

{
  "_id": "b59670809cf23553",
  "subject": {
    "NCBIGene": "1022",
    "SYMBOL": "CDK7",
    "id": "NCBIGene:1022"
  },
  "object": {
    "name": "BMS-387032",
    "CHEMBL.COMPOUND": "CHEMBL296468",
    "id": "CHEMBL.COMPOUND:CHEMBL296468"
  },
  "association": {
    "edge_label": "decreases_activity_of",
    "relation_name": "inhibitor",
    "pubmed": [],
    "provided_by": "CancerCommons"
  }
}

Specifications:

Gene

For subject.id field, use entrez_id (NCBIGene) value by default
If entrez_id field is empty, use mygene.info service to query for the NCBIGene ID based on gene_name field, e.g. http://mygene.info/v3/query?q=symbol:CDK7&fields=entrezgene&species=human
If couldn't resolve using mygene, use SYMBOL:{gene_name} as subject.id

Chemical

For object.id field, use drug_chembl_id(CHEMBL.COMPOUND) value by default
If drug_chembl_id is empty, use mychem.info service to query for CHEMBL.COMPOUND ID based on drug_name, e.g. http://mychem.info/v1/query?q=chembl.pref_name:riluzole&fields=chembl.molecule_chembl_id
If couldn't resolve using mychem, use name:{drug_name} as object.id

edge_label

consult this yaml file to convert the value of interaction_types into biolink predicate, e.g. "DGIdb:inhibitor" should be converted to "decreases_activity_of"
keep the value of "interaction_types" as the value for association.relation_name field

pubmed

could be multiple, split by "," into a list

provided_by

use the value of interaction_claim_source
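
The specifications above could be sketched as a single row-to-document function. Several details are assumptions and not part of the issue: the hash (MD5 truncated to 16 hex characters), the `PMIDs` column name, and the optional `resolve_gene` / `resolve_chemical` callables, which stand in for the mygene.info and mychem.info lookups. `predicate_map` stands in for the yaml mapping file.

```python
import hashlib


def build_dgidb_doc(row, predicate_map, resolve_gene=None, resolve_chemical=None):
    """Convert one interactions.tsv row (dict keyed by column name) to a document."""
    # Assumption: hash the tab-joined values; the issue only says "hash all values".
    raw = "\t".join(str(v or "") for v in row.values())
    doc_id = hashlib.md5(raw.encode("utf-8")).hexdigest()[:16]

    # Gene: entrez_id first, then an external lookup, then SYMBOL:{gene_name}.
    gene_name = row["gene_name"]
    entrez = row.get("entrez_id") or (resolve_gene(gene_name) if resolve_gene else None)
    subject = {"SYMBOL": gene_name}
    if entrez:
        subject["NCBIGene"] = str(entrez)
        subject["id"] = f"NCBIGene:{entrez}"
    else:
        subject["id"] = f"SYMBOL:{gene_name}"

    # Chemical: drug_chembl_id first, then an external lookup, then name:{drug_name}.
    drug_name = row["drug_name"]
    chembl = row.get("drug_chembl_id") or (
        resolve_chemical(drug_name) if resolve_chemical else None
    )
    obj = {"name": drug_name}
    if chembl:
        obj["CHEMBL.COMPOUND"] = chembl
        obj["id"] = f"CHEMBL.COMPOUND:{chembl}"
    else:
        obj["id"] = f"name:{drug_name}"

    relation = row.get("interaction_types") or ""
    # Assumption: PubMed IDs live in a comma-separated "PMIDs" column.
    pmids = [p.strip() for p in (row.get("PMIDs") or "").split(",") if p.strip()]
    return {
        "_id": doc_id,
        "subject": subject,
        "object": obj,
        "association": {
            "edge_label": predicate_map.get(f"DGIdb:{relation}"),
            "relation_name": relation,
            "pubmed": pmids,
            "provided_by": row.get("interaction_claim_source"),
        },
    }
```

Passing the resolvers in as parameters keeps the network calls out of the parsing logic, which makes the function easy to test offline.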

New API for WikiData

Wikidata is another data source providing a heterogeneous association network. Data is accessible from its SPARQL endpoint; we need to build a wrapper to return JSON output.

Index page is not rendered correctly due to Vuex upgrade

The API list is not shown on the index page, and errors appear in the console.

The root cause is that the statement <script src="https://unpkg.com/vuex"></script> in the index page refers to the latest Vuex, which has recently been upgraded to v4, while our code is based on v3.

It's fixable by referring to a specific version, like <script src="https://unpkg.com/vuex@3.6.2/dist/vuex.js"></script> (the last v3.x before v4).

New API for MGI mouse gene2phenotype

MGI mouse gene2phenotype (http://www.informatics.jax.org/phenotypes.shtml)
MGI provides comprehensive annotations for mouse genes, including their phenotype and disease associations. By mapping mouse genes to their human orthologs, mouse phenotypes to human phenotypes, these associations provide insightful links between human genes and phenotypes. The multiple phenotype-related data files are available to download at
http://www.informatics.jax.org/downloads/reports/index.html#pheno.

use semmeddb version number in /metadata

Current output from https://biothings.ncats.io/semmeddb/metadata looks like this:

{
  "biothing_type": "association",
  "build_date": "2021-08-31T23:23:50.843857+00:00",
  "build_version": "20210831",
  "src": {
    "semmed_parser": {
      "licence": "CC BY 4.0",
      "stats": {
        "semmed_parser": 114383742
      },
      "version": "2.0",
      "license_url": "https://skr3.nlm.nih.gov/SemMedDB/",
      "url": "https://skr3.nlm.nih.gov/SemMedDB/"
    }
  },
  "stats": {
    "total": 114383742
  }
}

Under build_version, the value appears to be a date. Instead, we should use the official version number. For example, according to https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/SemMedDB_download.html, the latest version is currently semmedVER43_R.

CHEMBL IDs for drug response kp api

Currently the parser is handling these IDs incorrectly. These IDs are only for the objects.

Current data:

we want to keep records with IDs in this format: CHEMBL:CHEMBL261849
we don't want to keep records with IDs in this format: CHEMBL:3137320. These seem to be duplications of other rows (that have the first ID format above)

so we want to take records with IDs like CHEMBL:CHEMBL261849 and end up with records in the API that...

Have a field with the key CHEMBL_COMPOUND (note the _ here!) and the value as the ID (CHEMBL261849)
Have a field with the key id and the value with the whole curie (CHEMBL.COMPOUND:CHEMBL3137320)

Example:

"object": {
    "CHEMBL_COMPOUND": "CHEMBL3137320", 
    "id": "CHEMBL.COMPOUND:CHEMBL3137320",
    "name": "BMN-673",
    "type": "biolink:SmallMolecule"
}

But right now, the records look like this, which means there's a bug involving how the "CHEMBL" sub-string is handled for the id and CHEMBL_COMPOUND values...:

"object": {
    "CHEMBL_COMPOUND": "CHEMBL.COMPOUND3137320", 
    "id": "CHEMBL.COMPOUND:CHEMBL.COMPOUND3137320",
    "name": "BMN-673",
    "type": "biolink:SmallMolecule"
}
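
One way to get the desired output is to split the CURIE on the colon and rebuild it, which avoids touching the "CHEMBL" text inside the local ID. The sketch below is an assumption about how this could be done, not the parser's actual code, and `normalize_chembl_object` is a hypothetical name. For what it's worth, a global `str.replace("CHEMBL", "CHEMBL.COMPOUND")` on "CHEMBL:CHEMBL3137320" yields exactly the malformed id shown above, so that may be the cause.

```python
def normalize_chembl_object(curie, name=None):
    """Build the object dict for a CURIE like 'CHEMBL:CHEMBL261849'.

    Returns None for IDs without the 'CHEMBL' local prefix
    (e.g. 'CHEMBL:3137320'), which should be skipped.
    """
    prefix, sep, local_id = curie.partition(":")
    is_wanted = (
        sep == ":"
        and prefix == "CHEMBL"
        and local_id.startswith("CHEMBL")
        and local_id[len("CHEMBL"):].isdigit()
    )
    if not is_wanted:
        return None

    obj = {
        "CHEMBL_COMPOUND": local_id,  # note the underscore in the key
        "id": f"CHEMBL.COMPOUND:{local_id}",
        "type": "biolink:SmallMolecule",
    }
    if name is not None:
        obj["name"] = name
    return obj
```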

new API for Genotype-phenotype associations from EBI GeneAtlas

http://geneatlas.roslin.ed.ac.uk/

Dockerize pending API web deployment

Let's create a Docker deployment file for the production deployment:

Required Elasticsearch server is hosted externally and provided via environment variables for the web API to connect and query.
No need of Nginx, run just one tornado process within one container.
Plan to deploy to both dev/stage and prod environment. Likely need to have a docker-compose file to include all required configurations for different settings.

Create API for SuppKG (Dietary Supplements)

SuppKG contains a variety of edges for Dietary Supplements.

Publication: https://pubmed.ncbi.nlm.nih.gov/35709900/
Preprint: https://arxiv.org/abs/2106.12741
Download link: https://github.com/zhang-informatics/SemRep_DS/tree/main/SuppKG

There are 595222 entries under the links. Here is one example record:

        {
            "relations": [
                {
                    "pmid": 1394115,
                    "sentence": "Turmeric and curcumin were also found to reverse the aflatoxin induced liver damage produced by feeding aflatoxin B1 (AFB1) (5 micrograms/day per 14 days) to ducklings.",
                    "conf": 0.9303833842,
                    "tuid": 0
                },
                {
                    "pmid": 1394115,
                    "sentence": "Reversal of aflatoxin induced liver damage by turmeric and curcumin.",
                    "conf": 0.9396179318000001,
                    "tuid": 0
                }
            ],
            "source": "C0001734",
            "target": "C0151763",
            "key": "CAUSES"
        },

I believe we want to create a record like this (where the info for name can be found in the nodes section of the json).

{
    "_id": "C0001734_C0151763_CAUSES",
    "subject": {
        "umls": "C0001734",
        "name": "aflatoxin",
        "semtypes": [ "bacs", "hops"]
    },
    "relation": [
        {
            "pmid": 1394115,
            "sentence": "Turmeric and curcumin were also found to reverse the aflatoxin induced liver damage produced by feeding aflatoxin B1 (AFB1) (5 micrograms/day per 14 days) to ducklings.",
            "conf": 0.9303833842,
            "tuid": 0
        },
        {
            "pmid": 1394115,
            "sentence": "Reversal of aflatoxin induced liver damage by turmeric and curcumin.",
            "conf": 0.9396179318000001,
            "tuid": 0
        }
    ],
    "object": {
        "umls": "C0151763",
        "name": "damage liver",
        "semtypes": [ "patf" ]
    },
    "predicate": "CAUSES"
}
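
The transformation from a link entry to the proposed record could look roughly like this. The shape of `nodes_by_id` (a mapping from CUI to a dict with `name` and `semtypes`) is an assumption, since the layout of the nodes section is not shown here, and `build_suppkg_doc` is a hypothetical name.

```python
def build_suppkg_doc(link, nodes_by_id):
    """Convert one SuppKG link entry into the proposed record.

    link: dict with "source", "target", "key", and "relations".
    nodes_by_id: hypothetical mapping {cui: {"name": ..., "semtypes": [...]}}
        built from the nodes section of the JSON.
    """
    def node(cui):
        info = nodes_by_id.get(cui, {})
        return {
            "umls": cui,
            "name": info.get("name"),
            "semtypes": info.get("semtypes", []),
        }

    return {
        "_id": f"{link['source']}_{link['target']}_{link['key']}",
        "subject": node(link["source"]),
        "relation": link["relations"],
        "object": node(link["target"]),
        "predicate": link["key"],
    }
```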

Update Multiomics Wellness KP data plugin

Hi @chunlei Wu (Exploring Agent, Service Provider) - We have new tsv files (v1.5) for the multiomics wellness KP. I've updated the parser and manifest file appropriately and tested a local deployment of the API using Biothings Studio. There are ~288,000 documents, so it's not huge. I believe the update should go smoothly. When you have some time, could you deploy the updated KG? Here's the repo with the parser and manifest file: https://github.com/Hadlock-Lab/multiomics_wellness_kp. Please let me know if you have any questions. We appreciate your help.

Handler for BioThings API providing graph type data

Example Graph representation:

{
    "subject": {
        "id": "MONDO:000123",
        "type": "Disease"
    },
    "object": {
        "id": "NCBIGene:1017",
        "type": "Gene",
        "taxid": "9606"
    },
    "association": {
        "predicate": "negatively_regulates",
        "publications": ["PMID:123", "PMID:124"]
    }
}

The above output could be represented another way by switching the subject and object and reversing the predicate, e.g.

{
    "subject": {
        "id": "NCBIGene:1017",
        "type": "Gene",
        "taxid": "9606"
    },
    "object": {
        "id": "MONDO:000123",
        "type": "Disease"
    },
    "association": {
        "predicate": "negatively_regulated_by",
        "publications": ["PMID:123", "PMID:124"]
    }
}

So if the user provides the following query

biothings.ncats.io/api1/query? \
    subject.id:"MONDO:000123" AND \
    object.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulates"

It should be translated into two queries

same as user query

biothings.ncats.io/api1/query? \
    subject.id:"MONDO:000123" AND \
    object.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulates"

reverse it

biothings.ncats.io/api1/query? \
    object.id:"MONDO:000123" AND \
    subject.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulated_by"

The response from the second query should also be reversed and merged with the response from the first query.

In summary:

translate user query into two queries (one original, one reverse query)
For the reverse query,

all fields starting with object (e.g. object.id) should be replaced with subject
all fields starting with subject (e.g. subject.id) should be replaced with object
reverse the value association.predicate based on a mapping file

For the reverse query result

change root key object into subject
change root key subject into object
reverse the value of association.predicate based on a mapping file

merge the results from two queries.
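
The summary above can be sketched in a few functions. This is not the handler's implementation: `REVERSE_PREDICATE` is a hypothetical excerpt of the mapping file, queries are modelled as plain field-to-value dicts, and the function names are made up for illustration.

```python
# Hypothetical excerpt of the predicate mapping file.
REVERSE_PREDICATE = {
    "negatively_regulates": "negatively_regulated_by",
    "negatively_regulated_by": "negatively_regulates",
}


def reverse_query(terms):
    """Swap subject.* and object.* fields and reverse association.predicate.

    terms: dict of field -> value, e.g. {"subject.id": "MONDO:000123"}.
    A predicate missing from the mapping raises KeyError in this sketch.
    """
    swapped = {}
    for field, value in terms.items():
        root, dot, rest = field.partition(".")
        if root == "subject":
            field = "object" + dot + rest
        elif root == "object":
            field = "subject" + dot + rest
        elif field == "association.predicate":
            value = REVERSE_PREDICATE[value]
        swapped[field] = value
    return swapped


def reverse_hit(hit):
    """Return a copy of a result with subject/object swapped and predicate reversed."""
    out = dict(hit)
    out["subject"], out["object"] = hit["object"], hit["subject"]
    association = dict(hit["association"])
    association["predicate"] = REVERSE_PREDICATE[association["predicate"]]
    out["association"] = association
    return out


def merge_hits(forward_hits, reverse_hits):
    """Combine hits from the original query with re-oriented hits from the reverse query."""
    return list(forward_hits) + [reverse_hit(h) for h in reverse_hits]
```

Deduplicating hits that appear in both directions is not covered here and would need its own rule.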

We have an API set up providing graph type data for testing: https://biothings.ncats.io/biggim

new API for InnateDB

https://www.innatedb.com/

Innate immunity interactions

add gene-phenotype associations to HPO API

The Human Phenotype Ontology project curates annotations between genes and phenotypes. Full files are downloadable here: https://hpo.jax.org/app/download/annotation. Would be very useful to add these annotations to the pending HPO API. After that, would be good to add these info to the SmartAPI record (https://smart-api.info/registry?q=a5b0ec6bfde5008984d4b6cde402d61f), and specifically the mapping file (https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/hpo/smartapi.yaml).

creating new API for TISSUES dataset

Adding TISSUES for BTE

From the same lab that made DISEASES (which is one of our pending biothings apis)....

There's TISSUES, a similar resource but it has manually curated, experimental, and text-mined associations between genes and tissues in several species (including human).

It looks like it may update weekly. Its downloads are available here.

Note: I think this lab does have an API available, but it might not have the batch query / other abilities of biothings apis.

refactoring DISEASES

There may need to be discussion for both the DISEASES and TISSUES apis on whether to keep the "channels" (assertions from different kinds of knowledge like manually curated knowledge vs experimental vs text-mined) in separate fields (tissues.knowledge vs tissues.text-mined)....

Right now, DISEASES is putting all the "channels" info together under one field for access. So BTE doesn't have a simple way to assign different Biolink predicates to the assertions made with different kinds of knowledge...

Old TextMiningKP API can be removed

The textminingkp API can be deleted as it contains only a sample dataset from over a year ago.

link APIs listed at http://pending.biothings.io/ with corresponding code repo

If I am interested in creating an issue for a specific pending API (e.g., http://pending.biothings.io/hpo) it's not clear where I should create that issue. It's also not clear where the parser code lives in case I want to do a pull request. As one possible low-tech solution, we could have a simple table or list in the README that links to the relevant parser repo.