
pending.api's Introduction

This repository maintains a set of biomedical knowledgebase APIs built with the BioThings SDK. These APIs are either knowledgebase APIs built for the Translator Project or "pending" APIs to be integrated into the official BioThings APIs (e.g., MyGene.info, MyVariant.info, MyChem.info, MyDisease.info).

The Translator-associated knowledgebase APIs are hosted at https://biothings.ncats.io.

Additional APIs are hosted at https://pending.biothings.io.

Knowledgebase APIs for the Translator Project

Each knowledgebase API is created as a "data plugin" (see examples under the plugins folder). The BioThings SDK then processes the data plugin and turns it into a hosted "BioThings API". See the data plugin tutorial for more details.

How to add a new API

Our internal development team handles the process of adding a new data plugin and deploying it as a new API. Internal developers should follow this documentation.

How to update data for an existing API

External collaborators who have submitted their "data plugins" as new APIs can follow this workflow to request an update of their data:

https://github.com/biothings/biothings_explorer/blob/main/docs/README-maintaining-a-data-source.md

The documentation is maintained in the biothings_explorer repository, since each knowledgebase API will be integrated into the BioThings Explorer application and become a standard Translator KP (Knowledge Provider) API.

pending.api's People

Contributors

bettyli037, chevvak2, ctrl-schaff, ericz1803, erikyao, everaldorodrigo, kannabhargav, kevinxin90, marcodarko, neuralflux, newgene, pahmadi8740, sirloon, tokebe, zcqian, zubairqazi


pending.api's Issues

update SemMedDB APIs

splitting out SemMedDB from the broader issue of updating APIs on pending #25

In addition to updating to the latest data files, we will remove the Biolink model conversion logic from the parser, leaving that conversion to the SmartAPI mapping.

update pending dgidb API

BTE is using https://pending.biothings.io/dgidb quite a bit, and I believe it hasn't been updated since it was made (in 2018/2019?).

We are interested in updating BOTH the parser and the data for this API. This API uses Biolink model predicates in the association.edge_label field, so we may want to remove this field or map the DGIdb relations to the most recent version of the Biolink model.

New EBI G2P gene2phenotype API

EBI gene2phenotype (https://www.ebi.ac.uk/gene2phenotype)

EBI gene2phenotype (or G2P) is a knowledge source providing human-curated gene-to-disease associations with a focus on cancer and developmental disease areas. Each G2P entry associates an allelic requirement and a mutational consequence at a defined locus with a disease entity. A confidence level and an evidence link are assigned to each entry. The data file is available to download at:

https://www.ebi.ac.uk/gene2phenotype/downloads.

Columbia Open Health Data

source: https://u.pcloud.link/publink/show?code=XZ3ibtkZlJt9yHczhp5NjlGkui6E4uvTXL7y

The link contains a zip file; we only need to parse the "paired_concept_counts_associations" file.

type: EHR records

Manual dumper

First, generate a static JSON/pickle file storing xrefs for all OMOP IDs (the primary ID used in OMOP).
Use this endpoint: http://cohd.smart-api.info/#/OMOP/xrefFromOMOP
We may also consider this one: http://cohd.smart-api.info/#/OMOP/mapToStandardConceptID

Need to create two APIs based on this "paired_concept_counts_associations" file.

  • First API

    {
      "_id": "...",
      "xrefs": {
          "umls": ...,
          "mesh": ...
      },
      "results": [
        {
          "associated_concept_id": 2213216,
          "associated_concept_name": "Cytopathology, selective cellular enhancement technique with interpretation (eg, liquid based slide preparation method), except cervical or vaginal",
          "associated_domain_id": "Measurement",
          "concept_count": 330,
          "concept_frequency": 0.0001843131625848748,
          "concept_id": 192855,
          "dataset_id": 1
        },
        {
          "associated_concept_id": 4214956,
          "associated_concept_name": "History of clinical finding in subject",
          "associated_domain_id": "Observation",
          "concept_count": 329,
          "concept_frequency": 0.00018375463784976913,
          "concept_id": 192855,
          "dataset_id": 1
        }
       ]
    }
     
    
    
  • Second API

    {
      "_id": "concept1-concept2",
      "concept1": {
         "omop": "....",
         "xrefs": {
             "mesh": ...,
             "umls": ....
         }
      },
      "concept2": {
         "omop": "....",
         "xrefs": {
             "mesh": ...,
             "umls": ....
         }
      },
      "results": [
        {
          "concept_count": 10,
          "concept_frequency": 0.000005585247351056813,
          "concept_id_1": 192855,
          "concept_id_2": 2008271
        }
      ]
    }
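A minimal sketch of how the two document shapes above could be built from the file. It assumes each row of "paired_concept_counts_associations" has already been read into a dict with the keys `concept_id_1`, `concept_id_2`, `concept_count`, `concept_frequency` and `dataset_id` (column names assumed, not confirmed), and that the static xref file from the manual dumper has been loaded into a dict keyed by OMOP ID. Concept names and domains, which the first API also shows, would need a separate concept lookup and are left out here; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical shape of the static xref file: {"192855": {"umls": "...", "mesh": "..."}}
XrefMap = Dict[str, Dict[str, str]]


def build_concept_docs(rows: Iterable[dict], xrefs: XrefMap) -> Iterator[dict]:
    """First API: one document per OMOP concept, listing its associated concepts."""
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        # Index each pair under both concepts so either one can be looked up.
        for this_id, other_id in ((row["concept_id_1"], row["concept_id_2"]),
                                  (row["concept_id_2"], row["concept_id_1"])):
            grouped[this_id].append({
                "associated_concept_id": other_id,
                "concept_count": row["concept_count"],
                "concept_frequency": row["concept_frequency"],
                "concept_id": this_id,
                "dataset_id": row["dataset_id"],
            })
    for concept_id, results in grouped.items():
        yield {"_id": str(concept_id), "xrefs": xrefs.get(str(concept_id), {}), "results": results}


def build_pair_docs(rows: Iterable[dict], xrefs: XrefMap) -> Iterator[dict]:
    """Second API: one document per concept pair, with _id "concept1-concept2"."""
    grouped: Dict[Tuple[int, int], List[dict]] = defaultdict(list)
    for row in rows:
        grouped[(row["concept_id_1"], row["concept_id_2"])].append({
            "concept_count": row["concept_count"],
            "concept_frequency": row["concept_frequency"],
            "concept_id_1": row["concept_id_1"],
            "concept_id_2": row["concept_id_2"],
        })
    for (c1, c2), results in grouped.items():
        yield {
            "_id": f"{c1}-{c2}",
            "concept1": {"omop": str(c1), "xrefs": xrefs.get(str(c1), {})},
            "concept2": {"omop": str(c2), "xrefs": xrefs.get(str(c2), {})},
            "results": results,
        }
```

Both functions iterate over `rows` once each, so pass a list (not a one-shot generator) if both APIs are built from the same parsed file.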
    

New API for BioStacks

BioStacks is a new source for a text-mined heterogeneous association network from Dr. Larry Hunter's group. We need to work with his team once it is ready.

fix slug for mrcoc API to "cooccurrence"

Currently, http://pending.biothings.io/mrcoc generates example URLs like this:

http://pending.biothings.io/mrcoc/coocurence/D001055-D003397

These return a 404 error because the correct URL is the following (note the spelling change from coocurence to cooccurence):

http://pending.biothings.io/mrcoc/cooccurence/D001055-D003397

But as long as we are fixing things, we should ideally use the correct spelling, cooccurrence:

http://pending.biothings.io/mrcoc/cooccurrence/D001055-D003397

Switch the rendering for hostname "biothings.ci.transltr.io"

Currently we have two site renderings, "pending" and "ncats". They differ in aesthetics but share the same backend.

Previously, the hostname-to-rendering mapping was:

hostname                  site rendering
biothings.ncats.io        "ncats"
pending.biothings.io      "pending"

Now we have a new hostname, biothings.ci.transltr.io, which should (temporarily) use "ncats":

hostname                  site rendering
biothings.ncats.io        "ncats"
biothings.ci.transltr.io  "ncats"
pending.biothings.io      "pending"

Related module: web/handlers/__init__.py
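The new mapping is a plain hostname lookup. A minimal sketch, assuming the handler can read the request's Host header and that an unknown host should fall back to "pending" (the fallback is an assumption, not stated above). `SITE_RENDERING` and `rendering_for_host` are hypothetical names, not the actual contents of web/handlers/__init__.py.

```python
from typing import Dict

# Hostname -> site rendering, per the updated table.
SITE_RENDERING: Dict[str, str] = {
    "biothings.ncats.io": "ncats",
    "biothings.ci.transltr.io": "ncats",  # temporary, per the issue
    "pending.biothings.io": "pending",
}

DEFAULT_RENDERING = "pending"  # assumed fallback for unknown hosts (e.g. localhost)


def rendering_for_host(host: str) -> str:
    """Return the rendering for a Host header value, ignoring any port and case."""
    hostname = host.split(":", 1)[0].lower()
    return SITE_RENDERING.get(hostname, DEFAULT_RENDERING)
```

In a Tornado handler this could be called as `rendering_for_host(self.request.host)`; keeping the table in one dict means switching biothings.ci.transltr.io to its own rendering later is a one-line change.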

new API for Text Mining Provider

  1. Dumper
  • Create a dumper that looks for the newest release in this folder (currently there is only one release)
  2. Parser
  • primary source file
  • use column "subject" as the primary ID
  • use column "edge_label" as the root key
  • rows with the same "subject" field should appear in the same record
  • rows with the same "subject" and "edge_label" should be in the same list as the value for that "edge_label"
  • reference the secondary source file based on the evidence ID
  3. Example output
{
    "_id": "PR:000010159",
    "expressed_in": [
        {
            "uberon": "UBERON:0002355",
            "relation": "RO:0002206",
            "association_type": "GeneToExpressionSiteAssociation",
            "evidence": {
                "sentence": "...",
                "pmc": "PMC324396"
            }
        },
        {
            "uberon": "UBERON:0003066",
            "relation": "RO:0002206",
            "association_type": "GeneToExpressionSiteAssociation",
            "evidence": {
                "sentence": "...",
                "pmc": "PMC324396"
            }
        }
    ]
}
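The parser's grouping rules can be sketched as follows. This assumes rows from the primary file are dicts with `subject`, `edge_label`, `object`, `relation`, `association_type` and `evidence_id` columns (only `subject` and `edge_label` are named in the spec; the others are assumptions), and that the secondary file has been loaded into a dict from evidence ID to the evidence object. Deriving the per-entry key (`uberon` in the example) from the object's CURIE prefix is also an assumption; `build_docs` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def build_docs(rows: Iterable[dict], evidence: Dict[str, dict]) -> Iterator[dict]:
    # subject -> edge_label -> list of association entries
    grouped: Dict[str, Dict[str, List[dict]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        obj = row["object"]                      # e.g. "UBERON:0002355" (assumed column)
        prefix = obj.split(":", 1)[0].lower()   # -> "uberon"
        entry = {
            prefix: obj,
            "relation": row["relation"],
            "association_type": row["association_type"],
        }
        # Attach the evidence record from the secondary file, if one exists.
        ev = evidence.get(row.get("evidence_id"))
        if ev is not None:
            entry["evidence"] = ev
        grouped[row["subject"]][row["edge_label"]].append(entry)
    for subject, by_label in grouped.items():
        doc = {"_id": subject}
        doc.update(by_label)  # each edge_label becomes a root key holding a list
        yield doc
```

Grouping in memory is fine for a modest file; a very large file might instead be sorted by subject first so records can be emitted one at a time.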

update ontology apis?

Alternatively, could we annotate the Ontology Lookup Service (website, SmartAPI entry) to retrieve this kind of information?

The following pending BioThings APIs are used by BTE (but aren't called upon often):

  • Gene Ontology Biological Process API
  • Gene Ontology Cellular Component API
  • Gene Ontology Molecular Activity API
  • Human Phenotype Ontology API
  • UBERON Ontology API

New data source for ICD and CPT codes from CMS

It would be useful to get relationships between medical procedures (expressed as CPT codes) and diseases (expressed as ICD-10 codes) from a more authoritative source than SemMedDB. I found that many of these links are available from the Centers for Medicare & Medicaid Services (CMS). For example, the CMS article on Cataract Extraction (https://www.cms.gov/medicare-coverage-database/view/article.aspx?articleid=56453) has sections for both ICD-10 codes and CPT codes.

To get this information in a structured format, CMS provides database dumps on its downloads page. Specifically, the file for Current Articles appears to be a zip file of CSV exports of some database tables. So if we search article_x_icd10_covered.csv for articles related to H25.89 ("Other age-related cataract"), we get several articles, including 56453 (the article linked above):

$ gawks '$3=="\"H25.89\""' article_x_icd10_covered.csv
"56453","8","H25.89","11","1","N","2021-12-29 15:58:03","8401","Other age-related cataract","N"
"56453","8","H25.89","11","3","N","2021-12-29 15:58:04","8401","Other age-related cataract","N"
"56544","19","H25.89","7","1","N","2019-12-19 13:07:44","8233","Other age-related cataract","N"
"56544","19","H25.89","7","2","N","2019-12-19 13:07:44","8233","Other age-related cataract","Y"
"56549","12","H25.89","7","1","N","2019-09-11 17:42:02","8233","Other age-related cataract","N"
"56613","13","H25.89","11","1","N","2021-12-29 16:38:58","8401","Other age-related cataract","N"
"56615","31","H25.89","11","1","N","2021-10-29 12:40:48","8401","Other age-related cataract","N"
"56615","31","H25.89","11","2","N","2021-10-29 12:40:48","8401","Other age-related cataract","N"
"57068","8","H25.89","10","1","N","2021-04-19 17:55:50","8375","Other age-related cataract","N"
"57070","8","H25.89","10","1","N","2021-05-17 17:57:02","8375","Other age-related cataract","N"
"57195","10","H25.89","11","1","N","2021-12-15 14:54:37","8401","Other age-related cataract","N"
"57196","8","H25.89","11","1","N","2021-12-15 14:55:24","8401","Other age-related cataract","N"
"57483","10","H25.89","11","1","N","2021-09-20 19:54:07","8401","Other age-related cataract","N"
"57637","6","H25.89","9","1","N","2020-09-25 18:40:59","8375","Other age-related cataract","N"
"58590","9","H25.89","9","1","N","2021-01-08 19:53:39","8375","Other age-related cataract","N"
"58590","9","H25.89","9","2","N","2021-01-08 19:53:39","8375","Other age-related cataract","N"
"58591","5","H25.89","9","1","N","2021-01-08 19:54:27","8375","Other age-related cataract","N"
"58591","5","H25.89","9","2","N","2021-01-08 19:54:27","8375","Other age-related cataract","N"
"58592","14","H25.89","11","1","N","2021-10-29 12:41:22","8401","Other age-related cataract","N"
"58592","14","H25.89","11","2","N","2021-10-29 12:41:23","8401","Other age-related cataract","N"

If, in turn, we look for CPT codes associated with article 56453 in article_x_hcpc_code.csv, we get relevant procedures (including the positive controls listed):

$ gawks '$1=="\"56453\""' article_x_hcpc_code.csv
"56453","8","66840","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; ASPIRATION TECHNIQUE, 1 OR MORE STAGES","Removal of lens material"
"56453","8","66850","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; PHACOFRAGMENTATION TECHNIQUE (MECHANICAL OR ULTRASONIC) (EG, PHACOEMULSIFICATION), WITH ASPIRATION","Removal of lens material"
"56453","8","66852","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; PARS PLANA APPROACH, WITH OR WITHOUT VITRECTOMY","Removal of lens material"
"56453","8","66920","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; INTRACAPSULAR","Extraction of lens"
"56453","8","66930","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; INTRACAPSULAR, FOR DISLOCATED LENS","Extraction of lens"
"56453","8","66940","83","1","N","2021-12-29 15:58:03","REMOVAL OF LENS MATERIAL; EXTRACAPSULAR (OTHER THAN 66840, 66850, 66852)","Extraction of lens"
"56453","8","66982","83","1","N","2021-12-29 15:58:03","EXTRACAPSULAR CATARACT REMOVAL WITH INSERTION OF INTRAOCULAR LENS PROSTHESIS (1-STAGE PROCEDURE), MANUAL OR MECHANICAL TECHNIQUE (EG, IRRIGATION AND ASPIRATION OR PHACOEMULSIFICATION), COMPLEX, REQUIRING DEVICES OR TECHNIQUES NOT GENERALLY USED IN ROUTINE CATARACT SURGERY (EG, IRIS EXPANSION DEVICE, SUTURE SUPPORT FOR INTRAOCULAR LENS, OR PRIMARY POSTERIOR CAPSULORRHEXIS) OR PERFORMED ON PATIENTS IN THE AMBLYOGENIC DEVELOPMENTAL STAGE; WITHOUT ENDOSCOPIC CYCLOPHOTOCOAGULATION","Xcapsl ctrc rmvl cplx wo ecp"
"56453","8","66983","83","1","N","2021-12-29 15:58:03","INTRACAPSULAR CATARACT EXTRACTION WITH INSERTION OF INTRAOCULAR LENS PROSTHESIS (1 STAGE PROCEDURE)","Cataract surg w/iol 1 stage"
"56453","8","66984","83","1","N","2021-12-29 15:58:03","EXTRACAPSULAR CATARACT REMOVAL WITH INSERTION OF INTRAOCULAR LENS PROSTHESIS (1 STAGE PROCEDURE), MANUAL OR MECHANICAL TECHNIQUE (EG, IRRIGATION AND ASPIRATION OR PHACOEMULSIFICATION); WITHOUT ENDOSCOPIC CYCLOPHOTOCOAGULATION","Xcapsl ctrc rmvl w/o ecp"

Note that this information appears to be limited to articles with article_type equal to 6 (Billing and Coding), and there appear to be ~1500 articles of this type:

$ gawks '$3==6{print $1}' article_x_contractor.csv | sort -u | wc -l
1568

For Translator, it would be useful to be able to search by ICD-10 code and retrieve a list of CPT codes with the article_id as the source (and vice versa).
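The two gawk lookups amount to a join on article_id. A minimal sketch of building both directions at once, assuming the column positions seen in the gawk output (article_id in field 1, the code in field 3 of both article_x_icd10_covered.csv and article_x_hcpc_code.csv). Whether the CSVs have a header row, and their text encoding, are not shown above, so `has_header` and `encoding="utf-8"` are guesses to check; the function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Index = Dict[str, Set[Tuple[str, str]]]


def build_icd10_cpt_indexes(icd10_rows: Iterable[List[str]],
                            cpt_rows: Iterable[List[str]]) -> Tuple[Index, Index]:
    """Join ICD-10 and CPT codes on article_id.

    Returns (icd10 -> {(cpt, article_id)}, cpt -> {(icd10, article_id)}).
    """
    cpt_by_article: Dict[str, Set[str]] = defaultdict(set)
    for row in cpt_rows:
        cpt_by_article[row[0]].add(row[2])

    icd10_to_cpt: Index = defaultdict(set)
    cpt_to_icd10: Index = defaultdict(set)
    for row in icd10_rows:
        article_id, icd10 = row[0], row[2]
        for cpt in cpt_by_article.get(article_id, ()):
            icd10_to_cpt[icd10].add((cpt, article_id))
            cpt_to_icd10[cpt].add((icd10, article_id))
    return icd10_to_cpt, cpt_to_icd10


def read_rows(path: str, has_header: bool = True) -> List[List[str]]:
    # csv.reader removes the surrounding double quotes seen in the gawk output.
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        if has_header:
            next(reader, None)
        return list(reader)
```

Usage: `icd_idx, cpt_idx = build_icd10_cpt_indexes(read_rows("article_x_icd10_covered.csv"), read_rows("article_x_hcpc_code.csv"))`, then `icd_idx["H25.89"]` gives (CPT code, article_id) pairs. Restricting to Billing and Coding articles would mean filtering both inputs by the article IDs with article_type 6 from article_x_contractor.csv.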

Text here is mostly copied from my notes on NCATSTranslator/testing#171...

Update `text_mining_cooccurrence_kp` source

From Edgar Gatica:

I have finished making my changes to the cooccurrence text mining KP, which can be found here: https://github.com/UCDenver-ccp/text_mining_cooccurrence_kp
Unfortunately there isn’t currently any data in the prod DB for cooccurrence, so the files that dumper would copy and parser would read don’t yet exist. Hopefully that will be coming soon.
Relatedly, Bill asked me if I could get it to update more frequently, as he expects the databases to update daily once everything is running correctly. I modified the “dumper.schedule” attribute in the manifest to attempt to schedule it for the beginning of every day, rather than once weekly. I assumed it was syntax like cron, but if it isn’t (or if I need to do something else to change the schedule) please let me know.

BioMuta parser error

With release from 2018-10-25, parser gives this error:

Not Adequate key:value or value format: Natural_Variant_Annotation:In 3MC1) (PubMed:26419238.

Update / fix MGIGene2Phenotype API

Tasks:

customize metadata tags for each API

Minor issue, but right now the meta tags for each pending API page are the same as those of the home page. For example, the title is "Translator KP APIs" for all APIs on biothings.ncats.io. For https://biothings.ncats.io/dgidb, perhaps change the title to "DGIdb API | Translator" or something similar? The description could be "API for DGIdb hosted by the BioThings Project, made for National Center for Advancing Translational Sciences (NCATS) Biomedical Data Translator Program". This would presumably help SEO (as well as my browser tab management)...
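A minimal sketch of generating the per-API title and description from the API's display name, using the wording suggested in the issue. `page_meta` and `render_head` are hypothetical helpers; how the site templates would consume them is not specified here.

```python
from html import escape


def page_meta(api_display_name: str, site: str = "Translator") -> dict:
    """Build per-API <title> and meta description text."""
    title = f"{api_display_name} API | {site}"
    description = (
        f"API for {api_display_name} hosted by the BioThings Project, made for "
        "National Center for Advancing Translational Sciences (NCATS) "
        "Biomedical Data Translator Program"
    )
    return {"title": title, "description": description}


def render_head(meta: dict) -> str:
    # Escape values before interpolating them into HTML text and attributes.
    return (
        f"<title>{escape(meta['title'])}</title>\n"
        f'<meta name="description" content="{escape(meta["description"])}">'
    )
```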

Old cord_* APIs can be removed

The 10 cord_* APIs can be removed, as they contain outdated data at this point. Users should use the text_mining_co_occurrence_kp API instead.

cord_protein
cord_anatomy
cord_bp
cord_cc
cord_cell
cord_chemical
cord_disease
cord_gene
cord_genomic_entity
cord_ma

new API for CTD

Heterogeneous association network from CTD

May need to split into multiple APIs based on the entity-types.

update EBI gene2phenotype api?

BTE uses the pending BioThings API EBI gene2phenotype (infrequently).

This issue is to track whether we decide to update this api or use an external api that has this data (EBI likely has an API to retrieve this data...)

update pending apis used by BTE

It would be useful to update the pending APIs that are being used by BTE, since some may be several years old and there are newer versions of the data.

This may involve updating the parsers.

I notice that DISEASES is a resource that seems to update weekly, so an automated weekly update process may be worthwhile.

The APIs that are currently being used (using the names from here):

  • hpo
  • uberon
  • semmed
  • semmed_anatomy
  • semmedbp
  • semmedchemical
  • semmedgene
  • semmedphenotype
  • DISEASES
  • ebigene2phenotype
  • mgigene2phenotype
  • go_bp
  • go_mf
  • go_cc
  • dgidb

create new API for gene-disease associations from AGR

The Alliance for Genome Resources (AGR) maintains a list of gene-disease associations for seven organisms (including human, mouse, rat, fly, worm, zebrafish, and yeast) at https://www.alliancegenome.org/downloads. Files can be downloaded in either JSON or TSV. I don't think these data have been imported into any other biothings API. They are candidates for inclusion in mygene.info and mydisease.info, but as a starting point we should create a pending API. This would be incredibly useful for Translator...

Create endpoint for GNBR

I'd like to incorporate the GNBR graph (https://zenodo.org/record/1495808#.XYJeuChKibg) into a pending API. I think it would be similar in scope and process to SemMedDB.

Prioritization would be heavily influenced by @kevinxin90 based on how much he thinks it would add to BioThings explorer. Might be worth some initial exploration of the data before deciding. Not all pairs of entity types are available (eg no drug-drug relationships).

Ticket based on a discussion with @jakelever at the ncats Seattle hackathon. (if I remember Jake's estimates correctly, millions of edges for ~100k nodes.)

New API for Integrated Dietary Supplement Knowledge Base (iDISK)

Given the emphasis on rare diseases in Translator, and given that rare diseases often have a metabolic origin, and given that the bar for trying an off-label treatment for a pharmaceutical compound in a human patient is relatively high, it would be useful to have an API resource that links dietary supplements to other biomedical entities. The Integrated Dietary Supplement Knowledge Base (iDISK) appears to be an excellent resource, and it would be very useful for Translator to create a new API for this.

iDISK encompasses a terminology of 4,208 DS ingredient concepts, which are linked via 6 relationship types to 495 drugs, 776 diseases, 985 symptoms, 605 therapeutic classes, 17 system organ classes, and 137,568 DS products. iDISK also contains 7 concept attribute types and 3 relationship attribute types. Evaluation of the data extraction and integration process showed average errors of 0.3%, 2.6%, and 0.4% for concepts, relationships, and attributes, respectively.

data: https://conservancy.umn.edu/handle/11299/204783 (in a UMLS-like format, or in a neo4j dump)
publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7075538/

New API for DISEASES

DISEASES (http://diseases.jensenlab.org/)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. It also provides confidence scores across supporting evidence which can be used to further compare or filter the different types and sources of evidence. Two data files (associations with full evidence and filtered list of unique associations) are available at https://diseases.jensenlab.org/Downloads.

DGIdb API

Source file: http://dgidb.org/data/interactions.tsv

Represent each line of the TSV file as a JSON document; hash all values in the line and use the hash as the document's _id.

Output (use the first row as an example):

{
  "_id": "b59670809cf23553",
  "subject": {
    "NCBIGene": "1022",
    "SYMBOL": "CDK7",
    "id": "NCBIGene:1022"
  },
  "object": {
    "name": "BMS-387032",
    "CHEMBL.COMPOUND": "CHEMBL296468",
    "id": "CHEMBL.COMPOUND:CHEMBL296468"
  },
  "association": {
    "edge_label": "decreases_activity_of",
    "relation_name": "inhibitor",
    "pubmed": [],
    "provided_by": "CancerCommons"
  }
}

Specifications:

  1. Gene
  2. Chemical
  3. edge_label
  • consult this yaml file to convert the value of interaction_types into a biolink predicate, e.g. "DGIdb:inhibitor" should be converted to "decreases_activity_of"
  • keep the value of "interaction_types" as the value for the association.relation_name field
  4. pubmed
  • could be multiple; split by "," into a list
  5. provided_by
  • use the value of interaction_claim_source
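A minimal sketch of the per-line conversion. The TSV column names (`gene_name`, `entrez_id`, `interaction_claim_source`, `interaction_types`, `drug_claim_primary_name`, `drug_chembl_id`, `PMIDs`) are assumptions about interactions.tsv and should be checked against its header. The hash that produced "b59670809cf23553" is unknown, so `blake2b` with an 8-byte digest (16 hex characters) is a stand-in, and `PREDICATES` holds only the one mapping quoted above in place of the YAML file.

```python
import hashlib
from typing import Dict, List, Optional

# Stand-in for the YAML mapping; only the example given in the spec.
PREDICATES: Dict[str, str] = {"DGIdb:inhibitor": "decreases_activity_of"}


def line_id(values: List[str]) -> str:
    """Hash all values in a TSV line into a 16-hex-character _id (hash choice assumed)."""
    joined = "\t".join(values).encode("utf-8")
    return hashlib.blake2b(joined, digest_size=8).hexdigest()


def curie(prefix: str, local: str) -> Optional[str]:
    return f"{prefix}:{local}" if local else None


def parse_line(header: List[str], line: str) -> dict:
    values = line.rstrip("\n").split("\t")
    row = dict(zip(header, values))
    interaction = row.get("interaction_types", "")
    entrez = row.get("entrez_id", "")
    chembl = row.get("drug_chembl_id", "")
    return {
        "_id": line_id(values),
        "subject": {"NCBIGene": entrez, "SYMBOL": row.get("gene_name", ""),
                    "id": curie("NCBIGene", entrez)},
        "object": {"name": row.get("drug_claim_primary_name", ""),
                   "CHEMBL.COMPOUND": chembl, "id": curie("CHEMBL.COMPOUND", chembl)},
        "association": {
            # None when the mapping has no entry for this interaction type.
            "edge_label": PREDICATES.get(f"DGIdb:{interaction}"),
            "relation_name": interaction,
            # PMIDs may hold several IDs separated by ","; drop empty pieces.
            "pubmed": [p.strip() for p in row.get("PMIDs", "").split(",") if p.strip()],
            "provided_by": row.get("interaction_claim_source", ""),
        },
    }
```

Because the _id is a hash of the whole line, identical lines collapse into one document and any change to a line yields a new _id on the next build.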

New API for WikiData

Wikidata is another data source providing a heterogeneous association network. Data is accessible from its SPARQL endpoint; we need to build a wrapper that returns JSON output.
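A minimal sketch of such a wrapper, assuming the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql) and the standard SPARQL 1.1 JSON results format (`head.vars` plus `results.bindings`). The function names are hypothetical, and which queries the API would run is not decided in this issue.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def sparql_url(query: str) -> str:
    return f"{WIKIDATA_SPARQL}?{urlencode({'query': query, 'format': 'json'})}"


def flatten_bindings(result: dict) -> List[Dict[str, str]]:
    """Turn SPARQL JSON results into plain {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # Unbound (OPTIONAL) variables are simply absent from a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def run_query(query: str, user_agent: str) -> List[Dict[str, str]]:
    req = Request(sparql_url(query), headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,  # Wikimedia asks clients to identify themselves
    })
    with urlopen(req, timeout=60) as resp:
        return flatten_bindings(json.load(resp))
```

Flattening drops the type information (`uri` vs `literal`); a real wrapper might keep it, or convert Wikidata entity URIs into CURIEs before returning them.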

Index page is not rendered correctly due to Vuex upgrade

The API list is not shown on the index page, and errors appear in the browser console.

The root cause is that the statement <script src="https://unpkg.com/vuex"></script> in the index page refers to the latest Vuex, which was recently upgraded to v4, while our code is based on v3.

It's fixable by referring to a specific version, like <script src="https://unpkg.com/vuex@3.6.2/dist/vuex.js"></script> (the last v3.x before v4).

New API for MGI mouse gene2phenotype

MGI mouse gene2phenotype (http://www.informatics.jax.org/phenotypes.shtml)
MGI provides comprehensive annotations for mouse genes, including their phenotype and disease associations. By mapping mouse genes to their human orthologs and mouse phenotypes to human phenotypes, these associations provide insightful links between human genes and phenotypes. The phenotype-related data files are available to download at
http://www.informatics.jax.org/downloads/reports/index.html#pheno.

use semmeddb version number in /metadata

Current output from https://biothings.ncats.io/semmeddb/metadata looks like this:

{
  "biothing_type": "association",
  "build_date": "2021-08-31T23:23:50.843857+00:00",
  "build_version": "20210831",
  "src": {
    "semmed_parser": {
      "licence": "CC BY 4.0",
      "stats": {
        "semmed_parser": 114383742
      },
      "version": "2.0",
      "license_url": "https://skr3.nlm.nih.gov/SemMedDB/",
      "url": "https://skr3.nlm.nih.gov/SemMedDB/"
    }
  },
  "stats": {
    "total": 114383742
  }
}

The build_version field appears to hold a date. Instead, we should use the official version number. For example, according to https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/SemMedDB_download.html, the latest version is currently semmedVER43_R.

CHEMBL IDs for drug response kp api

Currently the parser handles these IDs incorrectly. These IDs appear only in the objects.

Current data:

  • we want to keep records with IDs in this format: CHEMBL:CHEMBL261849
  • we don't want to keep records with IDs in this format: CHEMBL:3137320. These seem to be duplications of other rows (that have the first ID format above)

so we want to take records with IDs like CHEMBL:CHEMBL261849 and end up with records in the API that...

  • Have a field with the key CHEMBL_COMPOUND (note the _ here!) and the value as the ID (CHEMBL261849)
  • Have a field with the key id and the whole CURIE as the value (CHEMBL.COMPOUND:CHEMBL261849)

Example:

"object": {
    "CHEMBL_COMPOUND": "CHEMBL3137320", 
    "id": "CHEMBL.COMPOUND:CHEMBL3137320",
    "name": "BMN-673",
    "type": "biolink:SmallMolecule"
}

But right now, the records look like this, which means there's a bug involving how the "CHEMBL" sub-string is handled for the id and CHEMBL_COMPOUND values...:

"object": {
    "CHEMBL_COMPOUND": "CHEMBL.COMPOUND3137320", 
    "id": "CHEMBL.COMPOUND:CHEMBL.COMPOUND3137320",
    "name": "BMN-673",
    "type": "biolink:SmallMolecule"
}
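A minimal sketch of the intended transformation. `normalize_chembl_object` is a hypothetical helper, not the current parser's function. The buggy output is consistent with a global string replace: `"CHEMBL:CHEMBL3137320".replace("CHEMBL", "CHEMBL.COMPOUND")` yields `"CHEMBL.COMPOUND:CHEMBL.COMPOUND3137320"`, exactly the id shown above, but that cause is a guess until the parser code is checked.

```python
from typing import Optional


def normalize_chembl_object(raw_id: str, name: str) -> Optional[dict]:
    """Map "CHEMBL:CHEMBL261849"-style IDs to the desired object fields.

    Returns None for numeric-only IDs such as "CHEMBL:3137320" (the duplicated
    rows) and for anything that is not a CHEMBL-prefixed ID, so the caller can
    drop those records.
    """
    prefix, sep, local = raw_id.partition(":")
    if prefix != "CHEMBL" or not sep or not local.startswith("CHEMBL"):
        return None
    # Rebuild the CURIE from the local part instead of string-replacing "CHEMBL",
    # which would also rewrite the "CHEMBL" inside the local ID.
    return {
        "CHEMBL_COMPOUND": local,
        "id": f"CHEMBL.COMPOUND:{local}",
        "name": name,
        "type": "biolink:SmallMolecule",
    }
```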

Dockerize pending API web deployment

Let's create a Docker deployment file for the production deployment:

  • Required Elasticsearch server is hosted externally and provided via environment variables for the web API to connect and query.
  • No need for Nginx; run just one Tornado process per container.
  • Plan to deploy to both dev/stage and prod environment. Likely need to have a docker-compose file to include all required configurations for different settings.
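Since the Elasticsearch server is external, the web process only needs to read its connection details from the environment, which lets one image serve dev/stage and prod with different docker-compose settings. A minimal sketch; the variable names `ES_HOST`, `ES_INDEX` and `WEB_PORT` and the default port are hypothetical, not settings this repository is known to use.

```python
import os
from typing import Mapping


def es_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the external Elasticsearch connection from environment variables.

    Variable names are hypothetical; fail fast if the ES host is missing so a
    misconfigured container does not start silently.
    """
    host = env.get("ES_HOST")
    if not host:
        raise RuntimeError("ES_HOST must point at the external Elasticsearch server")
    return {
        "ES_HOST": host,
        "ES_INDEX": env.get("ES_INDEX", ""),
        "PORT": int(env.get("WEB_PORT", "8000")),  # single Tornado process port
    }
```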

Create API for SuppKG (Dietary Supplements)

SuppKG contains a variety of edges for Dietary Supplements.

Publication: https://pubmed.ncbi.nlm.nih.gov/35709900/
Preprint: https://arxiv.org/abs/2106.12741
Download link: https://github.com/zhang-informatics/SemRep_DS/tree/main/SuppKG

There are 595,222 entries under "links". Here is one example record:

        {
            "relations": [
                {
                    "pmid": 1394115,
                    "sentence": "Turmeric and curcumin were also found to reverse the aflatoxin induced liver damage produced by feeding aflatoxin B1 (AFB1) (5 micrograms/day per 14 days) to ducklings.",
                    "conf": 0.9303833842,
                    "tuid": 0
                },
                {
                    "pmid": 1394115,
                    "sentence": "Reversal of aflatoxin induced liver damage by turmeric and curcumin.",
                    "conf": 0.9396179318000001,
                    "tuid": 0
                }
            ],
            "source": "C0001734",
            "target": "C0151763",
            "key": "CAUSES"
        },

I believe we want to create a record like the following (the info for name can be found in the "nodes" section of the JSON):

{
    "_id": "C0001734_C0151763_CAUSES",
    "subject": {
        "umls": "C0001734",
        "name": "aflatoxin",
        "semtypes": [ "bacs", "hops"]
    },
    "relation": [
        {
            "pmid": 1394115,
            "sentence": "Turmeric and curcumin were also found to reverse the aflatoxin induced liver damage produced by feeding aflatoxin B1 (AFB1) (5 micrograms/day per 14 days) to ducklings.",
            "conf": 0.9303833842,
            "tuid": 0
        },
        {
            "pmid": 1394115,
            "sentence": "Reversal of aflatoxin induced liver damage by turmeric and curcumin.",
            "conf": 0.9396179318000001,
            "tuid": 0
        }
    ],
    "object": {
        "umls": "C0151763",
        "name": "damage liver",
        "semtypes": [ "patf" ]
    },
    "predicate": "CAUSES"
}
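A minimal sketch of turning each entry under "links" into the record above. It assumes each node in the "nodes" section has `id`, `name` and `semtypes` fields (field names assumed, since no node example is shown); `index_nodes` and `build_records` are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, List


def index_nodes(node_list: List[dict]) -> Dict[str, dict]:
    """Index the "nodes" section by UMLS CUI (assumes each node has an "id")."""
    return {node["id"]: node for node in node_list}


def node_info(nodes: Dict[str, dict], umls: str) -> dict:
    node = nodes.get(umls, {})
    return {"umls": umls, "name": node.get("name"), "semtypes": node.get("semtypes", [])}


def build_records(links: Iterable[dict], nodes: Dict[str, dict]) -> Iterator[dict]:
    for link in links:
        yield {
            "_id": f"{link['source']}_{link['target']}_{link['key']}",
            "subject": node_info(nodes, link["source"]),
            "relation": link["relations"],  # evidence sentences, kept as-is
            "object": node_info(nodes, link["target"]),
            "predicate": link["key"],
        }
```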

Update Multiomics Wellness KP data plugin

Hi @chunlei Wu (Exploring Agent, Service Provider) - We have new tsv files (v1.5) for the multiomics wellness KP. I've updated the parser and manifest file appropriately and tested a local deployment of the API using Biothings Studio. There are ~288,000 documents, so it's not huge. I believe the update should go smoothly. When you have some time, could you deploy the updated KG? Here's the repo with the parser and manifest file: https://github.com/Hadlock-Lab/multiomics_wellness_kp. Please let me know if you have any questions. We appreciate your help.

Handler for BioThings API providing graph type data

Example Graph representation:

{
    "subject": {
        "id": "MONDO:000123",
        "type": "Disease"
    },
    "object": {
        "id": "NCBIGene:1017",
        "type": "Gene",
        "taxid": "9606"
    },
    "association": {
        "predicate": "negatively_regulates",
        "publications": ["PMID:123", "PMID:124"]
    }
}

The above output could also be represented by switching the subject and object and reversing the predicate, e.g.:

{
    "subject": {
        "id": "NCBIGene:1017",
        "type": "Gene",
        "taxid": "9606"
    },
    "object": {
        "id": "MONDO:000123",
        "type": "Disease"
    },
    "association": {
        "predicate": "negatively_regulated_by",
        "publications": ["PMID:123", "PMID:124"]
    }
}

So if the user provides the following query

biothings.ncats.io/api1/query? \
    subject.id:"MONDO:000123" AND \
    object.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulates"

It should be translated into two queries:

  1. the same as the user query
biothings.ncats.io/api1/query? \
    subject.id:"MONDO:000123" AND \
    object.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulates"
  2. the reversed query
biothings.ncats.io/api1/query? \
    object.id:"MONDO:000123" AND \
    subject.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulated_by"

The response from the second query should also be reversed and merged with the results of the first query.

In summary:

  1. Translate the user query into two queries (the original and a reversed query).
  2. For the reverse query:
  • in every field starting with object (e.g. object.id), replace object with subject
  • in every field starting with subject (e.g. subject.id), replace subject with object
  • reverse the value of association.predicate based on a mapping file
  3. For the reverse query result:
  • change the root key object into subject
  • change the root key subject into object
  • reverse the value of association.predicate based on a mapping file
  4. Merge the results from the two queries.
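The four steps can be sketched as follows. `PREDICATE_INVERSES` stands in for the mapping file and contains only the pair from the example; the helper names are hypothetical. The field swap is a plain regex over the query string, so it would also rewrite a literal "subject." or "object." inside a quoted value, which a real handler parsing the query properly would avoid.

```python
import re
from typing import Dict, List, Optional

# Stand-in for the predicate mapping file; each pair is usable in both directions.
_PAIRS = [("negatively_regulates", "negatively_regulated_by")]
PREDICATE_INVERSES: Dict[str, str] = {**dict(_PAIRS), **{b: a for a, b in _PAIRS}}

_FIELD_RE = re.compile(r"\b(subject|object)\.")
_PRED_RE = re.compile(r'(association\.predicate:")([^"]+)(")')
_SWAP = {"subject": "object", "object": "subject"}


def reverse_query(q: str) -> Optional[str]:
    """Step 2: swap subject/object field prefixes and invert the predicate.

    Returns None when a predicate has no known inverse, so the caller can skip
    the reverse query instead of sending a wrong one.
    """
    swapped = _FIELD_RE.sub(lambda m: _SWAP[m.group(1)] + ".", q)
    unknown = []

    def flip(m):
        inverse = PREDICATE_INVERSES.get(m.group(2))
        if inverse is None:
            unknown.append(m.group(2))
            return m.group(0)
        return m.group(1) + inverse + m.group(3)

    flipped = _PRED_RE.sub(flip, swapped)
    return None if unknown else flipped


def reverse_hit(hit: dict) -> dict:
    """Step 3: swap the subject/object root keys and invert the predicate."""
    out = dict(hit)  # shallow copy; the input hit is not modified
    out["subject"], out["object"] = hit["object"], hit["subject"]
    if "association" in hit:
        assoc = dict(hit["association"])
        if "predicate" in assoc:
            assoc["predicate"] = PREDICATE_INVERSES.get(assoc["predicate"], assoc["predicate"])
        out["association"] = assoc
    return out


def merge_hits(original: List[dict], reversed_hits: List[dict]) -> List[dict]:
    """Step 4: combine both result sets, reversing the second one first."""
    return original + [reverse_hit(h) for h in reversed_hits]
```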

We have an API set up providing graph type data for testing: https://biothings.ncats.io/biggim

add gene-phenotype associations to HPO API

The Human Phenotype Ontology project curates annotations between genes and phenotypes. Full files are downloadable here: https://hpo.jax.org/app/download/annotation. It would be very useful to add these annotations to the pending HPO API. After that, it would be good to add this info to the SmartAPI record (https://smart-api.info/registry?q=a5b0ec6bfde5008984d4b6cde402d61f), specifically to the mapping file (https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/hpo/smartapi.yaml).

creating new API for TISSUES dataset

Adding TISSUES for BTE

From the same lab that made DISEASES (which is one of our pending biothings apis)....

There's TISSUES, a similar resource with manually curated, experimental, and text-mined associations between genes and tissues in several species (including human).

It looks like it may update weekly. Its downloads are available here.

Note: I think this lab does have an API available, but it might not have the batch query / other abilities of biothings apis.


refactoring DISEASES

There may need to be discussion for both the DISEASES and TISSUES apis on whether to keep the "channels" (assertions from different kinds of knowledge like manually curated knowledge vs experimental vs text-mined) in separate fields (tissues.knowledge vs tissues.text-mined)....

Right now, DISEASES is putting all the "channels" info together under one field for access. So BTE doesn't have a simple way to assign different Biolink predicates to the assertions made with different kinds of knowledge...
