clinical-genomics / patientmatcher

A MatchMaker Exchange server

Home Page: https://clinical-genomics.github.io/patientMatcher/

License: MIT License

Python 96.46% Dockerfile 0.34% JavaScript 0.10% HTML 3.10%

matching-algorithm patients gene score variants similarity-score patient-matching matchmaking-server matchmaker matchmaking

patientmatcher's Introduction

patientMatcher: a Python- and MongoDB-based MatchMaker Exchange server

PatientMatcher is a Python (Flask) and MongoDB-based implementation of a MatchMaker Exchange (MME) server, developed and actively maintained by Clinical Genomics, Science For Life Laboratory in Stockholm. PatientMatcher is designed as a standalone application, but it can communicate with external applications via a REST API. The MME Stockholm node is being implemented in clinical production in collaboration with the Genomic Medicine Center Karolinska at the Karolinska University Hospital.

Info on how to test PatientMatcher or to set up a server containing an app frontend and backend is available on the documentation pages.

patientmatcher's Issues

HPO files from Jenkins lab have changed so phenotypes can't be parsed any more.

Phenotype file is parsed wrongly. Update the code to be able to parse the new file:

http://compbio.charite.de/jenkins/job/hpo.annotations/lastStableBuild/artifact/misc/phenotype_annotation.tab

Create the code to notify patient contact when a match happens.

Send them an email.
It would be best to add a parameter to the config file so the server admin can decide whether or not to notify contacts. Something like this:

MATCH_NOTIFICATIONS = True
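
A minimal sketch of how such a flag could gate the notification step. The helper names `contact_address` and `notify_contact`, and the `send_email` callable, are hypothetical and not part of patientMatcher; the match layout follows the match objects shown further down in these issues.

```python
def contact_address(match):
    """Return the contact email of the query patient, without the mailto: prefix."""
    href = match["data"]["patient"]["contact"]["href"]
    prefix = "mailto:"
    return href[len(prefix):] if href.startswith(prefix) else href


def notify_contact(match, config, send_email):
    """Notify the patient contact only if the admin enabled MATCH_NOTIFICATIONS.

    `send_email` is any callable taking (recipient, subject); it is an
    assumption of this sketch. Returns True if a notification was sent.
    """
    if not config.get("MATCH_NOTIFICATIONS", False):
        return False  # notifications switched off in the config file
    if not match.get("has_matches"):
        return False  # nothing to report
    patient_id = match["data"]["patient"]["id"]
    subject = f"New MatchMaker Exchange match for patient {patient_id}"
    send_email(contact_address(match), subject)
    return True
```

Passing the sender as a callable keeps the switch testable without an SMTP server.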

Create specific user and password for connection to the database

It's OK to wait until something works!

Look into why some servers don't like extra stuff in request headers

The Host header, for instance. Also remove host as a parameter in external_matcher, because it is no longer used.
Not urgent at all; to do when there is time.

Group match results by searched node

Not really a bug here, but it would add more useful info to the match objects.

Match objects saved to database now look like this:

matches = [
        {    # External match where test_patient is the query and with results
            '_id' : 'match_1',
            'has_matches' : True,
            'data' : {
                'patient' : {
                    'id' : 'P0000079',
                    'contact' : {
                        'href' : 'mailto:[email protected]'
                    }
                }
            },
            'results' : [
                {'patient' : { 'patient_data' : 'test_stuff'}},
                {'patient' : { 'patient_data2' : 'test_stuff2'}},
            ],
            'searched_nodes' : [{ 'id': 'node1_id' , 'label': 'node1_label'}, { 'id': 'node2_id' , 'label': 'node2_label'}],
            'match_type' : 'external'
        },
       ...
     ]

It would be nice to save matches like this instead:

matches = [
        {    # External match where test_patient is the query and with results
            '_id' : 'match_1',
            'has_matches' : True,
            'data' : {
                'patient' : {
                    'id' : 'P0000079',
                    'contact' : {
                        'href' : 'mailto:[email protected]'
                    }
                }
            },
            'results' : [ {
                   'searched_node' : { 'id': 'node1_id' , 'label': 'node1_label'},
                   'patients' : [{'patient' : { 'patient_data' : 'test_stuff'}}, {'patient' : { 'patient_data2' : 'test_stuff2'}}]
             },
             {
                   'searched_node' : { 'id': 'node2_id' , 'label': 'node2_label'},
                   'patients' : [{'patient' : { 'patient_data3' : 'test_stuff'}} ]
            },
          ],  # end of results
            'match_type' : 'external'
        },
       ...
     ]
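
The grouped layout proposed above could be built while the responses from each node are collected. The sketch below assumes the responses are available as (node, patients) pairs; that input shape and the function names are hypothetical, not taken from patientMatcher.

```python
def group_results_by_node(node_responses):
    """Build the per-node 'results' list of a match object.

    `node_responses` is assumed to be an iterable of (node, patients) pairs,
    where `node` has 'id' and 'label' keys and `patients` is the list of
    results returned by that node (possibly empty).
    """
    grouped = []
    for node, patients in node_responses:
        grouped.append({
            "searched_node": {"id": node["id"], "label": node["label"]},
            "patients": list(patients),
        })
    return grouped


def has_matches(grouped_results):
    """True if at least one searched node returned a patient."""
    return any(entry["patients"] for entry in grouped_results)
```

Nodes that return nothing still get an entry, so the match object records which nodes were searched as well as which ones answered.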

MatchMaker updated requirements n.2 --> Disclaimer!

New MME Service Requirements go into effect March 1.

Point n.7 :

For each database to which a MME service is connected by an API, the connected database’s disclaimers and terms should be posted on the MME service’s website and displayed with query results. Disclaimers can be found on GitHub (https://github.com/ga4gh/mme-apis). Disclaimers and terms may be returned with query results, which should supersede those found on the previously mentioned GitHub repository.

I'm still not sure what this means.

MatchMaker updated requirements n.3 --> metrics!

A new point of the updated MME Service Requirements goes into effect March 1:

Implement MME Metrics API (https://github.com/ga4gh/mme-apis/blob/master/metrics-api.md) and make metrics publicly available

I guess this will be done by creating a new endpoint. It should be relatively easy.

Introduce a result score threshold to limit the number of returned low score results

An arbitrary result threshold in the config file.

Set it to 0 to return all hits
Return only results above this threshold
Number of max returned results (with highest score) is still decided by MAX_RESULTS param
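
A minimal sketch of the filtering described above. It assumes each result carries its score under `result["score"]["patient"]`, as in MME API responses; the function name and default values are hypothetical.

```python
def filter_results(results, score_threshold=0.0, max_results=5):
    """Keep only results scoring above the threshold, best first.

    With score_threshold=0 every hit with a positive score is kept;
    max_results (the MAX_RESULTS param) still caps the returned list.
    """
    kept = [r for r in results if r["score"]["patient"] > score_threshold]
    kept.sort(key=lambda r: r["score"]["patient"], reverse=True)
    return kept[:max_results]
```

The threshold is applied before the cap, so a high threshold can return fewer than MAX_RESULTS patients, or none at all.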

Fine-tuning of the SCORE_THRESHOLD param for Scout matching results

The current implementation of patientMatcher at Clinical Genomics returns the 5 patients most similar (with the highest patient score) to the patient used in the query.
Any patient with a patient score above 0 might be returned, even one with an extremely low score, for instance because the genotype does not match at all and the phenotypes are far apart in the HPO hierarchy.
Since it is better to return no match than to send unspecific matches, the SCORE_THRESHOLD should be fine-tuned. Action:

Return patients with a score compatible with at least a perfect gene (not variant) matching or an extremely similar phenotype to the query patient.

Current configs for patientMatcher @ CG:
MAX_GT_SCORE = 0.75
MAX_PHENO_SCORE = 0.25

Assume that the GT features consist of 3 genes/variants and that only one of these features matches, at the gene level only (1/4 of the maximum possible score for that feature):
0.75 / 3 (3 GT features) / 4 = 0.0625
With some phenotype similarity on top: 0.08? 0.1?

A threshold of 0.0625 might ensure that there is at least a match at the gene level (not variant).
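
The arithmetic above can be reproduced as follows. The 1/4 weight for a gene-only match and the count of 3 GT features are the assumptions stated in this issue; the function name is hypothetical.

```python
MAX_GT_SCORE = 0.75
MAX_PHENO_SCORE = 0.25


def gene_level_floor(n_gt_features=3, gene_only_fraction=0.25):
    """Score obtained when exactly one GT feature matches at gene level only."""
    per_feature_max = MAX_GT_SCORE / n_gt_features  # 0.25 with 3 features
    return per_feature_max * gene_only_fraction     # 0.0625


print(gene_level_floor())  # 0.0625
```

Note that the floor depends on the number of GT features in the query: with more features, a single gene-level match scores lower than 0.0625 and would fall below a threshold set at that value.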

Test on stage!

patientMatcher is safe to use and GDPR compliant

Upload our disclaimer in ga4gh MME disclaimers list

Send it here: https://github.com/ga4gh/mme-apis/tree/master/disclaimers

Improve phenotype matching algorithm

By assigning an extra score when users provide at least 3 HPO terms.

patientMatcher Conda environment created

Open "metrics" endpoint to the public

Wait for feedback from the MME committee.

Improve phenotype matching by ranking all patients in MME node by similarity with query patient

After ranking by similarity, order the patients from most to least similar and return only those with the highest similarity (or those sharing variants or genes with the query patient, even if their phenotype similarity is lower).
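
One way to read this selection rule, as a sketch only: rank all candidates by phenotype similarity, then keep the top-ranked ones plus any candidate sharing a gene with the query. The candidate layout (`pheno_similarity`, `genes`) and the function name are hypothetical.

```python
def select_matches(query_genes, candidates, top_n=5):
    """Return candidates ranked by phenotype similarity.

    A candidate is kept if it is among the `top_n` most similar patients,
    or if it shares at least one gene with the query patient.
    """
    query_genes = set(query_genes)
    ranked = sorted(candidates, key=lambda c: c["pheno_similarity"], reverse=True)
    selected = []
    for rank, candidate in enumerate(ranked):
        shares_gene = bool(query_genes & set(candidate["genes"]))
        if rank < top_n or shares_gene:
            selected.append(candidate)
    return selected
```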

Add option to notify minimal or complete matching info by emails

For security reasons it's good to introduce an option to notify complete or partial info for MME matches by email.

I've changed the email body to follow the MME directives, see #65. Still, it's good to have the option of NOT showing the variants and phenotype terms of the matches.

So add a config parameter that makes it possible to show only contact info and case IDs in the notification emails.
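
A sketch of how such a switch could shape the email body. The parameter name `notify_complete` and the function name are hypothetical; the patient fields `features` and `genomicFeatures` follow the MME patient format, and the flat `results` list of the stored match object is assumed.

```python
def notification_body(match, notify_complete=False):
    """Build a plain-text email body for a match.

    With notify_complete=False only patient IDs and contacts are listed;
    with True, HPO terms and gene IDs of each matching patient are added.
    """
    query = match["data"]["patient"]
    lines = [
        f"Query patient: {query['id']}",
        f"Contact: {query['contact']['href']}",
    ]
    for result in match.get("results", []):
        patient = result["patient"]
        lines.append(f"Matching patient: {patient['id']}")
        contact = patient.get("contact", {}).get("href")
        if contact:
            lines.append(f"  Contact: {contact}")
        if notify_complete:
            hpo_terms = [f["id"] for f in patient.get("features", [])]
            genes = [g["gene"]["id"] for g in patient.get("genomicFeatures", []) if "gene" in g]
            lines.append("  Phenotypes: " + ", ".join(hpo_terms))
            lines.append("  Genes: " + ", ".join(genes))
    return "\n".join(lines)
```

Defaulting to the minimal body means sensitive details are only sent when the admin opts in explicitly.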

Remove pymongo<3.7 dependency

Of course, first fix the code that will become deprecated.

Installation error due to dependency problems

  Running setup.py develop for patientMatcher
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

pytest-cov 2.10.1 requires pytest>=4.6, but you'll have pytest 4.4.1 which is incompatible.

Genotype similarity bug

GT similarity is 0 when providing variant info

Write code to capture indirect matches

For instance, if an external node sends a query for patient x and there are 3 matches in the database (a, b, c), then the matching result is already saved. Create a method that allows you to ask the server: are there external matches for patient b? The server then returns external patient x.
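
A sketch of the lookup over saved match objects, held here as an in-memory list. It assumes each match stores the query patient under `data.patient` and its hits as a flat `results` list of `{'patient': {...}}` entries with an `id`; the function name is hypothetical.

```python
def external_matches_for(patient_id, matches):
    """Return the query patients of saved matches whose results include patient_id."""
    hits = []
    for match in matches:
        for result in match.get("results", []):
            if result.get("patient", {}).get("id") == patient_id:
                hits.append(match["data"]["patient"])
                break  # one hit per match object is enough
    return hits
```

Against MongoDB the same question could presumably be asked with a query on a dotted field path such as `results.patient.id`, avoiding a scan in Python.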

API views should produce and consume well formatted json data

Also fix the tests accordingly.

Create script to update resources

HPO terms: https://hpo.jax.org/app/download/ontology

Phenotype annotations: https://hpo.jax.org/app/download/annotation

Introduce the possibility to choose the external node to send a request to

This may be interesting for end users. I am aware of at least one other MME service that allows users to do this (RD Connect).

Feedback from MyGene2

Make sure patientMatcher users are able to read email addresses from matches on other nodes

Scale-up simulation

Try to insert 5000 patients (100 times those 50 benchmarking patients) and see how the server behaves.
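
The test data for such a simulation could be generated by cloning the benchmarking patients under new IDs. This is a sketch; the function name and the ID suffix scheme are hypothetical, and loading the patients into the server is left out.

```python
import copy


def replicate_patients(patients, copies):
    """Return `copies` clones of every patient, each with a unique ID."""
    replicated = []
    for i in range(copies):
        for patient in patients:
            clone = copy.deepcopy(patient)  # do not mutate the originals
            clone["id"] = f"{patient['id']}_{i}"
            replicated.append(clone)
    return replicated
```

With the 50 benchmarking patients and `copies=100`, this yields the 5000 patients mentioned above.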

Use ensembl gene id as main gene id

But also allow matching with patients whose genes are described by an Entrez gene ID or an official gene symbol.

Capture HTML error messages from other nodes responses

So far I've assumed that server errors would also be returned as JSON, but this is not universally valid: some nodes might return plain HTML, and this causes a JSON parsing error in the external matches handler. So try to parse the response as JSON first, and fall back to capturing the HTML text when that fails.
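
A minimal sketch of that fallback using only the standard library. The function name and the shape of the returned dictionary are hypothetical; the status code and body text are assumed to come from whatever HTTP client the handler uses.

```python
import json


def parse_node_response(status_code, body_text, max_message_len=500):
    """Parse a node response as JSON, falling back to the raw text.

    Returns a dict with the status code, the parsed JSON (or None) and,
    when parsing fails, a truncated copy of the body as the message.
    """
    try:
        return {"status_code": status_code, "json": json.loads(body_text), "message": None}
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {
            "status_code": status_code,
            "json": None,
            "message": body_text[:max_message_len],
        }
```

Truncating the message keeps a full HTML error page from being stored with every failed match.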