mimic-fhir's People

Contributors

alexmbennett2, alistairewj, dokotela, evan8456, joennlae, piotrszul

mimic-fhir's Issues

Java validator testing switch

Use the Java validator to validate resources in the Python testing folder. We currently use HAPI, but it would be nice to have the Java validator as another option since it's quicker to use.

To Do

  • Add popup warning for Java Validator use when testing
  • Validate CodeSystems and ValueSets with Java Validator
  • Decouple database loading of fhir jsons and validation of fhir json
    • Store the files locally? Or commit them as static tests?

Microbiology Updates

The structure of the microbiology data in FHIR is not the most straightforward. Go through and confirm the data is coming across fine. To test:

  • Observation_micro_test
    • If no organism attached to test, make sure some information is passed on (currently looks like it would just say there is a test that occurred)
    • Check for duplicate organisms referenced
  • Observation_micro_org
    • Ensure all micro_org have a test
    • Check for micro_org without any susceptibility refs
    • Check for duplicate org generation
  • Observation_micro_susc
    • Check the interpretation valueset: a custom interpretation for every entry doesn't make sense. The MIMIC column that is used might allow for custom text entry

Misc questions

  • Do we need/want a Specimen resource
    • There is an id for specimen but not much else, still probably worth it
  • Do dilution values need to be stored in ObservationMicroSusc?
  • Add invariant to micro profiles to specify one of hasMember or valueString MUST be present
    • Maybe not necessary for ObservationMicroSusc since no hasMember here
    • US core already has an invariant at the top level that covers this!
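
As a rough sketch of what that invariant would enforce, here is a plain Python check over a resource dict. This is not a FHIRPath invariant or anything from the profiles themselves; the function name and the example resources are made up for illustration.

```python
def satisfies_micro_invariant(observation: dict) -> bool:
    """Return True if the Observation carries hasMember or valueString."""
    has_member = bool(observation.get("hasMember"))
    has_value_string = bool(observation.get("valueString"))
    return has_member or has_value_string


# Hypothetical resources, trimmed to the elements the check looks at.
example_with_org = {
    "resourceType": "Observation",
    "hasMember": [{"reference": "Observation/org-1"}],
}
example_empty = {"resourceType": "Observation", "status": "final"}

print(satisfies_micro_invariant(example_with_org))
print(satisfies_micro_invariant(example_empty))
```

A check like this could run in the Python tests before resources are sent to a validator, so a micro test with neither element is flagged early.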

Medication updates

An update to the medication resources in FHIR needs to be done. The following updates to each resource:

  • Medication- MedicationMix

    • Update so only generating distinct medication mixes (currently generating one for every prescription). Look into using product_description/product_code. Currently with prescriptions we get ~23,000 medication resources, but only 10,625 are straight medication resources and only ~2,000 prescriptions are needed for multiple drugs. The rest of the prescriptions are just one drug (so should be mapped to medication resource instead of making a medication mix).
    • Debate UUID source text, should it be product_code or some concatenation of values. Currently pharmacy_id which won't work after distinct medication mixes are made
    • Add medication-drug binding to mimic-medication profile (not bound right now until the medication/medication mix sorted)
  • MedicationAdministration

    • Set provenance for administration (icu vs hosp). But also think of better provenance groupings than icu/hosp
      • Do we set this as the EventHistory provenance or add an extension for the info?
    • Reference the medication mixes via product_description (found in emar_detail for hosp, not in inputevents or d_items). ICU medadmin should be fine without medication mixes as they are a subset of about ~400 meds just for ICU
    • Add in SIG to the text field (or see if there is a good spot for it, SIG may go in MedicationRequest actually)
    • dosage.dose.value must be numeric but the source column emar.dose_due provides some text and some numeric values. Need to either decide on some logic for how to deal with this, or have multiple medication admin for the dose_given?
      • Multiple admin for each dose given (but still text in dose_given... so need to strip that out)
      • Potentially have a parent med admin with dose_due information (kinda like a Dispense resource, but no way to connect Medication Dispense resource into Medication Administration resource)
  • MedicationRequest

    • Add medication requests not in pharmacy. Pull from prescriptions table
    • Add fhir_etl table for status element for mapping of MIMIC-IV status value to FHIR standard value. status element is bound in MedicationRequest as a code, so no codesystem can be supplied like in a CodeableConcept. Need this updated so that profiles will validate in anneal.
    • Add additional info to dosageInstructions. Look into prescriptions and pharmacy columns for anything on dose, rate, duration etc. Document in MIMIC-IV to FHIR mappings file
    • Reference medication mixes via the updated medication-codes
  • MedicationDispense

    • Investigate the use in adding MedicationDispense. Can link to MedicationAdministrations
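
For the dose_due problem noted under MedicationAdministration, one possible rule is to keep a value only when the whole string is a plain number and carry everything else as text. The sketch below assumes that rule; parse_dose and the sample strings are hypothetical and are not taken from the MIMIC data or the current ETL.

```python
import re
from typing import Optional

# Matches strings that are nothing but a non-negative decimal number.
_NUMERIC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*$")


def parse_dose(raw: Optional[str]) -> dict:
    """Split a raw dose string into a numeric value or leftover text."""
    if raw is None or not raw.strip():
        return {"value": None, "text": None}
    match = _NUMERIC.match(raw)
    if match:
        return {"value": float(match.group(1)), "text": None}
    # Anything that is not purely numeric is kept as text instead of guessed at.
    return {"value": None, "text": raw.strip()}


print(parse_dose("500"))
print(parse_dose("1-2"))
```

The text branch could feed a dosage text field while the numeric branch feeds dosage.dose.value, which avoids inventing a number for ranges or free text.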

Misc To Do

  • Delete fhir_etl.map_drug_id. NDC mapping to be done later so all drugs pulled into Medication resource
  • Map out medication landscape for MIMIC (jupyter notebook)
    • Example code from MIMIC-IV playing with medications: https://colab.research.google.com/drive/1REu-ofzNzqsTT1cxLHIegPB0nGmwKaM0?usp=sharing
    • Look into emar, emar_detail, and inputevents
    • Look at specific cases for common medication types
      • Pills - ranitidine/acetaminophen
      • Infusion - heparin/norepinephrine. Heparin will have varying dose_due from the titration.
      • Antibiotic - vancomycin. See how it is administered (ie IV)
      • Saline - good example of med admin with NULL medication
  • Medication-site codesystem needs to be cleaned up
    • Looks like a lot of free text in here... so mapping some of that?

mimic-profiles

Create FHIR profiles to bind custom MIMIC valuesets

  • Encounter
  • Patient
  • Observation
  • Condition
  • Procedure
  • Medication Administration
  • Medication
  • Medication Request

QA patient ethnicity/race

Currently passing in ethnicity from MIMIC for both race and ethnicity. Will definitely need to update race to map to proper race/ethnicity.

Thoughts:

  • Also is it even validating the extension then? Cause I've just been passing MIMIC values right in...
  • Map race to proper values

Java validator against local files

Add example files for the validator to test. This will decouple the loading process from the db.

But then that is really only a validation of the validator... so not really worth it? I guess it says the validator is working BUT it does not test any current data.

Micro specimen add type

The microbiologyevents table has a column spec_type_desc that could be translated to the Specimen type.

Thoughts:

  • May recombine Specimen since they both will be mapping type
  • Compare labs and micro type, see if they fully overlap
  • Create new terminology if needed

Fix argument parsing for py_mimic_fhir

I noted two potential bugs in running python py_mimic_fhir:

  • If you do not have the MIMIC_FHIR_LOG_PATH environment variable set, and you try to use the --log_path input argument, it doesn't work (complains log_path is not provided)
  • If you provide a directory which does not exist, it fails. Probably for logs we should warn/create the folder.
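
A minimal sketch of how both points could be handled with argparse, assuming the environment variable should only act as a fallback for --log_path. The function names and the logger name are illustrative and not the current py_mimic_fhir code.

```python
import argparse
import logging
import os
from pathlib import Path

logger = logging.getLogger("py_mimic_fhir")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="py_mimic_fhir")
    parser.add_argument(
        "--log_path",
        # The environment variable is only the default; an explicit flag wins.
        default=os.getenv("MIMIC_FHIR_LOG_PATH"),
        help="Directory for log files (falls back to MIMIC_FHIR_LOG_PATH)",
    )
    args = parser.parse_args(argv)
    if args.log_path is None:
        parser.error("provide --log_path or set MIMIC_FHIR_LOG_PATH")
    return args


def ensure_log_dir(log_path: str) -> Path:
    """Warn and create the log directory if it does not exist yet."""
    path = Path(log_path)
    if not path.is_dir():
        logger.warning("Log directory %s does not exist, creating it", path)
        path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling ensure_log_dir(parse_args().log_path) at startup would cover both cases: the flag works without the environment variable, and a missing folder produces a warning instead of a failure.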

Bundling resources

Bundle resources to automatically detect references and more efficiently send resources to HAPI. To Do:

  • Test referential integrity inside bundles
    • Encounter and Condition
    • Observation micro (test, org, and susc)
  • Microbiology needs to be bundled, lots of inter resource referencing going on
  • Test patient bundles. Create bundles of all the resources with links to a single patient
  • Verify maximum bundle size created for one patient
    • I think the labevents/chartevents will be what pushes this over the edge
    • Could create base patient bundle and then send out labevents/chartevents as own bundles after
  • Create bundling order
    1. Start with data resources: Organization/Medication
    2. Patient related: patient/encounter/condition/procedure
    3. Micro: ObservationMicroTest/ObservationMicroOrg/ObservationMicroSusc
    4. Labs: ObservationLabs
    5. Meds: MedicationRequest/MedicationDispense/MedicationAdministration
    6. ICU base: EncounterICU/MedicationAdministrationICU
    7. ICU Observations: ObservationChartevents/ObservationDatetimeevents/ObservationOutputevents
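
The bundling order above could be written down as data so the Bundler walks it in sequence. This is only a sketch: the group keys are made up, and the profile names are copied from the list rather than from the Bundler code.

```python
# Bundle groups in the order they should be sent; group keys are illustrative.
BUNDLE_ORDER = [
    ("data", ["Organization", "Medication"]),
    ("patient", ["Patient", "Encounter", "Condition", "Procedure"]),
    ("micro", ["ObservationMicroTest", "ObservationMicroOrg", "ObservationMicroSusc"]),
    ("labs", ["ObservationLabs"]),
    ("meds", ["MedicationRequest", "MedicationDispense", "MedicationAdministration"]),
    ("icu_base", ["EncounterICU", "MedicationAdministrationICU"]),
    (
        "icu_observations",
        ["ObservationChartevents", "ObservationDatetimeevents", "ObservationOutputevents"],
    ),
]


def group_by_bundle(profile_names):
    """Group the given profile names into bundles, keeping the order above."""
    wanted = set(profile_names)
    grouped = []
    for bundle_name, profiles in BUNDLE_ORDER:
        present = [p for p in profiles if p in wanted]
        if present:
            grouped.append((bundle_name, present))
    return grouped


print(group_by_bundle(["ObservationLabs", "Patient", "Medication"]))
```

Keeping the order in one list means referenced resources (Organization, Patient, Encounter) always go out before the resources that point at them.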

Pathling import of mimic-fhir resources

For the tutorial demo of mimic-fhir, my hope was to use Pathling to demonstrate different fhir functions. But I am hitting a roadblock right off the bat, the $import is failing. The steps I have taken:

  • Started pathling with docker: docker run --rm -p 8080:8080 aehrc/pathling
  • Put the patient.ndjson in the /usr/share/staging folder (default for pathling)
  • Created parameters and posted to pathling server using this notebook

But I am getting an error saying that there is an error reading the file. I've given all permissions to the /usr/share/staging folder so that should be fine. @jpwiedekopf you had mentioned using Pathling, any ideas?
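
For reference, a sketch of the Parameters body for $import as I understand Pathling's import operation (a source parameter with resourceType and url parts). The exact parameter names should be checked against the Pathling version in use, and the file URL here is an assumption. The URL is resolved by the server, so the path has to be visible from inside the container, which is worth checking given the docker command above has no volume mount.

```python
import json


def build_import_parameters(resource_type: str, url: str) -> dict:
    """Build a Parameters resource naming one NDJSON source file."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {
                "name": "source",
                "part": [
                    {"name": "resourceType", "valueCode": resource_type},
                    {"name": "url", "valueUrl": url},
                ],
            }
        ],
    }


# Assumed location of the file as seen by the Pathling server.
params = build_import_parameters(
    "Patient", "file:///usr/share/staging/patient.ndjson"
)
print(json.dumps(params, indent=2))
```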

Update fhir_etl.subjects with demo script

Need to download and install MIMIC-CXR for this, then add this to fhir_etl.subjects:

-- Count chest x-ray studies per subject
WITH cxr AS (
    SELECT subject_id,
        MIN(study_id) AS study_id_min,
        MAX(study_id) AS study_id_max,
        COUNT(*) AS n
    FROM mimic_cxr.record_list
    GROUP BY subject_id
)
SELECT pt.subject_id,
    pt.gender,
    pt.anchor_age,
    pt.anchor_year,
    pt.anchor_year_group,
    pt.dod,
    cxr.n AS n_cxr,
    ie.stay_id,
    ie.intime,
    ie.outtime,
    -- Order of ICU stays within each subject
    ROW_NUMBER() OVER (
        PARTITION BY pt.subject_id
        ORDER BY ie.intime
    ) AS rn,
    -- Running index of distinct subjects
    DENSE_RANK() OVER (
        ORDER BY pt.subject_id
    ) AS rank
FROM mimic_core.patients pt
    INNER JOIN mimic_icu.icustays ie ON pt.subject_id = ie.subject_id
    LEFT JOIN cxr ON pt.subject_id = cxr.subject_id
WHERE pt.anchor_age > 0
    AND pt.anchor_year_group IN ('2011 - 2013', '2014 - 2016')
ORDER BY pt.subject_id,
    ie.intime
LIMIT 134;

Limit to 100 potentially, since that is the demo size we are going for.

Update medication mapping

With the latest release of MIMIC-IV, the medication tables have been updated to facilitate better medication mapping in FHIR.

New columns:

  • prescriptions
    • poe_id
    • poe_seq
    • formulary_drug_cd

What the new values provide is matching between emar and prescriptions, but ICU will still just be medication text fields....

How medication will be mapped:

  • Medication
    • Pull in all formulary drug codes, this will be foundation for hospital medication
    • Will also need to pull all names for ICU medications. These are just simple medications, so store full name. Convert in anneal step to get these to ndc/gsn
      • have itemid and label for code/display
    • And maybe all names from pharmacy, since no formulary drug codes there.
  • MedicationRequest
    • Pulls in from Prescriptions
  • MedicationDispense
    • Pulls from Pharmacy
  • MedicationAdministration
    • Pulls from emar_detail
  • MedicationAdministrationICU
    • Pulls from inputevents

Potential issues

  • in emar product codes not always present, sometimes product_description is present without the code...
    • this is where poe_id would come in handy for some of these
  • Can link prescription drug codes to emar_detail drug codes, but no mapping really to pharmacy.... This is where it could go back to having single drug names.
    • Potentially create single drug names out of ICU meds and pharmacy meds, then have meds from formulary drug codes
    • Mapping from MedicationRequest->MedicationAdministration seems straightforward, medication dispense less so...
  • Should extra codes be stored? Ie if there is ndc and gsn, should we store them as reference? Like there is a primary code and then secondary??
  • Common identifiers for hosp medications
    • pharmacy_id is present in pharmacy, prescriptions and emar_detail for the majority, but this would create a TON of duplicate medications if that was the source.

Potential solutions:

  • Create a separate medication profile for ICU, so maps to the itemid/labels (only 474 drugs)
    • Then have a main medication profile that takes formulary drug codes, and skip pharmacy??

Add comments to ObservationMicroSusc and ObservationMicroTest

Add comments to Observation.note for both ObservationMicroSusc and ObservationMicroTest.

For ObservationMicroSusc, store all comments in the note.

For ObservationMicroTest, currently storing the comments in the valueString. Might be worth having in both spots? Debate this. Comments is a result when there is no result...

Export patient bundles (vs resources)

Add functionality to export patient bundles in place of just a full export of resources. Currently outputting each resource type and all the resources associated with it in one file. Exporting in a patient bundle seems to be the community push (but some bundles will be very big).

Patient bundles may need to be organized by patient and encounter? Cause an individual patient could have ~50,000 resources on their own (one patient has 42,000 chartevents).
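
One way to keep very large patients manageable is to cap the number of entries per bundle and emit several bundles per patient. The sketch below assumes transaction bundles with PUT requests and a cap of 1000 entries; both are illustrative choices, not the current export behaviour.

```python
from itertools import islice


def chunk_resources(resources, max_entries=1000):
    """Yield transaction Bundles holding at most max_entries resources each."""
    iterator = iter(resources)
    while True:
        chunk = list(islice(iterator, max_entries))
        if not chunk:
            return
        yield {
            "resourceType": "Bundle",
            "type": "transaction",
            "entry": [
                {
                    "resource": r,
                    "request": {
                        "method": "PUT",
                        "url": f"{r['resourceType']}/{r['id']}",
                    },
                }
                for r in chunk
            ],
        }
```

Because it consumes an iterator, the resources for a patient with tens of thousands of chartevents never have to be held in a single bundle in memory.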

Integrate terminology script into py_mimic_fhir

Terminology is currently generated in a jupyter notebook, move this into the py_mimic_fhir package.

Steps to do this:

  • Do this as its own branch after cli branch is complete
  • Add terminology module from code in the jupyter notebook
  • Add CLI options to run the terminology through
  • Test CLI terminology generation

ConceptMaps needed

This is a list of ConceptMap resources that will likely be needed in the future

How to handle lab results in comments

There are quite a few results that appear to be reported in the comments. For example, there are ~4 million labevents entries that have no value or valuenum but have comments='NEG.'. This seems like a decent chunk of data that would not translate over properly to FHIR without some parsing.

Thoughts

  • Could pull in comments to valueString if labevents value and valuenum are missing
    • Only issue is when the comments are excessive and not really a result. Works nicely if negative is the result, but not so nicely when it is a full comment
  • comments are currently being captured in FHIR notes, is that enough?
  • top offenders that could be translated first
    1. NEG -> Negative (~4 million results)
    2. Normal (~880,000 results)
    3. Rare (~210,000)
    4. None (~200,000)
  • confusing comments with a lot of hits, that could be translated
    1. Random... don't know what this means really here (~650,000)
    2. Hold -> kinda like a registered result then? (~572,000)
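
A small sketch of the first idea: only fall back to the comment when value and valuenum are both missing, and only for the short comments listed as top offenders. The lookup table and function name are assumptions for illustration; longer free-text comments are deliberately left alone.

```python
# Short comments that read as a result; keys are upper-cased with the
# trailing period removed (so 'NEG.' and 'NEG' both match).
COMMENT_TO_VALUE = {
    "NEG": "Negative",
    "NORMAL": "Normal",
    "RARE": "Rare",
    "NONE": "None",
}


def value_string_from_comment(value, valuenum, comments):
    """Return a valueString derived from comments, or None if not applicable."""
    if value is not None or valuenum is not None:
        return None  # a real result exists, keep the comment as a note only
    if comments is None:
        return None
    key = comments.strip().rstrip(".").upper()
    return COMMENT_TO_VALUE.get(key)


print(value_string_from_comment(None, None, "NEG."))
```

Anything not in the table returns None, so full-sentence comments still end up only in the note.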

Test ValueSet reference and expansion

Test referencing the ValueSet in the CodeSystem, then expanding when it arrives in the FHIR Server.

Thoughts:

  • May need the ValueSets for generation of the implementation guide (mimic-profiles)
    • Test IG generation first without the ValueSets to see if this is possible
    • Add reference to valuesets in codesystems with ValueSets gone and remake IG
  • Test ValueSet expansion with HAPI FHIR

Clean HAPI Server - Deleting old data on HAPI

We need a method to cleanse the server between runs to ensure there are no hanging references.

There are a couple methods to clean the server when it is up and running:

  • $delete - will delete all resources of a certain type, but keeps a record of the patient (ie patient was deleted)
    • $delete-cascading - can give a resource and it will delete it and any references to it, so we can pass patients and it would delete all resources related with that patient
  • $expunge will remove any record of patient
  • $delete-expunge - will delete and then expunge (seems redundant if expunging anyways)

What I've tested:

  • $delete - works
  • $delete-cascading - works but is a little slow, nice to have everything associated with a patient deleted
  • $expunge - Does NOT work. Posted to chat.fhir.org to see if there are any ideas. Kinda need expunge more than delete

Stopgap solutions:

  • Drop database that hapi is looking at. The next launch of HAPI will remake the database, but it will be more overhead when starting (normally ~5 minutes but takes ~20 minutes)
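
For the $expunge attempts, this is a sketch of the Parameters body as I understand the HAPI operation (limit, expungeDeletedResources, expungePreviousVersions). The default values are arbitrary, and as far as I know HAPI only honours $expunge when expunge is enabled in the server configuration, so that setting is worth checking too.

```python
def build_expunge_parameters(limit=1000, deleted=True, previous_versions=True):
    """Build a Parameters resource for a HAPI $expunge call."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "limit", "valueInteger": limit},
            {"name": "expungeDeletedResources", "valueBoolean": deleted},
            {"name": "expungePreviousVersions", "valueBoolean": previous_versions},
        ],
    }


expunge_body = build_expunge_parameters()
```

The body would be POSTed to the server's $expunge endpoint after the $delete step, so deleted resources are physically removed rather than just marked as deleted.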

Pull in transfers as location for encounters

The mimc4fhir team pulled in mimic_core.transfers. They set the transfer information in the Encounter.

What we can do then:

  • Create careunit Locations, these will be referenced in the Encounters
  • Pull in Location history to Encounters based on time window of Encounter

Pull in demo subjects

Demo mimic-fhir store should be based on the patients found in the MIMIC-IV to OMOP mapping.

Alistair sent over list of patients that should be pulled in, so write those subjects to fhir_etl.subjects. This will affect all tables

Versioning Terminology based on MIMIC

The terminology in mimic-fhir should reference the version of MIMIC that they are based off of.

There are a couple spots to put this in CodeSystem/ValueSet:

  • version: this is the business version of the codesystem, but we could map this to the mimic version?
  • useContext.value[x]: could store the mimic version here to be the context of creating the mimic-fhir terminology
  • purpose: why the code system is defined, so could put the version here. But this is a markdown datatype

Catch failed bundles and write out to file

When sending bundles to HAPI FHIR catch any failed bundles and write out to file. These can then be rerun later.

To Do

  • Catch failed bundles
  • Output list of failed resources/bundles
    • Output resources or bundle? I guess I need to output the patient associated with the resources
    • Output error message too
  • Find way to rerun failed bundles
    • Need a way to create bundle from a list of resources file
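
A rough sketch of the catch-and-rerun idea, assuming failures are written one per line (NDJSON) with the error message next to the bundle. The post_bundle callable stands in for whatever sends a bundle to HAPI; it and the file layout are assumptions, not existing code.

```python
import json
from pathlib import Path


def post_with_failure_log(bundles, post_bundle, failed_path):
    """Post each bundle; append any failure (error + bundle) to an NDJSON file."""
    failed = 0
    with Path(failed_path).open("a", encoding="utf-8") as fh:
        for bundle in bundles:
            try:
                post_bundle(bundle)
            except Exception as exc:  # broad on purpose: record every failure
                failed += 1
                fh.write(json.dumps({"error": str(exc), "bundle": bundle}) + "\n")
    return failed


def load_failed_bundles(failed_path):
    """Read the bundles back so they can be posted again later."""
    with Path(failed_path).open(encoding="utf-8") as fh:
        return [json.loads(line)["bundle"] for line in fh if line.strip()]
```

Storing the whole bundle sidesteps the question of rebuilding a bundle from a list of resources, at the cost of a larger failure file.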

Test demo data validate and export

Test the 100 demo patients through the current pipeline to see timing and issues.

Steps to test

  • Validate patient and all related resources - send to hapi fhir
  • Export all resources - pull from hapi fhir
    • Need to clean database before this cause it has old resources, so wouldn't be an accurate test

ICU observation warning

The ICU observations (chartevents, datetimeevents, outputevents) all have a warning column. There is no real mapping into the Observation resources for it. Is it worth creating an extension for warning? Also what does the warning even tell us?

I want to get a better idea of the warning column before converting into an extension

Server Tests

Tests to make sure HAPI server setup is working for basic resource actions. Tests:

  • Check a custom valueset works with $validate-code
  • Check a custom codesystem works with $validate-code
  • PUT each resource (ie Patient, Encounter etc)
    • Will need to make sure there is some sequential order since most resources reference patient/encounter
  • Bundle transaction

Add CLI functionality to py_mimic_fhir

Add CLI functionality to run through py_mimic_fhir. I see it with two modes:

  1. Validation
    • validator argument with options Java or HAPI
    • output argument with options: resources or bundles
    • env argument to point to env file? Or have default assumption probably
  2. Terminology
    • Generate all the terminology through this
    • Just start with option to generate all (very fast so not really an issue)
    • Could add option to create one
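
The two modes could map onto argparse subcommands. This is only a sketch of the shape: the subcommand names, the defaults, and the --only flag are assumptions, not a decided interface.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="py_mimic_fhir")
    sub = parser.add_subparsers(dest="mode", required=True)

    # Mode 1: validation
    validate = sub.add_parser("validate", help="Validate resources or bundles")
    validate.add_argument("--validator", choices=["java", "hapi"], default="hapi")
    validate.add_argument("--output", choices=["resources", "bundles"], default="bundles")
    validate.add_argument("--env", default=".env", help="Path to the env file")

    # Mode 2: terminology (generates everything unless --only is given)
    terminology = sub.add_parser("terminology", help="Generate terminology")
    terminology.add_argument("--only", default=None, help="Generate a single item")
    return parser


args = build_parser().parse_args(["validate", "--validator", "java"])
print(args.mode, args.validator, args.output)
```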

Encounter References Hosp v ICU

ICU tables were converted to FHIR resources with references to only the ICU encounter, not hospital encounter. The limitation is that FHIR only allows one Encounter reference from the resources.

Current solution:

  • The ICU resources reference the ICU encounter
  • The ICU encounter references the Hosp encounter

To Do

  • Think if there is a better way to link ICU resources to the base Hosp encounter

Update micro specimen codesystem

The microbiologyevents table has codes and labels for all specimens. Only taking in the spec_type_desc right now. Should have pulled in spec_itemid. So the update would be:

  • New codesystem
    • code: spec_itemid
    • display: spec_type_desc

Currently just using spec_type_desc as the code

Look into passing bundles in parallel

Currently passing bundles in a for loop. But HAPI seems to have a bit of a limit when passing, so this may not be as big an issue.

Attempted to pass bundles using list comprehension but was 7-8 times slower.

Other ideas

  • map requests
  • threading
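
A minimal sketch of the threading idea using the standard library's ThreadPoolExecutor. The post_bundle callable is a stand-in for the function that sends one bundle to HAPI, and the worker count of 4 is an arbitrary starting point; whether this is faster depends on how many concurrent requests the server tolerates.

```python
from concurrent.futures import ThreadPoolExecutor


def post_bundles_parallel(bundles, post_bundle, max_workers=4):
    """Send bundles from a small thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(post_bundle, bundles))
```

Threads suit this case because posting is I/O bound, and pool.map returns responses in the same order as the input bundles, which keeps logging simple.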

Terminology generation verification

Need to make sure terminology generation can be run repeatably when updating between versions of mimic.

Current steps to generate terminology:

  • Run script in sql/codeystem to get the codes
  • Drop the codes into a template csv
  • Generate CodeSystem with python script
  • Create valueset csv that points to CodeSystem
  • Generate ValueSet with python script

To Do

  • Confirm all codesystems have a sql script (some were simple (ie 5 codes) so may not have been created)
  • Create any missing codesystem scripts
  • Create Python function to call scripts and generate CodeSystem (bypass the csv step)
  • Add valueset and codesystem generation functions into py_mimic_fhir (is this needed? Created notebook right now)

Experience on the Quickstart

Hey,
I ran the script and not all tables went through without issues. The problem is the mappings in fhir_etl. Only subjects and uuid_namespace were created. The others were not found.

psql:fhir_patient.sql:117: ERROR: relation "fhir_etl.map_gender" does not exist
LINE 52: LEFT JOIN fhir_etl.map_gender mg

Validate codes against custom valuesets

The resources have bindings to custom valuesets for MIMIC. We need to verify the codes are being validated properly.

Current issue:

  • A custom valueset for admission-class has been created and is bound to the Encounter class element, but validation is failing. The valueset is not being found by HAPI properly so it does not find the codes inside the valueset.
    • Posted on chat.fhir and working through this
    • Tested valueset on Java validator and still not passing so may be issue with format of our URL or resource

To Do:

  • Validate admission-class valueset against Java validator
  • $validate-code against admission-class valueset
  • Validate Encounter resource against admission-class valueset on the HAPI Server

Update terminology to match mimic names

Make sure the terminology that references mimic tables uses their exact table names.

ie datetime-d-items should be datetimeevents-d-items

To update:

  • datetime-d-items -> datetimeevents-d-items
  • procedure-d-items -> procedureevents-d-items
  • Look for others

ValueSets/fhir_etl needed

Go through valueset bindings of the resources and find which ones can be mapped to custom valuesets and which ones need to be mapped in Postgres with fhir_etl tables.

Review lab tables for more data to pull into fhir

Review labevents and confirm that all columns we want are making it into FHIR. The initial pass I did pulled in anything that I could map to the ObservationLab profile. There may need to be extensions to fit others (ie lab urgency).

Some notes:

  • All status are set to 'final' in fhir currently
    • status can be set to registered/preliminary/final/amended

Refactor Bundler

The Bundler runs over each of the resources to bundle and validate. The class has grown and could potentially be better organized.

General thoughts:

  • post_all_bundles function is clunky and could probably be rewritten to avoid repetition. A lot of it is logging and recording responses.
    • Is there a way to iterate over the class's objects here?
  • Could pass bundle list with groupings versus having individual bundle functions for each bundle. But the individual bundle functions are very nice for testing

None of these changes are necessary for Bundler to run, just potential refactor ideas

ICD Codes

MIMIC ICD Codes are not strictly ICD-9 or ICD-10. There are codes in MIMIC that are obsolete ICD codes across ICD versions.

Current solution

  • created FHIR codesystems for MIMIC ICD-9 and ICD-10

To Do

  • Map codes into FHIR with concept mapping
  • Look into a way to utilize the FHIR ICD codesystems directly with MIMIC

Canonical ValueSet reference in CodeSystem

Set the canonical ValueSet in the CodeSystem if it captures the whole codesystem.

To test this:

  • Update a CodeSystem to reference the ValueSet
  • SUSHI Check
    • Remove the ValueSet from the IG generation
    • Regenerate the IG without the Valueset
  • HAPI Check
    • Delete a ValueSet
    • Post the updated CodeSystem
    • Expand the valueset based on that

Test cases needed

To ensure the integrity of the data some tests need to be created:

  • Referential integrity

    • Check any references in the scripts (ie patient, encounter, condition, medication) exist in the main table
    • Encounter condition -> Condition
  • Check for null json strings

    • Null json strings can be created when a system element is needed but the actual value element is missing
    • Or references with null identifiers
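
The null checks could start from a small recursive walk over the generated JSON that reports every path holding a null or empty value. This is a sketch with a hypothetical function name and example resource; it would catch both a coding with a system but no code and a reference whose identifier came out null.

```python
def find_nulls(node, path="$"):
    """Return the JSON paths whose value is None or an empty string/dict/list."""
    problems = []
    if node is None or node == "" or node == {} or node == []:
        problems.append(path)
    elif isinstance(node, dict):
        for key, value in node.items():
            problems.extend(find_nulls(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_nulls(item, f"{path}[{index}]"))
    return problems


# Hypothetical resource with a missing code and a null reference.
example = {
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://example.org/cs", "code": None}]},
    "subject": {"reference": None},
}
print(find_nulls(example))
```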
