kind-lab / mimic-fhir



A version of MIMIC-IV in FHIR

License: MIT License

Jupyter Notebook 57.91% PLpgSQL 5.28% Shell 0.30% Python 36.52%


mimic-fhir's Issues

Java validator testing switch

Use the Java validator to validate resources in the Python testing folder. HAPI is currently used, but it would be nice to have the Java validator as another option since it's quicker to use.

To Do

Add popup warning for Java Validator use when testing
Validate CodeSystems and ValueSets with Java Validator
Decouple database loading of fhir jsons and validation of fhir json
- Store the files locally? Or commit them as static tests?
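A minimal sketch of how a Java validator option could be wired in from Python, assuming the HL7 validator_cli.jar has been downloaded locally and the MIMIC profiles are available as an IG package. The jar location, resource path, and package path below are hypothetical; parsing the validator report is left out.

```python
import subprocess
from pathlib import Path


def build_validator_command(jar_path, resource_paths, fhir_version="4.0.1", igs=()):
    """Build the argument list for the HL7 FHIR Java validator CLI."""
    cmd = ["java", "-jar", str(jar_path)]
    cmd += [str(p) for p in resource_paths]  # files (or folders) to validate
    cmd += ["-version", fhir_version]        # FHIR version the resources target
    for ig in igs:
        cmd += ["-ig", str(ig)]              # load each profile package / IG
    return cmd


def run_validator(cmd):
    """Run the validator; the returned process holds the report in stdout."""
    return subprocess.run(cmd, capture_output=True, text=True)


# Hypothetical paths, for illustration only.
cmd = build_validator_command(
    Path("validator_cli.jar"),
    [Path("tests/resources/patient.json")],
    igs=[Path("mimic-profiles/output/package.tgz")],
)
```

Building the command separately from running it keeps the HAPI/Java switch easy to test without a JVM present.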

Microbiology Updates

The structure of the microbiology data in FHIR is not the most straightforward. Go through it and confirm the data is coming across correctly. To test:

Observation_micro_test
- If no organism is attached to a test, make sure some information is passed on (currently it looks like it would just say that a test occurred)
- Check for duplicate organisms referenced
Observation_micro_org
- Ensure all micro_org have a test
- Check for micro_org without any susceptibility refs
- Check for duplicate org generation
Observation_micro_susc
- Check the interpretation ValueSet: a custom interpretation for every entry doesn't make sense. The MIMIC column that is used might allow custom text entry
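The checks above could be scripted roughly as follows. This sketch assumes the hasMember references have already been extracted from the generated resources into plain dicts of ids; those dict shapes are an assumption, not the pipeline's actual structures.

```python
from collections import Counter


def find_micro_issues(test_members, org_members):
    """
    test_members: ObservationMicroTest id -> list of ObservationMicroOrg ids (hasMember)
    org_members:  ObservationMicroOrg id -> list of ObservationMicroSusc ids (hasMember)
    """
    # How many times each organism is referenced across all tests.
    org_ref_counts = Counter(org for orgs in test_members.values() for org in orgs)
    return {
        "tests_without_org": sorted(t for t, orgs in test_members.items() if not orgs),
        "tests_with_duplicate_org": sorted(
            t for t, orgs in test_members.items() if len(orgs) != len(set(orgs))
        ),
        # Counter returns 0 for organisms that no test references.
        "orgs_without_test": sorted(o for o in org_members if org_ref_counts[o] == 0),
        "orgs_referenced_more_than_once": sorted(o for o, n in org_ref_counts.items() if n > 1),
        "orgs_without_susc": sorted(o for o, s in org_members.items() if not s),
    }
```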

Misc questions

Do we need/want a Specimen resource
- There is an id for specimen but not much else, still probably worth it
Do dilution values need to be stored in ObservationMicroSusc?
Add invariant to micro profiles to specify one of hasMember or valueString MUST be present
- Maybe not necessary for ObservationMicroSusc since there is no hasMember here
- US Core already has an invariant at the top level that covers this!

Medication updates

An update to the medication resources in FHIR needs to be done. The following updates to each resource:

Medication- MedicationMix
- Update so only distinct medication mixes are generated (currently one is generated for every prescription). Look into using product_description/product_code. Currently prescriptions give ~23,000 medication resources, but only 10,625 are straight medication resources and only ~2,000 prescriptions involve multiple drugs. The rest of the prescriptions are a single drug (so they should be mapped to a Medication resource instead of a medication mix).
- Debate the UUID source text: should it be product_code or some concatenation of values? It is currently pharmacy_id, which won't work once distinct medication mixes are made
- Add medication-drug binding to mimic-medication profile (not bound right now until the medication/medication mix sorted)
MedicationAdministration
- Set provenance for administration (icu vs hosp). But also think of better provenance groupings than icu/hosp
  - Do we set this as the EventHistory provenance or add an extension for the info?
- Reference the medication mixes via product_description (found in emar_detail for hosp, not in inputevents or d_items). ICU medadmin should be fine without medication mixes, as the ICU medications are a subset of ~400 meds used only in the ICU
- Add in SIG to the text field (or see if there is a good spot for it, SIG may go in MedicationRequest actually)
- dosage.dose.value must be numeric, but the source column emar.dose_due provides a mix of text and numeric values. Need to either decide on some logic to deal with this, or have multiple MedicationAdministrations for the dose_given?
  - Multiple admins, one for each dose given (but dose_given still contains text... so that needs to be stripped out)
  - Potentially have a parent med admin with the dose_due information (kind of like a Dispense resource, but there is no way to connect a MedicationDispense resource to a MedicationAdministration resource)
MedicationRequest
- Add medication requests not in pharmacy. Pull from prescriptions table
- Add fhir_etl table for status element for mapping of MIMIC-IV status value to FHIR standard value. status element is bound in MedicationRequest as a code, so no codesystem can be supplied like in a CodeableConcept. Need this updated so that profiles will validate in anneal.
- Add additional info to dosageInstruction. Look into the prescriptions and pharmacy columns for anything on dose, rate, duration, etc. Document this in the MIMIC-IV to FHIR mappings file
- Reference medication mixes via the updated medication-codes
MedicationDispense
- Investigate the value of adding MedicationDispense. It can link to MedicationAdministrations
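One way to generate only distinct medication mixes with stable ids is to key each mix on the sorted set of its component product codes and derive the id with a name-based UUID (uuid5), so the same combination always maps to the same id no matter which prescription produced it. The namespace below is a placeholder (the pipeline's own fhir_etl.uuid_namespace would presumably be used instead), and the pharmacy_id -> product codes input shape is assumed.

```python
import uuid

# Placeholder namespace for illustration; not the project's real namespace.
MEDICATION_MIX_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "https://example.org/mimic-fhir/medication-mix"
)


def medication_mix_id(product_codes):
    """Deterministic id: the same set of codes always yields the same UUID."""
    key = "--".join(sorted(set(product_codes)))
    return str(uuid.uuid5(MEDICATION_MIX_NAMESPACE, key))


def distinct_mixes(prescriptions):
    """
    prescriptions: pharmacy_id -> list of product codes (assumed shape).
    Returns mix id -> sorted codes, only for prescriptions with more than one drug;
    single-drug prescriptions would map to a plain Medication instead.
    """
    mixes = {}
    for codes in prescriptions.values():
        unique = sorted(set(codes))
        if len(unique) > 1:
            mixes[medication_mix_id(unique)] = unique
    return mixes
```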

Misc To Do

Delete fhir_etl.map_drug_id. NDC mapping will be done later, so all drugs are pulled into the Medication resource
Map out medication landscape for MIMIC (jupyter notebook)
- Example code from MIMIC-IV playing with medications: https://colab.research.google.com/drive/1REu-ofzNzqsTT1cxLHIegPB0nGmwKaM0?usp=sharing
- Look into emar, emar_detail, and inputevents
- Look at specific cases for common medication types
  - Pills - ranitidine/acetaminophen
  - Infusion - heparin/norepinephrine. Heparin will have a varying dose_due from the titration.
  - Antibiotic - vancomycin. See how it is administered (ie IV)
  - Saline - good example of med admin with NULL medication
Medication-site codesystem needs to be cleaned up
- Looks like a lot of free text in here... so mapping some of that?

mimic-profiles

Create FHIR profiles to bind custom MIMIC valuesets

QA patient ethnicity/race

Currently passing in ethnicity from MIMIC for both race and ethnicity. Will definitely need to update race to map to proper race/ethnicity.

Thoughts:

Also is it even validating the extension then? Cause I've just been passing MIMIC values right in...
Map race to proper values

Java validator against local files

Add example files for the validator to test. This will decouple the loading process from the db.

But then that is really only a validation of the validator... so not really worth it? I guess it says the validator is working BUT it does not test any current data.

Micro specimen add type

The microbiologyevents table has a column spec_type_desc that could be translated to the Specimen type.

Thoughts:

May recombine Specimen, since both labs and micro will be mapping type
Compare labs and micro type, see if they fully overlap
Create new terminology if needed

Fix argument parsing for py_mimic_fhir

I noted two potential bugs when running python py_mimic_fhir:

If you do not have the MIMIC_FHIR_LOG_PATH environment variable set and you try to use the --log_path input argument, it doesn't work (it complains that log_path is not provided)
If you provide a directory that does not exist, it fails. For logs we should probably warn and create the folder.
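A minimal sketch of the intended behaviour, assuming argparse is used: the environment variable becomes the default for --log_path instead of a hard requirement, and a missing directory is created with a warning. The parser here is illustrative and not py_mimic_fhir's actual argument setup.

```python
import argparse
import logging
import os
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="py_mimic_fhir")
    parser.add_argument(
        "--log_path",
        # Fall back to the env var only when the flag is not given.
        default=os.environ.get("MIMIC_FHIR_LOG_PATH"),
        help="Directory for log files (defaults to $MIMIC_FHIR_LOG_PATH)",
    )
    args = parser.parse_args(argv)
    if args.log_path is None:
        parser.error("log_path not provided: pass --log_path or set MIMIC_FHIR_LOG_PATH")
    log_dir = Path(args.log_path)
    if not log_dir.exists():
        # Warn and create the folder rather than failing.
        logging.warning("Log directory %s does not exist, creating it", log_dir)
        log_dir.mkdir(parents=True, exist_ok=True)
    args.log_path = log_dir
    return args
```

Because the parser is built inside parse_args, the environment variable is read on each call rather than once at import time.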

Bundling resources

Bundle resources so that references are detected automatically and resources are sent to HAPI more efficiently. To Do:

Test referential integrity inside bundles
- Encounter and Condition
- Observation micro (test, org, and susc)
Microbiology needs to be bundled, lots of inter resource referencing going on
Test patient bundles. Create bundles of all the resources with links to a single patient
Verify maximum bundle size created for one patient
- I think the labevents/chartevents will be what pushes this over the edge
- Could create a base patient bundle and then send out labevents/chartevents as their own bundles afterwards
Create bundling order
1. Start with data resources: Organization/Medication
2. Patient related: patient/encounter/condition/procedure
3. Micro: ObservationMicroTest/ObservationMicroOrg/ObservationMicroSusc
4. Labs: ObservationLabs
5. Meds: MedicationRequest/MedicationDispense/MedicationAdministration
6. ICU base: EncounterICU/MedicationAdministrationICU
7. ICU Observations: ObservationChartevents/ObservationDatetimeevents/ObservationOutputevents
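The bundling order above could be encoded as a list of groups, each wrapped in a FHIR transaction Bundle that uses PUT so resource ids are preserved. This is a sketch under assumptions: resources are pre-grouped by profile label (the labels mirror the list and are not necessarily the pipeline's names), and bundle size limits are ignored. Within a single transaction the server resolves entries together, so the ordering mainly matters between bundles.

```python
# Group order taken from the list above; labels are profile names, not resourceTypes.
BUNDLE_ORDER = [
    ("data", ["Organization", "Medication"]),
    ("patient", ["Patient", "Encounter", "Condition", "Procedure"]),
    ("micro", ["ObservationMicroTest", "ObservationMicroOrg", "ObservationMicroSusc"]),
    ("labs", ["ObservationLabs"]),
    ("meds", ["MedicationRequest", "MedicationDispense", "MedicationAdministration"]),
    ("icu_base", ["EncounterICU", "MedicationAdministrationICU"]),
    ("icu_obs", ["ObservationChartevents", "ObservationDatetimeevents", "ObservationOutputevents"]),
]


def transaction_bundle(resources):
    """Wrap resources in a transaction Bundle; PUT keeps the existing ids."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "resource": r,
                "request": {"method": "PUT", "url": f"{r['resourceType']}/{r['id']}"},
            }
            for r in resources
        ],
    }


def ordered_bundles(resources_by_profile):
    """Yield (group name, bundle) in BUNDLE_ORDER, skipping empty groups."""
    for group, profiles in BUNDLE_ORDER:
        resources = [r for p in profiles for r in resources_by_profile.get(p, [])]
        if resources:
            yield group, transaction_bundle(resources)
```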

Pathling import of mimic-fhir resources

For the tutorial demo of mimic-fhir, my hope was to use Pathling to demonstrate different FHIR functions. But I am hitting a roadblock right off the bat: the $import is failing. The steps I have taken:

Started Pathling with Docker: docker run --rm -p 8080:8080 aehrc/pathling
Put patient.ndjson in the /usr/share/staging folder (the default for Pathling)
Created the parameters and posted them to the Pathling server using this notebook

But I am getting an error saying the file could not be read. I've given all permissions to the /usr/share/staging folder, so that should be fine. @jpwiedekopf you had mentioned using Pathling, any ideas?
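One thing worth checking: the docker run command above mounts no volume, so a file placed in /usr/share/staging on the host is not visible inside the container unless it is mounted (e.g. with -v "$PWD/staging:/usr/share/staging"). For reference, a sketch of the $import request body, assuming the Parameters shape described in the Pathling documentation (one "source" parameter per file with resourceType and url parts); the file URL is illustrative.

```python
import json


def pathling_import_parameters(sources):
    """sources: list of (resourceType, url) pairs, one per NDJSON file."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {
                "name": "source",
                "part": [
                    {"name": "resourceType", "valueCode": resource_type},
                    {"name": "url", "valueUrl": url},
                ],
            }
            for resource_type, url in sources
        ],
    }


params = pathling_import_parameters(
    [("Patient", "file:///usr/share/staging/patient.ndjson")]
)
body = json.dumps(params, indent=2)  # POST this to the server's $import endpoint
```

The URL must point at the path as the container sees it, which is why the volume mount matters.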

Update fhir_etl.subjects with demo script

Need to download and install MIMIC-CXR for this, then add this to fhir_etl.subjects:

```sql
-- Per-subject MIMIC-CXR study counts
WITH cxr AS (
    SELECT subject_id,
        MIN(study_id) AS study_id_min,
        MAX(study_id) AS study_id_max,
        COUNT(*) AS n
    FROM mimic_cxr.record_list
    GROUP BY 1
)
-- One row per ICU stay, for subjects in the 2011 - 2016 anchor year groups
SELECT pt.subject_id,
    gender,
    anchor_age,
    anchor_year,
    anchor_year_group,
    dod,
    cxr.n AS n_cxr,
    ie.stay_id,
    ie.intime,
    ie.outtime,
    -- rn: order of the ICU stay within a subject
    ROW_NUMBER() OVER (
        PARTITION BY pt.subject_id
        ORDER BY ie.intime
    ) AS rn,
    -- rank: subject order (1, 2, 3, ...) shared by all of a subject's stays
    DENSE_RANK() OVER (
        ORDER BY pt.subject_id
    ) AS rank
FROM mimic_core.patients pt
    INNER JOIN mimic_icu.icustays ie ON pt.subject_id = ie.subject_id
    LEFT JOIN cxr ON pt.subject_id = cxr.subject_id
WHERE anchor_age > 0
    AND anchor_year_group IN ('2011 - 2013', '2014 - 2016')
ORDER BY pt.subject_id,
    ie.intime
LIMIT 134;
```

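Limit to 100 subjects potentially, since that is the demo size we are going for. Note that LIMIT counts rows, and the query returns one row per ICU stay, so capping subjects would mean filtering on rank <= 100 instead.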

Update medication mapping

With the latest release of MIMIC-IV, the medication tables have been updated to facilitate better medication mapping in FHIR.

New columns:

prescriptions
- poe_id
- poe_seq
- formulary_drug_cd

What the new values provide is a link between emar and prescriptions, but ICU medications will still just be text fields...

How medication will be mapped:

Medication
- Pull in all formulary drug codes, this will be foundation for hospital medication
- Will also need to pull all names for ICU medications. These are just simple medications, so store the full name. Convert them to NDC/GSN in the anneal step
  - have itemid and label for code/display
- And maybe all names from pharmacy, since pharmacy has no formulary drug codes.
MedicationRequest
- Pulls in from Prescriptions
MedicationDispense
- Pulls from Pharmacy
MedicationAdministration
- Pulls from emar_detail
MedicationAdministrationICU
- Pulls from inputevents

Potential issues

In emar, product codes are not always present; sometimes product_description is present without the code...
- this is where poe_id would come in handy for some of these
Can link prescription drug codes to emar_detail drug codes, but there is no real mapping to pharmacy... This is where it could come back to having single drug names.
- Potentially create single drug names out of ICU meds and pharmacy meds, then have meds from formulary drug codes
- Mapping from MedicationRequest->MedicationAdministration seems straightforward, MedicationDispense less so...
Should extra codes be stored? Ie if there is an NDC and a GSN, should we store both, with a primary code and a secondary one?
Common identifiers for hosp medications
- pharmacy_id is present in pharmacy, prescriptions and emar_detail for the majority, but this would create a TON of duplicate medications if that was the source.

Potential solutions:

Create a separate medication profile for ICU, so maps to the itemid/labels (only 474 drugs)
- Then have a main medication profile that takes formulary drug codes, and skip pharmacy??

Add comments to ObservationMicroSusc and ObservationMicroTest

Add comments to Observation.note for both ObservationMicroSusc and ObservationMicroTest.

For ObservationMicroSusc, store all comments in the note.

For ObservationMicroTest, the comments are currently stored in the valueString. Might be worth having them in both spots? Debate this. The comment is the result when there is no other result...

Export patient bundles (vs resources)

Add functionality to export patient bundles in place of just a full export of resources. Currently each resource type is output with all of its associated resources in one file. Exporting in patient bundles seems to be the community push (but some bundles will be very big)

Patient bundles may need to be organized by patient and encounter, since an individual patient could have ~50,000 resources on their own (one patient has 42,000 chartevents).
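A sketch of the patient-and-encounter split, assuming each resource carries a standard encounter.reference and that a fixed maximum entry count per bundle is acceptable (the default of 1000 is an arbitrary placeholder, not a measured HAPI limit). Resources without an encounter, such as Patient itself, land in a None group.

```python
from collections import defaultdict


def group_by_encounter(resources):
    """Group one patient's resources by encounter reference (None if absent)."""
    groups = defaultdict(list)
    for r in resources:
        ref = r.get("encounter", {}).get("reference")  # e.g. "Encounter/123"
        groups[ref].append(r)
    return dict(groups)


def patient_bundles(resources, max_entries=1000):
    """Split a patient's resources into (encounter ref, chunk) pieces of bounded size."""
    if max_entries < 1:
        raise ValueError("max_entries must be positive")
    bundles = []
    for encounter_ref, group in group_by_encounter(resources).items():
        for i in range(0, len(group), max_entries):
            bundles.append((encounter_ref, group[i:i + max_entries]))
    return bundles
```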

Limit in Observation Labs SQL

Hey,
I was looking at your SQL scripts and wondering why the labs script is limited to 1000 at the end?

Confirm bundles validating codes

Check that bundles throw errors if invalid codes are present

Integrate terminology script into py_mimic_fhir

Terminology is currently generated in a jupyter notebook, move this into the py_mimic_fhir package.

Steps to do this:

Do this as its own branch after cli branch is complete
Add terminology module from code in the jupyter notebook
Add CLI options to run the terminology through
Test CLI terminology generation

Use a different variable than "HOST" in env

The $HOST variable used in the .env is actually a standard environment variable. We should use something more specific like SQLHOST or something similar.

ConceptMaps needed

This is a list of ConceptMap resources that will likely be needed in the future

Encounter
- type from MIMIC-IV admissions ontology to US Core Encounter Type
- class from MIMIC-IV admissions ontology to US Core Act Encounter Code
- hospitalization.admitSource from MIMIC-IV admissions ontology to AdmitSource
Encounter (ICU)
- type from MIMIC-IV icustays ontology to US Core Encounter Type
- type from MIMIC-IV icustays ontology to US Core Act Encounter Code
Condition
- code - in MIMIC-IV it is a mix of ICD-9 / ICD-10 from multiple years.
Medication
- code - in MIMIC-IV it is NDC/GSN/text, mapped to Medication Clinical Drug
MedicationAdministration
- method from event_txt in emar and ordercategorydescription in inputevents, map to SNOMED codes in Administration Method Codes
- route from emar_detail and pharmacy could be mapped to SNOMED route codes
MedicationAdministrationICU
- category from inputevents ordercategoryname could map to Medication Admin Category. MIMIC provides much more granularity here, so we could keep the new CodeSystem; otherwise the category would effectively become just inpatient/outpatient.
MedicationRequest
- status is converted within the MIMIC-IV to FHIR code, but should have a conceptmap as well
- route to FHIR route codes
MedicationRequest
- method is from emar event_txt, should go to SNOMED administration code valueset
- route is here too
ObservationLabs
- interpretation is from labevents flag, need to map into Observation Interpretation
ProcedureICU
- bodysite is from procedureevents location, need to map into SNOMED bodysite
- category is from procedureevents ordercategoryname, need to map into Procedure Category
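As an example of the shape these could take, a sketch of a minimal R4 ConceptMap for the ObservationLabs interpretation item, assuming labevents.flag uses the value 'abnormal'. The source CodeSystem URL is hypothetical; the target is the HL7 v3 ObservationInterpretation code system.

```python
def concept_map(map_id, source_system, target_system, pairs):
    """Build a minimal R4 ConceptMap from (source code, target code, target display) triples."""
    return {
        "resourceType": "ConceptMap",
        "id": map_id,
        "status": "draft",
        "group": [
            {
                "source": source_system,
                "target": target_system,
                "element": [
                    {
                        "code": src,
                        # R4 requires an equivalence on every target.
                        "target": [{"code": tgt, "display": disp, "equivalence": "equivalent"}],
                    }
                    for src, tgt, disp in pairs
                ],
            }
        ],
    }


labs_interpretation = concept_map(
    "mimic-lab-flag-to-interpretation",
    "http://example.org/mimic/CodeSystem/lab-flags",  # hypothetical source CodeSystem URL
    "http://terminology.hl7.org/CodeSystem/v3-ObservationInterpretation",
    [("abnormal", "A", "Abnormal")],
)
```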

Add provenance element

How to handle lab results in comments

There are quite a few results that appear to be reported in the comments. For example, there are ~4 million labevents entries that have no value or valuenum but have comments='NEG.'. This seems like a decent chunk of data that would not translate over properly to FHIR without some parsing.

Thoughts

Could pull comments into valueString if labevents value and valuenum are missing
- Only issue is when the comments are excessive and not really a result. Works nicely if negative is the result, but not so nicely when it is a full comment
comments are currently being captured in FHIR notes, is that enough?
top offenders that could be translated first
1. NEG -> Negative (~4 million results)
2. Normal (~880,000 results)
3. Rare (~210,000)
4. None (~200,000)
confusing comments with a lot of hits, that could be translated
1. Random... don't know what this means really here (~650,000)
2. Hold -> kinda like a registered result then? (~572,000)
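A sketch of the valueString fallback, under the assumption that only short comments matching a small lookup table are promoted to the result while every comment is still kept in Observation.note. The translation table mirrors the top offenders listed above but is hypothetical, and unit handling for valueQuantity is omitted.

```python
# Hypothetical translations for the most frequent result-like comments;
# comments are normalised (trailing "." stripped, upper-cased) before lookup.
COMMENT_RESULTS = {
    "NEG": "Negative",
    "NORMAL": "Normal",
    "RARE": "Rare",
    "NONE": "None",
}


def lab_value_and_note(row):
    """Build Observation value/note elements from one labevents row (a dict)."""
    obs = {}
    comment = (row.get("comments") or "").strip()
    key = comment.rstrip(".").upper()
    if row.get("valuenum") is not None:
        obs["valueQuantity"] = {"value": row["valuenum"]}
    elif row.get("value"):
        obs["valueString"] = row["value"]
    elif key in COMMENT_RESULTS:
        # Only short, recognised comments are promoted to the result.
        obs["valueString"] = COMMENT_RESULTS[key]
    if comment:
        # The full comment is still kept in Observation.note.
        obs["note"] = [{"text": comment}]
    return obs
```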

Test ValueSet reference and expansion

Test referencing the ValueSet in the CodeSystem, then expanding it when it arrives in the FHIR server.

Thoughts:

May need the ValueSets for generating the implementation guide - mimic-profiles
- Test IG generation first without the ValueSets to see if this is possible
- Add reference to valuesets in codesystems with ValueSets gone and remake IG
Test ValueSet expansion with HAPI FHIR

NamingSystem for identifiers

Currently, any identifiers in mimic-fhir reference custom URLs such as http://mimic.fhir.mit.edu/identifier/patient, but nothing is defined at those URLs. It was brought up that a NamingSystem could make sense here. Will look into this further!
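For reference, a sketch of what a minimal R4 NamingSystem for the patient identifier URI could look like. The required elements (name, status, kind, date, uniqueId) follow the R4 NamingSystem resource; the name and description values are illustrative.

```python
from datetime import date


def naming_system(name, identifier_uri, description):
    """Minimal R4 NamingSystem declaring a mimic-fhir identifier URI."""
    return {
        "resourceType": "NamingSystem",
        "name": name,                      # computer-friendly name
        "status": "draft",
        "kind": "identifier",              # an identifier system, not a code system
        "date": date.today().isoformat(),  # required publication date
        "description": description,
        "uniqueId": [
            {"type": "uri", "value": identifier_uri, "preferred": True}
        ],
    }


patient_ns = naming_system(
    "MimicPatientIdentifier",
    "http://mimic.fhir.mit.edu/identifier/patient",
    "Patient identifiers in mimic-fhir",
)
```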

Update create_fhir_tables with fhir_etl

Add fhir_etl tables that are used to generate the main tables. These are primarily tables for required bindings ie patient gender or procedure status.

Clean HAPI Server - Deleting old data on HAPI

We need a method to cleanse the server between runs to ensure there are no hanging references.

There are a couple methods to clean the server when it is up and running:

$delete - will delete all resources of a certain type, but keeps a record of the patient (ie patient was deleted)
- $delete-cascading - can give a resource and it will delete it and any references to it, so we can pass patients and it would delete all resources related with that patient
$expunge - will remove any record of the patient
$delete-expunge - will delete and then expunge (seems redundant if expunging anyways)

What I've tested:

$delete - works
$delete-cascading - works but is a little slow, nice to have everything associated with a patient deleted
$expunge - Does NOT work. Posted to chat.fhir.org to see if there are any ideas. Kinda need expunge more than delete

Stopgap solutions:

Drop the database that HAPI is looking at. The next launch of HAPI will remake the database, but with more overhead at startup (normally ~5 minutes, but ~20 minutes when the database is rebuilt)
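For the $expunge attempt, a sketch of the Parameters body HAPI's $expunge operation accepts (expungeEverything, or expungeDeletedResources/expungePreviousVersions), POSTed to [base]/$expunge. One possible cause worth ruling out is that expunge is disabled in the server's JPA storage settings, where it has to be switched on explicitly; that is a hypothesis to check, not a confirmed diagnosis.

```python
def expunge_parameters(everything=False, deleted=True, previous_versions=True):
    """Build a Parameters body for HAPI's $expunge operation."""
    if everything:
        # Removes all data from the server, not just deleted resources.
        params = [{"name": "expungeEverything", "valueBoolean": True}]
    else:
        params = [
            {"name": "expungeDeletedResources", "valueBoolean": deleted},
            {"name": "expungePreviousVersions", "valueBoolean": previous_versions},
        ]
    return {"resourceType": "Parameters", "parameter": params}
```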

Pull in transfers as location for encounters

The mimc4fhir team pulled in mimic_core.transfers. They set the transfer information in the Encounter.

What we can do then:

Create careunit Locations, these will be referenced in the Encounters
Pull in Location history to Encounters based on time window of Encounter

Pull in demo subjects

Demo mimic-fhir store should be based on the patients found in the MIMIC-IV to OMOP mapping.

Alistair sent over list of patients that should be pulled in, so write those subjects to fhir_etl.subjects. This will affect all tables

Versioning Terminology based on MIMIC

The terminology in mimic-fhir should reference the version of MIMIC that it is based on.

There are a couple spots to put this in CodeSystem/ValueSet:

version: this is the business version of the codesystem, but we could map this to the mimic version?
useContext.value[x]: could store the mimic version here to be the context of creating the mimic-fhir terminology
purpose: why the code system is defined, so the version could go here. But this is a markdown datatype

Catch failed bundles and write out to file

When sending bundles to HAPI FHIR catch any failed bundles and write out to file. These can then be rerun later.

To Do

Catch failed bundles
Output list of failed resources/bundles
- Output resources or bundle? I guess I need to output the patient associated with the resources
- Output error message too
Find way to rerun failed bundles
- Need a way to create bundle from a list of resources file
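A sketch of the catch/record/rerun loop, assuming failed bundles are appended to a JSON Lines file together with the patient id and error message, and that post_bundle is whatever function currently sends a bundle to HAPI and raises on failure (both the file layout and that callable are assumptions).

```python
import json
from pathlib import Path


def record_failure(path, patient_id, bundle, error):
    """Append one failed bundle as a JSON line so it can be rerun later."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"patient_id": patient_id, "error": str(error), "bundle": bundle}) + "\n")


def load_failures(path):
    """Read back all recorded failures (empty list if none were written)."""
    path = Path(path)
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def rerun_failures(path, post_bundle):
    """Re-post each failed bundle; return the records that failed again."""
    still_failing = []
    for record in load_failures(path):
        try:
            post_bundle(record["bundle"])
        except Exception as exc:
            still_failing.append({**record, "error": str(exc)})
    return still_failing
```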

mimiciv_ed.edstays referenced in fhir_patient.sql is not part of mimiciv 2.2?

Hi, when running fhir_patient.sql it doesn't find mimiciv_ed.edstays, I believe this table is not part of MIMIC-IV v2.2, or am I missing something?
The script runs fine when the related code is removed (just using adm_RACE).

Test demo data validate and export

Test the 100 demo patients through the current pipeline to see timing and issues.

Steps to test

Validate patient and all related resources - send to hapi fhir
Export all resources - pull from hapi fhir
- Need to clean the database first, because it has old resources, so otherwise it wouldn't be an accurate test

ICU observation warning

The ICU observations (chartevents, datetimeevents, outputevents) all have a warning column. There is no real mapping into the Observation resources for it. Is it worth creating an extension for warning? Also what does the warning even tell us?

I want to get a better idea of the warning column before converting into an extension

Server Tests

Tests to make sure HAPI server setup is working for basic resource actions. Tests:

Check a custom valueset works with $validate-code
Check a custom codesystem works with $validate-code
PUT each resource (ie Patient, Encounter etc)
- Will need to make sure there is some sequential order since most resources reference patient/encounter
Bundle transaction

Add CLI functionality to py_mimic_fhir

Add CLI functionality to run through py_mimic_fhir. I see it with two modes:

Validation

validator argument with options Java or HAPI
output argument with options: resources or bundles
env argument to point to env file? Or have default assumption probably

Terminology

Generate all the terminology through this
Just start with option to generate all (very fast so not really an issue)
Could add option to create one
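The two modes map naturally onto argparse subcommands. A sketch, assuming the option names from the lists above; the subcommand names, defaults, and the --only flag for generating a single resource are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="py_mimic_fhir")
    sub = parser.add_subparsers(dest="mode", required=True)

    # Validation mode: which validator, and whether to send resources or bundles.
    validation = sub.add_parser("validate", help="Validate resources")
    validation.add_argument("--validator", choices=["java", "hapi"], default="hapi")
    validation.add_argument("--output", choices=["resources", "bundles"], default="bundles")
    validation.add_argument("--env", default=".env", help="Path to env file")

    # Terminology mode: generate everything, optionally just one item.
    terminology = sub.add_parser("terminology", help="Generate terminology")
    terminology.add_argument("--only", metavar="NAME", help="Generate a single CodeSystem/ValueSet")
    return parser
```

With this layout, python -m py_mimic_fhir validate --validator java would select the Java path, assuming the package exposes a __main__ entry point.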

Encounter References Hosp v ICU

ICU tables were converted to FHIR resources with references to only the ICU encounter, not hospital encounter. The limitation is that FHIR only allows one Encounter reference from the resources.

Current solution:

The ICU resources reference the ICU encounter
The ICU encounter references the Hosp encounter

To Do

Think if there is a better way to link ICU resources to the base Hosp encounter
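If the ICU-to-Hosp link is stored in Encounter.partOf (an assumption about the current solution), the base Hosp encounter for any ICU resource can be recovered by following partOf upward without changing the resources themselves. A sketch with encounters held in a dict keyed by reference:

```python
def hosp_encounter_for(resource, encounters):
    """
    Follow Encounter.partOf from a resource's encounter up to the top-level encounter.
    encounters: "Encounter/<id>" -> Encounter resource dict.
    """
    ref = resource.get("encounter", {}).get("reference")
    seen = set()
    while ref and ref in encounters and ref not in seen:
        seen.add(ref)  # guards against accidental partOf cycles
        parent = encounters[ref].get("partOf", {}).get("reference")
        if parent is None:
            return ref  # no parent: this is the base (Hosp) encounter
        ref = parent
    return ref
```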

Calculate largest bundle, use as benchmark in tests

Find the largest bundle in our demo dataset. This can be a benchmark to see what might break the bundling and validation.
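A sketch of finding that benchmark, assuming resources link to their patient through subject.reference (true of most but not all resource types, so this is an approximation):

```python
from collections import Counter


def largest_patient(resources):
    """Return (patient reference, resource count) for the patient with the most resources."""
    counts = Counter()
    for r in resources:
        if r["resourceType"] == "Patient":
            counts[f"Patient/{r['id']}"] += 1
        else:
            ref = r.get("subject", {}).get("reference")
            if ref:
                counts[ref] += 1
    return counts.most_common(1)[0] if counts else (None, 0)
```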

Update micro specimen codesystem

The microbiologyevents table has codes and labels for all specimens. Only spec_type_desc is being taken in right now; spec_itemid should have been pulled in as well. So the update would be:

New codesystem
- code: spec_itemid
- display: spec_type_desc

Currently just using spec_type_desc as the code

Look into passing bundles in parallel

Currently passing bundles in a for loop. But HAPI seems to have a bit of a limit on incoming bundles, so this may not be as big an issue.

Attempted to pass bundles using list comprehension but was 7-8 times slower.

Other ideas

map requests
threading
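For the threading idea, a minimal sketch using concurrent.futures, where post_bundle stands in for whatever function currently posts one bundle to HAPI (hypothetical here). Threads suit this case because the work is I/O-bound HTTP; a list comprehension still runs sequentially, so it would not be expected to help. Results come back in input order, and an exception raised by any post is re-raised when results are collected.

```python
from concurrent.futures import ThreadPoolExecutor


def post_bundles_threaded(bundles, post_bundle, max_workers=4):
    """Post bundles concurrently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(post_bundle, bundles))
```

Keeping max_workers small avoids overwhelming the server, given the limit HAPI already seems to impose.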

Terminology generation verification

Need to make sure terminology generation can be run repeatably when updating between versions of mimic.

Current steps to generate terminology:

Run script in sql/codeystem to get the codes
Drop the codes into a template csv
Generate CodeSystem with python script
Create valueset csv that points to CodeSystem
Generate ValueSet with python script

To Do

Confirm all codesystems have a SQL script (some were simple, ie 5 codes, so a script may not have been created)
Create any missing codesystem scripts
Create Python function to call scripts and generate CodeSystem (bypass the csv step)
Add valueset and codesystem generation functions into py_mimic_fhir (is this needed? Created notebook right now)
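A sketch of the "bypass the csv step" function, assuming the SQL scripts return (code, display) rows, e.g. spec_itemid and spec_type_desc for the micro specimen codesystem. The optional version argument is one place the MIMIC version could be recorded; the id, URL, and sample rows in the usage are illustrative.

```python
def build_codesystem(cs_id, url, title, rows, version=None):
    """rows: iterable of (code, display) tuples, e.g. from a sql/codesystem query."""
    concepts = {}
    for code, display in rows:
        # Keep the first display seen for each code; codes are strings in FHIR.
        concepts.setdefault(str(code), display)
    cs = {
        "resourceType": "CodeSystem",
        "id": cs_id,
        "url": url,
        "title": title,
        "status": "draft",
        "content": "complete",
        "count": len(concepts),
        "concept": [{"code": c, "display": d} for c, d in concepts.items()],
    }
    if version:
        cs["version"] = version  # e.g. the MIMIC release the codes came from
    return cs
```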

Experience on the Quickstart

Hey,
I ran the script and not all tables went through without issues. The problem is the mappings in fhir_etl. Only subjects and uuid_namespace were created. The others were not found.

psql:fhir_patient.sql:117: ERROR: relation "fhir_etl.map_gender" does not exist LINE 52: LEFT JOIN fhir_etl.map_gender mg

Validate codes against custom valuesets

The resources have bindings to custom valuesets for MIMIC. We need to verify the codes are being validated properly.

Current issue:

A custom ValueSet for admission-class has been created and is bound to the Encounter class element, but validation is failing. The ValueSet is not being found by HAPI, so HAPI does not find the codes inside it.
- Posted on chat.fhir and working through this
- Tested valueset on Java validator and still not passing so may be issue with format of our URL or resource

To Do:

Validate admission-class valueset against Java validator
$validate-code against admission-class valueset
Validate Encounter resource against admission-class valueset on the HAPI Server
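For the $validate-code step, a sketch of building the GET request against ValueSet/$validate-code using the standard url, system, and code parameters; the base URL and the ValueSet/CodeSystem URLs in the usage are placeholders. The server answers with a Parameters resource whose result parameter is a boolean.

```python
from urllib.parse import urlencode


def validate_code_url(base, valueset_url, system, code):
    """Build a GET URL for ValueSet/$validate-code (values are percent-encoded)."""
    query = urlencode({"url": valueset_url, "system": system, "code": code})
    return f"{base.rstrip('/')}/ValueSet/$validate-code?{query}"


# Placeholder URLs for illustration.
check_url = validate_code_url(
    "http://localhost:8080/fhir/",
    "http://example.org/ValueSet/admission-class",
    "http://example.org/CodeSystem/admission-class",
    "AMB",
)
```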

Update terminology to match mimic names

Make sure the terminology that references MIMIC tables uses their exact table names.

ie datetime-d-items should be datetimeevents-d-items

To update:

datetime-d-items -> datetimeevents-d-items
procedure-d-items -> procedureevents-d-items
Look for others

ValueSets/fhir_etl needed

Go through valueset bindings of the resources and find which ones can be mapped to custom valuesets and which ones need to be mapped in Postgres with fhir_etl tables.

Review lab tables for more data to pull into fhir

Review labevents and confirm that all columns we want are making it into FHIR. The initial pass I did pulled in anything that I could map to the ObservationLab profile. There may need to be extensions to fit others (ie lab urgency).

Some notes:

All status are set to 'final' in fhir currently
- status can be set to registered/preliminary/final/amended

Refactor Bundler

The Bundler runs over each of the resources to bundle and validate. The class has grown and could potentially be better organized.

General thoughts:

post_all_bundles function is clunky and could probably be rewritten to avoid repetition. A lot of it is logging and recording responses.
- Is there a way to iterate over the class's objects here?
Could pass bundle list with groupings versus having individual bundle functions for each bundle. But the individual bundle functions are very nice for testing

None of these changes are necessary for Bundler to run, just potential refactor ideas

Timing estimate for full fhir generation in postgres

Estimate the time it will take to generate all the fhir resources in Postgres. Test 100,000 generation for each resource/profile.

ICD Codes

MIMIC ICD Codes are not strictly ICD-9 or ICD-10. There are codes in MIMIC that are obsolete ICD codes across ICD versions.

Current solution

created FHIR codesystems for MIMIC ICD-9 and ICD-10

To Do

Map codes into FHIR with concept mapping
Look into a way to utilize the FHIR ICD codesystems directly with MIMIC

Canonical ValueSet reference in CodeSystem

Set the canonical ValueSet in the CodeSystem if it captures the whole codesystem.

To test this:

Update a CodeSystem to reference the ValueSet
SUSHI Check
- Remove the ValueSet from the IG generation
- Regenerate the IG without the Valueset
HAPI Check
- Delete a ValueSet
- Post the updated CodeSystem
- Expand the valueset based on that

Test cases needed

To ensure the integrity of the data some tests need to be created:

Referential integrity
- Check any references in the scripts (ie patient, encounter, condition, medication) exist in the main table
- Encounter condition -> Condition
Check for null json strings
- Null json strings can be created when a system element is needed but the actual value element is missing
- Or references with null identifiers
