qmarcou / computer_assisted_icd_coding_paper

Companion code repository to Marcou et al. 2024, Creating a computer assisted ICD coding system: Performance metric choice and use of the ICD hierarchy

Home Page: https://doi.org/10.1016/j.jbi.2024.104617

Topics: hierarchical-learning, icd-9, imbalanced-learning, medication, mimic-iii, multitask-learning, omop-cdm, recommender-system

computer_assisted_icd_coding_paper's Introduction

This repository is a companion to the article "Creating a computer assisted ICD coding system: Performance metric choice and use of the ICD hierarchy", published in the Journal of Biomedical Informatics in March 2024.

In brief

This repository contains Python/TensorFlow code to reproduce the results of the aforementioned article. Due to licensing and privacy concerns, I cannot release the actual data in this repository, nor individual model predictions. The data used are, however, freely available to any researcher (see details below). For reproducibility, I provide the stay IDs of the different cross-validation splits, as these IDs do not convey any personal information.
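As a rough illustration of how published stay IDs can be used to rebuild a split, here is a minimal sketch. The record layout, the `stay_id` field name and the `fold_test_ids` dictionary are hypothetical and do not reflect the actual file format used in this repository.

```python
def split_by_stay_ids(records, test_ids):
    """Split records into (train, test) according to a list of held-out stay IDs."""
    test_ids = set(test_ids)
    train = [r for r in records if r["stay_id"] not in test_ids]
    test = [r for r in records if r["stay_id"] in test_ids]
    return train, test


# Hypothetical held-out stay IDs for two cross-validation folds.
fold_test_ids = {0: [101, 102], 1: [103, 104]}

# Hypothetical records; real ones would carry features and labels.
records = [{"stay_id": s} for s in (101, 102, 103, 104)]

train_0, test_0 = split_by_stay_ids(records, fold_test_ids[0])
print([r["stay_id"] for r in train_0], [r["stay_id"] for r in test_0])
```

Because only IDs are shared, anyone with access to the source data can rebuild the same folds without any patient-level information being distributed.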

The Python code preprocesses the MIMIC-III-OMOP data to enable training on medication data using RxNorm ingredients. The keras_extra companion package implements various custom TensorFlow objects, in particular the RE@R metric, hierarchical multilabel learning, and label imbalance correction techniques.
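To give an idea of what using a label hierarchy can mean in practice, the sketch below expands a set of leaf codes with all of their ancestors, so that a model can be trained on every level at once. This is a generic illustration, not the keras_extra implementation: the `with_ancestors` function and the toy `PARENT` mapping are assumptions made for the example.

```python
def with_ancestors(codes, parent):
    """Return the given codes plus every ancestor reachable through `parent`."""
    expanded = set()
    for code in codes:
        node = code
        # Walk up the hierarchy; stop at the root or at an already visited node,
        # whose ancestors were added during an earlier walk.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent.get(node)
    return expanded


# Toy hierarchy (illustrative only): leaf code -> category -> chapter range.
PARENT = {
    "250.00": "250",
    "250": "240-279",
    "401.9": "401",
    "401": "390-459",
}

stay_codes = {"250.00", "401.9"}
expanded_codes = with_ancestors(stay_codes, PARENT)
print(sorted(expanded_codes))
```

With labels expanded this way, a prediction that misses the exact leaf code but lands in the right category or chapter can still be credited at the coarser levels.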

Repository structure:

project
│
└───code  # contains all the python code
│   │   
│   └─── conda_env_files # contains files necessary to generate
│   │                    # reproducible conda environments via conda-lock 
│   └─── libs # contains all the custom python modules for the project and 
│   │     │   # the associated tests
│   │     └─── keras_extra # custom tensorflow/keras extensions 
│   └─── profiling # a few profiling tests
│   │
│   └─── scripts # contains the jupyter notebooks and learning scripts
│   
└─── data # contains mostly placeholders to ensure directory structure, and some
│         # scripts for PostgreSQL database extraction
│   
└─── models # directory structure to host models created by
            # the learning jobs

Data

The article relies on the MIMIC-III v1.4 dataset, which can be accessed on PhysioNet and mounted as a PostgreSQL database. The data were then mapped to the OMOP-CDM using the ETL scripts from Paris et al. The resulting tables of interest were dumped to CSV.

To further map the resulting OMOP data to RxNorm ingredients, I downloaded ontology data from the default OHDSI Athena vocabularies (v5.0 06-DEC-21). As chapter and subchapter information was missing from the vocabulary, it was added using R scripts contained in this repository.
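For readers unfamiliar with the OMOP vocabularies, the sketch below shows one plausible way to map drug concepts to their RxNorm ingredients. It assumes the mapping goes through the standard `concept_ancestor` table and keeps ancestors whose `concept_class_id` is `Ingredient`; the preprocessing notebooks may proceed differently. All concept IDs are made up, and `ingredients_by_drug` is a hypothetical helper.

```python
from collections import defaultdict

# Toy stand-ins for the OMOP `concept` and `concept_ancestor` tables.
# Column names follow the OMOP CDM; the IDs are made up for illustration.
concept = [
    {"concept_id": 1, "vocabulary_id": "RxNorm", "concept_class_id": "Ingredient"},
    {"concept_id": 2, "vocabulary_id": "RxNorm", "concept_class_id": "Ingredient"},
    {"concept_id": 10, "vocabulary_id": "RxNorm", "concept_class_id": "Clinical Drug"},
    {"concept_id": 11, "vocabulary_id": "RxNorm", "concept_class_id": "Clinical Drug"},
]
concept_ancestor = [
    {"ancestor_concept_id": 1, "descendant_concept_id": 10},
    {"ancestor_concept_id": 1, "descendant_concept_id": 11},
    {"ancestor_concept_id": 2, "descendant_concept_id": 11},
    # Self-link of a drug: ignored below because the ancestor is not an ingredient.
    {"ancestor_concept_id": 10, "descendant_concept_id": 10},
]


def ingredients_by_drug(concept_rows, ancestor_rows):
    """Map each descendant concept to the set of its RxNorm ingredient ancestors."""
    ingredient_ids = {
        c["concept_id"]
        for c in concept_rows
        if c["vocabulary_id"] == "RxNorm" and c["concept_class_id"] == "Ingredient"
    }
    mapping = defaultdict(set)
    for row in ancestor_rows:
        if row["ancestor_concept_id"] in ingredient_ids:
            mapping[row["descendant_concept_id"]].add(row["ancestor_concept_id"])
    return dict(mapping)


drug_to_ingredients = ingredients_by_drug(concept, concept_ancestor)
print(drug_to_ingredients)
```

A combination drug (concept 11 in the toy data) maps to several ingredients, which is why the values are sets rather than single IDs.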

Python Scripts

I provide fully reproducible Python environments for both CPU and GPU. They were generated with conda-lock, and the corresponding files are in the code/conda_env_files directory (see the README there).

Data preprocessing and checks are performed in the mimic-omop_checks_and_preproc and mimic-omop_hot_encoding Jupyter notebooks (to be run in that order).
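The hot encoding step can be pictured as turning each stay's set of items (for example RxNorm ingredient IDs) into a fixed-length 0/1 vector. The following is a minimal pure-Python sketch of that idea, not the notebook's code: `multi_hot`, the stay IDs and the item IDs are hypothetical.

```python
def multi_hot(stay_items, vocabulary=None):
    """Encode {stay_id: set of items} as a 0/1 matrix with one row per stay."""
    if vocabulary is None:
        # Default vocabulary: every item seen in the data, in sorted order.
        vocabulary = sorted({item for items in stay_items.values() for item in items})
    column = {item: j for j, item in enumerate(vocabulary)}
    stay_ids = sorted(stay_items)
    matrix = []
    for stay_id in stay_ids:
        row = [0] * len(vocabulary)
        for item in stay_items[stay_id]:
            if item in column:  # items outside the vocabulary are dropped
                row[column[item]] = 1
        matrix.append(row)
    return stay_ids, vocabulary, matrix


# Hypothetical stays with made-up ingredient IDs.
stay_items = {101: {1, 2}, 102: {2}, 103: set()}
stay_ids, vocabulary, matrix = multi_hot(stay_items)
print(stay_ids, vocabulary, matrix)
```

Passing an explicit `vocabulary` keeps the column order identical across cross-validation folds, so that the matrices built for training and test stays stay aligned.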

Neural network hyperparameter tuning and training scripts are contained in the learning_scripts folder (hyperSearch_sequentialNN_*.py files). The scripts should be run with that folder as the working directory.

Performance analysis and, more generally, the code producing the figures and tables of the article are contained in the mimic-omop_sequentialNN notebook.
