
dl-healthcare-final-project

Representation Learning of Electronic Healthcare Records for Downstream Prediction Tasks: Comparing Deep Techniques

This repository contains code and explanations for comparing a phenotype prediction task across three different representations of Electronic Health Record (EHR) data, all created with deep learning techniques.

The baseline comes from [1], a study that proposes four prediction tasks as benchmarks for comparing learned representations of structured EHR data. I compare the results of its phenotype prediction task against a similar task run on two other EHR representations: visit representations produced by Med2Vec [2] and patient representations produced by ConvAE [3].

The Benchmark study uses LSTM variants to create patient representations that retain temporal information. Its inputs include not only demographics and diagnoses but also items from chart and lab events. Its phenotype prediction task predicts the presence or absence of 25 selected care conditions (groups of related diagnoses) in the final sequence for a patient, given all previous sequences for that patient.
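The task framing described above (history in, multi-label target out) can be sketched as follows. This is only an illustration of the framing as stated in this README, not the Benchmark study's actual data pipeline; the function name `make_phenotype_example`, the three-label vocabulary, and the toy patient are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def make_phenotype_example(
    sequences: Sequence[Set[str]],
    label_vocab: Sequence[str],
) -> Tuple[List[Set[str]], List[int]]:
    """Split one patient's ordered sequences into (history, multi-hot target).

    The last sequence supplies the labels; everything before it is the input.
    """
    if len(sequences) < 2:
        raise ValueError("need at least one history sequence and one target sequence")
    history = [set(s) for s in sequences[:-1]]
    final = sequences[-1]
    # One binary entry per condition group: 1 if present in the final sequence.
    target = [1 if label in final else 0 for label in label_vocab]
    return history, target


# Hypothetical condition groups (the real task uses 25 of them).
LABELS = ["hypertension", "diabetes", "sepsis"]

# Toy patient: three temporally ordered sequences of condition groups.
patient = [{"hypertension"}, {"hypertension", "diabetes"}, {"diabetes", "sepsis"}]

history, target = make_phenotype_example(patient, LABELS)
```

Because several conditions can be present at once, the target is a multi-hot vector rather than a single class index, which is why this kind of task is usually scored per label.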

Med2Vec uses a multi-layer perceptron and Skip-Gram to learn interpretable dense vectors of patient visits. Each visit is represented as demographic data and co-occurring diagnosis codes, within temporally ordered sequences of visits per patient. Its initial phenotype prediction task consists of predicting the (grouped) diagnoses of a visit given the codes of preceding visits within a specified window of time.
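To make the Skip-Gram idea concrete, here is a minimal sketch of how training pairs could be enumerated at two levels: neighboring visits within a context window, and co-occurring codes within a single visit. It is an assumption-laden illustration, not the Med2Vec implementation; the helper names `visit_context_pairs` and `code_cooccurrence_pairs` and the window measured in number of visits are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def visit_context_pairs(n_visits: int, window: int) -> List[Tuple[int, int]]:
    """Return (target_visit, context_visit) index pairs within +/- window visits."""
    pairs = []
    for t in range(n_visits):
        lo = max(0, t - window)
        hi = min(n_visits - 1, t + window)
        for c in range(lo, hi + 1):
            if c != t:  # a visit is not its own context
                pairs.append((t, c))
    return pairs


def code_cooccurrence_pairs(visit: Sequence[str]) -> List[Tuple[str, str]]:
    """Return ordered (code, other_code) pairs for codes sharing one visit."""
    return [(a, b) for a in visit for b in visit if a != b]


# Toy example: a patient with three visits and a window of one visit.
visit_pairs = visit_context_pairs(n_visits=3, window=1)

# Toy visit with two placeholder diagnosis codes.
code_pairs = code_cooccurrence_pairs(["A", "B"])
```

In a Skip-Gram setup, each pair becomes one training signal: the representation of the first element is trained to predict the second.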

ConvAE uses Convolutional Neural Networks and stacked autoencoders on structured EHR data, and NLP techniques on unstructured clinical notes, to create general-purpose dense vectors of temporally ordered patient information. The original study does not present a phenotype prediction task; however, the authors claim that their general-purpose patient representations can be used for such a task.

The repository is split into one directory per study. Each directory contains my own code (some of it corrected from the studies' publicly available repositories) and notes on how to reproduce the results. The common_resources directory contains mappings used by all the models.

This repo is part of my final project for "Deep Learning for Healthcare" in the Spring 2021 semester of my Master's degree in Data Science at the University of Illinois Urbana-Champaign.

References

[1] Harutyunyan, H., Khachatrian, H., Kale, D., Ver Steeg, G. & Galstyan, A.
Multitask learning and benchmarking with clinical time series data
https://www.nature.com/articles/s41597-019-0103-9

[2] Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, Jimeng Sun
Multi-layer Representation Learning for Medical Concepts
https://arxiv.org/pdf/1602.05568.pdf

[3] Landi, I., Glicksberg, B. S., Lee, H. C., Cherng, S., Landi, G., Danieletto, M., Dudley, J. T., Furlanello, C., & Miotto, R.
Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit. Med. 3, 96 (2020).
https://www.nature.com/articles/s41746-020-0301-z
