This project preprocesses the OpenAPS Data Commons dataset into time-series-friendly .csv files. At the moment the code only preprocesses OpenAPS data; the OpenAPS Data Commons also includes Loop and AndroidAPS data.
It implements the following pattern finding techniques on this dataset:
- Statistics - see example Confidence Intervals, Violin Plots, Box plots & Heatmap Notebook
- K-means clustering - see example K-means Notebook (a minimal sketch follows this list)
- Matrix Profile - see example Matrix Profile Notebook
- Agglomerative Clustering - see example Agglomerative Clustering Notebook
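To give a flavour of what these techniques do, here is a minimal K-means sketch using scikit-learn. It is illustrative only: the data layout (one row per day, one column per hour) and all variable names are assumptions, not the notebooks' actual code.

```python
# Cluster daily profiles into 3 groups with K-means (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
daily_profiles = rng.random((100, 24))  # stand-in for 100 days of hourly IOB means

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(daily_profiles)  # cluster id for each day
print(labels[:10], kmeans.cluster_centers_.shape)  # centroids: (3, 24)
```

The example notebooks listed above show how the project actually applies these techniques to the preprocessed data.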
If you want to work with the OpenAPS Data Commons you need to apply for and get a copy of that data set. See OpenAPS Data Commons on how to do that. You can then use the preprocessing scripts in this project to save you a lot of work.
This project is the source code for the study published in the paper below. This code is being regularly updated; the version used for the paper is tagged NeurIPS22_ts4h. If you use any of this code, please cite the paper as follows in your work:
BibTeX:
@article{Degen2022,
  author = {Isabella Degen and Zahraa S. Abdallah},
  doi = {10.48550/arxiv.2211.07393},
  month = {11},
  title = {Temporal patterns in insulin needs for Type 1 diabetes},
  url = {https://arxiv.org/abs/2211.07393v1},
  year = {2022},
}
Formatted citation:

Isabella Degen and Zahraa S. Abdallah. Temporal patterns in insulin needs for Type 1 diabetes. arXiv:2211.07393, 2022. https://doi.org/10.48550/arxiv.2211.07393

Project structure:

.
└── insulin-need/
├── data (AUTO CREATED. DO NOT CHECK IN!)/
│ └── perid/
│ ├── p1
│ ├── p2
│ └── ...
├── src/
│ ├── scripts
│ └── <various py files>
├── examples/ -> contains example Jupyter notebooks that show how to use the code
├── tests
├── README.md
├── conda.yml -> use to create Python env
├── requirements.txt -> refer to if you have version problems
└── private-yaml-template.yaml -> local config - see instructions
Conda is used as the Python environment. You can use the following commands to set up a conda env with all the required dependencies:
- Create and activate the conda env:
conda env create -f conda.yml
conda activate tmp-22
- If you add new dependencies to the conda.yml file, you can update the env:
conda env update -n tmp-22 --file conda.yml --prune
Notes:
- The code was run and tested on a Mac x86_64 and on a Mac arm64 (M1 Mac)
- The conda.yml file uses the latest versions of all dependencies, fixing only Python to 3.9 and pandas to 1.4.
- If the tests don't run for you and you think it's dependency related, you can compare your versions (conda list) with the versions originally used: requirements.txt (osx_64) or requirements-m1.txt (osx_arm64)
The code was developed, tested and run using the PyCharm Professional IDE. This documentation assumes that you run the scripts and tests with that IDE. This should work in the free Community Edition of PyCharm.
You need to configure where your downloaded OpenAPS Data Commons dataset zip files are:
- Rename private-yaml-template.yaml to private.yaml
- Change the openAPS_data_path property in the file to the absolute path of the folder where your copy of the OpenAPS data is
Note: do not check the data into git!
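For orientation, here is an illustrative sketch of how such a configuration can be read with PyYAML; the project's actual loading logic lives in its own configuration code, so treat the variable names below as assumptions.

```python
# Read the two properties described above from private.yaml (illustrative only).
import yaml  # PyYAML

with open("private.yaml") as f:
    cfg = yaml.safe_load(f)

openaps_data_path = cfg["openAPS_data_path"]  # absolute path to your OpenAPS data folder
flat_file = cfg["flat_file"]                  # FALSE -> one preprocessed file per id
print(openaps_data_path, flat_file)
```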
To run most of the code you require the OpenAPS Commons dataset. You can request access from the OpenAPS website. Once you have the data and have provided the path in the private.yaml configuration file, you're ready to generate the preprocessed versions of the original zip file.
You have two options; most of the code assumes the files from option 1 have been created:
1. Leave flat_file: FALSE in the private.yaml file, and a preprocessed version per id gets created in a folder called data/perid (see the sketch after this list)
2. Change to flat_file: TRUE in the private.yaml file, and a preprocessed version containing all ids gets created
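A hypothetical sketch of how the flag could switch between the two layouts; the per-id folder data/perid matches the project structure above, while the flat-file location is an assumption for illustration:

```python
# Hypothetical illustration of the two output layouts controlled by flat_file.
from pathlib import Path

def output_path(flat_file: bool, participant_id: str, filename: str) -> Path:
    if flat_file:
        return Path("data") / filename  # one file containing all ids (assumed location)
    return Path("data") / "perid" / participant_id / filename  # one file per id

print(output_path(False, "p1", "irregular_iob_cob_bg.csv"))
# -> data/perid/p1/irregular_iob_cob_bg.csv
```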
To create the preprocessed data files run the following scripts depending on what you need:
1. Write preprocessed irregular raw OpenAPS data files: src/scripts/write_processed_device_status_file.py creates irregular_iob_cob_bg.csv of IOB, COB and BG data, per id or for all ids
2. Write preprocessed and hourly & daily down-sampled OpenAPS data files: src/scripts/write_processed_device_status_file.py creates hourly_iob_cob_bg.csv and daily_iob_cob_bg.csv, per id or for all ids. IOB, COB and BG are aggregated using mean, max, min and std.
3. CGM data for all systems: src/scripts/write_blood_glucose_df.py creates bg_df.csv, per id or for all ids
The outputs of scripts 1 and 2 are prerequisites for all methods in this project. The preprocessing done in script 1 transforms the timestamps to uniform UTC timestamps, drops records with no timestamp and removes duplicates. At the moment the scripts only read the OpenAPS system files (not Loop or AndroidAPS, just yet).
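These cleaning steps map onto standard pandas operations; a minimal sketch of the idea (the toy data and column names are assumptions, not the script's actual code):

```python
# Sketch of the cleaning steps described above: uniform UTC timestamps,
# dropping records without a timestamp, and removing duplicates.
import pandas as pd

df = pd.DataFrame({
    "datetime": ["2022-01-01 08:00:00+01:00", None, "2022-01-01 08:00:00+01:00"],
    "iob": [1.2, 0.8, 1.2],
})
df["datetime"] = pd.to_datetime(df["datetime"], utc=True)  # uniform UTC timestamps
df = df.dropna(subset=["datetime"])                        # drop records with no timestamp
df = df.drop_duplicates()                                  # remove duplicates
print(df)
```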
Older scripts:
- src/scripts/write_device_status_df_dedubed.py creates device_status_dedubed.csv per id or for all ids
- src/scripts/write_device_status_df.py
These scripts read the columns of the OpenAPS system configured in the device_status_col_type property in the configurations.py file. Given that not all columns are read, there can be duplicated entries; write_device_status_df_dedubed.py removes those duplicated entries!
The generated files contain the following columns:
- irregular_iob_cob_bg.csv: system, id, datetime, iob, bg, cob
- hourly_iob_cob_bg.csv & daily_iob_cob_bg.csv: datetime, id, system, iob mean, cob mean, bg mean, iob min, cob min, bg min, iob max, cob max, bg max, iob std, cob std, bg std, iob count, cob count, bg count
- bg_df.csv: id, time, bg
- device_status_dedubed.csv: id, pump/status/status, pump/status/timestamp, pump/status/suspended, pump/status/bolusing, pump/clock, device, created_at, openaps/enacted/duration, openaps/enacted/IOB, openaps/enacted/rate, openaps/enacted/COB, openaps/enacted/eventualBG, openaps/enacted/reason, openaps/enacted/bg, openaps/enacted/timestamp, openaps/iob/iob, openaps/iob/activity, openaps/iob/timestamp, openaps/iob/basaliob, openaps/iob/netbasalinsulin, openaps/iob/lastTemp/rate, openaps/iob/bolusinsulin, openaps/iob/lastBolusTime, openaps/enacted/sensitivityRatio, openaps/enacted/insulinReq, openaps/enacted/deliverAt, openaps/enacted/units
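Once generated, the files load straightforwardly with pandas; the sketch below assumes the per-id layout (flat_file: FALSE) and an id of p1, so adjust the paths for your setup:

```python
# Load the preprocessed files (paths assume the per-id layout under data/perid).
import pandas as pd

irregular = pd.read_csv("data/perid/p1/irregular_iob_cob_bg.csv", parse_dates=["datetime"])
hourly = pd.read_csv("data/perid/p1/hourly_iob_cob_bg.csv", parse_dates=["datetime"])
print(irregular.columns.tolist())  # system, id, datetime, iob, bg, cob
```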
Once you have generated the data with the scripts above, you should be able to run the tests successfully. If they pass, your environment is correctly set up and you have all the data that you need for the methods.
Note: some tests use real data and are automatically skipped wherever the data files/path are not available. Pay attention, as the methods don't work without the proper files! Some tests are ignored by default because they take a really long time to run. You can run them manually if needed.
There are example notebooks available that show how to use the code. They include many examples of how to read the data files and how to shape the differently sampled files into time series of different lengths.
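As one illustration of such shaping, the hourly file can be pivoted into one fixed-length series per day; the column names follow the hourly layout above, while the path and the choice of iob mean are assumptions for illustration:

```python
# Shape the hourly file into a matrix with one row per day and one column per
# hour, e.g. as input for the clustering techniques above (illustrative only).
import pandas as pd

hourly = pd.read_csv("data/perid/p1/hourly_iob_cob_bg.csv", parse_dates=["datetime"])
hourly["date"] = hourly["datetime"].dt.date
hourly["hour"] = hourly["datetime"].dt.hour

# One row per day, one column per hour; days with missing hours get NaN.
daily_matrix = hourly.pivot_table(index="date", columns="hour", values="iob mean")
print(daily_matrix.shape)  # (number of days, up to 24 hour columns)
```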