Investigating the association between Fitbit wearable data and self-reported measures of life satisfaction
This repository contains the code base for our MED-264 group's Final Project.
The data comes from the All of Us Registered Tier Dataset (version 7). The notebooks were developed with Python 3.7
in the All of Us Jupyter Notebook environment.
- data_collection.ipynb - This notebook extracts the data from the All of Us dataset using a Google BigQuery query and saves it to the persistent disk of the created workspace.
- data_preprocessing.ipynb - In this notebook, the saved dataframes are read and, after inspecting missingness, the feature list is filtered.
- data_cleaning.ipynb - In this notebook, the missing data for each feature is imputed with the patient-level mean.
- data_splitting.ipynb - In this notebook, the dataset is split into train and test sets after feature engineering. The split ensures that no patient-level data leaks between the train and test sets.
- model_building.ipynb - Traditional machine learning models such as Logistic Regression, Decision Tree Classifier, Random Forest Classifier, and XGBoost Classifier are used for both multi-class and binary classification tasks. The results are available in the notebook.
- data_correlation_and_statistics.ipynb - General statistics about the population and correlations among features are captured in this notebook.
- python_ordinal_regression.ipynb - Ordinal regression is carried out to obtain odds ratios and 95% confidence intervals. The statistical significance (p-values) is also reported in this notebook.
- assets/ - Contains all the illustrations derived from our study.
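
The cleaning, splitting, and odds-ratio steps described above can be illustrated with a minimal, dependency-free sketch. It is not the code from the notebooks, which may well rely on pandas, scikit-learn, or statsmodels. The column names `person_id` and `steps`, the helper names, the toy rows, and the use of a Wald-style interval (`exp(beta ± 1.96 * se)`) are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict
from statistics import mean


def impute_patient_mean(rows, feature, id_key="person_id"):
    """Fill missing (None) values of `feature` with that patient's own mean.

    Patients with no observed value for the feature are left as None.
    """
    observed = defaultdict(list)
    for row in rows:
        if row[feature] is not None:
            observed[row[id_key]].append(row[feature])
    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        values = observed.get(row[id_key])
        if new_row[feature] is None and values:
            new_row[feature] = mean(values)
        imputed.append(new_row)
    return imputed


def split_by_patient(rows, test_fraction=0.2, seed=0, id_key="person_id"):
    """Assign whole patients to train or test so no patient appears in both."""
    patient_ids = sorted({row[id_key] for row in rows})
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in rows if r[id_key] not in test_ids]
    test = [r for r in rows if r[id_key] in test_ids]
    return train, test


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with an approximate 95% Wald confidence interval (assumed)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy data; "person_id" and "steps" are hypothetical column names.
rows = [
    {"person_id": 1, "steps": 4000},
    {"person_id": 1, "steps": None},
    {"person_id": 1, "steps": 6000},
    {"person_id": 2, "steps": 9000},
    {"person_id": 2, "steps": None},
    {"person_id": 3, "steps": 3000},
    {"person_id": 4, "steps": 7000},
    {"person_id": 5, "steps": 5000},
]

clean = impute_patient_mean(rows, "steps")
train, test = split_by_patient(clean, test_fraction=0.2, seed=42)
odds_ratio, ci_low, ci_high = odds_ratio_ci(beta=0.0, se=0.1)
```

Splitting on patient identifiers rather than on individual rows is what keeps repeated measurements from the same person out of both sets at once.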
We would like to thank Dr. Tsung-Ting Kuo (instructor) for arranging guest lectures for our sessions. We would also like to thank the TAs of this course, Grace Yufei Yu and Aaron Boussina.