ojwalch / sleep_classifiers

Classify sleep from heart rate and acceleration via Apple Watch

Python 89.35% MATLAB 10.61% M 0.04%

sleep_classifiers's Issues

What is sleep phase 4

Expected behavior:

Each line in this file has the format: date (in seconds since PSG start) stage (0-5, wake = 0, N1 = 1, N2 = 2, N3 = 3, REM = 5)

Actual behavior:
Some phases are marked as "4" in 5383425 (which seems to be another form of deep sleep).
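Under the older Rechtschaffen and Kales scoring rules, deep sleep was split into stages 3 and 4, which the AASM rules merge into N3. If the "4" labels follow that convention (an assumption, not something the dataset documentation quoted here confirms), a reader could collapse them while parsing. The helper name and the sample lines below are hypothetical.

```python
# Hypothetical helper: parse "<seconds> <stage>" lines and merge stage 4 into N3.
STAGE_NAMES = {0: "wake", 1: "N1", 2: "N2", 3: "N3", 5: "REM"}


def parse_labels(lines, merge_stage_4=True):
    labels = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        seconds, stage = float(parts[0]), int(float(parts[1]))
        if merge_stage_4 and stage == 4:
            stage = 3  # assumption: "4" is R&K stage 4, i.e. deep sleep
        labels.append((seconds, stage))
    return labels


example = ["0 0", "30 2", "60 4", "90 5"]
for seconds, stage in parse_labels(example):
    print(seconds, STAGE_NAMES.get(stage, "unscored"))
```

Passing merge_stage_4=False keeps the raw values, which is useful for counting how many epochs are affected before deciding how to treat them.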

About the dataset

I do not understand what "date (in seconds since PSG start)" means. Could you tell me how to match it up with the label data? Looking forward to your reply.
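One way to read "seconds since PSG start" is that every sensor sample and every label share a single clock that starts at zero when the PSG recording begins. The sketch below assumes 30-second epochs (standard for PSG scoring, but an assumption here) and samples given as (timestamp, value) pairs. It averages the samples that fall inside each labeled epoch so that each label gets one feature value. All names and numbers are illustrative.

```python
from statistics import mean

EPOCH_SECONDS = 30  # assumption: one label per 30-second epoch


def mean_value_per_epoch(samples, label_times, epoch_seconds=EPOCH_SECONDS):
    """Average the samples falling in [t, t + epoch_seconds) for each label time t."""
    features = {}
    for t in label_times:
        window = [v for (ts, v) in samples if t <= ts < t + epoch_seconds]
        features[t] = mean(window) if window else None  # None marks an empty epoch
    return features


# Made-up heart rate samples: (seconds since PSG start, beats per minute)
heart_rate = [(1.0, 60.0), (12.5, 62.0), (31.0, 70.0), (59.0, 72.0)]
# Made-up labels: epoch start time -> sleep stage
labels = {0: 0, 30: 2, 60: 2}

hr_by_epoch = mean_value_per_epoch(heart_rate, labels.keys())
rows = [(t, hr_by_epoch[t], stage) for t, stage in labels.items()]
print(rows)
```

Each row then pairs one feature value with one label, which is the "one line of features per label" layout that standard classifiers expect.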

FileNotFoundError

Great work, Olivia.
I would like to replicate the results from the paper you co-authored. However, Python raised an error:
FileNotFoundError

Apparently, this error is due to missing motion data in the dataset downloaded from this link.

Specifically, the following IDs are missing from the motion folder:

9961348, 8258170, 9106476, 8686948, 8530312, 9618981, 8692923, 844359

May I know whether this is deliberate or due to an unforeseen error?

I appreciate the time taken to look at this thread.
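A quick way to see which subjects would trigger the error is to check for each expected file before running the pipeline. The folder names and file suffixes below are assumptions about the dataset layout, not taken from this thread; adjust them to match the download.

```python
from pathlib import Path

# Assumed layout and suffixes; adjust to match the downloaded dataset.
EXPECTED = {
    "motion": "_acceleration.txt",
    "heart_rate": "_heartrate.txt",
    "labels": "_labeled_sleep.txt",
}


def find_missing_files(data_root, subject_ids, expected=EXPECTED):
    """Map each subject ID to the folders in which its file is absent."""
    missing = {}
    for subject_id in subject_ids:
        absent = [
            folder
            for folder, suffix in expected.items()
            if not (Path(data_root) / folder / f"{subject_id}{suffix}").is_file()
        ]
        if absent:
            missing[subject_id] = absent
    return missing


print(find_missing_files("data", ["9961348", "8258170"]))
```

Subjects that appear in the result can be dropped from the list of IDs passed to the runner until the missing files are obtained.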

Dear ojwalch,
I'm a Python beginner, and your code is too complex for me to understand. I don't know how you preprocess your data. In my previous studies there was only one row of features per label, and I have no experience with this kind of time series data. I'm puzzled about how you match the HR and motion data to the sleep labels. Looking forward to your reply.

Specify MESA subsample in README

Dear Olivia,

I have been granted access to the MESA dataset by NSRR. The paper references a subsample of 188 subjects. I was wondering if you could specify which IDs you used for your subsample. This would improve my comparison with your paper and would save me, and potentially others, the time of finding a good subsample. Maybe you could include these in the README of this project?

sleep_classifiers/source/mesa/metadata_service.py

Lines 12 to 14 in 375f50e

    def get_all_files():
        project_root = str(utils.get_project_root())
        return glob.glob(project_root + "/data/mesa/polysomnography/edfs/*edf")

sleep_classifiers/source/mesa/mesa_data_service.py

Lines 9 to 17 in 375f50e

    all_files = MetadataService.get_all_files()
    all_subjects = []
    for file in all_files:
        file_id = file[-8:-4]
        subject = MesaSubjectBuilder.build(file_id)
        if subject is not None:
            all_subjects.append(subject)
    return all_subjects

Kind regards,
Yorick
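For readers comparing subsamples, the slice file[-8:-4] in the quoted code takes the four characters just before the ".edf" extension. A minimal sketch of listing the IDs present locally, using hypothetical file names, could look like this:

```python
from pathlib import Path


def subject_ids_from_paths(paths):
    """Return the sorted four-character IDs that precede the '.edf' extension."""
    return sorted({Path(p).name[-8:-4] for p in paths if str(p).endswith(".edf")})


# Hypothetical file names, for illustration only.
paths = [
    "/data/mesa/polysomnography/edfs/mesa-sleep-0002.edf",
    "/data/mesa/polysomnography/edfs/mesa-sleep-0001.edf",
]
print(subject_ids_from_paths(paths))
```

This only lists what is on disk; it does not say which of those subjects the builder later rejects by returning None.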

Variable mixup in Curve Performance Builder

Hi Olivia,

I might be mistaken, as the build_three_class_roc_with_binary_search is a bit of a tough read, but I think there is a mistake in this code:

sleep_classifiers/source/analysis/performance/curve_performance_builder.py

Lines 261 to 265 in 375f50e

    rem_roc_performance = ROCPerformance(false_positive_rates=goal_fraction_wake_scored_as_sleep_spread,
                                         true_positive_rates=cumulative_nrem_accuracies)

    nrem_roc_performance = ROCPerformance(false_positive_rates=goal_fraction_wake_scored_as_sleep_spread,
                                          true_positive_rates=cumulative_rem_accuracies)

rem_roc_performance uses cumulative_nrem_accuracies, and vice versa. Is that correct?

(I'm working on a CNN for this dataset, and I wanted to compare with your performance)

File not found

Hi, Olivia!
I could not find the file named "AppleWatchSubjects". Is it necessary? Where can I find it?
I am also confused about the code. Your code is very useful, but I am pretty new to writing code. Could you tell me which file in the folder I need to run? Is it in archive? Which files do I need to modify?
I would be very grateful if you could help me.

_clock_proxy.txt file not found

Hi Olivia! Sorry to bother you again.
I was running preprocessing_runner.py and, at line 73 of data_plot_builder.py, found that I don't have the "_clock_proxy.txt" file, so DataPlotBuilder does not plot any figures. I'm working on validating your paper, but I'm quite new to Python, so I'm running into many problems.

In addition, when I tried to replace ActivityCountService.build_activity_counts() with ActivityCountService.build_activity_counts_without_matlab(subject, data), I didn't know what argument to pass for "data". (I don't know how to change the path to my MATLAB installation, so I'm trying to use the Python version without MATLAB.)

When I ran analysis_runner.py, PyCharm reported that the file "3509524_circadian_feature.out" was not found. I went back to check and found that the folder "C:\Users\SSSyy\PycharmProjects\sleep_classifiers\outputs\features" contains no circadian files, only the cosine/count/hr/psg/time_feature.out files.

Thank you in advance!
Best,
Yue

quickstart guide

Is there anything like a quickstart guide available to help us get this running?

Thanks!

How to interpret the accelerometer data?

Thanks for putting together this dataset! I'm considering using it for a homework assignment in my applied statistics class. I have a basic question about the acceleration data. I must be missing something obvious...

Looking at fig. 1 of your paper I see long stretches where the acceleration is non-zero. How should that be interpreted? The watch can't be undergoing constant acceleration because it would fly off to outer space, right? Is this some quirk of the MEMS accelerometer, or is there some counteracting force that isn't reported in these measurements?
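One reading of those nonzero stretches: an accelerometer reports proper acceleration, so a watch at rest still registers about 1 g from the force that supports it against gravity, and how that 1 g is split across the x, y, and z axes depends on wrist orientation. The sketch below assumes readings in units of g (an assumption about the dataset) and uses the deviation of the vector magnitude from 1 g as a rough movement proxy. The sample values are made up.

```python
import math


def magnitude(x, y, z):
    return math.sqrt(x * x + y * y + z * z)


def movement_proxy(samples):
    """Absolute deviation of each sample's magnitude from 1 g (the at-rest reading)."""
    return [abs(magnitude(x, y, z) - 1.0) for (x, y, z) in samples]


# Two still samples in different orientations, then one sample with movement.
samples = [(0.0, 0.0, -1.0), (0.6, 0.0, -0.8), (0.3, 0.4, -1.2)]
print(movement_proxy(samples))
```

The first two samples differ on every axis yet both come out near zero, which is why long flat stretches with nonzero axis values indicate a still wrist rather than constant acceleration.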

Issue getting figures

Hello, I am able to run preprocessing_runner.py, but when I run analysis_runner.py to generate the figures, I get a runtime error from the run_in_parallel method in classifier_service.py related to its use of a multiprocessing pool. It says:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Please lmk if this is something easily fixed.
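The message refers to the standard multiprocessing idiom. On platforms that start child processes with the "spawn" method (Windows, and macOS by default since Python 3.8), each child re-imports the main module, so any code that creates a Pool has to sit behind an `if __name__ == "__main__":` guard. A minimal sketch of the pattern, with a stand-in worker function rather than the project's own code:

```python
from multiprocessing import Pool


def square(value):
    # Worker functions must live at module level so child processes can import them.
    return value * value


def run_in_parallel(values, processes=2):
    with Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the parent process runs this block; children that re-import the module skip it.
    print(run_in_parallel([1, 2, 3]))
```

Applied to this project, that would presumably mean wrapping the top-level calls at the bottom of analysis_runner.py in the same guard.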

sh: 1: /Applications/MATLAB_R2019a.app/bin/matlab: not found

I ran preprocessing_runner.py
But an error occurred.
The error message says that there is no MATLAB_R2019a.
Do I need to install MATLAB?
The comment says that the MATLAB code is replaced with Python code.
I use Ubuntu 18.04 and Python 3.7.3.

sh: 1: /Applications/MATLAB_R2019a.app/bin/matlab: not found
sh: 1: /Applications/MATLAB_R2019a.app/bin/matlab: not found
Traceback (most recent call last):
  File "preprocessing_runner.py", line 31, in <module>
    run_preprocessing(subject_ids)
  File "preprocessing_runner.py", line 24, in run_preprocessing
    FeatureBuilder.build(str(subject))
  File "/home/wapeul/sleepstage/sleep_classifiers/source/preprocessing/feature_builder.py", line 21, in build
    FeatureBuilder.build_from_time(subject_id, valid_epochs)
  File "/home/wapeul/sleepstage/sleep_classifiers/source/preprocessing/feature_builder.py", line 43, in build_from_time
    TimeBasedFeatureService.write_circadian_model(subject_id, circadian_feature)
  File "/home/wapeul/sleepstage/sleep_classifiers/source/preprocessing/time/time_based_feature_service.py", line 37, in write_circadian_model
    np.savetxt(feature_path, feature, fmt='%f')
  File "/home/wapeul/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1377, in savetxt
    "Expected 1D or 2D array, got %dD array instead" % X.ndim)
ValueError: Expected 1D or 2D array, got 0D array instead
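The two messages may be related: if the MATLAB step never runs, the feature it was supposed to produce could be empty by the time np.savetxt is called. That link is an assumption, not something confirmed in this thread. The sketch below checks whether the hard-coded executable exists and raises a clearer error before writing an empty feature. The helper names are hypothetical.

```python
import io
import shutil

import numpy as np

MATLAB_PATH = "/Applications/MATLAB_R2019a.app/bin/matlab"  # path from the error message


def matlab_available(path=MATLAB_PATH):
    # shutil.which returns None when the executable cannot be found.
    return shutil.which(path) is not None


def save_feature(target, feature):
    """Write a feature with np.savetxt, failing early if it is empty or 0-D."""
    array = np.asarray(feature, dtype=float)
    if array.ndim == 0 or array.size == 0:
        raise ValueError("feature is empty; the step that computes it probably did not run")
    np.savetxt(target, array, fmt="%f")


print("MATLAB found:", matlab_available())
buffer = io.StringIO()
save_feature(buffer, [0.1, 0.2])
print(buffer.getvalue())
```

On a machine without MATLAB at that macOS path, the first check reports False, which points at the step to replace or skip.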

Runtime error with test data.

I imported the linked data into the required data directories and ran preprocessing successfully, but running the analysis portion fails with the stack trace below.

Any hints on debugging this further?

>python source/analysis/analysis_runner.py
Running Random Forest...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/pope/miniconda3/envs/venv/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/pope/miniconda3/envs/venv/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_service.py", line 73, in run_single_data_split_sw
    return ClassifierService.run_single_data_split(training_x, training_y, testing_x, testing_y,
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_service.py", line 92, in run_single_data_split
    classifier = ClassifierService.train_classifier(training_x, training_y, attributed_classifier, scoring)
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_service.py", line 106, in train_classifier
    classifier.class_weight = ClassifierService.get_class_weights(training_y)
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_service.py", line 114, in get_class_weights
    class_weights = class_weight.compute_class_weight('balanced',
  File "/Users/pope/miniconda3/envs/venv/lib/python3.9/site-packages/sklearn/utils/class_weight.py", line 53, in compute_class_weight
    weight = recip_freq[le.transform(classes)]
IndexError: arrays used as indices must be of integer (or boolean) type
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/analysis_runner.py", line 186, in <module>
    figure_leave_one_out_roc_and_pr()
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/analysis_runner.py", line 53, in figure_leave_one_out_roc_and_pr
    classifier_summary = SleepWakeClassifierSummaryBuilder.build_leave_one_out(attributed_classifier, feature_sets)
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_summary_builder.py", line 33, in build_leave_one_out
    return SleepWakeClassifierSummaryBuilder.run_feature_sets(data_splits, subject_dictionary,
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_summary_builder.py", line 42, in run_feature_sets
    raw_performance_results = ClassifierService.run_sw(data_splits, attributed_classifier,
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_service.py", line 18, in run_sw
    return ClassifierService.run_in_parallel(ClassifierService.run_single_data_split_sw,
  File "/Users/pope/Projects/gadgetbridge_analysis/sleep_classifiers/source/analysis/classification/classifier_service.py", line 59, in run_in_parallel
    results = pool.map(single_run_wrapper, data_splits)
  File "/Users/pope/miniconda3/envs/venv/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/pope/miniconda3/envs/venv/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
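The final IndexError says that the index array built from the class labels is not integer-typed. One possible cause (an assumption, not confirmed in this thread) is that the training labels are empty or stored as floats. The sketch below avoids scikit-learn and computes "balanced" weights with the n_samples / (n_classes * count) heuristic that scikit-learn documents, after casting labels to integers and checking that there are any. The function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    int_labels = [int(label) for label in labels]
    if not int_labels:
        raise ValueError("no training labels; check that preprocessing produced features")
    counts = Counter(int_labels)
    n_samples, n_classes = len(int_labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Float labels, as they might appear after loading a text file.
print(balanced_class_weights([0.0, 1.0, 1.0, 1.0]))
```

If the explicit check raises, the problem lies upstream in the feature files rather than in the classifier.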

the random seed for parameter selection and step data

Hi,
I am really impressed by your paper. When I tried to reproduce your results with your code and dataset, the results looked different from those in the paper, e.g. Figure 4. Can I assume this difference is caused by randomness in the hyperparameter selection? If so, what random seed did you use, so that I can reproduce your results exactly?
Best
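Whatever seed the paper used (it is not stated in this thread), the general pattern for repeatable parameter selection is to draw every random choice from a generator with a fixed seed. A minimal sketch with a hypothetical parameter grid:

```python
import random


def sample_parameters(seed, n_candidates=3):
    """Draw candidate hyperparameters from a seeded generator."""
    rng = random.Random(seed)  # independent generator so the global state is untouched
    return [
        {
            "n_estimators": rng.choice([50, 100, 200]),  # hypothetical grid
            "max_depth": rng.choice([5, 10, None]),  # hypothetical grid
        }
        for _ in range(n_candidates)
    ]


# The same seed always yields the same candidates.
print(sample_parameters(42) == sample_parameters(42))
```

With scikit-learn, the equivalent step is passing a fixed random_state to both the search object and the classifier.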

Should the motion data be rearranged?

The first "column" of the motion data, which records the measurement time, has negative differences between consecutive rows. Should the rows be sorted by this column before use?
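If sorting turns out to be the right treatment (the thread does not confirm it), a stable sort on the first column restores monotonic timestamps without reordering rows that share a timestamp. The rows below are made up, and the (timestamp, x, y, z) layout is an assumption.

```python
def sort_by_time(rows):
    """Sort (timestamp, x, y, z) rows by timestamp; equal timestamps keep their order."""
    return sorted(rows, key=lambda row: row[0])


def has_negative_diffs(rows):
    """True if any row has an earlier timestamp than the row before it."""
    return any(later[0] < earlier[0] for earlier, later in zip(rows, rows[1:]))


rows = [(0.00, 0.1, 0.0, -1.0), (0.04, 0.1, 0.0, -1.0), (0.02, 0.2, 0.0, -1.0)]
print(has_negative_diffs(rows), has_negative_diffs(sort_by_time(rows)))
```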

question about the steps in predicting circadian clock

In the 'clock proxy' and time-based feature part, you used step counts to estimate light exposure. Can you provide more detail on how you used the step data to predict the circadian clock? For example, what threshold did you use to decide that light was present? In your paper, you said that 'if steps were above a threshold, the light was assumed to be one of three levels'. Did you use this circadian data when training your models?
