trybnetic / paat

The physical activity analysis toolbox (PAAT) is a comprehensive toolbox to analyse raw acceleration data.

License: MIT License

Python 100.00%

physical-activity actigraphy actigraph medical-informatics health-science accelerometry

paat's Issues

Create time vector for different sampling frequencies

So far, the time vector can only be created for certain sampling frequencies. This is fine for the ActiGraph data, but a more generic solution would be nice.

If someone wants to fix this problem, feel free to open a pull request. For the time being, I will add a NotImplementedError to paat.io._create_time_array() and paat.io._create_time_vector(), which will refer to this issue.
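A more generic solution could look roughly like the sketch below. The function name create_time_vector and its signature are assumptions made for illustration, not the existing paat.io._create_time_vector(). The idea is to compute every offset from the sample index rather than by repeated addition, so rounding errors do not accumulate for frequencies such as 30 Hz that do not divide a second evenly.

```python
import numpy as np


def create_time_vector(start, n_samples, hz):
    """Return n_samples timestamps starting at `start`, spaced 1/hz seconds apart.

    Hypothetical helper: name and signature are not part of paat.
    """
    # Offset of each sample in nanoseconds, rounded to the nearest integer
    offsets_ns = np.round(np.arange(n_samples) * (1e9 / hz)).astype("int64")
    # datetime64 scalar + timedelta64 array broadcasts to a datetime64[ns] array
    return np.datetime64(start, "ns") + offsets_ns.astype("timedelta64[ns]")


time_100hz = create_time_vector("2022-01-03T10:20:00", 4, 100)
time_30hz = create_time_vector("2022-01-03T10:20:00", 31, 30)
```

With nanosecond resolution, individual 30 Hz timestamps are rounded to the nearest nanosecond, but every 30th sample still falls exactly on a full second.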

Implement loading data from hdf5

In addition to loading data from gt3x files, it should also be possible to load data from hdf5 files.

Default value for rescaling after loading the gt3x file

paat.io.read_gt3x currently has the following arguments:

def read_gt3x(file, rescale=True):
    ...

However, rescaling the data needs roughly 4x as much memory as the non-rescaled data, which might cause problems and unexpectedly high memory usage on the user's side. A good alternative would be to set the default to False, state that clearly in the documentation, and expose paat.io._rescale_log_data as a public function in paat.preprocessing (e.g. as paat.preprocessing.rescale).
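The factor of roughly 4 can be reproduced with a small sketch, assuming the raw values are held as 16-bit integers and the rescaled values as 64-bit floats. Both dtypes, the rescale signature, and the scale value of 256.0 are assumptions for illustration and are not taken from paat.

```python
import numpy as np


def rescale(raw, acceleration_scale):
    """Convert raw integer values to units of g (hypothetical signature)."""
    # Dividing an int16 array by a Python float yields a float64 array
    return raw / acceleration_scale


# 10 minutes of triaxial data at 100 Hz, stored as 16-bit integers (assumed dtype)
raw = np.zeros((60000, 3), dtype=np.int16)
scaled = rescale(raw, acceleration_scale=256.0)  # 256.0 is an illustrative value

# 8 bytes per float64 value versus 2 bytes per int16 value
ratio = scaled.nbytes / raw.nbytes
```

Keeping rescaling as a separate public function would let users decide when to pay this memory cost.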

Differences in reading GT3X files

When comparing our implementation against ActiGraph's implementation Pygt3x, we observe some differences in ~1/3 of the values. This is tested in test_against_actigraph_implementation() which fails with the following message:

________________________________ test_against_actigraph_implementation _________________________________

unscaled_data =                            X    Y   Z
2022-01-03 10:20:00.000  206  159  22
2022-01-03 10:20:00.010  206  153  22
2022...970   73  245 -14
2022-01-03 10:29:59.980   74  245 -14
2022-01-03 10:29:59.990   73  246 -13

[60000 rows x 3 columns]

    def test_against_actigraph_implementation(unscaled_data):
        with FileReader(FILE_PATH_SIMPLE) as reader:
            ref = reader.to_pandas()
    
>       npt.assert_almost_equal(unscaled_data[["X", "Y", "Z"]].values, ref[["X", "Y", "Z"]].values)
E       AssertionError: 
E       Arrays are not almost equal to 7 decimals
E       
E       Mismatched elements: 59359 / 180000 (33%)
E       Max absolute difference: 1
E       Max relative difference: 1.
E        x: array([[206, 159,  22],
E              [206, 153,  22],
E              [206, 150,  20],...
E        y: array([[206, 159,  22],
E              [206, 153,  22],
E              [206, 150,  20],...

tests/test_io.py:35: AssertionError

It is not yet clear to me why this is the case, as our implementation seems (from my review) to follow ActiGraph's specification of the GT3X file format.
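To narrow this down, it may help to look at where the two arrays disagree rather than only at the overall count. The helper below is a hypothetical diagnostic, not part of paat or Pygt3x. It reports the mismatches per axis, the first affected rows, and the distinct difference values.

```python
import numpy as np


def summarize_mismatch(ours, ref):
    """Summarise where two equally shaped integer arrays disagree (hypothetical helper)."""
    diff = np.asarray(ours, dtype=np.int64) - np.asarray(ref, dtype=np.int64)
    mismatched = diff != 0
    return {
        # Number of mismatched values in each column (X, Y, Z)
        "per_axis": mismatched.sum(axis=0),
        # Indices of the first five rows containing at least one mismatch
        "first_rows": np.flatnonzero(mismatched.any(axis=1))[:5],
        # Distinct non-zero differences (ours - ref)
        "diff_values": np.unique(diff[mismatched]),
    }


ours = np.array([[206, 159, 22], [206, 153, 22]])
ref = np.array([[206, 159, 22], [206, 154, 22]])
summary = summarize_mismatch(ours, ref)
```

If the differences turn out to be confined to one axis or to a single sign, that would point at a specific step in the decoding rather than at the format as a whole.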

Use ActiGraph's reading library

ActiGraph released its own Python package to read GT3X files, which could and should be used.

They provide a pandas DataFrame with the X, Y, and Z axes as columns and the timestamps as index. In my experience this is the best data representation, and it should be used as the standard in this package as well (#4).

API discussion

There are some critical decisions that have to be made regarding the API. read_gt3x from gt3x returns

actigraph_acc, actigraph_time, meta_data = gt3x.read_gt3x('path/to/gt3x/file')

I ran into the same problem while developing my sleep detector. We should at some point decide what the API should look like and how we want to represent the data while processing it. This holds particularly for the time data in actigraph_time and the metadata in meta_data.

Add test on creating ActiGraph counts

We should test that the ActiGraph counts we produce match the ones ActiLife calculates.

Add test data

Try to find a test GT3X file that can be uploaded and used to test the code.

Fix warnings raised by pytest

When running pytest, several warnings are raised. Among them are many DeprecationWarnings, which we should look at before they break the code:

$ pytest
===================================== test session starts =====================================
platform linux -- Python 3.8.8, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /home/msw/Documents/paat
collected 4 items                                                                             

tests/test_io.py .                                                                      [ 25%]
tests/test_wear_time.py ...                                                             [100%]

====================================== warnings summary =======================================
../../.miniconda3/envs/paat-dev/lib/python3.8/site-packages/pip/_vendor/packaging/version.py:127: 404 warnings
  /home/msw/.miniconda3/envs/paat-dev/lib/python3.8/site-packages/pip/_vendor/packaging/version.py:127: DeprecationWarning: Creating a LegacyVersion has been deprecated and will be removed in the next major release
    warnings.warn(

../../.miniconda3/envs/paat-dev/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22
  /home/msw/.miniconda3/envs/paat-dev/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/test_wear_time.py::test_detect_non_wear_time_syed2021
  /home/msw/.miniconda3/envs/paat-dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`,   if your model does multi-class classification   (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype("int32")`,   if your model does binary classification   (e.g. if it uses a `sigmoid` last-layer activation).
    warnings.warn('`model.predict_classes()` is deprecated and '

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================= 4 passed, 406 warnings in 247.24s (0:04:07) =========================

Proof-read and harmonize doc strings

As this package so far is a result of continuous development from multiple contributors, we should at some point spend the time to proof-read all the doc strings and harmonize parameter descriptions to reduce confusion for potential users.

The following files need to be proof-read:

paat/io.py
paat/preprocessing.py
paat/wear_time.py

The other files do not contain significant code yet but should then follow the conventions from the files above.

Rescale data while reading from hdf5

Add a flag to paat.io.load_dset(), like the one in paat.io.read_gt3x(), to rescale the acceleration data on load.

Hide tensorflow progress bars

Set the verbosity of the tensorflow predictions to 0

model.predict(X, verbose=0)

to prevent the progress bars being printed to stdout.

Improve code quality with pylint and pycodestyle

Use pylint and pycodestyle to make the code consistent.

Module for various helper functions

As pointed out in #5, there are some functions like _create_time_vector in the io module or

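import numpy as np

def calculate_cutpoints(vec):
    # Indices where the value changes; comparing neighbours directly also works for boolean vectors
    cutpoints = np.where(vec[:-1] != vec[1:])[0] + 1
    return cutpoints.reshape(-1, 2)  # pairs of change points; requires an even count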

that are needed by different functions in paat.io or paat.wear_time but are also useful for other purposes. We might want to organize these functions in a new module at some point. This issue is opened to keep that in mind.

Activity vector not created correctly

paat.create_activity_column() creates a vector whose entries are only 3 characters long. This is likely due to how the vector is initialized; the string width should probably be set to the length of the longest given column name.
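The suspected cause can be illustrated with plain NumPy. This is a sketch of the general mechanism, under the assumption that the vector is initialized from a 3-character label; the labels "sit" and "walking" are made up, and the actual code in paat.create_activity_column() may differ.

```python
import numpy as np

names = ["sit", "walking"]  # hypothetical activity labels

# The dtype is inferred from the fill value, so this becomes a <U3 array
truncated = np.full(4, names[0])
truncated[2:] = names[1]  # longer strings are silently cut to 3 characters

# Sizing the dtype to the longest label avoids the truncation
width = max(len(name) for name in names)
fixed = np.full(4, names[0], dtype=f"<U{width}")
fixed[2:] = names[1]
```

An object-dtype array would also avoid the truncation, at the cost of slower string operations.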

Wear time API

Currently, three non-wear time algorithms are implemented. However, they have different return values which is not intuitive.

Shaheen's algorithm returns both nw_vector and nw_data

nw_vector, nw_data = wear_time.detect_non_wear_time_syed2021(acceleration, hz=meta['Sample_Rate'])

while Vincent's algorithm as well as the naive baseline algorithm return just nw_vector:

nw_vector = wear_time.detect_non_wear_time_hees2013(acceleration, hz=meta['Sample_Rate'])
nw_vector = wear_time.detect_non_wear_time_naive(acceleration, hz=meta['Sample_Rate'])

I think we should decide to return either just nw_vector (an n_samples x 1 vector indicating whether the device was worn at the corresponding time step) or both nw_vector and nw_data, with nw_data being a list of lists holding the start and end sample of each non-wear period.

I haven't checked yet, but Shaheen also mentioned that the labeling of nw_vector might be inconsistent between the functions. If this is the case, this should also be fixed.
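If all algorithms returned only nw_vector, the start and end samples could still be derived on demand. The sketch below uses a hypothetical helper, vector_to_periods, and assumes that True marks non-wear time and that the end index is exclusive. Both conventions would need to be agreed on as part of this issue.

```python
import numpy as np


def vector_to_periods(nw_vector):
    """Return [[start, end], ...] for each run of True values; end is exclusive.

    Hypothetical helper; assumes True means non-wear.
    """
    flags = np.asarray(nw_vector, dtype=bool).ravel()
    # Pad with False so runs touching either edge still yield a start and an end
    padded = np.concatenate(([False], flags, [False]))
    # Every change between neighbours is alternately a start or an end
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return edges.reshape(-1, 2).tolist()


periods = vector_to_periods([False, True, True, False, True])
no_periods = vector_to_periods([False, False])
```

Because the padding guarantees an even number of edges, the reshape cannot fail for vectors that start or end in a non-wear period.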
