
paat's Issues

Create time vector for different sampling frequencies

So far, the time vector can only be created for certain sampling frequencies. This is fine for the ActiGraph data, but a more generic solution would be nice.

If someone wants to fix this problem, feel free to open a pull request. For the time being, I will add a NotImplementedError to paat.io._create_time_array() and paat.io._create_time_vector(), which will refer to this issue.
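
A generic solution could look roughly like the following sketch. It is not paat's implementation: the function name create_time_vector_generic and its parameters are hypothetical. The idea is to derive every timestamp from its sample index in integer nanoseconds, so that rates which do not divide a second evenly (e.g. 30 Hz) do not accumulate rounding errors.

```python
import numpy as np


def create_time_vector_generic(start, n_samples, hz):
    """Return n_samples timestamps starting at `start`, sampled at `hz` Hz.

    Hypothetical helper for illustration, not part of paat.
    """
    if hz <= 0:
        raise ValueError("hz must be positive")
    start = np.datetime64(start, "ns")
    # Offset of every sample in nanoseconds, computed from the sample index
    # and rounded once, so errors do not add up over long recordings.
    offsets_ns = np.round(np.arange(n_samples) * (1e9 / hz)).astype("int64")
    return start + offsets_ns.astype("timedelta64[ns]")


time_100hz = create_time_vector_generic("2022-01-03T10:20:00", 5, 100)
time_30hz = create_time_vector_generic("2022-01-03T10:20:00", 31, 30)
```

With nanosecond resolution, individual 30 Hz timestamps are still rounded to the nearest nanosecond, but the 31st sample lands exactly one second after the first.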

Default value for rescaling after loading the gt3x file

paat.io.read_gt3x currently has the following arguments:

def read_gt3x(file, rescale=True):
    ...

However, rescaled data needs roughly 4x as much memory as non-rescaled data, which can lead to unexpectedly high memory usage on the user's side. A good alternative would be to set the default to False, state this clearly in the documentation, and expose paat.io._rescale_log_data as a public function in paat.preprocessing (e.g. as paat.preprocessing.rescale).
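
To illustrate where a factor of about 4 can come from, here is a minimal sketch. It assumes the raw values are stored as 16-bit integers and rescaled to 64-bit floats; both the dtypes and the scale factor of 256.0 are assumptions for illustration, not taken from paat. The rescale function below is a hypothetical stand-in for the proposed public function.

```python
import numpy as np


def rescale(values, acceleration_scale, dtype=np.float64):
    """Divide raw values by a scale factor (hypothetical stand-in)."""
    values = np.asarray(values, dtype=dtype)
    return values / np.asarray(acceleration_scale, dtype=dtype)


# Ten minutes of three-axis data at 100 Hz, stored as int16 (assumed dtype).
raw = np.zeros((60000, 3), dtype=np.int16)

scaled64 = rescale(raw, 256.0)                    # float64 result
scaled32 = rescale(raw, 256.0, dtype=np.float32)  # float32 result

print(raw.nbytes, scaled64.nbytes, scaled32.nbytes)
```

Under these assumptions, float64 output takes four times the memory of the raw array, while float32 output takes twice as much. Whether float32 precision is sufficient would need to be checked separately.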

Differences in reading GT3X files

When comparing our implementation against ActiGraph's implementation Pygt3x, we observe some differences in ~1/3 of the values. This is tested in test_against_actigraph_implementation() which fails with the following message:

________________________________ test_against_actigraph_implementation _________________________________

unscaled_data =                            X    Y   Z
2022-01-03 10:20:00.000  206  159  22
2022-01-03 10:20:00.010  206  153  22
2022...970   73  245 -14
2022-01-03 10:29:59.980   74  245 -14
2022-01-03 10:29:59.990   73  246 -13

[60000 rows x 3 columns]

    def test_against_actigraph_implementation(unscaled_data):
        with FileReader(FILE_PATH_SIMPLE) as reader:
            ref = reader.to_pandas()
    
>       npt.assert_almost_equal(unscaled_data[["X", "Y", "Z"]].values, ref[["X", "Y", "Z"]].values)
E       AssertionError: 
E       Arrays are not almost equal to 7 decimals
E       
E       Mismatched elements: 59359 / 180000 (33%)
E       Max absolute difference: 1
E       Max relative difference: 1.
E        x: array([[206, 159,  22],
E              [206, 153,  22],
E              [206, 150,  20],...
E        y: array([[206, 159,  22],
E              [206, 153,  22],
E              [206, 150,  20],...

tests/test_io.py:35: AssertionError

It is not yet clear to me why this is the case, since our implementation seems (from my review) to follow ActiGraph's specification of the GT3X file format.
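
One way to narrow this down might be to look at how the differences are distributed per axis and by sign. The helper below is a hypothetical debugging sketch, and the two small arrays are made-up toy values, not the data from the failing test.

```python
import numpy as np


def summarize_mismatches(ours, ref, axis_names=("X", "Y", "Z")):
    """Count how often each difference value occurs per axis.

    Hypothetical debugging helper, not part of paat or Pygt3x.
    """
    # Cast to a wide signed type so the subtraction cannot overflow.
    diff = np.asarray(ours, dtype=np.int64) - np.asarray(ref, dtype=np.int64)
    summary = {}
    for col, name in enumerate(axis_names):
        values, counts = np.unique(diff[:, col], return_counts=True)
        summary[name] = dict(zip(values.tolist(), counts.tolist()))
    return summary


# Toy arrays for illustration only.
ours = np.array([[10, 20, 30], [11, 21, 31], [12, 22, 32]])
ref = np.array([[10, 20, 30], [11, 22, 31], [12, 22, 31]])

summary = summarize_mismatches(ours, ref)
print(summary)
```

If the real differences turned out to be limited to one axis or one sign, that could hint at a rounding or sign-handling difference, but this is speculation until checked against the actual data.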

Use ActiGraph's reading library

ActiGraph released its own Python package for reading GT3X files, which could and should be used.

It provides a pandas DataFrame with the X, Y, and Z axes as columns and the timestamps as the index. In my experience, this is the best data representation, and it should be used as the standard in this package as well (#4).

API discussion

There are some critical decisions that have to be made regarding the API. read_gt3x from gt3x returns three separate objects:

actigraph_acc, actigraph_time, meta_data = gt3x.read_gt3x('path/to/gt3x/file')

I ran into the same problem while developing my sleep detector. At some point, we should decide what the API should look like and how we want to represent the data while processing it. This holds particularly for the time data in actigraph_time and the metadata in meta_data.
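
As one possible direction, the three return values could be combined into a single pandas DataFrame with the timestamps as index. The sketch below is only an illustration: to_dataframe is a hypothetical helper, the arrays are dummy data, and storing the metadata in DataFrame.attrs is an assumption (pandas documents attrs as experimental).

```python
import numpy as np
import pandas as pd


def to_dataframe(acc, time, meta):
    """Bundle acceleration, time and metadata into one DataFrame.

    Hypothetical helper, not part of paat.
    """
    data = pd.DataFrame(
        acc,
        columns=["X", "Y", "Z"],
        index=pd.DatetimeIndex(time, name="time"),
    )
    # attrs is an experimental pandas feature for attaching metadata.
    data.attrs.update(meta)
    return data


# Dummy data standing in for the values returned by read_gt3x.
acc = np.zeros((3, 3), dtype=np.int16)
start = np.datetime64("2022-01-03T10:20:00", "ns")
time = start + np.arange(3) * np.timedelta64(10, "ms")
meta = {"Sample_Rate": 100}

data = to_dataframe(acc, time, meta)
print(data)
```

A single object like this keeps values and timestamps aligned when slicing or resampling. Because attrs is experimental, the metadata may not be carried through every pandas operation.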

Add test data

Try to find a test GT3X file that can be uploaded and used to test the code.

Fix warnings raised by pytest

When running pytest, several warnings are raised. Many of them are DeprecationWarnings, which we should look into before the deprecated features are removed and break the code:

$ pytest
===================================== test session starts =====================================
platform linux -- Python 3.8.8, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /home/msw/Documents/paat
collected 4 items                                                                             

tests/test_io.py .                                                                      [ 25%]
tests/test_wear_time.py ...                                                             [100%]

====================================== warnings summary =======================================
../../.miniconda3/envs/paat-dev/lib/python3.8/site-packages/pip/_vendor/packaging/version.py:127: 404 warnings
  /home/msw/.miniconda3/envs/paat-dev/lib/python3.8/site-packages/pip/_vendor/packaging/version.py:127: DeprecationWarning: Creating a LegacyVersion has been deprecated and will be removed in the next major release
    warnings.warn(

../../.miniconda3/envs/paat-dev/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22
  /home/msw/.miniconda3/envs/paat-dev/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/test_wear_time.py::test_detect_non_wear_time_syed2021
  /home/msw/.miniconda3/envs/paat-dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`,   if your model does multi-class classification   (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype("int32")`,   if your model does binary classification   (e.g. if it uses a `sigmoid` last-layer activation).
    warnings.warn('`model.predict_classes()` is deprecated and '

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================= 4 passed, 406 warnings in 247.24s (0:04:07) =========================
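
The last warning already names its replacements. The sketch below applies both suggested expressions to made-up NumPy arrays that stand in for the output of model.predict(x). Which of the two cases applies depends on the output layer of the model, which is not stated here.

```python
import numpy as np

# Stand-in for model.predict(x) of a binary classifier with a sigmoid
# output layer; the values are made up for illustration.
sigmoid_output = np.array([[0.1], [0.7], [0.5], [0.93]])

# Replacement suggested by the warning for binary classification.
binary_classes = (sigmoid_output > 0.5).astype("int32")

# Stand-in for model.predict(x) of a multi-class model with a softmax
# output layer; again made-up values.
softmax_output = np.array([[0.2, 0.5, 0.3], [0.9, 0.05, 0.05]])

# Replacement suggested by the warning for multi-class classification.
multi_classes = np.argmax(softmax_output, axis=-1)

print(binary_classes.ravel(), multi_classes)
```

Note that with the strict comparison a probability of exactly 0.5 is mapped to class 0.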

Proof-read and harmonize doc strings

As this package so far is a result of continuous development from multiple contributors, we should at some point spend the time to proof-read all the doc strings and harmonize parameter descriptions to reduce confusion for potential users.

The following files need to be proof-read:

  • paat/io.py
  • paat/preprocessing.py
  • paat/wear_time.py

The other files do not contain significant code yet but should then follow the conventions from the files above.

Hide tensorflow progress bars

Set the verbosity of the tensorflow predictions to 0

model.predict(X, verbose=0)

to prevent the progress bars being printed to stdout.

Module for various helper functions

As pointed out in #5, there are some functions like _create_time_vector in the io module or

import numpy as np

def calculate_cutpoints(vec):
    # Indices at which the value changes, paired row-wise as (start, end)
    cutpoints = np.where(vec[:-1] - vec[1:] != 0)[0] + 1
    return cutpoints.reshape(cutpoints.shape[0] // 2, 2)

that are needed by different functions in paat.io or paat.wear_time, but are also useful for other purposes. We might want to organize these functions in a new module at some point. This issue is opened to keep that in mind.
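
For reference, here is a small usage sketch of calculate_cutpoints on a made-up 0/1 vector. The function is repeated so the example runs on its own.

```python
import numpy as np


def calculate_cutpoints(vec):
    # Indices at which the value changes, paired row-wise as (start, end)
    cutpoints = np.where(vec[:-1] - vec[1:] != 0)[0] + 1
    return cutpoints.reshape(cutpoints.shape[0] // 2, 2)


# Toy vector with two runs of ones: samples 2-4 and samples 7-8.
vec = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])

cutpoints = calculate_cutpoints(vec)
print(cutpoints)
# [[2 5]
#  [7 9]]
```

Each row holds the first index of a run and the first index after it. As written, the function expects an even number of value changes: a vector that starts or ends inside a run produces an odd number of cutpoints and the reshape fails. This would be worth handling if the function moves to a shared module.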

Activity vector not created correctly

paat.create_activity_column() creates a vector whose entries are only 3 characters long. This is likely due to how the vector is initialized; the entry width should probably be set to the length of the longest given column name.
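
The behavior described is what NumPy does with fixed-width string arrays, so the sketch below assumes that this is the cause; the actual initialization in paat has not been checked here. The label names are hypothetical.

```python
import numpy as np

labels = ["sedentary", "light", "moderate"]  # hypothetical column names

# A fixed-width unicode array silently truncates longer strings.
truncated = np.zeros(4, dtype="<U3")
truncated[1:3] = "sedentary"

# Sizing the dtype to the longest label avoids the truncation.
width = max(len(name) for name in labels)
fixed = np.zeros(4, dtype=f"<U{width}")
fixed[1:3] = "sedentary"

print(truncated, fixed)
```

Using dtype=object would avoid the width limit as well, at the cost of storing Python string objects instead of a compact array.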

Wear time API

Currently, three non-wear time algorithms are implemented. However, they have different return values which is not intuitive.

Shaheen's algorithm returns both nw_vector and nw_data:

nw_vector, nw_data = wear_time.detect_non_wear_time_syed2021(acceleration, hz=meta['Sample_Rate'])

while Vincent's algorithm as well as the naive baseline algorithm return just nw_vector:

nw_vector = wear_time.detect_non_wear_time_hees2013(acceleration, hz=meta['Sample_Rate'])
nw_vector = wear_time.detect_non_wear_time_naive(acceleration, hz=meta['Sample_Rate'])

I think we should decide to return either just nw_vector (which should be an n_samples x 1 vector indicating whether the device was worn at the corresponding time step) or both nw_vector and nw_data, with nw_data being a list of lists holding the start and end sample of each non-wear period.

I haven't checked yet, but Shaheen also mentioned that the labeling of nw_vector might be inconsistent between the functions. If this is the case, it should be fixed as well.
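
Since the two representations can be derived from each other, returning only nw_vector and offering a conversion helper might be enough. The sketch below uses hypothetical helper names and assumes that 1 marks non-wear time and that each period is given as a half-open range [start, end). Both conventions are assumptions, because the labeling is exactly what is unclear above.

```python
import numpy as np


def vector_to_periods(nw_vector):
    """Return [start, end) sample indices for every run of ones.

    Hypothetical helper; assumes 1 marks non-wear time.
    """
    flags = np.asarray(nw_vector).astype(bool).ravel()
    # Pad with False on both sides so runs touching either edge are closed.
    padded = np.concatenate(([False], flags, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return edges.reshape(-1, 2)


def periods_to_vector(nw_data, n_samples):
    """Inverse of vector_to_periods (hypothetical helper)."""
    nw_vector = np.zeros(n_samples, dtype=np.uint8)
    for start, end in nw_data:
        nw_vector[start:end] = 1
    return nw_vector


nw_vector = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1], dtype=np.uint8)
nw_data = vector_to_periods(nw_vector)
restored = periods_to_vector(nw_data, len(nw_vector))
print(nw_data)
```

The padding makes sure that periods at the very start or end of the recording are included, so the round trip reproduces the original vector.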
