mcyeh / mstamp Goto Github PK

MATLAB 66.36% Python 33.64%

mstamp's Issues

About input data

Hi,

I am wondering whether the input to the mstamp or mstomp function has to be z-normalized. I tested with some un-normalized data, and it seems that the data has to be z-normalized, since a warning is thrown for un-normalized data:
mstamp_stomp.py:127: RuntimeWarning: invalid value encountered in sqrt matrix_profile = np.sqrt(matrix_profile)

Am I right?

Thanks.
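The warning itself is what NumPy emits when np.sqrt receives a negative number: it returns nan for that element. A minimal sketch of that mechanism follows. It assumes (not confirmed for this run) that the squared distances are computed in the usual MASS form 2m(1 - rho), so near-perfect matches can round to slightly below zero. The clamp with np.maximum is one possible workaround, not the repository's code.

```python
import warnings

import numpy as np

# Hypothetical squared distances; the last one is a tiny negative value of the
# kind floating-point rounding can produce when the correlation is ~1.
squared = np.array([4.0, 0.0, -1e-12])

with warnings.catch_warnings():
    # Silence the same RuntimeWarning reported above so the demo runs quietly.
    warnings.simplefilter("ignore", RuntimeWarning)
    naive = np.sqrt(squared)  # the negative entry becomes nan

# Clamping tiny negatives to zero before the square root avoids the nan.
clamped = np.sqrt(np.maximum(squared, 0.0))

print(naive)
print(clamped)
```

Printing matrix_profile.min() just before the np.sqrt line in mstamp_stomp.py would show whether negative values are actually the cause here.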

Bug in cumsum

In mstamp_stomp.py, you have written:

dist_profile_dim = np.argsort(dist_profile, axis=0)
dist_profile_sort = np.sort(dist_profile, axis=0)
dist_profile_cumsum = np.zeros(sub_num)
for j in range(n_dim):
    dist_profile_cumsum += dist_profile_sort[j, :]
    dist_profile_mean = dist_profile_cumsum / (j + 1)

However, the values in dist_profile_sort are actually squared distances! Thus, I believe that your cumsum is wrong or, at least, inconsistent with your paper. For example, suppose dist_profile_sort was:

[[1, 1, 1],
 [1, 4, 9]]

Then, in the for loop when j = 0:

dist_profile_cumsum = [1, 1, 1]
dist_profile_mean = [1, 1, 1]

When j = 1:

dist_profile_cumsum = [2, 5, 10]
dist_profile_mean = [1, 2.5, 5]

One would think that dist_profile_mean (which is still a squared distance) could be converted to a distance by taking the square root, which gives:

[1, 1.5811, 2.236]

However, if we instead use straight distances (the element-wise square roots, [[1, 1, 1], [1, 2, 3]]) rather than squared distances, we see that the above is not correct when j = 1:

dist_profile_cumsum = [2, 3, 4]
dist_profile_mean = [1, 1.5, 2]

Notice that [1, 1.5811, 2.236] is different from [1, 1.5, 2]! To be more precise, the problem is that you compute the mean of the squared distances and only later take the square root. This is different from the mean of the distances.
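The arithmetic above can be checked with a few lines of NumPy. This sketch uses the toy dist_profile_sort values from the example and compares the current order of operations (average the squared distances, then take the square root) with the proposed one (take the square root, then average). Because the square root is monotonic, sorting before or after it yields the same dimension order, so only the averaging step differs.

```python
import numpy as np

# Toy sorted squared distances from the example:
# rows = dimensions (best first), columns = subsequences.
sq_sorted = np.array([[1.0, 1.0, 1.0],
                      [1.0, 4.0, 9.0]])

k = 2  # average over the two best dimensions (j = 1 in the loop)

# Current order: mean of squared distances, square root taken afterwards.
sqrt_of_mean = np.sqrt(sq_sorted[:k].sum(axis=0) / k)

# Proposed order: square root first, then mean of the distances.
mean_of_sqrt = np.sqrt(sq_sorted[:k]).sum(axis=0) / k

print(np.round(sqrt_of_mean, 4))
print(mean_of_sqrt)
```

Since a quadratic mean is never smaller than an arithmetic mean, the current order can only match or overestimate the mean distance.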

The code should be:

dist_profile = np.sqrt(dist_profile)  # Added this line 

dist_profile_dim = np.argsort(dist_profile, axis=0)
dist_profile_sort = np.sort(dist_profile, axis=0)
dist_profile_cumsum = np.zeros(sub_num)
for j in range(n_dim):
    dist_profile_cumsum += dist_profile_sort[j, :]
    dist_profile_mean = dist_profile_cumsum / (j + 1)
    update_pos = dist_profile_mean < matrix_profile[j, :]
    profile_index[j, update_pos] = i
    matrix_profile[j, update_pos] = dist_profile_mean[update_pos]
    if return_dimension:
        profile_dimension[j][:, update_pos] = \
            dist_profile_dim[:j + 1, update_pos]

# matrix_profile = np.sqrt(matrix_profile)  # Removed this line
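For reference, the proposed change can be exercised in isolation. The sketch below wraps the corrected loop in a hypothetical update_profiles function (not part of mstamp), drops the return_dimension branch, and assumes matrix_profile starts at infinity and dist_profile has shape (n_dim, sub_num), as the indexing in the snippet suggests.

```python
import numpy as np


def update_profiles(dist_profile, i, matrix_profile, profile_index):
    """Apply the proposed per-query update (hypothetical wrapper).

    dist_profile: (n_dim, sub_num) squared distances for query i.
    matrix_profile, profile_index: (n_dim, sub_num) arrays updated in place.
    """
    n_dim, sub_num = dist_profile.shape
    dist_profile = np.sqrt(dist_profile)  # convert to distances first
    dist_profile_sort = np.sort(dist_profile, axis=0)
    dist_profile_cumsum = np.zeros(sub_num)
    for j in range(n_dim):
        # Running mean of the j + 1 smallest per-dimension distances.
        dist_profile_cumsum += dist_profile_sort[j, :]
        dist_profile_mean = dist_profile_cumsum / (j + 1)
        update_pos = dist_profile_mean < matrix_profile[j, :]
        profile_index[j, update_pos] = i
        matrix_profile[j, update_pos] = dist_profile_mean[update_pos]


mp = np.full((2, 3), np.inf)
idx = np.full((2, 3), -1, dtype=int)
update_profiles(np.array([[1.0, 1.0, 1.0], [1.0, 4.0, 9.0]]), 0, mp, idx)
print(mp)
```

With the example input, the second row of mp holds the mean distances [1, 1.5, 2], and no final square root is needed.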

License

Hello,

I was wondering if you would kindly add an open-source license to this repository. This would enable others to make use of your matrix-profile algorithm and software.

Thank you,
Deepak George Thomas

`que_sig` discarded while looping over dimensions

In the Python implementation of the STAMP algorithm, I noticed that while looping over dimensions and computing _mass, only the que_sig of the last dimension is used on line 90:

        for j in range(n_dim):
            que = seq[j, i:i + sub_len]
            dist_profile[j, :], que_sig = _mass(
                seq_freq[j, :], que, seq_len, sub_len,
                seq_mu[j, :], seq_sig[j, :])

        if skip_loc[i] or np.any(que_sig < _EPS):
            continue

https://github.com/mcyeh/mstamp/blob/master/Python/mstamp_stamp.py, lines 84-91

Is discarding the que_sig values of the other dimensions intentional, or is it just that the current implementation does not use them any further?
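If the intent is to skip a query that is (near-)constant in any dimension, one option is to keep one que_sig per dimension and test them all. The sketch below assumes que_sig is the standard deviation of the query subsequence and computes it directly with NumPy instead of calling _mass; any_flat_query and the _EPS value are hypothetical, since mstamp defines its own _EPS.

```python
import numpy as np

_EPS = 1e-14  # hypothetical threshold; mstamp defines its own _EPS


def any_flat_query(seq, i, sub_len, eps=_EPS):
    """Return True if the query at position i is (near-)constant in any dimension.

    seq has shape (n_dim, seq_len), matching seq[j, i:i + sub_len] above.
    """
    que = seq[:, i:i + sub_len]    # shape (n_dim, sub_len)
    que_sig = np.std(que, axis=1)  # one standard deviation per dimension
    return bool(np.any(que_sig < eps))


seq = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                [7.0, 7.0, 7.0, 7.0, 9.0]])
print(any_flat_query(seq, 0, 3))  # dimension 1 is constant over positions 0-2
print(any_flat_query(seq, 2, 3))  # no dimension is constant over positions 2-4
```

Inside the existing loop, the equivalent change would be storing each que_sig in an array indexed by j rather than overwriting a single variable.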

Misaligned Exclusion Zones

There is an off-by-one error when calculating the exclusion zones. The code should read:
exc_zone_ed = min(sub_num, i + exc_zone + 1)

mstamp/Python/mstamp_stamp.py, line 94 (commit b409615):

exc_zone_ed = min(sub_num, i + exc_zone)

mstamp/Python/mstamp_stomp.py, line 122 (commit b409615):

exc_zone_ed = min(sub_num, i + exc_zone)
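To see why the +1 matters, the sketch below marks the excluded positions with a half-open NumPy slice. It assumes the zone starts at max(0, i - exc_zone), which is not shown in the lines quoted above, and exclusion_mask is a hypothetical helper. Without the +1, the zone covers exc_zone positions before i but only exc_zone - 1 positions after it.

```python
import numpy as np


def exclusion_mask(i, exc_zone, sub_num, plus_one):
    """Mark positions excluded around query i (hypothetical helper).

    Assumes exc_zone_st = max(0, i - exc_zone); the slice end is exclusive.
    """
    exc_zone_st = max(0, i - exc_zone)
    exc_zone_ed = min(sub_num, i + exc_zone + (1 if plus_one else 0))
    mask = np.zeros(sub_num, dtype=bool)
    mask[exc_zone_st:exc_zone_ed] = True
    return mask


print(np.flatnonzero(exclusion_mask(5, 2, 10, plus_one=False)))  # [3 4 5 6]
print(np.flatnonzero(exclusion_mask(5, 2, 10, plus_one=True)))   # [3 4 5 6 7]
```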

Will it work for multivariate time series?

Great code, thanks. Could you still clarify whether it will work for multivariate time series:
1. where all values are continuous, or
2. where the values are a mixture of continuous and categorical values, for example some dimensions continuous and others categorical, as in the table below (color and gender are categorical; weight, height, and age are continuous)?

   color   weight  gender  height  age
1  black   56      m       160     34
2  white   77      f       170     54
3  yellow  87      m       167     43
4  white   55      m       198     72
5  white   88      f       176     32
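For the continuous part of the question, the data has to end up as a numeric array with one row per dimension, as the seq[j, i:i + sub_len] indexing in mstamp_stamp.py suggests. The sketch below builds that (n_dim, seq_len) array from the weight, height, and age columns of the table, reading each table row as one time step (an assumption). The categorical columns are left out on purpose: assuming the distance is the z-normalized Euclidean distance computed by _mass, they would need some numeric encoding first, and choosing one is an open design question this sketch does not settle.

```python
import numpy as np

# Rows of the example table (assumed: one row per time step).
rows = [
    ("black", 56, "m", 160, 34),
    ("white", 77, "f", 170, 54),
    ("yellow", 87, "m", 167, 43),
    ("white", 55, "m", 198, 72),
    ("white", 88, "f", 176, 32),
]
columns = ("color", "weight", "gender", "height", "age")
continuous = ("weight", "height", "age")

# One row per continuous dimension, one column per time step.
col_idx = [columns.index(name) for name in continuous]
seq = np.array([[row[c] for row in rows] for c in col_idx], dtype=float)

print(seq.shape)  # (3, 5)
```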
