mstamp's Issues

About input data

Hi,

I am wondering whether the input to the mstamp or mstomp function has to be z-normalized. I tested with some un-normalized data, and it seemed that the data has to be z-normalized, since the following warning is thrown for un-normalized data:

mstamp_stomp.py:127: RuntimeWarning: invalid value encountered in sqrt
  matrix_profile = np.sqrt(matrix_profile)

Am I right?

Thanks.
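A likely source of that warning, offered as an assumption rather than a confirmed diagnosis: when squared z-normalized distances are computed from dot products, means and standard deviations, floating-point round-off can make a value slightly negative (and a constant subsequence has zero standard deviation), and np.sqrt of a negative or nan value produces exactly this RuntimeWarning. The sketch below uses a hypothetical helper, znorm_sq_dist_profile (naive dot products, not the repository's FFT-based _mass), to show the formula d^2 = 2m(1 - corr), a clamp at zero before any square root, and that z-normalizing each subsequence makes the distances unaffected by rescaling the raw input.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def znorm_sq_dist_profile(seq, que):
    """Squared z-normalized Euclidean distances between `que` and every
    length-len(que) subsequence of `seq` (hypothetical helper)."""
    m = len(que)
    windows = sliding_window_view(seq, m)  # shape (len(seq) - m + 1, m)
    mu, sig = windows.mean(axis=1), windows.std(axis=1)  # population std
    q_mu, q_sig = que.mean(), que.std()
    # Pearson correlation between the query and each subsequence.
    corr = (windows @ que - m * mu * q_mu) / (m * sig * q_sig)
    dist_sq = 2 * m * (1 - corr)
    # Round-off can push dist_sq slightly below zero; clamp before any sqrt.
    return np.maximum(dist_sq, 0.0)


rng = np.random.default_rng(0)
raw = 50 + 100 * rng.standard_normal(200)  # deliberately not z-normalized
que = raw[10:30]
d_raw = znorm_sq_dist_profile(raw, que)

# Globally z-normalizing the series first gives the same profile, because
# every subsequence is z-normalized inside the distance anyway.
scaled = (raw - raw.mean()) / raw.std()
d_scaled = znorm_sq_dist_profile(scaled, scaled[10:30])
```

If round-off is the cause, normalizing the series beforehand should not change the profile itself, though it may reduce the cancellation error that produces negative values.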

Bug in cumsum

In mstamp_stomp.py, you have written:

dist_profile_dim = np.argsort(dist_profile, axis=0)
dist_profile_sort = np.sort(dist_profile, axis=0)
dist_profile_cumsum = np.zeros(sub_num)
for j in range(n_dim):
    dist_profile_cumsum += dist_profile_sort[j, :]
    dist_profile_mean = dist_profile_cumsum / (j + 1)

However, the values in dist_profile_sort are actually squared distances! Thus, I believe that your cumsum is wrong or, at least, inconsistent with your paper. For example, if dist_profile_sort were:

[[1, 1, 1],
 [1, 4, 9]]

Then, in the for loop when j = 0:

dist_profile_cumsum = [1, 1, 1]
dist_profile_mean = [1, 1, 1]

When j = 1:

dist_profile_cumsum = [2, 5, 10]
dist_profile_mean = [1, 2.5, 5]

One would think that dist_profile_mean (which is still a squared distance) could be converted to a distance by taking the square root, which gives:

[1, 1.5811, 2.236]

However, when we compare this with using plain distances instead of squared distances (i.e., dist_profile_sort = [[1, 1, 1], [1, 2, 3]]), we see that the above is not correct when j = 1:

dist_profile_cumsum = [2, 3, 4]
dist_profile_mean = [1, 1.5, 2]

Notice that [1, 1.5811, 2.236] is different from [1, 1.5, 2]! To be more precise, the problem is that you've computed the mean of the squared distances and then, later, taken the square root. This is not the same as the mean of the distances themselves.
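The arithmetic above can be reproduced with a short sketch. The helper cumulative_means is hypothetical; it computes, in one vectorized step, the successive dist_profile_mean values that the loop produces for a profile already sorted along axis 0.

```python
import numpy as np


def cumulative_means(sorted_profile):
    """Row k-1 holds the mean of the k smallest values in each column,
    i.e. dist_profile_mean for j = k - 1 in the loop above (hypothetical helper)."""
    n_dim = sorted_profile.shape[0]
    return np.cumsum(sorted_profile, axis=0) / np.arange(1, n_dim + 1)[:, None]


# Squared distances from the example, already sorted along axis 0.
dist_sq_sort = np.array([[1.0, 1.0, 1.0],
                         [1.0, 4.0, 9.0]])

# Current code: average the squared distances, take the square root afterwards.
sqrt_of_mean = np.sqrt(cumulative_means(dist_sq_sort))
# Proposed fix: take the square root first, then average the distances.
mean_of_sqrt = cumulative_means(np.sqrt(dist_sq_sort))

# For j = 1: sqrt_of_mean[1] ≈ [1, 1.5811, 2.2361], mean_of_sqrt[1] = [1, 1.5, 2]
```

Because the square root is concave, sqrt_of_mean is never smaller than mean_of_sqrt (Jensen's inequality); the two agree only when the averaged distances in a column are all equal, as in the j = 0 row.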

It should be:

dist_profile = np.sqrt(dist_profile)  # Added this line 

dist_profile_dim = np.argsort(dist_profile, axis=0)
dist_profile_sort = np.sort(dist_profile, axis=0)
dist_profile_cumsum = np.zeros(sub_num)
for j in range(n_dim):
    dist_profile_cumsum += dist_profile_sort[j, :]
    dist_profile_mean = dist_profile_cumsum / (j + 1)
    update_pos = dist_profile_mean < matrix_profile[j, :]
    profile_index[j, update_pos] = i
    matrix_profile[j, update_pos] = dist_profile_mean[update_pos]
    if return_dimension:
        profile_dimension[j][:, update_pos] = \
            dist_profile_dim[:j + 1, update_pos]

# matrix_profile = np.sqrt(matrix_profile)  # Removed this line

License

Hello,

I was wondering if you would kindly add an open-source license to this repository. This would enable others to make use of your matrix profile algorithm and software.

Thank you,
Deepak George Thomas

`que_sig` discarded while looping over dimensions

In the Python STAMP implementation (mstamp_stamp.py), I noticed that while looping over dimensions and computing _mass, only the que_sig of the last dimension is used on line 90:

for j in range(n_dim):
    que = seq[j, i:i + sub_len]
    dist_profile[j, :], que_sig = _mass(
        seq_freq[j, :], que, seq_len, sub_len,
        seq_mu[j, :], seq_sig[j, :])

if skip_loc[i] or np.any(que_sig < _EPS):
    continue

https://github.com/mcyeh/mstamp/blob/master/Python/mstamp_stamp.py lines 84 - 91

Is discarding the que_sig values of the other dimensions intentional, or is it just that the current implementation does not use them any further?
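Because que_sig is reassigned on every iteration, the check np.any(que_sig < _EPS) only sees the query of the last dimension. The sketch below contrasts that with checking every dimension; the helper query_sigmas and the value of _EPS are assumptions for illustration, and whether the original code should skip a position when any dimension is flat is exactly the open question of this issue.

```python
import numpy as np

_EPS = 1e-14  # assumed value; mstamp_stamp.py defines its own _EPS


def query_sigmas(seq, i, sub_len):
    """Standard deviation of seq[j, i:i + sub_len] for every dimension j
    (hypothetical helper, not part of the repository)."""
    return seq[:, i:i + sub_len].std(axis=1)


seq = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                [7.0, 7.0, 7.0, 7.0, 7.0],   # flat in dimension 1
                [0.0, 1.0, 0.0, 1.0, 0.0]])
que_sigs = query_sigmas(seq, 0, 3)

# Checking every dimension catches the flat query in dimension 1.
skip_any_dim = bool(np.any(que_sigs < _EPS))
# Checking only the last dimension, as the loop effectively does, misses it.
skip_last_dim = bool(que_sigs[-1] < _EPS)
```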

Will it work for multivariate time series?

Great code, thanks. Could you still clarify whether it will work for multivariate time series:
1. where all values are continuous, or
2. where the values are a mixture of continuous and categorical values, for example 3 dimensions with continuous values and 2 dimensions with categorical values?

   color   weight  gender  height  age
1  black   56      m       160     34
2  white   77      f       170     54
3  yellow  87      m       167     43
4  white   55      m       198     72
5  white   88      f       176     32
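The code in the previous issue indexes the input as seq[j, i:i + sub_len], which suggests a 2-D array with one row per dimension and one column per time step. Under that assumption, the numeric columns of the example table could be arranged as in the sketch below. How, or whether, to encode the categorical columns color and gender is left open here: a dimension whose values are constant over a window (as an encoded categorical column often would be) has zero standard deviation, so its z-normalized distance is undefined.

```python
import numpy as np

# The five example rows from the question (index column dropped).
rows = [
    ("black", 56, "m", 160, 34),
    ("white", 77, "f", 170, 54),
    ("yellow", 87, "m", 167, 43),
    ("white", 55, "m", 198, 72),
    ("white", 88, "f", 176, 32),
]
columns = ("color", "weight", "gender", "height", "age")
numeric = ("weight", "height", "age")

# One row per dimension, one column per time step: the layout implied by
# seq[j, i:i + sub_len]. Categorical columns are deliberately left out.
seq = np.array(
    [[float(r[columns.index(name)]) for r in rows] for name in numeric]
)
```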
