mstamp's People
Forkers
jakasb tomfisher rob-med redraven984 gustavocarita lebedov tylerwmarrs jingzbu iramin jacobwjs thiagotguimaraes whitewalker007 ngminhtrung tchanda90 shayan-taheri rsadek concenterate aung2phyowai hhhaoyi quatrope quentto exp-time-series-tools vanbenschoten monster912 sirius768
mstamp's Issues
About input data
Hi,
I am wondering whether the input for the mstramp or mstomp function has to be z-normalized. I tested with some un-normalized data, and it seems that the data must be z-normalized, since a warning is thrown for un-normalized data:
mstamp_stomp.py:127: RuntimeWarning: invalid value encountered in sqrt
matrix_profile = np.sqrt(matrix_profile)
Am I right?
Thanks.
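For anyone hitting the same warning, here is a minimal sketch of two things that could be tried. It is not taken from the repository: `znormalize` and `safe_sqrt` are hypothetical helper names, the `(n_dim, seq_len)` layout and the `eps` threshold are assumptions, and whether rounding error is the actual cause of the warning has not been confirmed here.

```python
import numpy as np


def znormalize(seq, eps=1e-14):
    """Z-normalize each row (dimension) of an (n_dim, seq_len) array.

    Hypothetical helper; eps is an assumed threshold for "constant" rows.
    """
    seq = np.asarray(seq, dtype=float)
    mu = seq.mean(axis=1, keepdims=True)
    sig = seq.std(axis=1, keepdims=True)
    # Constant dimensions would divide by ~0; map them to all zeros instead.
    sig[sig < eps] = 1.0
    return (seq - mu) / sig


def safe_sqrt(squared):
    """Clip negative inputs to zero before taking the square root."""
    return np.sqrt(np.maximum(squared, 0.0))


data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 10.0, 10.0, 10.0]])
normalized = znormalize(data)
```

If the warning disappears after clipping, tiny negative squared distances were the likely trigger; if it persists, the cause lies elsewhere.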
Bug in cumsum
In mstamp_stomp.py, you have written:
dist_profile_dim = np.argsort(dist_profile, axis=0)
dist_profile_sort = np.sort(dist_profile, axis=0)
dist_profile_cumsum = np.zeros(sub_num)
for j in range(n_dim):
    dist_profile_cumsum += dist_profile_sort[j, :]
    dist_profile_mean = dist_profile_cumsum / (j + 1)
However, the values in dist_profile_sort are actually squared distances! Thus, I believe that your cumsum is wrong or, at least, inconsistent with your paper. For example, if dist_profile_sort was:
[[1, 1, 1],
 [1, 4, 9]]
Then, in the for loop when j = 0:
dist_profile_cumsum = [1, 1, 1]
dist_profile_mean = [1, 1, 1]
When j = 1:
dist_profile_cumsum = [2, 5, 10]
dist_profile_mean = [1, 2.5, 5]
One would think that dist_profile_mean (which is still a squared distance) could be converted to a distance by taking the square root, which would give:
[1, 1.5811, 2.236]
However, when we compare this with using plain distances (and not squared distances), we see that the above is not correct when j = 1:
dist_profile_cumsum = [2, 3, 4]
dist_profile_mean = [1, 1.5, 2]
Notice that [1, 1.5811, 2.236] is different from [1, 1.5, 2]! To be more precise, the problem is that you've computed the mean of the squared distances and then, later, taken the square root. This is different from the mean of the distances themselves.
The code should be:
dist_profile = np.sqrt(dist_profile)  # Added this line
dist_profile_dim = np.argsort(dist_profile, axis=0)
dist_profile_sort = np.sort(dist_profile, axis=0)
dist_profile_cumsum = np.zeros(sub_num)
for j in range(n_dim):
    dist_profile_cumsum += dist_profile_sort[j, :]
    dist_profile_mean = dist_profile_cumsum / (j + 1)
    update_pos = dist_profile_mean < matrix_profile[j, :]
    profile_index[j, update_pos] = i
    matrix_profile[j, update_pos] = dist_profile_mean[update_pos]
    if return_dimension:
        profile_dimension[j][:, update_pos] = \
            dist_profile_dim[:j + 1, update_pos]
# matrix_profile = np.sqrt(matrix_profile)  # Removed this line
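The difference between the two orderings can be checked with a small standalone sketch that uses the example values from this issue. The `running_mean` helper and the variable names are made up for illustration and are not part of the repository.

```python
import numpy as np

# Squared distances from the example above, already sorted along axis 0.
dist_profile_sq = np.array([[1.0, 1.0, 1.0],
                            [1.0, 4.0, 9.0]])


def running_mean(sorted_rows):
    """Row j holds the mean of rows 0..j, mirroring cumsum / (j + 1)."""
    counts = np.arange(1, sorted_rows.shape[0] + 1)[:, None]
    return np.cumsum(sorted_rows, axis=0) / counts


# Current behaviour: average the squared distances, take the root afterwards.
sqrt_of_mean = np.sqrt(running_mean(np.sort(dist_profile_sq, axis=0)))

# Proposed behaviour: take the root first, then average the distances.
mean_of_sqrt = running_mean(np.sort(np.sqrt(dist_profile_sq), axis=0))

print(sqrt_of_mean[1])  # approximately [1, 1.5811, 2.2361]
print(mean_of_sqrt[1])  # [1, 1.5, 2]
```

The two results agree for j = 0 and diverge for j = 1, which matches the hand calculation above.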
License
Hello,
I was wondering if you would kindly add an open-source license to this repository. This would enable others to make use of your matrix-profile algorithm and software.
Thank you,
Deepak George Thomas
`que_sig` discarded while looping over dimensions
In the Python implementation of the stamp algorithm, I noticed that while looping over dimensions and computing _mass, only the que_sig from the last dimension is used on line 90:
for j in range(n_dim):
    que = seq[j, i:i + sub_len]
    dist_profile[j, :], que_sig = _mass(
        seq_freq[j, :], que, seq_len, sub_len,
        seq_mu[j, :], seq_sig[j, :])
if skip_loc[i] or np.any(que_sig < _EPS):
    continue
https://github.com/mcyeh/mstamp/blob/master/Python/mstamp_stamp.py lines 84 - 91
Is this discarding intentional, or is it just that the current implementation does not use it any further?
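If the intent is to skip a query that is flat in any dimension, one possible approach is to check every dimension's standard deviation instead of only the last one. The sketch below is standalone and does not call the repository's _mass; `any_flat_query` is a hypothetical helper, and the `_EPS` value and the `(n_dim, seq_len)` layout are assumptions.

```python
import numpy as np

_EPS = 1e-14  # assumed threshold, chosen for illustration only


def any_flat_query(seq, i, sub_len, eps=_EPS):
    """Return True if the window starting at i is near-constant in ANY dimension."""
    que = seq[:, i:i + sub_len]       # shape (n_dim, sub_len)
    que_sig = np.std(que, axis=1)     # one standard deviation per dimension
    return bool(np.any(que_sig < eps))


seq = np.array([[5.0, 5.0, 5.0, 5.0, 5.0],   # flat first dimension
                [1.0, 2.0, 3.0, 4.0, 5.0]])  # varying last dimension

flat_in_any = any_flat_query(seq, 0, 3)
# Checking only the last dimension, as the quoted loop effectively does:
flat_in_last = bool(np.std(seq[-1, 0:3]) < _EPS)
```

In this toy input the first dimension is constant, so the all-dimension check flags the query while the last-dimension check does not.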
Misaligned Exclusion Zones
Will it work for multivariate time series?
Great code, thanks.
Could you still clarify whether it will work for multivariate time series:
1. where all values are continuous, or
2. where the values are a mixture of continuous and categorical values?
For example, 2 dimensions have continuous values and 3 dimensions are categorical values:
   color   weight  gender  height  age
1  black   56      m       160     34
2  white   77      f       170     54
3  yellow  87      m       167     43
4  white   55      m       198     72
5  white   88      f       176     32