thedatumorg / kshape-python
Python implementation of k-Shape
License: MIT License
Is this single-threaded? If so, how can I use multi-threading?
This error with the tslearn implementation of k-Shape has been reported in the tslearn GitHub repo. I am posting it here in case someone already knows why it occurs:
Some questions regarding your kshape code:
When I ran the following:
n1=np.zeros((20, 1440))
n2=np.full((30, 1440), 614631)
n3=np.concatenate((n1,n2),axis=0)
univariate_ts_datasets = np.expand_dims(n3, axis=2)
gpu_model = ks_gpu(univariate_ts_datasets, 3, centroid_init='zero', max_iter=100)
I got this error:
Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.
Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.
info: 89
, submat: 0
info: 1531
Traceback (most recent call last):
  File "D:/apps_data/attempt/k-shape.py", line 279, in <module>
    gpu_model = ks_gpu(univariate_ts_datasets, num_clusters, centroid_init='zero', max_iter=100)
  File "D:\apps_home\Download\anaconda\envs\pytorch\lib\site-packages\kshape\core_gpu.py", line 116, in kshape
    idx, centroids = _kshape(x, k, centroid_init=centroid_init, max_iter=max_iter)
  File "D:\apps_home\Download\anaconda\envs\pytorch\lib\site-packages\kshape\core_gpu.py", line 100, in _kshape
    centroids[j] = torch.unsqueeze(_extract_shape(idx, x, j, centroids[j]), dim=1)
  File "D:\apps_home\Download\anaconda\envs\pytorch\lib\site-packages\kshape\core_gpu.py", line 76, in _extract_shape
    _, vec = torch.symeig(m, eigenvectors=True)
RuntimeError: symeig_cuda: the algorithm failed to converge; 1531 off-diagonal elements of an intermediate tridiagonal form did not converge to zero.
Process finished with exit code 1
It seems that y = zscore(a, axis=1, ddof=1) in core_gpu.py, line 67, produced some huge elements, and s = y[:, :, 0].transpose(0, 1).mm(y[:, :, 0]), line 69, produced values containing inf.
ks_cpu can deal with this dataset successfully.
I use Python 3.7 and PyTorch 1.8.0.
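Every series in the dataset above is constant, so its standard deviation is zero and its z-score is undefined (a division by zero). Whether that is what makes symeig fail on the GPU is an assumption, not something confirmed in this thread. A minimal sketch of how such series could be flagged before clustering, where constant_series_mask is a hypothetical helper and not part of kshape:

```python
import numpy as np

def constant_series_mask(x):
    """Return a boolean mask marking series whose values never change.

    Assumes x has shape (n_samples, seq_length, n_features).
    """
    x = np.asarray(x, dtype=float)
    # A series is constant when max equals min along the time axis.
    spread = x.max(axis=1) - x.min(axis=1)  # shape (n_samples, n_features)
    return np.all(spread == 0, axis=1)

# Same data as in the report: 20 all-zero series and 30 series
# filled with a single value.
n1 = np.zeros((20, 1440))
n2 = np.full((30, 1440), 614631)
data = np.expand_dims(np.concatenate((n1, n2), axis=0), axis=2)

mask = constant_series_mask(data)
print(mask.sum(), "of", len(mask), "series are constant")
```

Series flagged this way could be dropped or handled separately before z-normalization is applied.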
Shouldn't the default be 1? This is how the paper describes the z-normalization.
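Assuming this question is about the ddof argument of the z-normalization (the issue itself does not say), the sketch below shows what the choice changes: ddof=0 divides by the population standard deviation and ddof=1 by the sample standard deviation. zscore_sketch is a hypothetical stand-in written for illustration, not the library's own function:

```python
import numpy as np

def zscore_sketch(a, axis=0, ddof=0):
    """Z-normalize along an axis with a chosen delta degrees of freedom."""
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, ddof=ddof, keepdims=True)
    return (a - mean) / std

x = np.array([1.0, 2.0, 3.0, 4.0])
z_population = zscore_sketch(x, ddof=0)  # divides by sqrt(sum(d^2) / n)
z_sample = zscore_sketch(x, ddof=1)      # divides by sqrt(sum(d^2) / (n - 1))

# The two results differ only by a constant factor sqrt(n / (n - 1)).
ratio = z_population / z_sample
print(ratio)
```

Because the two versions differ only by a constant scale factor per series, the gap shrinks as the series gets longer.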
Is it possible to cluster sequences with more than one dimension, i.e. a tensor of shape (n_samples, seq_length, n_features) as opposed to just (n_samples, seq_length)?
Hello guys,
You should definitely check your code against the k-Shape implementation from tslearn.
Your code has very high peaks in RAM usage for k > 4, m = 56, n = 5,000,000
(n is the number of time series, k is the number of clusters, and m is the length of the time series, as in the original paper).
It goes up to 196 gigabytes of RAM. The tslearn implementation only uses up to 7 gigabytes (like your implementation) but does not have the peaks. Your solution builds up a peak and then drops rapidly.
At least put a link to the tslearn project in your README file:
https://github.com/rtavenar/tslearn/blob/master/tslearn/clustering.py#L597
Thanks for your implementation though. At least you tried and open-sourced it. Thanks for that =)
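For scale, the raw dataset in the report above can be sized with simple arithmetic. The sketch below assumes the values are stored as 8-byte floats (float64); the actual dtype is not stated in the report:

```python
n = 5_000_000        # number of time series (from the report)
m = 56               # length of each series (from the report)
bytes_per_value = 8  # assumption: float64 storage

raw_bytes = n * m * bytes_per_value
raw_gb = raw_bytes / 1e9
reported_peak_gb = 196  # peak RAM usage quoted in the report

print(f"raw data: {raw_gb:.2f} GB")
print(f"reported peak / raw data: {reported_peak_gb / raw_gb:.1f}")
```

Under that assumption the reported peak is far larger than the input itself, which suggests large temporary arrays rather than the data as the source of the spike.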
Hello Mic92,
I was trying to run your code, but it does not work properly in either an IPython notebook or plain Python. Could you check why this is happening? Thanks in advance for your time.
The error in IPython is:
An exception has occurred, use %tb to see the full traceback.
SystemExit: 0
To exit: use 'exit', 'quit', or Ctrl-D
The error in Python is:
Traceback (most recent call last):
  File "kshape.py", line 176, in <module>
    clusters = kshape(zscore(time_series), cluster_num)
  File "kshape.py", line 13, in zscore
    mns = a.mean(axis=axis)
  File "/opt/conda/lib/python2.7/site-packages/numpy/core/_methods.py", line 65, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
The code I am trying to run is:
from kshape import kshape, zscore
time_series = [['c1'], ['c2'],['c3'], ['c4'], ['c5'], ['g1'], ['g2'], ['g3'], ['g4'], ['g5'], ['g6'],
               ['m1'], ['m2'], ['m3'], ['m4'], ['m5'], ['n1'], ['n2'], ['n3'], ['n4'], ['s1'], ['s2'],
               ['s3'], ['s4'], ['s5'], ['s6'], ['s7'], ['s8'], ['w1'], ['w2'], ['w3'], ['w4'], ['w5'],
               ['w6'], ['x1'], ['x2'], ['x3']]
for cluster_num in range(2, 37):
    clusters = kshape(zscore(time_series), cluster_num)
    print 'clusters', clusters
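The TypeError above arises because time_series holds strings such as 'c1'. NumPy stores them with a string ("flexible") dtype, and a mean cannot be computed over strings. A minimal sketch of a check that fails early with a clearer message, where as_numeric_series is a hypothetical helper and not part of kshape:

```python
import numpy as np

def as_numeric_series(time_series):
    """Convert input to a float array, rejecting non-numeric data."""
    arr = np.asarray(time_series)
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(f"expected numeric time series, got dtype {arr.dtype}")
    return arr.astype(float)

# Labels like those in the report are rejected...
labels = [['c1'], ['c2'], ['c3']]
try:
    as_numeric_series(labels)
    rejected = False
except TypeError:
    rejected = True

# ...while numeric series pass through as floats.
values = as_numeric_series([[1, 2, 3, 4], [0, 1, 2, 3]])
print(rejected, values.shape)
```

The labels look like names of series rather than the series themselves, so the numeric values for each name would need to be passed instead.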
Hi,
Am I supposed to get different clustering results from each run on the same data set? I was running the following code:
univariate_ts_datasets = np.expand_dims(np.random.rand(200, 60), axis=2)
num_clusters = 3
for j in range(5):
    ksc = KShapeClusteringCPU(num_clusters, centroid_init='zero', max_iter=100, n_jobs=-1)
    ksc.fit(univariate_ts_datasets)
    labels = ksc.labels_ # or ksc.predict(univariate_ts_datasets)
    cluster_centroids = ksc.centroids_
    print(labels)
My understanding is that there is nothing random in the algorithm since I set the centroid init to zero. But I get different results, e.g. sometimes the first and second time series belong to the same cluster and sometimes they don't.
Thanks,
James
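When comparing runs as in the question above, note that cluster numbers are arbitrary: two runs can group the series identically yet number the clusters differently. A minimal sketch for checking whether two label vectors describe the same grouping, where same_partition is a hypothetical helper written for illustration. It does not explain why the runs differ, only whether they really do:

```python
def same_partition(labels_a, labels_b):
    """Return True if both labelings group the items identically,
    ignoring how the clusters are numbered."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    # The groupings match exactly when the observed (a, b) label pairs
    # form a one-to-one mapping between the two label sets.
    pairs = set(zip(labels_a, labels_b))
    return len(pairs) == len(set(labels_a)) == len(set(labels_b))

run_1 = [0, 0, 1, 2]
run_2 = [2, 2, 0, 1]  # same grouping, different cluster numbers
run_3 = [0, 1, 1, 2]  # items 0 and 1 are now separated

print(same_partition(run_1, run_2))
print(same_partition(run_1, run_3))
```

If this check returns False across runs, the groupings themselves differ and some source of randomness remains, for example in the initial cluster assignment.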
Hello, in the MTS clustering demo there is the line "multivariate_ts_datasets = np.random.rand(200, 60, 6)". What do 60 and 6 represent? Is 60 the sequence length and 6 the number of features?
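Reading the shape as (n_samples, seq_length, n_features) is an assumption, but it is consistent with the layout named in an earlier question and with the univariate example, which adds a trailing axis of size 1 to a (200, 60) array. Under that assumption, a minimal NumPy sketch of the two layouts:

```python
import numpy as np

n_samples, seq_length, n_features = 200, 60, 6

# Multivariate: 200 series, 60 time steps each, 6 features per step.
multivariate = np.random.rand(n_samples, seq_length, n_features)

# Univariate: same layout with a single feature on the last axis.
univariate = np.expand_dims(np.random.rand(n_samples, seq_length), axis=2)

print(multivariate.shape)
print(univariate.shape)

# All time steps of feature 0 for the first series.
first_series_feature_0 = multivariate[0, :, 0]
print(first_series_feature_0.shape)
```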
Hello Mic92,
I was trying to run your code, but the resulting clusters sometimes have all-zero centroids.
With cluster_num = 2 it is fine. However, with cluster_num = 4, two of the centroids are all zeros.
time_series = [[1,2,3,4], [0,1,2,3], [0,1,2,3], [1,2,2,3],[2,2,2,2],[9,9,8,7],[2,2,1,1]]
cluster_num =4
clusters = _kshape(zscore(time_series), cluster_num)
clusters
(array([1, 1, 1, 1, 2, 2, 2]), array([[ 0. , 0. , 0. , 0. ],
[-1.2222319 , -0.35269008, 0.52140717, 1.05351481],
[-0.56324856, -1.1229572 , 0.83705112, 0.84915464],
[ 0. , 0. , 0. , 0. ]]))
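In the output above the labels only take the values 1 and 2, so clusters 0 and 3 have no members, and those are exactly the all-zero centroid rows. A minimal sketch for detecting empty clusters from the label array, where empty_clusters is a hypothetical helper and not part of kshape:

```python
import numpy as np

def empty_clusters(idx, k):
    """Return the indices of clusters that received no members."""
    counts = np.bincount(np.asarray(idx), minlength=k)
    return np.flatnonzero(counts == 0)

# Labels taken from the output in the report, with k = 4 clusters.
idx = np.array([1, 1, 1, 1, 2, 2, 2])
print(empty_clusters(idx, 4))  # prints [0 3]
```

With only 7 short series and 4 requested clusters, some clusters ending up empty is not surprising; a smaller cluster_num avoids it for this data.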