thedatumorg / kshape-python

Python implementation of k-Shape

License: MIT License

Python 100.00%

clustering kshape python time-series-clustering timeseries timeseries-analysis

kshape-python's Issues

Single threaded?

Is this single-threaded? If so, how can I use multi-threading?

This error with the tslearn implementation of k-Shape has been reported in the tslearn GitHub repo. I am posting it here in case someone already knows why it occurs:

tslearn-team/tslearn#99

kshape

Some questions regarding your kshape code:

Can your code be applied to time series of different lengths?
I need to extract a shape prototype but could not figure out which inputs are needed. Could you explain the parameters of the _shape_extraction function?

Error happened when using ks_gpu to run a certain data

When I ran

n1=np.zeros((20, 1440))
n2=np.full((30, 1440), 614631)
n3=np.concatenate((n1,n2),axis=0)
univariate_ts_datasets = np.expand_dims(n3, axis=2)
gpu_model = ks_gpu(univariate_ts_datasets, 3, centroid_init='zero', max_iter=100)

I got error:

Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.

Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.
info: 89
, submat: 0
info: 1531
Traceback (most recent call last):
  File "D:/apps_data/attempt/k-shape.py", line 279, in <module>
    gpu_model = ks_gpu(univariate_ts_datasets, num_clusters, centroid_init='zero', max_iter=100)
  File "D:\apps_home\Download\anaconda\envs\pytorch\lib\site-packages\kshape\core_gpu.py", line 116, in kshape
    idx, centroids = _kshape(x, k, centroid_init=centroid_init, max_iter=max_iter)
  File "D:\apps_home\Download\anaconda\envs\pytorch\lib\site-packages\kshape\core_gpu.py", line 100, in _kshape
    centroids[j] = torch.unsqueeze(_extract_shape(idx, x, j, centroids[j]), dim=1)
  File "D:\apps_home\Download\anaconda\envs\pytorch\lib\site-packages\kshape\core_gpu.py", line 76, in _extract_shape
    _, vec = torch.symeig(m, eigenvectors=True)
RuntimeError: symeig_cuda: the algorithm failed to converge; 1531 off-diagonal elements of an intermediate tridiagonal form did not converge to zero.

Process finished with exit code 1

It seems that y = zscore(a, axis=1, ddof=1) in core_gpu.py, line 67, produces some huge elements, and that s = y[:, :, 0].transpose(0, 1).mm(y[:, :, 0]) on line 69 then contains inf values.

ks_cpu handles this dataset successfully.

I use Python 3.7 and PyTorch 1.8.0.
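
One way to see why this input is problematic: every series in the example is constant, so its standard deviation is zero and z-normalization divides by zero. The sketch below reproduces that with plain NumPy rather than with the package's own code. `naive_zscore` and `guarded_zscore` are hypothetical helpers written for illustration, and the `eps` threshold is an assumption, not something taken from the library.

```python
import numpy as np

def naive_zscore(a, axis=1, ddof=1):
    # Plain z-normalization: breaks down when a series is constant (std == 0).
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, ddof=ddof, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a - mean) / std

def guarded_zscore(a, axis=1, ddof=1, eps=1e-8):
    # Hypothetical guard: treat (near-)constant series as having unit scale,
    # so they normalize to all zeros instead of NaN or inf.
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, ddof=ddof, keepdims=True)
    std = np.where(std < eps, 1.0, std)
    return (a - mean) / std

# Same layout as the data in the report: 20 all-zero rows, 30 constant rows.
flat = np.concatenate([np.zeros((20, 1440)), np.full((30, 1440), 614631.0)], axis=0)

naive = naive_zscore(flat)
guarded = guarded_zscore(flat)

print(np.isfinite(naive).all())    # False: constant rows yield non-finite values
print(np.isfinite(guarded).all())  # True
```

Whether a guard like this is the right fix for the GPU path is a separate question; the sketch only shows that constant series need special handling before any eigendecomposition is attempted.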

Outdated function in the package

I installed the package and tried to run it on Google Colab. However, when I used the GPU implementation by pasting the example from the README, I got the following errors. Please help me fix them:

Default Z-score axis

Shouldn't the default be 1? This is how the paper describes the z-normalization.
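
To make the difference concrete, here is a small NumPy sketch, assuming series are stored as rows (shape `(n_series, length)`). The `zscore` function below is a stand-in written for this example, not the package's function.

```python
import numpy as np

def zscore(a, axis, ddof=0):
    # Stand-in z-normalization along the given axis.
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, ddof=ddof, keepdims=True)
    return (a - mean) / std

# Two series with the same shape but different amplitude.
series = np.array([[1.0, 2.0, 3.0, 4.0],
                   [10.0, 20.0, 30.0, 40.0]])

per_series = zscore(series, axis=1)    # normalize each series over time
per_timestep = zscore(series, axis=0)  # normalize each time step across series

print(np.allclose(per_series[0], per_series[1]))  # True: same shape after scaling
print(per_timestep)  # rows become constant: the shape of each series is lost
```

With `axis=1` the two series become identical, which is the amplitude invariance that per-series z-normalization is meant to provide. With `axis=0` each series collapses to a constant row.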

k-shape with multi-dimensional time-series

Is it possible to cluster sequences with more than one dimension?

i.e. a tensor of shape (n_samples, seq_length, n_features) as opposed to just (n_samples, seq_length)

peaks in ram usage

Hello guys,

You should definitely check your code against the k-Shape implementation in tslearn.
Your code has very high peaks in RAM usage for k > 4, m = 56, n = 5,000,000
(n is the number of time series, k is the number of clusters, and m is the length of the time series, as in the original paper).
Usage goes up to 196 GB of RAM. The tslearn implementation stays at about 7 GB, which matches your implementation's baseline, but it does not have the peaks. Your solution builds up to a peak and then drops rapidly.

At least put a link to the tslearn project in your README file.
https://github.com/rtavenar/tslearn/blob/master/tslearn/clustering.py#L597

Thanks for your implementation, though. At least you tried and open-sourced it. Thanks for that =)
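
For scale, a rough estimate of what one dense copy of this dataset costs, assuming 8-byte float64 values (the report does not state the dtype):

```python
n = 5_000_000        # number of time series
m = 56               # length of each series
bytes_per_value = 8  # assumption: float64

one_copy_gb = n * m * bytes_per_value / 1e9
peak_gb = 196        # peak reported above

copies_at_peak = peak_gb / one_copy_gb

print(round(one_copy_gb, 2))     # 2.24
print(round(copies_at_peak, 1))  # 87.5
```

Under that assumption the reported peak is equivalent to roughly 87 full copies of the input, which suggests large temporary arrays rather than the data itself.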

Error when importing kshape

Hello Mic92,

I was trying to run your code, but it does not work in either the IPython notebook or plain Python. Could you check why this is happening? Thanks in advance for your time.

The error in IPython is:

An exception has occurred, use %tb to see the full traceback.
SystemExit: 0
To exit: use 'exit', 'quit', or Ctrl-D

The error in Python is:

Traceback (most recent call last):
  File "kshape.py", line 176, in
    clusters = kshape(zscore(time_series), cluster_num)
  File "kshape.py", line 13, in zscore
    mns = a.mean(axis=axis)
  File "/opt/conda/lib/python2.7/site-packages/numpy/core/_methods.py", line 65, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type

The code I am trying to run is:

from kshape import kshape, zscore

time_series = [['c1'], ['c2'], ['c3'], ['c4'], ['c5'], ['g1'], ['g2'], ['g3'], ['g4'], ['g5'], ['g6'],
               ['m1'], ['m2'], ['m3'], ['m4'], ['m5'], ['n1'], ['n2'], ['n3'], ['n4'], ['s1'], ['s2'],
               ['s3'], ['s4'], ['s5'], ['s6'], ['s7'], ['s8'], ['w1'], ['w2'], ['w3'], ['w4'], ['w5'],
               ['w6'], ['x1'], ['x2'], ['x3']]
for cluster_num in range(2, 37):
    clusters = kshape(zscore(time_series), cluster_num)
    print 'clusters', clusters
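
The "flexible type" in the TypeError is NumPy's term for string-like dtypes: the input above is a list of labels such as 'c1', not numeric series, so the mean in zscore cannot be computed. A minimal sketch of checking the input before clustering follows; `is_numeric` is a hypothetical helper and not part of kshape.

```python
import numpy as np

def is_numeric(data):
    # True only if the data converts to an array with a numeric dtype.
    return np.issubdtype(np.asarray(data).dtype, np.number)

labels = [['c1'], ['c2'], ['g1']]               # strings: not valid input
values = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]     # numeric series: fine

labels_ok = is_numeric(labels)
values_ok = is_numeric(values)

print(labels_ok)  # False
print(values_ok)  # True
```

Each row passed to the clustering function needs to be the numeric values of one time series; identifiers like 'c1' would have to be mapped to the actual measurements first.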

Getting different results on the same data set

Hi,

Am I supposed to get different clustering results from each run on the same data set? I was running the following code:

import numpy as np
from kshape.core import KShapeClusteringCPU  # import path assumed from the project README

univariate_ts_datasets = np.expand_dims(np.random.rand(200, 60), axis=2)
num_clusters = 3

# CPU Model
for j in range(5):
    ksc = KShapeClusteringCPU(num_clusters, centroid_init='zero', max_iter=100, n_jobs=-1)
    ksc.fit(univariate_ts_datasets)

    labels = ksc.labels_  # or ksc.predict(univariate_ts_datasets)
    cluster_centroids = ksc.centroids_
    print(labels)

My understanding is that there is nothing random in the algorithm, since I set the centroid initialization to zero. But I get different results: sometimes the first and second time series belong to the same cluster and sometimes they do not.

Thanks,
James
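
A possible source of the variation: the k-Shape algorithm as described in the paper starts from a random assignment of series to clusters, so zero-initialized centroids alone would not make a run deterministic if this implementation follows the paper on that point (an assumption, not verified against the package source here). The sketch below only illustrates the effect of seeding such a random start; `random_initial_labels` is a hypothetical function, not the library's API.

```python
import numpy as np

def random_initial_labels(n_series, k, seed=None):
    # Random starting assignment of each series to one of k clusters.
    rng = np.random.default_rng(seed)
    return rng.integers(0, k, size=n_series)

# Unseeded: the starting assignment generally differs from call to call.
a = random_initial_labels(200, 3)
b = random_initial_labels(200, 3)

# Seeded: the starting assignment is reproducible.
c = random_initial_labels(200, 3, seed=0)
d = random_initial_labels(200, 3, seed=0)

print(np.array_equal(c, d))  # True
```

If the implementation draws its initial assignment from a random generator, reproducible output would require fixing that generator's seed before each fit.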

What does each dimension of data represent?

Hello, the demo for MTS clustering uses "multivariate_ts_datasets = np.random.rand(200, 60, 6)". What do the 60 and the 6 represent? Is 60 the sequence length and 6 the number of features?
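
A short sketch of that reading of the shape, assuming the (n_samples, seq_length, n_features) layout mentioned in the multi-dimensional issue above. The univariate examples on this page, which call np.expand_dims with axis=2, are consistent with it.

```python
import numpy as np

n_samples, seq_length, n_features = 200, 60, 6

# 200 time series, each with 60 time steps and 6 features per step (assumed layout).
multivariate = np.random.rand(n_samples, seq_length, n_features)

one_series = multivariate[0]         # all features of the first series
one_feature = multivariate[:, :, 0]  # first feature of every series

# A univariate dataset gets a trailing feature axis of size 1.
univariate = np.expand_dims(np.random.rand(n_samples, seq_length), axis=2)

print(one_series.shape)   # (60, 6)
print(one_feature.shape)  # (200, 60)
print(univariate.shape)   # (200, 60, 1)
```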

Cluster centers equal to 0

Hello Mic92,

I was trying to run your code, but the resulting clusters sometimes have all-zero centroids.
With cluster_num = 2 it is fine. However, with cluster_num = 4, two of the centroids are zero.

time_series = [[1,2,3,4], [0,1,2,3], [0,1,2,3], [1,2,2,3],[2,2,2,2],[9,9,8,7],[2,2,1,1]]

cluster_num = 4
clusters = _kshape(zscore(time_series), cluster_num)
clusters
(array([1, 1, 1, 1, 2, 2, 2]), array([[ 0. , 0. , 0. , 0. ],
[-1.2222319 , -0.35269008, 0.52140717, 1.05351481],
[-0.56324856, -1.1229572 , 0.83705112, 0.84915464],
[ 0. , 0. , 0. , 0. ]]))
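
The output above already hints at the cause: the label array only contains 1 and 2, so clusters 0 and 3 have no members, and those are exactly the rows with zero centroids. A quick check using the labels from the output:

```python
import numpy as np

# Labels copied from the output above.
labels = np.array([1, 1, 1, 1, 2, 2, 2])
cluster_num = 4

# Number of series assigned to each cluster index 0..cluster_num-1.
sizes = np.bincount(labels, minlength=cluster_num)
empty_clusters = np.flatnonzero(sizes == 0)

print(sizes)           # [0 4 3 0]
print(empty_clusters)  # [0 3]
```

With only seven short series, asking for four clusters can easily leave some of them empty; how empty clusters should be handled (for example by reassigning a series to them) is a design decision for the implementation.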