wannesm / dtaidistance
Time series distances: Dynamic Time Warping (fast DTW implementation in C)
License: Other
According to the MODULES document, parallel and show_progress should work on the c base distance matrix module,
dtaidistance.dtw.distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None, max_step=None, penalty=None, psi=None, block=None, compact=False, parallel=True, show_progress=False)
However, I found that neither the parallel nor the show_progress parameter works in dtaidistance.dtw.distance_matrix_fast.
import time
import numpy as np
from dtaidistance import dtw

s = [
np.array([10., 10, 10, 8, 10, 8, 8, 10, 8]),
np.array([8., 10, 8, 8, 10, 8]),
np.array([8., 2, 0, 0, 0, 0, 0, 1, 1]),
np.array([8., 2, 0, 0, 0, 0, 0, 0, 0]),
...
np.array([9., 0, 1, 2, 1, 0, 1, 0, 9]),
np.array([0., 0, 0, 8, 10, 8, 8, 10, 8]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 1]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 3])
]
tic = time.clock()
dtw.distance_matrix(s,compact=True,psi=1,show_progress=True,parallel=True)
toc = time.clock()
toc - tic
tic = time.clock()
dtw.distance_matrix_fast(s,compact=True,psi=1,show_progress=True,parallel=True)
toc = time.clock()
toc - tic
tic = time.clock()
dtw.distance_matrix_fast(s,compact=True,psi=1,show_progress=True,parallel=False)
toc = time.clock()
toc - tic
0%| | 0/100 [00:00<?, ?it/s]
100%|██████████| 100/100 [00:00<00:00, 512.83it/s]
900.0146135228515145091.02014512547413254414
1.00014066557152784057
The progress bar didn't show up in the fast module, and the parallel parameter doesn't seem to function properly either. I believe there is a bug in the C code.
I am using Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] and dtaidistance v1.2.
~Thank you.
I'm trying to run your DTW implementation on data with values in the range of -2000 to 2000. If I normalize the data to a certain (usually much smaller) range first, I have no issues. However, whenever I attempt to run your code on the raw (i.e. unnormalized) data, I end up with the following message:
ValueError: Buffer dtype mismatch, expected 'double' but got 'short'
The traceback is as follows:
Traceback (most recent call last):
File "D:\PathToMyProject\testing.py", line 313, in myDTWFunction
myDistance = dtw.distance(a,b,use_c=True,window=w,max_dist=minimumDistance)
File "D:\PathToMyAnaconda\lib\site-packages\dtaidistance-1.1.3-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 84, in distance
psi=psi)
File "D:\PathToMyAnaconda\lib\site-packages\dtaidistance-1.1.3-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 196, in distance_fast
psi=psi)
File "dtaidistance\dtw_c.pyx", line 129, in dtaidistance.dtw_c.distance_nogil
I thought this might be an issue with some old code, so I tried to recompile the most recent code from your repository, but I haven't succeeded yet due to this issue.
P.S: I run the code on a Windows 10 machine.
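The error message suggests the raw arrays were loaded as 16-bit integers (numpy's int16, C 'short'), while the C routine expects a float64 ('double') buffer, which would also explain why normalizing first (which produces floats) avoids the problem. A likely workaround, sketched here as an assumption rather than a confirmed fix, is to cast before calling:

```python
import numpy as np

# Raw sensor-style data in the -2000..2000 range often loads as int16 ('short').
a = np.array([-2000, 0, 2000], dtype=np.int16)
b = a.astype(np.double)   # cast to the float64 ('double') buffer the C code expects
print(a.dtype, b.dtype)   # int16 float64
```

After the cast, dtw.distance(b1, b2, use_c=True, ...) should receive the buffer type it expects.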
I'm currently facing two issues. The first is that I'm unable to recompile the source code after updating the repository. Whenever I attempt to do this via
C:\pythoToMyPython\python.exe D:\pathToDtaidistance\dtaidistance\setup.py build_ext --inplace
on my Windows (10) machine, I get the following message:
Traceback (most recent call last):
File "D:\pathToDtaidistance\dtaidistance\setup.py", line 123, in
extra_link_args=extra_link_args)])
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 897, in cythonize
aliases=aliases)
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 777, in create_extension_list
for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 102, in nonempty
raise ValueError(error_msg)
ValueError: 'dtaidistance/dtw_c.pyx' doesn't match any files
That's strange because my dtaidistance folder contains a file named "dtw_c.pyx".
For the second issue, I will open a separate ticket.
First of all: Thank you very much for creating this nice piece of software! Unfortunately, I have an issue running your sample code.
Whenever I try to run the following code:
series = [np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)
print(ds)
I'm greeted with this error message:
OverflowError: Python int too large to convert to C long
Any help would be highly appreciated.
When I run dtw.distance, many values return as "inf"... not sure if I should just increase max_length_diff to something large.
In some provided examples there is a naming error, where you need to replace the variable s with series.
For instance the following
ds = dtw.distance_matrix_fast(s)
should be
ds = dtw.distance_matrix_fast(series)
Hi, my code works fine for a smaller number of short (Nt=12) time series, N up to ~6000, but when I tried running it for N = 350000 I got the error you can see below. I run it on an EC2 instance, so it's not a memory issue. Is there any hardcoded limit that could cause it?
model.fit(X_train)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/clustering.py", line 463, in fit
dists = self.dists_fun(self.series, **self.dists_options)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/dtw.py", line 547, in distance_matrix_fast
use_c=True, use_nogil=True, show_progress=show_progress)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/dtw.py", line 415, in distance_matrix
dists = dtw_c.distance_matrix_nogil(s, is_parallel=parallel, **dist_opts)
File "dtaidistance/dtw_c.pyx", line 586, in dtaidistance.dtw_c.distance_matrix_nogil
File "dtaidistance/dtw_c.pyx", line 668, in dtaidistance.dtw_c.distance_matrix_nogil_c_p
IndexError: Out of bounds on buffer access (axis 0)
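One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: the number of pairwise distances grows quadratically, and at N = 350000 it no longer fits in a signed 32-bit integer, so any 32-bit index in the C code would overflow:

```python
# Pair counts for the condensed distance matrix: N * (N - 1) // 2 entries.
n_small, n_large = 6000, 350000
pairs_small = n_small * (n_small - 1) // 2   # ~1.8e7, fits easily
pairs_large = n_large * (n_large - 1) // 2   # 61_249_825_000
int32_max = 2**31 - 1                        # 2_147_483_647
print(pairs_small <= int32_max)  # True
print(pairs_large <= int32_max)  # False: overflows any 32-bit index
```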
Hi, your package is very interesting to me, and I would like to use it to visualise my own data.
I would like to plot a dendrogram with each of the leaf nodes labelled according to an external label.
However, I cannot seem to find a way to do this.
import numpy as np
import pandas as pd
dataframe = pd.read_csv('sample.csv',header=None)
dataframe.head(6)
label for nodes
0 chr13_110718378-110719378.txt _2.441430_ 207.521542 163.575804 ...... 2
1 chr2_96278196-96279196.txt 43.223219 242.530287 168.090298 ...... 1
2 chr4_140084844-140085844.txt 237.444590 155.823012 249.811496 ...... 3
3 chr10_71267774-71268774.txt 232.878508 139.246943 225.676080 ....... 3
4 chr14_86309018-86310018.txt 131.655232 248.406099 67.069647 ....... 2
5 chr3_97076527-97077527.txt 129.814476 0.000000 204.337600 ........ 1
df=dataframe.values
labels=df[:,-1] #labels variable
df=df[:,:-1]
df= df[:,1:]
from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
from dtaidistance import clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
model2 = clustering.HierarchicalTree(model1)
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(np.matrix(df,dtype=np.double))
Hence my question is if there is a way to label the tree nodes and still have the nice plot formats that your package produces?
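Elsewhere in this thread, plot() is called with show_ts_label=True and with show_ts_label=labels, so that argument looks like the intended route for external labels. A hypothetical sketch, assuming show_ts_label also accepts a callable from series index to label:

```python
# Hypothetical: map each series row to its external label before plotting.
labels = ["2", "1", "3", "3", "2", "1"]  # last column of the dataframe above

def ts_label(idx):
    """Label callback: series index -> external label (assumed signature)."""
    return labels[idx]

print(ts_label(2))  # "3"
# model3.plot("hierarchy.png", show_ts_label=ts_label)  # or show_ts_label=labels
```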
dynamicTimeWarping.py:
from dtaidistance import dtw
from dtaidistance import clustering
import numpy as np
s = np.array([
[0, 0, 1, 2, 1, 0, 1, 0, 0],
[0, 1, 2, 0, 0, 0, 0, 0, 0],
[1, 2, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 2, 1, 0, 1, 0, 0],
[0, 1, 2, 0, 0, 0, 0, 0, 0],
[1, 2, 0, 0, 0, 0, 0, 1, 1],
[1, 2, 0, 0, 0, 0, 0, 1, 1]])
model = clustering.Hierarchical(dtw.distance_matrix_fast, {})
modelw = clustering.HierarchicalTree(model)
cluster_idx = modelw.fit(s)
modelw.plot("hierarchy.png")
error logs:
(timeSeriesClassification) bash-3.2$ python3 dynamicTimeWarping.py
The compiled dtaidistance C library is not available.
See the documentation for alternative installation options.
Traceback (most recent call last):
File "dynamicTimeWarping.py", line 17, in <module>
cluster_idx = modelw.fit(s)
File "/Users/user/.local/share/virtualenvs/timeSeriesClassification-ight38Tz/lib/python3.7/site-packages/dtaidistance/clustering.py", line 418, in fit
result = self._model.fit(series, *args, **kwargs)
File "/Users/user/.local/share/virtualenvs/timeSeriesClassification-ight38Tz/lib/python3.7/site-packages/dtaidistance/clustering.py", line 73, in fit
pbar = tqdm(total=dists.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'
I should note that this fails the same way if I run the clustering tests included in this repo as well
Trying to run the example from the documentation:
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
# Keep track of full tree by using the HierarchicalTree wrapper class
model2 = clustering.HierarchicalTree(model1)
# You can also pass keyword arguments identical to those used to instantiate a Hierarchical object
model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)
and getting an error for any of the proposed models:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-4bb92754361e> in <module>()
2 model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
3 # Keep track of full tree by using the HierarchicalTree wrapper class
----> 4 model2 = clustering.HierarchicalTree(model1)
5 # You can also pass keyword arguments identical to instantiate a Hierarchical object
6 model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
/Users/fred/anaconda/lib/python2.7/site-packages/dtaidistance/clustering.pyc in __init__(self, model, **kwargs)
388 else:
389 self._model = model
--> 390 super().__init__(**kwargs)
391 self._model.max_dist = np.inf
392
TypeError: super() takes at least 1 argument (0 given)
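The paths in the traceback point at Python 2.7, and zero-argument super() is Python-3-only syntax, which would explain this TypeError. A sketch of the 2-and-3 compatible spelling (hypothetical class names, not the library's actual code):

```python
class Base(object):
    def __init__(self, **kwargs):
        self.opts = kwargs

class Tree(Base):
    def __init__(self, **kwargs):
        # Python 3 allows the zero-argument form super().__init__(**kwargs);
        # under Python 2.7 that raises "super() takes at least 1 argument (0 given)".
        # The 2-and-3 compatible spelling names the class and instance explicitly:
        super(Tree, self).__init__(**kwargs)

t = Tree(max_dist=1)
print(t.opts)  # {'max_dist': 1}
```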
Dear Friend.
I have an error.
dist, cost, path = dtw(mfcc1.T, mfcc2.T)
print("The normalized distance between the two : ",dist) # 0 for similar audios
TypeError: 'module' object is not callable
Can you help me resolve it?
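In dtaidistance, dtw is a module, so dtw(mfcc1.T, mfcc2.T) calls the module object itself and raises this TypeError; the expected (dist, cost, path) triple suggests the code was written against a different dtw package's API. A small demonstration of the error, with the likely intended call sketched as a comment (an assumption about intent):

```python
import math  # any module reproduces the error when called directly

err = None
try:
    math(1.0)  # calling a module object itself
except TypeError as exc:
    err = str(exc)
print(err)  # 'module' object is not callable

# With dtaidistance, call a function *inside* the module instead, e.g.:
# from dtaidistance import dtw
# dist = dtw.distance(series1, series2)  # takes two 1-D series
```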
What is the meaning of all these shades of green in the figure?
Simple question: for the plot method for hierarchical clustering in your clustering.py, I am unable to find a way to change the size of the dendrogram. plt.figure(figsize=(x, y)) does not work.
Can you help? Great library, btw!
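A later example in this thread creates its own figure with plt.subplots(figsize=...) and passes its axes to plot() via the axes argument, which suggests a route for controlling the dendrogram size. A sketch under that assumption:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# plot() draws the tree on one axis and the series on the other,
# so create a 1x2 grid at the size you want and hand over its axes:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 10))
print(fig.get_size_inches())  # [15. 10.]
# model.plot("hierarchy.png", axes=ax)  # assumes a fitted clustering model
```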
I installed this package and tried the fast implementation of the DTW but I get:
The compiled dtaidistance C library is not available.
See the documentation for alternative installation options
How can I rectify this? I have Cython installed and all other dependencies. The docs don't point to any other specific installations being necessary. Thanks.
1.2.3 was released but there is no tag, making it harder to download from github.
Getting this error. Seems like just a syntax thing
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3291, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 70, in
from dtaidistance import dtw
File "/opt/conda/lib/python3.6/site-packages/dtaidistance/__init__.py", line 19, in
from . import dtw
File "/opt/conda/lib/python3.6/site-packages/dtaidistance/dtw.py", line 23
s from . import dtw_c
^
IndentationError: expected an indented block
I was just comparing results with other dynamic time warping libraries and noticed significant differences in the dtw distances, despite similar paths. Am I missing something or is there an error somewhere in your distance calculation?
dist1,paths= dtw.warping_paths(s1,s2)
path = dtw.best_path(paths)
print(dist1)
dist2 = 0
for [a, b] in path:
dist2 += abs(s1[a]-s2[b])
print(dist2)
dist1 and dist2 are vastly different in my examples.
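A plausible explanation, inferred from the numbers reported elsewhere in this thread rather than confirmed from the source: warping_paths accumulates squared point differences and returns the square root of the total, whereas the loop above sums absolute differences along best_path; the two quantities coincide only in special cases. A minimal sketch of the squared-cost accumulation:

```python
import math

def dtw_sq(s1, s2):
    """Minimal DTW sketch that accumulates *squared* point differences and
    returns the square root of the total (an assumption for illustration,
    not dtaidistance's actual code)."""
    INF = float("inf")
    n, m = len(s1), len(s2)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s1[i - 1] - s2[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return math.sqrt(D[n][m])

s1 = [1, 2, 3, 4, 1, 2, 3]
s2 = [2, 3, 4, 1, 2, 3, 4]
d_sq = dtw_sq(s1, s2)
# The sum of *absolute* differences along the same optimal path is 1 + 1 = 2:
print(d_sq)  # 1.4142135623730951 (= sqrt(1**2 + 1**2)), not 2.0
```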
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5FU_xZ/dtaidistance/
pip install dtaidistance
Collecting dtaidistance
Using cached https://files.pythonhosted.org/packages/75/ad/458d751a5d4842e3f7aa0ad6f79ee0219683d0b28ebc0d882f4106436a10/dtaidistance-1.1.4.tar.gz
Complete output from command python setup.py egg_info:
/home/lokesh/.local/lib/python2.7/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-build-5FU_xZ/dtaidistance/dtaidistance/dtw_c.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Compiling dtaidistance/dtw_c.pyx because it depends on /home/lokesh/.local/lib/python2.7/site-packages/Cython/Includes/numpy/__init__.pxd.
[1/1] Cythonizing dtaidistance/dtw_c.pyx
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-5FU_xZ/dtaidistance/setup.py", line 139, in
with open('dtaidistance/__init__.py', 'r', encoding='utf-8') as fd:
TypeError: 'encoding' is an invalid keyword argument for this function
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5FU_xZ/dtaidistance/
Hi,
When I tried installing with pip I got the following error:
line 263, in
with open('dtaidistance/__init__.py', 'r', encoding='utf-8') as fd:
TypeError: 'encoding' is an invalid keyword argument for this function
I tried reinstalling with pip3 and it worked.
Please edit the install doc to "pip3."
Thx.
~Jon
According to the MODULES document, psi should work on the c base distance matrix module,
dtaidistance.dtw.distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None, max_step=None, penalty=None, psi=None, block=None, compact=False, parallel=True, show_progress=False)
However, I found that the psi parameter does not work in the dtaidistance.dtw.distance_matrix_fast module.
s = [np.array([0., 0, 1, 2, 1, 0, 1, 0, 0]),
np.array([9., 0, 1, 2, 1, 0, 1, 0, 9])]
dtw.distance_matrix(s,compact=True,psi=1)
dtw.distance_matrix_fast(s,compact=True,psi=1)
[0.]
[12.72792206]
And when I look into the dtw_c.pyx file, I didn't find the psi code included in the distance_matrix_nogil function. I am not familiar with C, so I may have read the code wrongly.
I am using Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] and dtaidistance v1.2.
I really need this feature; is there any workaround?
~Thank you.
Is it possible for the C implementation of dtw distance to return not only the final distance but also the best_path or all paths? Because besides having the final distance, I would be interested in analysing the individual distances over time. Would that be possible?
I just attempted to install dtaidistance once again on a Windows 10 machine and ran into the issue that the variables "PrepReadme" and "MySDistCommand" could not be found. Thus, I simply removed the following lines (lines 282 and 283) from setup.py:
'readme': PrepReadme,
'sdist': MySDistCommand,
After doing so, I was able to install dtaidistance from source. Could you please have a look at this issue and adjust the file accordingly? Thanks a lot.
According to the documentation, setting this flag to True only means that the distance function uses precompiled C functions.
But when I check both values, they are very different, as shown in the image below.
I assume this is wrong behavior; am I wrong?
If I am, could you tell me what makes the difference?
Thanks in advance!
I'm using your example but the result is: (None, None) Why?
series = np.matrix([
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1],
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1]])
from dtaidistance import clustering
from dtaidistance import dtw
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)
model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
cluster_idx = model2.fit(series)
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)
model2.plot("hierarchy.png")
(None, None)
Resolved. See suggestion for doc clarification below.
Original question:
Hi,
When I run this clustering example code provided in the docs:
from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0])
s3 = np.array([0.0, 0, 1, 2, 1, 0, 0, 0])
series = [s1, s2, s3]
from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix, {})
# Keep track of full tree by using the HierarchicalTree wrapper class
model2a = clustering.HierarchicalTree(model1)
# You can also pass keyword arguments identical to instantiate a Hierarchical object
model2b = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix, dists_options={})
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix, {})
cluster_idx = model3.fit(series)
model2a.plot("hierarchy.png")
I get the following error traced back to clustering.py:
line 220, in plot
self._series_y = [0] * len(self.series)
TypeError: object of type 'NoneType' has no len()
I get the same error when I use distance_matrix_fast.
Can you help me?
SOLUTION:
The above example fits model3, but not model2a, which it seeks to plot. You need to call
var = model2a.fit(series)
then you can plot it.
Updating the doc accordingly would help others avoid this error.
Hello,
I noticed that in clustering.py you use 'complete' linkage rather than single, average, or ward. Would you be so kind as to explain why you chose complete linkage? What considerations went into making this choice?
~Thank you.
For example, if I started with
from dtaidistance import dtw
s1 = [1,2,3,4,1,2,3]
s2 = [2,3,4,1,2,3,4]
distance, paths = dtw.warping_paths(s1, s2,penalty=0)
The resulting distance was 1.4142135623730951.
When I set penalty = 10, the distance would be 3.872983346207417.
How did the distance go from 1.4142135623730951 to 3.872983346207417?
A paper or any other references would be very helpful.
Thanks!
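A plausible reading of these numbers, assuming the penalty is added to the squared-cost accumulator once per non-diagonal step: with penalty=0 the optimal path shifts the series by one, leaving only two unit mismatches at the ends, so the distance is sqrt(1² + 1²) ≈ 1.4142. With penalty=10 that shifted path would cost 2 + 2×10 = 22, so the plain diagonal alignment (total squared difference 15) becomes optimal and the distance is sqrt(15) ≈ 3.8730:

```python
import math

s1 = [1, 2, 3, 4, 1, 2, 3]
s2 = [2, 3, 4, 1, 2, 3, 4]

# No-warp (pure diagonal) alignment: squared differences 1,1,1,9,1,1,1 sum to 15.
diag_cost = sum((a - b) ** 2 for a, b in zip(s1, s2))
print(math.sqrt(diag_cost))  # 3.872983346207417, the reported penalty=10 result

# Best warped path (shift by one) leaves only two unit mismatches at the ends:
warp_cost = (s1[0] - s2[0]) ** 2 + (s1[-1] - s2[-1]) ** 2
print(math.sqrt(warp_cost))  # 1.4142135623730951, the reported penalty=0 result

# With penalty=10 the warped path pays two off-diagonal steps: 2 + 2*10 = 22 > 15,
# so the diagonal alignment wins and the distance becomes sqrt(15).
```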
Hi experts, my code failed with the error message
"ValueError: operands could not be broadcast together with shapes (3,) (9,)"
Is there any workaround ?
from dtaidistance import dtw
from dtaidistance import clustering
import numpy as np
series = np.array([
np.array([1, 2, 1]),
np.array([0., 1, 2, 0, 0, 0, 0, 0, 0]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 1, 3, 4, 5]),
np.array([0., 0, 1, 2, 1, 0, 1]),
np.array([0., 1, 2, 0, 0, 0, 0, 0]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 1])])
ds = dtw.distance_matrix(series)
print(ds)
model3 = clustering.LinkageTree(dtw.distance_matrix, {})
cluster_idx = model3.fit(series)
print(cluster_idx)
model3.plot(img_path+"model-test.png", show_ts_label=True)
Best regards,
Keita
thanks
The tests require a file synthetic_control.data. However, this file is not included in the GitHub sources, nor can I find any explanation of where to obtain it. It would be helpful if it were in the GitHub source tree; that would allow CI tests to be run.
I'm using Python 3.6 (with Anaconda) on a Windows 8.1 and a Windows 10 machine at work. I'm dealing with rather large data sets (roughly 10 data sets with 10,000 sequences consisting of 3,000 samples each). dtaidistance is working as expected, but whenever I set the "use_c" parameter of the "distance" function to True, I can observe in the Task Manager that the corresponding Python process slowly eats up all available RAM.
If I want to run datidistance on several data sets overnight, it regularly simply crashes after all my available RAM (roughly 32 GB) has been used.
Could it be possible that somehow the memory isn't freed again properly after it has been used?
In the docs, on the page "Dynamic Time Warping (DTW)", in the section entitled "DTW between set of series", after giving an example using dtw.distance_matrix_fast, the doc says:
"This behaviour can be deactivated by setting the argument compact to true," implying that compact can be set to true in the previous example, which is of distance_matrix_fast.
However, compact is only a keyword in dtw.distance_matrix, and not in the distance_matrix_fast. If you try setting compact to True in distance_matrix_fast, you will get an error. So as not to be misleading, I suggest clarifying this in the doc.
I'm running Ubuntu 18.04, and I compiled dtaidistance from GitHub. When logging at info level, it's confirmed that the C libraries are installed and parallel computation is enabled, but htop shows a single core running when executing model = clustering.LinkageTree(dtw.distance_matrix_fast, {'parallel': True}). Any thoughts on why parallelization isn't working? Or how to debug it?
INFO:be.kuleuven.dtai.distance:Computing distances
INFO:be.kuleuven.dtai.distance:Compute distances in pure C
INFO:be.kuleuven.dtai.distance:Use parallel computation
Hi,
I have already installed Python 3.7.1 with numpy 1.16.1, Cython 0.29.6, and dtaidistance 1.1.4. I have tried installing dtaidistance via pip, from GitHub, and from source. But when I want to use "distance_matrix_fast", I get this error:
"The compiled dtaidistance C library is not available.
See the documentation for alternative installation options."
I am using macOS 10.14, and here is my log when I try to install dtaidistance using pip:
The current PyPI package ( https://pypi.org/project/dtaidistance/1.1.2/#files ) is missing a copy of the LICENSE file. This apparently causes the build to fail:
error: [Errno 2] No such file or directory: 'LICENSE'
Failed building wheel for dtaidistance
Should be:

if ci > ui:
    t += abs(ci - ui)
elif ci < li:
    t += abs(ci - li)
else:
    pass  # do nothing
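The snippet above patches an envelope-based (LB_Keogh-style) lower bound: a point only contributes when it falls outside the [li, ui] envelope. A self-contained sketch incorporating the proposed fix; the envelope construction here is an illustrative assumption, not the library's exact code:

```python
def lb_keogh(s1, s2, window):
    """LB_Keogh-style lower-bound sketch: compare each point of s1 against
    the upper/lower envelope of s2 within a +/-window band."""
    t = 0.0
    for i, ci in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        ui = max(s2[lo:hi])  # upper envelope value at i
        li = min(s2[lo:hi])  # lower envelope value at i
        if ci > ui:
            t += abs(ci - ui)
        elif ci < li:
            t += abs(ci - li)
        # inside the envelope: contributes nothing, as the patch intends
    return t

print(lb_keogh([5, 5, 5], [1, 1, 1], 1))  # 12.0: every point is 4 above the envelope
print(lb_keogh([1, 2, 3], [1, 2, 3], 1))  # 0.0: identical series stay inside it
```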
Hi,
Does it work for multivariate time series, using a list of lists as input or an array with dimensions "nb_samples * nb_features", for instance?
Thanks!
I am getting a memory error when trying to calculate DTW distances for a numpy matrix of 2.2 million series with 157 time points. I have 400 GB of RAM available, with < 10% utilization, and immediately get the error when trying to run dtw.distance_matrix_fast. Here is the traceback:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-64-51fa732ba330> in <module>
2 dists = dtw.distance_matrix_fast(df_numpy, window=15,
3 parallel=True,
----> 4 show_progress=True)
5 dists[dists == np.inf] = 0
6 dists = dists + dists.T - np.diag(np.diag(dists))
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw.py in distance_matrix_fast(s, max_dist, max_length_diff, window, max_step, penalty, psi, block, parallel, show_progress)
449 window=window, max_step=max_step, penalty=penalty, psi=psi,
450 block=block, parallel=parallel,
--> 451 use_c=True, use_nogil=True, show_progress=show_progress)
452
453
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw.py in distance_matrix(s, max_dist, max_length_diff, window, max_step, penalty, psi, block, parallel, use_c, use_nogil, show_progress)
367 if parallel:
368 logger.info("Use parallel computation")
--> 369 dists = dtw_c.distance_matrix_nogil_p(s, **dist_opts)
370 else:
371 logger.info("Use serial computation")
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw_c.pyx in dtaidistance.dtw_c.distance_matrix_nogil_p()
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw_c.pyx in dtaidistance.dtw_c.distance_matrix_nogil()
MemoryError:
Any suggestions? Is it not feasible to run DTW on a dataset this large?
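A back-of-the-envelope check suggests the full computation cannot fit, independent of any bug, assuming the result holds one double per pair of series:

```python
# 2.2 million series give n * (n - 1) // 2 pairwise distances.
n = 2_200_000
pairs = n * (n - 1) // 2        # ~2.42e12 distances
bytes_needed = pairs * 8        # 8 bytes per double
print(bytes_needed / 1e12)      # ~19.4 TB, far beyond 400 GB of RAM
```

Even the compact (condensed) form of the result would need roughly 19 TB, so restricting the computation to sub-blocks (e.g. via the block argument) or reducing the number of series seems unavoidable.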
We are getting a weird error using this library that I cannot debug. I have tried setting the logging level to debug to see what I can find, but no luck.
INFO:be.kuleuven.dtai.distance:Computing distances
INFO:be.kuleuven.dtai.distance:Compute distances in pure C (parallel=True)
Traceback (most recent call last):
File "rt/clustering/rt_cluster_trainer.py", line 250, in <module>
clustering.run()
File "rt/clustering/rt_cluster_trainer.py", line 183, in run
model = cluster_library.train()
File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 241, in train
models = self.perform_clustering(self.buckets, self.kvalues, self.linkage_method)
File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 208, in perform_clustering
y = self.calculate_distance_matrix()
File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 199, in calculate_distance_matrix
y = dtw.distance_matrix_fast(self.frequency_counts, window=self.dtw_radius, compact=True)
File "/opt/grok/ve3/lib/python3.7/site-packages/dtaidistance/dtw.py", line 548, in distance_matrix_fast
use_c=True, use_nogil=True, show_progress=False)
File "/opt/grok/ve3/lib/python3.7/site-packages/dtaidistance/dtw.py", line 416, in distance_matrix
dists = dtw_c.distance_matrix_nogil(s, is_parallel=parallel, **dist_opts)
File "dtaidistance/dtw_c.pyx", line 586, in dtaidistance.dtw_c.distance_matrix_nogil
File "dtaidistance/dtw_c.pyx", line 657, in dtaidistance.dtw_c.distance_matrix_nogil_c_p
ValueError: negative dimensions are not allowed
We are using the library exactly as the documentation shows:
from dtaidistance import dtw
import numpy as np
series = [
np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)
Our distance matrix is huge, so I cannot post it here, but I have written some code that tests each time series fed into the algorithm: they are all the same shape (307,), there are no negative values, and there are no NaNs.
Can someone shed some light on what this error actually means? There is nothing wrong with our data as far as we know... If we take random samples of the data, say only 50% of it, excluding many time series, it seems to work fine. Is there a single time series numpy array breaking it?
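One hypothesis, not confirmed from the source: "negative dimensions" can be the signature of a signed 32-bit overflow, where the pair count n*(n-1)/2 wraps to a negative value inside the C code; that would also explain why a 50% random sample works. The threshold where a 32-bit pair count first wraps:

```python
# Find the smallest series count whose pair count exceeds a signed 32-bit int.
int32_max = 2**31 - 1
n = 2
while n * (n - 1) // 2 <= int32_max:
    n += 1
print(n)  # 65537: from this many series on, a 32-bit pair count can wrap negative
```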
Hi,
First, thanks for the great repo!
Maybe I missed it, but is there any C implementation of DTW for the n-dimensional case?
I found 'dtw_ndim.py', which handles the n-dim case, but 'dtw_c' (imported at line 22) is not used at all in this file.
(For the 1-dim case, I can see the C implementation.)
Thanks!
The linkage tree plot will fail due to a bug in the module clustering.py, line 227. Numpy doesn't know what to do with series of uneven lengths when calculating the max.
The easiest solution is to concatenate the arrays first, and then continue:
all_y = np.concatenate(self.series)
max_y = max(all_y.max(), np.abs(all_y.min()))
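A minimal sketch of the proposed fix, assuming self.series is a list of 1-D arrays of uneven lengths:

```python
import numpy as np

series = [np.array([0., 3., -5.]), np.array([1., 2.])]  # uneven lengths
all_y = np.concatenate(series)                # flatten into one array first
max_y = max(all_y.max(), np.abs(all_y.min())) # largest magnitude across all series
print(max_y)  # 5.0, usable as a plot bound
```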
Got it!
Will it work for multivariate time series classification, for example a mixture of categorical and continuous data?
For example, at time t1 we have the observation: red, 2.4, 5, 12.456; at t2: green, 3.5, 2, 45.78; at t3: black, 5.6, 7, 23.56; at t4: red, 2.1, 5, 12.6?
I get the following error whenever I attempt to run DTW:
numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject
The traceback looks as follows:
Traceback (most recent call last):
File "D:\Projects\MyProject\Main.py", line 3, in <module>
import Algorithms
File "D:\Projects\MyProject\Algorithms.py", line 3, in <module>
from dtaidistance import dtw, dtw_visualisation as dtwvis, clustering
File "D:\Programs\Anaconda\Anaconda\lib\site-packages\dtaidistance-1.1.4-py3.6-win-amd64.egg\dtaidistance\__init__.py", line 19, in <module>
from . import dtw
File "D:\Programs\Anaconda\Anaconda\lib\site-packages\dtaidistance-1.1.4-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 23, in <module>
from . import dtw_c
File "__init__.pxd", line 918, in init dtaidistance.dtw_c
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject
The system I'm using is Windows 10.
P.S: I wrote before that all current errors are fixed for me but apparently I was still somehow using an older version. Sorry for the confusion.
Generating code
Finished generating code
LINK : fatal error LNK1158: cannot run 'rc.exe'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\link.exe' failed with exit status 1158
Cannot install this package due to this error when building the wheel. Do you know how to fix it?
I hope you can help with the problems encountered in running the sample code for Clustering. Thanks!
ValueError Traceback (most recent call last)
in
----> 1 model2.plot("hierarchy.png")
F:\Mysoftware\Anaconda3\lib\site-packages\dtaidistance\clustering.py in plot(self, filename, axes, ts_height, bottom_margin, top_margin, ts_left_margin, ts_sample_length, tr_label_margin, tr_left_margin, ts_label_margin, show_ts_label, show_tr_label, cmap, ts_color)
366 if isinstance(filename, Path):
367 filename = str(filename)
--> 368 plt.savefig(filename, bbox_inches='tight', pad_inches=0)
369 plt.close()
370 fig, ax = None, None
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\pyplot.py in savefig(*args, **kwargs)
714 def savefig(*args, **kwargs):
715 fig = gcf()
--> 716 res = fig.savefig(*args, **kwargs)
717 fig.canvas.draw_idle() # need this if 'transparent=True' to reset colors
718 return res
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in savefig(self, fname, transparent, **kwargs)
2178 self.patch.set_visible(frameon)
2179
-> 2180 self.canvas.print_figure(fname, **kwargs)
2181
2182 if frameon:
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\backend_bases.py in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
2058 bbox_artists = kwargs.pop("bbox_extra_artists", None)
2059 bbox_inches = self.figure.get_tightbbox(renderer,
-> 2060 bbox_extra_artists=bbox_artists)
2061 pad = kwargs.pop("pad_inches", None)
2062 if pad is None:
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_tightbbox(self, renderer, bbox_extra_artists)
2359 bb = []
2360 if bbox_extra_artists is None:
-> 2361 artists = self.get_default_bbox_extra_artists()
2362 else:
2363 artists = bbox_extra_artists
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_default_bbox_extra_artists(self)
2330 bbox_artists.extend(ax.get_default_bbox_extra_artists())
2331 # we don't want the figure's patch to influence the bbox calculation
-> 2332 bbox_artists.remove(self.patch)
2333 return bbox_artists
2334
ValueError Traceback (most recent call last)
F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
339 pass
340 else:
--> 341 return printer(obj)
342 # Finally look for special method names
343 method = get_real_method(obj, self.print_method)
F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\pylabtools.py in <lambda>(fig)
242
243 if 'png' in formats:
--> 244 png_formatter.for_type(Figure, lambda fig: print_figure(fig, 'png', **kwargs))
245 if 'retina' in formats or 'png2x' in formats:
246 png_formatter.for_type(Figure, lambda fig: retina_figure(fig, **kwargs))
F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\pylabtools.py in print_figure(fig, fmt, bbox_inches, **kwargs)
126
127 bytes_io = BytesIO()
--> 128 fig.canvas.print_figure(bytes_io, **kw)
129 data = bytes_io.getvalue()
130 if fmt == 'svg':
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\backend_bases.py in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
2058 bbox_artists = kwargs.pop("bbox_extra_artists", None)
2059 bbox_inches = self.figure.get_tightbbox(renderer,
-> 2060 bbox_extra_artists=bbox_artists)
2061 pad = kwargs.pop("pad_inches", None)
2062 if pad is None:
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_tightbbox(self, renderer, bbox_extra_artists)
2359 bb = []
2360 if bbox_extra_artists is None:
-> 2361 artists = self.get_default_bbox_extra_artists()
2362 else:
2363 artists = bbox_extra_artists
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_default_bbox_extra_artists(self)
2330 bbox_artists.extend(ax.get_default_bbox_extra_artists())
2331 # we don't want the figure's patch to influence the bbox calculation
-> 2332 bbox_artists.remove(self.patch)
2333 return bbox_artists
2334
ValueError: list.remove(x): x not in list
Hi there! I am using your library for my master's thesis. Well, at least I am trying to. I almost have it done; I browsed the documentation, the source code, and the closed issues, but I can't seem to find the solution to these three problems.
I want to make dead sure the labels match the time series IDs that I have. That's because the time series have pre-defined groups, and I am checking whether the groups match the clusters. I did some manual checks and it looks right, but since I am posting anyway, I figured it's better to make double sure.
Let's say I have 1000 time series and they all have a label: A, B or C. I would like A series to be red, B to be blue, C to be green.
I would like to know which time series are grouped together, and at what level of clustering certain attributes of my time-series make them cluster together.
Here is my code:
# That's the database; the values I am clustering on are floating point, in the NDVI column
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from dtaidistance import dtw, clustering

database = pd.read_excel(r"data.xlsx")
data = database[['NDVI', 'Data', 'ID', 'TextAttribute']].dropna()
data = data.sort_values(by=['ID', 'Data'], ascending=True)
# Labels
duplicates = data.drop_duplicates(subset=['ID', 'TextAttribute'])
labels = duplicates['PartID'].tolist()
values = data.groupby('ID')['NDVI'].apply(lambda x: x.to_numpy())
series = [x.astype(np.double) for x in values]
#The clustering
model = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model.fit(series)
#Plotting
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(20, 250), dpi=300)
model.plot(filename='rawdata_all_dtwvis_id.pdf', axes=ax, show_ts_label=labels,
show_tr_label=True, ts_label_margin=-35,
ts_left_margin=30)
When I tried to import dtaidistance, I got this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
import dtaidistance as dtw
File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/__init__.py", line 19, in
from . import dtw
File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/dtw.py", line 17, in
from .util import SeriesContainer, dtaidistance_dir
File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/util.py", line 36
logger.debug(f"Using directory: {directory}")
^
SyntaxError: invalid syntax
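The paths in the traceback show Python 3.5, and the flagged line uses an f-string; f-strings (PEP 498) require Python 3.6+, so on 3.5 they are a SyntaxError regardless of content. A 3.5-compatible equivalent, sketched with a hypothetical directory value:

```python
directory = "/tmp/dtaidistance"  # hypothetical value for illustration
# Python 3.5 has no f-strings, but str.format produces the same message:
message = "Using directory: {}".format(directory)
print(message)  # Using directory: /tmp/dtaidistance
```

Upgrading to Python 3.6 or newer avoids the problem without code changes.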