wannesm / dtaidistance
Time series distances: Dynamic Time Warping (fast DTW implementation in C)
License: Other
According to the MODULES document, parallel and show_progress should work on the c base distance matrix module,
dtaidistance.dtw.distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None, max_step=None, penalty=None, psi=None, block=None, compact=False, parallel=True, show_progress=False)
However, I found that neither the parallel nor the show_progress parameter works in dtaidistance.dtw.distance_matrix_fast.
import time
import numpy as np
from dtaidistance import dtw

s = [
np.array([10., 10, 10, 8, 10, 8, 8, 10, 8]),
np.array([8., 10, 8, 8, 10, 8]),
np.array([8., 2, 0, 0, 0, 0, 0, 1, 1]),
np.array([8., 2, 0, 0, 0, 0, 0, 0, 0]),
...
np.array([9., 0, 1, 2, 1, 0, 1, 0, 9]),
np.array([0., 0, 0, 8, 10, 8, 8, 10, 8]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 1]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 3])
]
tic = time.clock()
dtw.distance_matrix(s,compact=True,psi=1,show_progress=True,parallel=True)
toc = time.clock()
toc - tic
tic = time.clock()
dtw.distance_matrix_fast(s,compact=True,psi=1,show_progress=True,parallel=True)
toc = time.clock()
toc - tic
tic = time.clock()
dtw.distance_matrix_fast(s,compact=True,psi=1,show_progress=True,parallel=False)
toc = time.clock()
toc - tic
0%| | 0/100 [00:00<?, ?it/s]
100%|██████████| 100/100 [00:00<00:00, 512.83it/s]
900.0146135228515145091.02014512547413254414
1.00014066557152784057
The progress bar didn't show up in the fast module, and the parallel parameter doesn't seem to function properly either. I believe there is a bug in the C code.
I am using Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] and dtaidistance v1.2.
~Thank you.
I'm trying to run your DTW implementation on data with values in the range of -2000 to 2000. If I normalize the data to a certain (usually much smaller) range first, I have no issues. However, whenever I attempt to run your code on the raw (i.e. unnormalized) data, I end up with the following message:
ValueError: Buffer dtype mismatch, expected 'double' but got 'short'
The traceback is as follows:
Traceback (most recent call last):
File "D:\PathToMyProject\testing.py", line 313, in myDTWFunction
myDistance = dtw.distance(a,b,use_c=True,window=w,max_dist=minimumDistance)
File "D:\PathToMyAnaconda\lib\site-packages\dtaidistance-1.1.3-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 84, in distance
psi=psi)
File "D:\PathToMyAnaconda\lib\site-packages\dtaidistance-1.1.3-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 196, in distance_fast
psi=psi)
File "dtaidistance\dtw_c.pyx", line 129, in dtaidistance.dtw_c.distance_nogil
I thought this might be an issue with some old code, so I tried to recompile the most recent code from your repository, but I haven't succeeded yet due to this issue.
P.S: I run the code on a Windows 10 machine.
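The error message suggests the raw arrays were loaded as 16-bit integers (numpy's int16, C 'short'), while the C routine expects a float64 ('double') buffer, which would also explain why normalizing first (which produces floats) avoids the problem. A likely workaround, sketched here as an assumption rather than a confirmed fix, is to cast before calling:

```python
import numpy as np

# Raw sensor-style data in the -2000..2000 range often loads as int16 ('short').
a = np.array([-2000, 0, 2000], dtype=np.int16)
b = a.astype(np.double)   # cast to the float64 ('double') buffer the C code expects
print(a.dtype, b.dtype)   # int16 float64
```

After the cast, dtw.distance(b1, b2, use_c=True, ...) should receive the buffer type it expects.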
I'm currently facing two issues. The first is that I'm unable to recompile the source code after updating the repository. Whenever I attempt to do this via
C:\pythoToMyPython\python.exe D:\pathToDtaidistance\dtaidistance\setup.py build_ext --inplace
on my Windows (10) machine, I get the following message:
Traceback (most recent call last):
File "D:\pathToDtaidistance\dtaidistance\setup.py", line 123, in
extra_link_args=extra_link_args)])
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 897, in cythonize
aliases=aliases)
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 777, in create_extension_list
for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 102, in nonempty
raise ValueError(error_msg)
ValueError: 'dtaidistance/dtw_c.pyx' doesn't match any files
That's strange because my dtaidistance folder contains a file named "dtw_c.pyx".
For the second issue, I will open a separate ticket.
First of all: Thank you very much for creating this nice piece of software! Unfortunately, I have an issue running your sample code.
Whenever I try to run the following code:
series = [np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)
print(ds)
I'm greeted with this error message:
OverflowError: Python int too large to convert to C long
Any help would be highly appreciated.
When I run dtw.distance, many values return as "inf"... not sure if I should just increase max_length_diff to something large.
In some provided examples there is a naming error, where you need to replace the variable s with series.
For instance the following
ds = dtw.distance_matrix_fast(s)
should be
ds = dtw.distance_matrix_fast(series)
Hi, my code works fine for a smaller number of short (Nt=12) time series, N up to ~6000, but when I tried running it for N = 350000 I got the error you can see below. I run it on an EC2 instance, so it's not a memory issue. Is there any hardcoded limit that could cause it?
model.fit(X_train)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/clustering.py", line 463, in fit
dists = self.dists_fun(self.series, **self.dists_options)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/dtw.py", line 547, in distance_matrix_fast
use_c=True, use_nogil=True, show_progress=show_progress)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/dtw.py", line 415, in distance_matrix
dists = dtw_c.distance_matrix_nogil(s, is_parallel=parallel, **dist_opts)
File "dtaidistance/dtw_c.pyx", line 586, in dtaidistance.dtw_c.distance_matrix_nogil
File "dtaidistance/dtw_c.pyx", line 668, in dtaidistance.dtw_c.distance_matrix_nogil_c_p
IndexError: Out of bounds on buffer access (axis 0)
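One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: the number of pairwise distances grows quadratically, and at N = 350000 it no longer fits in a signed 32-bit integer, so any 32-bit index in the C code would overflow:

```python
# Pair counts for the condensed distance matrix: N * (N - 1) // 2 entries.
n_small, n_large = 6000, 350000
pairs_small = n_small * (n_small - 1) // 2   # ~1.8e7, fits easily
pairs_large = n_large * (n_large - 1) // 2   # 61_249_825_000
int32_max = 2**31 - 1                        # 2_147_483_647
print(pairs_small <= int32_max)  # True
print(pairs_large <= int32_max)  # False: overflows any 32-bit index
```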
Hi, your package is very interesting to me, and I would like to use it to visualise my own data.
I would like to plot a dendrogram with each of the leaf nodes labelled according to an external label.
However, I cannot seem to find a way to do this.
import numpy as np
import pandas as pd
dataframe = pd.read_csv('sample.csv',header=None)
dataframe.head(6)
label for nodes
0 chr13_110718378-110719378.txt _2.441430_ 207.521542 163.575804 ...... 2
1 chr2_96278196-96279196.txt 43.223219 242.530287 168.090298 ...... 1
2 chr4_140084844-140085844.txt 237.444590 155.823012 249.811496 ...... 3
3 chr10_71267774-71268774.txt 232.878508 139.246943 225.676080 ....... 3
4 chr14_86309018-86310018.txt 131.655232 248.406099 67.069647 ....... 2
5 chr3_97076527-97077527.txt 129.814476 0.000000 204.337600 ........ 1
df=dataframe.values
labels=df[:,-1] #labels variable
df=df[:,:-1]
df= df[:,1:]
from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
from dtaidistance import clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
model2 = clustering.HierarchicalTree(model1)
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(np.matrix(df,dtype=np.double))
Hence my question is if there is a way to label the tree nodes and still have the nice plot formats that your package produces?
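Elsewhere in this thread, plot() is called with show_ts_label=True and with show_ts_label=labels, so that argument looks like the intended route for external labels. A hypothetical sketch, assuming show_ts_label also accepts a callable from series index to label:

```python
# Hypothetical: map each series row to its external label before plotting.
labels = ["2", "1", "3", "3", "2", "1"]  # last column of the dataframe above

def ts_label(idx):
    """Label callback: series index -> external label (assumed signature)."""
    return labels[idx]

print(ts_label(2))  # "3"
# model3.plot("hierarchy.png", show_ts_label=ts_label)  # or show_ts_label=labels
```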
dynamicTimeWarping.py:
from dtaidistance import dtw
from dtaidistance import clustering
import numpy as np
s = np.array([
[0, 0, 1, 2, 1, 0, 1, 0, 0],
[0, 1, 2, 0, 0, 0, 0, 0, 0],
[1, 2, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 2, 1, 0, 1, 0, 0],
[0, 1, 2, 0, 0, 0, 0, 0, 0],
[1, 2, 0, 0, 0, 0, 0, 1, 1],
[1, 2, 0, 0, 0, 0, 0, 1, 1]])
model = clustering.Hierarchical(dtw.distance_matrix_fast, {})
modelw = clustering.HierarchicalTree(model)
cluster_idx = modelw.fit(s)
modelw.plot("hierarchy.png")
error logs:
(timeSeriesClassification) bash-3.2$ python3 dynamicTimeWarping.py
The compiled dtaidistance C library is not available.
See the documentation for alternative installation options.
Traceback (most recent call last):
File "dynamicTimeWarping.py", line 17, in <module>
cluster_idx = modelw.fit(s)
File "/Users/user/.local/share/virtualenvs/timeSeriesClassification-ight38Tz/lib/python3.7/site-packages/dtaidistance/clustering.py", line 418, in fit
result = self._model.fit(series, *args, **kwargs)
File "/Users/user/.local/share/virtualenvs/timeSeriesClassification-ight38Tz/lib/python3.7/site-packages/dtaidistance/clustering.py", line 73, in fit
pbar = tqdm(total=dists.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'
I should note that this fails the same way if I run the clustering tests included in this repo as well
Trying to run the example from the documentation:
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
# Keep track of full tree by using the HierarchicalTree wrapper class
model2 = clustering.HierarchicalTree(model1)
# You can also pass keyword arguments identical to those used to instantiate a Hierarchical object
model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)
and getting an error for any of the proposed models:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-4bb92754361e> in <module>()
2 model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
3 # Keep track of full tree by using the HierarchicalTree wrapper class
----> 4 model2 = clustering.HierarchicalTree(model1)
5 # You can also pass keyword arguments identical to instantiate a Hierarchical object
6 model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
/Users/fred/anaconda/lib/python2.7/site-packages/dtaidistance/clustering.pyc in __init__(self, model, **kwargs)
388 else:
389 self._model = model
--> 390 super().__init__(**kwargs)
391 self._model.max_dist = np.inf
392
TypeError: super() takes at least 1 argument (0 given)
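The paths in the traceback point at Python 2.7, and zero-argument super() is Python-3-only syntax, which would explain this TypeError. A sketch of the 2-and-3 compatible spelling (hypothetical class names, not the library's actual code):

```python
class Base(object):
    def __init__(self, **kwargs):
        self.opts = kwargs

class Tree(Base):
    def __init__(self, **kwargs):
        # Python 3 allows the zero-argument form super().__init__(**kwargs);
        # under Python 2.7 that raises "super() takes at least 1 argument (0 given)".
        # The 2-and-3 compatible spelling names the class and instance explicitly:
        super(Tree, self).__init__(**kwargs)

t = Tree(max_dist=1)
print(t.opts)  # {'max_dist': 1}
```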
Dear Friend.
I have an error.
dist, cost, path = dtw(mfcc1.T, mfcc2.T)
print("The normalized distance between the two : ",dist) # 0 for similar audios
TypeError: 'module' object is not callable
Can you help me resolve it?
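In dtaidistance, dtw is a module, so dtw(mfcc1.T, mfcc2.T) calls the module object itself and raises this TypeError; the expected (dist, cost, path) triple suggests the code was written against a different dtw package's API. A small demonstration of the error, with the likely intended call sketched as a comment (an assumption about intent):

```python
import math  # any module reproduces the error when called directly

err = None
try:
    math(1.0)  # calling a module object itself
except TypeError as exc:
    err = str(exc)
print(err)  # 'module' object is not callable

# With dtaidistance, call a function *inside* the module instead, e.g.:
# from dtaidistance import dtw
# dist = dtw.distance(series1, series2)  # takes two 1-D series
```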
What is the meaning of all these shades of green in the figure?
Simple question: for the plot method for hierarchical clustering in your clustering.py, I am unable to find a way to change the size of the dendrogram. plt.figure(figsize=(x, y)) does not work.
Can you help? Great library, btw!
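A later example in this thread creates its own figure with plt.subplots(figsize=...) and passes its axes to plot() via the axes argument, which suggests a route for controlling the dendrogram size. A sketch under that assumption:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# plot() draws the tree on one axis and the series on the other,
# so create a 1x2 grid at the size you want and hand over its axes:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 10))
print(fig.get_size_inches())  # [15. 10.]
# model.plot("hierarchy.png", axes=ax)  # assumes a fitted clustering model
```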
I installed this package and tried the fast implementation of the DTW but I get:
The compiled dtaidistance C library is not available.
See the documentation for alternative installation options
How can I rectify this? I have Cython installed and all other dependencies. The docs don't point to any other specific installations being necessary. Thanks.
1.2.3 was released but there is no tag, making it harder to download from github.
Getting this error. Seems like just a syntax thing
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3291, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 70, in
from dtaidistance import dtw
File "/opt/conda/lib/python3.6/site-packages/dtaidistance/__init__.py", line 19, in
from . import dtw
File "/opt/conda/lib/python3.6/site-packages/dtaidistance/dtw.py", line 23
s from . import dtw_c
^
IndentationError: expected an indented block
I was just comparing results with other dynamic time warping libraries and noticed significant differences in the dtw distances, despite similar paths. Am I missing something or is there an error somewhere in your distance calculation?
dist1,paths= dtw.warping_paths(s1,s2)
path = dtw.best_path(paths)
print(dist1)
dist2 = 0
for [a, b] in path:
dist2 += abs(s1[a]-s2[b])
print(dist2)
dist1 and dist2 are vastly different in my examples.
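A plausible explanation, inferred from the numbers reported elsewhere in this thread rather than confirmed from the source: warping_paths accumulates squared point differences and returns the square root of the total, whereas the loop above sums absolute differences along best_path; the two quantities coincide only in special cases. A minimal sketch of the squared-cost accumulation:

```python
import math

def dtw_sq(s1, s2):
    """Minimal DTW sketch that accumulates *squared* point differences and
    returns the square root of the total (an assumption for illustration,
    not dtaidistance's actual code)."""
    INF = float("inf")
    n, m = len(s1), len(s2)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s1[i - 1] - s2[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return math.sqrt(D[n][m])

s1 = [1, 2, 3, 4, 1, 2, 3]
s2 = [2, 3, 4, 1, 2, 3, 4]
d_sq = dtw_sq(s1, s2)
# The sum of *absolute* differences along the same optimal path is 1 + 1 = 2:
print(d_sq)  # 1.4142135623730951 (= sqrt(1**2 + 1**2)), not 2.0
```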
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5FU_xZ/dtaidistance/
pip install dtaidistance
Collecting dtaidistance
Using cached https://files.pythonhosted.org/packages/75/ad/458d751a5d4842e3f7aa0ad6f79ee0219683d0b28ebc0d882f4106436a10/dtaidistance-1.1.4.tar.gz
Complete output from command python setup.py egg_info:
/home/lokesh/.local/lib/python2.7/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-build-5FU_xZ/dtaidistance/dtaidistance/dtw_c.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Compiling dtaidistance/dtw_c.pyx because it depends on /home/lokesh/.local/lib/python2.7/site-packages/Cython/Includes/numpy/__init__.pxd.
[1/1] Cythonizing dtaidistance/dtw_c.pyx
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-5FU_xZ/dtaidistance/setup.py", line 139, in
with open('dtaidistance/__init__.py', 'r', encoding='utf-8') as fd:
TypeError: 'encoding' is an invalid keyword argument for this function
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5FU_xZ/dtaidistance/
Hi,
When I tried installing with pip I got the following error:
line 263, in
with open('dtaidistance/__init__.py', 'r', encoding='utf-8') as fd:
TypeError: 'encoding' is an invalid keyword argument for this function
I tried reinstalling with pip3 and it worked.
Please edit the install doc to "pip3."
Thx.
~Jon
According to the MODULES document, psi should work on the c base distance matrix module,
dtaidistance.dtw.distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None, max_step=None, penalty=None, psi=None, block=None, compact=False, parallel=True, show_progress=False)
However, I found that the psi parameter does not work in the dtaidistance.dtw.distance_matrix_fast module.
s = [np.array([0., 0, 1, 2, 1, 0, 1, 0, 0]),
np.array([9., 0, 1, 2, 1, 0, 1, 0, 9])]
dtw.distance_matrix(s,compact=True,psi=1)
dtw.distance_matrix_fast(s,compact=True,psi=1)
[0.]
[12.72792206]
And when I look into the dtw_c.pyx file, I didn't find the psi code included in the distance_matrix_nogil function. I am not familiar with C, so I may have read the code wrongly.
I am using Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] and dtaidistance v1.2.
I really need this feature; is there any workaround?
~Thank you.
Is it possible for the C implementation of dtw distance to return not only the final distance but also the best_path or all paths? Because besides having the final distance, I would be interested in analysing the individual distances over time. Would that be possible?
I just attempted to install dtaidistance once again on a Windows 10 machine and ran into the issue that the variables "PrepReadme" and "MySDistCommand" could not be found. Thus, I simply removed the following lines (lines 282 and 283) from setup.py:
'readme': PrepReadme,
'sdist': MySDistCommand,
After doing so, I was able to install dtaidistance from source. Could you please have a look at this issue and adjust the file accordingly? Thanks a lot.
According to the documentation, setting this flag to True only means that the distance function uses precompiled C functions.
But when I check both values, they are very different, as shown in the image below.
I assume this is wrong behavior; am I wrong?
If I am, could you tell me what makes the difference?
Thanks in advance!
I'm using your example but the result is: (None, None) Why?
series = np.matrix([
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1],
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1]])
from dtaidistance import clustering
from dtaidistance import dtw
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)
model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
cluster_idx = model2.fit(series)
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)
model2.plot("hierarchy.png")
(None, None)
Resolved. See suggestion for doc clarification below.
Original question:
Hi,
When I run this clustering example code provided in the docs:
from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0])
s3 = np.array([0.0, 0, 1, 2, 1, 0, 0, 0])
series = [s1, s2, s3]
from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix, {})
# Keep track of full tree by using the HierarchicalTree wrapper class
model2a = clustering.HierarchicalTree(model1)
# You can also pass keyword arguments identical to instantiate a Hierarchical object
model2b = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix, dists_options={})
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix, {})
cluster_idx = model3.fit(series)
model2a.plot("hierarchy.png")
I get the following error traced back to clustering.py:
line 220, in plot
self._series_y = [0] * len(self.series)
TypeError: object of type 'NoneType' has no len()
I get the same error when I use distance_matrix_fast.
Can you help me?
SOLUTION:
The above example fits model3, but not model2a, which it seeks to plot. You need to call
var = model2a.fit(series)
then you can plot it.
Updating the doc accordingly would help others avoid this error.
Hello,
I noticed that in clustering.py you use 'complete' linkage rather than single, average, or ward. Would you be so kind as to explain why you chose complete linkage? What considerations went into making this choice?
~Thank you.
For example, if I started with
from dtaidistance import dtw
s1 = [1,2,3,4,1,2,3]
s2 = [2,3,4,1,2,3,4]
distance, paths = dtw.warping_paths(s1, s2,penalty=0)
The resulting distance was 1.4142135623730951.
When I set penalty = 10, the distance would be 3.872983346207417.
How did the distance go from 1.4142135623730951 to 3.872983346207417?
A paper or any other references would be very helpful.
Thanks!
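A plausible reading of these numbers, assuming the penalty is added to the squared-cost accumulator once per non-diagonal step: with penalty=0 the optimal path shifts the series by one, leaving only two unit mismatches at the ends, so the distance is sqrt(1² + 1²) ≈ 1.4142. With penalty=10 that shifted path would cost 2 + 2×10 = 22, so the plain diagonal alignment (total squared difference 15) becomes optimal and the distance is sqrt(15) ≈ 3.8730:

```python
import math

s1 = [1, 2, 3, 4, 1, 2, 3]
s2 = [2, 3, 4, 1, 2, 3, 4]

# No-warp (pure diagonal) alignment: squared differences 1,1,1,9,1,1,1 sum to 15.
diag_cost = sum((a - b) ** 2 for a, b in zip(s1, s2))
print(math.sqrt(diag_cost))  # 3.872983346207417, the reported penalty=10 result

# Best warped path (shift by one) leaves only two unit mismatches at the ends:
warp_cost = (s1[0] - s2[0]) ** 2 + (s1[-1] - s2[-1]) ** 2
print(math.sqrt(warp_cost))  # 1.4142135623730951, the reported penalty=0 result

# With penalty=10 the warped path pays two off-diagonal steps: 2 + 2*10 = 22 > 15,
# so the diagonal alignment wins and the distance becomes sqrt(15).
```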
Hi experts, my code failed with the error message
"ValueError: operands could not be broadcast together with shapes (3,) (9,)"
Is there any workaround ?
from dtaidistance import dtw
from dtaidistance import clustering
import numpy as np
series = np.array([
np.array([1, 2, 1]),
np.array([0., 1, 2, 0, 0, 0, 0, 0, 0]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 1, 3, 4, 5]),
np.array([0., 0, 1, 2, 1, 0, 1]),
np.array([0., 1, 2, 0, 0, 0, 0, 0]),
np.array([1., 2, 0, 0, 0, 0, 0, 1, 1])])
ds = dtw.distance_matrix(series)
print(ds)
model3 = clustering.LinkageTree(dtw.distance_matrix, {})
cluster_idx = model3.fit(series)
print(cluster_idx)
model3.plot(img_path+"model-test.png", show_ts_label=True)
Best regards,
Keita
thanks
The tests require a file synthetic_control.data. However, this file is not included in the GitHub sources, nor can I find any explanation of where to obtain it. It would be helpful if it were in the GitHub source tree; that would allow CI tests to be run.
I'm using Python 3.6 (with Anaconda) on a Windows 8.1 and a Windows 10 machine at work. I'm dealing with rather large data sets (roughly 10 data sets with 10,000 sequences consisting of 3,000 samples each). dtaidistance is working as expected, but whenever I set the "use_c" parameter of the "distance" function to True, I can observe in the Task Manager that the corresponding Python process slowly eats up all available RAM.
If I want to run datidistance on several data sets overnight, it regularly simply crashes after all my available RAM (roughly 32 GB) has been used.
Could it be possible that somehow the memory isn't freed again properly after it has been used?
In the docs, on the page "Dynamic Time Warping (DTW)", in the section entitled "DTW between set of series", after giving an example using dtw.distance_matrix_fast, the doc says:
"This behaviour can be deactivated by setting the argument compact to true," implying that compact can be set to true in the previous example, which is of distance_matrix_fast.
However, compact is only a keyword in dtw.distance_matrix, and not in the distance_matrix_fast. If you try setting compact to True in distance_matrix_fast, you will get an error. So as not to be misleading, I suggest clarifying this in the doc.
I'm running Ubuntu 18.04, and I compiled dtaidistance from GitHub. When logging at info level, it's confirmed that the C libraries are installed and parallel computation is enabled, but htop shows a single core running when executing model = clustering.LinkageTree(dtw.distance_matrix_fast, {'parallel': True}). Any thoughts on why parallelization isn't working? Or how to debug it?
INFO:be.kuleuven.dtai.distance:Computing distances
INFO:be.kuleuven.dtai.distance:Compute distances in pure C
INFO:be.kuleuven.dtai.distance:Use parallel computation
Hi,
I have already installed Python 3.7.1 with numpy 1.16.1, Cython 0.29.6, and dtaidistance 1.1.4. I have tried installing dtaidistance via pip, from GitHub, and from source. But when I want to use "distance_matrix_fast", I get this error:
"The compiled dtaidistance C library is not available.
See the documentation for alternative installation options."
I am using macOS 10.14, and here is my log when I try to install dtaidistance using pip:
The current PyPI package ( https://pypi.org/project/dtaidistance/1.1.2/#files ) is missing a copy of the LICENSE file. This apparently causes the build to fail:
error: [Errno 2] No such file or directory: 'LICENSE'
Failed building wheel for dtaidistance
Should be:

if ci > ui:
    t += abs(ci - ui)
elif ci < li:
    t += abs(ci - li)
else:
    pass  # do nothing
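The snippet above patches an envelope-based (LB_Keogh-style) lower bound: a point only contributes when it falls outside the [li, ui] envelope. A self-contained sketch incorporating the proposed fix; the envelope construction here is an illustrative assumption, not the library's exact code:

```python
def lb_keogh(s1, s2, window):
    """LB_Keogh-style lower-bound sketch: compare each point of s1 against
    the upper/lower envelope of s2 within a +/-window band."""
    t = 0.0
    for i, ci in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        ui = max(s2[lo:hi])  # upper envelope value at i
        li = min(s2[lo:hi])  # lower envelope value at i
        if ci > ui:
            t += abs(ci - ui)
        elif ci < li:
            t += abs(ci - li)
        # inside the envelope: contributes nothing, as the patch intends
    return t

print(lb_keogh([5, 5, 5], [1, 1, 1], 1))  # 12.0: every point is 4 above the envelope
print(lb_keogh([1, 2, 3], [1, 2, 3], 1))  # 0.0: identical series stay inside it
```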
Hi,
Does it work for multivariate time series, using a list of lists as input or an array with dimensions "nb_samples * nb_features", for instance?
Thanks!
I am getting a memory error when trying to calculate DTW distances for a numpy matrix of 2.2 million series with 157 time points. I have 400 GB of RAM available, with < 10% utilization, and immediately get the error when trying to run dtw.distance_matrix_fast. Here is the traceback:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-64-51fa732ba330> in <module>
2 dists = dtw.distance_matrix_fast(df_numpy, window=15,
3 parallel=True,
----> 4 show_progress=True)
5 dists[dists == np.inf] = 0
6 dists = dists + dists.T - np.diag(np.diag(dists))
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw.py in distance_matrix_fast(s, max_dist, max_length_diff, window, max_step, penalty, psi, block, parallel, show_progress)
449 window=window, max_step=max_step, penalty=penalty, psi=psi,
450 block=block, parallel=parallel,
--> 451 use_c=True, use_nogil=True, show_progress=show_progress)
452
453
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw.py in distance_matrix(s, max_dist, max_length_diff, window, max_step, penalty, psi, block, parallel, use_c, use_nogil, show_progress)
367 if parallel:
368 logger.info("Use parallel computation")
--> 369 dists = dtw_c.distance_matrix_nogil_p(s, **dist_opts)
370 else:
371 logger.info("Use serial computation")
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw_c.pyx in dtaidistance.dtw_c.distance_matrix_nogil_p()
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw_c.pyx in dtaidistance.dtw_c.distance_matrix_nogil()
MemoryError:
Any suggestions? Is it not feasible to run DTW on a dataset this large?
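A back-of-the-envelope check suggests the full computation cannot fit, independent of any bug, assuming the result holds one double per pair of series:

```python
# 2.2 million series give n * (n - 1) // 2 pairwise distances.
n = 2_200_000
pairs = n * (n - 1) // 2        # ~2.42e12 distances
bytes_needed = pairs * 8        # 8 bytes per double
print(bytes_needed / 1e12)      # ~19.4 TB, far beyond 400 GB of RAM
```

Even the compact (condensed) form of the result would need roughly 19 TB, so restricting the computation to sub-blocks (e.g. via the block argument) or reducing the number of series seems unavoidable.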
We are getting a weird error using this library that I cannot debug. I have tried setting the logging level to debug to see what I can find, but no luck.
INFO:be.kuleuven.dtai.distance:Computing distances
INFO:be.kuleuven.dtai.distance:Compute distances in pure C (parallel=True)
Traceback (most recent call last):
File "rt/clustering/rt_cluster_trainer.py", line 250, in <module>
clustering.run()
File "rt/clustering/rt_cluster_trainer.py", line 183, in run
model = cluster_library.train()
File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 241, in train
models = self.perform_clustering(self.buckets, self.kvalues, self.linkage_method)
File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 208, in perform_clustering
y = self.calculate_distance_matrix()
File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 199, in calculate_distance_matrix
y = dtw.distance_matrix_fast(self.frequency_counts, window=self.dtw_radius, compact=True)
File "/opt/grok/ve3/lib/python3.7/site-packages/dtaidistance/dtw.py", line 548, in distance_matrix_fast
use_c=True, use_nogil=True, show_progress=False)
File "/opt/grok/ve3/lib/python3.7/site-packages/dtaidistance/dtw.py", line 416, in distance_matrix
dists = dtw_c.distance_matrix_nogil(s, is_parallel=parallel, **dist_opts)
File "dtaidistance/dtw_c.pyx", line 586, in dtaidistance.dtw_c.distance_matrix_nogil
File "dtaidistance/dtw_c.pyx", line 657, in dtaidistance.dtw_c.distance_matrix_nogil_c_p
ValueError: negative dimensions are not allowed
We are using the library exactly as the documentation shows:
from dtaidistance import dtw
import numpy as np
series = [
np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)
Our distance matrix is huge, so I cannot post it here, but I have written some code that tests each time series fed into the algorithm: they are all the same shape (307,), there are no negative values, and there are no NaNs.
Can someone shed some light on what this error actually means? There is nothing wrong with our data as far as we know... If we take random samples of the data, say only 50% of it, excluding many time series, it seems to work fine. Is there a single time series numpy array breaking it?
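One hypothesis, not confirmed from the source: "negative dimensions" can be the signature of a signed 32-bit overflow, where the pair count n*(n-1)/2 wraps to a negative value inside the C code; that would also explain why a 50% random sample works. The threshold where a 32-bit pair count first wraps:

```python
# Find the smallest series count whose pair count exceeds a signed 32-bit int.
int32_max = 2**31 - 1
n = 2
while n * (n - 1) // 2 <= int32_max:
    n += 1
print(n)  # 65537: from this many series on, a 32-bit pair count can wrap negative
```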
Hi,
First, thanks for the great repo!
Maybe I missed it, but is there any C implementation of DTW for the n-dimensional case?
I found 'dtw_ndim.py', which handles the n-dim case, but 'dtw_c' (imported at line 22) is not used at all in this file.
(For the 1-dim case, I can see the C implementation.)
Thanks!
The linkage tree plot will fail due to a bug in the module clustering.py, line 227. Numpy doesn't know what to do with series of uneven lengths when calculating the max.
The easiest solution is to concatenate the arrays first, and then continue:
all_y = np.concatenate(self.series)
max_y = max(all_y.max(), np.abs(all_y.min()))
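A minimal sketch of the proposed fix, assuming self.series is a list of 1-D arrays of uneven lengths:

```python
import numpy as np

series = [np.array([0., 3., -5.]), np.array([1., 2.])]  # uneven lengths
all_y = np.concatenate(series)                # flatten into one array first
max_y = max(all_y.max(), np.abs(all_y.min())) # largest magnitude across all series
print(max_y)  # 5.0, usable as a plot bound
```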
Got it!
Will it work for multivariate time series classification, for example a mixture of categorical and continuous data?
For example, at time t1 we have the observation: red, 2.4, 5, 12.456; at t2: green, 3.5, 2, 45.78; at t3: black, 5.6, 7, 23.56; at t4: red, 2.1, 5, 12.6?
I get the following error whenever I attempt to run DTW:
numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject
The traceback looks as follows:
Traceback (most recent call last):
File "D:\Projects\MyProject\Main.py", line 3, in <module>
import Algorithms
File "D:\Projects\MyProject\Algorithms.py", line 3, in <module>
from dtaidistance import dtw, dtw_visualisation as dtwvis, clustering
File "D:\Programs\Anaconda\Anaconda\lib\site-packages\dtaidistance-1.1.4-py3.6-win-amd64.egg\dtaidistance\__init__.py", line 19, in <module>
from . import dtw
File "D:\Programs\Anaconda\Anaconda\lib\site-packages\dtaidistance-1.1.4-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 23, in <module>
from . import dtw_c
File "__init__.pxd", line 918, in init dtaidistance.dtw_c
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject
The system I'm using is Windows 10.
P.S: I wrote before that all current errors are fixed for me but apparently I was still somehow using an older version. Sorry for the confusion.
Generating code
Finished generating code
LINK : fatal error LNK1158: cannot run 'rc.exe'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\link.exe' failed with exit status 1158
Cannot install this package due to this error when building the wheel. Do you know how to fix it?
I hope you can help with the problems encountered in running the sample code for Clustering. Thanks!
ValueError Traceback (most recent call last)
in
----> 1 model2.plot("hierarchy.png")
F:\Mysoftware\Anaconda3\lib\site-packages\dtaidistance\clustering.py in plot(self, filename, axes, ts_height, bottom_margin, top_margin, ts_left_margin, ts_sample_length, tr_label_margin, tr_left_margin, ts_label_margin, show_ts_label, show_tr_label, cmap, ts_color)
366 if isinstance(filename, Path):
367 filename = str(filename)
--> 368 plt.savefig(filename, bbox_inches='tight', pad_inches=0)
369 plt.close()
370 fig, ax = None, None
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\pyplot.py in savefig(*args, **kwargs)
714 def savefig(*args, **kwargs):
715 fig = gcf()
--> 716 res = fig.savefig(*args, **kwargs)
717 fig.canvas.draw_idle() # need this if 'transparent=True' to reset colors
718 return res
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in savefig(self, fname, transparent, **kwargs)
2178 self.patch.set_visible(frameon)
2179
-> 2180 self.canvas.print_figure(fname, **kwargs)
2181
2182 if frameon:
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\backend_bases.py in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
2058 bbox_artists = kwargs.pop("bbox_extra_artists", None)
2059 bbox_inches = self.figure.get_tightbbox(renderer,
-> 2060 bbox_extra_artists=bbox_artists)
2061 pad = kwargs.pop("pad_inches", None)
2062 if pad is None:
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_tightbbox(self, renderer, bbox_extra_artists)
2359 bb = []
2360 if bbox_extra_artists is None:
-> 2361 artists = self.get_default_bbox_extra_artists()
2362 else:
2363 artists = bbox_extra_artists
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_default_bbox_extra_artists(self)
2330 bbox_artists.extend(ax.get_default_bbox_extra_artists())
2331 # we don't want the figure's patch to influence the bbox calculation
-> 2332 bbox_artists.remove(self.patch)
2333 return bbox_artists
2334
ValueError Traceback (most recent call last)
F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
339 pass
340 else:
--> 341 return printer(obj)
342 # Finally look for special method names
343 method = get_real_method(obj, self.print_method)
F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\pylabtools.py in <lambda>(fig)
242
243 if 'png' in formats:
--> 244 png_formatter.for_type(Figure, lambda fig: print_figure(fig, 'png', **kwargs))
245 if 'retina' in formats or 'png2x' in formats:
246 png_formatter.for_type(Figure, lambda fig: retina_figure(fig, **kwargs))
F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\pylabtools.py in print_figure(fig, fmt, bbox_inches, **kwargs)
126
127 bytes_io = BytesIO()
--> 128 fig.canvas.print_figure(bytes_io, **kw)
129 data = bytes_io.getvalue()
130 if fmt == 'svg':
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\backend_bases.py in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
2058 bbox_artists = kwargs.pop("bbox_extra_artists", None)
2059 bbox_inches = self.figure.get_tightbbox(renderer,
-> 2060 bbox_extra_artists=bbox_artists)
2061 pad = kwargs.pop("pad_inches", None)
2062 if pad is None:
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_tightbbox(self, renderer, bbox_extra_artists)
2359 bb = []
2360 if bbox_extra_artists is None:
-> 2361 artists = self.get_default_bbox_extra_artists()
2362 else:
2363 artists = bbox_extra_artists
F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_default_bbox_extra_artists(self)
2330 bbox_artists.extend(ax.get_default_bbox_extra_artists())
2331 # we don't want the figure's patch to influence the bbox calculation
-> 2332 bbox_artists.remove(self.patch)
2333 return bbox_artists
2334
ValueError: list.remove(x): x not in list
Hi there! I am using your library for my master's thesis. Well, at least I am trying to. I almost have it done; I browsed the documentation, the source code, and the closed issues, but I can't seem to find the solution to these three problems.
I want to make dead sure the labels match the time series IDs that I have. That's because the time series have pre-defined groups, and I am checking whether the groups match the clusters. I did some manual checks and it looks right, but since I am posting anyway, I figured it's better to make double sure.
Let's say I have 1000 time series and they all have a label: A, B or C. I would like A series to be red, B to be blue, C to be green.
I would like to know which time series are grouped together, and at what level of clustering certain attributes of my time-series make them cluster together.
Here is my code:
# That's the database; the values I am clustering on are floating point, in the NDVI column
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from dtaidistance import dtw, clustering

database = pd.read_excel(r"data.xlsx")
data = database[['NDVI', 'Data', 'ID', 'TextAttribute']].dropna()
data = data.sort_values(by=['ID', 'Data'], ascending=True)
# Labels
duplicates = data.drop_duplicates(subset=['ID', 'TextAttribute'])
labels = duplicates['PartID'].tolist()
values = data.groupby('ID')['NDVI'].apply(lambda x: x.to_numpy())
series = [x.astype(np.double) for x in values]
#The clustering
model = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model.fit(series)
#Plotting
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(20, 250), dpi=300)
model.plot(filename='rawdata_all_dtwvis_id.pdf', axes=ax, show_ts_label=labels,
show_tr_label=True, ts_label_margin=-35,
ts_left_margin=30)
When I tried to import dtaidistance, I got this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
import dtaidistance as dtw
File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/__init__.py", line 19, in
from . import dtw
File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/dtw.py", line 17, in
from .util import SeriesContainer, dtaidistance_dir
File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/util.py", line 36
logger.debug(f"Using directory: {directory}")
^
SyntaxError: invalid syntax
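The paths in the traceback show Python 3.5, and the flagged line uses an f-string; f-strings (PEP 498) require Python 3.6+, so on 3.5 they are a SyntaxError regardless of content. A 3.5-compatible equivalent, sketched with a hypothetical directory value:

```python
directory = "/tmp/dtaidistance"  # hypothetical value for illustration
# Python 3.5 has no f-strings, but str.format produces the same message:
message = "Using directory: {}".format(directory)
print(message)  # Using directory: /tmp/dtaidistance
```

Upgrading to Python 3.6 or newer avoids the problem without code changes.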