mparat / scipy-cluster
Automatically exported from code.google.com/p/scipy-cluster
License: Other
What steps will reproduce the problem?
1. python -v
2. import hcluster
What do you see instead?
dlopen("/home/jgeiss/.local/lib/python2.6/site-packages/_hierarchy_wrap.so", 2);
Segmentation fault
What version of the product are you using? On what operating system?
hcluster 0.2.0
python 2.6.5
Ubuntu 10.04.4 LTS
numpy 1.3.0
scipy 0.7.0
matplotlib 0.99.1.1
Please provide any additional information below.
It works with:
hcluster 0.2.0,
python 2.7.3
ubuntu 12.04
numpy 1.6.1
scipy 0.9.0
matplotlib 1.1.1rc
Do I need to update any of the other packages?
Thanks,
Johanna
Original issue reported on code.google.com by [email protected]
on 13 Jan 2015 at 7:59
Not sure if this is already doable, but some docs on that use case would be
great, i.e. like the scipy-cluster vq function: given a feature vector, which
cluster does it fall in?
Thanks,
Loki
Original issue reported on code.google.com by [email protected]
on 21 May 2008 at 7:08
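The vq-style lookup asked about above can be sketched as a nearest-centroid assignment. `classify` is a hypothetical helper for illustration, not part of hcluster's API:

```python
import numpy as np

def classify(vec, centroids):
    """Return the index of the centroid nearest to `vec` (Euclidean).

    This mimics scipy.cluster.vq's assignment step: the cluster a new
    feature vector 'falls in' is the one whose centroid is closest.
    """
    centroids = np.asarray(centroids, dtype=float)
    diffs = centroids - np.asarray(vec, dtype=float)
    return int(np.argmin(np.sqrt((diffs ** 2).sum(axis=1))))
```

For hierarchical clusterings one would first derive centroids from the flat clusters (e.g. the mean of each cluster's members) and then use the same lookup.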
I'm not sure this is the right place for that but I had some issues to compile
from the source as I needed to install the package python-dev beforehand.
Guess you should add it in the list of required package for Ubuntu (at least
10.10).
Bests,
Clément
Original issue reported on code.google.com by clement.grimal
on 9 Nov 2010 at 3:44
Using hcluster.pdist with the Canberra distance does not work as the input
matrix is converted to bool instead of double.
See the patch for a fix.
P.S. Thanks for a great package.
Original issue reported on code.google.com by [email protected]
on 1 Aug 2008 at 12:03
Attachments:
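The dtype coercion described above can be sidestepped by casting the inputs to double explicitly. A minimal reference implementation of the Canberra distance (a sketch, not hcluster's internal code):

```python
import numpy as np

def canberra(u, v):
    """Canberra distance: sum(|u_i - v_i| / (|u_i| + |v_i|)).

    Inputs are cast to float64 up front, avoiding the bool conversion
    described in the report above. Terms with a zero denominator are
    taken to contribute 0.
    """
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    denom = np.abs(u) + np.abs(v)
    mask = denom != 0
    return float(np.sum(np.abs(u[mask] - v[mask]) / denom[mask]))
```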
On a Mac OSX 10.6 (snow leopard) machine, with Numpy and other stuff updated
and new,
importing hcluster leads to a segmentation fault. Re-installing did not help.
Any advice?
Thx!
Original issue reported on code.google.com by [email protected]
on 13 Sep 2009 at 6:50
Use 'color_threshold' instead of the parameter name shown in the tutorial.
Good tool, very useful.
Thanks
Original issue reported on code.google.com by [email protected]
on 23 Nov 2009 at 6:41
Loki Davison noted that the docs say the centroid, median, and ward linkage
functions can take condensed distance matrices to do the linkage. This is
not correct and corrections to these docs will appear in the next release.
Damian
Original issue reported on code.google.com by [email protected]
on 23 May 2008 at 1:18
I can "python setup.py install", but
>>> import hcluster
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "hcluster\__init__.py", line 1, in <module>
import hierarchy as _h
File "hcluster\hierarchy.py", line 198, in <module>
import _hierarchy_wrap, types
ImportError: No module named _hierarchy_wrap
>>> from hcluster import pdist, linkage, dendrogram
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "hcluster\__init__.py", line 1, in <module>
import hierarchy as _h
File "hcluster\hierarchy.py", line 198, in <module>
import _hierarchy_wrap, types
ImportError: No module named _hierarchy_wrap
Original issue reported on code.google.com by [email protected]
on 19 Jan 2013 at 7:57
Any chance an update can be made for python 2.7 on windows? I tried to compile
from source but was unable to do so.
Original issue reported on code.google.com by [email protected]
on 16 Jan 2013 at 5:59
What steps will reproduce the problem?
1. easy_install hcluster
What is the expected output? What do you see instead?
Expected output is one nicely installed hcluster module.
What actually happens is:
cc1: error: unrecognized command line option "-Wno-long-double"
What version of the product are you using? On what operating system?
hcluster-0.2.0.zip
i686-apple-darwin10-gcc-4.2.1
OS X Snow Leopard
Original issue reported on code.google.com by [email protected]
on 6 Nov 2009 at 10:02
Could you provide data export functions for trees?
Something like the Newick format, or some other format supported by standalone
tree-drawing programs.
Original issue reported on code.google.com by fccoelho
on 23 Jun 2009 at 10:22
$ python setup.py install --root=installroot/
No paths in the python path contain numpy/arrayobject.h
$ echo $?
0
$
The exit code should be non-zero; this affects the behavior of programs
(ones that automate building of Python packages, for instance) that rely on
proper exit codes from setup.py.
The culprit is:
valid_paths = filter(contains_arrayobject_h, sys.path)
if len(valid_paths) == 0:
    print "No paths in the python path contain numpy/arrayobject.h"
    sys.exit(0)  # <-- change this to sys.exit(1), or just raise RuntimeError
Original issue reported on code.google.com by [email protected]
on 1 Jun 2009 at 5:58
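The suggested fix can be sketched as a small function so the result is testable; `contains_arrayobject_h` below is a stand-in for the helper in setup.py, not its actual code:

```python
import os

def contains_arrayobject_h(path):
    # Stand-in for the setup.py helper: does this path hold the numpy header?
    return os.path.exists(os.path.join(path, 'numpy', 'arrayobject.h'))

def check_numpy_headers(search_paths):
    """Return 0 if the numpy headers were found, else 1.

    Returning a non-zero status on failure lets build automation
    (which checks setup.py's exit code) detect the problem.
    """
    valid_paths = [p for p in search_paths if contains_arrayobject_h(p)]
    if not valid_paths:
        print("No paths in the python path contain numpy/arrayobject.h")
        return 1  # non-zero: pass this to sys.exit()
    return 0
```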
What steps will reproduce the problem?
1. dendrogram(Z)
What is the expected output? What do you see instead?
I expected to get an image of the dendrogram.
I'm getting just a dictionary containing the colors, coords, etc...
How can one get the image? I guess via matplotlib, but I could not do it.
What version of the product are you using? On what operating system?
hcluster 0.1.9
Ubuntu
Original issue reported on code.google.com by [email protected]
on 23 Sep 2008 at 6:25
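dendrogram by itself computes the plot data (the dictionary of coordinates and colors); the image is drawn with matplotlib. A minimal sketch, using scipy.cluster.hierarchy (the modern home of this code, assumed to behave like hcluster here):

```python
import os
import tempfile
import matplotlib
matplotlib.use('Agg')  # headless backend; no display window needed
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Condensed distances for 3 observations: d(0,1), d(0,2), d(1,2)
Z = linkage([1.0, 2.0, 3.0])
R = dendrogram(Z)  # draws into the current matplotlib axes, returns the dict
out = os.path.join(tempfile.mkdtemp(), 'dendrogram.png')
plt.savefig(out)   # this writes the actual image file
```

In an interactive session, `plt.show()` would pop up the figure window instead of saving it.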
What steps will reproduce the problem?
import numpy
import hcluster
x1 = numpy.random.randn(10,)
x2 = numpy.random.randn(10,)
xx = numpy.vstack((x1, x2))
# first error
hcluster.correlation(x1, x2)
# second error
hcluster.pdist(xx, 'correlation')
What is the expected output? What do you see instead?
I expected 1-pearson correlation coeff.
Error #1
usr/lib/python2.5/site-packages/hcluster/cluster.py in correlation(u, v)
830 vm = v - vmu
831 return 1.0 - (scipy.dot(um, vm.T) / (math.sqrt(scipy.dot(um,
vm).T)) \
--> 832 * math.sqrt(scipy.dot(vm, vm.T)))
Error #2
usr/lib/python2.5/site-packages/hcluster/cluster.py in pdist(X, metric, p,
V, VI)
1372 dm = squareform(dm)
1373 elif mstr in set(['correlation', 'co']):
-> 1374 X2 = X - numpy.repmat(numpy.mean(X, axis=1).reshape(m,
1), 1, n)
1375 norms = numpy.sqrt(numpy.sum(X2 * X2, axis=1))
1376 _cluster_wrap.pdist_cosine_wrap(X2, dm, norms)
<type 'exceptions.AttributeError'>: 'module' object has no attribute 'repmat'
What version of the product are you using? On what operating system?
Python2.5, numpy1.0.5.dev, hcluster (current svn), linux, 32
Please provide any additional information below.
I don't really get the documentation with the Manhattan norm and all :-), I'm
just assuming 1 - Pearson correlation coefficient. If that's right, here is my fix (diff):
- cluster.py (revision 90)
+++ cluster.py (working copy)
@@ -828,8 +828,8 @@
umu = u.mean()
um = u - umu
vm = v - vmu
- return 1.0 - (scipy.dot(um, vm.T) / (math.sqrt(scipy.dot(um, vm).T)) \
- * math.sqrt(scipy.dot(vm, vm.T)))
+ return 1.0 - (scipy.dot(um, vm.T) / ((math.sqrt(scipy.dot(um, um.T))) \
+ * math.sqrt(scipy.dot(vm, vm.T))))
def hamming(u, v):
"""
@@ -1371,7 +1371,7 @@
dm[xrange(0,m),xrange(0,m)] = 0
dm = squareform(dm)
elif mstr in set(['correlation', 'co']):
- X2 = X - numpy.repmat(numpy.mean(X, axis=1).reshape(m, 1), 1, n)
+ X2 = X - X.mean(1)[:,numpy.newaxis]
norms = numpy.sqrt(numpy.sum(X2 * X2, axis=1))
_cluster_wrap.pdist_cosine_wrap(X2, dm, norms)
elif mstr in set(['mahalanobis', 'mahal', 'mah']):
Arnar
[email protected]
Original issue reported on code.google.com by [email protected]
on 20 Feb 2008 at 3:14
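The corrected formula in the patch above (1 minus the Pearson correlation coefficient) can be written as a standalone function and checked against numpy's own corrcoef:

```python
import numpy as np

def correlation_distance(u, v):
    """1 - Pearson correlation, matching the patched expression above:
    the first sqrt uses dot(um, um), not dot(um, vm)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    um = u - u.mean()
    vm = v - v.mean()
    return 1.0 - np.dot(um, vm) / (np.sqrt(np.dot(um, um))
                                   * np.sqrt(np.dot(vm, vm)))
```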
When you give pdist() a function (instead of a string), it confirms this by
checking that it's a "types.FunctionType". However, it is useful to be able to
provide a c-compiled function, which has type "types.BuiltinFunctionType".
Therefore, I suggest that
if mtype is types.FunctionType:
(on line 1079 in distance.py in revision 132) should be changed to :
if mtype in (types.FunctionType, types.BuiltinFunctionType):
or even
if hasattr(metric, '__call__'):
so as to allow any callable.
Original issue reported on code.google.com by [email protected]
on 10 Feb 2011 at 4:17
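The `hasattr(metric, '__call__')` check suggested above accepts plain functions, C-compiled builtins, and any other callable alike. A minimal pdist sketch (not hcluster's implementation) illustrating the dispatch:

```python
import numpy as np

def pdist_callable(X, metric):
    """Pairwise distances over rows of X using any callable metric.

    Accepting anything with __call__ covers types.FunctionType,
    types.BuiltinFunctionType, bound methods, and callable objects.
    """
    if not hasattr(metric, '__call__'):
        raise TypeError('metric must be callable')
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    out = []
    for i in range(m):
        for j in range(i + 1, m):
            out.append(metric(X[i], X[j]))
    return np.array(out)
```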
Building on Windows with Visual Studio 2008 for 64-bit I encountered the
following portability problems with hcluster 0.20's C.
* Use of 'inline' keyword in distance.c/hierarchy.c - Python 2.7 requires
building with VS2008, which only accepts 'inline' as a keyword if the file is
named "*.cpp". Changing it to "__inline" in all cases fixed the problem. Not
sure how to make it portable using distutils, I'd expect that's a common need.
See http://msdn.microsoft.com/en-us/library/z8y1yy88(v=vs.90).aspx
* Some variable declarations aren't C89-friendly (which is what VS2008 adheres
to :( ). Moving them to the start of the block in all cases doesn't hurt
readability much and does make it portable enough:
...
xi = inds[i];
cnode *xnd = info->nodes + xi;
xn = xnd->n;
...
->
cnode *xnd;
...
xi = inds[i];
xnd = info->nodes + xi;
xn = xnd->n;
After this, I was able to successfully build it.
Original issue reported on code.google.com by [email protected]
on 6 Jan 2014 at 9:42
When trying to list all leaves in a given subtree of the merge tree created by
to_tree, it fails in some cases with 'Index out of bounds'.
What steps will reproduce the problem?
Run the following code to get the exception:
from hcluster import *
dist = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
x = linkage(dist)
t = to_tree(x)
t.right.right.pre_order()
What version of the product are you using? On what operating system?
I'm using the port py26-hcluster version 0.2.0 on OS X 10.6 with Python 2.6.
Please provide any additional information below.
Looking at hierarchy.py (revision 132), it looks like the problem is that the
lists lvisited and rvisited (line 752 and 753) are indexed using node IDs,
while their size is 2*n (n is the size of the subtree in that node). If the
tree is large enough, the node ID of the (for example) rightmost leaf is
actually larger than 2*n for most subtrees containing it.
Original issue reported on code.google.com by [email protected]
on 20 Jul 2010 at 9:30
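A traversal that avoids the fixed-size visited lists entirely can be sketched with plain recursion; `Node` below is a stand-in for hcluster's cnode interface (`.id`, `.left`, `.right`, with leaves having both children None):

```python
class Node(object):
    """Stand-in for hcluster's cnode; leaves have left is None and right is None."""
    def __init__(self, id, left=None, right=None):
        self.id, self.left, self.right = id, left, right

def collect_leaves(node):
    """Pre-order list of leaf ids.

    No lvisited/rvisited lists indexed by node id, so subtree-local calls
    like t.right.right.pre_order() cannot run off the end of a 2*n array.
    """
    if node.left is None and node.right is None:
        return [node.id]
    return collect_leaves(node.left) + collect_leaves(node.right)
```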
It would be good to add a prominent note to the front page that the package is
available as a part of scipy.
Some users may find installing Scipy a better option, because it is commonly
available pre-packaged and more actively maintained.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2011 at 12:55
Hi,
first of all, thanks for the great module!
I was wondering if it is possible to do incremental HAC with hcluster, or if
there is any quick workaround.
Thanks in advance,
Manos
Original issue reported on code.google.com by [email protected]
on 21 Sep 2009 at 10:00
What steps will reproduce the problem?
1. Feed a float32 array into pdist.
No errors are raised, but pdist returns an array with meaningless results.
What is the expected output? What do you see instead?
scipy-cluster should
- upgrade float32 to float64, or use float32 natively
What version of the product are you using? On what operating system?
hcluster-0.1.4 on linux compiled from source
Cheers,
Marcin
Original issue reported on code.google.com by [email protected]
on 19 Mar 2008 at 9:03
Running python setup.py:
hcluster/cluster.c:90:20: error: malloc.h: No such file or directory
Changing line 90 to
#include <stdlib.h>
and it compiles nicely. Still have to test the whole functionality though ;)
BTW I'm on Mac OS 10.5.2 but this should apply to all recent versions.
See also: http://developer.apple.com/technotes/tn2002/tn2071.html
Kind regards,
Daniel
Original issue reported on code.google.com by [email protected]
on 22 Mar 2008 at 3:45
It would be useful to have some metrics in order to evaluate the clustering output.
Original issue reported on code.google.com by [email protected]
on 11 Sep 2010 at 3:41
What steps will reproduce the problem?
1. python setup.py build with duplicate directories in sys.path
What is the expected output? What do you see instead?
$ python setup.py build
There are several valid include directories containing numpy/arrayobject.h
Traceback (most recent call last):
File "setup.py", line 36, in <module>
s = input('Selection [default=1]:' % s)
TypeError: not all arguments converted during string formatting
What version of the product are you using? On what operating system?
Mac OSX
Please provide any additional information below.
adding the following line to setup.py avoids this issue:
valid_paths = dict(map(lambda i: (i,1),valid_paths)).keys()
Original issue reported on code.google.com by [email protected]
on 10 Oct 2008 at 5:44
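The dict-based one-liner above de-duplicates the paths but (on older Pythons, where dicts are unordered) may scramble their order. An order-preserving variant, as a sketch:

```python
def dedupe_preserve_order(paths):
    """Remove duplicate sys.path entries while keeping the first-seen order,
    so the numbered 'Selection' menu in setup.py stays stable."""
    seen = set()
    out = []
    for p in paths:
        if p not in seen:
            seen.add(p)
            out.append(p)
    return out
```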
Hi,
try the following code, it yields weird results,
see the attached files.
Best regards,
Petr Danecek
---------------------------------------------------
from pylab import *
from hcluster import pdist, linkage, dendrogram
import numpy
from numpy.random import rand
Y = [174,181,218,150,199,205,119,212,121,148]
for i in range(len(Y)):
    Y[i] = (500-Y[i])/500.
Z = linkage(Y,method='complete')
dendrogram(Z)
print Z
savefig('_test-complete.png')
Z = linkage(Y,method='average')
dendrogram(Z)
print Z
savefig('_test-average.png')
Z = linkage(Y,method='weighted')
dendrogram(Z)
print Z
savefig('_test-weighted.png')
Z = linkage(Y,method='single')
dendrogram(Z)
print Z
savefig('_test-single.png')
---------------------------------------------------
Original issue reported on code.google.com by [email protected]
on 15 Jan 2009 at 5:34
Attachments:
What steps will reproduce the problem?
1. As part of the crosscat install for the new BayesDB project (although the
same happens if done independently via pip install), setup.py just hangs; the
pip log shows no errors (Python 2.7.5+).
What is the expected output? What do you see instead?
Wish I had more info to provide..
What version of the product are you using? On what operating system?
pip install of 0.2.0
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 7 Dec 2013 at 10:45
What steps will reproduce the problem?
R = dendrogram(Z, labels = ['test' for i in range(0,150)])
/usr/lib64/python2.4/site-packages/hcluster/cluster.pyc in
_append_singleton_leaf_node(Z, p, n, level, lvs, ivl, leaf_label_func, i,
labels)
2532 # for the leaf nodes, use it.
2533 if labels is not None:
-> 2534 ivl.append(labels[i-n])
2535 else:
2536 # Otherwise, use the id as the label for the leaf.x
TypeError: list indices must be integers
Solution:
Change row 2534 to:
ivl.append(labels[int(i-n)])
Original issue reported on code.google.com by [email protected]
on 18 Feb 2008 at 1:46
The fcluster function returns clusters numbered 1..n. Python counts from 0,
and so does scipy.cluster.vq.kmeans2.
If scipy-cluster is to be included in scipy and vq is there to stay, it would
be nice to unify the numbering.
Marcin
Original issue reported on code.google.com by [email protected]
on 23 Mar 2008 at 11:52
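Until the numbering is unified, the shift is a one-liner; `to_zero_based` is a hypothetical helper, not part of either API:

```python
import numpy as np

def to_zero_based(flat_clusters):
    """Shift fcluster's 1..k labels to 0..k-1, matching kmeans2's convention."""
    return np.asarray(flat_clusters) - 1
```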
What steps will reproduce the problem?
`pdist(X, 'jaccard')`
What is the expected output? What do you see instead?
Compared with MATLAB's output, there are many differences. The alternative
Python implementation `pdist(X, 'test_jaccard')` generates the correct output.
I need to copy over the code from the Scipy repository and generate a new
release.
Original issue reported on code.google.com by [email protected]
on 18 Apr 2008 at 3:04
What steps will reproduce the problem?
1. Run fclusterdata() on the points loaded from the csv file:
import numpy as np
import hcluster as hc
data = np.loadtxt('/tmp/segfault.csv')
hc.fclusterdata(data, 1)
I get a segfault after a few seconds. Can anyone reproduce? I get the error on
two machines I tried so far. Problem is that it needs a good amount of RAM to
be run.
Machine is Linux #59-Ubuntu SMP x86_64 GNU/Linux.
hcluster 0.2.0, python 2.6.5
numpy 1.6.1
Any way I could help, please let me know.
Original issue reported on code.google.com by [email protected]
on 21 Sep 2011 at 1:33
Attachments:
The spearman rank coefficient should be added to pdist.
Original issue reported on code.google.com by [email protected]
on 12 Apr 2008 at 6:37
Since the core code is in C,
will it be possible to create an interface for an OpenCV matrix?
Thanks
Original issue reported on code.google.com by [email protected]
on 16 Oct 2010 at 4:04
What steps will reproduce the problem?
1. Try to use fclusterdata
The arguments in the function header are different from those in the
function body; it simply cannot work.
hcluster-0.1.14, Linux
Cheers once more,
Marcin
Original issue reported on code.google.com by [email protected]
on 19 Mar 2008 at 9:05
After producing an upper triangular distance matrix with pdist, I used
squareform to transform the matrix to square form before using it as input
to linkage.
So for
Y = pdist(data)
Y_sq = squareform(Y)
linkage(Y_sq) does NOT equal linkage(Y) --
here I expected linkage(Y_sq) == linkage(Y)
I have only read documentation indicating that Y (as upper triangle) is the
standard input to linkage, but using Y_sq yields the result I was expecting
(maybe just a fluke?). Matlab's linkage does not accept Y_sq as input. What
goes on when I input Y_sq? Why is the result different from using Y?
What version of the product are you using? On what operating system?
I am using hcluster 0.2.0 Mac 10.5
Original issue reported on code.google.com by [email protected]
on 14 Jul 2009 at 9:19
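The likely explanation (assuming hcluster behaves like scipy's linkage): a square matrix is not recognized as a distance matrix, so each of its rows is treated as an observation vector and distances are recomputed between rows. The condensed form is what linkage expects; extracting it from a square matrix can be sketched as:

```python
import numpy as np

def square_to_condensed(D):
    """Upper-triangular (condensed) entries of a square distance matrix,
    row by row, excluding the zero diagonal; this is pdist's output format
    and the input linkage expects."""
    D = np.asarray(D, dtype=float)
    m = D.shape[0]
    return D[np.triu_indices(m, k=1)]
```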
What steps will reproduce the problem?
1. compute clusters
2. execute 'dendrogram' command
What is the expected output? What do you see instead?
I expected to see a dendrogram drawn with Matplotlib. Instead, there is a
small delay while some
kind of calculation takes place, but no output window appears. Matplotlib
works fine when
invoked directly in other scripts on my system.
What version of the product are you using? On what operating system?
I've tried this on two Mac systems, a MacBookPro and a Mac Pro (both Intel
processors). OS
version is 10.5.4. Python installed on both is 2.5.1. Matplotlib is 0.98pre.
I've not tried it on
Linux or Windows, and suspect it may be a platform issue.
Original issue reported on code.google.com by [email protected]
on 12 Sep 2008 at 2:31
apt-get install scipy
apt-get install matplotlib
do not work on Ubuntu 7.10 (Gutsy Gibbon) with the default
/etc/apt/sources.list
To fix it:
apt-get install python-scipy
apt-get install python-matplotlib
Best
Darek Kedra
Original issue reported on code.google.com by [email protected]
on 19 May 2008 at 3:28
The parameter names that were changed were not updated throughout the call
tree. The affected functions have been fixed.
Original issue reported on code.google.com by [email protected]
on 29 May 2008 at 9:35
I'm running hcluster v0.2.0 under linux (CentOS5) with python 2.4.3 and
numpy version: 1.0.1.
I have 6 observations that each comprise ~45000 datapoint (i.e., a 6x45000
numpy array). I want to compute the euclidean distances between the 6
observations. When I try:
dists=pdist(data, 'seuclidean')
I encounter the following error:
File "...lib/python2.4/site-packages/hcluster/distance.py", line 1151, in pdist
VV = np.var(X, axis=0, ddof=1)
TypeError: var() got an unexpected keyword argument 'ddof'
Any ideas?
Original issue reported on code.google.com by [email protected]
on 5 Jan 2010 at 5:21
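The `ddof` keyword was added to numpy after 1.0.1, so on that version `np.var(..., ddof=1)` raises the TypeError above. The unbiased variance can be computed without it by rescaling the biased estimate; a sketch of the workaround:

```python
import numpy as np

def sample_var(X, axis=0):
    """Unbiased (ddof=1) variance without the ddof keyword:
    np.var divides by n, so multiply by n/(n-1) to divide by n-1 instead."""
    X = np.asarray(X, dtype=float)
    n = X.shape[axis]
    return np.var(X, axis=axis) * n / (n - 1.0)
```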
These algorithms operate on dissimilarity matrices as long as they are
Euclidean. The centroid, ward, and median linkage functions should be
modified to support this feature.
Damian
Original issue reported on code.google.com by [email protected]
on 30 May 2008 at 2:14
What steps will reproduce the problem?
`pdist(X, 'hamming')`
`pdist(X, 'jaccard')`
`pdist(X, 'any_implemented_boolean_distance_matrix')`
What is the expected output? What do you see instead?
An exception is generated.
Original issue reported on code.google.com by [email protected]
on 18 Apr 2008 at 3:06
What steps will reproduce the problem?
1. R = dendrogram(Z, color_list=['brown' for i in range(0,150)])
2. All edges in dendrogram are just green
Solution ('hack'):
Line: 2474
if color_list is None:
    color_list = []
Colors are still added within the program, but custom colors are used
first. Maybe there is a better solution for this.
Original issue reported on code.google.com by [email protected]
on 18 Feb 2008 at 1:49
I am interested in using this package to cluster sequences. I noticed in
the TODO file, you list that you want to do this as well. One place to go
is to take the implementation of the Levenshtein edit distance from the
py-editdist package. In addition, there is a normalized edit distance that
can be easily implemented from that in this paper:
IEEE Trans Pattern Analys Mach Intel 29(6):1091
I'll see about writing it myself, but my C is quite rusty.
Original issue reported on code.google.com by [email protected]
on 27 Feb 2009 at 1:21
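For reference, the Levenshtein edit distance mentioned above is short to express in pure Python (a dynamic-programming sketch with unit insert/delete/substitute costs; the C version in py-editdist is equivalent but faster):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]
```

The normalized variant from the cited paper is not the naive division by max length, so it is not reproduced here.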
The links to the tutorial and API docs at http://users.soe.ucsc.edu/~eads are
broken. This significantly increases the learning curve for this library. Could
these please be moved to the google code site?
Original issue reported on code.google.com by [email protected]
on 8 Apr 2011 at 1:40