
Clustering.jl's Issues

Warnings with Julia v"0.4.0"

I'm seeing quite a few warnings from Clustering.jl on Julia 0.4.0

e.g.

WARNING: [a] concatenation is deprecated; use collect(a) instead
 in depwarn at deprecated.jl:73
 in oldstyle_vcat_warning at ./abstractarray.jl:29
 in hclust at /home/jeff/.julia/v0.4/Clustering/src/hclust.jl:334
 in hclust at /home/jeff/.julia/v0.4/Clustering/src/hclust.jl:342
 [inlined code] from /home/jeff/.julia/v0.4/Clustering/test/hclust.jl:7
 in anonymous at no file:0
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 [inlined code] from /home/jeff/.julia/v0.4/Clustering/test/runtests.jl:15
 in anonymous at no file:14
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in process_options at ./client.jl:308
 in _start at ./client.jl:411
while loading /home/jeff/.julia/v0.4/Clustering/test/hclust.jl, in expression starting on line 6

and

WARNING: int(x) is deprecated, use Int(x) instead.
 in depwarn at deprecated.jl:73
 in int at deprecated.jl:50
 in kmeans! at /home/jeff/.julia/v0.4/Clustering/src/kmeans.jl:37
 in kmeans at /home/jeff/.julia/v0.4/Clustering/src/kmeans.jl:53
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 [inlined code] from /home/jeff/.julia/v0.4/Clustering/test/runtests.jl:15
 in anonymous at no file:14
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in process_options at ./client.jl:308
 in _start at ./client.jl:411
while loading /home/jeff/.julia/v0.4/Clustering/test/kmeans.jl, in expression starting on line 15

and

WARNING: deprecated syntax "{a=>b, ...}" at /home/jeff/.julia/v0.4/Clustering/test/hclust_generated_examples.jl:558.
Use "Dict{Any,Any}(a=>b, ...)" instead.

I think these warnings come from changes to the Julia language that appeared around the time of the release candidates. I'm using version 0.4.0 of Clustering.jl, but I get the warnings when I check out master too.

test failure on OS X with 0.6

The cause is mcl: here is a simple reproducible example:

adj_matrix = [1.0   0.125 1.0  0.0  0.16 0.0;
              0.125 1.0   0.0  0.25 0.0  0.16;
              1.0   0.0   1.0  0.0  0.2  0.0;
              0.0   0.25  0.0  1.0  0.0  0.5;
              0.16  0.0   0.2  0.0  1.0  0.0;
              0.0   0.16  0.0  0.5  0.0  1.0]
mcl(adj_matrix, display=:verbose, inflation=1.5, expansion=1.5, save_final_matrix=true)

I had a quick look through; my guess is that a slight change in BLAS behaviour causes eig to give subtly different values, and that this error then gets amplified, but I don't understand the algorithm well enough to be sure.

DBSCAN gives error on Float32 array

I can only use the DBSCAN algorithm with a Float64 array, not with the Float32 array that is actually my input.

using Clustering
positions = zeros(Float32, 3, 10)
clusters = dbscan(positions, 0.3, min_neighbors=1, min_cluster_size=1, leafsize=20)

ERROR: MethodError: no method matching _dbscan(::NearestNeighbors.KDTree{StaticArrays.SVector{3,Float32},Distances.Euclidean,Float32}, ::Array{Float32,2}, ::Float64; min_neighbors=1, min_cluster_size=1)
Closest candidates are:
_dbscan{T<:AbstractFloat}(::NearestNeighbors.KDTree{V<:AbstractArray{T,1},M<:Union{Distances.Chebyshev,Distances.Cityblock,Distances.Euclidean,Distances.Minkowski},T}, ::Array{T<:AbstractFloat,2}, ::T<:AbstractFloat; min_neighbors, min_cluster_size) at /user/.julia/v0.5/Clustering/src/dbscan.jl:144
in #dbscan#6(::Int64, ::Array{Any,1}, ::Function, ::Array{Float32,2}, ::Float64) at /user/.julia/v0.5/Clustering/src/dbscan.jl:137
in (::Clustering.#kw##dbscan)(::Array{Any,1}, ::Clustering.#dbscan, ::Array{Float32,2}, ::Float64) at ./:0
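A hedged workaround until the signature is relaxed: the trace suggests the mismatch is between the Float64 radius and the Float32 element type of the tree, so converting either side should make dispatch succeed. A minimal sketch, assuming that diagnosis:

using Clustering

positions = zeros(Float32, 3, 10)

# Option 1 (known to work, per the report): promote the data to Float64
clusters = dbscan(convert(Matrix{Float64}, positions), 0.3,
                  min_neighbors=1, min_cluster_size=1, leafsize=20)

# Option 2 (untested guess): pass the radius as a Float32 literal so that it
# matches the element type of the data
clusters = dbscan(positions, 0.3f0,
                  min_neighbors=1, min_cluster_size=1, leafsize=20)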

Offering non-classical hierarchical clustering techniques within scope?

Is it within the scope of this package to provide some more modern HC techniques (e.g. ROCK, CURE, BIRCH)? Classical HC techniques (single linkage, centroid linkage, etc.) lack robustness and are sensitive to noise/outliers, and their quadratic computational complexity is problematic when applying them to large datasets.

More modern algorithms like CURE can better handle multidimensional data and sophisticated cluster shapes. CURE has 2000+ citations on Google Scholar, so there is clearly a large demand for HC techniques that can handle "big data". Wikipedia has the algorithm's pseudocode (I haven't checked its validity).

predict and score functions

For some use cases, predict and score functions would be helpful.
The predict function should return the assigned clusters for a set of observations; for kmeans it could look like this:

function predict(kmresult, X)
    # pairwise squared Euclidean distances between the fitted centers
    # (columns of kmresult.centers) and the observations (columns of X)
    dmat = Distances.pairwise(Distances.SqEuclidean(), kmresult.centers, X)
    # findmin over dim 1 gives linear indices of the per-column minima;
    # reduce them to row indices, i.e. the nearest center for each observation
    mod(findmin(dmat, 1)[2] .- 1, size(dmat, 1)) .+ 1
end

The score function should assign the given observations and return the 1/totalcost for these observations; it could look like this for kmeans:

function score(kmresult, X)
    # pairwise squared Euclidean distances between centers and observations
    dmat = Distances.pairwise(Distances.SqEuclidean(), kmresult.centers, X)
    # total cost: sum over observations of the squared distance to the
    # nearest center (take the reciprocal for the 1/totalcost score above)
    sum(findmin(dmat, 1)[1])
end
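For context, a hypothetical usage of these sketches with a fitted kmeans result (the data here is random and purely illustrative):

using Clustering, Distances

X = rand(2, 100)             # training observations in columns
res = kmeans(X, 3)

Xnew = rand(2, 10)           # new observations to assign
labels = predict(res, Xnew)  # index of the nearest center per observation
s = score(res, Xnew)         # total cost of that assignment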

Of course, it would be great to have those functions for all of the available clustering algorithms.

ERROR: sample_by_weights not defined

I did not see this error earlier; it appeared after I pulled the latest Julia.

julia(113)% julia ~/.julia/Clustering/test/kmeans_t1.jl
non-weighted
ERROR: sample_by_weights not defined
in kmeanspp_initialize! at /home/dr/.julia/Clustering/src/seeding.jl:24
in kmeans at /home/dr/.julia/Clustering/src/kmeans.jl:399
in kmeans at /home/dr/.julia/Clustering/src/kmeans.jl:403
in include_from_node1 at loading.jl:92
in process_options at client.jl:274
in _start at client.jl:349
at /home/dr/.julia/Clustering/test/kmeans_t1.jl:12

Failing on 0.2 according to PackageEvaluator

http://status.julialang.org/

Just tried it on 0.2 myself manually:

idunning@IAIN-DESKTOP:~/.../JuMP/test$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.0 (2013-11-16 23:48 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org release
|__/                   |  x86_64-linux-gnu

julia> Pkg.add("Clustering")
INFO: Cloning cache of Clustering from git://github.com/johnmyleswhite/Clustering.jl.git
INFO: Cloning cache of Distance from git://github.com/JuliaStats/Distance.jl.git
INFO: Cloning cache of NumericExtensions from git://github.com/lindahua/NumericExtensions.jl.git
INFO: Cloning cache of StatsBase from git://github.com/JuliaStats/StatsBase.jl.git
INFO: Installing Clustering v0.2.4
INFO: Installing Distance v0.2.6
INFO: Installing NumericExtensions v0.3.6
INFO: Installing StatsBase v0.2.10
INFO: REQUIRE updated.

julia> using Clustering
Warning: could not import Base.foldl into NumericExtensions
Warning: could not import Base.foldr into NumericExtensions
Warning: could not import Base.sum! into NumericExtensions
Warning: could not import Base.maximum! into NumericExtensions
Warning: could not import Base.minimum! into NumericExtensions
ERROR: Distributions not found
 in require at loading.jl:39
 in include at boot.jl:238
at /home/idunning/.julia/Clustering/src/Clustering.jl:4

Bounds Error with Hclust on large Matrix

I'm trying to do hierarchical clustering on large-ish distance matrices. The following works fine:

using Distances
using Clustering

m1 = rand(100,100)
d1 = pairwise(Jaccard(), m1)
c1s = hclust(d1, :single)
c1a = hclust(d1, :average)

I was able to do it on a random table as big as 10k x 10k, but for my actual data table, which is about 12k x 12k, :single works while the :average hclust produces an error (eventually, after a rather long time):

BoundsError: attempt to access 8947-element Array{Any,1} at index [8984]
hclust(::Symmetric{Float64,Array{Float64,2}}, ::Symbol) at hclust.jl:338
hclust(::Array{Float64,2}, ::Symbol) at hclust.jl:351
include_string(::String, ::String) at loading.jl:515
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##49#53{String,Int64,String})() at eval.jl:50
withpath(::Atom.##49#53{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
macro expansion at eval.jl:49 [inlined]
(::Atom.##48#52{Dict{String,Any}})() at task.jl:80

Is this a memory error? I can do hclust in R for the same data, so I think it should work in principle.

Move to JuliaStats

@johnmyleswhite I am wondering whether you are happy with moving this package to JuliaStats.

Clustering.jl is one of the ML packages that has received relatively broad attention. Moving it might make it easier to get more support from the community.

How to use k-medoids?

The documentation for k-medoids requires a cost matrix C and a parameter k, the number of clusters. But C must be a k×m matrix, so k can be inferred from C; why are both necessary? Also, the MATLAB version doesn't require C as input at all. And C must be recalculated whenever a new candidate medoid is selected; I don't see any hooks to allow this. Is it possible that this is half-completed and doesn't do step 5 as described in the wiki? Or maybe step 5 is expected to be done outside.
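For what it's worth, a hedged usage sketch of the current API as I read it: C is the n×n pairwise cost matrix over the samples themselves, so k still has to be chosen separately.

using Clustering, Distances

X = rand(3, 100)                 # samples in columns
C = pairwise(SqEuclidean(), X)   # 100x100 pairwise cost matrix
res = kmedoids(C, 5)             # k = number of medoids, chosen by the user
res.medoids                      # indices of the samples picked as medoids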

[PackageEvaluator.jl] Your package Clustering may have a testing issue.

This issue is being filed by a script, but if you reply, I will see it.

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3).

The results of this script are used to generate a package listing enhanced with testing results.

The status of this package, Clustering, on...

  • Julia 0.2 is 'No tests, but package loads.' PackageEvaluator.jl
  • Julia 0.3 is 'No tests, but package loads.' PackageEvaluator.jl

'No tests, but package loads.' can be due to there being no tests (you should write some if you can!) but can also be due to PackageEvaluator not being able to find your tests. Consider adding a test/runtests.jl file.

'Package doesn't load.' is the worst-case scenario. Sometimes this arises because your package doesn't have BinDeps support, or needs something that can't be installed with BinDeps. If this is the case for your package, please file an issue and an exception can be made so your package will not be tested.

This automatically filed issue is a one-off message. Starting soon, issues will only be filed when the testing status of your package changes in a negative direction (gets worse). If you'd like to opt-out of these status-change messages, reply to this message.

Implement fast optimal leaf ordering for Hclust

I recently had need of an implementation of the method in this paper:

We present the first practical algorithm for the optimal linear leaf ordering of trees that are generated by hierarchical clustering. Hierarchical clustering has been extensively used to analyze gene expression data, and we show how optimal leaf ordering can reveal biological structure that is not observed with an existing heuristic ordering method. For a tree with n leaves, there are 2^(n-1) linear orderings consistent with the structure of the tree. Our optimal leaf ordering algorithm runs in time O(n^4), and we present further improvements that make the running time of our algorithm practical.

I'm not sure I did it in the most efficient way possible, but for an hclust of a 5k x 5k distance matrix it took ~50 ms (generating the hclust itself was ~1000 ms). See the Jupyter notebook here.

I initially wrote it for my Microbiome package, but I think it makes more sense for it to live here, if you're up for a PR.

Plot recipe PR?

I'm in the process of writing a user recipe for Plots.jl to enable plotting of Hclust objects (see here). Generally, it makes sense for plotting recipes to live in the package that generates the object, but it would require accepting RecipesBase.jl as a dependency.

Before I get too far into the development I was wondering if this would be a PR that you would be willing to take on.

Docs

doc/source/varinfo.rst is not showing up on the readthedocs site for the latest build; when I built the docs locally, varinfo was there.
Are there any other functions missing from the documentation?

Nondeterministic methods should take an n_init parameter for how many times to run

Hiya,
In Python's sklearn, methods like K-Means and K-Medoids take an n_init parameter.
To use their description:

n_init : int, default: 10
Number of times the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs ...

I think it would be good to have that here,
particularly since a single run of k-means is a fairly poor method for clustering data.
Running it several times and taking the best is common practice.
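Until then, this can be emulated on the user side; a minimal sketch, assuming only kmeans and the totalcost field of its result (kmeans_best is a hypothetical helper, not part of Clustering.jl):

function kmeans_best(X, k; n_init=10, kwargs...)
    # run kmeans n_init times and keep the run with the lowest total cost
    best = kmeans(X, k; kwargs...)
    for _ in 2:n_init
        r = kmeans(X, k; kwargs...)
        if r.totalcost < best.totalcost
            best = r
        end
    end
    best
end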

sparse matrices fail

I get this error when calling kmeans on a sparse matrix:

julia> kmeans(x', 50)                                                                                                                                                         
ERROR: no method kmeans(SparseMatrixCSC{Float32,Int32}, Int64) 

Could this be due to the StoredArray change in Julia?
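A hedged workaround in the meantime, assuming the matrix fits in memory once densified:

kmeans(full(x'), 50)   # full() materializes the sparse matrix as dense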

Specific Docs on algo used for K-Means?

Hi, is the algorithm used for implementing K-Means the naive Lloyd iteration? Are there any benefits to, or need for, trying other algorithms like Pelleg-Moore or Hamerly? I would like to get started on some API work in Julia and found the K-Means code to be a good place to start; is it okay if I take it up and work on it?

`kmeans` dispatch problem

The current implementation of kmeans has the following declaration:

function kmeans(X::Matrix, k::Int;
                weights=nothing,
                init=_kmeans_default_init,
                maxiter::Integer=_kmeans_default_maxiter,
                tol::Real=_kmeans_default_tol,
                display::Symbol=_kmeans_default_display)

although it calls the function kmeans! whose declaration is:

function kmeans!{T<:AbstractFloat}(X::Matrix{T}, centers::Matrix{T};
                                   weights=nothing,
                                   maxiter::Integer=_kmeans_default_maxiter,
                                   tol::Real=_kmeans_default_tol,
                                   display::Symbol=_kmeans_default_display)

Here T is a subtype of AbstractFloat. This constraint on T is not present in kmeans, which allows us to call kmeans as:

kmeans(rand(Int,3,100), 5)

which throws an error.

I also require kmeans{T} to be constrained with T<:AbstractFloat because I am dispatching on kmeans for ImageSegmentation as:

kmeans{T<:Colorant,N}(x::AbstractArray{T,N}, args...; kwargs...)
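A minimal sketch of the requested fix, in the same pre-0.6 parametric syntax as the declarations above (the body is unchanged and elided):

function kmeans{T<:AbstractFloat}(X::Matrix{T}, k::Int;
                weights=nothing,
                init=_kmeans_default_init,
                maxiter::Integer=_kmeans_default_maxiter,
                tol::Real=_kmeans_default_tol,
                display::Symbol=_kmeans_default_display)
    # ... body unchanged; with the constraint, kmeans(rand(Int, 3, 100), 5)
    # fails at dispatch time instead of erroring inside kmeans! ...
end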

Generic clustering interface

Continuing the off-topic conversation in #12:

In R's cluster package, partitioning method results inherit from a common class which contains cluster assignments, silhouette information, value of the objective at the clustering, dissimilarity matrix, and sometimes the original data matrix. Hierarchical methods also inherit from a common class but there's not much information about it in the manual and I haven't looked at the code closely.

Maybe it's more useful to think about what methods should operate on the result of a clustering operation. The obvious one is cluster assignment. Even that is ambiguous for hierarchical methods without specifying some cutoff criterion or number of clusters. Others might be silhouette widths or objective value. In principle those could be applied to any clustering algorithm given a dissimilarity matrix (once cluster memberships are assigned), but for some algorithms you don't necessarily have a dissimilarity matrix sitting there. Fuzzy and model based algorithms would have additional methods.

Based on this brainstorm, types and methods might be:

ClusterPartition
    store cluster memberships
    method to return cluster memberships
    method to return silhouette
ClusterHierarchy
    store the clustering tree?
    method to reduce to a ClusterPartition given some criterion
    methods to summarize the hierarchy (I'm not too familiar with common approaches here)
ClusterFuzzy
    store weights for each cluster/observation
    method to reduce to a ClusterPartition, probably just argmax of cluster weights
    summarization
Maybe ClusterModel?
    method to reduce to ClusterFuzzy.
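One possible shape for this brainstorm, sketched in the pre-0.6 syntax of the time (all names are hypothetical, not an existing API):

abstract ClusteringResult

type ClusterPartition <: ClusteringResult
    assignments::Vector{Int}    # cluster index per observation
end

assignments(r::ClusterPartition) = r.assignments

type ClusterFuzzy <: ClusteringResult
    weights::Matrix{Float64}    # observations-by-clusters membership weights
end

# reduce fuzzy memberships to a hard partition via argmax of the weights
ClusterPartition(r::ClusterFuzzy) =
    ClusterPartition([indmax(r.weights[i, :]) for i in 1:size(r.weights, 1)])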

assignments of Fuzzy C-mean

I tried to use fuzzy_cmeans and the function seems to have worked.
However, there are no assignments or counts in FuzzyCMeansResult.
How can we get these values for each dataset?
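Until such fields exist, hard assignments and counts can be derived from the membership weights; a hedged sketch, assuming result.weights is an n-by-C matrix (the data here is illustrative):

using Clustering

X = rand(5, 200)
result = fuzzy_cmeans(X, 3, 2.0)    # 3 clusters, fuzziness 2.0

n, C = size(result.weights)
hard = [indmax(result.weights[i, :]) for i in 1:n]    # argmax per observation
cnts = [count(a -> a == c, hard) for c in 1:C]        # cluster sizes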

affinity propagation result is not consistent with sklearn in python

The two clustering results are different. The Julia version did not do any clustering, since the assignment is just the index of each object! My similarity matrix is too large to show here.

using Clustering

@time affinityPropResult = Clustering.affinityprop(similarityMatrix)

affinityPropResult.assignments
using PyCall

@pyimport sklearn.cluster as cl
af = cl.AffinityPropagation(affinity="precomputed")[:fit](similarityMatrix)

labels = af[:labels_]

The Travis tests also do not verify the correctness of the result.

kmeans extremely slow in julia 0.5

Hello,

While trying to get GaussianMixtures working with Julia v0.5, I am stumbling over an extremely slow kmeans, which I use for initializing the Gaussians.

v0.5:

@time kmeans(rand(10,10000), 5)
 19.641323 seconds (3.55 M allocations: 545.293 MB, 0.37% gc time)

v0.4:

@time kmeans(rand(10,10000), 5)
  0.084591 seconds (152.87 k allocations: 14.603 MB, 7.42% gc time)

It might be related to

WARNING: slice is deprecated, use view instead.
 in depwarn(::String, ::Symbol) at ./deprecated.jl:64
 in slice(::Array{Float64,2}, ::Vararg{Any,N}) at ./deprecated.jl:30
 in colwise!(::Array{Float64,1}, ::Distances.SqEuclidean, ::SubArray{Float64,1,Array{Float64,2},Tuple{Colon,Int64},true}, ::Array{Float64,2}) at /Users/david/.julia/v0.5/Distances/src/generic.jl:36
 in initseeds!(::Array{Int64,1}, ::Clustering.KmppAlg, ::Array{Float64,2}, ::Distances.SqEuclidean) at /Users/david/.julia/v0.5/Clustering/src/seeding.jl:98
 in initseeds(::Clustering.KmppAlg, ::Array{Float64,2}, ::Int64) at /Users/david/.julia/v0.5/Clustering/src/seeding.jl:22
 in initseeds(::Symbol, ::Array{Float64,2}, ::Int64) at /Users/david/.julia/v0.5/Clustering/src/seeding.jl:34
 in #kmeans#2(::Void, ::Symbol, ::Int64, ::Float64, ::Symbol, ::Function, ::Array{Float64,2}, ::Int64) at /Users/david/.julia/v0.5/Clustering/src/kmeans.jl:51
 in kmeans(::Array{Float64,2}, ::Int64) at /Users/david/.julia/v0.5/Clustering/src/kmeans.jl:49

as (repeated) warnings tend to make Julia very slow.

Feature Request: Mean shift clustering

It would be great to have an implementation of an algorithm that finds modes of kernel density estimates. The most common is the mean-shift algorithm:

Comaniciu, Dorin; Peter Meer (May 2002). "Mean Shift: A Robust Approach Toward Feature Space Analysis". IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE) 24 (5): 603–619. doi:10.1109/34.1000236.

A great short (2 page) guide to using mean shift algorithm for clustering http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf
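For concreteness, the core of the method is just an iterated weighted mean; a minimal sketch of one update step with a Gaussian kernel (the function name and bandwidth h are illustrative, not a proposed API):

# One mean-shift update: move y to the kernel-weighted mean of the data
# (columns of X). Iterating this to a fixed point finds a mode of the KDE.
function meanshift_step(X::Matrix{Float64}, y::Vector{Float64}, h::Float64)
    w = [exp(-sum(abs2, X[:, i] - y) / (2h^2)) for i in 1:size(X, 2)]
    (X * w) / sum(w)
end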

I initially posted the request at KernelDensity.jl (JuliaStats/KernelDensity.jl#11) but thought it might be better suited here.

I made a pull request: #43

Why is the kmeans algorithm column-oriented instead of row-oriented?

In the docs (below), the kmeans algorithm takes a matrix where each column X[:, i] corresponds to an observed sample. This goes against the idea of tidy data and differs from Python's scikit-learn implementation of kmeans as well as R's base implementation.

Is there a good reason for this? Should this algorithm be changed from column-oriented to row-oriented so as to be consistent with R and Python as well as with the concept of tidy data?

URL: http://clusteringjl.readthedocs.io/en/stable/overview.html

Inputs

A clustering algorithm, depending on its nature, may accept an input matrix in either of the following forms:

  • Sample matrix X, where each column X[:,i] corresponds to an observed sample.
  • Distance matrix D, where D[i,j] indicates the distance between samples i and j, or the cost of assigning one to the other.
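In the meantime, row-oriented (tidy) data just needs a transpose before the call; a minimal sketch:

using Clustering

Xrows = rand(1000, 10)    # tidy layout: 1000 observations x 10 features
res = kmeans(Xrows', 5)   # Clustering.jl expects observations in columns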

tag a new release of Clustering.jl?

I submitted a PR last week that suppressed warnings from Julia 0.4, but these changes have never been released. Without them, Clustering.jl generates so many warnings that it's hard to use (running its unit tests currently produces over 16,000 lines of warning messages). Will you tag a new release? Thanks much!

varinfo() clashes with InteractiveUtils.varinfo()

Unless there are better ideas, I suggest renaming it to variatinfo(), because varinfo really sounds like information about a variable.
What should the roadmap for renaming be? Introduce the new name and deprecate varinfo() in the next minor release, then remove varinfo() after some period of time (6 months or so)?

cc @ararslan

kmeans() error with float32 vectors

I get this error when my input vectors are Float32:

ERROR: no method _kmeans!(Array{Float32,2}, Nothing, Array{Float32,2}, Array{Int64,1}, Array{Float64,1}, Array{Int64,1}, Array{Float32,1}, KmeansOpts)                        
 in kmeans! at /Users/swade/.julia/Clustering/src/kmeans.jl:367
 in kmeans at /Users/swade/.julia/Clustering/src/kmeans.jl:387
 in kmeans at /Users/swade/.julia/Clustering/src/kmeans.jl:390
 in include_from_node1 at loading.jl:120

The error goes away when the vectors are Float64. Looking at the code, this does not seem to be intended.

Improve k-means

The current structure looks good to me, but it can be further extended to offer more options.

I am considering several improvements to k_means:

Change from row-based to column-based

Currently, it treats each "row" as a sample -- this is not cache friendly, since Julia matrices are column-major. For a large data matrix, operating by rows may incur a very severe penalty due to cache misses.

Also, in the typical machine learning literature, samples are generally treated as column vectors.

Additional Interface

Currently, it is

k_means(x, k, opts)

We can add an additional function, as

k_means(x, init_centers, opts)

This function allows users to directly supply their own set of initial centers -- in practice, users may well come up with a better initial guess based on their domain-specific knowledge.

Also, you don't have to provide the number k here, as it can be inferred immediately from the number of columns of init_centers.

The original k_means(x, k, opts) can then simply initialize a set of centers (using kmeans++) and call the function above.
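A sketch of the proposed overload (opts left untyped, as in the existing interface; the seeding helper named in the comment is hypothetical):

function k_means(x::Matrix{Float64}, init_centers::Matrix{Float64}, opts)
    k = size(init_centers, 2)    # k is implied by the supplied centers
    # ... run the standard iterative refinement starting from init_centers ...
end

# the existing entry point then reduces to seeding plus a call to the above:
# k_means(x, k, opts) = k_means(x, kmeanspp_seed(x, k), opts)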

Use Distance.jl to compute distances

My benchmarks show this can lead to over a 100x performance gain. Pairwise distance computation is the performance bottleneck of k-means algorithms.

Add more options

  • weights: allowing users to assign weights to samples
  • replicates: allowing users to specify the number of times to run k-means (the function eventually returns the best result)
  • allowing the user to specify what to do if a center gets no samples during an iterative update (this is possible in practice); the default can be to redraw a new center using a kmeans++ scheme

Provide Elkan's method as an optional choice,

which takes advantage of triangle inequality to reduce the computation of distances.
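For reference, the key pruning bound follows directly from the triangle inequality: for a point x and centers c and c', d(x, c') >= d(c, c') - d(x, c), so whenever d(c, c') >= 2 d(x, c) the center c' cannot be closer to x than c, and d(x, c') need not be computed at all.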


Would you please let me know if you have any feedback on this proposal?

There are two ways that I can contribute to this:

(1) If you grant me commit privileges, I can create a new branch for this development and merge it into master when we both agree that it is ready.

(2) I can fork it and submit a pull request later, but that can be a hassle if I have to modify it in the future for bug fixes or further improvements.

WARNING: both ArrayViews and Base export "view"

Testing on 0.5.0-dev+5478 gives:

WARNING: both ArrayViews and Base export "view"; uses of it in module Clustering must be qualified

Although ArrayViews.jl now says:

By and large, this package is no longer necessary: base julia now has efficient SubArrays (i.e., sub and slice).

Eventually giving errors like this:

ERROR: LoadError: UndefVarError: view not defined
 in initseeds!(::Array{Int64,1}, ::Clustering.KmppAlg, ::Array{Float64,2}, ::Distances.SqEuclidean) at /Users/me/.julia/v0.5/Clustering/src/seeding.jl:98

Function dbscan does not accept keyword arguments

Not sure if this is a Julia v0.5 issue:

I just installed Clustering and ran the following commands in the REPL:

using Clustering;
clusters= dbscan(randn(3,10000), 0.05, min_neighbors=3, min_cluster_size=20);

ERROR: function dbscan does not accept keyword arguments
in kwfunc(::Any) at .\boot.jl:236

Please tag latest version in METADATA

Hello,

I am in the process of making GaussianMixtures compatible with Julia v0.5. GaussianMixtures depends on Clustering and needs the latest commits in order to compile. I don't see a way to depend on a specific commit, so the request is to tag the latest commit in METADATA.

Thanks a lot,

---david

Pkg.add("Clustering") is not working in julia 0.3.5

Dear All,

I tried to add the Clustering package and Julia gives this:

ERROR: unknown package Clustering
in wait at task.jl:51
in sync_end at task.jl:311
in add at pkg/entry.jl:55
in anonymous at pkg/dir.jl:28
in cd at file.jl:30
in cd at pkg/dir.jl:28

The computing environment:

  • Julia version: 0.3.5
  • OS: Windows 8.1 with Bing, 32-bit.

Best Regards.

Hierarchical clustering?

Any plans / interest to add hierarchical clustering to this package? Or is that more appropriate for a different package? In that case, this package should be renamed to Kmeans or some such.
