pharo-ai / k-means
K-means clustering in Pharo
License: MIT License
inertia_ : float
Sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
https://scikit-learn.org/stable/modules/clustering.html#k-means
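As a rough illustration of the inertia definition quoted above, here is a minimal Playground sketch. It is not the library's implementation: the variable names and the sample centroids are assumptions made for the example, and sample weights are left out for brevity.

```smalltalk
"Hypothetical sketch: unweighted inertia for a fixed set of centroids."
| points centroids squaredDistance inertia |
points := #( #( 0 0 ) #( 0.5 0 ) #( 0.5 1 ) #( 1 1 ) ).
centroids := #( #( 0.25 0 ) #( 0.75 1 ) ).

"Squared Euclidean distance between two points of equal dimension."
squaredDistance := [ :a :b |
	(a with: b collect: [ :x :y | (x - y) squared ])
		inject: 0
		into: [ :sum :each | sum + each ] ].

"For each point, take the squared distance to its closest centroid, then sum."
inertia := points inject: 0 into: [ :total :point |
	total + (centroids collect: [ :c | squaredDistance value: point value: c ]) min ].

inertia "0.25 for this data: each point is 0.25 away from its nearest centroid"
```

Supporting sample weights would mean multiplying each point's minimum squared distance by its weight before summing.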
Currently, the centroids are selected randomly. We can do better by using k-means++ initialization, as scikit-learn does.
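For reference, k-means++ seeding could look roughly like the sketch below. This is one possible Playground version and not the project's code: every name in it is hypothetical, and it only covers the seeding step (the first centroid is drawn uniformly, each later one with probability proportional to its squared distance from the nearest centroid already chosen).

```smalltalk
"Hypothetical sketch of k-means++ seeding."
| points k random squaredDistance centroids |
points := #( #( 0 0 ) #( 0.5 0 ) #( 0.5 1 ) #( 1 1 ) ).
k := 2.
random := Random seed: 42.

squaredDistance := [ :a :b |
	(a with: b collect: [ :x :y | (x - y) squared ])
		inject: 0
		into: [ :sum :each | sum + each ] ].

"First centroid: picked uniformly at random."
centroids := OrderedCollection with: (points atRandom: random).

[ centroids size < k ] whileTrue: [
	| weights threshold cumulative chosen |
	"Weight of a point = squared distance to its nearest chosen centroid."
	weights := points collect: [ :p |
		(centroids collect: [ :c | squaredDistance value: p value: c ]) min ].
	"Draw a point with probability proportional to its weight."
	threshold := random next * (weights inject: 0 into: [ :a :b | a + b ]).
	cumulative := 0.
	chosen := nil.
	points with: weights do: [ :p :w |
		cumulative := cumulative + w.
		(chosen isNil and: [ w > 0 and: [ cumulative >= threshold ] ])
			ifTrue: [ chosen := p ] ].
	"Fallback when every weight is zero (all points coincide with a centroid)."
	centroids add: (chosen ifNil: [ points last ]) ].

centroids
```

Points already chosen have weight zero and are skipped, so the seeds are distinct whenever the data contains at least k distinct points.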
Currently I'm using k-means with DataFrame as follows:
| df kmeans |
df := Datasets loadIris columnsFrom: 1 to: 4.
kmeans := KMeans numberOfClusters: 3.
kmeans fit: (df asArrayOfRows collect: #asArray).
It would be nice if the argument to #fit: could be just the DataFrame, which knows how to be fitted with a KMeans algorithm:
kmeans fit: df
This way, #fit: could also receive PMMatrix and similar matrix-like objects, with each one responsible for implementing:
DataFrame>>fitKMeans: aKmeans
aKmeans fit: (self asArrayOfRows collect: #asArray).
and so on...
This
| data kmeans |
data := #( #( 0 0 ) #( 0.5 0 ) #( 0.5 1 ) #( 1 1 ) ).
kmeans := AIKMeans numberOfClusters: 100.
kmeans fit: data
and this
| data kmeans |
data := #( #( 0 0 ) #( 0.5 0 ) #( 0.5 1 ) #( 1 1 ) ).
kmeans := AIKMeans numberOfClusters: -100.
kmeans fit: data
should raise an appropriate exception. Currently, the first one either works or fails with "Took too much time", while the second raises a SubscriptOutOfBounds.
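One way to fail early is to validate the argument before fitting. The sketch below is only an assumption about what such a check could look like: the block name `validate` and the error message are made up for illustration, and the upper bound follows scikit-learn, which rejects a number of clusters larger than the number of samples.

```smalltalk
"Hypothetical validation sketch, not part of AIKMeans."
| data validate |
data := #( #( 0 0 ) #( 0.5 0 ) #( 0.5 1 ) #( 1 1 ) ).

"Signal an Error unless k is an integer between 1 and the number of rows."
validate := [ :k :rows |
	(k isInteger and: [ k between: 1 and: rows size ]) ifFalse: [
		Error signal: 'numberOfClusters must be an integer between 1 and ',
			rows size printString, ', got ', k printString ] ].

"Both cases from the report are rejected with a readable message."
[ validate value: 100 value: data ] on: Error do: [ :e | e messageText ].
[ validate value: -100 value: data ] on: Error do: [ :e | e messageText ]
```

A dedicated exception class would probably be nicer than a plain Error, but that is a design decision for the library.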
When initializing kmeans, you should be able to choose whether the initial centroids are random, selected as in #6, or set by you. Keep in mind that if an array is passed, it should be of shape (n_clusters, n_features) and give the initial centers; otherwise an exception should be raised.
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
Adding this feature would help in testing that empty clusters are relocated as expected: if you give one initial centroid that is far from the data, that cluster will be empty on the first iteration.
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
n_init : int, default=10
Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
The method transform will return the distance from each sample to every centroid. It is useful for dimensionality reduction.
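A transform along these lines could be sketched as follows. The block names and the sample centroids are assumptions for the example; the real method would use the centroids learned by fit:.

```smalltalk
"Hypothetical sketch: map each row to its distances from all centroids."
| centroids distance transform |
centroids := #( #( 0 0 ) #( 1 1 ) ).

"Euclidean distance between two points of equal dimension."
distance := [ :a :b |
	((a with: b collect: [ :x :y | (x - y) squared ])
		inject: 0
		into: [ :sum :each | sum + each ]) sqrt ].

"One output row per input row, one column per centroid."
transform := [ :rows |
	rows collect: [ :row |
		centroids collect: [ :c | distance value: row value: c ] ] ].

transform value: #( #( 0 0 ) #( 3 4 ) )
```

The result has as many columns as there are centroids, which is what makes it usable as a lower-dimensional representation when the data has more features than clusters.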
We should have a method, common to all algos, that returns a boolean saying whether the algo has reached convergence or not. We need to separate that from max_iterations.
If the algo has reached the max iterations and has not converged it should return false.
This should be for all algos
Currently, we are only checking if the algo is converging. We should add a max iterations parameter that is initialized with a default value.
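The two stopping conditions can be kept separate with a loop like the one below. It is a generic sketch under stated assumptions: the names, the default of 300 iterations, the tolerance, and the stand-in update step are all hypothetical, and a real algorithm would recompute its centroids in place of `step`.

```smalltalk
"Hypothetical sketch: stop on convergence OR on the iteration limit,
but report convergence separately."
| maxIterations tolerance iteration hasConverged current step |
maxIterations := 300.
tolerance := 0.0001.
iteration := 0.
hasConverged := false.
current := 1.0.

"Stand-in for one iteration of the algorithm."
step := [ :x | x / 2 ].

[ hasConverged not and: [ iteration < maxIterations ] ] whileTrue: [
	| next |
	next := step value: current.
	"Converged when the update changes the state by less than the tolerance."
	hasConverged := (next - current) abs < tolerance.
	current := next.
	iteration := iteration + 1 ].

hasConverged "true here; with maxIterations := 5 the loop stops early and this is false"
```

Keeping `hasConverged` as its own flag means a caller can tell a genuine convergence apart from a run that simply exhausted its iteration budget.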
Do we want to have AI for non ML one?
But in any case we should have AI or ML.