

K-Means Clustering

#3. Run KMeansNormTest.R with TestObservations and TestCenters.

#a) What is the single most obvious difference between the distributions of the first and second dimensions?

The two dimensions are on different scales, i.e. the data are not on equal terms.

#b) Does clustering in Test 1 occur along one or two dimensions? Which dimensions? Why?

In Test 1, clustering occurs along one dimension, the second, because it has a larger scale (and therefore a larger impact on the distance) than the first dimension.
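The effect described above can be illustrated with a small sketch. The points below are hypothetical (they are not taken from TestObservations), and the helper names are made up for illustration. The sketch shows how a dimension with a larger scale dominates the Euclidean distance that K-means relies on.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def dimension_shares(p, q):
    """Fraction of the squared distance contributed by each dimension."""
    squared = [(a - b) ** 2 for a, b in zip(p, q)]
    total = sum(squared)
    return [s / total for s in squared]

# Hypothetical points: dimension 1 is in single units, dimension 2 in hundreds.
a = (1.0, 100.0)
b = (5.0, 110.0)  # far from a in dimension 1, close in dimension 2
c = (1.2, 400.0)  # close to a in dimension 1, far in dimension 2

d_ab = euclidean(a, b)
d_ac = euclidean(a, c)
shares_ac = dimension_shares(a, c)

print(d_ab, d_ac)
print(shares_ac)
```

Here almost all of the distance between `a` and `c` comes from the second dimension, so differences in the first dimension barely influence which center a point is assigned to.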

#c) Does clustering in Test 2 occur along one or two dimensions? Which dimensions? Why?

In Test 2, clustering occurs along one dimension, the second. Normalizing only the first dimension makes its scale even smaller compared to the second dimension.

#d) Does clustering in Test 3 occur along one or two dimensions? Which dimensions? Why?

In Test 3, clustering occurs along two dimensions. Even though the second dimension has a larger scale, de-normalizing reduced its impact, so the clusters form along both dimensions.

#e) Does clustering in Test 4 occur along one or two dimensions? Which dimensions? Why?

In Test 4, clustering occurs along two dimensions. Normalizing both dimensions balanced the impact of each dimension, i.e. it put the data on equal terms.

#4. Why is normalization important in K-means clustering?

It brings all dimensions to an equivalent scale so that the impact of each dimension on the distance calculation is comparable in K-means clustering.
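As a minimal sketch of this idea, the snippet below applies z-score normalization to each column. This is an assumption: KMeansNormTest.R may use a different normalization scheme. The observations are hypothetical and `zscore_columns` is an illustrative helper name.

```python
import statistics

def zscore_columns(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1.

    Assumes no column is constant, otherwise the division by zero fails.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical observations: dimension 2 has a much larger scale than dimension 1.
observations = [(1.0, 100.0), (2.0, 400.0), (3.0, 250.0), (4.0, 150.0)]
scaled = zscore_columns(observations)

for row in scaled:
    print(row)
```

After rescaling, both columns have the same spread, so neither one dominates the distance between points simply because of its units.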

#5. How do you encode categorical data in K-means clustering?

We need to convert categorical values into numerical values, i.e. apply binarization, to be able to perform K-means clustering.
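A minimal sketch of binarization, assuming it means one binary indicator column per category (often called one-hot encoding). The `one_hot` helper and the color values are hypothetical.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

# Hypothetical categorical column.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)

print(categories)
for row in encoded:
    print(row)
```

Each row now contains only numbers, so it can be used in the distance calculations that K-means performs.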

One way to do that would be to define an impact value for each category (for instance, the number of times each category occurs).

#6. Why is clustering unsupervised learning as opposed to supervised learning?

Unlike supervised learning, which requires expert-provided labels (or a specific function to predict the dependent variable y from the independent variable x), in unsupervised learning we do not know what the outcome is.

In clustering, we do not tell the algorithm what outcome was observed or what outcome is desired.
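To make this concrete, here is a minimal sketch of the standard K-means iteration (Lloyd's algorithm). It is not the code in KMeansNormTest.R; the points, starting centers, and function names are hypothetical. Note that the input contains only observations and initial centers, never an outcome label.

```python
import math

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]

def update(points, labels, centers):
    """Move each center to the mean of its assigned points."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centers.append(old)  # keep an empty cluster's center in place
            continue
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers

def k_means(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical observations and starting centers; no labels are supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
start_centers = [(0.0, 0.0), (6.0, 6.0)]
labels, final_centers = k_means(points, start_centers)

print(labels)
print(final_centers)
```

The cluster indices returned are discovered from the geometry of the data alone; they are groupings, not predictions of a known target.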

Contributor: ozemreozdemir
