This project has been accomplished as part of Udacity Machine Learning Nanodegree Projects
In this project, I have used unsupervised learning techniques to organize the general population into clusters, then use those clusters to see which of them comprise the main user base for the companies. Prior to applying the machine learning methods, I've assessed and cleaned the data in order to convert the data into a usable form. While unsupervised learning lies in contrast to supervised learning in the fact that unsupervised learning lacks objective output classes or values, it can still be important in converting the data into a form that can be used in a supervised learning task. Dimensionality reduction techniques has been used which help surface the main signals and associations in data, providing supervised learning techniques a more focused set of features upon which to apply their work.
Company X and company Y have provided two datasets one with demographic information about the people of Germany, and one with that same information for customers of a mail-order sales company. You’ll look at relationships between demographics features, organize the population into clusters, and see how prevalent customers are in each of the segments obtained. Their main question of interest is to identify facets of the population that are most likely to be purchasers of their products for a mailout campaign.