paulsudarshan / customer-segmentation-clustering

Customer-Segmentation-Clustering

Customer segmentation is the process of dividing a customer base into distinct segments so that a business can target each group of potential users more effectively. The segmentation depends on differentiating factors such as demographics, economic status, and geography, all of which play a vital role in the process. Companies that deploy customer segmentation work on the premise that customers have different requirements and that each group needs a specific marketing effort to address it appropriately. To achieve this, the project uses clustering, an unsupervised machine learning technique, and specifically K-Means clustering.

Dataset Details:

The dataset contains transactions on an e-commerce website from Feb 2018 to Feb 2019, made by customers across different countries.

File: transaction_data.csv

Columns:

UserId - Unique identifier of a user.

TransactionId - Unique identifier of a transaction. If the same TransactionId is present in multiple rows, then all those products are bought together in the same transaction.

TransactionTime - Time at which the transaction was performed.

ItemCode - Unique identifier of the product purchased.

ItemDescription - Simple description of the product purchased.

NumberOfItemsPurchased - Quantity of the product purchased in the transaction.

CostPerItem - Price per unit of the product.

Country - Country from which the purchase is made.
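
As a rough illustration of how these columns fit together, the sketch below loads a few rows in the described layout with pandas and groups them by TransactionId, since rows sharing a TransactionId belong to the same purchase. The sample rows, the timestamp format, and the names `SAMPLE_CSV`, `basket`, `DistinctItems` and `TotalUnits` are assumptions made for this example; they are not taken from transaction_data.csv.

```python
import io

import pandas as pd

# Hypothetical sample rows in the documented column layout.
# All values (and the timestamp format) are invented for illustration.
SAMPLE_CSV = """UserId,TransactionId,TransactionTime,ItemCode,ItemDescription,NumberOfItemsPurchased,CostPerItem,Country
101,5001,2018-03-01 10:15:00,A1,Mug,2,3.50,United Kingdom
101,5001,2018-03-01 10:15:00,B2,Plate,1,5.00,United Kingdom
202,5002,2018-04-12 18:40:00,A1,Mug,4,3.50,France
"""

# For the real file, pass "transaction_data.csv" instead of the StringIO buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["TransactionTime"])

# Rows that share a TransactionId were bought together, so group on it
# to get one row per transaction.
basket = df.groupby("TransactionId").agg(
    UserId=("UserId", "first"),
    DistinctItems=("ItemCode", "nunique"),
    TotalUnits=("NumberOfItemsPurchased", "sum"),
)
print(basket)
```

Here transaction 5001 spans two rows, so it collapses into a single basket with two distinct items.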

Tools and Libraries Used:

  1. Jupyter Notebook
  2. pandas (for data manipulation)
  3. scikit-learn (sklearn)
  4. Seaborn
  5. Matplotlib

Methodology

  1. Clean and pre-process the dataset by imputing missing values with suitable techniques, removing irrelevant characters, and so on.
  2. Perform descriptive analysis to detect any anomalies in the feature values.
  3. Perform feature engineering to derive useful features from the existing ones. For example, the dataset gives CostPerItem and NumberOfItemsPurchased for each row, so multiplying the two yields a new feature: the total amount a given UserId spent on a specific ItemCode in that transaction.
  4. Encode the categorical variables, using LabelEncoder for ordinal variables and one-hot encoding for nominal variables. One-hot encoding can produce sparse classes; these are dealt with in the project.
  5. Carry out exploratory data analysis to gain insight into the data, using libraries such as Seaborn or Matplotlib to create scatter plots, bar plots, histograms, etc. for univariate and bivariate analysis.
  6. Transform some variables (log, root, etc.) to reduce the skewness of their distributions and bring them closer to a normal distribution.
  7. Determine an appropriate value of K with the elbow method, then perform the final K-Means clustering with that value, which sets the number of clusters in the data.
  8. Visualise the clusters with a scatter plot and analyse the results for business purposes.
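
Steps 3, 6 and 7 above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the notebook's actual code: the data is synthetic, the per-customer features (`TotalSpend`, `Transactions`, `Units`) are hypothetical, the `StandardScaler` step is an addition not mentioned in the methodology, and `best_k = 3` is a placeholder for whatever value the elbow plot suggests on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned transactions (values invented for illustration).
rng = np.random.default_rng(0)
n_rows = 600
df = pd.DataFrame({
    "UserId": rng.integers(1, 61, size=n_rows),
    "TransactionId": rng.integers(1, 301, size=n_rows),
    "NumberOfItemsPurchased": rng.integers(1, 20, size=n_rows),
    "CostPerItem": rng.gamma(shape=2.0, scale=5.0, size=n_rows).round(2),
})

# Step 3: total amount spent per row = quantity * unit price.
df["TotalAmount"] = df["NumberOfItemsPurchased"] * df["CostPerItem"]

# Aggregate to one row per customer (an assumed granularity for segmentation).
customers = df.groupby("UserId").agg(
    TotalSpend=("TotalAmount", "sum"),
    Transactions=("TransactionId", "nunique"),
    Units=("NumberOfItemsPurchased", "sum"),
)

# Step 6: log1p reduces right skew; standardising afterwards (an assumption)
# keeps any single feature from dominating the Euclidean distances in K-Means.
features = StandardScaler().fit_transform(np.log1p(customers))

# Step 7: elbow method. Fit K-Means for a range of K and record the inertia
# (within-cluster sum of squares); the "elbow" is where the curve flattens.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_

# Placeholder choice; on real data, read K off the plotted inertia curve.
best_k = 3
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=0)
customers["Cluster"] = final_model.fit_predict(features)

print(pd.Series(inertias, name="inertia"))
print(customers["Cluster"].value_counts())
```

Plotting `inertias` against K with Matplotlib gives the elbow curve, and a scatter plot of any two features coloured by `Cluster` gives the kind of visualisation described in step 8.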

Results

Conclusion

Hence, we developed a Customer Segmentation Model using a class of machine learning known as Unsupervised Learning. Specifically, we made use of a clustering algorithm called K-means clustering. We analyzed and visualized the data and then proceeded to implement our algorithm.
