In this project, I performed exploratory data analysis (EDA) on the H&M Fashion Dataset to gain insight into the data, and then implemented three recommender systems with Turi Create.
The code for this project can be found in ./code/Recommender_Systems_H&M.ipynb
The complete dataset is available HERE.
The dataset contains three .csv files and 105K .jpg files. The three .csv files are as follows:
- articles.csv - detailed metadata for each article_id available for purchase
- customers.csv - metadata for each customer_id in the dataset
- transactions_train.csv - the training data, consisting of the purchases of each customer on each date, as well as additional information. Duplicate rows correspond to multiple purchases of the same item.
./images is a folder of images corresponding to each article_id; images are placed in subfolders starting with the first three digits of the article_id. Important note: not all article_id values have a corresponding image!
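
The folder layout described above can be turned into a small lookup helper. This is only a sketch: the helper names `image_path` and `find_image` are hypothetical, and it assumes that each image file is named after its article_id written as a zero-padded 10-digit string, which the description above does not state explicitly.

```python
from pathlib import Path
from typing import Optional


def image_path(article_id, root: str = "images") -> Path:
    """Build the expected image location for an article.

    Assumption: article_id is written as a zero-padded 10-digit string,
    so the subfolder is its first three characters.
    """
    aid = str(article_id).zfill(10)
    return Path(root) / aid[:3] / f"{aid}.jpg"


def find_image(article_id, root: str = "images") -> Optional[Path]:
    """Return the image path if the file exists, otherwise None."""
    path = image_path(article_id, root)
    return path if path.is_file() else None
```

Because not every article has an image, callers should be prepared for `find_image` to return `None`.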
In this part, I applied data preprocessing methods to articles.csv, customers.csv, and transactions_train.csv to impute missing values, and made visualizations to get a more direct look at the data.
In this part, I used Turi Create to implement three recommender systems that recommend the top-12 items for each user. Turi Create simplifies the development of custom machine learning models and is open source on GitHub. Since Turi Create is built on the SFrame data structure, the SFrame is used here to read in and prepare the datasets.
To prepare the dataset, we first create a normalized matrix with customers as rows and articles as columns, and then split the dataset into training and testing sets.
- Training Dataset: 70%
- Testing Dataset: 30%
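
The preparation steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's code: the function names are hypothetical, the customer-by-article matrix is stored sparsely as a dictionary keyed by (customer_id, article_id), and min-max scaling per article is assumed as the normalization because the text does not say which one was used.

```python
import random
from collections import Counter


def purchase_counts(transactions):
    """Count how often each (customer_id, article_id) pair occurs."""
    return Counter((t["customer_id"], t["article_id"]) for t in transactions)


def min_max_normalize(counts):
    """Scale each article's purchase counts to [0, 1] across customers.

    Assumption: min-max scaling per article; an article whose counts are
    all equal is mapped to 0.0 to avoid dividing by zero.
    """
    by_article = {}
    for (_, article), n in counts.items():
        lo, hi = by_article.get(article, (n, n))
        by_article[article] = (min(lo, n), max(hi, n))
    scaled = {}
    for (customer, article), n in counts.items():
        lo, hi = by_article[article]
        scaled[(customer, article)] = (n - lo) / (hi - lo) if hi > lo else 0.0
    return scaled


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and cut them into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

A fixed seed keeps the 70/30 split reproducible between runs, which makes the evaluation numbers of different models comparable.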
The three recommender systems are the Popularity Recommender System, the Cosine Recommender System, and the Pearson Recommender System. The Popularity Recommender System recommends the 12 most popular items overall, while the Cosine and Pearson Recommender Systems use collaborative filtering to recommend the 12 items most correlated with the user's previous purchases.
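
To make the difference between the three approaches concrete, here is a minimal plain-Python sketch of the underlying ideas. It uses textbook definitions and hypothetical function names; it is not Turi Create's internal implementation, which may differ in details such as how missing ratings are handled.

```python
import math
from collections import Counter


def top_k_popular(interactions, k=12):
    """Rank items by how many (user, item) interactions they appear in."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(k)]


def cosine(u, v):
    """Cosine similarity between two equal-length item vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(u, v):
    """Pearson correlation: cosine similarity of mean-centered vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])
```

Here each item vector holds one value per customer. The popularity ranking is the same for every user, whereas the two similarity measures score candidate items against the items a particular user has already bought.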
After each model is trained, its RMSE, mean precision, and mean recall are calculated to evaluate its performance. The complete evaluation output can be found in the .txt
and '.txt'.
Here is the result of the RMSE for each model:
Models | RMSE |
---|---|
Popularity Recommender System | 1.699 |
Cosine Recommender System | 1.017 |
Pearson Recommender System | 1.675 |
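
For reference, the evaluation metrics named above can be computed as in the following sketch. The function names are hypothetical and the definitions are the standard ones; the exact conventions of the evaluation used in the notebook (for example, how users with fewer than k recommendations are treated) are an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired target values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def precision_recall_at_k(recommended, relevant, k=12):
    """Precision and recall of one user's top-k list against held-out items."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Mean precision and mean recall are then the averages of these per-user values over all users in the testing set. A lower RMSE means the predicted scores are closer to the held-out values.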
├─code
│ ├─Recommender_System_H&M.ipynb # Jupyter Notebook for this project
├─output
│ ├─eval_counts.txt # evaluation results of three models
├─public # Some of the example images
│ ├─example1.png
│ ├─example2.png
│ ├─H&M.jpg