Giter Club home page Giter Club logo

recommender_systems's Introduction

Exploration of Recommender Systems

Recommender Systems

Recommender systems are useful for recommending users items based on their past preferences. Broadly, recommender systems can be split into content-based and collaborative-filtering types. Such recommender systems are predictive in nature. For example, we predict what the user might want to buy next. Here, we use the Surprise python package. http://surpriselib.com/

Example: Amazon - "You May Also Like"

Association Mining

Also, we will delve a bit into association mining, which is a form of descriptive analytics. This can be used to develop marketing strategies like cross-selling or up-selling. Here, we use the mlxtend python package. https://github.com/rasbt/mlxtend

Example: Amazon - "Frequently Bought Together"

Content-based : Nearest Neighbours - Top Recommendations

Nearest Neighbours of Item - nearest_neighbours.py

This code demonstrates how to get the 10 most related items (nearest neighbours) to the movie item "Clockwork Orange".

User and Item Based Collaborative Filtering

User and Item based collaborative filtering - collaborative_filtering_exploration.py

Algorithms used: http://surprise.readthedocs.io/en/stable/knn_inspired.html

This code demonstrates how collaborative filtering for user-based and item-based methods can be done in Python.

Advantages of item-based filtering over user-based filtering :

  1. Scales Better : User-based filtering does not scale well as user likes/interests may change frequently. Hence, the recommendation needs to be re-trained frequently.

  2. Computationally Cheaper : In many cases, there are way more users than items. It makes sense to use item-based filtering in this case.

A famous example of item-based filtering is Amazon's recommendation engine.

https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf

Utilizing Grid Search

Grid Search on KNNMeans Algorithm - grid_search.py

An extensive grid search to find the best hyper-parameters for KNNMeans on this dataset. Warning: Takes a long time!

Results:

Grid Search Results: 

0.931737661568928
{'k': 40, 'bsl_options': {'n_epochs': 1, 'reg_i': 40, 'method': 'als', 'reg_u': 25}, 'sim_options': {'user_based': False, 'name': 'pearson_baseline'}}

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).
                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.7161  0.7176  0.7171  0.7178  0.7227  0.7182  0.0023  
RMSE (testset)    0.9131  0.9185  0.9173  0.9171  0.9239  0.9180  0.0035  
Fit time          1.11    1.12    1.10    1.05    0.96    1.07    0.06    
Test time         2.06    2.06    2.05    2.05    2.05    2.05    0.00    
{'fit_time': (1.1072561740875244, 1.1171300411224365, 1.1014506816864014, 1.052947998046875, 0.9627974033355713), 'test_rmse': array([0.91305966, 0.91854328, 0.91732058, 0.9170613 , 0.92394471]), 'test_mae': array([0.71609261, 0.71756385, 0.7170833 , 0.71775924, 0.72269525]), 'test_time': (2.0553717613220215, 2.055539846420288, 2.047955274581909, 2.052337884902954, 2.045203685760498)}

Matrix Factorization Algorithms

Singular Vector Decomposition (SVD, SVD++) - SVD.py

These algorithms are famous for being the baseline algos for winning the Netlix Prize.

http://buzzard.ups.edu/courses/2014spring/420projects/math420-UPS-spring-2014-gower-netflix-SVD.pdf

For SVD

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.7191  0.7158  0.7138  0.7166  0.7254  0.7181  0.0040  
RMSE (testset)    0.9173  0.9162  0.9125  0.9179  0.9271  0.9182  0.0048  
Fit time          1.04    1.11    1.03    1.01    0.90    1.02    0.07    
Test time         2.08    2.06    2.06    2.05    2.06    2.06    0.01    
{'test_mae': array([0.71909501, 0.71579784, 0.71384567, 0.71656901, 0.72541331]), 'fit_time': (1.0419056415557861, 1.1055827140808105, 1.0349535942077637, 1.01346755027771, 0.9016950130462646), 'test_rmse': array([0.91726677, 0.91622589, 0.91245293, 0.91793969, 0.92711428]), 'test_time': (2.0783369541168213, 2.0616250038146973, 2.0627968311309814, 2.049354314804077, 2.058506727218628)}

SVD : Test Set
RMSE: 0.9000

For SVD++ (Before running : Be warned it is much slower than SVD)

Grid Search Results:

0.9167768433405562
{'n_factors': 40, 'reg_all': 0.1, 'n_epochs': 40, 'lr_all': 0.008}

Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9104  0.9053  0.9111  0.9133  0.9124  0.9105  0.0028  
MAE (testset)     0.7188  0.7145  0.7204  0.7224  0.7203  0.7193  0.0027  
Fit time          336.05  334.75  335.30  334.70  339.24  336.01  1.69    
Test time         1.79    1.84    1.86    1.84    1.74    1.81    0.04    

SVD++ : Test Set
RMSE: 0.9141


A little slow? Can we use GPU to accelerate this? Yes! (for future explorations)

https://devblogs.nvidia.com/numba-python-cuda-acceleration/

http://www.nvidia.com/content/GTC/documents/1034_GTC09.pdf

https://github.com/NVIDIA/DeepRecommender

C-Based Implementations

libFM - Recommended by many https://github.com/srendle/libfm

xLearn - supports out-of-core training, fast, has Python and R APIs https://github.com/aksnzhy/xlearn

Association Mining

Association Mining : association_mining.py

Demonstrates how to get frequent items and association rules using the mlxtend package.

alt text

One finding is this:

Jumbo Storage Bag Suki is usually bought together with Jumbo Bag Red Retrospot.

Association Mining from another repo : https://github.com/lppier/League_Of_Legends_Data_Analytics/blob/master/AssociationAnalysis.Rmd

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.