Team members:
- Diyue Gu (github: Ivygdy)
- Jingyan Xu (github: jx2424)
- Yifei Zhang (github: jimmy-zyf)
- Chelsea Cui (github: acui34)
- Yishi Wang (github: wangyis)
The overall goal of this project is to build an improved version of the recommendation system from HW2, which recommends movies that users are likely to enjoy, by using a hybrid architecture. In this version, the recommendation system switches the prediction model based on the user's rating history. By drawing on the strengths of each model, the system can generate the best customized results for each user. In this way, we can maintain a relatively active user population and keep promoting newly released movies to the right target audience.
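The switching idea described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the thresholds `COLD_START_MAX` and `MODERATE_MAX`, the mapping from history size to model, and the functions `pick_model` and `hybrid_recommend` are hypothetical and do not come from the project code. The actual switching rule is documented in Hybrid Model.ipynb.

```python
from typing import Callable, Dict, List

# Hypothetical thresholds; the project's real cut-offs are not stated in this README.
COLD_START_MAX = 5
MODERATE_MAX = 50

# A recommender takes (user_id, k) and returns up to k movie ids.
Recommender = Callable[[int, int], List[int]]


def pick_model(n_ratings: int) -> str:
    """Choose a model name from the size of the user's rating history.

    The assignment of models to history sizes is an assumption for this sketch.
    """
    if n_ratings <= COLD_START_MAX:
        return "content_based"
    if n_ratings <= MODERATE_MAX:
        return "matrix_factorization"
    return "deep_learning"


def hybrid_recommend(
    user_id: int,
    rating_counts: Dict[int, int],
    models: Dict[str, Recommender],
    k: int = 10,
) -> List[int]:
    """Dispatch the request to whichever model fits the user's history."""
    n_ratings = rating_counts.get(user_id, 0)  # unseen users count as zero ratings
    return models[pick_model(n_ratings)](user_id, k)
```

A switching design like this keeps each model independent, so any one of them can be retrained or replaced without touching the dispatch logic.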
The data used in this project are from the full version of the MovieLens Latest Dataset, which was last updated in 9/2018. While the dataset includes 27,000,000 ratings and 1,100,000 tag applications applied to 58,000 movies by 280,000 users, we used only the ratings table and kept about 20,000 users and 1,000 items because of limited computing power.
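As a rough illustration of this kind of downsampling, the sketch below reduces a ratings table to a fixed number of items and users. The selection rule is an assumption: it keeps the most-rated movies and then samples users uniformly at random, which may differ from what DataPreparation.ipynb actually does. The function name `subsample` and the tuple layout are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (userId, movieId, rating)


def subsample(
    ratings: List[Rating], n_users: int, n_items: int, seed: int = 0
) -> List[Rating]:
    """Keep at most n_items movies and n_users users from a ratings list."""
    # Assumed rule: keep the n_items movies with the most ratings.
    item_counts = Counter(movie for _, movie, _ in ratings)
    keep_items = {movie for movie, _ in item_counts.most_common(n_items)}
    filtered = [r for r in ratings if r[1] in keep_items]

    # Assumed rule: sample users uniformly at random among those remaining.
    users = sorted({user for user, _, _ in filtered})
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    keep_users = set(rng.sample(users, min(n_users, len(users))))
    return [r for r in filtered if r[0] in keep_users]
```

With the sizes mentioned above, a call would look like `subsample(ratings, n_users=20_000, n_items=1_000)`.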
Our submission includes exported Google Colab notebook files (split because of long running times) and intermediate files, as specified below:
Project Folder:
│ README.md
│ requirements.txt
# Notebook Files
│ Hybrid Model.ipynb
│ DataPreparation.ipynb
│ ContentBased.ipynb
│ DL.ipynb
│ MF.ipynb
# Python Model Modules (models saved as Python files for easy import)
│ dlModel.py
│ contentBased.py
│ modelBased.py
# Preprocessed Data (created by DataPreparation.ipynb)
│ train.csv
│ test_for_dp.csv
│ test_for_content.csv
│ test_for_mf.csv
# Cache Data (test results pre-saved for faster evaluation)
│ DL_recommend_df.csv
│ MFRecommendation.csv
│ content_based_recommend_df.csv
│ cb_evaluation_df.csv
│ mb_evaluation_df.csv
│ hybrid_evaluation_df.csv
│ dl_evaluation_df.csv
The main report, including business objectives, model description, model comparison, and final conclusion, can be found in the Hybrid Model.ipynb file. Data preparation and sampling can be found in DataPreparation.ipynb. Single-model explanation and tuning can be found in the notebooks named after the models (ContentBased.ipynb, DL.ipynb, MF.ipynb).
To make it easier to rebuild the environment we used, we also include notebook links below:
- Main Report and Hybrid Model
- Matrix Factorization Model and Baseline Model
- Content-based Model
- Deep Learning Model
- Data Preparation