ml_movei_preference

This project predicts movie preferences using the BTV dataset provided by the "DATA Analysis Contest".

How to use it?

Follow the examples below. There are 3 steps: Scrap, Train, Test.

1 Scrap additional movie information

To scrape additional movie information from the Naver Movie website, set the following options:

src_path : Path to the source csv file
tgt_path : Path where the output file is saved

EXAMPLE

python scrap.py --src_path data/SKB_DLP_MOVIES.csv --tgt_path data/NEW_MOVIES.csv

Running this command produces a '.csv' file at tgt_path that contains the scraped movie information.
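For readers who want to see how the two options could be wired up, here is a minimal sketch using Python's standard argparse module. It is not the actual contents of scrap.py: the function name build_parser and the choice to make both options required are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser: the real scrap.py may declare its options differently.
    parser = argparse.ArgumentParser(
        description="Scrape additional movie information"
    )
    # Assumption: both options are required and have no default value.
    parser.add_argument("--src_path", required=True,
                        help="Path to the source csv file")
    parser.add_argument("--tgt_path", required=True,
                        help="Path where the output file is saved")
    return parser


# Parse the same arguments as the EXAMPLE command above.
args = build_parser().parse_args([
    "--src_path", "data/SKB_DLP_MOVIES.csv",
    "--tgt_path", "data/NEW_MOVIES.csv",
])
print(args.src_path, args.tgt_path)
```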

2 Train Model

To train the model, set the following options:

movie_path : Path to the movie information csv file generated by scrap.py
view_path : Path to the 'SKB_DLP_VIEWS.csv' file
question_path : Path to the 'SKB_DLP_QUESTION.csv' file
batch_size : Batch size for training
window_size : Length of the sequence loaded for training
test_portion : Portion of the dataset held out when splitting it into train and validation sets
hidden_size : Hidden size of the RNN model
word_vec_dim : Embedding size of movieID
n_epochs : Maximum number of epochs to train
early_stop : Early-stop condition: training stops if there is no progress after this many epochs
target : Folder where the trained model is saved
model : There are 3 models that can be trained (seqModel, seqModel2, seqModel3), but only 'seqModel3' is validated
device : Device used when running train.py (cpu, gpu)
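To illustrate what window_size and test_portion control, here is a rough sketch that cuts one user's viewing history into fixed-length windows and then holds out a portion of them for validation. The function names (make_windows, split_train_valid), the seed argument, the sample movie IDs, and the idea that each window is paired with the next movie as its target are all assumptions for illustration; train.py may build its samples differently.

```python
import random


def make_windows(movie_ids, window_size):
    """Pair each window of `window_size` movie IDs with the movie that follows it.

    Assumption: the model is trained to predict the next movie in a sequence.
    """
    pairs = []
    for start in range(len(movie_ids) - window_size):
        window = movie_ids[start:start + window_size]
        target = movie_ids[start + window_size]
        pairs.append((window, target))
    return pairs


def split_train_valid(samples, test_portion, seed=0):
    """Shuffle the samples and hold out `test_portion` of them for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * test_portion)
    return shuffled[n_valid:], shuffled[:n_valid]


# Made-up viewing history of one user (movie IDs in viewing order).
history = [101, 205, 330, 412, 518, 623, 777]
pairs = make_windows(history, window_size=3)
train, valid = split_train_valid(pairs, test_portion=0.25)
print(len(pairs), len(train), len(valid))
```

A larger window_size gives the model more context per sample but produces fewer samples from short viewing histories.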

EXAMPLE

python train.py --model seqModel3 --n_epochs 20 --batch_size 16

Running this command produces trained model files (e.g. model.pwf) in the target directory.
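The interaction between n_epochs and early_stop can be sketched as follows. This is a simplified simulation over made-up validation losses, not the loop in train.py: the function name run_with_early_stop is hypothetical, and treating "progress" as a lower validation loss is an assumption.

```python
def run_with_early_stop(valid_losses, n_epochs, early_stop):
    """Walk through per-epoch validation losses and stop early when stalled.

    Returns (best_epoch, best_loss, epochs_run). Epochs are counted from 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_progress = 0
    epochs_run = 0
    for epoch in range(min(n_epochs, len(valid_losses))):
        epochs_run = epoch + 1
        loss = valid_losses[epoch]
        if loss < best_loss:
            # Progress: remember this epoch and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_progress = 0
        else:
            epochs_without_progress += 1
            if epochs_without_progress >= early_stop:
                break  # no progress for `early_stop` epochs in a row
    return best_epoch, best_loss, epochs_run


# Made-up losses: the model stops improving after epoch 2.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
best_epoch, best_loss, epochs_run = run_with_early_stop(
    losses, n_epochs=20, early_stop=3
)
print(best_epoch, best_loss, epochs_run)
```

In this sketch training ends after 6 epochs, so the later improvement at the last loss value is never reached; a larger early_stop trades longer training for a lower chance of stopping too soon.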

3 Test with trained model

To test the model, set the following options:

model_path : Path to the trained model file
device : Device used when running test.py (cpu, gpu)
question_path : Path to the 'SKB_DLP_QUESTION.csv' file
movie_path : Path to the movie information csv file generated by scrap.py
batch_size : Batch size for testing
test_num : Number of movies recommended for each sequence (top-k)
model : There are 3 models that can be used (seqModel, seqModel2, seqModel3), but only 'seqModel3' is validated

EXAMPLE

python test.py --test_num 5

Running this command produces a '.csv' file that contains the top-k recommended movie list for each user_id.
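As a rough illustration of what test_num means, the sketch below picks the top-k movies per user from a table of scores. The function name top_k_movies, the user IDs, the movie IDs, and the scores are made up for this example; how test.py actually computes and ranks scores is not shown here.

```python
import heapq


def top_k_movies(scores, test_num):
    """Return the `test_num` movie IDs with the highest scores, best first.

    `scores` maps movie ID -> predicted score (hypothetical model output).
    """
    return heapq.nlargest(test_num, scores, key=scores.get)


# Made-up scores for two users.
scores_by_user = {
    "user_1": {11: 0.1, 22: 0.7, 33: 0.4, 44: 0.9},
    "user_2": {11: 0.8, 22: 0.2, 33: 0.6, 44: 0.3},
}
recommendations = {
    user_id: top_k_movies(scores, test_num=2)
    for user_id, scores in scores_by_user.items()
}
print(recommendations)
```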
