This project predicts users' movie preferences using the BTV dataset provided by the "DATA Analysis Contest".
Follow the steps below. There are 3 stages: Scrap, Train, and Test.
To scrape additional movie information from 'naver movie web', set the following options:
- src_path : Path to the source CSV file
- tgt_path : Path where the output CSV file is saved
python scrap.py --src_path data/SKB_DLP_MOVIES.csv --tgt_path data/NEW_MOVIES.csv
Running this command writes a '.csv' file containing the scraped movie information to tgt_path.
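A minimal sketch of the scrap.py flow, assuming it reads the source CSV row by row, merges in the scraped fields, and writes the result to tgt_path. The function names (build_parser, enrich_rows, run, main), the fetch_info callback, and the UTF-8 encoding are hypothetical; the actual Naver scraping is left as a placeholder because its details are not documented here.

```python
import argparse
import csv
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def build_parser() -> argparse.ArgumentParser:
    # Only the two options documented in this README are modelled here.
    parser = argparse.ArgumentParser(description="Scrape extra movie information")
    parser.add_argument("--src_path", required=True, help="source movie CSV")
    parser.add_argument("--tgt_path", required=True, help="output CSV path")
    return parser


def enrich_rows(rows: List[Row], fetch_info: Callable[[Row], Row]) -> List[Row]:
    # Merge the extra fields returned by fetch_info into each source row.
    return [{**row, **fetch_info(row)} for row in rows]


def run(src_path: str, tgt_path: str, fetch_info: Callable[[Row], Row]) -> int:
    # Encoding is an assumption; the real source file may use another one.
    with open(src_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    enriched = enrich_rows(rows, fetch_info)
    # Column order: source columns first, then any newly scraped ones.
    fieldnames: List[str] = []
    for row in enriched:
        fieldnames += [k for k in row if k not in fieldnames]
    with open(tgt_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(enriched)
    return len(enriched)


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    # Placeholder scraper that adds no fields; a real one would query Naver.
    count = run(args.src_path, args.tgt_path, lambda row: {})
    print(f"wrote {count} rows to {args.tgt_path}")
```

Calling main() from an `if __name__ == "__main__":` guard and running it with the command above would copy the source rows to tgt_path; a real fetch_info would add the scraped columns.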
To train a model, set the following options:
- movie_path : Path to the movie information CSV file generated by scrap.py
- view_path : Path to the 'SKB_DLP_VIEWS.csv' file
- question_path : Path to the 'SKB_DLP_QUESTION.csv' file
- batch_size : Batch size used for training
- window_size : Length of each sequence loaded for training
- test_portion : Fraction of the dataset held out when splitting it into train and valid sets
- hidden_size : Hidden size of the RNN model
- word_vec_dim : Embedding size of each movieID
- n_epochs : Maximum number of epochs to train
- early_stop : Early-stopping condition; training stops if there is no improvement for this many epochs
- target : Directory where the trained model is saved
- model : Model to train, one of seqModel, seqModel2, or seqModel3. Only 'seqModel3' has been validated
- device : Device to run train.py on (cpu or gpu)
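The training options map naturally onto an argparse parser. This sketch uses the documented option names, but the types and every default value are assumptions made for illustration (the README does not list them), and build_train_parser is a hypothetical name.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    # Option names come from this README; all defaults are placeholders,
    # not the project's real values.
    p = argparse.ArgumentParser(description="Train a sequence recommender")
    p.add_argument("--movie_path", default="data/NEW_MOVIES.csv")
    p.add_argument("--view_path", default="data/SKB_DLP_VIEWS.csv")
    p.add_argument("--question_path", default="data/SKB_DLP_QUESTION.csv")
    p.add_argument("--batch_size", type=int, default=32)
    p.add_argument("--window_size", type=int, default=10)
    p.add_argument("--test_portion", type=float, default=0.2)  # assumed fraction
    p.add_argument("--hidden_size", type=int, default=128)     # RNN hidden size
    p.add_argument("--word_vec_dim", type=int, default=64)     # movieID embedding size
    p.add_argument("--n_epochs", type=int, default=10)
    p.add_argument("--early_stop", type=int, default=3)        # patience in epochs
    p.add_argument("--target", default="models")
    p.add_argument("--model", default="seqModel3",
                   choices=["seqModel", "seqModel2", "seqModel3"])
    p.add_argument("--device", default="cpu", choices=["cpu", "gpu"])
    return p
```

Parsing the arguments of the example training command with this parser yields model='seqModel3', n_epochs=20 and batch_size=16, with the remaining options falling back to the placeholder defaults.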
python train.py --model seqModel3 --n_epochs 20 --batch_size 16
Running this command saves the trained model files (e.g. model.pwf) in the target directory.
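The README does not document how window_size, test_portion, and early_stop are applied inside train.py. A common approach, assumed here and not confirmed by the source, is to cut each user's ordered viewing history into fixed-length windows whose next movie is the prediction target, hold out test_portion of the windows for validation, and stop once validation loss has not improved for early_stop epochs. All names below are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Window = Tuple[List[str], str]  # (input movie IDs, next movie ID to predict)


def make_windows(history: Sequence[str], window_size: int) -> List[Window]:
    """Slide a window of `window_size` movies over one user's ordered
    viewing history; the movie right after each window is its target."""
    return [
        (list(history[i:i + window_size]), history[i + window_size])
        for i in range(len(history) - window_size)
    ]


def train_valid_split(samples: List[Window], test_portion: float,
                      seed: int = 0) -> Tuple[List[Window], List[Window]]:
    """Shuffle a copy, then hold out `test_portion` of it for validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * test_portion)
    return shuffled[n_valid:], shuffled[:n_valid]


class EarlyStopper:
    """Signal a stop once validation loss has not improved for
    `patience` consecutive epochs."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, valid_loss: float) -> bool:
        if valid_loss < self.best:
            self.best = valid_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Splitting at the window level is the simplest choice; splitting by user instead would keep overlapping windows from the same history from appearing in both sets.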
To test a model, set the following options:
- model_path : Path to the trained model file
- device : Device to run test.py on (cpu or gpu)
- question_path : Path to the 'SKB_DLP_QUESTION.csv' file
- movie_path : Path to the movie information CSV file generated by scrap.py
- batch_size : Batch size used for testing
- test_num : Number of movies to recommend for each sequence (top-k)
- model : Model to load, one of seqModel, seqModel2, or seqModel3. Only 'seqModel3' has been validated
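test_num plays the role of k in top-k recommendation. Assuming the model produces a score for every candidate movieID given an input sequence (the README does not describe the model output), a minimal sketch of the selection step follows; top_k_movies and its exclude parameter are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List


def top_k_movies(scores: Dict[str, float], k: int,
                 exclude: Iterable[str] = ()) -> List[str]:
    """Return the IDs of the k highest-scoring movies, best first.

    `scores` maps movieID -> model score for one input sequence;
    movies in `exclude` (e.g. ones already watched) are skipped.
    Ties keep the iteration order of `scores`.
    """
    skip = set(exclude)
    candidates = [m for m in scores if m not in skip]
    return heapq.nlargest(k, candidates, key=scores.__getitem__)
```

Excluding movies that already appear in the input sequence is an optional design choice. If the model is written in PyTorch (not stated here), torch.topk over the score tensor performs the same selection for a whole batch.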
python test.py --test_num 5
Running this command produces '.csv' files containing the top-k recommended movies for each user_id.
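The README does not specify the column layout of the output CSV. One possible long format, with one (user_id, rank, movie_id) row per recommendation, is sketched here; the column names and both helper functions are hypothetical.

```python
import csv
from typing import Dict, List


def recommendation_rows(recs: Dict[str, List[str]]) -> List[List[str]]:
    """Flatten {user_id: [movie IDs, best first]} into CSV rows of
    (user_id, rank, movie_id), with a header row and ranks starting at 1."""
    rows = [["user_id", "rank", "movie_id"]]
    for user_id, movies in recs.items():
        for rank, movie_id in enumerate(movies, start=1):
            rows.append([user_id, str(rank), movie_id])
    return rows


def write_recommendations(recs: Dict[str, List[str]], out_path: str) -> None:
    # Write the flattened rows to out_path as a standard CSV file.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(recommendation_rows(recs))
```

A long format keeps every row the same width regardless of test_num; a wide format with one column per rank is an equally valid alternative.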