Genre Classification of Million Song Dataset
Optionally, create and activate a local Python virtual environment:

python3 -m venv music
source music/bin/activate

Make sure Docker is set up on your system, then follow these steps to install the project:

- Clone the project:
  git clone [email protected]:meetnisha/genre.git
- Run this command in the project's folder:
  git submodule update --init --recursive
- Run ./run_local.sh in the root folder of the project.
Alternatively, clone the repo and run:

docker-compose up -d --build

or:

docker-compose -f docker-compose.yaml up -d --build

To follow the API container logs:

docker logs -f core-api-container
Please check Report.docx for details about this application.
Demo: /documents/demo.gif

Screenshots:
- Home: home.png
- Prediction output: prediction.png
- Search functionality: search.png
The test output file is saved at:
/data/test_prediction.csv
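The exact columns of test_prediction.csv are not documented here; as a minimal sketch, assuming it holds a track identifier and a predicted genre, the file can be inspected with the standard csv module (the sample rows below are made up for illustration):

```python
import csv
import io

# Hypothetical contents mirroring an assumed layout of
# /data/test_prediction.csv: a track id and a predicted genre.
sample = """track_id,predicted_genre
TR0001,rock
TR0002,jazz
TR0003,rock
"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

# Count how often each genre was predicted.
counts = {}
for row in rows:
    counts[row["predicted_genre"]] = counts.get(row["predicted_genre"], 0) + 1

print(counts)  # -> {'rock': 2, 'jazz': 1}
```

For the real file, replace the in-memory string with `open("/data/test_prediction.csv")` and adjust the column names to match.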
File: /app/analysis/EDA_ML.ipynb
One should not get 100% accuracy on the training dataset; reaching it means the model is overfitting.
XGBoost test accuracy: 65.71%.
- I tried to find better hyperparameters such as n_estimators and reg_lambda, but the search space was too large.
- I applied dimensionality-reduction techniques such as PCA, but accuracy got worse.
- This consumed a lot of time, so I decided to move on to deep learning.
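One standard way to tame a large hyperparameter space is to sample it randomly rather than search it exhaustively. The sketch below illustrates the idea with a made-up scoring function standing in for an actual cross-validated XGBoost run; the ranges and the score formula are illustrative assumptions, only the parameter names (n_estimators, reg_lambda) come from the text above:

```python
import random

random.seed(0)

# Candidate values for the two hyperparameters mentioned above.
space = {
    "n_estimators": [100, 200, 400, 800],
    "reg_lambda": [0.1, 1.0, 5.0, 10.0],
}

def score(params):
    # Stand-in for cross-validated accuracy of an XGBoost model;
    # a real run would train and evaluate the classifier here.
    return (0.65
            - 0.0001 * abs(params["n_estimators"] - 400)
            - 0.005 * abs(params["reg_lambda"] - 1.0))

best, best_score = None, float("-inf")
for _ in range(10):  # 10 random draws instead of all 16 combinations
    params = {k: random.choice(v) for k, v in space.items()}
    s = score(params)
    if s > best_score:
        best, best_score = params, s

print(best, round(best_score, 4))
```

With a real objective the loop body would fit the model; the sampling logic stays the same.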
Since I needed to reduce model complexity by removing features, I used recursive feature elimination, but it took almost 24 hours to run on my machine. The features selected in the ML analysis are therefore reused to build a deep learning model, where I also wanted to use the title and tags features.
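Recursive feature elimination simply drops the least useful feature, re-scores, and repeats. The framework-free sketch below uses absolute Pearson correlation with the target as a stand-in for model-based importance (scikit-learn's RFECV instead uses the fitted estimator's importances plus cross-validation, which is why it is so much more expensive); the tiny dataset is invented for illustration:

```python
import math

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rfe(features, target, keep):
    """Drop the weakest feature (lowest |corr| with target)
    one at a time until only `keep` features remain."""
    remaining = dict(features)
    while len(remaining) > keep:
        weakest = min(remaining,
                      key=lambda name: abs(pearson(remaining[name], target)))
        del remaining[weakest]
    return sorted(remaining)

# Tiny made-up dataset: f1 and f3 track the target, f2 is noise.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "f1": [1.1, 2.0, 2.9, 4.2, 5.1],   # strongly correlated
    "f2": [0.3, -0.1, 0.4, 0.0, 0.2],  # noise
    "f3": [5.0, 4.1, 3.0, 2.2, 0.9],   # strongly anti-correlated
}

print(rfe(features, target, keep=2))  # -> ['f1', 'f3']
```

A real implementation would refit the model after each elimination step, which is what makes the procedure slow on large feature sets.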
File: /app/analysis/EDA_DL.ipynb
- Cleaned the data and removed all null values.
- Dropped highly correlated features.
- Reused the feature selection from the ML analysis: features eliminated by recursive feature elimination (RFECV) with XGBClassifier were dropped for the DL model as well.
- Tried a few optimizers (SGD, RMSprop, Adam); the model performed best with Adam.
- The baseline model overfitted.
- Applied techniques such as regularization, dropout, and early stopping.
- Dropout performed best.
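Dropout, which gave the best results here, randomly zeroes a fraction of activations during training and rescales the survivors so their expected sum is unchanged; at inference time the layer is a no-op. A minimal framework-free sketch of inverted dropout (the numbers are illustrative, not from the actual model):

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: during training, zero each unit with
    probability `rate` and scale survivors by 1/(1-rate); at
    inference time, return the activations unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
acts = [0.5, 1.0, 1.5, 2.0]

train_out = dropout(acts, rate=0.5, training=True, rng=rng)
eval_out = dropout(acts, rate=0.5, training=False, rng=rng)

print(train_out)  # each unit either zeroed or doubled (1 / keep)
print(eval_out)   # identical to the input
```

Frameworks such as Keras implement exactly this behavior in their Dropout layer, toggled by the training flag.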
Results:
- XGBoost test accuracy: 65.71%
- Baseline DL model: 63.13%
- Best performing model (dropout): test accuracy 70.89%