genre's Introduction

Objective

Genre Classification of Million Song Dataset

Create a virtual environment

python3 -m venv music

source music/bin/activate

Steps to run the project locally

Make sure Docker is set up on your system. Then follow these steps to install the project:

  1. Clone the project: git clone [email protected]:meetnisha/genre.git

  2. In the project's folder, run: git submodule update --init --recursive

  3. In the root folder of the project, run: ./run_local.sh

Alternatively, clone the repo and run:

docker-compose up -d --build

Or

docker-compose -f docker-compose.yaml up -d --build

To check log files

docker logs -f core-api-container

Application

Report

Please check Report.docx for details about this application.

DEMO

Video:

/documents/demo.gif

Screenshots

  1. Home - home.png
  2. Prediction output - prediction.png
  3. Search functionality - search.png

Home Page

http://localhost:8000/

Prediction Output file

The test output file is saved in this folder:

/data/test_prediction.csv
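
A quick way to inspect the prediction file is to count how many rows fall into each predicted class. The sketch below uses only the standard library. The column name `genre` is an assumption, since this README does not describe the file's columns, so adjust it to match the actual header.

```python
import csv
from collections import Counter


def count_predictions(csv_path, column="genre"):
    """Count how often each value appears in one column of a CSV file.

    `column` is a hypothetical header name; change it to whatever the
    prediction file actually uses.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
        return Counter(row[column] for row in reader)
```

Once the application has written its output, `count_predictions("/data/test_prediction.csv").most_common()` returns the classes ordered by frequency.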

OpenAPI Docs

http://localhost:8000/docs
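
The `/docs` path suggests the service publishes an OpenAPI schema. As a sketch, assuming the schema is served at `/openapi.json` (the default in frameworks such as FastAPI, which this README does not confirm), the following lists the available endpoints:

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_spec(base_url="http://localhost:8000"):
    """Download the OpenAPI schema. The /openapi.json path is an assumption."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_endpoints(spec):
    """Return sorted (METHOD, path) pairs from a parsed OpenAPI schema."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)
```

With the containers running, `list_endpoints(fetch_spec())` returns the routes the service exposes.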

Conclusion of ML analysis

File: /app/analysis/EDA_ML.ipynb

A model should not reach 100% accuracy on its training dataset. When mine did, it meant the model was overfitting.
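
To illustrate why perfect training accuracy is a warning sign, here is a small sketch that is not taken from the notebook: a 1-nearest-neighbour classifier fitted to synthetic points whose labels are pure noise. It memorises the training set and scores 100% there, while accuracy on fresh data stays near chance.

```python
import random


def nearest_label(point, train_points, train_labels):
    """Predict the label of the closest training point (1-nearest neighbour)."""
    best = min(
        range(len(train_points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, train_points[i])),
    )
    return train_labels[best]


def accuracy(points, labels, train_points, train_labels):
    """Fraction of points whose predicted label matches the true label."""
    hits = sum(
        nearest_label(p, train_points, train_labels) == y
        for p, y in zip(points, labels)
    )
    return hits / len(points)


def make_data(n, rng):
    """Synthetic 2-D points with random binary labels (no real signal)."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [rng.randint(0, 1) for _ in range(n)]
    return points, labels


rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

# Each training point is its own nearest neighbour, so it is always "correct".
train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(test_x, test_y, train_x, train_y)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two numbers, not the training score itself, is what signals overfitting.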

XGBoost test accuracy: 65.71%.

  1. I tried to find better hyperparameters, such as n_estimators and reg_lambda, but the search space was too large.
  2. I applied a dimensionality reduction technique, PCA, but the accuracy got worse.
  3. This consumed a lot of time, so I decided to move to deep learning.
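
To show how quickly such a search space grows, the sketch below counts the combinations in a grid and draws a small random sample from it instead of trying every one. Only `n_estimators` and `reg_lambda` come from the analysis above. The other parameter names and all candidate values are illustrative assumptions, and random sampling is one common workaround rather than what the notebook does.

```python
import math
import random

# Candidate values are made up for illustration.
param_grid = {
    "n_estimators": [100, 200, 400, 800],
    "reg_lambda": [0.1, 1.0, 10.0],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}


def grid_size(grid):
    """Number of distinct parameter combinations in a full grid search."""
    return math.prod(len(values) for values in grid.values())


def sample_configs(grid, n_samples, seed=0):
    """Draw random configurations instead of enumerating the whole grid."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_samples)
    ]


total = grid_size(param_grid)
print(f"full grid: {total} combinations, {total * 5} fits with 5-fold CV")
for config in sample_configs(param_grid, n_samples=3):
    print(config)
```

Each sampled dictionary could be passed as keyword arguments to a classifier, which caps the number of fits at whatever budget is affordable.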

Check the EDA_DL file for further analysis.

To decrease model complexity by removing features, I used recursive feature elimination, but it took almost 24 hours to run on my machine. The features selected in the ML analysis are therefore reused to build the deep learning model, where I also wanted to use the title and tags features.

Conclusion of DL Analysis

File: /app/analysis/EDA_DL.ipynb

  1. Cleaned the data and removed all null values.
  2. Dropped highly correlated features.
  3. Used recursive feature elimination (RFECV) with XGBClassifier in the ML analysis, and eliminated the same features for the DL model.
  4. Tried a few optimizers: sgd, rmsprop, and adam. The model performed better with adam.
  5. The baseline model overfitted.
  6. Applied a few techniques to counter this: regularization, dropout, and early stopping.
  7. Dropout performed best.
  8. XGBoost test accuracy: 65.71%
     a. I tried to find better hyperparameters, such as n_estimators and reg_lambda, but the search space was too large.
     b. This consumed a lot of time, so I decided to move to deep learning.
  9. Baseline DL model: 63.13%
  10. Best-performing model (dropout model): test accuracy 70.89%
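
Since dropout gave the best result, here is a minimal, framework-free sketch of what a dropout layer does during training (the "inverted" variant that deep learning libraries commonly use). The function name and the rate are illustrative. The notebook's actual layers and rates are not shown in this README.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` and scale
    the survivors by 1 / (1 - rate) so each value's expectation is unchanged.
    At inference time (training=False) the input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [
        0.0 if rng.random() < rate else value * keep_scale
        for value in activations
    ]


layer_output = [0.5, 1.0, 1.5, 2.0]
print(dropout(layer_output, rate=0.5, rng=random.Random(0)))  # training mode
print(dropout(layer_output, rate=0.5, training=False))        # inference mode
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is why the technique tends to reduce overfitting.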

genre's People

Contributors

meetnisha
