WineSommelier

A blind taster algorithm that predicts the grape variety of a wine from a semi-professional description of that wine.

Organization of the repository

Blind taster predictor

The predictor lives in sommelier.py in the main folder. The code builds a pipeline that takes numerical and text-based data and a classifier, trains the classifier on a training set, and prints the accuracy, confusion matrix and classification report for both the train and test predictions. It also runs a hyperparameter optimization using grid search cross-validation and finishes with cross-validation test results for 3 folds.
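The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the contents of sommelier.py: it assumes scikit-learn and pandas, and the toy data, the column names (description, abv, grape), the choice of LogisticRegression and the parameter grid are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the real input is the Excel file in the DataBase folder.
red_notes = ["blackcurrant cedar firm tannin", "dark cherry plum spice",
             "blackberry oak tobacco", "cassis leather pepper"]
white_notes = ["lemon green apple crisp", "citrus peach mineral",
               "gooseberry grass zesty", "pear melon floral"]
data = pd.DataFrame({
    "description": red_notes * 3 + white_notes * 3,
    "abv": [13.5, 14.0, 14.5, 13.8] * 3 + [11.5, 12.0, 12.5, 11.8] * 3,
    "grape": ["cabernet sauvignon"] * 12 + ["sauvignon blanc"] * 12,
})
X, y = data[["description", "abv"]], data["grape"]

# Text and numerical columns are transformed separately, then joined.
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "description"),  # single column name: 1-D input
    ("num", StandardScaler(), ["abv"]),          # list of columns: 2-D input
])
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hyperparameter optimization with grid search cross-validation.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}  # hypothetical grid
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X_train, y_train)
best = search.best_estimator_

# Report the same metrics for the train and the test predictions.
for name, X_part, y_part in [("train", X_train, y_train),
                             ("test", X_test, y_test)]:
    pred = best.predict(X_part)
    print(name, "accuracy:", accuracy_score(y_part, pred))
    print(confusion_matrix(y_part, pred))
    print(classification_report(y_part, pred))

test_accuracy = accuracy_score(y_test, best.predict(X_test))

# Cross-validation results for 3 folds on the full data set.
cv_scores = cross_val_score(best, X, y, cv=3)
print("3-fold CV scores:", cv_scores)
```

Keeping the feature extraction inside the pipeline means that each cross-validation fold fits the vectorizer and scaler on its own training portion only.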

DataBase folder

The input data for sommelier.py is stored as an Excel file in the DataBase folder.

Scraping folder

The input database was obtained by web scraping Bibendum and Majestic Wine. The scraping scripts produce a much larger database than the one shown in the DataBase folder. To avoid exploiting the hard work those two companies put into building their databases, I only share a small portion of their data in the DataBase folder.

DataCleaning folder

The scripts in this folder take the raw data from the scraping scripts, then clean and filter it. In particular, they convert text to lower case, extract the grape variety from the wine names, remove grape names from the description column, and so on. These steps make sure that the input data for sommelier.py does not introduce bias into the machine learning algorithm. The cleaned and truncated data is then transferred to the DataBase folder.
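The cleaning steps named above could look something like the sketch below. It is an illustration only and uses just the standard library: the grape list and the function names extract_grape and strip_grapes are hypothetical, not taken from the scripts in this folder.

```python
import re

# Hypothetical grape list; the real scripts may use a different source.
GRAPES = ["cabernet sauvignon", "sauvignon blanc", "pinot noir", "chardonnay"]

# Try longer names first so multi-word grapes are matched as a whole.
_GRAPES_BY_LENGTH = sorted(GRAPES, key=len, reverse=True)


def extract_grape(wine_name):
    """Return the first known grape found in a wine name, or None."""
    lowered = wine_name.lower()
    for grape in _GRAPES_BY_LENGTH:
        if grape in lowered:
            return grape
    return None


def strip_grapes(description):
    """Lower-case a description and remove any known grape names from it."""
    cleaned = description.lower()
    for grape in _GRAPES_BY_LENGTH:
        cleaned = re.sub(r"\b" + re.escape(grape) + r"\b", " ", cleaned)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", cleaned).strip()


print(extract_grape("Chateau Example Cabernet Sauvignon 2015"))
print(strip_grapes("A classic Chardonnay with ripe peach notes"))
```

Removing grape names from the descriptions matters because a description that already states the variety would hand the classifier its own target label.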

DataAnalysis and Publish folders

These two folders contain largely the same material, but one was written in a text editor and the other in Jupyter notebooks. The files, especially those in the Publish folder, explain in detail how an algorithm can predict the grape variety from a description of a wine. They include visualizations of the data and dig into the characteristic features of the grapes in the database.

Zsolt Diveki 2018
