Giter Club home page Giter Club logo

wine_pred_cdsw's Introduction

Wine Quality Prediction

This notebook demonstrates how to create a Gradient-boosted tree classifier to predict wine quality based on characteristics.

Used: Python, Spark, seaborn

Steps:

0. Prepare Env

Set Env variables for projet (Settings -> Engine -> Environmental Variables )

  • make sure that the following pyspark env variables point to the same python interpreter (and that the interpreter exists)
  • PYSPARK3_PYTHON
  • PYSPARK_DRIVER_PYTHON

ex :

  • echo $PYSPARK3_PYTHON => /usr/local/bin/python3
  • ls /usr/local/bin/python3

1. Open a python3 session

Open Workbench => Editor : Workbench Engine Kernel : Python 3

2. Run setup

-installs python libraries ( SKLearn ) -copies data to HDFS (/tmp/wine_pred) -and create hive table ( wineDS_ext ) Note : Assumes that Hive is setup correctly and can talk to the cluster

Start a Python 3 session using Workbench editor use either the session input window or terminal session to run setup

  • in session : !./setup/setup.sh
  • in terminal : ./setup/setup.sh

3. Run descriptive anylisis of data

Start a python3 session using Workbensh editor run "analyse.py"

4. create a model with pyspark

Start a Python 3 session using Workbench editor Run "fit.py:

Note : using RandomForest from spark ML library params:

  • numTrees ( number of trees to spawn )
  • maxDepth (depth of leafs for trees )

=> Copies model to root of projet as "spark_rf.tar"

4Bis. create a model with SKLearn and Jupyter Notebook

Start a Python 3 session using Jupyter Notebook editor => user at least 2 CPU and 4GB of RAM => Wait 15-20 secs for notebook to show up

In the "Jupyter Notebook" folder, run "fit_SKLEARN.ipynb" Note : using RandomForest from Sklearn params:

  • estimator ( number of trees to spawn )
  • maxDepth (depth of leafs for trees )

5. Create a model in the Model API

  • Give the model a name and description
  • Confifure runtime params
    • Script: "model.py"
    • Function : "predict"
    • (optional) give example input :
      {"feature": "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4"} {"feature": "7.3;0.65;0.0;1.2;0.065;15.0;21.0;0.9946;3.39;0.47;10.0"}

wine_pred_cdsw's People

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.