snowflake_ml_intro's Introduction

Snowflake for Data Science

Getting Started

We recorded videos for this repo, but we are constantly upgrading and adding to it, so the videos may differ slightly from the current code. The overall flow is the same, and we will continue to upload videos covering new additions.

Configuration Setup

  1. Create a .env file and populate it with your account details:

    SNOWFLAKE_ACCOUNT = abc123.us-east-1
    SNOWFLAKE_USER = username
    SNOWFLAKE_PASSWORD = yourpassword
    SNOWFLAKE_ROLE = sysadmin
    SNOWFLAKE_WAREHOUSE = compute_wh
    SNOWFLAKE_DATABASE = snowpark
    SNOWFLAKE_SCHEMA = titanic
    
  2. Utilize the environment.yml file to set up your Python environment for the demo:

    • Examples in the terminal:
      • conda env create -f environment.yml
      • micromamba create -f environment.yml -y
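The .env file from step 1 can be read into connection parameters along these lines. This is a minimal sketch, not the repo's own loader: the helper names (`read_env_file`, `connection_parameters`, `create_session`) are hypothetical, and mapping each `SNOWFLAKE_*` key to a lowercase parameter name is an assumption. Only `create_session` needs the `snowflake-snowpark-python` package.

```python
from pathlib import Path

# Keys defined in the .env file from step 1. Mapping them to lowercase
# connection parameter names is an assumption of this sketch.
ENV_KEYS = ("ACCOUNT", "USER", "PASSWORD", "ROLE", "WAREHOUSE", "DATABASE", "SCHEMA")


def read_env_file(path):
    """Parse `KEY = value` lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def connection_parameters(env):
    """Turn SNOWFLAKE_* entries into a dict of lowercase parameter names."""
    missing = [k for k in ENV_KEYS if f"SNOWFLAKE_{k}" not in env]
    if missing:
        raise KeyError(f"missing .env entries: {missing}")
    return {k.lower(): env[f"SNOWFLAKE_{k}"] for k in ENV_KEYS}


def create_session(params):
    """Open a Snowpark session (imports Snowpark only when called)."""
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()
```

A notebook could then call `create_session(connection_parameters(read_env_file(".env")))`. Keeping the parsing separate from session creation makes it possible to check the file for missing entries before connecting.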

Why we partner with Anaconda

Review of distributed Hyperparameter tuning benefits

Local run time: 8 min 27 seconds

SnowflakeML run time: 1 min 17 seconds (about 6.6x faster, using a Large warehouse)
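The speedup figure follows from dividing the two run times reported above. A short sketch of that arithmetic (the function name is hypothetical):

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds run time to total seconds."""
    return minutes * 60 + seconds


local_seconds = to_seconds(8, 27)        # 507 seconds
snowflake_seconds = to_seconds(1, 17)    # 77 seconds

# Ratio of local run time to SnowflakeML run time.
speedup = local_seconds / snowflake_seconds
print(f"{speedup:.2f}x")  # about 6.58x
```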

Data Processing & ML Operations

Load & Transform Data

Execute the load_data notebook to accomplish the following:

  • Load the Titanic dataset from Seaborn, convert to uppercase, and save as CSV
  • Upload the CSV file to a Snowflake Internal Stage
  • Create a Snowpark DataFrame from the staged CSV
  • Write the Snowpark DataFrame to Snowflake as a table
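The steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: it assumes "convert to uppercase" refers to the column names, that the dataset has already been exported to a local CSV (for example from `seaborn.load_dataset("titanic")`), and that `session` is an existing Snowpark `Session`. The function names and the stage and table names are hypothetical.

```python
import csv


def uppercase_header(src_path, dst_path):
    """Copy a CSV, upper-casing only the column names in the first row."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow([name.upper() for name in header])
        writer.writerows(reader)


def stage_and_save(session, csv_path, stage="@TITANIC_STAGE", table="TITANIC"):
    """Upload the CSV to an internal stage and persist it as a table.

    `session` is an existing snowflake.snowpark.Session. The stage and
    table names are hypothetical defaults chosen for illustration.
    """
    session.sql(f"CREATE STAGE IF NOT EXISTS {stage.lstrip('@')}").collect()
    # Upload uncompressed so the staged file keeps its .csv name.
    session.file.put(csv_path, stage, auto_compress=False, overwrite=True)
    file_name = csv_path.replace("\\", "/").rsplit("/", 1)[-1]
    # Build a Snowpark DataFrame from the staged file, reading column
    # names from the header row and letting Snowflake infer the types.
    df = (
        session.read
        .option("PARSE_HEADER", True)
        .option("INFER_SCHEMA", True)
        .csv(f"{stage}/{file_name}")
    )
    df.write.mode("overwrite").save_as_table(table)
    return df
```

Upper-casing the column names before loading means the resulting table columns can be referenced without quoted identifiers, since Snowflake stores unquoted identifiers in uppercase.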

Machine Learning Operations (snowml)

In the snowml notebook:

  • Generate a Snowpark DataFrame from the Titanic table
  • Validate and handle null values
  • Remove columns with high null counts and columns that are highly correlated
  • Adjust Fare datatype and impute categorical nulls
  • One-hot encode categorical values
  • Split the data into train and test sets
  • Train an XGBoost classifier model with hyperparameter tuning
  • Conduct predictions on the test set
  • Display Accuracy, Precision, and Recall metrics
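The final step reports accuracy, precision, and recall. As a plain-Python sketch of what those three metrics measure for a binary label such as survival (the notebook itself presumably relies on library metric functions; `classification_metrics` is a hypothetical helper and the labels below are made up):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of the rows predicted positive, how many really were positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the rows that really were positive, how many were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here there are three true positives, one false positive, and one false negative, so all three metrics come out to 0.75.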

Advanced MLOps with Live/Batch Inference & Streamlit

Following the load_data steps, utilize the deployment notebook to:

  • Create a Snowpark DataFrame from the Titanic table
  • Assess and eliminate columns with high null counts and correlated columns
  • Adjust Fare datatype and handle categorical nulls
  • One-hot encode categorical values
  • Split the data into train and test sets
  • Train an XGBoost classifier model, optimizing with grid search
  • Display model accuracy and best parameters
  • Register the model in the model registry
  • Deploy the model as a vectorized UDF (User Defined Function)
  • Execute batch predictions on a table
  • Perform real-time predictions using Streamlit for interactive inference
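Grid search, as used in the training step above, evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. A minimal plain-Python sketch of that idea follows. The `grid_search` helper, the candidate values, and the toy scoring function are all hypothetical stand-ins: in the notebook each combination would presumably be scored by training and validating an XGBoost classifier.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for a cross-validated accuracy score."""
    return 1 - 0.05 * abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


# Hypothetical candidate values; 3 x 3 = 9 combinations are scored.
param_grid = {
    "max_depth": [2, 4, 6],
    "learning_rate": [0.01, 0.1, 0.3],
}

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

Because every combination is scored independently, the work parallelizes naturally, which is why distributing it across a larger warehouse can shorten the run time.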

snowflake_ml_intro's People

Contributors

cromano8, sfc-gh-twhite, huseyna28, indexseek
