
End to end Machine Learning with Scikit-Learn and Snowpark

Introduction

In this lab you will learn the following concepts:

1. How to ingest data in Snowflake

2. How to explore and understand data with Pandas and visualizations

3. How to encode the data for algorithms to use

4. How to normalize the data

5. How to train models with Scikit-Learn and Snowpark (including using a Snowpark-optimized warehouse)

6. How to evaluate models for accuracy

7. How to deploy models on Snowflake

In this lab we start by loading the data into Snowflake, perform some data transformations, and finally fit/train a Scikit-Learn ML pipeline that includes common feature engineering tasks such as imputation, scaling, and one-hot encoding. The pipeline also includes a RandomForestRegressor model that predicts median house values in California.
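A minimal sketch of the kind of pipeline described above (not the lab's exact code): the column names and toy data below are illustrative stand-ins for the California housing table, but the structure — median imputation, scaling, one-hot encoding, and a RandomForestRegressor — mirrors the steps listed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the California housing data (columns are illustrative)
df = pd.DataFrame({
    "MEDIAN_INCOME": [8.3, 7.2, None, 5.6, 3.8, 2.1],
    "HOUSING_MEDIAN_AGE": [41, 21, 52, 52, 30, 10],
    "OCEAN_PROXIMITY": ["NEAR BAY", "INLAND", "NEAR BAY",
                        "INLAND", "NEAR OCEAN", "INLAND"],
    "MEDIAN_HOUSE_VALUE": [452600, 358500, 352100, 341300, 269700, 81300],
})
X = df.drop(columns="MEDIAN_HOUSE_VALUE")
y = df["MEDIAN_HOUSE_VALUE"]

numeric_cols = ["MEDIAN_INCOME", "HOUSING_MEDIAN_AGE"]
categorical_cols = ["OCEAN_PROXIMITY"]

preprocessor = ColumnTransformer([
    # Impute missing numeric values with the median, then scale
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # One-hot encode the categorical column
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("prep", preprocessor),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=42)),
])
model.fit(X, y)
preds = model.predict(X)  # one predicted house value per row
```

Bundling preprocessing and the model into one Pipeline object is what makes the later save/load step simple: a single fitted artifact carries both the feature engineering and the regressor.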

We will fit/train the pipeline using a Snowpark Python Stored Procedure (SPROC) and then save the pipeline to a Snowflake stage. This example concludes by showing how a saved model/pipeline can be loaded and run in a scalable fashion on a Snowflake warehouse using Snowpark Python User-Defined Functions (UDFs).
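The save/load pattern can be sketched locally (the lab writes to a Snowflake stage instead of a temp directory, and registers the scoring function as a Snowpark UDF). The joblib and cachetools packages are both in the lab's environment.yml; caching the load means the pipeline is deserialized once per process rather than once per row. The model and file name below are illustrative stand-ins.

```python
import os
import tempfile

import cachetools
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Train and persist a stand-in model (the lab persists the full pipeline
# to a Snowflake stage rather than the local filesystem)
model = LinearRegression().fit(np.array([[0.0], [1.0], [2.0]]),
                               np.array([0.0, 2.0, 4.0]))
model_path = os.path.join(tempfile.gettempdir(), "housing_pipeline.joblib")
joblib.dump(model, model_path)

@cachetools.cached(cache={})
def load_model(path: str):
    # Cached: repeated calls reuse one deserialized object
    return joblib.load(path)

def predict_value(feature: float) -> float:
    # Shape of a scalar UDF body: fetch the cached model, score one row
    return float(load_model(model_path).predict([[feature]])[0])

print(predict_value(3.0))
```

In the lab the equivalent of `predict_value` is registered with Snowpark so the warehouse can fan the scoring out across many rows in parallel.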

Snowpark ML

1. Create the Conda Snowpark Environment

1.1 Clone the repository and switch into the directory

1.2 Open environment.yml and paste in the following config:

name: snowpark_scikit_learn
channels:
  - https://repo.anaconda.com/pkgs/snowflake/
  - nodefaults
dependencies:
  - python=3.8
  - pip
  - snowflake-snowpark-python
  - ipykernel
  - pyarrow
  - numpy
  - scikit-learn
  - pandas
  - joblib
  - cachetools
  - matplotlib
  - seaborn

1.3 Create the conda environment

conda env create -f environment.yml

1.4 Activate the conda environment

conda activate snowpark_scikit_learn

1.5 Download and install VS Code, or use Jupyter Notebook or any other IDE of your choice

1.6 Update the config.py file with your Snowflake account and credentials
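The exact contents of config.py are not shown in this guide; a plausible shape, assuming the connection parameters Snowpark expects (the variable name and defaults below are assumptions, and credentials should never be committed):

```python
# Assumed shape of config.py -- the variable name and values are
# illustrative; fill in your own Snowflake account details.
snowflake_conn_prop = {
    "account": "<your_account_identifier>",
    "user": "<your_username>",
    "password": "<your_password>",
    "role": "ACCOUNTADMIN",
    "database": "HOUSING_DB",
    "schema": "PUBLIC",
    "warehouse": "COMPUTE_WH",
}

# The notebooks then typically open a session along these lines:
#   from snowflake.snowpark import Session
#   session = Session.builder.configs(snowflake_conn_prop).create()
```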

1.7 Configure the conda environment in VS Code

In the terminal, run this command and note the path of the newly created conda environment:

`conda env list`

Open the notebook named 1_snowpark_housing_ingest_data.ipynb in VS Code and in the top right corner click Select Kernel


Paste in the path to the conda environment you noted earlier

2. Ingest the Sample Dataset into Snowflake

Run the rest of the cells in 1_snowpark_data_ingest.ipynb to ingest data into Snowflake

3. Data Exploration and Transformation using Snowpark and Pandas

Open and run 2_data_exploration_transformation.ipynb notebook.
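The notebook's exact steps aren't reproduced here, but the kind of exploration it performs can be sketched with Pandas: pull a sample into a DataFrame, summarize it, and count missing values (which motivates the imputation step later). The toy DataFrame below stands in for a sample pulled from Snowflake, e.g. via `snowpark_df.limit(...).to_pandas()`.

```python
import pandas as pd

# Stand-in for a sample pulled from the Snowflake housing table
df = pd.DataFrame({
    "MEDIAN_INCOME": [8.3, 7.2, None, 5.6],
    "MEDIAN_HOUSE_VALUE": [452600, 358500, 352100, 341300],
})

summary = df.describe()        # count/mean/std/min/quartiles/max per column
null_counts = df.isna().sum()  # missing values per column
print(null_counts["MEDIAN_INCOME"])  # -> 1 missing income value
```

From here, histograms and scatter plots (matplotlib/seaborn are both in the conda environment) give a visual feel for skew and correlations before any transformation is chosen.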

4. Feature Engineering, ML Training and Model Deployment in Snowflake

Open and run 3_snowpark_end_to_end_ml.ipynb notebook.

Contributors

sfc-gh-vkhandelwal, fjkattan, sfc-gh-mstellwall, jdanielmyers
