
sfguide-getting-started-dataengineering-ml-snowpark-python's Introduction

Getting Started with Data Engineering and ML using Snowpark for Python

Overview

In this guide, we will perform data engineering (data analysis and data preparation) and machine learning tasks to train a Linear Regression model that predicts the future ROI (Return On Investment) of variable ad spend budgets across multiple channels, including search, video, social media, and email, using Snowpark for Python, Streamlit, and scikit-learn. By the end of the session, you will have deployed an interactive web application that visualizes the ROI of different allocated advertising spend budgets.

Step-By-Step Guide

For prerequisites, environment setup, step-by-step guide and instructions, please refer to the QuickStart Guide.

sfguide-getting-started-dataengineering-ml-snowpark-python's People

Contributors

fjkattan, iamontheinet, mungojam, sfc-gh-mstellwall, sfc-gh-thoyt, sfc-gh-zzhu

sfguide-getting-started-dataengineering-ml-snowpark-python's Issues

snowflake-ml-python version not found

The snowflake-ml-python version pinned in environment.yml needs to be upgraded from 1.0.0 to 1.0.2 (as of the time of submission). Running conda env create --file environment.yml fails unless the version is unpinned or set to 1.0.2.

Update environment.yml to use conda for snowflake-ml-python and streamlit

Conda's dependency management does not work well when some packages are installed with pip: if the newest version allowed by the constraints in the YAML requires newer dependencies, pip overrides whatever conda has already installed.

Now it seems that Snowflake's Conda channel includes the needed streamlit and snowflake-ml-python packages, and at least the following environment resolves nicely:

name: snowpark-ml-lgbm
channels:
  - https://repo.anaconda.com/pkgs/snowflake
dependencies:
  - python=3.10
  - snowflake-snowpark-python>=1.4.0,<2
  - snowflake-ml-python>=1.0.0,<2
  - pandas=1.4.*
  - notebook>=6.5
  - scikit-learn=1.2.2
  - cachetools=4.2.2
  - boto3
  - pip
  - lightgbm=3.3.5
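
After creating the environment, a quick check like the one below can confirm which versions actually got installed. This is only a sketch: the PACKAGES list is an illustrative subset of the dependencies above, and the helper name installed_versions is hypothetical, not part of the guide.

```python
from importlib import metadata

# Illustrative subset of the distributions listed in the environment above.
PACKAGES = [
    "snowflake-snowpark-python",
    "snowflake-ml-python",
    "pandas",
    "scikit-learn",
    "lightgbm",
]


def installed_versions(names):
    """Return {name: version string, or None if not installed} for each name."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so mismatches are easy to spot.
for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the new environment activated; any package reported as NOT INSTALLED, or with a version outside the constraints in the YAML, points to a resolution problem.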

There is an issue with temporary stage

SnowparkSQLException: (1304): 01ad51d0-0503-8979-006f-c983008c8176: 002003 (02000): SQL compilation error:
Stage 'DASH_DB.DASH_SCHEMA."SNOWML_TRANSFORM_CAB1A996_1246_4696_BF17_BE65E6E11C6E MP3YTNLBYM"' does not exist or not authorized.

Here are my packages' versions:
Successfully installed absl-py-1.4.0 aiohttp-3.8.4 aiosignal-1.3.1 altair-5.0.1 async-timeout-4.0.2 blinker-1.6.2 click-8.1.3 cryptography-38.0.4 frozenlist-1.3.3 fsspec-2023.1.0 gitdb-4.0.10 gitpython-3.1.31 markdown-it-py-3.0.0 mdurl-0.1.2 multidict-6.0.4 pillow-9.5.0 protobuf-4.23.3 pydeck-0.8.1b0 pympler-1.0.1 pytz-deprecation-shim-0.1.0.post0 pyyaml-6.0 rich-13.4.2 smmap-5.0.0 snowflake-ml-python-1.0.2 sqlparse-0.4.4 streamlit-1.24.0 tenacity-8.2.2 toml-0.10.2 toolz-0.12.0 tzdata-2023.3 tzlocal-4.3.1 validators-0.20.0 watchdog-3.0.0 xgboost-1.7.6 yarl-1.9.2

OS: Windows 11

Error Executing Inference

Hi,
When I tried to run the call for the UDF for inference with new data, I got this error message:

Code executed (Python):

from snowflake.snowpark.functions import array_construct, call_udf, col

# Assumes `session` is an existing Snowpark Session.
test_df = session.create_dataframe(
    [[250000, 250000, 200000, 450000], [500000, 500000, 500000, 500000], [8500, 9500, 2000, 500]],
    schema=['SEARCH_ENGINE', 'SOCIAL_MEDIA', 'VIDEO', 'EMAIL'])
test_df.select(
    'SEARCH_ENGINE', 'SOCIAL_MEDIA', 'VIDEO', 'EMAIL',
    call_udf("predict_roi",
             array_construct(col("SEARCH_ENGINE"), col("SOCIAL_MEDIA"), col("VIDEO"), col("EMAIL"))).as_("PREDICTED_ROI")).show()

Error returned

AttributeError: Can't get attribute '_passthrough_scorer' on <module 'sklearn.metrics._scorer' from '/usr/lib/python_udf/5f20772ab347018f6391e4ceaf2d4f3e3c074ff214498b0fb4f554f0cc1a7099/lib/python3.9/site-packages/sklearn/metrics/_scorer.py'>
 in function BATCH_PREDICT_ROI with handler udf_py_715127722.compute

Relevant libraries and versions:
snowflake-snowpark-python 1.7.0
snowflake-ml-python 1.0.8
scikit-learn 1.2.2
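
Errors of the form "Can't get attribute ... on module" typically appear when a pickled object is loaded under a different library version than the one that created it. That cause is an assumption here, not something confirmed in this report. If it applies, one way to rule it out is to pin the server-side packages to the versions installed locally. The sketch below builds such pins; the helper name pinned_requirements is hypothetical.

```python
from importlib import metadata


def pinned_requirements(names):
    """Build 'name==version' strings from the locally installed versions.

    Names that are not installed locally are returned unpinned.
    """
    specs = []
    for name in names:
        try:
            specs.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            specs.append(name)
    return specs


# Hypothetical usage: pass the result as the `packages` argument when
# registering the UDF, so the server resolves the same versions, e.g.
#   session.udf.register(..., packages=pinned_requirements(["scikit-learn"]))
print(pinned_requirements(["scikit-learn", "joblib"]))
```

The pinned versions must also be available in the Snowflake Anaconda channel, otherwise registration will fail with a package resolution error instead.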

Hex Notebook - No module named 'snowflake.core'

When using the Hex notebook, I get an error when trying to import snowflake.core:

from snowflake.snowpark.session import Session
from snowflake.snowpark.functions import month,year,col,sum
from snowflake.snowpark.version import VERSION
from snowflake.connector import connect
from snowflake.core import Root
from snowflake.core.task import Task, StoredProcedureCall
from snowflake.core.task.dagv1 import DAG, DAGTask, DAGOperation
from snowflake.core import CreateMode

# Misc
from datetime import timedelta
import json
import logging

logger = logging.getLogger("snowflake.snowpark.session")
logger.setLevel(logging.ERROR)



---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_13/2588667778.py in <cell line: 14>()
     12 from snowflake.snowpark.version import VERSION
     13 from snowflake.connector import connect
---> 14 from snowflake.core import Root
     15 from snowflake.core.task import Task, StoredProcedureCall
     16 from snowflake.core.task.dagv1 import DAG, DAGTask, DAGOperation

ModuleNotFoundError: No module named 'snowflake.core'
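
A small guard at the top of the notebook can make this failure easier to diagnose. The sketch below only checks whether the module can be located; the function name module_available is hypothetical, and the install hint assumes that snowflake.core ships in the separate "snowflake" distribution on PyPI rather than in snowflake-snowpark-python.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located.

    find_spec imports parent packages but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # ImportError covers ModuleNotFoundError, which is raised when a
        # parent package (here, `snowflake`) is itself missing.
        return False


if not module_available("snowflake.core"):
    # Assumption: the module is provided by the `snowflake` distribution.
    print("snowflake.core not found; try installing the 'snowflake' package.")
```

If the check fails in a hosted notebook, the package has to be added to that notebook's environment before the task and DAG imports will work.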

snowflake-ml-python==1.0.0 is not supported for configured python=3.9

Pip subprocess output:
Collecting streamlit
Using cached streamlit-1.24.0-py2.py3-none-any.whl (8.9 MB)

Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 0.55.2 Requires-Python <3.5
ERROR: Could not find a version that satisfies the requirement snowflake-ml-python==1.0.0 (from versions: 1.0.1, 1.0.2)
ERROR: No matching distribution found for snowflake-ml-python==1.0.0

failed

CondaEnvException: Pip failed

OS: Windows 11
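
The pip message above already lists the versions that are installable for this Python version, and 1.0.0 is not among them. A minimal sketch for reading that list out of the error line and picking the newest entry follows. The helper names are hypothetical, and the version comparison assumes purely numeric dotted versions such as 1.0.2.

```python
import re


def available_versions(pip_error_line):
    """Extract the version list from pip's '(from versions: ...)' message."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error_line)
    if not match:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def newest(versions):
    """Pick the highest purely numeric dotted version."""
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


line = ("ERROR: Could not find a version that satisfies the requirement "
        "snowflake-ml-python==1.0.0 (from versions: 1.0.1, 1.0.2)")
versions = available_versions(line)
print(versions)          # ['1.0.1', '1.0.2']
print(newest(versions))  # 1.0.2
```

Setting the pin in environment.yml to one of the listed versions, or removing the pin, avoids the failure.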
