To run these exercises, follow the instructions in each of the notebooks below.
- Storage Settings
- Basics of PySpark, Spark Dataframe, and Spark Machine Learning
- Spark Machine Learning Pipeline
- Hyper-parameter Tuning
- MLeap (requires ML runtime)
- Horovod Runner with TensorFlow (requires ML runtime)
- Structured Streaming (Basic)
- Structured Streaming with Azure EventHub or Kafka
- Delta Lake
- MLflow (requires ML runtime)
- Orchestration with Azure Data Services
- Delta Live Tables
- Databricks SQL
- Create an Azure Databricks resource in Microsoft Azure.
  When you create the resource, please select the Premium plan.
- After the resource is created, launch the Databricks workspace UI by clicking "Launch Workspace".
- Create a compute (cluster) in the Databricks UI. (Select the "Compute" menu and proceed to create one.)
  When you create the compute, please select an ML Runtime (not a standard Runtime).
- Download HandsOn.dbc and import it into your workspace as follows.
  - Select "Workspace" in the workspace UI.
  - Go to your user folder, click your e-mail (the arrow icon), and then select the "Import" command.
  - Pick HandsOn.dbc to import.
- Open each notebook and attach the compute (cluster) created above. (Select the compute at the top of each notebook.)
- Please make sure to run "Exercise 01 : Storage Settings (Prepare)" before running the other notebooks.
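If you prefer scripting to the portal UI, the steps above can also be sketched with the Azure CLI and the (legacy) Databricks CLI. This is only a rough equivalent under stated assumptions: the resource group, workspace and cluster names, region, node type, and runtime version below are placeholders (pick a current ML runtime for your workspace), and it assumes both CLIs are installed and authenticated against your subscription and workspace.

```shell
# Create a resource group and an Azure Databricks workspace on the
# Premium plan (names, region, and SKU values are examples).
az group create --name handson-rg --location eastus
az databricks workspace create \
  --resource-group handson-rg \
  --name handson-ws \
  --location eastus \
  --sku premium

# Import HandsOn.dbc into your user folder in the workspace
# (replace the e-mail with your own workspace user name).
databricks workspace import \
  --format DBC \
  ./HandsOn.dbc \
  /Users/you@example.com/HandsOn

# Create a cluster with an ML runtime. The spark_version string is an
# example; list the versions available to you with:
#   databricks clusters spark-versions
databricks clusters create --json '{
  "cluster_name": "handson-cluster",
  "spark_version": "13.3.x-cpu-ml-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 1
}'
```

Whichever way you provision, the result should match the UI steps above: a Premium workspace, the imported HandsOn.dbc folder, and an ML-runtime cluster to attach to each notebook.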
Note : You cannot use an Azure trial (free) subscription because of its limited quota. If you are on a free subscription, please upgrade to pay-as-you-go. (The free-subscription credit is retained even after you move to pay-as-you-go.)
Tsuyoshi Matsuzaki @ Microsoft