To run these exercises, follow the instructions in each of the notebooks below.
- Storage Settings
- Basics of PySpark, Spark Dataframe, and Spark Machine Learning
- Spark Machine Learning Pipeline
- Hyper-parameter Tuning
- MLeap (requires ML runtime)
- Horovod Runner with TensorFlow (requires ML runtime)
- Structured Streaming (Basic)
- Structured Streaming with Azure EventHub or Kafka
- Delta Lake
- MLflow (requires ML runtime)
- Orchestration with Azure Data Services
- Delta Live Tables
- Databricks SQL
- Create an Azure Databricks resource in Microsoft Azure.
  When you create the resource, please select the Premium plan.
- After the resource is created, launch the Databricks workspace UI by clicking "Launch Workspace".
- Create a compute (cluster) in the Databricks UI. (Select the "Compute" menu and proceed to create one.)
  When you create the compute, please select an ML Runtime (not a standard Runtime).
- Download HandsOn.dbc and import it into your workspace as follows.
  - Select "Workspace" in the workspace UI.
  - Go to your user folder, click your e-mail (the arrow icon), and then select the "Import" command.
  - Pick HandsOn.dbc to import.
- Open each notebook and attach the compute (cluster) created above. (Select the compute at the top of each notebook.)
- Please make sure to run "Exercise 01 : Storage Settings (Prepare)" before running the other notebooks.
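If you prefer scripting to the portal UI, the steps above can also be sketched with the Azure CLI and the (legacy) Databricks CLI. This is only a rough equivalent under stated assumptions: the resource group, workspace and cluster names, region, node type, and runtime version below are placeholders (pick a current ML runtime for your workspace), and it assumes both CLIs are installed and authenticated against your subscription and workspace.

```shell
# Create a resource group and an Azure Databricks workspace on the
# Premium plan (names, region, and SKU values are examples).
az group create --name handson-rg --location eastus
az databricks workspace create \
  --resource-group handson-rg \
  --name handson-ws \
  --location eastus \
  --sku premium

# Import HandsOn.dbc into your user folder in the workspace
# (replace the e-mail with your own workspace user name).
databricks workspace import \
  --format DBC \
  ./HandsOn.dbc \
  /Users/you@example.com/HandsOn

# Create a cluster with an ML runtime. The spark_version string is an
# example; list the versions available to you with:
#   databricks clusters spark-versions
databricks clusters create --json '{
  "cluster_name": "handson-cluster",
  "spark_version": "13.3.x-cpu-ml-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 1
}'
```

Whichever way you provision, the result should match the UI steps above: a Premium workspace, the imported HandsOn.dbc folder, and an ML-runtime cluster to attach to each notebook.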
Note : You cannot use an Azure trial (free) subscription because of its limited quota. If you are on a free subscription, please upgrade to pay-as-you-go. (The free-subscription credit is retained even after you move to pay-as-you-go.)
Tsuyoshi Matsuzaki @ Microsoft