Descriptions of the Databricks workshops and learning material we have developed at Knowit.
Each is a 2.5-hour hands-on workshop covering an important aspect of Databricks.
At Knowit we call these workshops Toppturer: sessions that give quick but meaningful experience with a technology, tool, or framework.
![image](https://private-user-images.githubusercontent.com/264435/322796452-dea1f874-b9b5-49d3-b2c1-345d591a051e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkxNzgwNDUsIm5iZiI6MTcxOTE3Nzc0NSwicGF0aCI6Ii8yNjQ0MzUvMzIyNzk2NDUyLWRlYTFmODc0LWI5YjUtNDlkMy1iMmMxLTM0NWQ1OTFhMDUxZS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjIzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYyM1QyMTIyMjVaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iNjBjOTJlYmVkN2E3NWJjYWUyY2MxMzYwOGQ0ZDE3NTYwMTVlZmY4ZjE2MTQ0OTEwOGY5OGM1NWY1OTA2M2U0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.HhuyEivy9HT6yraGuCZQjw1-_gsi2OWXlbirXJAdZUc)
Link: https://github.com/knowit/AWS-Databricks-NYC-Taxi-Workshop
For: Developers, analysts, data scientists, data engineers.
Pre-requisites: Some Python knowledge
Topics:
- Basic understanding of components and tools in Databricks
- Perform data transformations in Spark SQL and PySpark
- Use Databricks Repos for git-versioned data engineering
- Deploy a Spark job with Databricks Workflows
- Write ETL code and data quality checks in Delta Live Tables
Link: https://github.com/paalvibe/llm-langchain-course
For: Anybody
Topics:
- Setup and use of LLMs in Databricks
- Use of the LangChain framework for:
  - LLM wrapping
  - LLM serving
  - Summarizing
  - Context embedding with chromadb
  - Reformatting
  - Multi-query retrieval
- Prompt engineering
Link: https://github.com/paalvibe/llm-tune-course
For: Anybody
Topics:
- What is an LLM (Large Language Model)?
- Tuning of LLM models on Databricks
- Different modes of adapting LLMs
- When and when not to train your own LLM?
Link: https://github.com/paalvibe/databricks-dataops-course
For: Data Engineers, Full stack data scientists, ML Engineers, Data Platform Engineers
Topics:
- Opinionated git-based approach to DataOps
- Structure your environments to allow for dev runs of data pipelines
- Move data pipelines from dev to prod
- Using git branches and commits to name and manage data and jobs responsibly
- Does not use GitHub Actions, but uses the processes that such automation needs
- Does not cover data quality or pipeline management
Pre-requisites: Some Python knowledge
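
As a rough illustration of using git branches and commits to name data, the sketch below derives a development schema name from a branch name and a commit hash. The function names `dev_schema_name` and `target_schema`, the `dev` prefix, the `prod` schema, and the sanitizing rules are all assumptions made for this example; the course repository may use a different convention.

```python
import re


def dev_schema_name(branch: str, commit_sha: str, prefix: str = "dev") -> str:
    """Build a schema-safe name from a git branch and commit (hypothetical scheme)."""
    # Lowercase the branch and replace anything outside [a-z0-9_] with "_".
    safe_branch = re.sub(r"[^a-z0-9_]+", "_", branch.lower()).strip("_")
    # Keep only the first 8 characters of the commit hash.
    short_sha = commit_sha[:8].lower()
    return f"{prefix}_{safe_branch}_{short_sha}"


def target_schema(branch: str, commit_sha: str, prod_branch: str = "main") -> str:
    """Use a fixed prod schema on the production branch, a per-branch dev schema otherwise."""
    if branch == prod_branch:
        return "prod"  # assumed name of the production schema
    return dev_schema_name(branch, commit_sha)
```

With a scheme like this, a pipeline run from a feature branch writes to its own schema, so dev runs cannot overwrite production tables.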
For: Data Engineers, Full stack data scientists, ML Engineers, Data Platform Engineers
Topics:
- How to enable data contracts and data quality checks in pipelines
- Differences between Delta Live Tables and regular Databricks notebooks
Pre-requisites: Some Python knowledge
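
To give a feel for what a data contract plus a quality check can mean in a pipeline, here is a minimal plain-Python sketch. It is not Delta Live Tables code and does not use any Databricks API. The contract, the column names, and the helpers `contract_violations` and `split_valid` are hypothetical and exist only for this example.

```python
from typing import Any, Callable

# Hypothetical contract for a trip table: column name -> expected Python type.
CONTRACT: dict[str, type] = {"trip_id": int, "fare_amount": float, "pickup_zone": str}


def contract_violations(row: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable contract violations for one row (empty list if none)."""
    problems = []
    for column, expected in contract.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(f"wrong type for {column}: {type(row[column]).__name__}")
    return problems


def split_valid(
    rows: list[dict[str, Any]],
    contract: dict[str, type],
    rule: Callable[[dict[str, Any]], bool],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split rows into (valid, rejected) using the contract plus one quality rule."""
    valid, rejected = [], []
    for row in rows:
        # The rule is only evaluated when the row already satisfies the contract.
        ok = not contract_violations(row, contract) and rule(row)
        (valid if ok else rejected).append(row)
    return valid, rejected
```

A call such as `split_valid(rows, CONTRACT, lambda r: r["fare_amount"] >= 0)` keeps rows that match the contract and have a non-negative fare, and sets the rest aside for inspection instead of silently dropping them.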