#### The data is from the Ergast website.
The data is available through an API, as downloadable CSVs, and as nested or non-nested JSON files. Azure Databricks (built on Apache Spark), Azure Notebook, and Azure Data Lake Storage are the main tools for this ETL project.
In this project, I focused on extracting data from the CSV and JSON files for my ETL. This can be done on a free Azure trial from Microsoft.
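
As a rough illustration of what extracting from CSV and nested JSON involves, here is a minimal sketch in plain Python using only the standard library. It is not the project's Databricks code. The sample data, the column names, and the helpers `read_csv_rows`, `flatten`, and `read_json_lines` are all hypothetical and made up for this example.

```python
import csv
import io
import json

# Hypothetical sample data (assumption): a tiny CSV and one nested JSON record per line.
CSV_TEXT = "circuitId,name,country\n1,Albert Park,Australia\n2,Sepang,Malaysia\n"
JSON_LINES = '{"driverId": 1, "name": {"forename": "Lewis", "surname": "Hamilton"}}\n'


def read_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts, joining nested keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def read_json_lines(text):
    """Parse one JSON object per non-empty line and flatten each one."""
    return [flatten(json.loads(line)) for line in text.splitlines() if line.strip()]


circuits = read_csv_rows(CSV_TEXT)
drivers = read_json_lines(JSON_LINES)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns would still need explicit casting. In a Spark-based pipeline, that is usually handled by supplying a schema when reading.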
Here is a quick diagram of the high-level plan.

- Purple blocks show columns that were renamed and/or transformed.
- Red blocks show columns that were dropped.
- Green blocks show columns that were added.

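
The three kinds of column changes in the diagram (renamed, dropped, added) can be sketched on a single row in plain Python. This is only an illustration under assumptions: the column names, the `RENAMES` and `DROPPED` mappings, the `ingestion_date` column, and the `transform_row` helper are hypothetical and not taken from the project files. On a Spark DataFrame, the corresponding operations would typically be `withColumnRenamed`, `drop`, and `withColumn`.

```python
from datetime import datetime, timezone

# Hypothetical mappings (assumptions for illustration only).
RENAMES = {"circuitId": "circuit_id", "lat": "latitude"}  # "purple": renamed columns
DROPPED = {"url"}                                          # "red": dropped columns


def transform_row(row, ingested_at):
    """Return a new row with columns renamed, dropped, and one audit column added."""
    out = {}
    for key, value in row.items():
        if key in DROPPED:
            continue  # skip dropped columns
        out[RENAMES.get(key, key)] = value  # rename if mapped, else keep the name
    out["ingestion_date"] = ingested_at  # "green": added column
    return out


sample = {
    "circuitId": "1",
    "name": "Albert Park",
    "lat": "-37.8497",
    "url": "http://example.com",
}
result = transform_row(sample, datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat())
```

Passing the timestamp in as an argument, rather than reading the clock inside the function, keeps the transformation deterministic and easy to test.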
Both horizontal and vertical scaling are possible, but a larger budget would be necessary to take full advantage of Azure Databricks.
Below are random snapshots. The reproducible Databricks files are available in the folder.