This is a practical example of a data engineering project. It covers the following topics:
- Automate infrastructure management
- Understand infrastructure changes before they are applied
- Deploy a resource group, a virtual machine, simple storage, and a data warehouse
- State management: Proper management and storage of the state, possible remote backends.
- Modularity: Scripts are modularized using modules, promoting reusability.
- Destruction: Safe destruction of resources without leaving orphaned resources in the cloud environment.
- Generate data (Python) & send to Azure Event Hub
- Read streaming data with Azure Stream Analytics
- Store data in Azure Data Lake Storage Gen2
- Machine learning: deploy a trained model as an endpoint with Azure Machine Learning Studio
- Add database features to Azure SQL Server
- Visualize real-time data in a Power BI dashboard
- Generate data (Python) & send to Azure Event Hub
- Databricks: use Spark to read streaming data from the Event Hub, save it in Parquet format in Azure Data Lake Storage Gen2, and send it to a Power BI dashboard through the push API
- Web App (HTML, CSS, JS, Flask): upload a CSV file and show a report
- Store data in Azure Data Lake Storage Gen2
- Azure Function Apps: trigger a Databricks job when a new file arrives in Blob Storage
- Databricks: ingest data from Blob Storage, run ETL and preprocessing, and apply the machine learning model (Spark)
- Delta Lake: raw data (Bronze), feature selection & missing-value processing (Silver), results (Gold)
- Machine learning: XGBoost and ANN
- Add database features to Azure SQL Server
- Visualize data in a Power BI report
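
The trigger step in the batch pipeline (a new file in Blob Storage starts a Databricks job) can be sketched as below. This is a minimal illustration, not the project's Function App code: it assumes the function calls the Databricks Jobs REST API `run-now` endpoint with a personal access token, and the workspace URL, job ID, and the `input_path` notebook parameter are hypothetical placeholders.

```python
import json
import urllib.request


def build_run_now_request(workspace_url: str, token: str, job_id: int,
                          file_path: str) -> urllib.request.Request:
    """Build (but do not send) a Jobs API run-now request for one new file."""
    body = {
        "job_id": job_id,
        # "input_path" is a hypothetical notebook parameter name.
        "notebook_params": {"input_path": file_path},
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_job(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Inside a blob-triggered function, the path of the new blob would be passed as `file_path`. Keeping request construction separate from sending makes the payload easy to inspect without a live workspace.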
- Open a terminal in the RealTime/EventHub folder
- Run pip install -r requirements.txt
- Run python generate_realtime_eventhub_operation.py (or, in the same way, python generate_realtime_eventhub_raman.py)
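
For orientation, a generator script of this kind could look roughly like the sketch below. It is not the content of `generate_realtime_eventhub_operation.py`: the reading fields (`sensor_id`, `value`, `timestamp`) are hypothetical, and it assumes the `azure-eventhub` package is among the installed requirements and that a connection string and Event Hub name are available.

```python
import json
import random
import time
from datetime import datetime, timezone


def make_reading(sensor_id: str) -> dict:
    """Build one synthetic reading; the field names are hypothetical."""
    return {
        "sensor_id": sensor_id,
        "value": round(random.uniform(0.0, 100.0), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def send_readings(conn_str: str, eventhub_name: str,
                  count: int = 10, delay_s: float = 1.0) -> None:
    """Send `count` readings to Event Hub, one small batch per reading."""
    # Imported lazily so make_reading() works without the SDK installed.
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        conn_str, eventhub_name=eventhub_name
    )
    with producer:
        for _ in range(count):
            batch = producer.create_batch()
            batch.add(EventData(json.dumps(make_reading("sensor-1"))))
            producer.send_batch(batch)
            time.sleep(delay_s)
```

Calling `send_readings(conn_str, eventhub_name)` with real credentials would emit one JSON event per second; reading the credentials from environment variables keeps them out of the script.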
- Open a terminal in the BatchTime/WebappDemoplatform folder
- Run pip install -r requirements.txt
- Run python main.py
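
The web app's job (take a CSV file as input and show a report) can be illustrated with the sketch below. It is not the project's `main.py`: the `/report` route, the `file` form field, and the choice of per-column means as the "report" are all assumptions made for the example.

```python
import csv
import io
from statistics import mean


def summarize_csv(text: str) -> dict:
    """Return the row count and the mean of every fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {"rows": len(rows), "means": {}}
    if not rows:
        return summary
    for column in rows[0]:
        try:
            values = [float(row.get(column)) for row in rows]
        except (TypeError, ValueError):
            continue  # skip non-numeric or ragged columns
        summary["means"][column] = mean(values)
    return summary


def create_app():
    """Build a small Flask app with a single upload endpoint."""
    # Imported lazily: Flask is only needed when serving.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/report", methods=["POST"])
    def report():
        uploaded = request.files["file"]  # hypothetical form field name
        return jsonify(summarize_csv(uploaded.read().decode("utf-8")))

    return app
```

With this sketch, `create_app().run()` would serve the endpoint locally, and posting a CSV file to `/report` would return the summary as JSON.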
You should create a new env.
- Connect Azure Data Lake Storage Gen2 and Azure Databricks: https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage