Brandless Data Engineering Take Home Exercise
Setup:
- Scheduler: Apache Airflow
- Database: Postgres 10.0 (dev) / Amazon Redshift (prod)
- Intermediate file/object storage: local file system and S3
- Languages used: Python, SQL
In this exercise we have set up an Airflow scheduler that queries the Edamam Recipe Search API and pulls data for pasta recipes, along with the health and diet labels associated with each recipe in our recipes table. Daily incremental batch jobs load the data into three tables:
- recipe_health_lables
- recipe_diet_lables
- recipes
Each of the three tables is loaded by its own Python task, and these tasks together form an Airflow DAG.
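As a rough illustration of how one API response could feed the three tables, the sketch below flattens a response into one row list per table. The function name `split_recipe_response`, the column choices, and the payload layout (`hits` -> `recipe` -> `uri`, `label`, `healthLabels`, `dietLabels`) are assumptions made for this example, and the sample payload is made up. It is not the code used in this exercise.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, str]


def split_recipe_response(
    response: Dict[str, Any],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Flatten one API response into rows for the three target tables.

    Returns rows for (recipes, recipe_health_lables, recipe_diet_lables).
    The payload layout used here is an assumption, not taken from the exercise.
    """
    recipes: List[Row] = []
    health_labels: List[Row] = []
    diet_labels: List[Row] = []

    for hit in response.get("hits", []):
        recipe = hit["recipe"]
        recipe_id = recipe["uri"]  # assumed to be a stable key for the recipe
        recipes.append((recipe_id, recipe["label"]))
        # One row per (recipe, label) pair in each label table.
        health_labels.extend(
            (recipe_id, label) for label in recipe.get("healthLabels", [])
        )
        diet_labels.extend(
            (recipe_id, label) for label in recipe.get("dietLabels", [])
        )

    return recipes, health_labels, diet_labels


# Made-up sample payload, for illustration only.
sample = {
    "hits": [
        {
            "recipe": {
                "uri": "recipe_1",
                "label": "Pasta Primavera",
                "healthLabels": ["Vegetarian", "Peanut-Free"],
                "dietLabels": ["Balanced"],
            }
        }
    ]
}

recipes, health, diet = split_recipe_response(sample)
```

In a DAG, each of the three row lists could be written by its own task (for example, an Airflow `PythonOperator` per table), which matches the one-task-per-table layout described above.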