This repo accompanies the following blog post: https://www.confessionsofadataguy.com/data-quality-great-expectations-for-data-engineers/
The aim is simply to learn Great Expectations by putting it to work in an Apache Spark environment, kicking the tires on some of Great Expectations' features.
The blog post covers the following topics.
Data Context
- Holds all configuration and the components below for your project.
Datasource
- Holds the configuration for your data source(s) and for connecting to and interacting with them.
Expectation
- The actual tests and "expectations" that describe your data.
Checkpoint
- Runs the validations against your data and reports metrics, results, etc.
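
To make the Expectation and Checkpoint ideas concrete, here is a tiny hand-rolled sketch of the pattern. Note the function and key names below are hypothetical illustrations of the concept, not the actual Great Expectations API (see the blog post for the real library calls).

```python
# A hand-rolled illustration of the Expectation/Checkpoint pattern.
# These names are hypothetical and NOT the Great Expectations API.

def expect_column_values_to_not_be_null(rows, column):
    """An "expectation": every row must have a non-null value for `column`."""
    unexpected = [r for r in rows if r.get(column) is None]
    return {
        "success": len(unexpected) == 0,
        "result": {
            "element_count": len(rows),
            "unexpected_count": len(unexpected),
        },
    }

def run_checkpoint(rows, expectations):
    """A "checkpoint": run each (expectation_fn, column) pair and
    report overall success plus the per-expectation metrics."""
    results = [fn(rows, col) for fn, col in expectations]
    return {"success": all(r["success"] for r in results), "results": results}

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
report = run_checkpoint(
    rows,
    [
        (expect_column_values_to_not_be_null, "id"),
        (expect_column_values_to_not_be_null, "name"),
    ],
)
print(report["success"])  # the "name" check fails, so overall success is False
```

In the real library, a Checkpoint bundles a batch of data from a Datasource with a suite of Expectations and produces validation results much like the `report` dict above.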