Python Lambda to orchestrate loading data into RDS from S3 for UCFS Claimant API service
For history prior to the creation of this repo (2020-04-24), refer to the archived private repo dip/ucfs-claimant-load-data.
Additional information: Previously the application accepted full S3 paths in the manifest; however, due to a limitation with AWS Batch, the manifest has had to be reduced in character size. The S3 base path is now supplied in the manifest as 's3_base_path', and it is the application's responsibility to build the full S3 path for each object.
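As a minimal sketch of what building the full paths could look like: only 's3_base_path' comes from the manifest description above. The `files` key, the function name, and the example bucket and object names are assumptions made for illustration and may not match the real manifest layout.

```python
def build_s3_paths(manifest, files_key="files"):
    """Join the manifest's base path with each relative object name.

    `files_key` is a hypothetical manifest key holding relative object names;
    only 's3_base_path' is taken from the README.
    """
    base = manifest["s3_base_path"].rstrip("/")
    # Strip leading slashes so the join never produces a double slash.
    return [f"{base}/{name.lstrip('/')}" for name in manifest.get(files_key, [])]


# Example manifest with made-up values.
example_manifest = {
    "s3_base_path": "s3://example-bucket/exports/2020-04-24/",
    "files": ["claimant/part-0001.gz", "/contract/part-0001.gz"],
}
full_paths = build_s3_paths(example_manifest)
```

Keeping only relative names in the manifest means the base path is stored once rather than repeated per object, which is what shortens the payload.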
- All staging tables created by previous load are DROPPED
- New staging tables are created
- Data is loaded into staging tables with PK, data, and virtual columns
- Any rows with NULL IDs are removed from the staging tables
- Unique constraints are added to the staging table ID columns
- Indices are added
- Live tables are renamed with an _old suffix
- Staging tables are renamed to become live tables
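The order of the steps above can be sketched as a list of SQL statements generated per table. This is an illustration only, not the Lambda's actual code: the MySQL-flavoured syntax, the `_stage` suffix, the column names and types, and the constraint name are all assumptions, and the load and index steps are left as placeholders because their details are not described here.

```python
def load_plan(table, id_column="id"):
    """Return illustrative SQL for one table, in the order the steps run."""
    staging = f"{table}_stage"  # hypothetical staging-table naming
    return [
        # 1. Drop the staging table left behind by the previous load.
        f"DROP TABLE IF EXISTS {staging}",
        # 2. Create a fresh staging table: PK, data, and a virtual column.
        (
            f"CREATE TABLE {staging} ("
            "pk BIGINT AUTO_INCREMENT PRIMARY KEY, "
            "data JSON, "
            f"{id_column} VARCHAR(64) "
            f"GENERATED ALWAYS AS (data->>'$.{id_column}') VIRTUAL)"
        ),
        # 3. Load the data (mechanism not shown in this sketch).
        f"-- load rows from S3 into {staging}",
        # 4. Remove rows whose ID is NULL.
        f"DELETE FROM {staging} WHERE {id_column} IS NULL",
        # 5. Add the unique constraint on the ID column.
        f"ALTER TABLE {staging} ADD CONSTRAINT {staging}_{id_column}_uq UNIQUE ({id_column})",
        # 6. Add any further indices (placeholder).
        f"-- create additional indices on {staging}",
        # 7. Rename the live table out of the way, then promote staging.
        f"RENAME TABLE {table} TO {table}_old",
        f"RENAME TABLE {staging} TO {table}",
    ]


plan = load_plan("claimant")
```

Removing NULL IDs before adding the unique constraint keeps the constraint step from having to deal with rows that carry no usable ID, and doing the renames last means the live table is only replaced once the staging table is fully prepared.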
docker build -t dwpdigital/ucfs-claimant-api-load-data .
A sample manifest is provided for passing into main.py, either directly or via docker run, like this:
docker run -e "LOG_LEVEL=DEBUG" -e "AWS_DEFAULT_REGION=eu-west-2" dwpdigital/ucfs-claimant-api-load-data -m "$(cat manifest.json.txt)"
aws --profile dataworks-development --region eu-west-2 batch submit-job --job-name dan_cli --job-queue ucfs_claimant_api --job-definition ucfs_claimant_api_load_data_job --parameters manifest=$(cat manifest.json.txt)