amazon-mwaa-complex-workflow-using-step-functions's Introduction

Build complex workflows with Amazon MWAA, AWS Step Functions, AWS Glue, and Amazon EMR

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the AWS Pricing page for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

Code repo structure

.
├── README.MD                   <-- The instructions file
├── dags/mwaalib                <-- Reusable code for Amazon EMR and AWS Step Functions
├── setup                       <-- Source code for initial setup
│   ├── transform/              <-- Preprocessing PySpark code and reusable code
│   ├── template.yaml           <-- Template for basic application setup
│   └── deploy.sh               <-- Deploy script

Requirements

  • AWS CLI already configured with Administrator permissions

Architecture

[Figure: architecture diagram]

Prerequisites

  1. AWS account. Create an AWS account if you do not already have one, and log in.

  2. An Amazon Managed Workflows for Apache Airflow (Amazon MWAA) environment in a supported Region. Create an environment if you do not have one. Note that us-west-2 is selected; change the Region if required.

  3. IAM permissions for the Amazon MWAA execution role covering S3, EMR, Step Functions, and AWS Systems Manager Parameter Store:

    elasticmapreduce:RunJobFlow
    iam:PassRole on EMR_DEFAULT_ROLE
    iam:PassRole on EMR_EC2_ROLE
    states:DescribeStateMachineForExecution
    states:DescribeStateMachine
    states:DescribeExecution
    states:StartExecution
    ssm:GetParameters
    ssm:GetParameter
    

A sample policy is provided as an example. Verify it and replace the account number with your own AWS account number. Then create the policy and attach it to the Amazon MWAA execution role.

For instructions, refer to "Adding and removing IAM identity permissions" in the IAM User Guide.
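The permissions listed above could be written as a policy document along the following lines. This is a minimal sketch, not the sample policy shipped with the repo: the account ID is a placeholder, the `"Resource": "*"` entries are a simplifying assumption that you may want to narrow, and S3 actions are omitted because the list above does not name them.

```python
import json

# Placeholder account ID (assumption); replace with your own AWS account number.
ACCOUNT_ID = "111122223333"


def build_mwaa_policy(account_id: str) -> dict:
    """Return an IAM policy document covering the actions listed above."""
    emr_roles = [
        f"arn:aws:iam::{account_id}:role/EMR_DEFAULT_ROLE",
        f"arn:aws:iam::{account_id}:role/EMR_EC2_ROLE",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Launch EMR clusters from the DAG.
            {"Effect": "Allow", "Action": "elasticmapreduce:RunJobFlow", "Resource": "*"},
            # Let the execution role hand the two EMR roles to the cluster.
            {"Effect": "Allow", "Action": "iam:PassRole", "Resource": emr_roles},
            # Start and inspect Step Functions executions.
            {
                "Effect": "Allow",
                "Action": [
                    "states:DescribeStateMachineForExecution",
                    "states:DescribeStateMachine",
                    "states:DescribeExecution",
                    "states:StartExecution",
                ],
                "Resource": "*",  # assumption: narrow to your state machine ARN
            },
            # Read shared configuration from Parameter Store.
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameters", "ssm:GetParameter"],
                "Resource": "*",  # assumption: narrow to your parameter ARNs
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_mwaa_policy(ACCOUNT_ID), indent=2))
```

Running the script prints JSON that can be compared against the sample policy before anything is attached to the execution role.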

A sample role YAML template is also provided in case you do not already have EMR_DEFAULT_ROLE and EMR_EC2_ROLE. Run the CloudFormation template to create the EMR roles.

Installation Instructions

  1. Create an AWS account if you do not already have one, and log in.

  2. Clone the repo onto your local development machine using git clone.

  3. From the command line, change directory into the setup folder, then run:

    ./deploy.sh -s <MWAA Airflow Dag Bucket Name> -d <Demo Data Bucket Name>

    Replace <MWAA Airflow Dag Bucket Name> with the S3 bucket used by your Amazon MWAA environment.

    Replace <Demo Data Bucket Name> with any bucket you want to use for the demo data.

    Modify the stack name or bucket parameters as needed. Wait for the stack to complete.

  4. Wait for the script to complete. You should see the following logs.

    Waiting for stack update to complete ...
    Finished create/update successfully!
    upload: ./movielens_glue_transform.py to s3://mwaa-dl-demo-us-east-1/scripts/glue_jobs/movielens/movielens_glue_transform.py
    upload: transform/preprocess_movies.py to s3://mwaa-dl-demo-us-east-1/scripts/preprocess_movies.py
    upload: transform/preprocess_tags.py to s3://mwaa-dl-demo-us-east-1/scripts/preprocess_tags.py
    upload: transform/preprocess_ratings.py to s3://mwaa-dl-demo-us-east-1/scripts/preprocess_ratings.py
    ...
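If you prefer to drive the deployment from Python (for example, from a setup script), the invocation in step 3 can be wrapped as below. This is only a sketch: the `-s` and `-d` flags come from the command shown above, while the function names, the `setup` directory default, and the bucket-name check are assumptions added for illustration.

```python
import re
import subprocess

# Basic S3 bucket naming check: 3-63 characters, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def build_deploy_command(dag_bucket: str, data_bucket: str) -> list:
    """Build the argument list for ./deploy.sh, rejecting malformed bucket names."""
    for name in (dag_bucket, data_bucket):
        if not _BUCKET_RE.match(name):
            raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return ["./deploy.sh", "-s", dag_bucket, "-d", data_bucket]


def run_deploy(dag_bucket: str, data_bucket: str, setup_dir: str = "setup") -> None:
    """Run deploy.sh from the setup folder and raise if it exits non-zero."""
    subprocess.run(build_deploy_command(dag_bucket, data_bucket), cwd=setup_dir, check=True)
```

For example, `run_deploy("my-mwaa-dag-bucket", "my-demo-data-bucket")` (both bucket names hypothetical) runs the same command as step 3 from the repository root.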
    

Post Installation Checks

  1. Verify the resources created by the CloudFormation template.
  2. Verify that the Amazon MWAA execution role has the additional policy attached.
  3. The deploy script creates a Glue database and two crawlers. If you have AWS Lake Formation enabled, make sure to grant the crawlers the Lake Formation permissions on the database.
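The first check can also be scripted. The sketch below assumes boto3-style clients are passed in (for example `boto3.client("cloudformation")` and `boto3.client("glue")`); the stack and database names are the ones listed under AWS resources in this README, and the helper function names are hypothetical.

```python
# Stack statuses that this sketch treats as a successful deployment.
OK_STATUSES = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def stack_is_deployed(cfn_client, stack_name="mwaa-demo-foundations"):
    """Return True if the stack exists and finished creating or updating.

    With a real boto3 client, describe_stacks raises an error when the
    stack does not exist; this sketch lets that error propagate.
    """
    stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
    return bool(stacks) and stacks[0]["StackStatus"] in OK_STATUSES


def glue_database_exists(glue_client, name="mwaa-movielens-demo-db"):
    """Return True if the Glue database can be fetched by name.

    With a real boto3 client, get_database raises an error when the
    database is missing; this sketch lets that error propagate.
    """
    response = glue_client.get_database(Name=name)
    return response["Database"]["Name"] == name


# Usage (requires boto3 and AWS credentials, not run here):
#   import boto3
#   print(stack_is_deployed(boto3.client("cloudformation")))
#   print(glue_database_exists(boto3.client("glue")))
```

Passing the clients in as arguments keeps the helpers free of a hard boto3 dependency, so they can be exercised with simple stand-in objects.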

AWS resources:

The following stacks are created by the above process:

  1. mwaa-demo-foundations - Contains the foundational resources and services
    • Glue Database - mwaa-movielens-demo-db
    • Glue Crawlers - Crawlers to catalog the data.
    • Lambda Functions - To invoke Glue jobs and check status from Step Functions
    • LambdaRole - Lambda role for Step1 and Step2
    • SSM Parameters - SSM parameters for resources to be used by all services.
    • Step Functions - MovieLens state machine
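To illustrate the role of the two Lambda functions above, here is a minimal sketch of the "start a Glue job, then check its status" pattern that a state machine can loop over. It is not the code deployed by the template: the function names and the returned dictionary shape are assumptions, and a boto3-style Glue client is passed in rather than created inside the function.

```python
# Job run states that this sketch treats as finished (no further polling).
FINISHED_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def start_glue_job(glue_client, job_name):
    """Start a Glue job run and return its run ID."""
    return glue_client.start_job_run(JobName=job_name)["JobRunId"]


def glue_job_status(glue_client, job_name, run_id):
    """Look up a job run and report whether the caller should keep waiting."""
    run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
    state = run["JobRunState"]
    return {"state": state, "done": state in FINISHED_STATES}
```

A state machine built on this pattern would typically call the first function once, then alternate between a Wait state and the second function until `done` is true, branching on `state` to decide between success and failure.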

AWS resources created based on DAG Run:

  1. EMR Cluster

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
