
mlops-e2e's Introduction

MLOps End-to-End Example using Amazon SageMaker Pipeline, AWS CodePipeline and AWS CDK

This project uses a sample machine learning workload to show how to implement MLOps (CI/CD for machine learning) using Amazon SageMaker, AWS CodePipeline, and AWS CDK.

Pre-requisite

Configuration

Source Repo

Option 1: Use GitHub Repo

  1. Fork this repo in your GitHub account
  2. Create a GitHub connection using the CodePipeline console to give CodePipeline access to your GitHub repositories (see the section Create a connection to GitHub (CLI))
  3. Update the GitHub related configuration in the ./configuration/projectConfig.json file
    • Set the value of repoType to git
    • Update the value of githubConnectionArn, githubRepoOwner and githubRepoName
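The configuration change in step 3 can also be scripted. The following is a minimal Python sketch, assuming ./configuration/projectConfig.json is a flat JSON object that holds the four keys named above. The function names are hypothetical and are not part of this repo, and the ARN, owner, and repo name must be supplied by you.

```python
import json
from pathlib import Path


def apply_github_config(config: dict, connection_arn: str, owner: str, repo: str) -> dict:
    """Return a copy of the project config that points at a GitHub repo."""
    updated = dict(config)  # shallow copy so the caller's dict is left untouched
    updated["repoType"] = "git"
    updated["githubConnectionArn"] = connection_arn
    updated["githubRepoOwner"] = owner
    updated["githubRepoName"] = repo
    return updated


def update_config_file(path: str, connection_arn: str, owner: str, repo: str) -> None:
    """Read the JSON config file, apply the GitHub settings, and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    updated = apply_github_config(config, connection_arn, owner, repo)
    config_path.write_text(json.dumps(updated, indent=2) + "\n")
```

Calling `update_config_file("./configuration/projectConfig.json", arn, owner, repo)` with your own values rewrites the file in place; any other keys already in the file are preserved.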

Option 2: Create a CodeCommit Repo in your AWS account

Alternatively, the CDK Infrastructure code can provision a CodeCommit Repo as Source Repo for you.

To switch to this option, set the value of repoType to codecommit in the ./configuration/projectConfig.json file.
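Before bootstrapping, it can help to check that the configuration matches one of the two options above. The sketch below is a hypothetical pre-flight check, not something this repo ships. It assumes that git and codecommit are the only accepted values of repoType and that the three GitHub keys are required only for the git option.

```python
from typing import List

# Keys the GitHub option asks you to fill in (taken from Option 1 above).
REQUIRED_GITHUB_KEYS = ("githubConnectionArn", "githubRepoOwner", "githubRepoName")


def validate_repo_config(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config looks consistent."""
    problems = []
    repo_type = config.get("repoType")
    if repo_type not in ("git", "codecommit"):
        problems.append(f"repoType must be 'git' or 'codecommit', got {repo_type!r}")
    if repo_type == "git":
        # Each GitHub setting must be present and non-empty.
        for key in REQUIRED_GITHUB_KEYS:
            if not config.get(key):
                problems.append(f"{key} must be set when repoType is 'git'")
    return problems
```

Returning a list of problems rather than raising on the first one lets you report every missing setting in a single pass.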

Usage

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the AWS Pricing page for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

Bootstrap

Run the command below to provision all the required infrastructure.

bootstrap.sh

The command can be run repeatedly to deploy any changes in this folder.

Source Code

If repoType is codecommit, after the CloudFormation stack is created, follow this page to connect to the CodeCommit Repo and push the content of this folder to the main branch of the repo.

Note: The default branch may not be main, depending on your Git settings.

Testing Data Set

Download a copy of the testing data set from https://archive.ics.uci.edu/ml/datasets/abalone, and upload it to the Data Source S3 Bucket (the bucket name starts with mlopsinfrastracturestack-datasourcedatabucket...) under your preferred folder path, e.g. yyyy/mm/dd/abalone.csv.
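The upload can be scripted as well. The sketch below assumes boto3 is installed and AWS credentials are configured, and that exactly one bucket in the account starts with the prefix quoted above. The helper names are hypothetical, and the dated folder layout simply follows the yyyy/mm/dd example.

```python
from datetime import date
from typing import Iterable


def dated_key(filename: str, day: date) -> str:
    """Build an object key of the form yyyy/mm/dd/<filename>."""
    return f"{day:%Y/%m/%d}/{filename}"


def find_bucket(bucket_names: Iterable[str], prefix: str) -> str:
    """Return the single bucket name that starts with prefix, or raise LookupError."""
    matches = [name for name in bucket_names if name.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"expected one bucket starting with {prefix!r}, found {len(matches)}")
    return matches[0]


def upload_dataset(local_path: str, day: date,
                   prefix: str = "mlopsinfrastracturestack-datasourcedatabucket") -> str:
    """Upload the local CSV to the Data Source bucket and return its S3 URI."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    s3 = boto3.client("s3")
    names = [bucket["Name"] for bucket in s3.list_buckets()["Buckets"]]
    bucket = find_bucket(names, prefix)
    key = dated_key("abalone.csv", day)
    s3.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_dataset("abalone.csv", date.today())` places the file under today's date in the Data Source bucket.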

Cleanup

To clean up all the infrastructure, run the command below:

cleanup.sh

Sample Machine Learning Project

The project is created based on the SageMaker Project Template - MLOps template for model building, training and deployment.

In this example, we are solving the abalone age prediction problem using a sample dataset. The dataset used is the UCI Machine Learning Abalone Dataset. The aim for this task is to determine the age of an abalone (a kind of shellfish) from its physical measurements. At the core, it's a regression problem.
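To make the regression framing concrete, the sketch below computes a mean-predictor baseline in plain Python. It is not the model this project trains; it only shows the kind of error metric (RMSE) a trained regressor would need to beat. It assumes the CSV has no header row and stores the ring count in the last column, as in the UCI distribution of the dataset, where the ring count plus 1.5 gives the age in years. The function names are hypothetical.

```python
import csv
import io
import math
from typing import List


def read_rings(csv_text: str) -> List[int]:
    """Extract the ring count, assumed to be the last column of each non-empty row."""
    return [int(row[-1]) for row in csv.reader(io.StringIO(csv_text)) if row]


def rings_to_age(rings: int) -> float:
    """Convert a ring count to an age in years (rings + 1.5, per the UCI dataset description)."""
    return rings + 1.5


def baseline_rmse(targets: List[float]) -> float:
    """RMSE obtained by always predicting the mean of the targets."""
    mean = sum(targets) / len(targets)
    return math.sqrt(sum((t - mean) ** 2 for t in targets) / len(targets))
```

Because age is the ring count shifted by a constant, the baseline RMSE is the same whether it is computed on rings or on ages.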

Project Layout

  • buildspecs: Build specification files used by CodeBuild projects
  • configuration: Project and Pipeline configuration
  • docs: Images used in the documentation
  • infrastructure: AWS CDK app for provisioning the end-to-end MLOps infrastructure
  • ml_pipeline: The SageMaker pipeline definition expressing the ML steps involved in generating an ML model and helper scripts
  • model_deploy: AWS CDK app for deploying the model on SageMaker endpoint
  • scripts: Bash scripts used in the CI/CD pipeline
  • src: Machine learning code for preprocessing and evaluating the ML model
  • tests: Unit testing code for testing machine learning code

Overall Architecture

The overall architecture of the sample project is shown below:

Overall Architecture

License

This project is licensed under the MIT License.

Contributing

Refer to CONTRIBUTING for more details on how to contribute to this project.
