
e2e_mlops_aws_cicd

Layout of the SageMaker ModelBuild Project Template

The template provides a starting point for bringing your SageMaker Pipeline development to production.

|-- codebuild-buildspec.yml
|-- CONTRIBUTING.md
|-- pipelines
|   |-- abalone
|   |   |-- evaluate.py
|   |   |-- __init__.py
|   |   |-- pipeline.py
|   |   `-- preprocess.py
|   |-- get_pipeline_definition.py
|   |-- __init__.py
|   |-- run_pipeline.py
|   |-- _utils.py
|   `-- __version__.py
|-- README.md
|-- sagemaker-pipelines-project.ipynb
|-- setup.cfg
|-- setup.py
|-- tests
|   `-- test_pipelines.py
`-- tox.ini

Start here

This is a sample code repository that demonstrates how you can organize your code for an ML business solution. It is generated when you create a Project in SageMaker.

In this example, we are solving the abalone age prediction problem using the abalone dataset. The following sections provide an overview of how the code is organized and what you need to modify. In particular, pipelines/abalone/pipeline.py contains the core of the business logic for this problem: it has the code that expresses the ML steps involved in generating an ML model. You will also find the code that supports the preprocessing and evaluation steps in preprocess.py and evaluate.py respectively.

Once you understand the code structure described below, you can inspect the code and start customizing it for your own business case. This is only sample code, and you own this repository for your business use case. Go ahead, modify the files, commit them, and watch the changes kick off the SageMaker pipeline in the CI/CD system.

You can also use the sagemaker-pipelines-project.ipynb notebook to experiment from SageMaker Studio before you are ready to check in your code.

A description of some of the artifacts is provided below:

Your CodeBuild execution instructions. This file contains the instructions needed to kick off an execution of the SageMaker Pipeline in the CI/CD system (via CodePipeline). You will see that this file has fields defined for naming the Pipeline, ModelPackageGroup, etc. You can customize them as required.

|-- codebuild-buildspec.yml



Your pipeline artifacts. These include a pipeline module defining the required get_pipeline method that returns an instance of a SageMaker pipeline, a preprocessing script used for feature engineering, and a model evaluation script that measures the Mean Squared Error of the model trained by the pipeline. This is the core business logic; if you want to create your own folder, you can do so and implement the get_pipeline interface as illustrated by pipelines/abalone/pipeline.py and the sketch after the listing below.

|-- pipelines
|   |-- abalone
|   |   |-- evaluate.py
|   |   |-- __init__.py
|   |   |-- pipeline.py
|   |   `-- preprocess.py
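
For reference, here is a minimal sketch of the get_pipeline interface a custom pipeline module is expected to expose. The parameter names and defaults below are illustrative assumptions; check pipelines/abalone/pipeline.py for the exact signature used in this template.

# Sketch of the get_pipeline interface expected from a pipeline module.
# Parameter names are illustrative; align them with pipelines/abalone/pipeline.py.
import sagemaker
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline


def get_pipeline(region, role=None, default_bucket=None,
                 pipeline_name="MyPipeline",
                 model_package_group_name="MyModelPackageGroup"):
    """Return a SageMaker Pipeline instance wired to the given role and bucket."""
    session = sagemaker.session.Session()
    if role is None:
        role = sagemaker.session.get_execution_role(session)
    if default_bucket is None:
        default_bucket = session.default_bucket()

    # Pipeline parameters that can be overridden at execution time.
    input_data = ParameterString(
        name="InputDataUrl",
        default_value=f"s3://{default_bucket}/data/raw.csv",
    )

    # ... define processing, training, evaluation and model registration steps here
    # (the registration step would use model_package_group_name) ...
    steps = []

    return Pipeline(
        name=pipeline_name,
        parameters=[input_data],
        steps=steps,
        sagemaker_session=session,
    )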



Utility modules for getting pipeline definition JSONs and running pipelines (you typically do not need to modify these; a sketch of the flow follows the listing):

|-- pipelines
|   |-- get_pipeline_definition.py
|   |-- __init__.py
|   |-- run_pipeline.py
|   |-- _utils.py
|   `-- __version__.py
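
Roughly, these utilities turn a pipeline module into a running execution: get_pipeline_definition.py prints the pipeline's JSON definition, and run_pipeline.py builds the pipeline, upserts it and starts an execution. A simplified sketch of that flow follows (the real utilities are driven by CLI arguments; the function and argument names below are illustrative).

# Simplified view of what run_pipeline.py does: import the chosen pipeline
# module, build the pipeline, upsert its definition and start an execution.
from importlib import import_module


def run(module_name="pipelines.abalone.pipeline", role_arn=None, tags=None):
    module = import_module(module_name)
    pipeline = module.get_pipeline(region="us-east-1", role=role_arn)

    # Create or update the pipeline definition in SageMaker, then start a run.
    pipeline.upsert(role_arn=role_arn, tags=tags)
    execution = pipeline.start()
    print(f"Started pipeline execution: {execution.arn}")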



Python package artifacts:

|-- setup.cfg
|-- setup.py



A stubbed testing module for testing your pipeline as you develop (an example test follows the listing):

|-- tests
|   `-- test_pipelines.py
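
A starting point for such a test might simply check that the pipeline definition can be built and serialized, assuming the get_pipeline signature sketched above. Building the pipeline still requires valid AWS credentials and a default region; the role ARN and bucket below are placeholders.

# Example unit test: building the pipeline and parsing its definition should not raise.
import json

from pipelines.abalone.pipeline import get_pipeline


def test_pipeline_definition_contains_steps():
    pipeline = get_pipeline(
        region="us-east-1",
        role="arn:aws:iam::111111111111:role/dummy-role",  # placeholder
        default_bucket="my-test-bucket",                   # placeholder
    )
    definition = json.loads(pipeline.definition())
    assert "Steps" in definition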



The tox testing framework configuration:

`-- tox.ini

AWS Tools Used

  • CodePipeline — automates continuous delivery pipelines
  • CodeBuild — continuous integration

To Do

Notes

In AWS CodeBuild, the pre_build phase runs commands before the build starts, and the build phase runs the commands that build your application's source code. Unit tests can go in either phase depending on your use case: put them in pre_build to run them before the source is built, or in build to run them afterwards. See https://docs.aws.amazon.com/codebuild/latest/userguide/test-report-pytest.html for how pytest test reports are configured; the buildspec below follows that structure.

version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.8
    commands:
      - pip install pytest
  pre_build:
    commands:
      - echo "No pre-build steps"  # placeholder; unit tests could run here instead of in build
  build:
    commands:
      - python -m pytest --junitxml=<test report directory>/<report filename>
      - python pipeline.py  # kick off the SageMaker pipeline

reports:
  pytest_reports:
    files:
      - <report filename>
    base-directory: <test report directory>
    file-format: JUNITXML

Log

Updates on progress

  • 4 Jun 2023 — pipeline works in SageMaker for the HDB dataset through the notebook. The entire build works up until deployment. Had to use the built-in XGBoost algorithm instead of framework (script) mode, so xgboost_train.py is not used for now.
  • 1 Jun 2023 — modified preprocess, xgboost_train, evaluate, and pipeline for SageMaker (have not tested whether it works yet).
  • 27 May 2023 — created preprocessing.py and xgboost_train.py with arguments (working locally).

AWS Steps

  1. Set up AWS Environment
  2. Set up public VPC
  3. Set up Amazon SageMaker Studio Domain
  4. Set up a SageMaker Studio notebook and parameterize the pipeline (a minimal sketch follows this list)
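
As referenced in step 4, here is a minimal sketch of what the notebook does to parameterize and start the pipeline from Studio. The region, role handling and parameter name below are assumptions; check the pipeline module for its actual parameters.

# Run the pipeline from a SageMaker Studio notebook with overridden parameters.
# Parameter names and values here are placeholders.
import sagemaker

from pipelines.abalone.pipeline import get_pipeline

role = sagemaker.get_execution_role()
default_bucket = sagemaker.Session().default_bucket()

pipeline = get_pipeline(region="ap-southeast-1", role=role, default_bucket=default_bucket)

pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start(parameters={"TrainingInstanceType": "ml.m5.xlarge"})
execution.wait()  # block until the execution finishes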

A few IAM permission changes are needed along the way:

  1. Go to IAM, under Roles, edit AmazonSageMakerServiceCatalogProductsLaunchRole by adding "sagemaker:DescribeCodeRepository", "sagemaker:AddTags", and "sagemaker:CreateCodeRepository" to it.

  2. Follow the SageMaker MLOps Project Walkthrough Using Third-party Git Repos, BUT before creating the project, do the following:

     a. Edit AmazonSageMakerServiceCatalogProductsUseRole by adding these statements to its permissions policy:
        {
            "Effect": "Allow",
            "Action": [
                "codestar-connections:UseConnection"
            ],
            "Resource": "arn:aws:codestar-connections:*:*:connection/*",
            "Condition": {
                "StringEqualsIgnoreCase": {
                    "aws:ResourceTag/sagemaker": "true"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::sagemaker-*"
            ]
        }
    
     b. Go to IAM, under Roles, and add the required permissions policies to AmazonSageMakerServiceCatalogProductsLaunchRole.

  3. After uploading data to S3, find AmazonSageMakerServiceCatalogProductsUseRole in IAM and add the policy below to that role (see aws/amazon-sagemaker-examples#1923):

        {
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>/*"
            ]
        }
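
For reference, the upload mentioned above can be done with the SageMaker Python SDK; the local path, bucket name and key prefix below are placeholders.

# Upload the raw dataset to the project bucket so the pipeline's processing step can read it.
import sagemaker

session = sagemaker.Session()
input_data_uri = session.upload_data(
    path="data/raw.csv",        # local file to upload (placeholder)
    bucket="<BUCKET_NAME>",     # same bucket referenced in the policy above
    key_prefix="datasets/hdb",  # placeholder prefix
)
print(input_data_uri)  # s3://<BUCKET_NAME>/datasets/hdb/raw.csv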

Useful Resources
