agogosml's Introduction

Agogosml

Agogosml is a data processing pipeline project that addresses the common need for operationalizing ML models. The project enables you to deploy models in production at scale and aspires to provide scoring and monitoring of models on the same infrastructure (coming soon).

Features

  • Re-usable/canonical data processing pipeline supporting multiple data streaming technologies (Kafka and Azure EventHub) and deployment to Kubernetes.
  • CI/CD pipeline using Azure DevOps to deploy versioned and immutable pipeline.
  • Blue/Green deployments, automatic rollbacks or redeployment of a specific version.

Quick Install & Run

The following quick install instructions assume you have the azure-cli, Python 3.7 (with C compiler tools), Docker and Terraform installed.

# 1. Installing the CLI
pip install agogosml_cli

# 2. Create a directory for your project
mkdir hello-agogosml
cd hello-agogosml

# 3. Init the project
agogosml init

# 4. Fill in the manifest.json (Docker Container Registry, Azure Subscription, etc).
vi manifest.json

# 5. Generate the code for the projects
agogosml generate

The generated folder structure consists of the input reader, customer app and output writer as well as the Azure DevOps pipelines for CI/CD.

For more detailed information, see the User Guide

Architecture

The agogosml package was developed to provide a Data Engineer with a simple configurable data pipeline consisting of three components: an input reader, app (that holds a trained ML model) and an output writer. The three components are instrumented using one Docker container per component.

Input Reader

The input reader acts as the data receiver and obtains the data required as input for the ML model. The package supports both Kafka and EventHub.

Output Writer

The output writer receives the scored data from the app and sends it on to a streaming client (a Kafka or EventHub instance).

App

The app receives data from the input reader and feeds it to the ML model for scoring. Once scored, the data is sent on to the output writer.
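The flow through the three components can be sketched in a few lines of Python. This is only an illustration of the data flow, not the project's implementation: the in-memory queues stand in for the streaming client and the links between containers, and the names `input_reader`, `app`, `output_writer`, `run_pipeline` and `toy_score` are hypothetical.

```python
from queue import Queue
from typing import Callable, Dict, Iterable, List


def input_reader(messages: Iterable[Dict], to_app: Queue) -> None:
    """Stand-in for the input reader: forward each received message to the app."""
    for message in messages:
        to_app.put(message)


def app(to_app: Queue, to_writer: Queue, score: Callable[[Dict], Dict]) -> None:
    """Stand-in for the app: score each message and pass the result on."""
    while not to_app.empty():
        to_writer.put(score(to_app.get()))


def output_writer(to_writer: Queue) -> List[Dict]:
    """Stand-in for the output writer: drain scored messages into a sink (a list here)."""
    sink = []
    while not to_writer.empty():
        sink.append(to_writer.get())
    return sink


def run_pipeline(messages: Iterable[Dict], score: Callable[[Dict], Dict]) -> List[Dict]:
    """Wire the three stages together in order: reader, app, writer."""
    to_app, to_writer = Queue(), Queue()
    input_reader(messages, to_app)
    app(to_app, to_writer, score)
    return output_writer(to_writer)


def toy_score(message: Dict) -> Dict:
    # Hypothetical "model": flags values above an arbitrary threshold.
    return {**message, "score": 1 if message["intValue"] > 10 else 0}
```

Calling `run_pipeline([{"key": "a", "intValue": 20}], toy_score)` returns the input message with a `score` field added, which mirrors the reader, app and writer hand-off described above.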

For more information about the design, see the Design Documentation

Links

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

agogosml's People

Contributors

c-w, cicorias, cloudbeatsch, danmass, devlace, eladiw, ericschles, fnocera, itye-msft, margaretmeehan, martinpeck, microsoftopensource, msftgits, nzigel, rbinrais, sayar, torosent

agogosml's Issues

Address modules not covered in tests when code changes

Some code modules are currently excluded from coverage measurement in order to meet the 70% target. The excluded lines are marked with # pragma: no cover

As these modules change, the goal is to remove the pragma statements from the changed code and add test coverage where it is meaningful.
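For readers unfamiliar with the marker, here is a minimal sketch of how it is typically used with coverage.py, which by default excludes any line (or, on a `def` or `if` line, the whole block) carrying this comment. The functions `parse_port` and `main` are hypothetical and not taken from the agogosml code base.

```python
def parse_port(value: str) -> int:
    """Parse a port number; this function is expected to be covered by unit tests."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def main() -> None:  # pragma: no cover
    # Excluded from coverage: a hypothetical entry point that is awkward to unit test.
    print(parse_port("8080"))
```

Removing the pragma from `main` would make its lines count toward the coverage total again, which is the clean-up this issue asks for as modules are touched.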

agogosml should use a single container per model

The three containers add to overall runtime complexity and increase build times while making it harder to maintain the semantics of the queue/stream to which the containers are connected. Much can go wrong in the two extra http hops. Given how we create agogosml apps/containers from templates, we could easily create a single-container template.

agogosml is currently a more brittle (at runtime), less capable model hosting environment than Kafka Streams ( https://kafka.apache.org/21/documentation/streams/core-concepts ). We should simplify agogosml with the idea of moving over to Kafka Streams when the Kafka head for Azure Event Hub supports it.

tox failures during `make test-all`

During a run of make test-all, the tests fail. See pyenv/pyenv-virtualenv#206 and tox-dev/tox-pyenv#4

flake8 runtests: commands[0] | flake8 cli
__________________________________________________________________________________________________________ summary __________________________________________________________________________________________________________
  py35: commands succeeded
ERROR:   py36: Error creating virtualenv. Note that some special characters (e.g. ':' and unicode symbols) in paths are not supported by virtualenv. Error details: InvocationError("Failed to get version_info for python3.6: pyenv: python3.6: command not found\n\nThe `python3.6' command exists in these Python versions:\n  3.6.8\n  3.6.8/envs/py36\n  3.6.8/envs/py368\n  py36\n  py368\n\n", None)
  py37: commands succeeded
  flake8: commands succeeded
(py35) cicorias@cicoria-msi:/c/g/cse/tiem/agogosml/agogosml_cli$

more verbose

GLOB finish: packaging after 0.24 seconds
copying new sdistfile to '/root/.tox/distshare/example-pkg-your-username-0.0.1.zip'
package .tmp/package/1/example-pkg-your-username-0.0.1.zip links to dist/example-pkg-your-username-0.0.1.zip (/app/.tox)
py35 start: getenv /app/.tox/py35
py35 cannot reuse: no previous config /app/.tox/py35/.tox-config1
py35 create: /app/.tox/py35
ERROR: Error creating virtualenv. Note that some special characters (e.g. ':' and unicode symbols) in paths are not supported by virtualenv. Error details: InvocationError("Failed to get version_info for python3.5: pyenv: python3.5: command not found\n\nThe `python3.5' command exists in these Python versions:\n  3.5.6\n\n", None)
py35 finish: getenv after 0.05 seconds
py36 start: getenv /app/.tox/py36
py36 cannot reuse: no previous config /app/.tox/py36/.tox-config1
py36 create: /app/.tox/py36
ERROR: Error creating virtualenv. Note that some special characters (e.g. ':' and unicode symbols) in paths are not supported by virtualenv. Error details: InvocationError("Failed to get version_info for python3.6: pyenv: python3.6: command not found\n\nThe `python3.6' command exists in these Python versions:\n  3.6.8\n\n", None)
py36 finish: getenv after 0.05 seconds
___________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________
ERROR:   py35: Error creating virtualenv. Note that some special characters (e.g. ':' and unicode symbols) in paths are not supported by virtualenv. Error details: InvocationError("Failed to get version_info for python3.5: pyenv: python3.5: command not found\n\nThe `python3.5' command exists in these Python versions:\n  3.5.6\n\n", None)
ERROR:   py36: Error creating virtualenv. Note that some special characters (e.g. ':' and unicode symbols) in paths are not supported by virtualenv. Error details: InvocationError("Failed to get version_info for python3.6: pyenv: python3.6: command not found\n\nThe `python3.6' command exists in these Python versions:\n  3.6.8\n\n", None)
cleanup /app/.tox/.tmp/package/1/example-pkg-your-username-0.0.1.zip

Forward slash missing in the input reader and output writer dockerfiles, causing the builds to fail

Issue Template

  • Does your issue follow our Code of Conduct?

  • Ensure you are up-to-date with master.

  • Search existing issues.

  • Part of the project with issues: (ie. Agogosml, Deployment, CLI)

  • Indicate OS, version of Python (Note: Only Python 3.7 is currently supported.)

    • Include versions of tooling where appropriate (Terraform, Kubernetes, Docker, etc.)
  • Steps of reproduce issue. Provide a link to a Github Gist or paste here if short.
    To reproduce, use the generate command to generate the template and then look at the input reader and output writer Dockerfiles. Both of them will have a path as follows:

FROM ${CONTAINER_REG}agogosml:${AGOGOSML_TAG} as builder

There needs to be a "/" after ${CONTAINER_REG}. Because the path is incorrect, the Azure DevOps build fails with the following message:

pull access denied for *********.azurecr.ioagogosml, repository does not exist or may require 'docker login'

After adding "/" build is successful

I will submit PR for this as well.
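The failure comes from concatenating the registry and the image name without a separator. The sketch below shows the same joining logic in Python so the effect is easy to see. It is not the Dockerfile fix itself, and `image_reference` is a hypothetical helper that does not exist in agogosml.

```python
def image_reference(registry: str, name: str, tag: str) -> str:
    """Join registry, image name and tag with exactly one '/' after the registry."""
    if registry:
        # rstrip avoids a double slash when the registry already ends with '/'.
        return f"{registry.rstrip('/')}/{name}:{tag}"
    # No registry given: fall back to a plain image name.
    return f"{name}:{tag}"
```

With a registry of `example.azurecr.io` (a placeholder), this yields `example.azurecr.io/agogosml:<tag>` rather than the broken `example.azurecr.ioagogosml` form seen in the error message.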

.env file not generated after agogosml generate is invoked

Issue Template

  • Does your issue follow our Code of Conduct?

  • Ensure you are up-to-date with master.

  • Search existing issues.

  • Part of the project with issues: (ie. Agogosml, Deployment, CLI)

  • Agogosml CLI
  • Indicate OS, version of Python (Note: Only Python 3.7 is currently supported.)

    • Include versions of tooling where appropriate (Terraform, Kubernetes, Docker, etc.)
      I have python 3.7.1 on Mac OS
  • Steps of reproduce issue. Provide a link to a Github Gist or paste here if short.

Signal Interrupts are not handled consistently between various implementation of InputReaders and OutputWriters

Issue Template

  • Does your issue follow our Code of Conduct?

  • Ensure you are up-to-date with master.

  • Search existing issues.

  • Part of the project with issues: (ie. Agogosml, Deployment, CLI): Agogosml Library

  • Indicate OS, version of Python (Note: Only Python 3.7 is currently supported.) All.

    • Include versions of tooling where appropriate (Terraform, Kubernetes, Docker, etc.)
  • Steps of reproduce issue. Provide a link to a Github Gist or paste here if short.

Can we leverage connexion in simple app's main.py?

In the simple app main.py template, a very low level construct is used to serve http requests. The http server validates the incoming requests against a JSON schema, transforms the data and then creates a response.

This is a common pattern in the web world: we want to ensure that some request corresponds to our desired schema, call a method on the request body and respond with some data. A great abstraction for this workflow is connexion which is a framework that lets you create http servers based on OpenAPI/Swagger specifications. The framework takes care of validating that the request matches the Swagger schema, routing the request to a particular Python function and serializing the response. As such, it seems like a great match for our use-case.

For example, to translate the simple app into connexion, the following would be all that's required:

File api.spec.yaml

swagger: '2.0'

info:
  title: My simple API
  version: '0.1'

basePath: '/'

paths:
  '/':
    post:
      summary: Transform the data.
      operationId: datahelper.transform
      consumes:
        - application/json
      parameters:
        - $ref: '#/parameters/Data'
      responses:
        200:
          description: The transformed data.

parameters:
  Data:
    name: data
    description: The data to transform.
    in: body
    schema:
      $ref: '#/definitions/Data'
    required: true


definitions:
  Data:
    properties:
      key:
        description: The key of the input.
        type: string
      intValue:
        description: The value of the input.
        type: number
    required:
      - key
      - intValue

File main.py

from connexion import App

app = App(__name__)
app.add_api('api.spec.yaml')
app.run(port=8080, host='0.0.0.0')

The business logic method datahelper.transform doesn't even have to know that it's dealing with JSON inputs anymore! It simply gets a deserialized object passed into it that has the keys/values specified in the Swagger spec. Another advantage of connexion is that we can choose from a wide range of server backends like flask, tornado, aiohttp, etc., so it's trivial to get good performance out of the app with just a configuration change.
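A minimal sketch of what datahelper.py could then look like, assuming connexion passes the validated request body as a keyword argument named after the body parameter (`data` in the spec above). The multiplication is a placeholder for illustration and not the simple app's actual transformation.

```python
from typing import Any, Dict


def transform(data: Dict[str, Any]) -> Dict[str, Any]:
    """Receive the already validated, deserialized request body and build a response.

    No JSON parsing or schema checking happens here; under the assumption above,
    the framework has done both before this function is called.
    """
    # Hypothetical business logic: scale the numeric value and echo the key.
    return {"key": data["key"], "intValue": data["intValue"] * 100}
```

Because the function only deals with plain Python objects, it can be unit tested directly, for example `transform({"key": "a", "intValue": 2})`, without starting an http server.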

Last but not least, we can abstract the Swagger spec and use templating so that the user only has to provide the type definitions for the data and never edit the entire Swagger spec file, e.g.:

File input.spec.yaml

Data:
  properties:
    key:
      description: The key of the input.
      type: string
    intValue:
      description: The value of the input.
      type: number
  required:
    - key
    - intValue

There's an example of how to bind this fragment into the Swagger spec here.
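One possible way to do this binding, sketched with plain dictionaries so that it runs without extra dependencies. In practice both the base spec and the fragment would be loaded from their YAML files with a YAML parser first. `BASE_SPEC` and `bind_definitions` are hypothetical names, and the base spec is abbreviated.

```python
import copy
from typing import Any, Dict

# Abbreviated base spec owned by the template; the user never edits this part.
BASE_SPEC: Dict[str, Any] = {
    "swagger": "2.0",
    "info": {"title": "My simple API", "version": "0.1"},
    "basePath": "/",
    "paths": {},
}


def bind_definitions(base_spec: Dict[str, Any], fragment: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the base spec with the user's type definitions merged in."""
    spec = copy.deepcopy(base_spec)
    spec.setdefault("definitions", {}).update(copy.deepcopy(fragment))
    return spec


# The content of input.spec.yaml from above, expressed as a dictionary.
USER_FRAGMENT = {
    "Data": {
        "properties": {
            "key": {"description": "The key of the input.", "type": "string"},
            "intValue": {"description": "The value of the input.", "type": "number"},
        },
        "required": ["key", "intValue"],
    }
}

full_spec = bind_definitions(BASE_SPEC, USER_FRAGMENT)
```

The deep copies keep the template's base spec untouched, so the same base can be reused for several apps with different input types.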

Another benefit of this approach is that we'll get self-documenting APIs since a Swagger spec will be auto-loaded to ${basePath}/swagger.json

Support for pyinstaller

pyyaml inconsistent version in Pip lock files and security warning

The version of pyyaml is referenced differently in the following files:

./agogosml/Pipfile
./agogosml/Pipfile.lock  
./agogosml/setup.py
./agogosml_cli/Pipfile.lock
[packages]

pyyaml = ">=4.2b1"
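A small check along the following lines could catch this kind of drift between lock files. It is a sketch that relies on Pipfile.lock being JSON with `default` and `develop` sections. The helper names are hypothetical, and the version strings in the usage example are invented for illustration; they are not the repository's actual pins.

```python
import json
from typing import Dict, Optional


def locked_version(lock_text: str, package: str) -> Optional[str]:
    """Return the pinned specifier for `package` from Pipfile.lock JSON text, if any."""
    lock = json.loads(lock_text)
    for section in ("default", "develop"):
        entry = lock.get(section, {}).get(package)
        if entry and "version" in entry:
            return entry["version"]
    return None


def versions_by_file(lock_texts: Dict[str, str], package: str) -> Dict[str, Optional[str]]:
    """Map each lock file path to the version it pins for `package`."""
    return {path: locked_version(text, package) for path, text in lock_texts.items()}


def is_consistent(lock_texts: Dict[str, str], package: str) -> bool:
    """True when every lock file pins the same version (or none pins it at all)."""
    return len(set(versions_by_file(lock_texts, package).values())) <= 1


# Illustrative lock contents only; real files would be read from disk.
EXAMPLE_LOCKS = {
    "./agogosml/Pipfile.lock": '{"default": {"pyyaml": {"version": "==1.0"}}}',
    "./agogosml_cli/Pipfile.lock": '{"default": {"pyyaml": {"version": "==2.0"}}}',
}
```

With the example data, `is_consistent(EXAMPLE_LOCKS, "pyyaml")` returns `False`, which is the situation this issue describes.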

Agogosml CLI with different versions of Python

The CI testing needs refinement and improvement.

Acceptance:

  • the output of CLI "build" is 3 different python "packages" usable under 3.5, 3.6, 3.7 - these would be 3 different docker images tagged in some manner to represent the Python version and 3 different dist/ tar.gz from the make dist command as pipeline artifacts
TAG
latest
python3.5-1.0.1
python3.6-1.0.1
python3.7-1.0.1
  • all tests pass under the 3 selected python versions
  • needs to handle docker file for "each" Python version
  • Push to a public Container Registry either Docker Hub or a Microsoft managed ACR with tags that are unique per PY version and a latest as 3.7 version
  • Should be driven by the ARG passed in - see Clemens hint with Travis example
  • each build via the ./dist should be an artifact on the build
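The tagging scheme in the acceptance criteria can be sketched as a small helper. `image_tags` is a hypothetical function, not part of the existing build scripts; it only shows how a tag unique to each Python version plus a `latest` alias for 3.7 could be derived from the release version.

```python
from typing import Dict, List, Sequence


def image_tags(release: str, python_versions: Sequence[str], latest: str) -> Dict[str, List[str]]:
    """Map each Python version to the Docker tags its image should receive."""
    tags: Dict[str, List[str]] = {}
    for py in python_versions:
        # One tag that is unique per Python version and release, e.g. python3.6-1.0.1.
        tags[py] = [f"python{py}-{release}"]
        if py == latest:
            # The image for the designated Python version is also published as "latest".
            tags[py].append("latest")
    return tags
```

For release `1.0.1` and Python versions 3.5, 3.6 and 3.7 with 3.7 as the latest, this produces the four tags listed in the table above.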
