ml-as-a-service-pipeline's Introduction

EDGAR Sentiment Analysis Data Pipeline on Kubernetes Engine

The goal of this project is to build a data pipeline that creates a labeled dataset, train a machine learning model on that dataset, and deploy the model behind a Flask app on a Kubernetes cluster. Everything is managed in the cloud.

Let's Get Started!

This Project has 4 Stages

  1. Annotation Pipeline
    • This is the starting point of the main pipeline.
    • It generates a labeled dataset using the Azure Text Analytics API.
    • The entire dataset is stored in an AWS S3 bucket.
  2. Machine Learning Pipeline
    • This is the second pipeline.
    • The dataset created by the Annotation Pipeline is used to train the model.
    • The trained model is stored in an S3 bucket.
  3. REST Flask App
    • The trained model is served from a Python Flask REST app.
    • The Flask app is tested inside a Docker container.
    • The Docker container is deployed on Google Kubernetes Engine.
  4. Inference Pipeline
    • The Inference Pipeline is an automated sentiment analysis pipeline.
    • It scrapes EDGAR earnings call transcript data and stores it in the cloud.
    • Using the Flask app from Stage 3, it predicts the sentiment of each document.
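
As a rough illustration of what the Annotation Pipeline produces, the sketch below turns scored sentences into labeled rows. It assumes the annotation service yields a numeric sentiment score between 0 and 1 for each sentence. The thresholds, column names, and function names are hypothetical and are not taken from the project code.

```python
import csv
import io


def score_to_label(score, negative_below=0.4, positive_above=0.6):
    """Map a sentiment score in [0, 1] to a coarse label.

    The cut-offs are illustrative assumptions, not the project's values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < negative_below:
        return "negative"
    if score > positive_above:
        return "positive"
    return "neutral"


def build_labeled_csv(scored_sentences):
    """Render (text, score) pairs as CSV text with an added label column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "score", "label"])
    for text, score in scored_sentences:
        writer.writerow([text, score, score_to_label(score)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_labeled_csv([("Revenue grew this quarter", 0.9)]))
```

The resulting CSV text could then be uploaded to the S3 bucket that the Machine Learning Pipeline reads from.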

Getting Started

These instructions will get you a copy of the project up and running in your local environment using cloud infrastructure.

git clone https://github.com/kashishshah881/ml-as-a-service-pipeline

Prerequisites

Python 3.7
AWS Account
GCP Account
Microsoft Azure Account

Installing

Install the required Python packages:

pip3 install -r requirements.txt

Steps For Running on AWS EC2 Cloud

Step 1:
  • Create multiple AWS S3 buckets.
  • Configure an IAM role with full S3 bucket access in your local environment. Learn More Here
  • Create a GCP account. Get Started Here
  • Create an Azure account. Get Started Here
  • Request a Metaflow Sandbox to run your pipeline on AWS Batch.
Step 2:
  • Once everything is set up, configure Metaflow's sandbox: run metaflow configure sandbox on the CLI and enter the API keys from Step 1.
  • Configure the input/output buckets on AWS S3 and enter the bucket names in the Annotation Pipeline, ML Pipeline, Inference Pipeline, and Flask App.
  • Lastly, add the Azure API keys Here
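
One way to avoid editing bucket names and keys in several files is to read them from environment variables in one place. The following is a minimal sketch of that idea, not how the project itself is wired. The variable names, the `PipelineConfig` class, and `load_config` are all hypothetical.

```python
import os
from dataclasses import dataclass

# Hypothetical mapping from config fields to environment variable names.
ENV_NAMES = {
    "input_bucket": "PIPELINE_INPUT_BUCKET",
    "output_bucket": "PIPELINE_OUTPUT_BUCKET",
    "azure_key": "AZURE_TEXT_ANALYTICS_KEY",
    "azure_endpoint": "AZURE_TEXT_ANALYTICS_ENDPOINT",
}


@dataclass(frozen=True)
class PipelineConfig:
    input_bucket: str
    output_bucket: str
    azure_key: str
    azure_endpoint: str


def load_config(env=None):
    """Build a PipelineConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return PipelineConfig(**{field: env[name] for field, name in ENV_NAMES.items()})
```

Each pipeline could call `load_config()` at startup, which also keeps the Azure keys out of version control.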
Step 3:

Run on CLI

  • Change the permission of the files

chmod a+x Annotation\ Pipeline/index.py ML\ Pipeline/index.py Inference\ Pipeline/index.py

  • Running the Annotation Pipeline

./Annotation\ Pipeline/index.py run --with sandbox

  • Running the Machine Learning Pipeline

./ML\ Pipeline/index.py run --with sandbox

  • Building a Docker image of the Flask app and pushing it to Docker Hub

cd REST\ Flask\ App/
docker build -t yourhubusername/reponame .
docker login --username=yourhubusername
docker push yourhubusername/reponame

Step 4:

Once the Dockerized Flask app is in the repository from Step 3, create a Kubernetes cluster on Google Cloud Platform and deploy your Docker image from Docker Hub. Learn More Here
Your Flask app is now up and accessible from anywhere in the world!

Step 5:

Add the required ticker file bucket location in the Inference Pipeline.
Add the bucket location in the Inference Pipeline.
Add the IP address and port number obtained from the GCP Kubernetes cluster in the Inference Pipeline.
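
To illustrate how the Inference Pipeline might call the deployed Flask app once the IP address and port are known, here is a minimal standard-library sketch. The `/predict` path and the `{"data": [...]}` JSON payload are assumptions made for illustration. Check the Flask app's routes for the actual endpoint and request shape.

```python
import json
import urllib.request


def build_predict_request(host, port, text, path="/predict"):
    """Build (but do not send) a JSON POST request for the Flask app.

    `path` and the payload layout are hypothetical placeholders.
    """
    url = f"http://{host}:{port}{path}"
    body = json.dumps({"data": [text]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host, port, text, timeout=10.0):
    """Send the request and decode the JSON response."""
    request = build_predict_request(host, port, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("203.0.113.10", 5000, "Revenue grew this quarter")` would post one sentence to the service, where the address and port stand in for the values reported by the Kubernetes cluster.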

Built With

Authors

  • Kashish Shah - Design, Architecture, and Deployment - LinkedIn
  • Manogana Mantripragada - Machine Learning Engineer - LinkedIn
  • Dhruv Panchal - Research - LinkedIn

License

This project is licensed under the Commons Clause License. See the LICENSE.md file for details.

