aabouzaid / modern-data-platform-poc

My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

Home Page: https://dx.doi.org/10.13140/RG.2.2.15360.71689

Languages: Jupyter Notebook 98.98%, YAML 1.02%
Topics: cloud-agnostic, cloud-native, data-lakehouse, data-platform, dataops, edinburgh-napier, kubernetes, msc, msc-project, data-engineering

Modern Data Platform PoC

A proof of concept for the core of a Modern Data Platform that uses DataOps, Kubernetes, and the Cloud-Native ecosystem to build a resilient Big Data platform. The platform is based on the Data Lakehouse architecture, which serves as the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

Note

This project is part of my Master of Science in Data Engineering at Edinburgh Napier University (April 2023).

Architecture

Core Components

The core components of the platform are:

  • Infrastructure (Kubernetes)
  • Data Ingestion (Argo Workflows + Python)
  • Data Storage (MinIO)
  • Data Processing (Dremio)

Initial Model

To visualise the interactions of the current implementation, the C4 software architecture model (Context, Containers, Components, and Code) is used.

The following is a simplified view of the initial architecture model, with all the abstraction levels combined into one diagram.

[Figure: Modern Data Platform Initial Architecture Model]

Deployment

Prerequisites: asdf, a Linux operating system, and Docker Engine (tested with asdf 0.11.1, Ubuntu 20.04.5 LTS, and Docker Engine Community 23.0.1).

The following tools are used for development:

  • Helm
  • KinD
  • Kubectl
  • Kustomize

They can be installed with their corresponding versions via asdf:

asdf install
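Before moving on, it can help to confirm that the tools listed above are actually on the PATH. The following is a minimal sketch; the `check_tools` helper is hypothetical and not part of the repository.

```shell
# Report which of the given tools are missing from PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For this project it would be called as `check_tools helm kind kubectl kustomize`.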

Create the local Kubernetes cluster:

kind create cluster \
  --config clusters/local/kind-cluster-config.yaml

Deploy the applications to the Kubernetes cluster:

kustomize build --enable-helm clusters/local | kubectl apply -f -
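To see what the command above is about to apply, the rendered output can be summarised by resource kind first. This is only a sketch: the `summarise_manifests` helper is hypothetical, and it assumes each rendered resource has a top-level `kind:` line, as Kustomize output normally does.

```shell
# Render the local cluster manifests and count the resources per kind,
# without sending anything to the Kubernetes API.
summarise_manifests() {
  kustomize build --enable-helm clusters/local \
    | grep '^kind:' | sort | uniq -c
}
```

Running `summarise_manifests` from the repository root prints one line per kind with its count.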

Wait for deployments to be ready:

# Ingress-Nginx.
kubectl rollout status deployment \
  --watch --namespace ingress-nginx ingress-nginx-controller

# MinIO.
kubectl rollout status deployment \
  --watch --namespace minio minio

# Argo Workflows.
kubectl rollout status deployment \
  --watch --namespace argo-workflows argo-workflows-server

# Dremio.
kubectl rollout status statefulset \
  --watch --namespace dremio dremio-master
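The four waits above can also be driven from one loop with a timeout, so that a stuck rollout fails instead of blocking indefinitely. This is a sketch under assumptions: the `wait_for_platform` helper and the `TIMEOUT` default of 300s are illustrative and not taken from the repository.

```shell
# Wait for each workload with a timeout instead of blocking indefinitely.
# Entries are "namespace kind/name", taken from the commands above.
# TIMEOUT is an assumed value; raise it on slower machines.
TIMEOUT="${TIMEOUT:-300s}"

wait_for_platform() {
  local namespace resource
  while read -r namespace resource; do
    echo "Waiting for $resource in namespace $namespace"
    # Stop at the first rollout that fails or times out.
    kubectl rollout status "$resource" \
      --namespace "$namespace" --timeout "$TIMEOUT" </dev/null || return 1
  done <<'EOF'
ingress-nginx deployment/ingress-nginx-controller
minio deployment/minio
argo-workflows deployment/argo-workflows-server
dremio statefulset/dremio-master
EOF
}
```

Calling `wait_for_platform` after the apply step returns non-zero as soon as one workload does not become ready in time.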

Apply the data pipeline:

kubectl apply --namespace argo-workflows --filename \
  pipelines/ingestion/argo-workflow-covid19-subnational-data.yaml
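After applying the manifest, the resulting Argo resources can be listed to confirm that the pipeline was accepted. The sketch below assumes the manifest defines either a Workflow or a CronWorkflow; the README does not say which, so both kinds are queried. The `list_pipeline_resources` helper is hypothetical.

```shell
# List Argo Workflows resources in the given namespace
# (defaults to the namespace used in the apply command above).
# Assumption: the pipeline is a Workflow or a CronWorkflow.
list_pipeline_resources() {
  kubectl get workflows.argoproj.io,cronworkflows.argoproj.io \
    --namespace "${1:-argo-workflows}"
}
```

Running `list_pipeline_resources` with no arguments queries the `argo-workflows` namespace.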

Benchmarking

The TPC-DS test suite was used to assess the performance of the platform.

For complete results, please check the project Jupyter Notebook in the benchmarking section.

modern-data-platform-poc's Issues

Argo UI default login/password

Hi,

I got an HTTP sign-in box when opening the default Argo UI. I couldn't find the user ID/password documented anywhere. I'd appreciate your kind attention.

Thanks
