# Workflow Summarizer on Sunbird Telemetry Data

Technologies Used

  • Scala 3
  • Apache Spark
  • Apache Flink
  • Apache Kafka
  • Apache Livy for Spark Job Submission
  • Kubernetes with Docker as Container Runtime
  • Apache Druid
  • Apache Zookeeper
  • Apache Maven

How to Run

  • Build the fat JAR for the Spark job (mvn package)
  • Build the fat JAR for the Flink job (mvn package)
  • Run ./build.sh to deploy the cluster

## Workflow Diagram

[Image: workflow diagram]

Submitting the Spark job

Using the REST API exposed by Apache Livy

API Doc Reference - https://livy.incubator.apache.org/docs/latest/rest-api.html

curl --location --request POST 'localhost:30190/batches' \
--header 'Content-Type: application/json' \
--data-raw '{
    "file": "http://192.168.1.16:8080/DataProduct-1.0-SNAPSHOT-jar-with-dependencies.jar",
    "className": "org.example.App",
    "args": [
        "/opt/spark-data/2022-01-01-14-1641034589056.json.gz"
    ],
    "numExecutors": 1,
    "conf": {
        "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.2",
        "spark.driver.host": "10.1.6.157",
        "spark.submit.deployMode": "client",
        "spark.driver.bindAddress": "0.0.0.0"
    }
}'
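
Livy's response to the POST above is a JSON body that includes the batch `id`. The following is a minimal sketch for checking on that batch afterwards, assuming Livy is still reachable at `localhost:30190`. The helper names (`livy_state_url`, `livy_log_url`, `livy_batch_state`) and the `LIVY_URL` variable are hypothetical and not part of this repository; the `/batches/{batchId}/state` and `/batches/{batchId}/log` paths come from the Livy REST API reference linked above.

```shell
# Assumption: Livy listens on the same host:port used for the POST request.
LIVY_URL="${LIVY_URL:-localhost:30190}"

# Build the URL of the state endpoint for a batch id (GET /batches/{batchId}/state).
livy_state_url() {
    printf '%s/batches/%s/state\n' "$LIVY_URL" "$1"
}

# Build the URL of the log endpoint for a batch id (GET /batches/{batchId}/log).
livy_log_url() {
    printf '%s/batches/%s/log\n' "$LIVY_URL" "$1"
}

# Fetch the current state of a batch; Livy answers with JSON holding "id" and "state".
livy_batch_state() {
    curl --silent --location "$(livy_state_url "$1")"
}
```

For example, `livy_batch_state 0` queries the first batch submitted to a fresh Livy server; substitute the `id` returned by your own POST.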

Or manually, with spark-submit:

spark-submit --class `<class-name>` --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.2 `<jar-path>` `<file-path>` `<num-partitions optional>`

Example:

spark-submit --class org.example.App --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.2 /opt/spark-jars/DataProduct-1.0-SNAPSHOT-jar-with-dependencies.jar /opt/spark-data/2022-01-01-14-1641034589056.json.gz

Submitting the Ingestion Spec to Druid

curl -X POST -H 'Content-Type: application/json' -d @/Users/ravinderkumar/workspace/sanketika/assignment/Data-pipeline/assets/ingestionSpec/telemetry.json http://localhost:30101/druid/indexer/v1/supervisor
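
Once the spec is accepted, the supervisor can be inspected through the same Overlord API. This is a rough sketch, assuming the endpoint `http://localhost:30101` from the command above. The helper names and the `DRUID_URL` variable are hypothetical, and the supervisor id passed to the status helper is a placeholder (the real id has to be read from the list endpoint's output).

```shell
# Assumption: the Druid indexer API is exposed at the same address used to submit the spec.
DRUID_URL="${DRUID_URL:-http://localhost:30101}"

# URL that lists the ids of all supervisors (GET /druid/indexer/v1/supervisor).
druid_supervisor_list_url() {
    printf '%s/druid/indexer/v1/supervisor\n' "$DRUID_URL"
}

# URL that reports the status of one supervisor
# (GET /druid/indexer/v1/supervisor/{supervisorId}/status).
druid_supervisor_status_url() {
    printf '%s/druid/indexer/v1/supervisor/%s/status\n' "$DRUID_URL" "$1"
}

# List supervisor ids, then pass one of them to druid_supervisor_status.
druid_supervisor_list() {
    curl --silent "$(druid_supervisor_list_url)"
}

druid_supervisor_status() {
    curl --silent "$(druid_supervisor_status_url "$1")"
}
```

Run `druid_supervisor_list` first, then `druid_supervisor_status <supervisor-id>` with one of the ids it prints.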

Druid Connection string

druid://router:8888/druid/v2/sql
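
The string above has the shape of a SQLAlchemy-style URI of the kind the pydruid driver accepts. Assuming that is its purpose, it points at Druid's SQL endpoint over plain HTTP, which can also be queried directly. The sketch below is illustrative only: the helper names are hypothetical, and the `router` host name is assumed to resolve only from inside the cluster network.

```shell
# Turn a druid:// connection string into the equivalent http:// URL.
druid_http_url() {
    printf 'http://%s\n' "${1#druid://}"
}

# Wrap a SQL statement in the JSON body expected by POST /druid/v2/sql.
# Note: this does no JSON escaping, so the statement must not contain double quotes.
druid_sql_body() {
    printf '{"query": "%s"}\n' "$1"
}

# Run a SQL statement against the endpoint named by a druid:// connection string.
druid_sql() {
    curl --silent -X POST -H 'Content-Type: application/json' \
        -d "$(druid_sql_body "$2")" "$(druid_http_url "$1")"
}
```

A connectivity check from a pod inside the cluster could then be `druid_sql 'druid://router:8888/druid/v2/sql' 'SELECT 1'`.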
