example-apache-pinot-docker's Introduction

Apache Pinot streaming project

This project demonstrates how Apache Pinot can consume a continuous stream of real-time data from a Kafka source.

The Kafka source is populated with aggregated real-time cryptocurrency trading data. Apache Spark ingests trade messages from the public CryptoCompare API (www.cryptocompare.com), computes the average price per coin per exchange, and buckets the results into 1-minute windows so that the incoming data is easier to interpret. Spark is used only for this aggregation step.
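To make the aggregation step concrete, here is a minimal plain-Python sketch of the same idea: group trades by coin and exchange, round each timestamp down to the start of its 1-minute window, and average the prices in each group. It is an illustration only, not the project's Spark job; the `Trade` type and its field names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Trade(NamedTuple):
    """One trade message. The field names are hypothetical."""
    coin: str
    market: str
    price: float
    timestamp: int  # seconds since the Unix epoch


WINDOW_SECONDS = 60  # 1-minute buckets, as described above


def average_price_by_window(
    trades: Iterable[Trade],
) -> Dict[Tuple[int, str, str], float]:
    """Return the average price keyed by (window_start, coin, market)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [price sum, trade count]
    for trade in trades:
        # Round the timestamp down to the start of its 1-minute window.
        window_start = trade.timestamp - (trade.timestamp % WINDOW_SECONDS)
        bucket = totals[(window_start, trade.coin, trade.market)]
        bucket[0] += trade.price
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Two Coinbase trades at 100.0 and 200.0 that fall in the same minute would produce a single row with an average of 150.0 for that window.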

The whole project is dockerized to allow for easy setup.

Prerequisites

To run the project, you must have the following installed on your local system.

  • IntelliJ IDEA 2022.1 or later, with the Scala plugin and the SBT plugin (1.6.2) enabled.
  • Docker Desktop v4.10.1 or later and docker-compose v1.29 or later.

Setup

The project uses Docker images, so the first run may take a while, depending on network speed, while the images are pulled. The following Docker images are used:

  1. bitnami/zookeeper:3.5.5
  2. bitnami/kafka:2.2.1
  3. apachepinot/pinot:0.9.3
  4. bitnami/spark:3.3.0

To set up your environment, follow the steps below.

  1. Clone this project and set up your IDE to resolve all dependencies required by this project.
  2. Create a free account on CryptoCompare by going to the site below.
https://min-api.cryptocompare.com/
  3. Follow the instructions there to set up your API_KEY. This API_KEY grants access to stream live messages from the CryptoCompare websocket API to our Spark application.
  4. Switch to your IDE and open its terminal. Ensure the current working directory is the root of the project, then run the following bash commands.
chmod 755 *.sh
./start.sh
  5. Wait for the Docker images to be pulled and for the project image to be built. To confirm that Spark is up and running, open http://localhost:18080. To verify that Apache Pinot is also up, open http://localhost:9090.
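If you would rather not refresh the browser by hand, the readiness check can be scripted. The sketch below polls a URL until it answers with a 2xx status or a retry limit is reached. The function names, the retry count, and the delay are assumptions chosen for this example; only the two URLs come from the steps above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_up(
    url: str,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` runs out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return False
```

For example, `wait_until_up("http://localhost:18080")` followed by `wait_until_up("http://localhost:9090")` would block until both UIs respond, or return False after about a minute each.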

Once all the endpoints are up and running, run the command below in the terminal to create the Pinot REALTIME table.

docker-compose -f ./kafka-pinot/docker-compose-pinot-exec.yml up -d
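The table definition that this command applies is not shown in this README. For orientation only, the sketch below builds the general shape of a Pinot REALTIME table config that consumes JSON messages from a Kafka topic. The topic name, Kafka broker address, and time column are placeholders, and the assumption that messages are JSON-encoded is mine; check the files under ./kafka-pinot for the values the project actually uses.

```python
import json
from typing import Any, Dict


def realtime_table_config(
    table_name: str,
    topic: str,
    kafka_brokers: str,
    time_column: str,
) -> Dict[str, Any]:
    """Build a minimal REALTIME table config that reads from Kafka."""
    return {
        "tableName": table_name,
        "tableType": "REALTIME",
        "segmentsConfig": {
            "timeColumnName": time_column,
            "schemaName": table_name,  # assumes the schema shares the table name
            "replicasPerPartition": "1",
        },
        "tenants": {},
        "tableIndexConfig": {
            "loadMode": "MMAP",
            "streamConfigs": {
                "streamType": "kafka",
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.topic.name": topic,
                "stream.kafka.broker.list": kafka_brokers,
                # Assumes JSON-encoded messages on the topic.
                "stream.kafka.decoder.class.name":
                    "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                "stream.kafka.consumer.factory.class.name":
                    "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            },
        },
        "metadata": {},
    }


# Placeholder values; only the table name appears in this README.
config = realtime_table_config(
    table_name="tradeMsgAvgPriceByPeriod",
    topic="example-topic",
    kafka_brokers="kafka:9092",
    time_column="exampleTimeColumn",
)
print(json.dumps(config, indent=2))
```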

Streaming TradeMsgs from CryptoCompare to Apache Pinot via Kafka:

To start streaming TradeMsgs from the CryptoCompare API, run the command below. This executes a spark-submit job that consumes messages from the CryptoCompare websocket, processes them, and pushes the results to a Kafka topic for a specified duration. The Pinot REALTIME table created earlier reads from that Kafka topic, allowing you to run real-time queries against the messages.

docker exec -e STREAM_TIMEOUT=120 \
  -e CRYPTOCOMPARE_API_KEY=<<paste_API_KEY_here>> \
  -it spark-worker-1 \
  spark-submit --master spark://spark-master:7077 \
  akka-websockets-spark-cassandra_2.12-0.1.jar

You can change how long the job runs, in seconds, by modifying the STREAM_TIMEOUT parameter. You must also supply your API_KEY by overriding the CRYPTOCOMPARE_API_KEY environment variable.
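As a sketch of how the two parameters fit into the command, the helper below assembles the same docker exec invocation as an argument list, so the timeout and API key can be supplied from a script instead of edited by hand. The function name and the validation are assumptions for illustration; the container name, master URL, and jar name are copied from the command above.

```python
import shlex
from typing import List


def spark_submit_command(api_key: str, stream_timeout_seconds: int = 120) -> List[str]:
    """Build the docker exec command that launches the streaming job."""
    if not api_key:
        raise ValueError("a CryptoCompare API key is required")
    if stream_timeout_seconds <= 0:
        raise ValueError("STREAM_TIMEOUT must be a positive number of seconds")
    return [
        "docker", "exec",
        "-e", f"STREAM_TIMEOUT={stream_timeout_seconds}",
        "-e", f"CRYPTOCOMPARE_API_KEY={api_key}",
        "-it", "spark-worker-1",
        "spark-submit", "--master", "spark://spark-master:7077",
        "akka-websockets-spark-cassandra_2.12-0.1.jar",
    ]


# Print a shell-ready version of the command for a 5-minute run.
print(shlex.join(spark_submit_command("my-api-key", stream_timeout_seconds=300)))
```

The list can be passed directly to `subprocess.run(command, check=True)` from an interactive terminal. Note that the `-it` flags expect a TTY, so they may need to be dropped when running from a non-interactive script.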

Once the job has started, head over to your Pinot Controller URL to view real-time data, for example with the query below.

select * from tradeMsgAvgPriceByPeriod where market = 'Coinbase' and direction = 'SELL'
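The same query can also be issued programmatically. Pinot brokers accept SQL as a JSON body on their `/query/sql` endpoint and return the rows under `resultTable`. The sketch below assumes the broker is reachable at `localhost:8099` (Pinot's default broker port); this project's compose files may not expose that port, so treat the URL as a placeholder. The helper names are also assumptions.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: the broker listens on Pinot's default port and is exposed locally.
BROKER_QUERY_URL = "http://localhost:8099/query/sql"


def build_query_request(sql: str, url: str = BROKER_QUERY_URL) -> urllib.request.Request:
    """Wrap a SQL string in the JSON POST request the broker expects."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a broker response into a list of {column name: value} rows."""
    table = response.get("resultTable") or {}
    columns = table.get("dataSchema", {}).get("columnNames", [])
    return [dict(zip(columns, row)) for row in table.get("rows", [])]


def run_query(sql: str, url: str = BROKER_QUERY_URL) -> List[Dict[str, Any]]:
    """Send the query to the broker and return the decoded rows."""
    with urllib.request.urlopen(build_query_request(sql, url), timeout=10) as response:
        return rows_as_dicts(json.load(response))
```

With the stack running, `run_query("select * from tradeMsgAvgPriceByPeriod where market = 'Coinbase' and direction = 'SELL'")` would return the matching rows as dictionaries.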

Improvements are welcome! Please feel free to submit a pull request!