Giter Club home page Giter Club logo

finnhub-realtime-pipeline's Introduction

Finnhub Realtime Pipeline

The project involves building a data pipeline that streams real-time trading data from the Finnhub.io using websockets. This data is then processed using Spark for analysis and realtime dashboard.

Architecture

Arch

All applications are containerized into Docker containers

  1. Data source layer - Finnhub.io offering real-time trades for US stocks, forex and crypto using websockets

  2. Data ingestion layer - A containerized Python application named producer connects to the Finnhub.io websocket. It encodes retrieved messages into Avro format, as specified in the schemas/trades.avsc file, and ingests messages into the Kafka broker.

  3. Message broker layer - Messages from producer are consumed by the Kafka broker located in the broker container.

  4. Stream processing layer - A Spark cluster consisting of one master and two worker node is set up. A PySpark application named processor is submitted to the Spark cluster manager, which delegates a worker for it. This application connects to the Kafka broker to retrieve messages, transforms them using Spark Structured Streaming, and loads them into Cassandra tables. The first query, which transforms trades into a feasible format, runs continuously, whereas the second query, involving aggregations, has a 5-second trigger.

  5. Data serving layer - A Cassandra database stores and persists data from Spark jobs. Upon launching, the cassandra-setup.cql script runs to create the keyspace and tables.

  6. Visualization layer - Grafana connects to the Cassandra database using HadesArchitect-Cassandra-Plugin and serves visualized data to users, exemplified in the Finnhub Sample BTC Dashboard. The dashboard refreshes every 500ms.

Dashboard

A live dashboard that constantly updates to display trade prices every 500 milliseconds.

Dashboard

Setup & deployment

  1. Obtain an API Key: Register at Finnhub to generate your API key

  2. Configure Environment Variables:

    • Locate and rename the [sample.env](sample.env) file to .env
    • Update the .env file with the generated API key and any necessary configurations
    FINNHUB_API_TOKEN=abc123xyz
    FINNHUB_STOCKS_TICKERS=["AAPL", "AMZN", "MSFT", "GOOG", "NVDA", "META", "TSLA", "NFLX",  "BINANCE:BTCUSDT", "IC MARKETS:1"]
    FINNHUB_VALIDATE_TICKERS=1
    KAFKA_SERVER=broker
    KAFKA_PORT=29092
    KAFKA_TOPIC_NAME=market
    CASSANDRA_HOST=cassandra
    CASSANDRA_PORT=9042
    CASSANDRA_USERNAME=cassandra
    CASSANDRA_PASSWORD=cassandra
    
  3. Start the Service: Launch the services

    make run
    
  4. View Producer Logs: Check the logs of the producer container to monitor activity

  5. Execute Spark Job: Submit the Spark job

    make spark-job
    
  6. Access Visualization: Open your browser and go to localhost:3003 to view the visualization. Enter admin for username and password.

  7. Shutdown Containers: Stop all running containers

    make down
    

finnhub-realtime-pipeline's People

Contributors

piyush-an avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.