Strata Data Pipelines

Build

docker build . -t data-pipelines:latest

Run Anchor Localnet

Clone your Anchor repo (if capturing events for Anchor programs). Otherwise, follow similar steps for your Solana setup.

Run

anchor localnet

Then upload your IDL(s) with

anchor idl init <program-id> --filepath <idl.json> --provider.cluster localnet
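
You can optionally verify the upload afterwards (assuming your Anchor CLI version supports the idl fetch subcommand):

# prints the IDL stored on-chain for the program
anchor idl fetch <program-id> --provider.cluster localnet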

Run Data Pipelines

First, update ACCOUNTS and ANCHOR_IDLS in docker-compose.yml with the programs you would like to capture and the Anchor programs you would like to parse.
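
For example, the entries might look like the following (a hypothetical sketch; the exact format, such as comma-separated program IDs, may differ in docker-compose.yml):

# Hypothetical example values; program IDs are placeholders
ACCOUNTS=<program-id-1>,<program-id-2>
ANCHOR_IDLS=<program-id-1>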

Run

docker-compose up

in this repo. You can also run a subset of the services, for example only up to the event transformer:

docker-compose up event-transformer

Run Strata

If you're doing local dev for Strata, you'll want our leaderboards.

First, clone strata-api and build it:

cd strata-api && docker build . -t strata-api:latest
cd strata-compose && docker-compose up

Set up ksqlDB

Components

See (and render) architecture.puml for a bird's-eye view of the system.
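
For example, if you have the PlantUML CLI installed, you can render the diagram locally with:

plantuml architecture.puml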

Kafka S3 Slot Identifier

Identifies contiguous Solana slots and pushes them to a heavily partitioned Kafka topic.

Kafka S3 Block Uploader

This utility pulls blocks for each contiguous Solana slot (as identified by the slot identifier) and inserts them into S3.

It then sends an event pointing to that S3 location to Kafka. We avoid sending the full block to Kafka, as it may be too large a message.

Note that because the slot identifier's topic is partitioned, we can horizontally scale this uploader up to the number of partitions. We found we needed 3-4 instances to keep up with mainnet.
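
For instance, Docker Compose can run multiple instances of a single service with the --scale flag (the service name below is hypothetical; check docker-compose.yml for the real one):

# "block-uploader" is a hypothetical service name
docker-compose up --scale block-uploader=3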

Event Transformer

Reads the events from the Kafka S3 Block Uploader, pulls the blocks from S3, and transforms the transaction data into usable JSON events. Each event has common fields like type, blockTime, and slot.

This gives us a fat topic of all events occurring on the blockchain.

ksqlDB

Looking at ksql/, you can see all of our ksqlDB queries. These queries turn the firehose of the json.solana.events topic into useful tables and streams.

The main use case for these streams right now is creating leaderboards: both leaderboards of holders of individual accounts and top-token leaderboards.
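
If you want to apply one of these queries by hand, the ksqlDB CLI can execute a script file against a server (the URL below assumes ksqlDB's default port and may differ in this setup; the file name is a placeholder):

# server URL assumes ksqlDB's default port 8088
ksql http://localhost:8088 --file ksql/<query-file>.sql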

Leaderboard Redis Inserters

These read from the streams generated by ksqlDB and insert them into Redis sorted sets so that we can power a fast GraphQL API.
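
As an illustration of how a sorted set backs a leaderboard, you can read the top entries with redis-cli (the key name below is hypothetical; check the inserter code for the real naming):

# "leaderboard:<mint>" is a hypothetical key name
redis-cli ZREVRANGE leaderboard:<mint> 0 9 WITHSCORES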

Deploying

You should use the strata-terraform repo to deploy the full pipeline. We use app.terraform.io to provision and launch Terraform objects on AWS.

Local Development

Boot up Docker Compose, excluding the services you don't need. You can do this by passing the service names as arguments:

docker-compose up minio kafka redis kowl

Now you can launch whatever utility you want using the VS Code tasks that exist for this purpose.

You can use Kowl at localhost:8080 to see what's going into the topics.
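
As an alternative to Kowl, you can tail a topic directly with the Kafka console consumer (using the same broker address as the trophy example below):

kafka-console-consumer.sh --topic json.solana.events --bootstrap-server localhost:29092 --from-beginning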

Trophies

To test trophy sending, you can run

jq -rc . tests/resources/trophy.json | kafka-console-producer.sh --topic json.solana.trophies --bootstrap-server localhost:29092
