Giter Club home page Giter Club logo

ethereum-etl-postgres's Introduction

Ethereum ETL for PostgreSQL

The steps below will allow you to bootstrap a PostgreSQL database in GCP with full historical and real-time Ethereum data: blocks, transactions, logs, token_transfers, and traces.

The whole process will take between 24 and 72 hours.

Prerequisites:

1. Export Ethereum data from BigQuery to CSV files in GCS

  • Install gcloud and run gcloud auth login
  • Run
pip install -r requirements.txt
export BUCKET=<your_gcs_bucket>
bash ethereum_bigquery_to_gcs.sh $BUCKET

Optionally provide start and end dates: bash ethereum_bigquery_to_gcs.sh $BUCKET 2020-01-01 2020-01-31

Exporting to CSV files is going to take about 10 minutes.

2. Import data from CSV files to PostgreSQL database in Cloud SQL

  • Create a new Cloud SQL instance
export CLOUD_SQL_INSTANCE_ID=ethereum-0
export ROOT_PASSWORD=<your_password>
gcloud sql instances create $CLOUD_SQL_INSTANCE_ID --database-version=POSTGRES_11 --root-password=$ROOT_PASSWORD \
    --storage-type=SSD --storage-size=100 --cpu=4 --memory=6 \
    --database-flags=temp_file_limit=2147483647

Notice the storage size is set to 100 GB. It will scale up automatically to around 1.5 TB when we load in the data.

  • Add Cloud SQL service account to GCS bucket as objectViewer. Run gcloud sql instances describe $CLOUD_SQL_INSTANCE_ID, then copy serviceAccountEmailAddress from the output and add it to the bucket.

  • Create the database and the tables:

gcloud sql databases create ethereum --instance=$CLOUD_SQL_INSTANCE_ID

# Install Cloud SQL Proxy following the instructions here https://cloud.google.com/sql/docs/mysql/sql-proxy#install
./cloud_sql_proxy -instances=myProject:us-central1:${CLOUD_SQL_INSTANCE_ID}=tcp:5433

cat schema/*.sql | psql -U postgres -d ethereum -h 127.0.0.1  --port 5433 -a
  • Run import from GCS to Cloud SQL:
bash ethereum_gcs_to_cloud_sql.sh $BUCKET $CLOUD_SQL_INSTANCE_ID

Importing to Cloud SQL is going to take between 12 and 24 hours.

A few performance optimization tips for initial loading of the data:

3. Apply indexes to the tables

NOTE: indexes won't work for the contracts table due to the issue described here blockchain-etl#11 (comment)

  • Run:
cat indexes/*.sql | psql -U postgres -d ethereum -h 127.0.0.1  --port 5433 -a

Creating indexes is going to take between 12 and 24 hours. Depending on the queries you're going to run you may need to create more indexes or partition the tables.

Cloud SQL instance will cost you between $200 and $500 per month depending on whether you use HDD or SSD and on the machine type.

4. Streaming

Use ethereumetl stream command to continually pull data from an Ethereum node and insert it to Postgres tables: https://github.com/blockchain-etl/ethereum-etl/tree/develop/docs/commands.md#stream.

Follow the instructions here to deploy it to Kubernetes: https://github.com/blockchain-etl/blockchain-etl-streaming.

ethereum-etl-postgres's People

Contributors

medvedev1088 avatar bsh98 avatar evgenynansenai avatar 121812 avatar kunalmodi avatar msilb7 avatar surajsjain avatar govindasamy avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.