

Alephium explorer backend

Development

1. Install the dependencies

You need PostgreSQL and sbt installed on your system.

2. Create the database

  1. Start the postgresql service.
  2. Log in to the PostgreSQL shell with the default postgres user:
    psql postgres # or `psql -U postgres` depending on your OS
  3. Ensure that the postgres role exists, and if not, create it. List all roles:
    postgres=# \du
    Create postgres role:
    postgres=# CREATE ROLE postgres WITH LOGIN;
  4. Then, create the database:
    postgres=# CREATE DATABASE explorer;

3. Start the server

sbt app/run

4. Single Jar

sbt app/assembly

The resulting assembly file will appear in the app/target/scala-2.13/ directory.

5. Restore archived database

Syncing all data from scratch can take a while; you can start from a snapshot instead. Snapshots are available at https://archives.alephium.org/. Download the explorer-db dump you want and simply run:

psql database_name < dump_file

Note that database_name must already exist; see step 2 above for how to create it.

Querying hashes

Hash strings are stored as bytea. To query a hash string in SQL, use the Postgres function decode, which converts the hex string to bytea.

select *
from "utransactions"
where "hash" = decode('f25f43b7fb13b1ec5f1a2d3acd1bebb9d27143cdc4586725162b9d88301b9bd7', 'hex');

6. Configuration

There are two ways to configure the application:

Environment variables

Every value in the application.conf file can be overridden by an environment variable.

export BLOCKFLOW_NETWORK_ID=1
export DB_NAME="testnet"

Using user.conf file

As in our full node, you can override the application.conf file by creating a user.conf file in the EXPLORER_HOME folder, which defaults to ~/.alephium-explorer-backend.

alephium.blockflow.network-id = 1
db.db.name = "testnet"

Benchmark

1. Create benchmark database

The benchmark database (set via dbName) should exist:

CREATE DATABASE benchmarks;

2. Set benchmark duration

Update the time value in the following annotation in DBBenchmark to set the benchmark run duration:

@Measurement(iterations = 1, time = 1, timeUnit = TimeUnit.MINUTES)
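
For context, this annotation sits on the benchmark class. A minimal, hypothetical sketch of the shape such a class takes with sbt-jmh (the real DBBenchmark presumably also sets up the database as JMH state):

import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._

// Hypothetical skeleton only: a no-op benchmark showing where @Measurement lives.
@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
@Measurement(iterations = 1, time = 1, timeUnit = TimeUnit.MINUTES)
class ExampleBenchmark {

  @Benchmark
  def baseline(): Int = 1 + 1
}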

3. Executing benchmarks

Execute the following sbt command to run the JMH benchmarks:

sbt benchmark/jmh:run

Testing

The tests run against the PostgreSQL database using the default postgres role.

sbt test

Scaladoc

To generate the Scaladoc, run: sbt unidoc

Referencing external libraries in scala-docs

To reference external libraries in Scaladoc, make sure the library is recognised by adding an apiMapping.

See scalaDocsAPIMapping in build.sbt file as a reference for creating this apiMapping for an external library.

def myLibraryAPIMapping(classPath: Classpath, scalaVersion: String): (sbt.File, sbt.URL) =
  ??? //follow `scalaDocsAPIMapping` in build.sbt

//add the apiMapping to the project that depends on `myLibrary`
apiMappings ++=
  Map(
    myLibraryAPIMapping(
      classPath = (Compile / fullClasspath).value,
      scalaVersion = scalaVersion.value
    )
  )

Node Customization

The steps below are for developers who want to reference a full node on another computer, such as a Raspberry Pi, that is on the same subnet.

Explorer: /app/src/main/resources/application.conf

blockflow {
    host = "full-node-ip-address"
    port = 12973

    direct-clique-access = false
    direct-clique-access = ${?BLOCKFLOW_DIRECT_CLIQUE_ACCESS}

    network-id = 0
    network-id = ${?BLOCKFLOW_NETWORK_ID}
    groupNum = 4
    api-key = "full-node-api-key"
}

Full Node: user.conf

alephium.api.api-key = "full-node-api-key"
alephium.api.network-interface = "0.0.0.0"

alephium.network.bind-address  = "0.0.0.0:9973"
alephium.network.internal-address  = "full-node-ip-address:9973"
alephium.network.coordinator-address  = "full-node-ip-address:9973"


explorer-backend's Issues

`select count(*) ...` queries on partial indexes should be constant

Overview

Count queries like the following on indexes that fully satisfy the where clause should be constant time.

Query

select count(*) from block_headers where main_chain = true;

Index

create index blocks_main_chain_idx on block_headers (main_chain) where main_chain = true;

Issue

The query is not constant time and performs an aggregate scan.
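
To see which plan Postgres actually chooses, the query can be prefixed with explain analyze; a small sketch (here via Slick's plain-SQL interpolator, though running it directly in psql works just as well):

import slick.jdbc.PostgresProfile.api._

// Returns the query plan as text rows, e.g. showing the aggregate scan
// instead of a constant-time answer derived from the partial index.
val explainMainChainCount =
  sql"""explain analyze
        select count(*) from block_headers where main_chain = true""".as[String]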

Raw SQL `BlockDAO.get`

Parent task: #110

It accesses the block_headers, block_deps, transactions, inputs and outputs tables over multiple queries.

Metrics

Explorer backend should report prometheus-compatible metrics.

Using a different port than the API might be a good idea, so that the metrics are not publicly exposed.

Main metrics I'm looking for

  • Requests counter, also by status code
  • Requests time
  • DB queries counter
  • DB queries time, also by family (SELECT, UPDATE, INSERT, ...)

Preventing SQL injection attack

Since we are writing raw SQL queries, it is tempting to just write and compose plain Strings, because Slick doesn't offer a way to compose SQL fragments.

But to prevent SQL injection attacks we need to use query parameters.

  • Is SQL injection a concern here?
  • The following two subsections describe limitations of Slick when writing plain SQL that allow for SQL injection (see the sketch after this list for the parameter-binding alternative).
  • Do we want to look into Slick's internal types to see how we can extend or use them to avoid SQL injection?
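
For reference, Slick's plain-SQL interpolators do bind interpolated values as query parameters. A minimal sketch (the helper name is hypothetical; the table and column names follow the uoutputs queries shown elsewhere in this document):

import slick.jdbc.PostgresProfile.api._

// Safe: $address becomes a bound parameter ($1) and never enters the SQL text.
def txHashesForAddress(address: String) =
  sql"""select encode("tx_hash", 'hex')
        from "uoutputs"
        where "address" = $address""".as[String]

// Unsafe by contrast: #$address would be spliced verbatim into the statement,
// which is exactly the injection hole described above.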

Insert

Slick's documentation here shows its way of performing multiple inserts, which generates an insert statement for each row. This also means Postgres has to parse and compile each query separately.

LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'Colombian', $2 = '101', $3 = '7'
LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'French_Roast', $2 = '49', $3 = '8'
LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'Espresso', $2 = '150', $3 = '9'
LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'Colombian_Decaf', $2 = '101', $3 = '8'
LOG:  execute S_1: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'French_Roast_Decaf', $2 = '49', $3 = '9'

Instead of

insert into coffees values ($1, $2, $3),
                           ($4, $5, $6),
                           ($7, $8, $9),
                           .... 

Building the above single insert via a plain String allows for SQL injection.
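
One way to get a single multi-row insert while keeping every value as a bound parameter is to build only the placeholders in the SQL string and bind the data through JDBC. A hypothetical sketch using the coffees example above (assumes rows is non-empty):

import java.sql.Connection

// Build "(?, ?, ?), (?, ?, ?), ..." from placeholders only and bind every
// value on the PreparedStatement, so user data never reaches the SQL string.
def bulkInsertCoffees(conn: Connection, rows: Seq[(String, Int, Double)]): Int = {
  val placeholders = rows.map(_ => "(?, ?, ?)").mkString(", ")
  val stmt = conn.prepareStatement(s"insert into coffees values $placeholders")
  try {
    rows.zipWithIndex.foreach { case ((name, supId, price), i) =>
      stmt.setString(i * 3 + 1, name)
      stmt.setInt(i * 3 + 2, supId)
      stmt.setDouble(i * 3 + 3, price)
    }
    stmt.executeUpdate()
  } finally stmt.close()
}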

Composing SQL fragments

Composing two SQL fragments is only possible through splicing, which means one of them has to be a plain String. But if that String query has parameters, we allow for SQL injection.

Composing fragments is something we need, for example here.

Convert `BlockDAO.listMainChain` to raw SQL

Task

Convert and merge the typed Slick queries performed by BlockDAO.listMainChain into a single raw SQL query.

Parent task: #110
Dependent task: #123

Progress

Initial benchmark

Typed SQL - 26.79 seconds
Raw SQL - 21.28 seconds

Code as doc

Let's prepare a Makefile, like the full node has, for running the backend. I had issues creating the database and running the backend; I managed to solve them, but it's not dev-friendly and it costs time.

We should also use the Makefile in testing, so we are sure that the Makefile (i.e. the doc) is always correct.

Fix progression info

As we now always re-download some past blocks to make sure we don't miss any, the progress indicator is wrong and you might sometimes see: progress 31312351%. Don't panic 😅

Database connection created by tests should be closed after each run

Issue

Our test cases do not close the database connections they create after each test run.

We have a total of 70 test cases, so it is not a problem yet.

Test

A quick test demonstrating the error:

(1 to 300) foreach {
  _ =>
    val fixture = new Fixture {}
    import fixture.config.profile.api._
    fixture.run(fixture.blockDepsTable.result).futureValue
}

It fails with the following error after 214 iterations.

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
[49.957s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
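
A possible fix, sketched with ScalaTest's BeforeAndAfterAll; db.close() shuts down the Slick Database and its connection pool (the exact path to the Database instance on the fixture is an assumption):

import org.scalatest.BeforeAndAfterAll
import org.scalatest.flatspec.AnyFlatSpec

class BlockDepsSpec extends AnyFlatSpec with BeforeAndAfterAll {

  val fixture = new Fixture {} // the existing test fixture

  override def afterAll(): Unit = {
    fixture.config.db.close() // assumption: this is where the fixture exposes the Slick Database
    super.afterAll()
  }
}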

Update transaction tables to reduce queries dispatched & executed by Postgres

Overview

Update utransactions & transactions tables to use foreign key constraints

  • To reduce the number of queries required to fetch a single transaction to 1.
  • To make all inserts transactional.

Issue

To fetch a single transaction 3 queries are dispatched to Postgres.

select "hash", "chain_from", "chain_to", "gas-amount", "gas-price" from "utransactions" where "hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'
select "tx_hash", "hint", "output_ref_key", "unlock_script" from "uinputs" where "tx_hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'
select "tx_hash", "amount", "address", "lock_time" from "uoutputs" where "tx_hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'

Something similar occurs for a write.

  • 5 queries in total.
  • Inserts start individual transactions for a single request and are not wrapped in one overall transaction.
insert into "utransactions" ("hash","chain_from","chain_to","gas-amount","gas-price") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', $2 = '32', ...
BEGIN
update "uinputs" set "hint"=$1,"unlock_script"=$2 where "tx_hash"=$3 and "output_ref_key"=$4 ... $3 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444' ...
insert into "uinputs" ("tx_hash","hint","output_ref_key","unlock_script") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444' ...
COMMIT
BEGIN
update "uoutputs" set "amount"=$1,"lock_time"=$2 where "tx_hash"=$3 and "address"=$4 ... $3 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', ...
insert into "uoutputs" ("tx_hash","amount","address","lock_time") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', ...
COMMIT

Options

1. Foreign key constraints

Use foreign key constraints on existing tables so Postgres knows the relationships between the tables, and we can reduce
the number of queries dispatched & executed by Postgres for each read and write to 1.
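
As a rough sketch of the single round-trip read this enables (column list trimmed, helper name hypothetical; table and column names follow the queries above):

import slick.jdbc.PostgresProfile.api._

// One query instead of three: join inputs and outputs onto the transaction row.
def getUTransaction(hashHex: String) =
  sql"""select encode(t."hash", 'hex'), i."hint", o."address"
        from "utransactions" t
        left join "uinputs"  i on i."tx_hash" = t."hash"
        left join "uoutputs" o on o."tx_hash" = t."hash"
        where t."hash" = decode($hashHex, 'hex')""".as[(String, Option[Int], Option[String])]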

2. Merge all data into a single transaction table

These transaction queries always fetch their relevant inputs and outputs rows; the inputs and outputs tables are never queried individually. Merging the inputs and outputs tables into the main transaction table as one or two columns, stored as un-indexed varchar or blob, would increase performance by a large factor over option 1 above because:

  • Postgres maintains a separate file for each table and index. Merging them into a single table reduces IOps.
  • It increases write performance because only one index is maintained for the transactions table and none for inputs
    and outputs (which are much larger tables with larger indexes).
  • It increases read performance because only a single index is queried.

We could store them as jsonb if we want indexing, but if indexing is needed, option 1 is easier to maintain, and a jsonb value's size is limited to 268.435455 MB (from a Stack Overflow question).

Drawback of this option over option 1: in case we want to query inputs and outputs individually, separate tables are much easier to maintain. But if we want performance, option 2 will be noticeably better.

Thoughts?

Improve `TokenSupplyService`

  • We could probably cache the genesis blocks, as we always query the same ones (a caching sketch follows this list).
  • The unspentTokens function can be very slow with millions of UTXOs.
  • The syncPeriod should be adaptive: once we know the whole supply has been computed, we can wait a full day between runs.
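
A minimal caching sketch for the first point; BlockEntry and the loader function are placeholders for whatever TokenSupplyService actually uses:

import scala.concurrent.Future

// Genesis blocks never change, so load them once and reuse the same Future.
// (Two concurrent first calls may both trigger a load; acceptable for a sketch.)
final class GenesisBlocksCache[BlockEntry](load: () => Future[Seq[BlockEntry]]) {
  @volatile private var cached: Option[Future[Seq[BlockEntry]]] = None

  def get(): Future[Seq[BlockEntry]] =
    cached match {
      case Some(blocks) => blocks
      case None =>
        val blocks = load()
        cached = Some(blocks)
        blocks
    }
}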

Change column-type `varchar` to `bytea` for storing hashes in Postgres

Change the column type for storing hashes from varchar to bytea in Postgres because:

  • Read performance increases by 24%
  • Storage (table + indexes sizes) reduces by 35%
  • Write performance is mostly unaffected

These benchmarks were run on the utransactions table

  • 1 million rows. 1 read & write request per row.
  • Single-threaded, using a same-thread ExecutionContext.
column_type | reads/second | 1M reads: total time | 1M writes: total time | total space (table_size + index_size)
varchar     | 3557         | 282 seconds          | 324 seconds           | 350 MB (238 MB + 112 MB)
bytea       | 4671         | 214 seconds          | 315 seconds           | 227 MB (146 MB + 81 MB)

Data generated

varchar uses 65 bytes for each hash


bytea uses 33 bytes for each hash


SQL used to get table and index sizes

SELECT pg_size_pretty(pg_table_size('utransactions'))   as table_size,
       pg_size_pretty(pg_indexes_size('utransactions')) as index_size;

Sample code used in benchmarks

val data =
  new UnconfirmedTxEntity(
    hash = new Transaction.Hash(Hash.generate),
    chainFrom = GroupIndex.unsafe(0),
    chainTo = GroupIndex.unsafe(0),
    gasAmount = 0,
    gasPrice = U256.unsafe(0)
  )

config.db.run(unconfirmedTxsTable.insertOrUpdate(data)) //1 write request
config.db.run(unconfirmedTxsTable.filter(_.hash === data.hash).result) //1 read request

Benefit of varchar over bytea

With varchar we could use full-text search indexing, which allows for partial hash searches, but I don't think we need that.

Changes to test-cases

Test cases use the in-memory H2 database, which tries to store the hash as a blob, which is not indexable.

org.h2.jdbc.JdbcSQLFeatureNotSupportedException: Feature not supported: "Index on BLOB or CLOB column: ""hash"" BLOB NOT NULL"; SQL statement:
create table "block_headers" ("hash" BLOB NOT NULL PRIMARY KEY,"timestamp" BIGINT NOT NULL,"chain_from" INTEGER NOT NULL,"chain_to" INTEGER NOT NULL,"height" INTEGER NOT NULL,"main_chain" BOOLEAN NOT NULL) [50100-200]

Slick supports a few other databases that we could use for test cases. I know Derby supports in-memory mode; I can try that out, or should we just use Postgres? Tests might be slower with Postgres, but at least they will be accurate.

Ok to make this change?

Introduce a table for address to speed up address queries

The goal is to speed up the address query for the explorer. The address query is too slow for addresses with many txs. We should create a dedicated table to speed it up, so we don't need a JOIN to get the full tx list.

Maybe (address, tx_hash, timestamp, main_chain) is enough
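
A rough sketch of what that table and its index could look like, written with Slick's plain-SQL interpolator (the table and index names are hypothetical; column types mirror the existing tables):

import slick.jdbc.PostgresProfile.api._

// Hypothetical DDL for the proposed lookup table plus the index the address
// query would hit.
val createTransactionPerAddresses =
  sqlu"""create table if not exists "transaction_per_addresses" (
           "address"    varchar not null,
           "tx_hash"    bytea   not null,
           "timestamp"  bigint  not null,
           "main_chain" boolean not null
         )"""

val createAddressIndex =
  sqlu"""create index if not exists "txs_per_address_idx"
         on "transaction_per_addresses" ("address", "timestamp" desc)"""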

Test & benchmark auto-incrementing `Int` or `Long` IDs for blocks & transactions tables

Task

Test & benchmark whether there is any performance benefit in using auto-incrementing & unique Int- or Long-based IDs (alongside the hash IDs) in queries/joins for tables that have hash IDs, such as block_headers, transactions & utransactions.

Why?

Byte arrays generated from hash strings are very random. This randomness means they are less likely to share common prefixes & suffixes, so they build less compact indexes than an Integer or Long, where duplicate prefixes are very common and can be eliminated during indexing, reducing the index size. Theoretically, at least, unless Postgres has some magic for better hash indexing.

Currently hashes require 32-33 bytes, whereas an Int would be only 4 bytes and a Long only 8 bytes. Smaller data also means smaller indexes, and therefore faster queries & joins, lower read IOps, and lower cache & CPU usage.

Change

Every table that has a hash ID, for example block_headers, would store an auto-incrementing & unique Int/Long for each block_hash; other referencing tables such as transactions would then store that value as the reference for each block and use it in queries such as joins.

Similarly transactions would generate an auto-incrementing Int/Long which gets used as a reference by input & output tables.

This does not necessarily mean removal of hash columns & indexing in referencing tables.
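
A minimal sketch of the block_headers side as a Slick table definition; the surrogate column name is hypothetical and the remaining block_headers columns are omitted:

import slick.jdbc.PostgresProfile.api._

// Keep the existing hash primary key and add an auto-incrementing surrogate id
// that referencing tables (transactions, inputs, outputs) can store and join on.
class BlockHeaders(tag: Tag) extends Table[(Long, Array[Byte])](tag, "block_headers") {
  def id   = column[Long]("block_id", O.AutoInc, O.Unique)
  def hash = column[Array[Byte]]("hash", O.PrimaryKey)
  def * = (id, hash)
}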

Drawback

Using hash as the only ID makes things easier to maintain. Adding another Int or Long ID adds some maintenance cost.

Maybe the added cost is OK if the benchmark shows a significant performance benefit.

Rename `order` table fields

order is a SQL keyword, so we need to make sure we don't use it. In a few places we also use index... which is also a keyword ^^ Let's use something meaningful instead: tx_order, etc.

Add `amount` in `SweepAddressTransaction`

Would it be possible to add the amount in SweepAddressTransaction, so that the wallet can display the amount of the pending sweep transactions? At the moment there is no way to know how much each “consolidation” (aka “sweep”) transaction includes.

Optimise Block queries

Issue

  • Slow block query: response time of 4-6 seconds.
  • Dispatching 10 in-browser (Chrome) requests times out after 20 seconds.


Work in progress

Replicating issue locally - Test data generated

Following the total block count shown here: 1271466.


Created 2 million block_headers rows with 10 transactions per block_header (nearly 20 million transaction
entries)

Query                                           | Count
count(*) where block_headers.main_chain = true  | 1012415
count(*) where block_headers.main_chain = false | 1011095
count(*) block_headers                          | 2023510
count(*) transactions                           | 17578926

Tests run

  • Performance test 1 - On existing schema
  • Performance test 2 - With new my_tuple3_index for block_header queries.
    • create index my_tuple3_index on block_headers (main_chain, timestamp desc, hash)

Benchmark locally

Via Scala code

The above block query (blocks?page=52163&reverse=true) was run via a ScalaTest spec, which returned the following times:

Test type                                    | Thread 1 | Thread 2 | Thread 3 | Thread 4 | Thread 5
Sequential (not concurrent, existing schema) | 1.14 sec | 0.86 sec | 0.91 sec | 0.97 sec | 0.95 sec
Concurrent (existing schema)                 | 2.93 sec | 2.92 sec | 2.94 sec | 3.27 sec | 3.92 sec
Sequential (with my_tuple3_index)            | 1.92 sec | 1.66 sec | 1.74 sec | 1.64 sec | 1.66 sec
Concurrent (with my_tuple3_index)            | 4.65 sec | 4.65 sec | 4.65 sec | 4.70 sec | 4.81 sec

Table summary

  • Sequential/single-threaded reads to Postgres are reasonably performant considering the size of the data.
  • Increasing the number of concurrent threads worsens the performance per thread:
    • 5 threads required 2-4 seconds.
    • 10 threads required 4-6 seconds.
    • 20 threads required 6-14 seconds.
  • Using the custom index my_tuple3_index worsens the performance even further.
    • I've tried multiple index combinations, and the Postgres query planner either ignores them or does not do as well.

Via Postgres query analyser

Running the above queries via Postgres query planner returns the following performance.

Query planner        | Planning Time | Execution Time
No schema changes    | 0 sec         | 0.81 sec
With my_tuple3_index | 0 sec         | 1.65 sec

Table summary

  • Existing schema is reasonably performant
  • my_tuple3_index is twice as slow
