

Alephium explorer backend

Development

1. Install the dependencies

You need PostgreSQL and sbt installed on your system.

2. Create the database

  1. Start the postgresql service.
  2. Log in to the PostgreSQL shell with the default postgres user:
    psql postgres # or `psql -U postgres` depending on your OS
  3. Ensure that the postgres role exists, and if not, create it. List all roles:
    postgres=# \du
    Create postgres role:
    postgres=# CREATE ROLE postgres WITH LOGIN;
  4. Then, create the database:
    postgres=# CREATE DATABASE explorer;

3. Start the server

sbt app/run

4. Single Jar

sbt app/assembly

The resulting assembly file will appear in the app/target/scala-2.13/ directory.

5. Restore archived database

Syncing all data from scratch can take a while; you can start from a snapshot instead. Snapshots are available at https://archives.alephium.org/. Download the explorer-db dump you want and simply run:

psql database_name < dump_file

Note that database_name must already exist; see step 2 above for how to create it.

Querying hashes

Hash strings are stored as bytea. To query a hash string in SQL, use the Postgres function decode, which converts the hex string to bytea.

select *
from "utransactions"
where "hash" = decode('f25f43b7fb13b1ec5f1a2d3acd1bebb9d27143cdc4586725162b9d88301b9bd7', 'hex');

6. Configuration

There are two ways to configure the application:

Environment variables

Every value in the application.conf file can be overridden by an environment variable.

export BLOCKFLOW_NETWORK_ID=1
export DB_NAME="testnet"

Using user.conf file

As in our full node, you can override the application.conf file by creating a user.conf file in the EXPLORER_HOME folder, which defaults to ~/.alephium-explorer-backend.

alephium.blockflow.network-id = 1
db.db.name = "testnet"

Benchmark

1. Create benchmark database

The benchmark database (set via dbName) should exist:

CREATE DATABASE benchmarks;

2. Set benchmark duration

Update the time value in the following annotation in DBBenchmark to set the benchmark run duration:

@Measurement(iterations = 1, time = 1, timeUnit = TimeUnit.MINUTES)
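
For context, this annotation sits on the benchmark class. A minimal, hypothetical sketch of the shape such a class takes with sbt-jmh (the real DBBenchmark presumably also sets up the database as JMH state):

import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._

// Hypothetical skeleton only: a no-op benchmark showing where @Measurement lives.
@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
@Measurement(iterations = 1, time = 1, timeUnit = TimeUnit.MINUTES)
class ExampleBenchmark {

  @Benchmark
  def baseline(): Int = 1 + 1
}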

3. Executing benchmarks

Execute the following sbt command to run the JMH benchmarks:

sbt benchmark/jmh:run

Testing

The tests run against the PostgreSQL database using the default postgres role.

sbt test

Scaladoc

To generate the Scaladoc, run: sbt unidoc

Referencing external libraries in scala-docs

To reference external libraries in Scaladoc, make sure the library is recognised by adding an apiMapping.

See scalaDocsAPIMapping in build.sbt file as a reference for creating this apiMapping for an external library.

def myLibraryAPIMapping(classPath: Classpath, scalaVersion: String): (sbt.File, sbt.URL) =
  ??? //follow `scalaDocsAPIMapping` in build.sbt

//add the apiMapping to the project that depends on `myLibrary`
apiMappings ++=
  Map(
    myLibraryAPIMapping(
      classPath = (Compile / fullClasspath).value,
      scalaVersion = scalaVersion.value
    )
  )

Node Customization

The steps below are for developers who want to reference a full node on another computer, such as a Raspberry Pi, that is on the same subnet.

Explorer: /app/src/main/resources/application.conf

blockflow {
    host = "full-node-ip-address"
    port = 12973

    direct-clique-access = false
    direct-clique-access = ${?BLOCKFLOW_DIRECT_CLIQUE_ACCESS}

    network-id = 0
    network-id = ${?BLOCKFLOW_NETWORK_ID}
    groupNum = 4
    api-key = "full-node-api-key"
}

Full Node: user.conf

alephium.api.api-key = "full-node-api-key"
alephium.api.network-interface = "0.0.0.0"

alephium.network.bind-address  = "0.0.0.0:9973"
alephium.network.internal-address  = "full-node-ip-address:9973"
alephium.network.coordinator-address  = "full-node-ip-address:9973"


explorer-backend's Issues

`select count(*) ...` queries on partial indexes should be constant

Overview

Count queries like the following on indexes that fully satisfy the where clause should be constant time.

Query

select count(*) from block_headers where main_chain = true;

Index

create index blocks_main_chain_idx on block_headers (main_chain) where main_chain = true;

Issue

The query is not constant time and performs an aggregate scan.
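
To see which plan Postgres actually chooses, the query can be prefixed with explain analyze; a small sketch (here via Slick's plain-SQL interpolator, though running it directly in psql works just as well):

import slick.jdbc.PostgresProfile.api._

// Returns the query plan as text rows, e.g. showing the aggregate scan
// instead of a constant-time answer derived from the partial index.
val explainMainChainCount =
  sql"""explain analyze
        select count(*) from block_headers where main_chain = true""".as[String]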

Raw SQL `BlockDAO.get`

Parent task: #110

It accesses the block_headers, block_deps, transactions, inputs and outputs tables over multiple queries.

Metrics

Explorer backend should report prometheus-compatible metrics.

Using a different port than the API might be a good idea, so that the metrics are not publicly exposed.

Main metrics I'm looking for

  • Requests counter, also by status code
  • Requests time
  • DB queries counter
  • DB queries time, also by family (SELECT, UPDATE, INSERT, ...)

Preventing SQL injection attack

Since we are writing raw SQL queries, it is tempting to just write and compose plain Strings, because Slick doesn't offer a way to compose SQL fragments.

But to prevent SQL injection attacks we need to use query parameters.

  • Is SQL injection a concern here?
  • The following two subsections describe limitations of Slick when writing plain SQL that allow for SQL injection (see the sketch after this list for the parameter-binding alternative).
  • Do we want to look into Slick's internal types to see how we can extend or use them to avoid SQL injection?
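
For reference, Slick's plain-SQL interpolators do bind interpolated values as query parameters. A minimal sketch (the helper name is hypothetical; the table and column names follow the uoutputs queries shown elsewhere in this document):

import slick.jdbc.PostgresProfile.api._

// Safe: $address becomes a bound parameter ($1) and never enters the SQL text.
def txHashesForAddress(address: String) =
  sql"""select encode("tx_hash", 'hex')
        from "uoutputs"
        where "address" = $address""".as[String]

// Unsafe by contrast: #$address would be spliced verbatim into the statement,
// which is exactly the injection hole described above.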

Insert

Slick's documentation here shows its way of performing multiple inserts, which generates an insert statement for each row. This also means Postgres has to parse and compile each query separately.

LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'Colombian', $2 = '101', $3 = '7'
LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'French_Roast', $2 = '49', $3 = '8'
LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'Espresso', $2 = '150', $3 = '9'
LOG:  execute <unnamed>: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'Colombian_Decaf', $2 = '101', $3 = '8'
LOG:  execute S_1: insert into coffees values ($1, $2, $3)
DETAIL:  parameters: $1 = 'French_Roast_Decaf', $2 = '49', $3 = '9'

Instead of

insert into coffees values ($1, $2, $3),
                           ($4, $5, $6),
                           ($7, $8, $9),
                           .... 

Building the above single insert via a plain String allows for SQL injection.
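
One way to get a single multi-row insert while keeping every value as a bound parameter is to build only the placeholders in the SQL string and bind the data through JDBC. A hypothetical sketch using the coffees example above (assumes rows is non-empty):

import java.sql.Connection

// Build "(?, ?, ?), (?, ?, ?), ..." from placeholders only and bind every
// value on the PreparedStatement, so user data never reaches the SQL string.
def bulkInsertCoffees(conn: Connection, rows: Seq[(String, Int, Double)]): Int = {
  val placeholders = rows.map(_ => "(?, ?, ?)").mkString(", ")
  val stmt = conn.prepareStatement(s"insert into coffees values $placeholders")
  try {
    rows.zipWithIndex.foreach { case ((name, supId, price), i) =>
      stmt.setString(i * 3 + 1, name)
      stmt.setInt(i * 3 + 2, supId)
      stmt.setDouble(i * 3 + 3, price)
    }
    stmt.executeUpdate()
  } finally stmt.close()
}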

Composing SQL fragments

Composing two SQL fragments is only possible through splicing, which means one of them has to be a plain String. But if that String query has parameters, we allow for SQL injection.

Composing fragments is something we need, for example here.

Convert `BlockDAO.listMainChain` to raw SQL

Task

Convert and merge the typed Slick queries performed by BlockDAO.listMainChain into a single raw SQL query.

Parent task: #110
Dependent task: #123

Progress

Initial benchmark

Typed SQL - 26.79 seconds
Raw SQL - 21.28 seconds

Code as doc

Let's prepare a Makefile, like the full node has, for running the backend. I had issues creating the database and running the backend; I managed to solve them, but it's not dev-friendly and it costs time.

We should also use the Makefile in testing, so we are sure that the Makefile (i.e. the doc) is always correct.

Fix progression info

As we now always re-download some past blocks to make sure we don't miss any, the progress indicator is wrong and you might sometimes see: progress 31312351%. Don't panic 😅

Database connection created by tests should be closed after each run

Issue

Our test cases do not close the database connections they create after each test run.

We have a total of 70 test cases, so it is not a problem yet.

Test

A quick test demonstrating the error:

(1 to 300) foreach {
  _ =>
    val fixture = new Fixture {}
    import fixture.config.profile.api._
    fixture.run(fixture.blockDepsTable.result).futureValue
}

It fails with the following error after 214 iterations.

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
[49.957s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
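
A possible fix, sketched with ScalaTest's BeforeAndAfterAll; db.close() shuts down the Slick Database and its connection pool (the exact path to the Database instance on the fixture is an assumption):

import org.scalatest.BeforeAndAfterAll
import org.scalatest.flatspec.AnyFlatSpec

class BlockDepsSpec extends AnyFlatSpec with BeforeAndAfterAll {

  val fixture = new Fixture {} // the existing test fixture

  override def afterAll(): Unit = {
    fixture.config.db.close() // assumption: this is where the fixture exposes the Slick Database
    super.afterAll()
  }
}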

Update transaction tables to reduce queries dispatched & executed by Postgres

Overview

Update utransactions & transactions tables to use foreign key constraints

  • To reduce the number of queries required to fetch a single transaction to 1.
  • To make all inserts transactional.

Issue

To fetch a single transaction 3 queries are dispatched to Postgres.

select "hash", "chain_from", "chain_to", "gas-amount", "gas-price" from "utransactions" where "hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'
select "tx_hash", "hint", "output_ref_key", "unlock_script" from "uinputs" where "tx_hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'
select "tx_hash", "amount", "address", "lock_time" from "uoutputs" where "tx_hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'

Something similar occurs for a write.

  • 5 queries in total.
  • Inserts start individual transactions for a single request and are not wrapped in one overall transaction.
insert into "utransactions" ("hash","chain_from","chain_to","gas-amount","gas-price") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', $2 = '32', ...
BEGIN
update "uinputs" set "hint"=$1,"unlock_script"=$2 where "tx_hash"=$3 and "output_ref_key"=$4 ... $3 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444' ...
insert into "uinputs" ("tx_hash","hint","output_ref_key","unlock_script") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444' ...
COMMIT
BEGIN
update "uoutputs" set "amount"=$1,"lock_time"=$2 where "tx_hash"=$3 and "address"=$4 ... $3 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', ...
insert into "uoutputs" ("tx_hash","amount","address","lock_time") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', ...
COMMIT

Options

1. Foreign key constraints

Use foreign key constraints on existing tables so Postgres knows the relationships between the tables, and we can reduce
the number of queries dispatched & executed by Postgres for each read and write to 1.
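
As a rough sketch of the single round-trip read this enables (column list trimmed, helper name hypothetical; table and column names follow the queries above):

import slick.jdbc.PostgresProfile.api._

// One query instead of three: join inputs and outputs onto the transaction row.
def getUTransaction(hashHex: String) =
  sql"""select encode(t."hash", 'hex'), i."hint", o."address"
        from "utransactions" t
        left join "uinputs"  i on i."tx_hash" = t."hash"
        left join "uoutputs" o on o."tx_hash" = t."hash"
        where t."hash" = decode($hashHex, 'hex')""".as[(String, Option[Int], Option[String])]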

2. Merge all data into a single transaction table

These transaction queries always fetch their relevant inputs and outputs rows; the inputs and outputs tables are never queried individually. Merging the inputs and outputs tables into the main transaction table as one or two columns, stored as un-indexed varchar or blob, would increase performance by a large factor over option 1 above because:

  • Postgres maintains a separate file for each table and index. Merging them into a single table reduces IOps.
  • It increases write performance because only one index is maintained for the transactions table and none for inputs
    and outputs (which are much larger tables with larger indexes).
  • It increases read performance because only a single index is queried.

We could store them as jsonb if we want indexing, but if indexing is needed, option 1 is easier to maintain, and a jsonb value's size is limited to 268.435455 MB (from a Stack Overflow question).

Drawback of this option over option 1: in case we want to query inputs and outputs individually, separate tables are much easier to maintain. But if we want performance, option 2 will be noticeably better.

Thoughts?

Improve `TokenSupplyService`

  • We could probably cache the genesis blocks, as we always query the same ones (a caching sketch follows this list).
  • The unspentTokens function can be very slow with millions of UTXOs.
  • The syncPeriod should be adaptive: once we know the whole supply has been computed, we can wait a full day between runs.
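
A minimal caching sketch for the first point; BlockEntry and the loader function are placeholders for whatever TokenSupplyService actually uses:

import scala.concurrent.Future

// Genesis blocks never change, so load them once and reuse the same Future.
// (Two concurrent first calls may both trigger a load; acceptable for a sketch.)
final class GenesisBlocksCache[BlockEntry](load: () => Future[Seq[BlockEntry]]) {
  @volatile private var cached: Option[Future[Seq[BlockEntry]]] = None

  def get(): Future[Seq[BlockEntry]] =
    cached match {
      case Some(blocks) => blocks
      case None =>
        val blocks = load()
        cached = Some(blocks)
        blocks
    }
}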

Change column-type `varchar` to `bytea` for storing hashes in Postgres

Change the column type for storing hashes from varchar to bytea in Postgres because:

  • Read performance increases by 24%
  • Storage (table + indexes sizes) reduces by 35%
  • Write performance is mostly unaffected

These benchmarks were run on the utransactions table

  • 1 million rows. 1 read & write request per row.
  • Single-threaded, using a same-thread ExecutionContext.
column_type | reads/second | 1M reads: total time | 1M writes: total time | total space (table_size + index_size)
varchar     | 3557         | 282 seconds          | 324 seconds           | 350 MB (238 MB + 112 MB)
bytea       | 4671         | 214 seconds          | 315 seconds           | 227 MB (146 MB + 81 MB)

Data generated

varchar uses 65 bytes for each hash


bytea uses 33 bytes for each hash


SQL used to get table and index sizes

SELECT pg_size_pretty(pg_table_size('utransactions'))   as table_size,
       pg_size_pretty(pg_indexes_size('utransactions')) as index_size;

Sample code used in benchmarks

val data =
  new UnconfirmedTxEntity(
    hash = new Transaction.Hash(Hash.generate),
    chainFrom = GroupIndex.unsafe(0),
    chainTo = GroupIndex.unsafe(0),
    gasAmount = 0,
    gasPrice = U256.unsafe(0)
  )

config.db.run(unconfirmedTxsTable.insertOrUpdate(data)) //1 write request
config.db.run(unconfirmedTxsTable.filter(_.hash === data.hash).result) //1 read request

Benefit of varchar over bytea

With varchar we could use full-text search indexing, which allows for partial hash searches, but I don't think we need that.

Changes to test-cases

Test cases use the in-memory H2 database, which tries to store the hash as a blob, which is not indexable.

org.h2.jdbc.JdbcSQLFeatureNotSupportedException: Feature not supported: "Index on BLOB or CLOB column: ""hash"" BLOB NOT NULL"; SQL statement:
create table "block_headers" ("hash" BLOB NOT NULL PRIMARY KEY,"timestamp" BIGINT NOT NULL,"chain_from" INTEGER NOT NULL,"chain_to" INTEGER NOT NULL,"height" INTEGER NOT NULL,"main_chain" BOOLEAN NOT NULL) [50100-200]

Slick supports a few other databases that we could use for test cases. I know Derby supports in-memory mode; I can try that out, or should we just use Postgres? Tests might be slower with Postgres, but at least they will be accurate.

Ok to make this change?

Introduce a table for address to speed up address queries

The goal is to speed up the address query for the explorer. The address query is too slow for addresses with many txs. We should create a dedicated table to speed it up, so we don't need a JOIN to get the full tx list.

Maybe (address, tx_hash, timestamp, main_chain) is enough
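
A rough sketch of what that table and its index could look like, written with Slick's plain-SQL interpolator (the table and index names are hypothetical; column types mirror the existing tables):

import slick.jdbc.PostgresProfile.api._

// Hypothetical DDL for the proposed lookup table plus the index the address
// query would hit.
val createTransactionPerAddresses =
  sqlu"""create table if not exists "transaction_per_addresses" (
           "address"    varchar not null,
           "tx_hash"    bytea   not null,
           "timestamp"  bigint  not null,
           "main_chain" boolean not null
         )"""

val createAddressIndex =
  sqlu"""create index if not exists "txs_per_address_idx"
         on "transaction_per_addresses" ("address", "timestamp" desc)"""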

Test & benchmark auto-incrementing `Int` or `Long` IDs for blocks & transactions tables

Task

Test & benchmark whether there is any performance benefit in using auto-incrementing & unique Int- or Long-based IDs (alongside the hash IDs) in queries/joins for tables that have hash IDs, such as block_headers, transactions & utransactions.

Why?

Byte arrays generated from hash strings are very random. This randomness means they are less likely to share common prefixes & suffixes, so they build less compact indexes than an Integer or Long, where duplicate prefixes are very common and can be eliminated during indexing, reducing the index size. Theoretically, at least, unless Postgres has some magic for better hash indexing.

Currently hashes require 32-33 bytes, whereas an Int would be only 4 bytes and a Long only 8 bytes. Smaller data also means smaller indexes, and therefore faster queries & joins, lower read IOps, and lower cache & CPU usage.

Change

Every table that has a hash ID, for example block_headers, would store an auto-incrementing & unique Int/Long for each block_hash; other referencing tables such as transactions would then store that value as the reference for each block and use it in queries such as joins.

Similarly transactions would generate an auto-incrementing Int/Long which gets used as a reference by input & output tables.

This does not necessarily mean removal of hash columns & indexing in referencing tables.
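
A minimal sketch of the block_headers side as a Slick table definition; the surrogate column name is hypothetical and the remaining block_headers columns are omitted:

import slick.jdbc.PostgresProfile.api._

// Keep the existing hash primary key and add an auto-incrementing surrogate id
// that referencing tables (transactions, inputs, outputs) can store and join on.
class BlockHeaders(tag: Tag) extends Table[(Long, Array[Byte])](tag, "block_headers") {
  def id   = column[Long]("block_id", O.AutoInc, O.Unique)
  def hash = column[Array[Byte]]("hash", O.PrimaryKey)
  def * = (id, hash)
}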

Drawback

Using hash as the only ID makes things easier to maintain. Adding another Int or Long ID adds some maintenance cost.

Maybe the added cost is OK if the benchmark shows a significant performance benefit.

Rename `order` table fields

order is a SQL keyword, so we need to make sure we don't use it. In a few places we also use index... which is also a keyword ^^ Let's use something meaningful instead: tx_order, etc.

Add `amount` in `SweepAddressTransaction`

Would it be possible to add the amount in SweepAddressTransaction, so that the wallet can display the amount of the pending sweep transactions? At the moment there is no way to know how much each “consolidation” (aka “sweep”) transaction includes.

Optimise Block queries

Issue

  • Slow block query: response time of 4-6 seconds.
  • Dispatching 10 in-browser (Chrome) requests times out after 20 seconds.


Work in progress

Replicating issue locally - Test data generated

Following the total block count shown here: 1271466.


Created 2 million block_headers rows with 10 transactions per block_header (nearly 20 million transaction
entries)

Query                                           | Count
count(*) where block_headers.main_chain = true  | 1012415
count(*) where block_headers.main_chain = false | 1011095
count(*) block_headers                          | 2023510
count(*) transactions                           | 17578926

Tests run

  • Performance test 1 - On existing schema
  • Performance test 2 - With new my_tuple3_index for block_header queries.
    • create index my_tuple3_index on block_headers (main_chain, timestamp desc, hash)

Benchmark locally

Via Scala code

The above block query (blocks?page=52163&reverse=true) was run via a ScalaTest spec, which returned the following times:

Test type                                    | Thread 1 | Thread 2 | Thread 3 | Thread 4 | Thread 5
Sequential (not concurrent, existing schema) | 1.14 sec | 0.86 sec | 0.91 sec | 0.97 sec | 0.95 sec
Concurrent (existing schema)                 | 2.93 sec | 2.92 sec | 2.94 sec | 3.27 sec | 3.92 sec
Sequential (with my_tuple3_index)            | 1.92 sec | 1.66 sec | 1.74 sec | 1.64 sec | 1.66 sec
Concurrent (with my_tuple3_index)            | 4.65 sec | 4.65 sec | 4.65 sec | 4.70 sec | 4.81 sec

Table summary

  • Sequential/single-threaded reads to Postgres are reasonably performant considering the size of the data.
  • Increasing the number of concurrent threads worsens the performance per thread:
    • 5 threads required 2-4 seconds.
    • 10 threads required 4-6 seconds.
    • 20 threads required 6-14 seconds.
  • Using the custom index my_tuple3_index worsens the performance even further.
    • I've tried multiple index combinations, and the Postgres query planner either ignores them or does not do as well.

Via Postgres query analyser

Running the above queries via Postgres query planner returns the following performance.

Query planner        | Planning Time | Execution Time
No schema changes    | 0 sec         | 0.81 sec
With my_tuple3_index | 0 sec         | 1.65 sec

Table summary

  • Existing schema is reasonably performant
  • my_tuple3_index is twice as slow
