
near-analytics

Analytics Tool for NEAR Blockchain.
NEAR Explorer uses it for mainnet and testnet.

The raw data is available for your own needs:

  • testnet credentials: postgres://public_readonly:[email protected]/indexer_analytics_testnet
  • mainnet credentials: postgres://public_readonly:[email protected]/indexer_analytics_mainnet

Keep in mind that the data (both the format and the contents) may change at any time, as the tool is under active development.

Example of data

Install

sudo apt install python3.9-distutils libpq-dev python3.9-dev postgresql-server-dev-all

python3.9 -m pip install --upgrade pip
python3.9 -m pip install -r requirements.txt

Run

python3.9 main.py -h

Contribute

See the Contributing Guide for details.

Usage examples

Apart from NEAR Explorer, see this nice blog post with a tutorial and beautiful visualizations based on NEAR Analytics data.

Contributors

frol, jschintz, mhalambek, neil-oliver, rbval, telezhnaya


Issues

Indices on daily_ingoing_transactions_per_account_count and daily_outgoing_transactions_per_account_count tables

Consider adding indices to efficiently query in/out-going transactions for a given account ID.

https://github.com/near/near-analytics/blob/main/aggregations/db_tables/daily_outgoing_transactions_per_account_count.py

https://github.com/near/near-analytics/blob/main/aggregations/db_tables/daily_ingoing_transactions_per_account_count.py

indexer_analytics_mainnet=> EXPLAIN ANALYZE SELECT
         SUM(outgoing_transactions_count) AS out_transactions_count
         FROM daily_outgoing_transactions_per_account_count
         WHERE account_id = 'frol.near';
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=38190.41..38190.42 rows=1 width=32) (actual time=1740.131..1744.408 rows=1 loops=1)
   ->  Gather  (cost=38190.19..38190.40 rows=2 width=32) (actual time=1736.723..1744.391 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=37190.19..37190.20 rows=1 width=32) (actual time=1731.115..1731.116 rows=1 loops=3)
               ->  Parallel Seq Scan on daily_outgoing_transactions_per_account_count  (cost=0.00..37190.17 rows=7 width=8) (actual time=150.107..1730.885 rows=11 loops=3)
                     Filter: (account_id = 'frol.near'::text)
                     Rows Removed by Filter: 944698
 Planning Time: 0.177 ms
 Execution Time: 1744.478 ms
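
A hedged sketch of a fix: a plain B-tree index on account_id should turn the parallel sequential scan above into an index scan. Following the project's pattern of keeping SQL inside Python modules (the index names below are illustrative):

import psycopg2

# Illustrative index names; adjust to the project's naming conventions.
CREATE_ACCOUNT_ID_INDICES = [
    "CREATE INDEX IF NOT EXISTS daily_outgoing_tx_per_account_account_id_idx "
    "ON daily_outgoing_transactions_per_account_count (account_id)",
    "CREATE INDEX IF NOT EXISTS daily_ingoing_tx_per_account_account_id_idx "
    "ON daily_ingoing_transactions_per_account_count (account_id)",
]

def create_indices(dsn):
    # The connection context manager commits both statements as one transaction.
    with psycopg2.connect(dsn) as connection, connection.cursor() as cursor:
        for statement in CREATE_ACCOUNT_ID_INDICES:
            cursor.execute(statement)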

Add reconnect & retry logic

Started computing daily_deposit_amount for 2021-10-02
terminating connection due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
SSL connection has been closed unexpectedly

Started computing daily_gas_used for 2021-10-02
connection already closed
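
A minimal reconnect-and-retry sketch for errors like the ones above, assuming psycopg2 and that each aggregation step is an idempotent callable taking a cursor (the function and parameter names are hypothetical):

import time
import psycopg2

def run_with_retry(dsn, compute_step, max_attempts=5, base_delay_seconds=10):
    # Re-open the connection and retry when the server drops us, e.g.
    # "terminating connection due to conflict with recovery".
    for attempt in range(1, max_attempts + 1):
        try:
            with psycopg2.connect(dsn) as connection, connection.cursor() as cursor:
                return compute_step(cursor)
        except (psycopg2.OperationalError, psycopg2.InterfaceError) as error:
            if attempt == max_attempts:
                raise
            print(f"Attempt {attempt} failed ({error}), reconnecting...")
            # Linear backoff; exponential would also be reasonable.
            time.sleep(base_delay_seconds * attempt)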

Rewrite `deployed_contracts` implementation

https://near.zulipchat.com/#narrow/stream/295306-nearinc-contract-runtime

We need to

  1. get all existing contracts
  2. split them into groups by the uniqueness of their source code
  3. compute how many invocations we have for each group, and how many groups have at least one invocation

(1) should be easy to collect with a small modification of the deployed_contracts table.
(2) could be done on the Python side.
The info for (3) is in daily_receipts_per_contract_count; we can take the number of invocations from there, as sketched below.
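
Assuming (1) yields (account_id, code_sha256) pairs and per-account invocation counts come from daily_receipts_per_contract_count, a rough (hypothetical) helper for (2) and (3):

from collections import defaultdict

def group_contracts_by_code(contracts, invocations_by_account):
    # contracts: iterable of (account_id, code_sha256) pairs from step (1).
    # invocations_by_account: {account_id: invocation count} collected
    # from daily_receipts_per_contract_count for step (3).

    # Step (2): group accounts by the hash of their deployed code.
    groups = defaultdict(list)
    for account_id, code_hash in contracts:
        groups[code_hash].append(account_id)

    # Step (3): invocations per group, plus the number of groups
    # with at least one invocation.
    invocations_per_group = {
        code_hash: sum(invocations_by_account.get(account, 0) for account in accounts)
        for code_hash, accounts in groups.items()
    }
    used_groups_count = sum(1 for count in invocations_per_group.values() if count > 0)
    return invocations_per_group, used_groups_count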

Do we have the correct formula for computing `daily_tokens_spent_on_fees`?

We have a new metric, daily_tokens_spent_on_fees.

It's calculated with the following SQL:

SELECT SUM(chunks.gas_used * blocks.gas_price)
FROM blocks JOIN chunks ON ...
WHERE blocks.block_timestamp ...

I'm wondering whether it's a fully correct formula. Maybe we should take gas_used[i] * gas_price[i - 1], or gas_used[i] * gas_price[i + 1].
For now it's not a burning issue, since gas_price is constant, but it could change at any moment, so it's better to clarify this.

@Longarithm could you please check this?
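
For reference, if the "previous block's gas price" variant turns out to be the right one, the query could pair each chunk with the preceding block via blocks.prev_block_hash. An untested sketch in the project's style of keeping SQL as a Python constant (the timestamp parameters are placeholders):

# Hypothetical variant: charge each chunk's gas at the *previous* block's
# gas price instead of its own block's price.
DAILY_TOKENS_SPENT_ON_FEES_LAGGED = """
    SELECT SUM(chunks.gas_used * prev_blocks.gas_price)
    FROM blocks
    JOIN chunks ON chunks.included_in_block_hash = blocks.block_hash
    JOIN blocks AS prev_blocks ON blocks.prev_block_hash = prev_blocks.block_hash
    WHERE blocks.block_timestamp >= %(from_timestamp)s
      AND blocks.block_timestamp < %(to_timestamp)s
"""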

Create a more stable solution for when the computation of statistics fails

Now, the cron job runs each day.
It tries to collect all the data, retrying several times, and it can still fail.
If it fails, I get a notification about it and have to go and investigate it manually.
Our SRE team silently revoked my access rights by mistake some time ago, and I didn't receive any notifications for a while (maybe half a year).
Some of our tables fully depend on the consistency of the data in other tables. If that data is missing, the newly computed data is wrong.

So, I'm no longer sure the data is correct.

Currently, if something goes wrong and no one investigates it, the next day we simply compute the next day's statistics, and therefore we fill the tables with potentially broken data.

I want to create a separate table where we store the last successful datetime for each analytics table we have.
Each day, we should first read these values and try to fill in the missing parts. We also should not even try to fill the dependent tables if we can't compute all the data they need; a sketch of such a progress table follows below.

After that, we should wipe all the data we have and re-compute it to be sure that everything is consistent.
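
Assuming one row per analytics table, the bookkeeping could look like this (table and column names are hypothetical):

# Hypothetical progress log: the last day each analytics table
# was computed successfully.
CREATE_PROGRESS_TABLE = """
    CREATE TABLE IF NOT EXISTS aggregation_progress (
        table_name TEXT PRIMARY KEY,
        last_successful_day DATE NOT NULL
    )
"""

MARK_DAY_AS_COMPUTED = """
    INSERT INTO aggregation_progress (table_name, last_successful_day)
    VALUES (%(table_name)s, %(day)s)
    ON CONFLICT (table_name)
    DO UPDATE SET last_successful_day = EXCLUDED.last_successful_day
"""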

Data should be collected from the genesis date for each source

Now, everything is fine for mainnet because the mainnet genesis date is hardcoded.

Hotfix for testnet DB:
delete from daily_new_unique_contracts_count where collected_for_day < '2021-04-01';

There could also be an issue because of rows missing in both mainnet and testnet, for example in daily_deleted_accounts_count; I need to investigate that. (I had a glance, and the stats look OK.)
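
A hedged sketch of turning the hardcoded mainnet genesis into a per-chain constant. The dates below are illustrative placeholders, not verified genesis dates:

import datetime

# Illustrative placeholders: verify the real data-start date for each
# chain before relying on these values.
GENESIS_DAY = {
    "mainnet": datetime.date(2020, 7, 21),
    "testnet": datetime.date(2021, 4, 1),
}

def clamp_to_genesis(chain_id, day):
    # Never aggregate data for days before the chain's data starts.
    return max(day, GENESIS_DAY[chain_id])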

Create transaction_fees aggregation under db_tables

Currently daily_gas_used is the closest aggregation under db_tables: https://github.com/near/near-analytics/blob/main/aggregations/db_tables/daily_gas_used.py

In order to get the transaction fees, we would also have to include gas_price from the blocks table.

The preliminary SQL for this would look something like this:

SELECT
    -- block_timestamp is in nanoseconds; divide by 10^9 to get seconds
    date_trunc('day', to_timestamp(b.block_timestamp / 1000000000)) as collected_for_day,
    -- 1 NEAR = 1E24 yoctoNEAR
    NULLIF(SUM(c.gas_used * b.gas_price) / 1E24, 0) as transaction_fees
FROM blocks as b
JOIN chunks as c ON b.block_hash = c.included_in_block_hash
GROUP BY collected_for_day
ORDER BY collected_for_day DESC

Open questions:

  • Would it make sense to make the transaction_fees column a NUMERIC(30,0)?
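
One hedged observation on the open question: the query above divides by 1E24, so the result is denominated in NEAR and has a fractional part that NUMERIC(30,0) would truncate. An alternative sketch is to store the raw yoctoNEAR sum and convert only when presenting (the table and column names are hypothetical):

# Hypothetical table: daily fee sums kept in raw yoctoNEAR.
# NUMERIC(45, 0) comfortably holds sums up to the total supply
# (roughly 10^33 yoctoNEAR).
CREATE_TRANSACTION_FEES_TABLE = """
    CREATE TABLE IF NOT EXISTS daily_transaction_fees (
        collected_for_day DATE PRIMARY KEY,
        fees_yoctonear NUMERIC(45, 0) NOT NULL
    )
"""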
