
near-analytics

Analytics Tool for NEAR Blockchain.
NEAR Explorer uses it for mainnet and testnet.

The raw data is available for your own needs:

  • testnet credentials: postgres://public_readonly:[email protected]/indexer_analytics_testnet
  • mainnet credentials: postgres://public_readonly:[email protected]/indexer_analytics_mainnet

Keep in mind that the data (both the format and the contents) may change at any time, as the tool is under active development.

Example of data

Install

sudo apt install python3.9-distutils libpq-dev python3.9-dev postgresql-server-dev-all

python3.9 -m pip install --upgrade pip
python3.9 -m pip install -r requirements.txt

Run

python3.9 main.py -h

Contribute

See the Contributing Guide for details.

Usage examples

Apart from NEAR Explorer, see this nice blog post with a tutorial and beautiful visualizations based on NEAR Analytics data.

Contributors

frol, jschintz, mhalambek, neil-oliver, rbval, telezhnaya


Issues

Indices on daily_ingoing_transactions_per_account_count and daily_outgoing_transactions_per_account_count tables

Consider adding indices to efficiently query in/out-going transactions for a given account ID.

https://github.com/near/near-analytics/blob/main/aggregations/db_tables/daily_outgoing_transactions_per_account_count.py

https://github.com/near/near-analytics/blob/main/aggregations/db_tables/daily_ingoing_transactions_per_account_count.py

indexer_analytics_mainnet=> EXPLAIN ANALYZE SELECT
         SUM(outgoing_transactions_count) AS out_transactions_count
         FROM daily_outgoing_transactions_per_account_count
         WHERE account_id = 'frol.near';
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=38190.41..38190.42 rows=1 width=32) (actual time=1740.131..1744.408 rows=1 loops=1)
   ->  Gather  (cost=38190.19..38190.40 rows=2 width=32) (actual time=1736.723..1744.391 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=37190.19..37190.20 rows=1 width=32) (actual time=1731.115..1731.116 rows=1 loops=3)
               ->  Parallel Seq Scan on daily_outgoing_transactions_per_account_count  (cost=0.00..37190.17 rows=7 width=8) (actual time=150.107..1730.885 rows=11 loops=3)
                     Filter: (account_id = 'frol.near'::text)
                     Rows Removed by Filter: 944698
 Planning Time: 0.177 ms
 Execution Time: 1744.478 ms
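
A hedged sketch of a fix: a plain B-tree index on account_id should turn the parallel sequential scan above into an index scan. Following the project's pattern of keeping SQL inside Python modules (the index names below are illustrative):

import psycopg2

# Illustrative index names; adjust to the project's naming conventions.
CREATE_ACCOUNT_ID_INDICES = [
    "CREATE INDEX IF NOT EXISTS daily_outgoing_tx_per_account_account_id_idx "
    "ON daily_outgoing_transactions_per_account_count (account_id)",
    "CREATE INDEX IF NOT EXISTS daily_ingoing_tx_per_account_account_id_idx "
    "ON daily_ingoing_transactions_per_account_count (account_id)",
]

def create_indices(dsn):
    # The connection context manager commits both statements as one transaction.
    with psycopg2.connect(dsn) as connection, connection.cursor() as cursor:
        for statement in CREATE_ACCOUNT_ID_INDICES:
            cursor.execute(statement)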

Add reconnect & retry logic

Started computing daily_deposit_amount for 2021-10-02
terminating connection due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
SSL connection has been closed unexpectedly

Started computing daily_gas_used for 2021-10-02
connection already closed
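
A minimal reconnect-and-retry sketch for errors like the ones above, assuming psycopg2 and that each aggregation step is an idempotent callable taking a cursor (the function and parameter names are hypothetical):

import time
import psycopg2

def run_with_retry(dsn, compute_step, max_attempts=5, base_delay_seconds=10):
    # Re-open the connection and retry when the server drops us, e.g.
    # "terminating connection due to conflict with recovery".
    for attempt in range(1, max_attempts + 1):
        try:
            with psycopg2.connect(dsn) as connection, connection.cursor() as cursor:
                return compute_step(cursor)
        except (psycopg2.OperationalError, psycopg2.InterfaceError) as error:
            if attempt == max_attempts:
                raise
            print(f"Attempt {attempt} failed ({error}), reconnecting...")
            # Linear backoff; exponential would also be reasonable.
            time.sleep(base_delay_seconds * attempt)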

Rewrite `deployed_contracts` implementation

https://near.zulipchat.com/#narrow/stream/295306-nearinc-contract-runtime

We need to

  1. get all existing contracts
  2. split them into groups by the uniqueness of their source code
  3. compute how many invocations we have for each group, and how many groups have at least one invocation

(1) should be easy to collect with a small modification of the deployed_contracts table.
(2) could be done on the Python side.
The info for (3) is in daily_receipts_per_contract_count; we can take the number of invocations from there, as sketched below.
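
Assuming (1) yields (account_id, code_sha256) pairs and per-account invocation counts come from daily_receipts_per_contract_count, a rough (hypothetical) helper for (2) and (3):

from collections import defaultdict

def group_contracts_by_code(contracts, invocations_by_account):
    # contracts: iterable of (account_id, code_sha256) pairs from step (1).
    # invocations_by_account: {account_id: invocation count} collected
    # from daily_receipts_per_contract_count for step (3).

    # Step (2): group accounts by the hash of their deployed code.
    groups = defaultdict(list)
    for account_id, code_hash in contracts:
        groups[code_hash].append(account_id)

    # Step (3): invocations per group, plus the number of groups
    # with at least one invocation.
    invocations_per_group = {
        code_hash: sum(invocations_by_account.get(account, 0) for account in accounts)
        for code_hash, accounts in groups.items()
    }
    used_groups_count = sum(1 for count in invocations_per_group.values() if count > 0)
    return invocations_per_group, used_groups_count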

Do we have the correct formula for computing `daily_tokens_spent_on_fees`?

We have a new metric, daily_tokens_spent_on_fees.

It's calculated with the following SQL:

SELECT SUM(chunks.gas_used * blocks.gas_price)
FROM blocks JOIN chunks ON ...
WHERE blocks.block_timestamp ...

I'm wondering whether it's a fully correct formula. Maybe we should take gas_used[i] * gas_price[i - 1], or gas_used[i] * gas_price[i + 1].
For now it's not a burning issue, since gas_price is constant, but it could change at any moment, so it's better to clarify this.

@Longarithm could you please check this?
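
For reference, if the "previous block's gas price" variant turns out to be the right one, the query could pair each chunk with the preceding block via blocks.prev_block_hash. An untested sketch in the project's style of keeping SQL as a Python constant (the timestamp parameters are placeholders):

# Hypothetical variant: charge each chunk's gas at the *previous* block's
# gas price instead of its own block's price.
DAILY_TOKENS_SPENT_ON_FEES_LAGGED = """
    SELECT SUM(chunks.gas_used * prev_blocks.gas_price)
    FROM blocks
    JOIN chunks ON chunks.included_in_block_hash = blocks.block_hash
    JOIN blocks AS prev_blocks ON blocks.prev_block_hash = prev_blocks.block_hash
    WHERE blocks.block_timestamp >= %(from_timestamp)s
      AND blocks.block_timestamp < %(to_timestamp)s
"""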

Create a more stable solution for when the computation of statistics fails

Now, the cron job runs each day.
It tries to collect all the data, retrying several times, and it can still fail.
If it fails, I get a notification about it and have to go and investigate it manually.
Our SRE team silently revoked my access rights by mistake some time ago, and I didn't receive any notifications for a while (maybe half a year).
Some of our tables fully depend on the consistency of the data in other tables. If that data is missing, the newly computed data is wrong.

So, I'm no longer sure the data is correct.

Currently, if something goes wrong and no one investigates it, the next day we simply compute the next day's statistics, and therefore we fill the tables with potentially broken data.

I want to create a separate table where we store the last successful datetime for each analytics table we have.
Each day, we should first read these values and try to fill in the missing parts. We also should not even try to fill the dependent tables if we can't compute all the data they need; a sketch of such a progress table follows below.

After that, we should wipe all the data we have and re-compute it to be sure that everything is consistent.
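
Assuming one row per analytics table, the bookkeeping could look like this (table and column names are hypothetical):

# Hypothetical progress log: the last day each analytics table
# was computed successfully.
CREATE_PROGRESS_TABLE = """
    CREATE TABLE IF NOT EXISTS aggregation_progress (
        table_name TEXT PRIMARY KEY,
        last_successful_day DATE NOT NULL
    )
"""

MARK_DAY_AS_COMPUTED = """
    INSERT INTO aggregation_progress (table_name, last_successful_day)
    VALUES (%(table_name)s, %(day)s)
    ON CONFLICT (table_name)
    DO UPDATE SET last_successful_day = EXCLUDED.last_successful_day
"""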

Data should be collected from the genesis date for each source

Now, everything is fine for mainnet because the mainnet genesis date is hardcoded.

Hotfix for testnet DB:
delete from daily_new_unique_contracts_count where collected_for_day < '2021-04-01';

There could also be an issue because of rows missing in both mainnet and testnet, for example in daily_deleted_accounts_count; I need to investigate that. (I had a glance, and the stats look OK.)
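
A hedged sketch of turning the hardcoded mainnet genesis into a per-chain constant. The dates below are illustrative placeholders, not verified genesis dates:

import datetime

# Illustrative placeholders: verify the real data-start date for each
# chain before relying on these values.
GENESIS_DAY = {
    "mainnet": datetime.date(2020, 7, 21),
    "testnet": datetime.date(2021, 4, 1),
}

def clamp_to_genesis(chain_id, day):
    # Never aggregate data for days before the chain's data starts.
    return max(day, GENESIS_DAY[chain_id])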

Create transaction_fees aggregation under db_tables

Currently daily_gas_used is the closest aggregation under db_tables: https://github.com/near/near-analytics/blob/main/aggregations/db_tables/daily_gas_used.py

In order to get the transaction fees, we would also have to include gas_price from the blocks table.

The preliminary SQL for this would look something like this:

SELECT
    -- block_timestamp is in nanoseconds; divide by 10^9 to get seconds
    date_trunc('day', to_timestamp(b.block_timestamp / 1000000000)) as collected_for_day,
    -- 1 NEAR = 1E24 yoctoNEAR
    NULLIF(SUM(c.gas_used * b.gas_price) / 1E24, 0) as transaction_fees
FROM blocks as b
JOIN chunks as c ON b.block_hash = c.included_in_block_hash
GROUP BY collected_for_day
ORDER BY collected_for_day DESC

Open questions:

  • Would it make sense to make the transaction_fees column a NUMERIC(30,0)?
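
One hedged observation on the open question: the query above divides by 1E24, so the result is denominated in NEAR and has a fractional part that NUMERIC(30,0) would truncate. An alternative sketch is to store the raw yoctoNEAR sum and convert only when presenting (the table and column names are hypothetical):

# Hypothetical table: daily fee sums kept in raw yoctoNEAR.
# NUMERIC(45, 0) comfortably holds sums up to the total supply
# (roughly 10^33 yoctoNEAR).
CREATE_TRANSACTION_FEES_TABLE = """
    CREATE TABLE IF NOT EXISTS daily_transaction_fees (
        collected_for_day DATE PRIMARY KEY,
        fees_yoctonear NUMERIC(45, 0) NOT NULL
    )
"""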
