
Comments (14)

telezhnaya commented on August 22, 2024

@frol I just need to highlight: adding a UUID has the same complexity as the previous proposal.

We need to:

  • add 4 additional columns (1 column for each table: transactions, account_changes, transaction_actions, receipts);
  • change the PK for transactions;
  • change 3 FKs;
  • and do all the other related stuff mentioned above.

telezhnaya commented on August 22, 2024

We had a new conversation with @frol and @khorolets about this issue: some new cases have appeared, and the problem has become more pressing.

We came up with a new solution:

Creating a new table

We create a new table (1) that helps us store all transactions and detect collisions. If we fail on the PK constraint, it means that a second indexer added this item earlier, so we don't need to worry. If we fail on the UNIQUE constraint, it is a collision.

We need to create a new name for the collided transaction. I suggest old-tx-hash_#. Example: 5f4hC9yDL9zLKCeeDRZnLpe5WQb1YbtaDjxkn3t1QuW4_1. We then try to insert this tx again. If it fails again, we try 5f4hC9yDL9zLKCeeDRZnLpe5WQb1YbtaDjxkn3t1QuW4_2, and so on. We don't have such cases for now, but we need to be ready to handle them.
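A minimal sketch of this renaming flow, using the table from the SQL draft (1) below (the receipt id here is a made-up placeholder):

-- First attempt: insert under the original hash.
INSERT INTO transactions_with_converted_receipts (transaction_hash, receipt_id)
VALUES ('5f4hC9yDL9zLKCeeDRZnLpe5WQb1YbtaDjxkn3t1QuW4', 'some-receipt-id');

-- A failure on the PRIMARY KEY (transaction_hash, receipt_id) means another
-- indexer already stored this exact row: nothing to do.
-- A failure on the UNIQUE index on transaction_hash means a real collision:
-- retry under the suffixed name, incrementing the suffix on each failure.
INSERT INTO transactions_with_converted_receipts (transaction_hash, receipt_id)
VALUES ('5f4hC9yDL9zLKCeeDRZnLpe5WQb1YbtaDjxkn3t1QuW4_1', 'some-receipt-id');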

Insert into transactions

All the existing logic for inserting into the transactions table remains valid. Just one new detail: we change the transaction_hash if we've found a collision; the new name is known after the previous step.

Speeding up the process

It's easier to think about the process sequentially: first we insert into the new table, and then, knowing all the details, we insert into the transactions table.
In practice, it's better to do both in parallel, hoping that everything goes fine, which is true in most cases. For the blocks with collisions, we just need to run the INSERT again. Let's discuss that in the next section.

Handling BULK INSERT

We now use bulk insert statements to speed up the process: we insert all the transactions for the current block in one insert statement.
If we have no collisions, there is no problem: we run 2 bulk inserts in parallel (into the new table and into the transactions table).
If we have a collision:

  1. We run INSERT INTO transactions ... ON CONFLICT DO NOTHING. It inserts all the correct transactions and ignores the collided ones.
  2. We don't use ON CONFLICT DO NOTHING for the new table, so the bulk insert into the new table fails. In this case, we start inserting into the new table one by one. All the correct transactions will be inserted fine, and they are already in the transactions table without any additional steps (see step 1). All the collided transactions will fail: we need to assign them a new name (see the beginning of this message) and then insert them into the transactions table. See the sketch below.
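A hedged sketch of step 1, assuming a simplified transactions schema with only a transaction_hash column (the real table has more columns, and the second value is a placeholder):

INSERT INTO transactions (transaction_hash)
VALUES
    ('5f4hC9yDL9zLKCeeDRZnLpe5WQb1YbtaDjxkn3t1QuW4'),
    ('another-tx-hash-from-this-block')
ON CONFLICT DO NOTHING;  -- collided rows are silently skipped

The rows skipped here receive their new suffixed names in step 2 and are then inserted by a follow-up statement.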

(1) SQL draft (please suggest better naming if you have any ideas)

CREATE TABLE transactions_with_converted_receipts (
	transaction_hash TEXT NOT NULL,
	receipt_id TEXT NOT NULL
);
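
-- The composite primary key below catches exact duplicates (another indexer
-- already inserted the same row), while the unique index on transaction_hash
-- alone flags a genuine hash collision.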

ALTER TABLE transactions_with_converted_receipts
ADD PRIMARY KEY (transaction_hash, receipt_id);
CREATE UNIQUE INDEX transactions_with_converted_receipts_unique_idx
	ON transactions_with_converted_receipts (transaction_hash);

khorolets commented on August 22, 2024

@telezhnaya good catch! Thanks

@frol looks weird. https://explorer.near.org/accounts/ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088

It looks like the account was deleted before create_account was applied on behalf of this account.

telezhnaya commented on August 22, 2024

Got another one:
85fe743120593424437b34b05e45af39f2c3de67b5701950d2fd43be1671e544

upd: I will give the final list of strange accounts tomorrow, I want to go to sleep now :)
Found a third one, and it's not implicit: nflfun.near

frol commented on August 22, 2024

ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088 is indeed a problematic case; I will mention you in another issue where you can read more details.

nflfun.near and 85fe743120593424437b34b05e45af39f2c3de67b5701950d2fd43be1671e544 look fine to me:

http post https://archival-rpc.mainnet.near.org jsonrpc=2.0 id=dontcare method=query \
    params:='{
        "request_type": "view_account",
        "finality": "final",
        "account_id": "nflfun.near"
    }'
{
    "id": "dontcare",
    "jsonrpc": "2.0",
    "result": {
        "amount": "997673579100734400000000",
        "block_hash": "4N4unU4VxZqJ63WAe1BpAYj1haAtGyVc3AZLDvjqBRmC",
        "block_height": 33072649,
        "code_hash": "11111111111111111111111111111111",
        "locked": "0",
        "storage_paid_at": 0,
        "storage_usage": 390
    }
}

telezhnaya commented on August 22, 2024

@frol yep, you are right, I added them by mistake: they were just created later than I needed.
So we only need to think about ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088

telezhnaya commented on August 22, 2024

Just to save the context of the discussion: https://github.com/near/nearcore/security/advisories/GHSA-2v6r-g342-282f

telezhnaya commented on August 22, 2024

This issue is harder than we expected. After a long discussion, we decided that the best way to keep the database consistent is to include the block_hash column in the primary key of the transactions table (so it will be composite).

Unfortunately, this gives us many additional tasks. 3 tables depend on transactions, so the block_hash column should be there as well.

  • account_changes already contains it;
  • transaction_actions should be updated;
  • receipts should be updated as well. It already has a block_hash column, but that one refers to the block of the receipt, so it does not equal the block_hash column in transactions.

We also need to think about indexes so that we don't slow down queries.

The resulting plan is:

  • Add block_hash column to transaction_actions:
    • create a migration script that adds the column with a default empty value;
    • fix the Rust code to fill new rows with the right value;
    • create a migration script that backfills the old rows;
    • remove the default empty value;
    • change the primary key: make it composite;
    • add an index so queries on the table don't slow down;
  • Add transaction_block_hash column to receipts: (the steps are the same, copy-pasted just to mark them completed)
    • create a migration script that adds the column with a default empty value;
    • fix the Rust code to fill new rows with the right value;
    • create a migration script that backfills the old rows;
    • remove the default empty value;
    • change the primary key: make it composite;
    • add an index so queries on the table don't slow down;
  • Remove foreign key in transaction_actions;
  • Remove foreign key in receipts;
  • Remove foreign key in account_changes;
  • Change the primary key in transactions, make it composite (not sure how this is usually done in production, so this is just a proposal; see the SQL sketch after this list):
    • add a unique constraint on the future primary key;
    • remove the existing primary key (duplicates cannot appear because of the previous step);
    • add the new composite primary key;
    • remove the unique constraint created 3 steps above;
  • Add an index on the transaction_hash column in transactions;
  • Add new foreign key to transaction_actions;
  • Add new foreign key to receipts;
  • Add new foreign key to account_changes;
  • Re-run the indexer over the whole history to fill the DB with all the missing transactions (the tables transactions, receipts, transaction_actions, execution_outcomes, action_receipts, action_receipt_actions, accounts, account_changes will be affected for sure, maybe something else);
  • Fix the documentation: we should definitely explain what is going on there.
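A hypothetical SQL sketch of the two trickiest parts of this plan (all column, constraint, and index names here are assumptions, not the real schema):

-- Adding block_hash to a dependent table without breaking existing rows:
ALTER TABLE transaction_actions
    ADD COLUMN block_hash TEXT NOT NULL DEFAULT '';
-- (deploy the Rust change that fills the value, then backfill the old rows)
ALTER TABLE transaction_actions
    ALTER COLUMN block_hash DROP DEFAULT;

-- Swapping the transactions primary key for a composite one:
ALTER TABLE transactions
    ADD CONSTRAINT transactions_tmp_uniq UNIQUE (transaction_hash, block_hash);
ALTER TABLE transactions
    DROP CONSTRAINT transactions_pkey;
ALTER TABLE transactions
    ADD PRIMARY KEY (transaction_hash, block_hash);
ALTER TABLE transactions
    DROP CONSTRAINT transactions_tmp_uniq;
-- Keep lookups by hash fast after the primary key change:
CREATE INDEX transactions_transaction_hash_idx
    ON transactions (transaction_hash);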

I hope I haven't missed anything 🤯
For now, let's collect all such cases where tx hashes collided and live with what we have. Not sure all that work is worth the outcome.

telezhnaya commented on August 22, 2024

For the story: we have another such case, https://explorer.near.org/accounts/f1125.bridge.near

frol commented on August 22, 2024

I just want to dump my thoughts here. Alternatively, instead of relying on certain properties of the blockchain ("tx hash is unique", they said), we may follow the unhappy path of the real world, where the foreign key does not depend on the data and we instead define our own primary ID (UUID).

telezhnaya commented on August 22, 2024

We were thinking about creating a hash collision postmortem somewhere in the Docs, but we are not sure about the right place, so I'll leave the draft of the postmortem here.
Initial issue with postmortem

## POSTMORTEM June 9th, 2021

### Quick summary

On December 2, 2020, we noticed a deterministic way of creating [transaction hash collisions on Testnet](https://github.com/near/nearcore/security/advisories/GHSA-2v6r-g342-282f), and then also on Mainnet.

It still affects the data representation in some services such as [Near Explorer](https://explorer.near.org).

[The fix](https://github.com/near/nearcore/pull/4064) was merged on March 11, 2021 and released in [1.19.0](https://github.com/near/nearcore/releases/tag/1.19.0), so new examples of reproducible hash collisions will not appear again.

### In detail

It was possible to apply the same transaction two or more times.
Such transactions are logically different objects, but they have the same hash.
It means that, in the general case, the transaction hash no longer fully identifies a transaction.

We observe some limitations because of that:
- It's not possible to get the non-first appearance of a transaction via [RPC](https://docs.near.org/docs/api/rpc#transaction-status).
- There is no information about the non-first appearance of a transaction and all the related entities in the [Indexer DB](https://github.com/near/near-indexer-for-explorer).
- As a result, [Near Explorer](https://explorer.near.org) misses such transactions, which may look confusing.

We are aware of these appearances on Mainnet:
- ...
- ...

**TODO** we should collect and report here about all such cases on Mainnet (@telezhnaya will do that)

The existence of this issue does not mean that the blockchain is somehow corrupted or that there are problems with balances.
Since we've fixed the issue, the only remaining problem is the data representation.

**TODO** add the high-level explanation about the solution that we've come up with. (@bowenwang1996 please write here 1-2 sentences)

We are confident that such a problem will never happen again.
Sorry for the inconvenience caused.

telezhnaya commented on August 22, 2024

One thing I didn't mention: we need to fill the new table in a separate process; we could do that in parallel (with NFT and other stuff).

rayn316 commented on August 22, 2024

When I indexed blocks from the beginning, some errors caused the indexing to get stuck; I hope this helps.
You can view the attached file for detailed information about the stuck blocks.

Cnm9MQxz3iPuStu18nr4giauGB2GodVFaM5LqUBnUZZg
23579740

21k5gkx6Zdxc5bXpJemLvmS7GnrZ5ptf9iZx6yAmsKCH
28186777

94Bk8sDKdibM1WMFuArdF3kcKzF3SCyufCQiLNLujRCa
53434285

FG95BFtSjshutbYYzcnBvugeQLhHsdwCKNEh8vFuJcFr
53720835

indexer_error.txt

telezhnaya commented on August 22, 2024

For reference:
On mainnet, we had 13 such transactions across the whole history:

2DuxYYMgpeYsrGakXzQYa1K4NyuCXoUsKe3u2fgXLeLV, block 57870941
2cmob9xw2x8jGRi318qRfhAvUuZzfr7gX5DQPDPtAmo9, block 36561754
H4Eh2avNzmJcKUuNcjPi6uLF27ypt6nugWEkccGW23p7, block 57472820
5pzyWUKLf2BnvjjdCVFQCXG9BPX9uvXoVDBEWVsiZUC1, block 37998063
6zZ7yEVS8RLEeDRNmQN8svWMjWwhc3UFyu9L7gkquHFR, block 57469803
9tWidTU62N2YZvwEzzKPycYgr5odzGm7yLEUxVdDRtdU, block 36953251
AUh97Kbbg962AfqnQTnmM7tSSy5rnczZ99xXNjAxkctL, block 35056813
BUGpeYLAp3TvxEj2CgAXK2h7qfB19jF6Ust51ZqNJz4v, block 56620868
BXpvf5cnQsYkdcFhDv9SDbHESkKe7Q9S4vKRMHsXuNw8, block 36662692
FsDPP4gPqzTuc1WbuGJ8CVSobz8PNJeDxLfGQq7Mnysk, block 36953251
J4s3ghYCeoQQdSag4fHruPLy8B9Jg2MhKMFYsbRuG6Ak, block 56640668
GHLd5q4vtD2QCeu7bdrowP5v7quebCnQxczH3u55FZ8d, block 57529286
56t8ceK2B8BympLJ6gPL3CYsdPMJiWSegsCEFsAhcAj7, block 35857713
