Comments (19)
Here is the snippet where the EAH gets mixed into the bank hash; I was looking it up for my own understanding as well:
Lines 6963 to 7004 in 9db4e84
from solana.
bad
[2024-01-21T03:16:02.013644297Z INFO solana_runtime::bank]
bank frozen: 243108000 hash: 7h4iAoXfX6KwitVhazib3fCnr3J4koFu7ArJZTY5heFZ
accounts_delta: GCj8CFVaqeHwkVgaLb6f9PaPYeZMofBsKL1op3LXUJJ8 signature_count: 500 last_blockhash: Ant9w6LfnGbG4Jpm5VMxmyRpzN3myx3dDpAQSE3kTwcs
capitalization: 567558615827727708, epoch_accounts_hash: xwb6iXsG3vdHgUTAFkgUiACYPwYkW8F7465CNL9WVaC,
stats: BankHashStats { num_updated_accounts: 1465, num_removed_accounts: 14, num_lamports_stored: 39484939731971, total_data_len: 10492554, num_executable_accounts: 1 }
good
[2024-01-21T03:16:02.022355358Z INFO solana_runtime::bank]
bank frozen: 243108000 hash: HMk3tMMympeHyBRpoDrqMLxfLjSUZvNiLYx6i2JCuNGZ
accounts_delta: GCj8CFVaqeHwkVgaLb6f9PaPYeZMofBsKL1op3LXUJJ8 signature_count: 500 last_blockhash: Ant9w6LfnGbG4Jpm5VMxmyRpzN3myx3dDpAQSE3kTwcs
capitalization: 567558615826502748, epoch_accounts_hash: 824tUYuwAKFv2kKz5m2Xf8YHNYhYUhsqwohjmrvTp3Be,
stats: BankHashStats { num_updated_accounts: 1465, num_removed_accounts: 14, num_lamports_stored: 39484939731971, total_data_len: 10492554, num_executable_accounts: 1 }
Hmm yeah, it looks like the only things that differ here are the epoch_accounts_hash and the capitalization. The fact that your node did not diverge previously would suggest that the account that caused the EAH to diverge did NOT appear as part of any bank hashes recently. So, I don't think replaying the slot will give us any useful information. Rather, I think we would have to examine each account to find the offending one.
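For anyone who would rather not compare the two lines by eye, a small script can do the field-by-field diff. This is only a minimal sketch and not part of any Solana tooling: `parse_bank_frozen` and `diff_fields` are hypothetical helper names, and the log lines are trimmed to the fields that come before `stats:`.

```python
import re

# "bank frozen" lines from the thread, trimmed before the stats block.
BAD = ("bank frozen: 243108000 hash: 7h4iAoXfX6KwitVhazib3fCnr3J4koFu7ArJZTY5heFZ "
       "accounts_delta: GCj8CFVaqeHwkVgaLb6f9PaPYeZMofBsKL1op3LXUJJ8 signature_count: 500 "
       "last_blockhash: Ant9w6LfnGbG4Jpm5VMxmyRpzN3myx3dDpAQSE3kTwcs "
       "capitalization: 567558615827727708, "
       "epoch_accounts_hash: xwb6iXsG3vdHgUTAFkgUiACYPwYkW8F7465CNL9WVaC,")
GOOD = ("bank frozen: 243108000 hash: HMk3tMMympeHyBRpoDrqMLxfLjSUZvNiLYx6i2JCuNGZ "
        "accounts_delta: GCj8CFVaqeHwkVgaLb6f9PaPYeZMofBsKL1op3LXUJJ8 signature_count: 500 "
        "last_blockhash: Ant9w6LfnGbG4Jpm5VMxmyRpzN3myx3dDpAQSE3kTwcs "
        "capitalization: 567558615826502748, "
        "epoch_accounts_hash: 824tUYuwAKFv2kKz5m2Xf8YHNYhYUhsqwohjmrvTp3Be,")


def parse_bank_frozen(line):
    """Turn 'key: value' pairs of a bank frozen line into a dict (hypothetical helper)."""
    # Rename the two-word key so every key is a single token.
    line = line.replace("bank frozen:", "slot:")
    return dict(re.findall(r"(\w+): (\w+)", line))


def diff_fields(a, b):
    """Return the sorted names of fields whose values differ between two parsed lines."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


print(diff_fields(parse_bank_frozen(BAD), parse_bank_frozen(GOOD)))
# ['capitalization', 'epoch_accounts_hash', 'hash']
```

The bank hash itself shows up in the diff as expected; apart from it, only the capitalization and the epoch_accounts_hash differ, while accounts_delta, signature_count and last_blockhash match.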
from solana.
And just making sure, you were running 6a9f729 with no other modifications?
I think this is because the reward PDA account was created in the previous epoch when I ran my node with partitioned rewards enabled (#34809). That PR is incomplete.
I have pushed fixes for this just now.
So to confirm, you believe that an account from a PR that has not landed yet altered your account state and caused your node to diverge? And the fixes were pushed to your PR?
from solana.
243107999 matches
good: bank frozen: 243107999 hash: 7X1s4w65yBnwz18fKg5tamNrSHnK1AZyBhQEZRjctDhc
bad: bank frozen: 243107999 hash: 7X1s4w65yBnwz18fKg5tamNrSHnK1AZyBhQEZRjctDhc
243108000 mismatches
good: bank frozen: 243108000 hash: HMk3tMMympeHyBRpoDrqMLxfLjSUZvNiLYx6i2JCuNGZ
bad: bank frozen: 243108000 hash: 7h4iAoXfX6KwitVhazib3fCnr3J4koFu7ArJZTY5heFZ
from solana.
And you have confirmed it with and without the commit you linked above, 6a9f729?
from solana.
bad
[2024-01-21T03:16:02.013644297Z INFO solana_runtime::bank]
bank frozen: 243108000 hash: 7h4iAoXfX6KwitVhazib3fCnr3J4koFu7ArJZTY5heFZ
accounts_delta: GCj8CFVaqeHwkVgaLb6f9PaPYeZMofBsKL1op3LXUJJ8 signature_count: 500 last_blockhash: Ant9w6LfnGbG4Jpm5VMxmyRpzN3myx3dDpAQSE3kTwcs
capitalization: 567558615827727708, epoch_accounts_hash: xwb6iXsG3vdHgUTAFkgUiACYPwYkW8F7465CNL9WVaC,
stats: BankHashStats { num_updated_accounts: 1465, num_removed_accounts: 14, num_lamports_stored: 39484939731971, total_data_len: 10492554, num_executable_accounts: 1 }
good
[2024-01-21T03:16:02.022355358Z INFO solana_runtime::bank]
bank frozen: 243108000 hash: HMk3tMMympeHyBRpoDrqMLxfLjSUZvNiLYx6i2JCuNGZ
accounts_delta: GCj8CFVaqeHwkVgaLb6f9PaPYeZMofBsKL1op3LXUJJ8 signature_count: 500 last_blockhash: Ant9w6LfnGbG4Jpm5VMxmyRpzN3myx3dDpAQSE3kTwcs
capitalization: 567558615826502748, epoch_accounts_hash: 824tUYuwAKFv2kKz5m2Xf8YHNYhYUhsqwohjmrvTp3Be,
stats: BankHashStats { num_updated_accounts: 1465, num_removed_accounts: 14, num_lamports_stored: 39484939731971, total_data_len: 10492554, num_executable_accounts: 1 }
from solana.
And you have confirmed it with and without the commit you linked above, 6a9f729?
Yes. #34623 is not the problem.
I have verified with the ledger tool that with and without #34623 we get the same bad bank hash, 7h4iAoXfX6KwitVhazib3fCnr3J4koFu7ArJZTY5heFZ.
from solana.
It seems that at slot 243108000 the epoch_accounts_hash mismatched, which could cause the bank hash mismatch?
from solana.
Yes. #34623 is not the problem.
I have verified with the ledger tool that with and without #34623 we get the same bad bank hash, 7h4iAoXfX6KwitVhazib3fCnr3J4koFu7ArJZTY5heFZ.
Gotcha, maybe we need to bisect then. We seemingly don't have a last known good commit, but I would also have expected the canaries to hit this if the issue had been introduced a long time ago.
And just making sure, you were running 6a9f729 with no other modifications?
from solana.
@brooksprumo My understanding is that if the epoch_accounts_hash is mismatched, then at the bank where the epoch_accounts_hash gets included (i.e. 3/4 of the way into the epoch?), we will get a bank hash mismatch. Is that right?
from solana.
The epoch accounts hash is a (full) accounts hash calculation taken at the rooted slot 1/4 into the epoch. That value is saved into accounts-db and then hashed into the bank (when freezing) that's 3/4 into the epoch.
If the EAH values are different, that would imply an account hash calculation mismatch.
And yes, if the EAH values are different, that will cause the bank hashes to be different.
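To make the last point concrete, here is a toy model of the mix-in. It is a sketch under loud assumptions, not the real implementation: `toy_bank_hash` is a hypothetical function, SHA-256 over concatenated strings stands in for the actual hashing, and the hash of slot 243107999 is used as a stand-in for the parent hash. The only thing it is meant to show is that two nodes with identical per-slot inputs still end up with different bank hashes once different EAH values are hashed in.

```python
import hashlib


def toy_bank_hash(parent_hash, accounts_delta, signature_count, last_blockhash,
                  epoch_accounts_hash=None):
    """Toy stand-in for a bank hash; NOT the real Solana hashing scheme."""
    h = hashlib.sha256()
    for part in (parent_hash, accounts_delta, str(signature_count), last_blockhash):
        h.update(part.encode())
    digest = h.digest()
    # Only the bank that includes the EAH folds it into its hash.
    if epoch_accounts_hash is not None:
        digest = hashlib.sha256(digest + epoch_accounts_hash.encode()).digest()
    return digest.hex()


# Inputs that both nodes agree on (values taken from the logs in this thread).
common = dict(
    parent_hash="7X1s4w65yBnwz18fKg5tamNrSHnK1AZyBhQEZRjctDhc",  # assumed parent
    accounts_delta="GCj8CFVaqeHwkVgaLb6f9PaPYeZMofBsKL1op3LXUJJ8",
    signature_count=500,
    last_blockhash="Ant9w6LfnGbG4Jpm5VMxmyRpzN3myx3dDpAQSE3kTwcs",
)

bad = toy_bank_hash(**common, epoch_accounts_hash="xwb6iXsG3vdHgUTAFkgUiACYPwYkW8F7465CNL9WVaC")
good = toy_bank_hash(**common, epoch_accounts_hash="824tUYuwAKFv2kKz5m2Xf8YHNYhYUhsqwohjmrvTp3Be")

print(bad != good)                                       # different EAH, different hash
print(toy_bank_hash(**common) == toy_bank_hash(**common))  # without an EAH the inputs agree
```

In this model, a bank that does not include the EAH is unaffected, which fits slot 243107999 matching while slot 243108000 does not.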
from solana.
Yeah. 243108000 is at 75% of the epoch.
>>> x = 243108000
>>> begin = 242784000
>>> end = 243215999
>>> (x-begin)/(end-begin)
0.7500017361151299
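The ratio comes out slightly above 0.75 only because `end` is the last slot of the epoch, so `end - begin` is one less than the number of slots. With integer arithmetic the slot lands exactly on the 3/4 mark. The sketch below assumes the offsets are simply 1/4 and 3/4 of the slots in the epoch, as described earlier in the thread; `eah_slots` is a hypothetical helper, not a function from the codebase.

```python
def eah_slots(first_slot, slots_per_epoch):
    """Return the assumed (start, stop) slots: 1/4 and 3/4 into the epoch."""
    start = first_slot + slots_per_epoch // 4
    stop = first_slot + slots_per_epoch // 4 * 3
    return start, stop


begin = 242784000
end = 243215999
slots_per_epoch = end - begin + 1  # end is inclusive, so 432000 slots

start, stop = eah_slots(begin, slots_per_epoch)
print(slots_per_epoch)  # 432000
print(start, stop)      # 242892000 243108000
```

So under this assumption the EAH would be calculated from slot 242892000 and included in the bank at slot 243108000, the slot where the hashes diverge.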
from solana.
Yeah. 243108000 is at 75% of the epoch.
Also the EAH is only part of the "bank frozen" log line for the one bank [1] that's including the EAH.
Footnotes
1. There can be multiple forks such that more than one bank includes the EAH, since this is only "frozen". Eventually only one of these banks will be rooted.
from solana.
Rather, I think we would have to examine each account to find the offending one.
Yeah, that's what I would think too. The accounts hash calculation happens way after the bank hash is calculated (the EAH start slot), so replaying the slot won't include anything about the EAH. We occasionally see accounts hash mismatches on snapshots, and they haven't been reproducible so far. They are often theorized to be a HW issue, especially a disk-related one.
from solana.
Note that I see this at startup when unpacking the snapshot:
[2024-01-22T20:02:23.357305050Z WARN solana_runtime::bank]
verify failed: slot: 243107789, 4n2M6Y7tgkjUfahST7DFPLL1Su4V3cWGxo3gduJSv99N (calculated) != 95ahPUghb36UjDVtAmpLHGhYWFSwZ22spWW6k78AgGHQ (expected)
[2024-01-22T20:02:23.357326060Z INFO solana_metrics::metrics]
datapoint: verify_snapshot_bank clean_us=4i shrink_us=1i verify_accounts_us=577i verify_bank_us=9414i
thread 'main' panicked at /solana-labs/solana-secondary/runtime/src/snapshot_bank_utils.rs:374:9:
Snapshot bank for slot 243107789 failed to verify
from solana.
yeah. I see that too.
[2024-01-22T21:29:57.829048056Z INFO solana_metrics::metrics] datapoint: accounts_db_active hash=0i
[2024-01-22T21:29:57.834151537Z WARN solana_accounts_db::accounts_db] Mismatched total lamports: 567558617892842802 calculated: 567558617891617842
[2024-01-22T21:29:57.834178147Z WARN solana_accounts_db::accounts] verify_accounts_hash failed: MismatchedTotalLamports(567558617891617842, 567558617892842802), slot: 243107789
[2024-01-22T21:29:57.834662541Z INFO solana_runtime::bank] Initial background accounts hash verification has stopped
I am working on a fix for this.
d76a5a3
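One thing worth checking in these numbers: the lamport gap in the snapshot verification warning can be compared with the capitalization gap between the good and bad "bank frozen" lines. A quick calculation with the values copied from the logs in this thread (the variable names are just labels chosen here):

```python
# Capitalization from the "bank frozen" lines at slot 243108000.
cap_bad = 567558615827727708
cap_good = 567558615826502748

# "Mismatched total lamports: <total> calculated: <calculated>" at slot 243107789.
total_lamports = 567558617892842802
calculated_lamports = 567558617891617842

bank_gap = cap_bad - cap_good
snapshot_gap = total_lamports - calculated_lamports

print(bank_gap)                  # 1224960
print(snapshot_gap)              # 1224960
print(bank_gap == snapshot_gap)  # True
```

Both gaps are 1224960 lamports. That does not prove anything by itself, but it is consistent with a single source of extra lamports on the bad node affecting both checks.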
from solana.
I think this is because the reward PDA account was created in the previous epoch when I ran my node with partitioned rewards enabled (#34809). That PR is incomplete. It only patches the bank hash to ignore the PDA accounts, but it doesn't take care of the epoch accounts hash or the bank lamport adjustment. Therefore, we fail at the slot where the epoch accounts hash is included in the bank hash.
I have pushed fixes for this just now.
from solana.
So to confirm, you believe that an account from a PR that has not landed yet altered your account state and caused your node to diverge? And the fixes were pushed to your PR?
Yes, that's correct. All of this is specific to my node, which was running with --partitioned-epoch-rewards-force-enable-single-slot on and off, which then messed up the accounts stored on my local node.
from solana.
Closing the issue since this is specific to my node.
from solana.