Comments (11)
Suggestion to enforce ~45MB witness size per chunk
Here is an idea for how to limit the worst-case sizes.
We could add additional chunk space limits. Today, it's already limited by gas, by compute costs (in case of known-to-be-undercharged parameters), and by the size of transactions. We can add more conditions.
Specifically, I would like to add a limit for:
- Total network bandwidth. This would count the total size of receipts created either from a transaction or from within a contract call. Once we go above a limit of e.g. 10MB, we would declare the chunk as full and put the unprocessed receipts in the delayed receipts queue.
- Total witness size. This would count how large the witness of a chunk application is, including state trie nodes and state values. For now, I assume we will have an additional phase after each receipt execution that produces the witness and blocks execution of the next receipt. This phase includes reading all necessary trie nodes (with flat state access, we don't need to read the nodes to compute the result) and adding up the actual node sizes to get an exact witness size. If the total witness size exceeds a limit (maybe another 10MB), then again we would not execute additional receipts for this chunk.
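A minimal sketch of how such chunk-space soft limits could be enforced during chunk application. All names and the `Receipt` shape are illustrative, not actual nearcore types; only the 10MB figures come from the text above:

```rust
// Hypothetical sketch of per-chunk soft limits. Receipts are applied until
// any accumulated total crosses its soft limit; the rest are delayed.

const OUTGOING_BANDWIDTH_SOFT_LIMIT: u64 = 10 * 1024 * 1024; // ~10MB
const WITNESS_SIZE_SOFT_LIMIT: u64 = 10 * 1024 * 1024; // ~10MB

struct Receipt {
    outgoing_bytes: u64, // total size of receipts it creates
    witness_bytes: u64,  // trie nodes + values it touches
}

/// Returns the (applied, delayed) split of the receipt queue.
fn apply_chunk(receipts: Vec<Receipt>) -> (Vec<Receipt>, Vec<Receipt>) {
    let (mut bandwidth, mut witness) = (0u64, 0u64);
    let mut applied = Vec::new();
    let mut iter = receipts.into_iter();
    for receipt in &mut iter {
        // Soft limit: the receipt that crosses the limit is still applied,
        // we just stop applying further ones.
        bandwidth += receipt.outgoing_bytes;
        witness += receipt.witness_bytes;
        applied.push(receipt);
        if bandwidth > OUTGOING_BANDWIDTH_SOFT_LIMIT || witness > WITNESS_SIZE_SOFT_LIMIT {
            break;
        }
    }
    // Everything not applied goes to the delayed receipts queue.
    (applied, iter.collect())
}
```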
For this to work, we need good congestion control. If the delayed receipts queue gets too long, this itself could blow up the witness size. Let's say we can keep the queue length per shard below 10MB; then we have used 3 * 10MB = 30MB of witness size so far. The remaining 15MB come from assumptions about how we enforce the limit.
Note on enforcing (soft-)limits
To avoid re-execution, we might want to make these soft limits. (The last applied receipt is allowed to go beyond; we just don't execute another one.) That's how we currently handle all existing limits. But that means the real limit is the soft limit plus the largest possible size of a single receipt application. If we do nothing, this blows the limits right back up into 100s-of-MBs territory.
To limit the effect of the last receipt that goes above the soft limit, I suggest we also add per-receipt limits on network bandwidth and witness size. Going above a limit will result in an error, which is observable by users as a new failure mode. But the hypothesis is that we can set the limits high enough that normal usage will not hit them.
Ballpark numbers:
- ~5MB of bandwidth per receipt seems reasonable to me. A receipt can produce lots of network bandwidth by making cross-contract calls. Today, it's possible to make many calls at once and attach an argument of up to 4MB to each. We could say the total limit per receipt is 5MB; then users can still make a single cross-contract call with a 4MB argument, or two calls with 2MB each. But they couldn't do two calls with 4MB each. I don't think that's a problem for anyone.
- ~10MB of witness size per receipt also seems reasonable. This would allow 20k trie nodes, which should be plenty. Also, it would still allow overwriting an existing 4MB value in storage with another 4MB value, which results in 8MB of state access. (4MB is the maximum size allowed for a single storage entry.)
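The per-receipt checks could look roughly like this. Names and error variants are hypothetical; only the 5MB/10MB ballpark numbers come from the list above:

```rust
// Hypothetical per-receipt hard limits layered on top of the soft chunk
// limits; exceeding either one fails the receipt, which is the new
// user-observable failure mode described above.

const PER_RECEIPT_BANDWIDTH_LIMIT: u64 = 5 * 1024 * 1024; // ~5MB
const PER_RECEIPT_WITNESS_LIMIT: u64 = 10 * 1024 * 1024;  // ~10MB

#[derive(Debug, PartialEq)]
enum ReceiptLimitError {
    BandwidthExceeded { used: u64 },
    WitnessSizeExceeded { used: u64 },
}

fn check_receipt_limits(outgoing_bytes: u64, witness_bytes: u64) -> Result<(), ReceiptLimitError> {
    if outgoing_bytes > PER_RECEIPT_BANDWIDTH_LIMIT {
        return Err(ReceiptLimitError::BandwidthExceeded { used: outgoing_bytes });
    }
    if witness_bytes > PER_RECEIPT_WITNESS_LIMIT {
        return Err(ReceiptLimitError::WitnessSizeExceeded { used: witness_bytes });
    }
    Ok(())
}
```

Under these numbers, one cross-contract call with a 4MB argument passes, while two calls with 4MB arguments each (8MB total) would be rejected.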
Hard limit?
The obvious alternative would be to just make these hard limits. As in, the receipt that breaches the limit is reverted and will only be applied in the next chunk.
In this design, the per-receipt limit would be the same as the per-chunk limit. Single receipts that go above the chunk limit will never execute successfully, so we would also need to make them fail explicitly, which again introduces a new failure mode for receipt execution. Still, enforcing hard limits could bring the worst-case witness size down from ~45MB to ~30MB if we take all the numbers suggested so far.
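A quick arithmetic check of the numbers used in this thread, under one reading of where the 15MB overshoot comes from (the 10MB per-receipt witness limit plus the 5MB per-receipt bandwidth limit allowed past the soft limit):

```rust
// Back-of-envelope: three 10MB per-chunk budgets (delayed queue, bandwidth,
// witness), plus, with soft limits only, one receipt allowed to overshoot
// by up to its per-receipt limits (10MB witness + 5MB bandwidth).
fn worst_case_witness_mb(hard_limits: bool) -> u64 {
    let per_chunk_budgets = 3 * 10; // queue + bandwidth + witness, ~10MB each
    let overshoot = if hard_limits { 0 } else { 10 + 5 }; // last receipt past the soft limit
    per_chunk_budgets + overshoot
}
```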
from nearcore.
Trie node access would be pretty cheap, yeah. But writes are not completely free (they have to go to disk), so probably keep that.
Access may be cheap, but this issue is specifically about worst-case witness size (hard limits we can prove, not practical usage patterns). Even if the state is served from memory, you still have to upload and download it over the network, potentially archive it, and so on.
Not sure about the empty receipts part - what does that mean?
With stateless validation, my assumption here is/was that each trie node involved in the state transition needs to be in the witness. (Assuming pre-ZK implementation)
An empty receipt, as in a receipt with no actions, is still a receipt that is stored and loaded from the trie. But because it's empty, it's particularly cheap in terms of gas costs. In other words, many empty receipts may be included in a single chunk.
According to my table above, you can access 3 million trie nodes with just 1000 Tgas if it's filled with empty receipts. This seems like a problem for state witness size. Because to prove that a receipt is indeed part of a current trie, you have to include all those trie nodes in the witness for a single chunk.
Does that make sense?
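To put 3 million touched nodes into perspective: assuming each touched trie node must be included in the witness, and a fully populated (16-child) branch node is about 512 bytes, the node data alone is already on the order of gigabytes per chunk:

```rust
// Rough scale check for the empty-receipt case. The 512-byte figure is the
// size of a trie branch node with 16 children, as used elsewhere in this
// thread; actual node sizes vary.
fn witness_bytes_for_nodes(node_count: u64) -> u64 {
    const NODE_SIZE_BYTES: u64 = 512; // branch node with 16 children
    node_count * NODE_SIZE_BYTES
}
```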
It looks like I will need to split this into several stages. With data receipts, it can get quite tricky to figure out which witnesses they belong in, or how many of them can clump up in one place. I will write more details about them in a future comment.
For now, let me list how large action receipts can become based on gas and other runtime limits.
This is not considering the state accessed, or even the output logs produced.
Network Message Sizes
| | Network Output / 1 Pgas [MB] | Network Input / 1 Pgas [MB] | Main Parameter |
|---|---|---|---|
| empty action receipt | 0.60 | 0.60 | |
| CreateAccount | 0.00 | 0.00 | |
| DeployContract | 145.83 | 15.48 | `action_deploy_contract_per_byte.send` |
| FunctionCall | 358.57 | 358.57 | `action_function_call_per_byte` |
| Transfer | 0.15 | 0.15 | |
| Stake | 0.58 | 0.80 | |
| AddKey | 20.44 | 20.44 | `action_function_call_per_byte` |
| DeleteKey | 0.69 | 0.69 | |
| DeleteAccount | 0.47 | 0.47 | |
| Delegate (empty) | 1.44 | 1.44 | |
The table above shows that function call actions can become the largest per gas unit. A chunk filled with function calls could be up to 358MB in size, just for storing the actions. This is true today, before we even start talking about changes that would be necessary for stateless validation.
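As a sanity check on the FunctionCall row: 358.57MB per Pgas implies a per-byte function call cost of roughly 1e15 / 358.57e6 ≈ 2.8 Mgas. The per-byte value below is back-solved from the table, not the actual protocol parameter:

```rust
// How many megabytes of function call arguments fit into 1 Pgas, given a
// per-byte gas cost. The cost used in the test is inferred from the table,
// not taken from nearcore's parameter files.
fn mb_per_pgas(gas_per_byte: u64) -> f64 {
    const PGAS: f64 = 1e15; // 1 Pgas = 1000 Tgas
    PGAS / gas_per_byte as f64 / 1e6 // megabytes attachable per Pgas
}
```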
The data dependency rabbit hole looks a bit too deep to reach a conclusion just yet. I will ignore data receipts for now and just look at other boundaries we can figure out.
Accessed state size per action
To understand how much state an action receipt may access in the worst case, we have to consider two parts.
- How many trie nodes are necessary to include in the witness
- How much actual data is accessed
Values
Observations:
- Most actions look up a small number of objects, such as access keys or account metadata. We should compare the Borsh-serialized sizes of these objects with the gas cost to understand how much data is required for a chunk witness.
- Function calls are trickier because they access data dynamically. But all value accesses from a smart contract are limited by gas, so we only have to understand what is the cheapest way to read a byte and extrapolate from there.
Number of trie nodes accessed
Observations:
- Each action has a fixed number of trie keys it accesses, except for the function call which may access a dynamic number of keys.
- Function calls are limited to 300 Tgas. But with flat storage, the touched-trie-node count (TTN) itself doesn't actually increase gas usage. Instead, we need to find the cheapest way to read keys with many trie nodes (which is `storage_has_key()`) and extrapolate.
- Not all trie keys are equal. For example, `TrieKey::Account` has a maximum key length of 65 bytes, whereas `TrieKey::ContractData` has a maximum length of 2114 bytes. By far, the `ContractData` keys have the highest upper limit.
Results
I've done all the analysis in the spreadsheet I also used for calculating how large receipts themselves are.
It should be publicly readable: https://docs.google.com/spreadsheets/d/1m66nacHFbvM0rogeS_lV8uJanhbvQKEF0caY93jMOpg/edit#gid=200844971
Here is the summary. Note that the first row (action receipt) is always necessary, so the touched trie nodes for an account creation transaction will be `action_receipt` + `CreateAccount`.
| | TTN / Pgas | Trie Value Bytes / Pgas [MB] |
|---|---|---|
| empty action receipt | 3'629'630 | * |
| CreateAccount | 0 | 0.00 |
| DeployContract | 702'703 | 145.83 |
| FunctionCall | 36'106'032 | 183.54 |
| Transfer | 0 | 0.00 |
| Stake | 0 | 0.00 |
| AddKey | 2'475'185 | 20.44 |
| DeleteKey | 2'757'895 | 22.78 |
| DeleteAccount | 0 | 0.00 |
| Delegate (empty) | 1'310'000 | 10.82 |

\* empty action receipt needs to access data receipts; these calculations are still ongoing
Conclusion
- Trie nodes likely need explicit limits.
  - Function calls can currently access a lot of trie nodes (more than 36 million nodes in a single chunk!).
  - The next biggest offender is simply the overhead of empty receipts, which can also lead to 3M nodes in a chunk.
  - (For context: a trie node with 16 children is 512B in size, so if a witness should be limited to 10MB, we can only fit around 20,000 nodes, definitely not millions of them.)
- The values themselves are also quite large, but only deployments and function calls have the potential to blow things up beyond ~20MB per chunk.
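The parenthetical above checks out numerically, and the same 512B node-size assumption shows how far the FunctionCall row overshoots any reasonable witness budget:

```rust
// How many 512-byte branch nodes fit into a given witness budget. The
// 512B node size is the 16-child branch node figure from this thread.
fn nodes_in_witness(witness_bytes: u64) -> u64 {
    const NODE_SIZE_BYTES: u64 = 512;
    witness_bytes / NODE_SIZE_BYTES
}
```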
For this to work, we need good congestion control.
@jakmeier, is this already handled by local congestion control? Or are you suggesting we do global congestion control as well?
Sounds like this needs a NEP?
- I think we would need global congestion control to guarantee a limit on the delayed receipts queue; local congestion control only stops new transactions from coming in, not the receipts from other shards. But local congestion control definitely helps in practice and might be good enough for a first iteration.
- Yes, it needs a NEP.
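An illustrative-only sketch of the distinction (not nearcore code): local congestion control can gate new transactions, but without a global mechanism a shard cannot refuse receipts arriving from other shards, so the delayed queue stays unbounded:

```rust
// Hypothetical shard-level queue with a byte limit on delayed receipts.
struct ShardQueue {
    delayed_bytes: u64,
    limit: u64,
}

impl ShardQueue {
    // Local congestion control: stop admitting new transactions when congested.
    fn accepts_new_transactions(&self) -> bool {
        self.delayed_bytes < self.limit
    }

    // Receipts from other shards can only be refused if there is a global
    // congestion control mechanism; otherwise they must always be accepted.
    fn accepts_incoming_receipts(&self, global_cc: bool) -> bool {
        if global_cc {
            self.delayed_bytes < self.limit
        } else {
            true
        }
    }
}
```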
@robin-near, when we are loading the trie into memory, how much do we have to pay attention to TTN? Can we completely get rid of it? In other words, can we cross out the following statement by Jakob?
Trie nodes likely need explicit limits:
* Function calls can currently access a lot of trie nodes (more than 36 million nodes in a single chunk!).
* The next biggest offender is simply the overhead for empty receipts. This can also lead to 3M nodes in a chunk.
Trie node access would be pretty cheap, yeah. But writes are not completely free (they have to go to disk), so probably keep that.
Not sure about the empty receipts part - what does that mean?
Oh sorry, I was answering Yoon's question without looking at the context. For the greater state witness size issue, I don't have any useful comment at this moment.