Comments (11)
Andrei, thanks for your profiling! Can you please repeat this on this commit to check if (and how) my changes reduce the time consumed by self.post_json?
Yes, I plan to do this as part of the refactoring started in #11364. Concretely, I plan to add an API that lets the Locust user submit a transaction directly, without waiting for its completion.
I'm still thinking about how exactly the Locust user should behave to avoid overloading the system with these async transactions, since we lose the backpressure that comes from waiting for each transaction to complete.
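One way to restore backpressure without blocking on every transaction is to cap the number of in-flight submissions. Below is a minimal sketch (not the actual nearcore code): submit_tx_async, send_fn, and on_done_fn are hypothetical names, and the cap of 100 is an arbitrary assumption.

```python
import threading

# Sketch: cap in-flight async transactions so a Locust user cannot
# overload the node once it stops waiting for each tx to complete.
MAX_IN_FLIGHT = 100  # assumed value, would need tuning
in_flight = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def submit_tx_async(signed_tx, send_fn, on_done_fn):
    """Submit a transaction in the background.

    Blocks only when MAX_IN_FLIGHT transactions are already pending,
    which reintroduces backpressure.
    """
    in_flight.acquire()  # wait for a free slot

    def run():
        try:
            on_done_fn(send_fn(signed_tx))
        finally:
            in_flight.release()  # free the slot when the tx finishes

    threading.Thread(target=run, daemon=True).start()
```

Locust itself schedules users on gevent greenlets rather than OS threads, but the same semaphore idea applies there.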
from nearcore.
This should be the fix: aabf18b
Let's see if we can productionize it.
Small note: this fix uses FastHttpUser instead of HttpUser, and according to the Locust docs it can perform not 4000 but 16000 TPS. In my experience with our setup, switching from HttpUser to FastHttpUser gives us around 10-20% extra TPS, not 300%.
Probably this difference in ratio shows that in our case the bottleneck for Locust is the computation needed to generate and sign a transaction, not sending it to the node. (Though I haven't yet read what the actual difference between the fast and regular HttpUser is in Locust.)
TLDR: I'm not surprised that generating NEAR transactions is 10 times slower than the "best case scenario".
It sounds reasonable to me: they claim about 4k TPS in the "best case scenario", so in their benchmark they are probably just continuously sending thousands of identical requests like GET / HTTP/1.1 Host: example.com.
That would mean their request generation takes ~0 CPU (while for us CPU is the bottleneck for Locust), leaving all CPU for Locust internals. They probably also ran the server on the same Mac machine and made it as simple as possible, e.g. just sending back HTTP/1.1 200 OK Content-Length: 0.
But maybe I'm wrong.
All fair observations - I want to confirm this by running Locust under a profiler.
Here is a sample profile of the send_tx function:
Timer unit: 1e-09 s
Total time: 2612.09 s
File: /home/akashin/repos/work/nearcore/pytest/tests/loadtest/locust/common/base.py
Function: send_tx at line 268
Line # Hits Time Per Hit % Time Line Contents
==============================================================
268 @line_profiler.profile
269 def send_tx(self, tx: Transaction, locust_name: str) -> dict:
270 """
271 Send a transaction and return the result, no retry attempted.
272 """
273 3115 4e+11 1e+08 14.2 block_hash = self.final_block_hash()
274 2937 5597757423.0 2e+06 0.2 signed_tx = tx.sign_and_serialize(block_hash)
275
276 2937 10644362.0 3624.2 0.0 meta = self.new_locust_metadata(locust_name)
277 2937 2838567.0 966.5 0.0 start_perf_counter = time.perf_counter()
278
279 2937 717732.0 244.4 0.0 try:
280 2937 467654.0 159.2 0.0 try:
281 # To get proper errors on invalid transaction, we need to use sync api first
282 4222 2e+12 5e+08 85.5 result = self.post_json(
283 2937 698936.0 238.0 0.0 "broadcast_tx_commit",
284 2937 19713730.0 6712.2 0.0 [base64.b64encode(signed_tx).decode('utf8')])
285 1462 162273411.0 110994.1 0.0 evaluate_rpc_result(result.json())
286 1 747.0 747.0 0.0 except TxUnknownError as err:
287 # This means we time out in one way or another.
288 # In that case, the stateless transaction validation was
289 # successful, we can now use async API without missing errors.
290 submit_raw_response = self.post_json(
291 "broadcast_tx_async",
292 [base64.b64encode(signed_tx).decode('utf8')])
293 meta["response_length"] = len(submit_raw_response.text)
294 submit_response = submit_raw_response.json()
295 # extract transaction ID from response, it should be "{ "result": "id...." }"
296 if not "result" in submit_response:
297 meta["exception"] = RpcError(message="Didn't get a TX ID",
298 details=submit_response)
299 meta["response"] = submit_response.content
300 else:
301 tx.transaction_id = submit_response["result"]
302 # using retrying lib here to poll until a response is ready
303 self.poll_tx_result(meta, tx)
304 1 328.0 328.0 0.0 except NearError as err:
305 logging.warn(f"marking an error {err.message}, {err.details}")
306 meta["exception"] = err
307
308 4386 4688350.0 1068.9 0.0 meta["response_time"] = (time.perf_counter() -
309 2924 1080117.0 369.4 0.0 start_perf_counter) * 1000
310
311 # Track request + response in Locust
312 1462 77682295.0 53134.3 0.0 self.request_event.fire(**meta)
313 1462 583757.0 399.3 0.0 return meta
2612.09 seconds - /home/akashin/repos/work/nearcore/pytest/tests/loadtest/locust/common/base.py:268 - send_tx
99% of the wall-clock time is split between two lines:
block_hash = self.final_block_hash()
result = self.post_json("broadcast_tx_commit", ...)
Both are RPC requests to the node.
Transaction signing takes 0.2% of wall-clock time, which is non-zero but not that significant.
I think the next step is to understand the threading model that Locust is using - if these transaction submissions are executed on a single thread in a blocking fashion, that will surely limit the throughput. Instead, we should be using something like asyncio here.
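To make the "something like asyncio" idea concrete, here is a minimal sketch that overlaps many submissions on a single thread. The send_tx below is a hypothetical stand-in that only simulates RPC latency with asyncio.sleep (Locust itself actually achieves concurrency with gevent greenlets, but the principle is the same):

```python
import asyncio

async def send_tx(tx, rpc_latency=0.05):
    # Stand-in for an async HTTP POST to the node; we only simulate latency.
    await asyncio.sleep(rpc_latency)
    return {"tx": tx, "status": "ok"}

async def send_all(txs, max_in_flight=100):
    # Semaphore keeps some backpressure instead of firing everything at once.
    sem = asyncio.Semaphore(max_in_flight)

    async def one(tx):
        async with sem:
            return await send_tx(tx)

    return await asyncio.gather(*(one(t) for t in txs))

# With 0.05 s simulated RPC latency, 100 submissions overlap and finish in
# roughly 0.05 s total, instead of ~5 s when executed one by one.
results = asyncio.run(send_all(range(100)))
```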
To analyse this further: signing takes 2 ms of wall-clock time, so we can do around 500 signatures per second in a single Python process, which would be a bottleneck in a single-worker setup. With 16 workers that should not be a problem, though.
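The back-of-the-envelope above, written out with the numbers from this thread:

```python
# ~2 ms of CPU per signature, as measured in the profile above.
sign_time_s = 0.002
per_process_tps = 1 / sign_time_s        # ceiling of ~500 TPS per Python process
workers = 16
cluster_tps = per_process_tps * workers  # ~8000 TPS across 16 workers
```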
It would be nice to improve the performance of signing, but that is probably not the root cause of the limited throughput we see with multiple shards.
Here is another profile from a longer run with more users:
Timer unit: 1e-09 s
Total time: 6.04117e+06 s
File: /home/akashin/repos/work/nearcore/pytest/tests/loadtest/locust/common/base.py
Function: send_tx at line 268
Line # Hits Time Per Hit % Time Line Contents
==============================================================
268 @line_profiler.profile
269 def send_tx(self, tx: Transaction, locust_name: str) -> dict:
270 """
271 Send a transaction and return the result, no retry attempted.
272 """
273 2062925 5e+15 2e+09 80.7 block_hash = self.final_block_hash()
274 1889490 2e+12 1e+06 0.0 signed_tx = tx.sign_and_serialize(block_hash)
275
276 1889490 4870509915.0 2577.7 0.0 meta = self.new_locust_metadata(locust_name)
277 1889490 1717009524.0 908.7 0.0 start_perf_counter = time.perf_counter()
278
279 1889490 341764477.0 180.9 0.0 try:
280 1889490 230492041.0 122.0 0.0 try:
281 # To get proper errors on invalid transaction, we need to use sync api first
282 2528557 1e+15 5e+08 19.3 result = self.post_json(
283 1889490 337802006.0 178.8 0.0 "broadcast_tx_commit",
284 1889490 8958336492.0 4741.1 0.0 [base64.b64encode(signed_tx).decode('utf8')])
285 812412 8e+10 104359.0 0.0 evaluate_rpc_result(result.json())
286 2262 1064247.0 470.5 0.0 except TxUnknownError as err:
287 # This means we time out in one way or another.
288 # In that case, the stateless transaction validation was
289 # successful, we can now use async API without missing errors.
290 submit_raw_response = self.post_json(
291 "broadcast_tx_async",
292 [base64.b64encode(signed_tx).decode('utf8')])
293 meta["response_length"] = len(submit_raw_response.text)
294 submit_response = submit_raw_response.json()
295 # extract transaction ID from response, it should be "{ "result": "id...." }"
296 if not "result" in submit_response:
297 meta["exception"] = RpcError(message="Didn't get a TX ID",
298 details=submit_response)
299 meta["response"] = submit_response.content
300 else:
301 tx.transaction_id = submit_response["result"]
302 # using retrying lib here to poll until a response is ready
303 self.poll_tx_result(meta, tx)
304 2262 490022.0 216.6 0.0 except NearError as err:
305 2188 144508862.0 66046.1 0.0 logging.warn(f"marking an error {err.message}, {err.details}")
306 2188 804307.0 367.6 0.0 meta["exception"] = err
307
308 2437236 2564044058.0 1052.0 0.0 meta["response_time"] = (time.perf_counter() -
309 1624824 519629076.0 319.8 0.0 start_perf_counter) * 1000
310
311 # Track request + response in Locust
312 812412 4e+10 49686.0 0.0 self.request_event.fire(**meta)
313 812412 303569684.0 373.7 0.0 return meta
6041174.94 seconds - /home/akashin/repos/work/nearcore/pytest/tests/loadtest/locust/common/base.py:268 - send_tx
Here final_block_hash is taking a massive 80% of the time and would be the first candidate to optimize.
Signing takes 1 ms on average, similar to the previous run.
The remaining 20% is in broadcast_tx_commit, which on average takes 500 ms.
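Since final_block_hash dominates this profile, one plausible local optimization (a sketch, not the actual fix landed in the repo) is to cache the hash for a short TTL, assuming - as on NEAR - that a transaction may reference any sufficiently recent block hash. The class and fetch_fn below are hypothetical:

```python
import time

class CachedBlockHash:
    """Cache the final block hash for a short TTL so that not every
    send_tx pays for a final_block_hash RPC round trip.

    Not locked against preemptive threads; under gevent's cooperative
    scheduling (as in Locust) that is typically acceptable.
    """

    def __init__(self, fetch_fn, ttl_s=1.0):
        self._fetch = fetch_fn      # e.g. lambda: node.final_block_hash()
        self._ttl = ttl_s
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()  # only hit the RPC on expiry
            self._expires_at = now + self._ttl
        return self._value
```

The TTL must stay well inside the node's transaction validity window; 1 s is an arbitrary assumption here.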
I'll go ahead and try a few local optimizations to improve the single-thread performance.
For reference, here is the result of running the sampling profiler py-spy in the same setup:
We can now generate 1000 TPS with a single Locust process at 0.8 CPU core usage, which should be good enough for our benchmarks.