Comments (7)
I've done some testing.
In tests/branch_prediction, there is a RandomX program translated into C. Only CALL and RET instructions are implemented, the rest is NOP. The program contains 128 CALL/RET instructions and 896 NOPs.
The program has 4 versions with different branching patterns:
- branch_always - CALL/RET is always taken.
- branch_predictably - One global counter is used to make random regular branching patterns for all CALL/RET instructions.
- branch_randomly - CALL/RET is randomly taken/not taken (each instruction has a different probability). An inline LCG is used for random number generation.
- branch_mixed - there are 2 types of CALL instructions: first type has the "predictable" branching pattern and a small number of CALL instructions have the "random" branching pattern.
The program uses the perf_event_open syscall to measure the number of hardware branch mispredictions.
> ./branch_mixed 1000000
0 1364014
1 16404
2 2807174
VM branches: 131967
Index 0 is the number of hardware branch instructions executed, index 1 is the number of mispredictions and index 3 is the runtime measured by the CPU timer.
As expected, the results vary a lot depending the CPU architecture. I have tested Intel Ivy Bridge (Core i3-3220) and AMD Ryzen (R7-1700).
Intel Ivy Bridge
HW branches | mispred. | runtime | VM branches (VMB) | mispred. per VMB | runtime per VMB | |
---|---|---|---|---|---|---|
branch_always | 1185261 | 427 | 3929248 | 111110 | 0.4% | 112% |
branch_predictably | 1223308 | 6562 | 3837753 | 134598 | 4.9% | 90% |
branch_randomly | 1271279 | 61055 | 4401908 | 139093 | 43.9% | 100% |
branch_mixed | 1364015 | 19060 | 3942136 | 131967 | 14.4% | 94% |
AMD Ryzen
HW branches | mispred. | runtime | VM branches (VMB) | mispred. per VMB | runtime per VMB | |
---|---|---|---|---|---|---|
branch_always | 1185261 | 338 | 2739858 | 111110 | 0.3% | 110% |
branch_predictably | 1223308 | 7214 | 2789977 | 134598 | 5.4% | 92% |
branch_randomly | 1271278 | 56676 | 3129017 | 139093 | 40.7% | 100% |
branch_mixed | 1364014 | 16766 | 2918477 | 131967 | 12.7% | 98% |
The performance difference between the predictable and random strategies is < 10%, which will become even smaller once the remaining (non-branching) instruction are taken into account.
If someone has access to a Linux machine with another CPU architecture, please test the 4 strategies and report results. Makefile is included to build the 4 tests. Run with 1000000 instructions like the example above. Beware that in some cases, the results vary widely between runs, so it's best to run multiple times and calculate the average.
Summary
Implementing any kind of branch-prediction-friendly patterns in RandomX is probably not worth the effort and risk of ASIC optimization.
from randomx.
Intel Core i5-8250U CPU @ 1.60GHz, average over 10 runs
./branch_always 1000000
0 1185262
1 84
2 4956039
VM branches: 111110
./branch_predictably 1000000
0 1262030
1 2454
2 4745556
VM branches: 134598
./branch_randomly 1000000
0 1304653
1 49918
2 5590118
VM branches: 139093
./branch_mixed 1000000
0 1364016
1 8635
2 4717666
VM branches: 131967
from randomx.
@wowario Can you please repeat branch_predictably and branch_mixed with the latest commit? There was a bug in the predictable algorithm that made it never branch.
from randomx.
How is "runtime per VMB" calculated?
from randomx.
Ryzen 2200G
HW branches | mispred. | runtime | VM branches (VMB) | mispred. per VMB | runtime per VMB | |
---|---|---|---|---|---|---|
branch_always | 1185262 | 139 | 2909838 | 111110 | 0.1% | 126% |
branch_predictably | 1262032 | 5386 | 2825920 | 134598 | 4.0% | 101% |
branch_randomly | 1304653 | 55902 | 2885222 | 139093 | 40.2% | 100% |
branch_mixed | 1364016 | 13485 | 2854595 | 131967 | 10.2% | 104% |
from randomx.
@fuwa0529 It's the runtime divided by VMB scaled so that branch_randomly = 100%.
For you, it's ~126%, ~101%, 100% and ~104%. Strangely, random is the fastest.
from randomx.
I did notice some variation of the result, yet I was being lazy and only ran each test once ...
from randomx.
Related Issues (20)
- How to build with clang HOT 1
- Hash verifications fails on OSX ventura with M1 HOT 11
- Can `spAddr0 + 16 * i` and `spAddr1 + 8 * i` be the same value? HOT 2
- Linking error on aarch64 (ARM64) architecture HOT 5
- gcc 13.x not building RandomX tests HOT 2
- Try to understand hashrate of epyc cpus HOT 1
- Segfault on sample code on init cache HOT 36
- Help understanding the codebase HOT 1
- Why fast mode is being used? HOT 6
- Compilation failed in a project using RandomX HOT 3
- `randomx_reciprocal` is slow HOT 1
- Random X
- Cross-compiling for Windows with MXE HOT 2
- hash_first vs hash_next vs hash? HOT 1
- Trying to compile RandomX on Sailfish OS (armv7hl, aarch64) HOT 4
- Crash on Apple Silicon HOT 4
- Benchmarking test: commitment calculation HOT 2
- mx register: inconsistency between specification and reference implementation - alignment HOT 2
- How to
- why not add avx, avx2 and possibly avx512 use to randomx? HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from randomx.