Currently, branches are taken/not taken randomly. This is not ideal for CPUs as they r

I've done some testing. In <a href="https://github.com/tevador/Rando

Intel Core i5-8250U CPU @ 1.60GHz, average over 10 runs <div class="snippet-clipbo

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Branch prediction about randomx HOT 7 CLOSED

tevador commented on July 23, 2024

Branch prediction

from randomx.

Comments (7)

tevador commented on July 23, 2024

I've done some testing.

In tests/branch_prediction, there is a RandomX program translated into C. Only CALL and RET instructions are implemented, the rest is NOP. The program contains 128 CALL/RET instructions and 896 NOPs.

The program has 4 versions with different branching patterns:

branch_always - CALL/RET is always taken.
branch_predictably - One global counter is used to make random regular branching patterns for all CALL/RET instructions.
branch_randomly - CALL/RET is randomly taken/not taken (each instruction has a different probability). An inline LCG is used for random number generation.
branch_mixed - there are 2 types of CALL instructions: first type has the "predictable" branching pattern and a small number of CALL instructions have the "random" branching pattern.

The program uses the perf_event_open syscall to measure the number of hardware branch mispredictions.

> ./branch_mixed 1000000
0       1364014
1       16404
2       2807174
VM branches: 131967

Index 0 is the number of hardware branch instructions executed, index 1 is the number of mispredictions and index 3 is the runtime measured by the CPU timer.

As expected, the results vary a lot depending the CPU architecture. I have tested Intel Ivy Bridge (Core i3-3220) and AMD Ryzen (R7-1700).

Intel Ivy Bridge

	HW branches	mispred.	runtime	VM branches (VMB)	mispred. per VMB	runtime per VMB
branch_always	1185261	427	3929248	111110	0.4%	112%
branch_predictably	1223308	6562	3837753	134598	4.9%	90%
branch_randomly	1271279	61055	4401908	139093	43.9%	100%
branch_mixed	1364015	19060	3942136	131967	14.4%	94%

AMD Ryzen

	HW branches	mispred.	runtime	VM branches (VMB)	mispred. per VMB	runtime per VMB
branch_always	1185261	338	2739858	111110	0.3%	110%
branch_predictably	1223308	7214	2789977	134598	5.4%	92%
branch_randomly	1271278	56676	3129017	139093	40.7%	100%
branch_mixed	1364014	16766	2918477	131967	12.7%	98%

The performance difference between the predictable and random strategies is < 10%, which will become even smaller once the remaining (non-branching) instruction are taken into account.

If someone has access to a Linux machine with another CPU architecture, please test the 4 strategies and report results. Makefile is included to build the 4 tests. Run with 1000000 instructions like the example above. Beware that in some cases, the results vary widely between runs, so it's best to run multiple times and calculate the average.

Summary

Implementing any kind of branch-prediction-friendly patterns in RandomX is probably not worth the effort and risk of ASIC optimization.

from randomx.

commented on July 23, 2024

Intel Core i5-8250U CPU @ 1.60GHz, average over 10 runs

./branch_always 1000000
0	1185262
1	84
2	4956039
VM branches: 111110

./branch_predictably 1000000
0	1262030
1	2454
2	4745556
VM branches: 134598

./branch_randomly 1000000
0	1304653
1	49918
2	5590118
VM branches: 139093

./branch_mixed 1000000
0	1364016
1	8635
2	4717666
VM branches: 131967

from randomx.

tevador commented on July 23, 2024

@wowario Can you please repeat branch_predictably and branch_mixed with the latest commit? There was a bug in the predictable algorithm that made it never branch.

from randomx.

commented on July 23, 2024

How is "runtime per VMB" calculated?

from randomx.

commented on July 23, 2024

Ryzen 2200G

	HW branches	mispred.	runtime	VM branches (VMB)	mispred. per VMB	runtime per VMB
branch_always	1185262	139	2909838	111110	0.1%	126%
branch_predictably	1262032	5386	2825920	134598	4.0%	101%
branch_randomly	1304653	55902	2885222	139093	40.2%	100%
branch_mixed	1364016	13485	2854595	131967	10.2%	104%

from randomx.

tevador commented on July 23, 2024

@fuwa0529 It's the runtime divided by VMB scaled so that branch_randomly = 100%.

For you, it's ~126%, ~101%, 100% and ~104%. Strangely, random is the fastest.

from randomx.

commented on July 23, 2024

I did notice some variation of the result, yet I was being lazy and only ran each test once ...

from randomx.

Branch prediction about randomx HOT 7 CLOSED

Comments (7)

Intel Ivy Bridge

AMD Ryzen

Summary

Ryzen 2200G

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent