Giter Club home page Giter Club logo

Comments (7)

tevador avatar tevador commented on July 23, 2024

I've done some testing.

In tests/branch_prediction, there is a RandomX program translated into C. Only CALL and RET instructions are implemented, the rest is NOP. The program contains 128 CALL/RET instructions and 896 NOPs.

The program has 4 versions with different branching patterns:

  • branch_always - CALL/RET is always taken.
  • branch_predictably - One global counter is used to make random regular branching patterns for all CALL/RET instructions.
  • branch_randomly - CALL/RET is randomly taken/not taken (each instruction has a different probability). An inline LCG is used for random number generation.
  • branch_mixed - there are 2 types of CALL instructions: first type has the "predictable" branching pattern and a small number of CALL instructions have the "random" branching pattern.

The program uses the perf_event_open syscall to measure the number of hardware branch mispredictions.

> ./branch_mixed 1000000
0       1364014
1       16404
2       2807174
VM branches: 131967

Index 0 is the number of hardware branch instructions executed, index 1 is the number of mispredictions and index 3 is the runtime measured by the CPU timer.

As expected, the results vary a lot depending the CPU architecture. I have tested Intel Ivy Bridge (Core i3-3220) and AMD Ryzen (R7-1700).

Intel Ivy Bridge

  HW branches mispred. runtime VM branches (VMB) mispred. per VMB runtime per VMB
branch_always 1185261 427 3929248 111110 0.4% 112%
branch_predictably 1223308 6562 3837753 134598 4.9% 90%
branch_randomly 1271279 61055 4401908 139093 43.9% 100%
branch_mixed 1364015 19060 3942136 131967 14.4% 94%

AMD Ryzen

  HW branches mispred. runtime VM branches (VMB) mispred. per VMB runtime per VMB
branch_always 1185261 338 2739858 111110 0.3% 110%
branch_predictably 1223308 7214 2789977 134598 5.4% 92%
branch_randomly 1271278 56676 3129017 139093 40.7% 100%
branch_mixed 1364014 16766 2918477 131967 12.7% 98%

The performance difference between the predictable and random strategies is < 10%, which will become even smaller once the remaining (non-branching) instruction are taken into account.

If someone has access to a Linux machine with another CPU architecture, please test the 4 strategies and report results. Makefile is included to build the 4 tests. Run with 1000000 instructions like the example above. Beware that in some cases, the results vary widely between runs, so it's best to run multiple times and calculate the average.

Summary

Implementing any kind of branch-prediction-friendly patterns in RandomX is probably not worth the effort and risk of ASIC optimization.

from randomx.

 avatar commented on July 23, 2024

Intel Core i5-8250U CPU @ 1.60GHz, average over 10 runs

./branch_always 1000000
0	1185262
1	84
2	4956039
VM branches: 111110

./branch_predictably 1000000
0	1262030
1	2454
2	4745556
VM branches: 134598

./branch_randomly 1000000
0	1304653
1	49918
2	5590118
VM branches: 139093

./branch_mixed 1000000
0	1364016
1	8635
2	4717666
VM branches: 131967

from randomx.

tevador avatar tevador commented on July 23, 2024

@wowario Can you please repeat branch_predictably and branch_mixed with the latest commit? There was a bug in the predictable algorithm that made it never branch.

from randomx.

 avatar commented on July 23, 2024

How is "runtime per VMB" calculated?

from randomx.

 avatar commented on July 23, 2024

Ryzen 2200G

  HW branches mispred. runtime VM branches (VMB) mispred. per VMB runtime per VMB
branch_always 1185262 139 2909838 111110 0.1% 126%
branch_predictably 1262032 5386 2825920 134598 4.0% 101%
branch_randomly 1304653 55902 2885222 139093 40.2% 100%
branch_mixed 1364016 13485 2854595 131967 10.2% 104%

from randomx.

tevador avatar tevador commented on July 23, 2024

@fuwa0529 It's the runtime divided by VMB scaled so that branch_randomly = 100%.

For you, it's ~126%, ~101%, 100% and ~104%. Strangely, random is the fastest.

from randomx.

 avatar commented on July 23, 2024

I did notice some variation of the result, yet I was being lazy and only ran each test once ...

from randomx.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.