Comments (7)

siavashzk avatar siavashzk commented on July 21, 2024

Hi,

Yes, as of now, Scarab does not have a knob for adjusting icache latency.

If you want to add additional cycles after resolving a branch misprediction, you should use EXTRA_RECOVERY_CYCLES. Or, do you need to model anything more than that? You can also use FETCH_TAKEN_BUBBLE_CYCLES to insert bubbles after any predicted taken branch (FETCH_BREAK_ON_TAKEN must be set).

EXTRA_REDIRECT_CYCLES has a somewhat confusing name. It actually controls the number of cycles needed to restart the frontend after resolving a branch BTB miss. It is probably not what you are looking for as a proxy for icache latency.
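For concreteness, the knobs above might be set in a Scarab parameters file roughly like this (a hypothetical sketch: knob names are typically the lowercased forms of the names above, and the cycle counts are example values only, so check the PARAMS files in the repo for the exact spelling and defaults):

```
--fetch_break_on_taken        1
--fetch_taken_bubble_cycles   2
--extra_recovery_cycles       3
```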

from scarab.

hlitz avatar hlitz commented on July 21, 2024

Hi Siavash,
thank you. If I understand correctly, Scarab implements a decoupled frontend by inserting a buffer after decode and before scheduling.

On a correctly predicted taken branch, I would assume the BP can make another prediction in the next cycle, but the icache latency would still apply. Does FETCH_TAKEN_BUBBLE_CYCLES introduce bubbles on the BP (i.e., no branch prediction for the next few cycles), or does it only introduce bubbles after branch prediction (i.e., make decode take longer)?

siavashzk avatar siavashzk commented on July 21, 2024

Actually, Scarab does not implement a decoupled frontend. The frontend is a simple pipeline. There is an icache stage that is responsible for both branch prediction and icache access. On an icache hit, it sends a packet of instructions to the next stage in one cycle. Then there are a few cycles of decode and rename (decode_stage and map_stage in the Scarab source) before insertion into the reservation station(s). Decode and rename (map) are straightforward pipelines.

To get a sense of how uops move through the pipeline, you can run in debug mode with "--debug_model 1" for a visualization of the uops in the pipeline at every cycle. (I would only do a short run in this mode, since the logs can get huge.)

Since icache and branch prediction are not decoupled, FETCH_TAKEN_BUBBLE_CYCLES affects both. That is, after a taken branch, there will be a few bubbles with no prediction and icache access.
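The timing effect of this coupling can be sketched in a few lines of Python (a toy model, not Scarab code): with FETCH_TAKEN_BUBBLE_CYCLES = B, each predicted-taken branch costs B dead cycles during which neither prediction nor icache access happens.

```python
def fetch_cycles(taken_flags, bubble_cycles):
    """Toy model of Scarab's coupled frontend: one fetch packet per cycle,
    plus `bubble_cycles` dead cycles (no prediction AND no icache access,
    since the two are coupled) after every predicted-taken branch.
    `taken_flags[i]` is True if packet i ends in a predicted-taken branch."""
    cycles = 0
    for taken in taken_flags:
        cycles += 1                  # icache stage: predict + fetch one packet
        if taken:
            cycles += bubble_cycles  # bubbles stall the whole icache stage
    return cycles

# 10 packets, 3 ending in taken branches, 2 bubble cycles each: 10 + 3*2 = 16
print(fetch_cycles([False] * 7 + [True] * 3, 2))  # -> 16
```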

spruett avatar spruett commented on July 21, 2024

For more details on how to run debug mode to visualize the pipeline, see the answer to another question: Using the debug flag in Scarab #43

hlitz avatar hlitz commented on July 21, 2024

@siavashzk Thanks for the info. Let me know if you agree with the following.
In a modern frontend, the BP can predict every cycle, even if a branch is taken. The BHT/BTB enable predictions without waiting for the icache. As a result, it is reasonable to model the icache with latency = 0, since we can hide/pipeline its latency. However, on a mispredict, we have to wait for the icache, and hence we should set EXTRA_RECOVERY_CYCLES to, e.g., 3.
FETCH_TAKEN_BUBBLE_CYCLES should not be required if we assume that we can predict every cycle (even after a taken branch).

Secondly, let's think about what is required to model a decoupled frontend. I think it does two things: 1) hiding icache latency by overlapping icache reads in the presence of taken branches, and 2) prefetching icache entries by running ahead even further than the 3 cycles required to hide icache latency.

We achieve 1) by setting the icache latency to zero and by setting EXTRA_RECOVERY_CYCLES to capture the cost of a branch mispredict. Unfortunately, EXTRA_RECOVERY_CYCLES is dynamic, as it depends on whether the icache line is in L1/L2/DRAM. If you have an idea on how to set EXTRA_RECOVERY_CYCLES dynamically based on this, let me know.
We could achieve 2) by emitting a prefetch of the target line whenever a branch is predicted taken.
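The run-ahead idea in 2) can be sketched as a toy model (not Scarab code; the 3-cycle icache latency is an assumption from the discussion above): a decoupled predictor that prefetches a taken-branch target `lead` cycles before fetch reaches it hides the icache latency exactly when `lead` is at least that latency.

```python
ICACHE_LATENCY = 3  # assumed L1-I hit latency in cycles (example value)

def fetch_stall(lead):
    """Cycles fetch stalls at a taken-branch target, given how many cycles
    ahead of fetch the decoupled predictor issued the target prefetch.
    lead = 0 corresponds to today's coupled frontend (no run-ahead)."""
    return max(0, ICACHE_LATENCY - lead)

print(fetch_stall(0))  # -> 3: coupled frontend, full icache latency exposed
print(fetch_stall(3))  # -> 0: predictor 3 cycles ahead, latency fully hidden
```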

Let me know what you think

siavashzk avatar siavashzk commented on July 21, 2024

I think what you’re suggesting in the first paragraph is a reasonable model of a state-of-the-art frontend: no bubbles after taken branches, and EXTRA_RECOVERY_CYCLES as a proxy for the cost of restarting the frontend pipeline, while hiding the icache latency.

I can't think of a good way to approximate a decoupled branch predictor with only small tweaks to Scarab. It seems to me that the interaction will be very dynamic and benchmark-dependent: for example, how far is the branch predictor running ahead of the rest of the frontend? If you want to model the icache-prefetching effect of a decoupled branch predictor, you have to modify Scarab to separate branch prediction and icache access. It should be doable, but may require a major overhaul of icache_stage.c.

I'm not sure if your two additions are sufficient for modeling a decoupled branch predictor.

  1. Actually, I think a dynamic EXTRA_RECOVERY_CYCLES is not necessary. You'd just need to set it to the icache latency. When you resolve a misprediction, you'd first take EXTRA_RECOVERY_CYCLES to restart the frontend (which models the icache latency). Then, if the instruction misses in the icache, Scarab will have to fetch from the L1 (LLC) or DRAM, so the dynamic part of the latency is already modelled.

  2. However, prefetching the target line during branch prediction will not do much now, because without any Scarab modifications, the target will get accessed in the same/next cycle anyway. Recall that Scarab's icache and branch predictor are coupled. If you want the prefetch to have a meaningful effect, you'd need to actually decouple the branch predictor so it can run ahead of the rest of the frontend.
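Point 1 reduces to simple arithmetic, sketched below as a toy model (the latency values are illustrative assumptions, not Scarab defaults): the knob covers the fixed hit latency, and Scarab's existing cache simulation supplies the variable miss latency on top.

```python
EXTRA_RECOVERY_CYCLES = 3  # set to the assumed L1-I hit latency

def redirect_latency(miss_extra):
    """Toy model of cycles from branch resolution to the first refetched
    packet. `miss_extra` is 0 on an icache hit, or the additional latency
    (which Scarab already simulates dynamically) when the line must be
    fetched from the next cache level or DRAM."""
    return EXTRA_RECOVERY_CYCLES + miss_extra

print(redirect_latency(0))   # -> 3:  icache hit, only the fixed knob applies
print(redirect_latency(12))  # -> 15: miss serviced further out (assumed +12)
```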

siavashzk avatar siavashzk commented on July 21, 2024

Closing this issue due to inactivity. Feel free to reopen or open another issue with other questions/comments.
