On systems with Intel SPS, the MRC cache does not get used and memory retraining gets forced.


FSP 2.0 skips MRC cache and forces MRC training on Intel SPS systems (fsp, closed)

intel commented on July 22, 2024

FSP 2.0 skips MRC cache and forces MRC training on Intel SPS systems

from fsp.

Comments (19)

nate-desimone commented on July 22, 2024

@c0d3z3r0, @PatrickRudolph:

It appears this is happening during the CPU Replacement Check. The CPU Replacement Check is a function in the ME firmware, specifically for LGA-socket (not soldered-down) motherboards, that enables the FSP to detect whether the user has replaced the CPU in the motherboard with a new one. If a new CPU is found, the MRC training needs to be redone, since there is some part-to-part variation in the physical characteristics of the silicon.

I looked through the KBL FSP source code, and found the following snippet of code:

```c
if (MeTypeIsSps ()) {
  //
  // SPS firmware does not support CPU replaced detection
  //
  *ForceFullTraining = TRUE;
  return EFI_SUCCESS;
}
```

In other words, SPS firmware does not implement the CPU Replacement Check feature. Since the MRC cannot determine whether the CPU is the same, it is forced to run full training on every boot. So it appears this behavior is expected when using SPS firmware.


Th3Fanbus commented on July 22, 2024

Hi Nate,
I wonder what would happen if this were replaced by a no-op. If the CPU has actually been replaced, the old timings might not work properly, in which case a full reset and retrain should flush them, I guess?
Or maybe it could be made configurable via a UPD.


nate-desimone commented on July 22, 2024

Hi @Th3Fanbus, the thing we are concerned about is the case of it kinda/sorta working, but not being the best training data. This may result in a performance degradation that would make the Intel processor appear to be slower than it actually is. Performance degradation has the potential to impact Intel's brand perception. For this reason we take the conservative approach of re-running training.

It's actually not a bad idea to re-run training about once a year anyway. As the CPU ages, salt leaches into the silicon and alters its physical characteristics (for the worse). Usually some other component of the computer breaks before the CPU... but like everything, CPUs don't last forever. Re-running training will help mitigate the natural aging process. Since reliability is more important than boot time on server platforms, we made this design decision.


c0d3z3r0 commented on July 22, 2024

@nate-desimone Couldn't that problem be solved by simply implementing the CPU replacement check in SPS firmware?


nate-desimone commented on July 22, 2024

@c0d3z3r0 Yes, that would absolutely solve it. The SPS firmware team decided against implementing that feature for reasons unknown to me, and I don't know who to ask, as I don't know anyone on the SPS firmware team. Most of my work thus far has been on client platforms, which have a completely different ME firmware implementation and a different team.


c0d3z3r0 commented on July 22, 2024

Two suggestions from my side, if there is no way of talking to the SPS team:

document it
add a UPD to force-ignore SPS


c0d3z3r0 commented on July 22, 2024

oh well, or just finally make FSP open-source, as Intel promised...


Th3Fanbus commented on July 22, 2024

Hi @nate-desimone,
I understand that, given the stringent requirements of server environments, a more conservative approach was chosen for them. In addition, servers seldom need to be rebooted, so longer boot times due to memory retraining are not a problem. However, in a workstation with a server mainboard, longer boot delays significantly degrade user experience. Therefore, it would be reasonable to make this behavior configurable.

I am not a lawyer, but I believe the FSP license does not allow modifying an FSP binary so that it does not forcefully retrain on every boot. In any case, that would not be an ideal solution. Moreover, given Skylake's age, I would not expect any feature updates for its SPS firmware: adding a mere CPU replacement check is not worth the costs and risks of rolling out an update to such a highly privileged piece of software.

With that in mind, these are the other options I could think of to avoid training delays:

Read the flash chip of another Skylake board with a regular ME firmware, then hope that its ME firmware works, since it likely contains data specific to the donor board that may be incompatible with the recipient board.
Extract a "clean slate" ME firmware from a firmware updater for another mainboard. Of course, this requires that the license terms of that firmware updater allow doing so, which is usually not the case.
Just add a new FSP-M UPD to control this behavior, which could default to the current behavior for compatibility purposes.

I would say the latter proposal is a reasonably simple enhancement to ask for. What do you think?
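A minimal sketch of what the proposed opt-in could look like, written in C like the FSP snippet quoted earlier. The struct `HypotheticalFspmUpd`, the field `SpsTrustMrcCache`, and the function `SpsForceFullTraining` are all hypothetical; no released FSP has such a UPD. The default of 0 keeps the current behavior, which is what "default to the current behavior for compatibility" would mean in practice.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL FSP-M UPD: not part of any released FSP; the name and
 * layout are illustrative only. A default of 0 keeps today's behavior
 * for bootloaders that never touch the field. */
typedef struct {
  uint8_t SpsTrustMrcCache; /* 0 = always retrain on SPS, 1 = reuse MRC cache */
} HypotheticalFspmUpd;

/* Sketch of how FSP-M might honor the opt-in on an SPS system.
 * mrc_cache_valid stands for "the bootloader supplied usable saved
 * training data" (an assumption about how the caller reports it). */
bool SpsForceFullTraining(const HypotheticalFspmUpd *upd, bool mrc_cache_valid) {
  if (!mrc_cache_valid) {
    return true; /* nothing to restore, so full training is unavoidable */
  }
  /* Without the opt-in, keep the conservative default of retraining. */
  return upd->SpsTrustMrcCache == 0;
}
```

The trade-off Nate describes still applies: with the opt-in set, a cold-swapped CPU would boot on stale training data until the saved data is invalidated.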


n-huber commented on July 22, 2024

@nate-desimone, I just had this thought: For boards that are not designed for CPU hotplugging, we can be reasonably sure that the CPU wasn't changed if we boot from S5 (not G3). Wouldn't this be something that could easily be implemented in FSP? i.e. extend the warm-boot behaviour to boots from S5?
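The proposal can be sketched as a rule keyed on the previous power state. This assumes the firmware can tell a boot from S5 (standby power kept) apart from one after G3, and that a CPU is only ever swapped with all power removed; `PrevPowerState` and `SpsProposalForceFullTraining` are hypothetical names, not FSP code.

```c
#include <stdbool.h>

/* Hypothetical previous-power-state values (assumption: the firmware
 * can distinguish these two cases at boot). */
typedef enum {
  PREV_STATE_S5, /* soft off: standby power stayed on */
  PREV_STATE_G3  /* mechanical off: all power was removed */
} PrevPowerState;

/* Sketch of the proposal for an SPS system on a board without CPU
 * hotplug: assuming a CPU can only be swapped with power removed,
 * only a boot after G3 would need to force full training. */
bool SpsProposalForceFullTraining(PrevPowerState prev) {
  return prev == PREV_STATE_G3;
}
```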


nate-desimone commented on July 22, 2024

Hi All,

@Th3Fanbus - With regard to workstation platforms... we encourage OEMs to use the regular ME firmware on workstations for this exact reason. If you have SPS firmware flashed originally by your OEM, then your PCH is actually fused for SPS and won't run the regular ME firmware even if you were to load the regular ME binary onto the flash.

@n-huber - This is actually a check for a cold swap of the CPU. We run the "fast" memory training flow even on a cold boot. CPU hotplug is not supported on this platform at all.


c0d3z3r0 commented on July 22, 2024

If you have SPS firmware flashed originally by your OEM, then your PCH is actually fused for SPS and won't run the regular ME firmware even if you were to load the regular ME binary on to the flash.

huh? are those SPS fuses documented anywhere? My machine runs ME fine, even though it shipped with SPS


nate-desimone commented on July 22, 2024

You are right; on this chipset that will work. On other chipsets there are issues.


c0d3z3r0 commented on July 22, 2024

are those SPS fuses documented anywhere?


JayTalbott commented on July 22, 2024

I'm using CFL. On the CFL-H CRB, I replaced the original CPU that came with the CRB with one that is the same SKU as the customer board. Would that cause retraining on every boot?

If so, once it's retrained after the change in the CPU, is there a way to inform the ME that the memory has been retrained so that it doesn't cause the retraining every time unnecessarily?

Would building a new IFWI with the latest ME kit (instead of just stitching SBL into the original IFWI extracted from the CRB) so that you get a clean/fresh ME image with no knowledge of the original CPU solve this problem?

Thanks!


Th3Fanbus commented on July 22, 2024

Hi Jay,

I'm using CFL. On the CFL-H CRB, I replaced the original CPU that came with the CRB with one that is the same SKU as the customer board. Would that cause retraining on every boot?

As I understand the CpuReplacementCheck in MRC, it should only force full training once.

If so, once it's retrained after the change in the CPU, is there a way to inform the ME that the memory has been retrained so that it doesn't cause the retraining every time unnecessarily?

The ME should record which CPU is currently installed somewhere within the ME region. Of course, if you flash a firmware image whose ME firmware thinks the currently-installed CPU is the old one, then the CpuReplacementCheck would trigger again.
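The behavior described above amounts to a compare-and-update scheme: full training is forced once after a swap, and flashing an image whose stored record still names the old CPU triggers it again. A minimal C sketch under that assumption follows. `CpuRecord`, `CheckAndRecordCpu`, and the `unique_id` field are hypothetical (this is not the real CpuReplacementCheck, and the ME's actual storage format is not documented here); a per-part identifier is assumed because a same-SKU swap may keep the same CPUID signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU record kept in the ME region. */
typedef struct {
  uint32_t cpuid_signature; /* family/model/stepping */
  uint64_t unique_id;       /* assumed per-part identifier */
} CpuRecord;

/* Returns true when full training is needed. On a mismatch the stored
 * record is updated, so the next boot with the same CPU is fast again.
 * Simplification: a real flow might only update after training succeeds. */
bool CheckAndRecordCpu(CpuRecord *stored, const CpuRecord *current) {
  bool replaced = stored->cpuid_signature != current->cpuid_signature ||
                  stored->unique_id != current->unique_id;
  if (replaced) {
    *stored = *current; /* remember the newly installed CPU */
  }
  return replaced;
}
```

Reflashing an old image is equivalent to restoring a stale `stored` record, which is why the check fires again in that case.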

Would building a new IFWI with the latest ME kit (instead of just stitching SBL into the original IFWI extracted from the CRB) so that you get a clean/fresh ME image with no knowledge of the original CPU solve this problem?

It should work, yes. Something that would also work for a single board is to extract the IFWI after booting with the new CPU, and use it when stitching.

Thanks!

Hope this helps!


nate-desimone commented on July 22, 2024

Hi @JayTalbott, I agree with @Th3Fanbus that it is probably a good idea to re-stitch the ME. Technically, the ME can cause full training to happen under the following circumstances:

The CPU was cold-swapped since the last boot
The ME encountered an error while checking for CPU Replacement (This could happen if the ME is an older version and does not have all the newest CPUIDs)
The ME is in recovery mode or otherwise "out to lunch": the FSP reaches its timeout period and assumes that no response means it should run full training for safety.
The SPS variant of ME is being used (which does not implement the CpuReplacementCheck)
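The four circumstances above reduce to a fail-safe decision: only a positive "same CPU" answer from a non-SPS ME lets the MRC reuse its cached training data. A hedged C sketch of that logic follows; `MeReply`, `TrainingDecision`, and `DecideTraining` are hypothetical names, not FSP internals, and the real FSP/ME interface is not public.

```c
#include <stdbool.h>

/* Hypothetical summary of the ME's answer as seen by the FSP. */
typedef enum {
  ME_REPLY_SAME_CPU,
  ME_REPLY_CPU_REPLACED,
  ME_REPLY_ERROR,  /* e.g. an older ME lacking the newest CPUIDs */
  ME_REPLY_TIMEOUT /* ME in recovery mode or not responding */
} MeReply;

typedef enum {
  TRAIN_FAST,                /* reuse MRC cache data */
  TRAIN_FULL_CPU_SWAPPED,
  TRAIN_FULL_ME_ERROR,
  TRAIN_FULL_ME_NO_RESPONSE,
  TRAIN_FULL_SPS
} TrainingDecision;

TrainingDecision DecideTraining(bool me_is_sps, MeReply reply) {
  if (me_is_sps) {
    return TRAIN_FULL_SPS; /* SPS does not implement CpuReplacementCheck */
  }
  switch (reply) {
    case ME_REPLY_SAME_CPU:     return TRAIN_FAST;
    case ME_REPLY_CPU_REPLACED: return TRAIN_FULL_CPU_SWAPPED;
    case ME_REPLY_ERROR:        return TRAIN_FULL_ME_ERROR;
    case ME_REPLY_TIMEOUT:      break;
  }
  /* Timeout or unrecognized reply: fail safe and run full training. */
  return TRAIN_FULL_ME_NO_RESPONSE;
}
```

Returning a reason rather than a bare boolean makes it easier to log why a given boot retrained, which is exactly what had to be worked out by trial in this thread.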


JayTalbott commented on July 22, 2024

I'm currently using the original ME that came on the CFL-H CRB.

I restored the original BIOS image that was on the CRB when I first received it, and it does the same thing - always retrains on every boot - with the different CPU.

I will try rebuilding the IFWI with a more recent ME kit.


nate-desimone commented on July 22, 2024

My guess is that ME image was built before support for the newer SKUs was added. Re-stitching with a new ME kit will probably help.


JayTalbott commented on July 22, 2024

New ME version solved the problem.

Thanks everybody!

