
Comments (12)

aneessahib commented on July 3, 2024

So what we are seeing here is expected. It looks like on this system the negative time drift is detected as early as 25ms, which is why the test fails only at that point and begins to pass as soon as the sleep goes above 50ms. Unfortunately, we can't correct this any further, as there could be a perf impact.

The original issue that was fixed was a 100% reproducible time drift issue. Since this residual problem is quite infrequent (1 out of 8 times), I think we can leave it as is for now, or maybe close it with documentation.

from graphene.

mkow commented on July 3, 2024

TINFO: Failed to set zero latency constraint: No such file or directory

These tests also seem to rely on disabling C-states, so I'd disable them with a proper comment.

from graphene.

mkow commented on July 3, 2024

@anjalirai-intel: Are you planning to submit a PR with this? If not, I can do this.

from graphene.

anjalirai-intel commented on July 3, 2024

@mkow We are waiting for a reply from Anees, and then I can submit the PR.

from graphene.

aneessahib commented on July 3, 2024

@mkow This one's a slightly different issue (at least it looks a bit different). It may not be related to C-states, but if it is, then we can close this one too. Will get back soon. Assign this one to me.

from graphene.

aneessahib commented on July 3, 2024

On some systems, RDTSC and the system clock drift away from each other so significantly that at larger gaps (above 100ms) RDTSC falls behind, and this gets flagged by LTP. Currently the correction is done every 10 seconds (#define TSC_REFINE_INIT_TIMEOUT_USECS 10000000). Reducing this timeout further could resolve this issue, but might have a perf impact.
@dimakuv - I think a 10s timeout is too late for a correction. One second should be reasonable without any perf impact (although it may not help pass the test, since it fails as early as 100ms). I remember that switching to RDTSC improved SPECpower benchmarks significantly. I can get this retested with 100ms and 1s timeouts to see if readjusting the timeout is a feasible solution?
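
For readers following along, here is a minimal sketch of the trade-off being discussed: time is estimated from the TSC between corrections, and the longer the correction interval, the more drift can accumulate. This is a simplified model under stated assumptions, not Gramine's actual code. The struct, the function names, and the ppm-style error figure are all hypothetical; only the 10-second value comes from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a TSC-backed clock; not Gramine's implementation. */
struct tsc_clock {
    uint64_t base_usec;           /* host time recorded at the last correction */
    uint64_t base_tsc;            /* TSC value recorded at the last correction */
    uint64_t tsc_per_usec;        /* assumed TSC frequency (may be slightly off) */
    uint64_t refine_timeout_usec; /* how long to trust the TSC before correcting */
};

/* Estimate the current time from the TSC alone, without asking the host. */
uint64_t tsc_clock_estimate(const struct tsc_clock* c, uint64_t tsc_now) {
    assert(c->tsc_per_usec > 0 && tsc_now >= c->base_tsc);
    return c->base_usec + (tsc_now - c->base_tsc) / c->tsc_per_usec;
}

/* Nonzero once the estimated time since the last correction reaches the timeout. */
int tsc_clock_needs_refine(const struct tsc_clock* c, uint64_t tsc_now) {
    return tsc_clock_estimate(c, tsc_now) - c->base_usec >= c->refine_timeout_usec;
}

/* Re-anchor the clock to a fresh (expensive) host timestamp. */
void tsc_clock_refine(struct tsc_clock* c, uint64_t tsc_now, uint64_t host_usec) {
    c->base_tsc  = tsc_now;
    c->base_usec = host_usec;
}

/* Upper bound on drift accumulated in one window, for a relative error in ppm. */
uint64_t max_drift_usec(uint64_t window_usec, uint64_t error_ppm) {
    return window_usec * error_ppm / 1000000;
}
```

Under this model, a hypothetical 100 ppm frequency error would allow up to 1000 usec of drift over a 10-second window and up to 100 usec over a 1-second window. The cost of a shorter window is that the expensive host timestamp is fetched more often, which is the perf concern raised above.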

from graphene.

dimakuv commented on July 3, 2024

@aneessahib Reducing our hard-coded timeout from 10 seconds to e.g. 1 second makes sense. Please try it out on a couple workloads, and if their performance doesn't change much, then no problem.

The original 10 second timeout value was chosen without any good justification, other than that it is definitely good enough. We can decrease this constant as we see fit.

Decreasing to 100 milliseconds sounds risky though. 100 ms sounds like it could have a non-negligible impact on the performance of typical workloads. But it is worth an experiment.

from graphene.

jinengandhi-intel commented on July 3, 2024

@mkow @dimakuv @aneessahib Can you re-open the issue? It seems the issue is not completely fixed: we still see tests failing with the "woken up early" message on a server, but it is very intermittent. Please find attached logs for 2 of the tests, where I was able to repro the issue about 1 in 8 or 9 runs. (Search for TFAIL in the logs.)

In terms of priority, it can definitely go down, as the issue is not consistent like before.

select04_woken_up_early.txt
epoll_wait02_woken_up_early.txt

from graphene.

dimakuv commented on July 3, 2024

I generally agree with @aneessahib. Since this is not a real "bug" (it is simply that the timestamp reported by Gramine may be very slightly off in comparison to the previous timestamp), and it only shows up in an LTP test (and not on any real workload), we could ignore it for now. (this comment was wrong, so I mark it as cut out)

from graphene.

mkow commented on July 3, 2024

Hmm, but it's pretty bad that we can expose a negative time delta to the application. The LTP test exposes it consistently, but normal apps may still hit it sporadically, just not as often as this specialized test.

from graphene.

dimakuv commented on July 3, 2024

@mkow Sorry, I think I got confused by the two issues. My comment about the "negative time delta" concerned this issue: gramineproject/gramine#82.

The issue here is different: the complaint is that the time drift is observable after a tiny amount of time (25ms), while we only recalibrate the time drift with 50ms periodicity (see https://github.com/gramineproject/gramine/pull/38/files). Setting the recalibration period to less than 50ms would hurt normal workloads (e.g., Redis), so we don't want to decrease this period further.
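
To illustrate why a short sleep can be reported as "woken up early", here is a rough sketch. It assumes (this is my reading of the discussion, not something confirmed by the LTP or Gramine sources) that the test compares the requested sleep against the elapsed time measured with a clock that lags slightly. The function names and the ppm lag figure are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Elapsed time as seen through a clock that runs slow by lag_ppm (assumed model). */
uint64_t observed_elapsed_usec(uint64_t real_elapsed_usec, uint64_t lag_ppm) {
    assert(lag_ppm <= 1000000);
    return real_elapsed_usec - real_elapsed_usec * lag_ppm / 1000000;
}

/* The kind of check a timing test might apply: measured time below the request. */
bool looks_woken_up_early(uint64_t requested_usec, uint64_t observed_usec) {
    return observed_usec < requested_usec;
}
```

With a hypothetical lag of 1000 ppm, a sleep that really lasted 25000 usec is observed as 24975 usec, so the check fires even though the thread slept for the full requested time. A recalibration that lands before the second timestamp is taken would remove that lag, which is consistent with the failures showing up only at short sleeps.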

from graphene.

mkow commented on July 3, 2024

Ah, I see. Then it's not very important. This test is disabled in the public CI anyway because it requires disabling C-states, so I think we can just close this issue, as we don't plan to do anything more about it.

from graphene.
