Comments (12)
So what we are seeing here is expected. It looks like on this system the negative time drift is detected as early as 25ms, which is why the test fails only at that point and begins to pass as soon as it goes above 50ms. Unfortunately, we can't correct this any further, as there could be a perf impact.
The original issue that was fixed was a 100% reproducible time drift issue. Since this residual problem is quite infrequent (1 out of 8 times), I think we can leave it as is for now, or maybe close it with a documentation note.
from graphene.
TINFO: Failed to set zero latency constraint: No such file or directory
These tests also seem to rely on disabling C-states, so I'd disable them with a proper comment.
@anjalirai-intel: Are you planning to submit a PR with this? If not, I can do this.
@mkow We are waiting for a reply from Anees, and then I can submit the PR.
@mkow this one's a slightly different issue (at least it looks a bit different). It may not be related to C-states, but if it is, then we can close this one too. Will get back soon. Assign this one to me.
On some systems, RDTSC and the system clock drift away from each other significantly, so that at larger gaps (above 100ms) RDTSC falls behind, and this gets flagged by LTP. Currently the correction is done every 10 seconds:
#define TSC_REFINE_INIT_TIMEOUT_USECS 10000000
Reducing this timeout further could resolve this issue, but might have a perf impact.
@dimakuv I think a 10s timeout is too late for a correction. One second should be reasonable without any perf impact (although it may not help pass the test, since it fails as early as 100ms). I remember that switching to rdtsc improved the specpower benchmarks significantly. I can get this retested with 100ms and 1s timeouts to see whether readjusting the timeout is a feasible solution.
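For readers following along, the periodic correction discussed above can be sketched roughly as below. This is a minimal illustration and not Gramine's actual code: `struct tsc_clock`, `tsc_clock_now()` and `RECALIBRATE_PERIOD_USECS` are hypothetical names, the TSC frequency is assumed to be a known constant, and the TSC and system-clock readings are passed in as arguments so the logic stays deterministic. A real implementation would read the system clock only on the slow path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical refresh period; the thread discusses values from 10 s down to 50 ms. */
#define RECALIBRATE_PERIOD_USECS 1000000ULL

struct tsc_clock {
    uint64_t anchor_tsc;   /* TSC reading at the last recalibration */
    uint64_t anchor_usec;  /* system-clock time (us) at the last recalibration */
    uint64_t tsc_per_usec; /* assumed TSC frequency, in ticks per microsecond */
};

/* Convert a TSC reading into microseconds relative to the current anchor. */
static uint64_t tsc_clock_estimate(const struct tsc_clock* c, uint64_t tsc) {
    assert(c->tsc_per_usec != 0);
    return c->anchor_usec + (tsc - c->anchor_tsc) / c->tsc_per_usec;
}

/*
 * Fast path: return the TSC-based estimate.
 * Slow path: once the estimate is at least one period past the anchor,
 * re-anchor to the system clock. `sys_usec` stands in for the expensive
 * system-clock query that a real implementation would issue only here.
 */
static uint64_t tsc_clock_now(struct tsc_clock* c, uint64_t tsc, uint64_t sys_usec) {
    uint64_t estimate = tsc_clock_estimate(c, tsc);
    if (estimate - c->anchor_usec >= RECALIBRATE_PERIOD_USECS) {
        c->anchor_tsc  = tsc;
        c->anchor_usec = sys_usec;
        return sys_usec;
    }
    return estimate;
}
```

The trade-off in the thread shows up directly in this sketch: a shorter period bounds the drift more tightly but takes the slow path more often.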
@aneessahib Reducing our hard-coded timeout from 10 seconds to e.g. 1 second makes sense. Please try it out on a couple of workloads, and if their performance doesn't change much, then no problem.
The original 10 second timeout value was chosen without any good justification, other than that it is definitely good enough. We can decrease this constant as we see fit.
Decreasing to 100 milliseconds sounds risky though. 100 ms sounds like it could have a non-negligible impact on the performance of typical workloads. But it's worth an experiment.
@mkow @dimakuv @aneessahib Can you re-open the issue? It seems the issue is not completely fixed: we still see tests failing with the "woken up early" message on a server, but it is very intermittent. Please find attached logs for 2 of the tests, where I was able to repro the issue in about 1 out of 8 or 9 runs. (Search for TFAIL in the logs.)
In terms of priority it can definitely go down as the issue is not consistent like before.
select04_woken_up_early.txt
epoll_wait02_woken_up_early.txt
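For context, a "woken up early" failure means the measured sleep came out shorter than the requested timeout. The snippet below is a rough approximation of that kind of check, not LTP's code: `select_sleep_slack_usec()` and `monotonic_usec()` are hypothetical helpers, and the use of `CLOCK_MONOTONIC` is an assumption (the clock source LTP uses is not shown in this thread).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/select.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in microseconds. */
static int64_t monotonic_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Sleep via select() with no file descriptors and return
 * (measured elapsed time - requested timeout) in microseconds.
 * A negative result corresponds to the "woken up early" symptom.
 */
static int64_t select_sleep_slack_usec(int64_t timeout_usec) {
    struct timeval tv = {
        .tv_sec  = timeout_usec / 1000000,
        .tv_usec = timeout_usec % 1000000,
    };
    int64_t start = monotonic_usec();
    select(0, NULL, NULL, NULL, &tv);
    int64_t end = monotonic_usec();
    return (end - start) - timeout_usec;
}
```

If the clock the application sees falls slightly behind real time between the two readings, the slack can come out negative even though the sleep itself lasted long enough.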
I generally agree with @aneessahib. Since this is not a real "bug" (it is simply that the timestamp reported by Gramine may be very slightly off in comparison to the previous timestamp), and it only exhibits itself in an LTP test (and not on any real workload), we could ignore it for now. (This comment was wrong, so I am marking it as struck out.)
Hmm, but it's pretty bad that we can expose negative time delta to the application. The LTP test exposes it consistently, but normal apps may still hit it sporadically, just not as often as this specialized test.
@mkow Sorry, I think I got confused by the two issues. My comment about the "negative time delta" concerned this issue: gramineproject/gramine#82.
The issue here is different: here the complaint is that the time drift is observable after a tiny amount of time (25ms), and we only recalibrate the time drift with 50ms periodicity (see https://github.com/gramineproject/gramine/pull/38/files). Setting the recalibration period to less than 50ms would hurt normal workloads (e.g., Redis), so we don't want to decrease this period further.
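To put the recalibration period in perspective, the drift that can build up between two corrections is roughly the interval multiplied by the relative frequency error between the two clocks. The helper below is only a back-of-the-envelope sketch: `accumulated_drift_ns()` is a hypothetical name, and any error rate plugged into it is an assumption, since the actual drift rate on the affected system is not given in this thread.

```c
#include <stdint.h>

/*
 * Drift (in nanoseconds) accumulated over `interval_ns` by a clock whose
 * frequency is off by `error_ppm` parts per million. A negative error
 * means the clock runs slow and falls behind.
 */
static int64_t accumulated_drift_ns(int64_t interval_ns, int64_t error_ppm) {
    return interval_ns * error_ppm / 1000000;
}
```

For example, with an assumed error of 100 ppm, a 50ms period allows up to 5 microseconds of drift, while a 10 second period allows up to 1 millisecond.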
Ah, I see. Then it's not very important. This test is disabled anyways in the public CI because it requires disabling C-states, so I think we can just close this issue as we don't plan to do anything more about it.