Hi, I have tried the kernel_irq_handler you mentioned last time, and it works fine. I have now run bench several times with your new code, and so far no screen freeze has happened. I will let you know the result after several more days of use (since it used to freeze only 'sometimes'). Thanks for your update!
from sgx-step.
Hi Neo!
Nice to hear you are experimenting with the single stepping -- the problem you describe does indeed sound like a known, infamous issue :/ Unfortunately, I have also experienced that sometimes, for a yet-unknown reason, the system crashes with a complete freeze and you have to reboot the machine. However, in my case I only rarely have to reboot, and I am able to single-step enclaves with several millions of instructions without crashes ^^
I have a few suspicions about which bug might be causing these crashes, but I have to investigate further at some point.. I think it's related to some race condition between the user-space code and the kernel, so from experience it really helps to use the `isolcpus` kernel option and CPU pinning with the `claim_cpu()` function. What also helps is to disable the NMI watchdog interrupts with the kernel option `nmi_watchdog=0`.
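To illustrate what the CPU pinning part means, here is a minimal sketch in Python. It is not libsgxstep's `claim_cpu()`; it uses the standard `os.sched_setaffinity` call as a stand-in, and the helper name `pin_to_cpu` is made up for this example.

```python
import os

def pin_to_cpu(cpu: int) -> bool:
    """Pin the calling process to one CPU; return False if that CPU is unavailable."""
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently run on
    if cpu not in allowed:
        return False
    os.sched_setaffinity(0, {cpu})     # restrict the process to exactly this CPU
    return os.sched_getaffinity(0) == {cpu}

if __name__ == "__main__":
    victim_cpu = 1  # assumption: the CPU isolated with isolcpus=1
    print(f"pinned to CPU {victim_cpu}:", pin_to_cpu(victim_cpu))
```

Pinning alone does not keep other tasks off that CPU; that is what the `isolcpus` kernel option is for.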
Hope this helps! It's already great that you write that it works for you perfectly when it doesn't crash: that means at least that your setup is correct! Not sure what you mean with:

> Besides, once I load the sgx-step kernel, System always warn me that System program problem detected.

I think Ubuntu systems might sometimes show such a GUI warning, but I'm not sure why it's triggered or whether it's relevant. If you can provide more details, that would be helpful. If you suspect this is a kernel problem, could you check and provide the output of `dmesg | tail` after loading the driver? Also make sure to pass `iomem=relaxed no_timer_check` to the kernel as described in the README to suppress some warnings :)
Hi Neo,
Thanks for your reply. I tried enabling HT; the system still crashes, but less frequently.
Shujie
Hi Shujie,
Too bad you ran into this issue. HT should normally not interfere too much with single-stepping--I'd even expect things to work more stably w/o HT. It is important, however, to isolate the victim CPU 1 with the `isolcpus=1` Linux kernel param, as described in the README. You can check the kernel params with `dmesg` or `cat /proc/cmdline`.
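As a quick way to double-check the boot configuration, a small script along these lines could be used. This is only a sketch: the list contains the parameters mentioned in this thread (the README remains the authoritative list), and `missing_params` is a hypothetical helper name.

```python
from pathlib import Path

# Parameters mentioned in this thread; check the README for the full list.
EXPECTED = ("isolcpus", "nmi_watchdog=0", "iomem=relaxed", "no_timer_check")

def missing_params(cmdline: str, expected=EXPECTED) -> list:
    """Return the expected kernel parameters that do not appear in cmdline."""
    tokens = cmdline.split()
    missing = []
    for want in expected:
        if "=" in want:
            found = want in tokens  # exact key=value match
        else:
            # bare flag, or a key given with any value (e.g. isolcpus=1)
            found = any(t == want or t.startswith(want + "=") for t in tokens)
        if not found:
            missing.append(want)
    return missing

if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        print("missing:", missing_params(path.read_text()))
```

An empty list means all of the listed parameters were found on the running kernel's command line.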
The error messages you posted seem to indicate something is going wrong with the page-table remapping. Linux may complain when it detects user-space tampering with PTEs. Which kernel version are you using, as reported by `uname -a`?
What do you mean exactly with:

> Everytime after testing bench, idt and cpl, the system crashes and I have to reboot my machine

Do you mean the example first works and produces expected outputs, and after that the system crashes? Or does it not work at all? In the first case, it might be related to teardown.
> Hi Shujie,
>
> Too bad you run into this issue. HT should normally not interfere too much with single-stepping--I'd even expect things work more stable w/o HT. Important however is to affinitize the victim CPU 1 with the `isolcpus=1` Linux kernel param, as described in the README. You can check the kernel params with `dmesg` or `cat /proc/cmdline`

I did that. The system is configured with all the parameters mentioned in the README.

> The error messages you posted seem to indicate something is going wrong with the page-table remapping. Linux may complain when it detects the user-space tampering with PTEs. What kernel version are you using as specified by `uname -a`?

Sorry, the error messages I posted can be ignored. They are shown only when testing foreshadow, and the system doesn't crash after testing foreshadow.

> What do you mean exactly with:
>
> > Everytime after testing bench, idt and cpl, the system crashes and I have to reboot my machine
>
> Do you mean the example first works and produces expected outputs, and after that the system crashes? Or does it not work at all? In the first case, it might be related to tear down.

Both bench and idt work and produce expected outputs, but the system crashes after that.
Okay, so the fact that it works and produces expected outputs is great, but it seems to indicate there's an issue with the teardown. As I mentioned above, I think it might have something to do with a race condition or unexpected interrupt point between the kernel and libsgxstep when configuring privilege levels and call gates..
Unfortunately, I'm not sure where the bug would be or how to fix it. Some suggestions:
- Which kernel version are you using? It could be that the GDT/IDT vectors being overwritten are already in use by the kernel, and the current code does not properly back up/restore them (I'm aware of that, but haven't yet found time to do it properly and just hacked in some unused vectors on my kernel). Vectors can be changed here and you should be able to inspect the GDT and IDT via the `dump_gdt`/`dump_idt` functions, e.g. by modifying app/idt and app/cpl to only print w/o modifications after a fresh reboot.
  jo@breuer:~$ uname -a
  Linux breuer 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- There are config switches in app/idt and app/cpl to only do a subset of things (e.g. only sw IRQs w/o timer, or only IRQ gates and no call gates). Try toggling these to further narrow down which functionality exactly causes the crash?
- There's a `USER_IDT_ENABLE` switch that you can disable here, which is used by app/bench and may fix the problem by not relying on custom IRQ gates and falling back to the "old" approach of directly hooking the existing Linux APIC timer handler. This option apparently broke with some recent changes, but I'll push a commit to fix this option again in case it might help you.
- Try configuring SGX-Step without user-space interrupt handlers.
I fixed the USER_IDT_ENABLE=0 option for app/bench in the latest commit. This works stably on my machine and allows single-stepping with minimal intervention in the kernel data structures (i.e. without having to change IDT or GDT entries), which may fix your issue(?)
This commit also fixes a possible issue where the APIC timer vector got improperly restored (zeroed) that you can run into in corner cases (when restoring the APIC without having reconfigured it). I don't think this is the root cause of the troubles in this thread, but it might help ^^
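To make the restore corner case a bit more concrete, here is a toy model in Python. It is purely illustrative and not the libsgxstep code: the class `TimerConfig` and the vector values are made up, and it only shows one way a "restore without prior configure" can be made harmless.

```python
class TimerConfig:
    """Toy stand-in for a register that is saved, changed, and restored."""

    def __init__(self, vector: int):
        self.vector = vector       # current value of the simulated register
        self._saved = None         # nothing saved until configure() runs

    def configure(self, new_vector: int) -> None:
        self._saved = self.vector  # remember the original before overwriting
        self.vector = new_vector

    def restore(self) -> None:
        if self._saved is None:    # never configured: leave the register alone
            return
        self.vector = self._saved
        self._saved = None

if __name__ == "__main__":
    timer = TimerConfig(vector=0x20)  # arbitrary example value
    timer.restore()                   # no prior configure(): value stays intact
    print(hex(timer.vector))
```

If the saved value instead started out as 0 and `restore()` wrote it back unconditionally, a restore without a prior configure would zero the register, which is the kind of symptom described above.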
Unluckily, it doesn't help.
But enabling HT really helps.
So I just thought of one more thing that you might try: it could be that the issue arises from interrupting the user-space handler code and the kernel somehow not expecting that.. This is possible as the user-space handlers run as a trap gate, which allows them to be interrupted (as they otherwise don't properly restore interrupts on user-space `iret`).
You might want to try replacing `install_user_irq_handler` with `install_kernel_irq_handler` so the handler will run as a proper interrupt gate with ring-0 privileges and without being interrupted. See for instance app/cpl for an example of `install_kernel_irq_handler` and make sure to disable SMAP/SMEP for this to work(!)
Hope it helps, let me know if you see any improvements or not ^^
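For the SMAP/SMEP requirement mentioned above, one rough way to see whether the running kernel still has them enabled is to look at the CPU flags it reports. The sketch below assumes that disabled features no longer show up in the `flags` line of `/proc/cpuinfo` (which I believe is the case when booting with `nosmap nosmep`, but please verify on your kernel); `active_protections` is a hypothetical helper name.

```python
from pathlib import Path

def active_protections(cpuinfo: str) -> set:
    """Return which of smap/smep appear in the first 'flags' line of cpuinfo text."""
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return {"smap", "smep"} & set(value.split())
    return set()

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("still enabled:", sorted(active_protections(path.read_text())))
```

An empty result suggests that neither feature is reported as active; anything else means the kernel-mode handler may still fault when touching user pages.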
So I've worked a bit more on this issue. Currently, the most likely hypothesis is that sometimes (i.e., infrequently, which explains why the issue only sometimes arises and depends on the target system configuration) the kernel may interrupt the user-space application after a timer IRQ has been scheduled and before the timer has fired. Consequently, the CPU may raise a #GP exception when attempting to vector to our user-space ring-3 timer IRQ handler while currently executing in ring 0. As follows:
- SGX-Step schedules an APIC timer IRQ in user space
- CPU may switch into kernel space for some reason
- External timer IRQ arrives in kernel space
- CPU locates the handler for the timer IRQ in the IDT and finds the handler's user-space code segment with index 0x6 in the GDT
- When attempting to load the user-mode segment selector, the CPU detects a privilege level violation and generates a #GP
- The kernel doesn't expect a #GP for a timer IRQ and crashes

The relevant section is in Intel SDM "6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures":

> The processor does not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL. An attempt to violate this rule results in a general-protection exception (#GP).
I was aware that this scenario is exotic and def not recommended in non-adversarial deployments, but I was not aware that the processor apparently offers no way to allow it.
So I did some coding and managed to reproduce the above hypothesis in the updated app/idt program on the irq_cpl branch:
https://github.com/jovanbulck/sgx-step/tree/irq_cpl
For reference, the following matrix summarizes whether code with privilege level my_cpl can be interrupted by a handler with privilege level irq_cpl.
my_cpl \ irq_cpl | 0 | 3 |
---|---|---|
0 | OK | FAIL |
3 | OK | OK |
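The matrix above boils down to a single comparison, following the SDM rule that a handler may not run in a numerically greater privilege level than the interrupted code. A small sketch in Python (the function name `can_deliver` is just for illustration and models the rule only, not the hardware):

```python
def can_deliver(my_cpl: int, irq_cpl: int) -> bool:
    """True if code running at my_cpl can be interrupted by a handler at irq_cpl."""
    # The handler's code segment must not be less privileged
    # (numerically greater) than the interrupted code.
    return irq_cpl <= my_cpl

if __name__ == "__main__":
    for my_cpl in (0, 3):
        row = ["OK" if can_deliver(my_cpl, irq_cpl) else "FAIL (#GP)"
               for irq_cpl in (0, 3)]
        print(f"my_cpl={my_cpl}:", row)
```

Only the combination of ring-0 code and a ring-3 handler fails, which matches the single FAIL entry in the table.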
All the OK entries go smoothly without any problems on my machine, but for the FAIL entry I get an immediate system freeze and I have to reboot the machine. So the solution seems to be to simply never use ring-3 IRQ handlers, so that things keep working even if the processor is somehow in kernel mode. (The original motivation for ring-3 handlers was to avoid a privilege switch in the interrupt path and improve Nemesis IRQ latency measurements, but I expect that an added CPL switch will not significantly affect Nemesis.) The new code sets the IRQ gate DPL and segment to the kernel and adds some custom asm in the handler to directly set the APIC end-of-interrupt register and return w/o clobbering registers.
I pushed some preliminary commits to the irq_cpl branch and also updated app/bench on that branch to make use of the new ring-0 handlers. For me the new code seems to run very stably now and I haven't encountered any #GP so far! So I hope this has pinpointed the problem and that the app/idt and app/bench code on the irq_cpl branch also works for you?
I'd be curious to hear your experiences! If things turn out to be stable, I'll later merge the new code to master after doing some more refactoring and duplicate code removal etc ^^
Fixed and merged to master in #31