Further to #33, I decided to do a bit of digging to see if I could figure out where that async abort was coming from.
Let's start with some details:
- element14 i.MX6Q hardware
- U-Boot 2016.03-20505-g7e2d42d (latest Boundary Devices version, but the same happens with U-Boot master)
- Current sel4test master, kernel b25b826 and elfloader seL4/seL4_tools@79251ae
And to recap, here's the async abort on return to user space (cf. sel4test-driver's virt_entry) that I'm seeing with an unmodified debug build:
```
=> fatload mmc 1:1 $loadaddr sel4test-driver-image-arm-imx6 && bootelf $loadaddr
reading sel4test-driver-image-arm-imx6
3065572 bytes read in 165 ms (17.7 MiB/s)
## Starting application at 0x20000100 ...
ELF-loader started on CPU: ARM Ltd. Cortex-A9 r2p10
paddr=[20000100..202f041f]
ELF-loading image 'kernel'
paddr=[10000000..10037fff]
vaddr=[e0000000..e0037fff]
virt_entry=e0000000
ELF-loading image 'sel4test-driver'
paddr=[10038000..103fafff]
vaddr=[10000..3d2fff]
virt_entry=1de7c
Enabling MMU and paging
Jumping to kernel-image entry point...
DIDRv: 3, armv 70, coproc baseline only? no.
CPU is in secure mode. Enabling debugging in secure user mode.
Bootstrapping kernel
Caught cap fault in send phase at address 0x0
while trying to handle:
vm fault on data at address 0x7b63f3a2 with status 0x1c06
in thread 0xffdfad00 "rootserver" at address 0x1de7c
```
Okay, so let's turn on async aborts everywhere and see what blows up. Step 1: add `cpsie a` to the kernel's `_start` and see what happens:
```
Bootstrapping kernel
KERNEL DATA ABORT!
Faulting instruction: 0xe0000338
FAR: 0xeb25bbaa DFSR: 0x1c06
halting...
```
Nice! Working backwards, just to make sure the abort didn't originate earlier in the boot chain, I also initialised CPSR and unmasked async aborts in the elfloader's `_start`. Same error. Nice!
What's the faulting instruction?
```
e0000300 <initL2Cache>:
e0000300: e30330ff movw r3, #12543 ; 0x30ff
e0000304: e3a00000 mov r0, #0
e0000308: e34f3ff0 movt r3, #65520 ; 0xfff0
e000030c: e52de004 push {lr} ; (str lr, [sp, #-4]!)
e0000310: e1a02003 mov r2, r3
e0000314: e3001121 movw r1, #289 ; 0x121
e0000318: e3430c07 movt r0, #15367 ; 0x3c07
e000031c: e3a0e203 mov lr, #805306368 ; 0x30000000
e0000320: e30fcfff movw ip, #65535 ; 0xffff
e0000324: e5830005 str r0, [r3, #5]
e0000328: e5831009 str r1, [r3, #9]
e000032c: e583100d str r1, [r3, #13]
e0000330: e583ee61 str lr, [r3, #3681] ; 0xe61
e0000334: e583c67d str ip, [r3, #1661] ; 0x67d
e0000338: e592367d ldr r3, [r2, #1661] ; 0x67d /* !!! */
e000033c: e6ff3073 uxth r3, r3
e0000340: e3530000 cmp r3, #0
e0000344: 1afffffb bne e0000338 <initL2Cache+0x38>
...
```
Fine, I'll be a jerk about it and disable the L2 cache. On the off chance it doesn't blow up elsewhere, we'll have narrowed things down at least.
Lo and behold!
```
Bootstrapping kernel
... 8< ... lots of test output ... 8< ...
218/222 tests passed.
*** FAILURES DETECTED ***
```
Looks like the imx6 platform doesn't have its own `initL2Cache` function, so I'll have to start digging into `src/arch/arm/machine/l2c_310.c`. (So far I'm a wee bit flummoxed by the difference between what's in `kernel_final.c` versus `kernel_final.s`.)
Given it's an async/imprecise abort, it's not necessarily that instruction that caused the problem, but at least we know something's going awry with the L2 cache. I might see what happens if I tell U-Boot not to initialise or use it…
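If I go down that road, the first U-Boot knob I'd try is `CONFIG_SYS_L2CACHE_OFF`, which U-Boot's README documents as keeping U-Boot from enabling the L2 cache. A minimal sketch, assuming the option is still honoured by the i.MX6 code in the Boundary tree (not checked) and that it goes in the board's config header under `include/configs/` (which header depends on the board target being built):

```c
/* Hypothetical addition to this board's U-Boot config header:
 * ask U-Boot not to set up or enable the PL310 L2 cache itself. */
#define CONFIG_SYS_L2CACHE_OFF
```

This only changes what U-Boot does before `bootelf`; the kernel's own `initL2Cache` would still run afterwards.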