RPi3 elfloader bug with dcache enabled

Hi all! I’ve been spending the last few nights getting seL4 to boot on my Raspberry Pi 3, and picking up the old threads on why a u-boot patch to disable dcache is needed:

This forum post is a progress report on what I’m doing, for the sake of sharing - I’m largely treating this as a fun learning exercise, but if the bug gets squashed too then so much the better.

So far I have added printf debugging statements to elfloader to understand more about the problem. I have also tried adding a call to arm_disable_dcaches before the kernel is loaded, read a lot of assembler, and compared the elfloader code with the u-boot code. My results are intermittent:

  1. Most often, I have seen output stop somewhere during the call to leave_hyp() - this is a hang rather than a crash
  2. Occasionally I get “unknown instruction” with a dump of the registers, somewhere after “Enabling MMU and paging” but before the end of that function call - I think this happened more often after adding the call to arm_disable_dcaches.
  3. Very occasionally (once or twice) I actually got a successful boot (with various modifications made to elfloader)! But this did not work consistently, which is interesting, and makes me think that I still don’t understand what’s happening.
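
With failures this intermittent, one thing that might help is printing the ARM system control register (SCTLR) at a few points and decoding it, to confirm whether the MMU and caches are really in the state you expect. Below is a minimal sketch of a host-side decoder. It assumes you add your own printf of the SCTLR value (nothing here is existing elfloader output), and the sample values are made up for illustration. The bit positions used are M = bit 0, C = bit 2, I = bit 12.

```python
# Decode the MMU and cache enable bits of an ARM SCTLR value.
# Bit positions: M (MMU) = 0, C (data cache) = 2, I (instruction cache) = 12.
SCTLR_BITS = {"M": 0, "C": 2, "I": 12}


def decode_sctlr(value):
    """Return a dict mapping each flag name to True if its bit is set."""
    return {name: bool((value >> bit) & 1) for name, bit in SCTLR_BITS.items()}


# Hypothetical values, as they might be copied from a debug printf.
before = 0x00C5187D  # illustrative only
after = 0x00C50838   # illustrative only

for label, value in (("before", before), ("after", after)):
    print(label, hex(value), decode_sctlr(value))
```

If the C bit is still set at a point where you expected the dcache to be off, that at least narrows down where to look.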

Having reproduced the problem, I’m working on improving my test-debug cycle time and getting more information: last night I got boot.scr working so that I didn’t have to type in boot commands, and disassembled the binary with objdump to help understand the crash info better. Debugging with gdb might be the next milestone.
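
To go from an address in the register dump to a function name, a symbol table lookup can be quicker than scrolling through the full disassembly. Here is a rough sketch, assuming the dump includes a program counter (or link register) value and that you have saved the text output of `nm` for the elfloader ELF. The symbol names and addresses in the sample are invented for illustration, not taken from a real build.

```python
import bisect


def load_symbols(nm_text):
    """Parse nm-style lines ("<hex address> <type> <name>") into a sorted list."""
    symbols = []
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, kind, name = parts
        if kind.lower() != "t":
            continue  # keep only text (code) symbols
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def find_symbol(symbols, pc):
    """Return (name, offset) for the last symbol at or below pc, or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


# Made-up sample; replace with the real nm output for your image.
SAMPLE_NM = """\
0000000000a00000 T _start
0000000000a00120 T main
0000000000a00480 t enable_mmu_example
                 U undefined_example
"""

symbols = load_symbols(SAMPLE_NM)
print(find_symbol(symbols, 0xA00134))
```

This only gives a sensible answer if the address in the dump is in the same address space as the ELF link addresses, so if the image has been relocated you would need to subtract the load offset first.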

Any advice welcome! I’ll post any progress updates here.

I’ve been working on getting gdb working via OpenOCD and a JTAG interface. While working on that, I tried building u-boot and seL4 with an aarch64 toolchain (as I saw in the release notes this should now be possible).

The intermittent elfloader bug does not seem to appear so far with everything built in 64-bit mode - this is a stock u-boot v2020.04 and seL4_tools rev a5de0e5. The other change is that I’m building with a Debian toolchain on my host machine rather than inside the docker containers. I would be interested to hear whether others see the same thing.

Great to see someone working on this! I’m interested to see what you find.

How do you locate the unknown instruction error? I have met the same problem while trying to port the kernel to the rockpro platform. I have not been able to use objdump on the output built with the rockpro compile settings to find it.
The faulting instruction is at the very beginning of the boot process, just after memory has been scanned.