seL4_GetIPCBuffer causing Cap Fault

Hello,

I’ve been tinkering with seL4 and some Rust crates (selfe, mostly). I’ve got everything compiling against mainline seL4 from git, rather than a fork with some modifications, and now I’m trying to run a simple example program using selfe-sys, selfe-arc, etc.

When I run the example program, I get a Cap Fault:

Caught cap fault in send phase at address 0
while trying to handle:
vm fault on data at address 0x10 with status 0x93c08006
in thread 0x807f00a400 "rootserver" at address 0x406084
With stack:
0x63ffc0: 0x0
0x63ffc8: 0x4007d0
0x63ffd0: 0x0
0x63ffd8: 0x641000
0x63ffe0: 0x642000
0x63ffe8: 0x642000
0x63fff0: 0x40076c
0x63fff8: 0x0
0x640000: 0x642000
0x640008: 0x1
0x640010: 0x0
0x640018: 0x0
0x640020: 0x0
0x640028: 0x0
0x640030: 0x0
0x640038: 0x0

which I’ve traced to seL4_GetIPCBuffer:

0000000000406078 <seL4_GetIPCBuffer>:
  406078:       d53bd040        mrs     x0, tpidr_el0
  40607c:       91400000        add     x0, x0, #0x0, lsl #12
  406080:       91004000        add     x0, x0, #0x10
  406084:       f9400000        ldr     x0, [x0]
  406088:       d65f03c0        ret

When I break at the entry to seL4_GetIPCBuffer, tpidr_el0 is 0x0. The commit history on seL4 seems to hint that I need to set the IPC buffer before I can get it. If I do that, I get a similar Cap Fault.

If I compile this with the previous version of the Rust crate, which uses a fork of seL4 and its associated tools, the program compiles and runs fine, but there are some differences in the code for seL4_GetIPCBuffer.

Any thoughts?

Thanks,

Chris

seL4_GetIPCBuffer expects that a thread-local storage (TLS) region has been set up and that the tpidr_el0 register has been initialized to point to that region of memory. If you set up your initial task using the sel4runtime library, this would be initialized for you between the binary entry point (_sel4_start) and your program’s main function: sel4runtime/env.c at master · seL4/sel4runtime · GitHub.
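
For illustration, roughly what that disassembly boils down to (a sketch; the 0x10 offset is just where the linker happened to place __sel4_ipc_buffer in this binary’s TLS block, and the function name is made up):

    // Rust rendering of the seL4_GetIPCBuffer disassembly above: read the
    // thread pointer, add the TLS offset of __sel4_ipc_buffer, load the pointer.
    #[cfg(target_arch = "aarch64")]
    unsafe fn get_ipc_buffer_like_libsel4() -> *mut u8 {
        let tls_base: u64;
        core::arch::asm!("mrs {tp}, tpidr_el0", tp = out(reg) tls_base);
        // With tpidr_el0 still 0, this loads from address 0x0 + 0x10 = 0x10,
        // which is exactly the vm fault shown in the log above.
        *((tls_base + 0x10) as *const *mut u8)
    }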

If you just want to get started in a single-threaded environment, then you could make a small change to libsel4 and remove the __thread modifier on the __sel4_ipc_buffer declaration. With this change seL4_SetIPCBuffer should work, and then syscalls will work too.
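
On the Rust side, the equivalent of that non-__thread variable would just be an ordinary process-global, which is fine while the root task stays single-threaded. A rough sketch (the struct here is only a stand-in for the real seL4_IPCBuffer from the generated bindings):

    use core::ptr;
    use core::sync::atomic::{AtomicPtr, Ordering};

    // Stand-in for the real seL4_IPCBuffer type from the libsel4 bindings.
    #[allow(non_camel_case_types)]
    #[repr(C)]
    pub struct seL4_IPCBuffer {
        _opaque: [u64; 0],
    }

    // Process-global (not thread-local) IPC buffer pointer, analogous to a
    // non-__thread __sel4_ipc_buffer. Only suitable for a single-threaded task.
    static IPC_BUFFER: AtomicPtr<seL4_IPCBuffer> = AtomicPtr::new(ptr::null_mut());

    pub fn set_ipc_buffer(buf: *mut seL4_IPCBuffer) {
        IPC_BUFFER.store(buf, Ordering::Relaxed);
    }

    pub fn get_ipc_buffer() -> *mut seL4_IPCBuffer {
        IPC_BUFFER.load(Ordering::Relaxed)
    }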

This is a great explanation and ties in exactly with what the commit note was saying. I was looking in libsel4 for the initialisation of the variable, not in the runtime.

I think the most sensible thing to do is make sure that the Rust runtime initialises this variable, either by including sel4runtime or by expanding its scope now.

I’m not too familiar with selfe-sys or selfe-arc, but using a TLS variable isn’t strictly required by seL4. The C libsel4 chooses this policy because it saves having to pass a reference to an IPC buffer into every syscall wrapper, but it wouldn’t be unreasonable for a different library to take a different approach. If the selfe-sys library is calling libsel4 via FFI, then I guess you would have to make sure something sets up the TLS and then the IPC buffer variable.
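
To make the alternative concrete, a wrapper in that style would just take the buffer as an argument. This is only a sketch with made-up names (IpcBuffer, MessageInfo, CPtr, call), not anything from selfe-sys or libsel4:

    // Hypothetical wrapper style: the caller supplies the IPC buffer, so no
    // TLS global is needed at all.
    pub struct IpcBuffer {
        pub tag: u64,
        pub msg: [u64; 120],
    }

    pub struct MessageInfo(pub u64);

    pub type CPtr = u64;

    pub fn call(dest: CPtr, info: MessageInfo, ipc_buf: &mut IpcBuffer) -> MessageInfo {
        // Marshal arguments out of ipc_buf, perform the seL4_Call trap, then
        // unmarshal the reply back into ipc_buf (all elided in this sketch).
        ipc_buf.tag = info.0;
        let _ = dest;
        MessageInfo(ipc_buf.tag)
    }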

Good luck!

I’m not that familiar with selfe-* or seL4 yet. selfe is a bit spread out across two or three crates, and I feel it would make sense to add some structure, as well as trying to keep it aligned with mainline seL4 rather than the company’s fork.

I’ll do some reading and digging around sel4runtime and see how best this sits within Rust. Hopefully I can put together something coherent that others can get some value from, but equally, if it gives me something to do while I wait for the baby to go to sleep, I’m happy!

You might have seen this already, but if you’re interested in Rust on seL4, I should point out that there is a current discussion on what we can/should do to support that from the seL4 Foundation perspective over here.

Thanks. As I say, I’m very new to all of this and just playing while the baby sleeps (or while I try to get the baby to sleep). Ultimately I’d just like to play around with Rust/seL4 and a Raspberry Pi.

I had a quick read of the other thread, and it all rings true for me as well: the Rust bindings try to automate the FFI interface, and this makes them less stable. I’ve also seen the abstraction approach, which falls over because the low level it builds on is itself script-generated.

I’m going to try and chug away at this and see if I can get to a stage where I have enough knowledge to contribute something of value.


Just trying to understand something here: if I follow the execution path through sel4runtime, I can see that there is a static, thread-local structure which is the thread environment, and that this is zeroed and then initialised with data passed from seL4.

What I can’t see in sel4runtime is where tpidr_el0 is initialised.

If I try to replicate the code from sel4runtime in Rust, I find that my code falls over because tpidr_el0 is uninitialised. If I make the static structure not thread-local, then it runs fine, but that’s not really the point.

So, again, in short form, my question is: when is tpidr_el0 initialised, and by whom? It looks like if I apply #[thread_local], the compiler addresses the variable relative to tpidr_el0 (which makes complete sense to me, since it’s what I want), but nothing actually sets the value of tpidr_el0.

I can see that the static structure isn’t visible in the .tbss segment of the ELF file. I’m clearly missing something here, so I’m going to do some more reading. If anyone feels like taking pity on me, I’m all ears!
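
To show what I mean, here’s a cut-down sketch of what I’m trying (names and sizes made up; #[thread_local] needs nightly):

    #![feature(thread_local)] // crate-level attribute; #[thread_local] is nightly-only

    // Cut-down stand-in for my env structure. With #[thread_local], every
    // access compiles to the same pattern as the seL4_GetIPCBuffer
    // disassembly earlier in the thread:
    //     mrs  x0, tpidr_el0      ; thread pointer (still 0 in my case)
    //     add  x0, x0, #offset    ; offset of ENV within the TLS block
    //     ldr  x0, [x0]
    // i.e. the compiler only emits tpidr_el0-relative addressing; nothing in
    // the generated code ever writes tpidr_el0.
    #[thread_local]
    static mut ENV: [usize; 8] = [0; 8];

    pub fn env_first_word() -> usize {
        unsafe { ENV[0] }
    }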

It is initialized by sel4runtime as part of __sel4runtime_load_env().
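
In Rust terms, the step that matters here is roughly the following (a sketch of the idea, not sel4runtime’s actual code):

    // Once a TLS region exists, its base is written into tpidr_el0 so that
    // later tpidr_el0-relative accesses (such as seL4_GetIPCBuffer’s) hit
    // real memory instead of faulting near address 0.
    #[cfg(target_arch = "aarch64")]
    unsafe fn write_thread_pointer(tls_base: *mut u8) {
        core::arch::asm!("msr tpidr_el0, {base}", base = in(reg) tls_base);
    }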

Thanks,

That’s what I thought was the process.

When I try to clear out my static structure in the equivalent of try_init_static_tls, I can see that the compiler is trying to use tpidr_el0 for the base address of the Rust equivalent of env.

However, tpidr_el0 is 0x0, and when I try to find the address of env, that also appears to be null.

As I say, I just don’t understand how the compiler is using tpidr_el0 with the static, thread-local structure, and why the address of the structure ends up as 0x0.

If I remove #[thread_local] from the structure it compiles and runs fine but isn’t using tpidr_el0.

Removing the #[thread_local] from your version of static_tls is the right way to go. You can’t use a symbol that’s allocated in .tbss as the backing storage for a thread’s .tbss section, as that would create a circular dependency. Instead, static_tls is allocated as a global, ending up in .bss. Then the address of static_tls is stored in tpidr_el0 and is used as the storage for any TLS variables used by the main thread.
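
A rough sketch of that arrangement in Rust, with an assumed size and made-up names:

    // Ordinary (non-#[thread_local]) static, so it lands in .bss rather than
    // .tbss; the main thread’s TLS variables live inside it once tpidr_el0
    // points here.
    static mut STATIC_TLS: [u8; 4096] = [0u8; 4096];

    #[cfg(target_arch = "aarch64")]
    pub unsafe fn init_main_thread_tls() {
        // A real runtime would also copy the .tdata image into the buffer and
        // respect the AArch64 (variant 1) TLS layout before doing this.
        let base = core::ptr::addr_of_mut!(STATIC_TLS) as *mut u8;
        core::arch::asm!("msr tpidr_el0, {base}", base = in(reg) base);
    }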

Thanks again. Your comments make complete sense and are a very clear explanation, so I continue to appreciate your patience.

I’ve now got things compiling and running by building selfe-sys (bindgen bindings of libsel4) and selfe-start (a very simple runtime), together with code I’ve ported from sel4runtime into Rust. I’ve combined these into a single crate (sel-claw), since I don’t see how you’d use one without the other.

I’m now working through all of this to make it more Rust-focused and to concentrate the unsafe code into the init code. Once I’ve done this, I’ll try to expand the scope of sel-claw to include the Rust-aligned interfaces defined in “ferros”, since I can see that my expanded runtime is already starting to tread on ferros’ toes.

I have no idea if this will be useful to others, and my code will definitely not be up to much as Rust goes, as I’m pretty new to both Rust and seL4, but I’ll motor away and maybe someone will see a nugget or two that benefits them.

One question: is there any reason why selfe-sys would regenerate the bindings to libsel4 at compile time (other than x86 vs x86_64 vs aarch32 vs aarch64), given that libsel4’s C API should be fixed?

No worries!

The main reason is that the C API makes use of a lot of conditional compilation options for different seL4 build configurations that don’t map nicely to Rust, i.e. there is not one fixed C API. The discussion in this thread has more details on that, and on the plan to make this a bit less painful in Rust.

(The C API itself is also generated, but that is not the real reason.)
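
One concrete example of what that means in practice (as I understand it; the feature name and signatures below are illustrative only): the MCS kernel configuration adds an explicit reply object to the receive-style calls, so a single fixed Rust signature can’t cover both configurations.

    pub type CPtr = u64;
    pub struct MessageInfo(pub u64);

    // Illustrative only: the same logical operation needs a different shape
    // depending on how the kernel (and hence libsel4) was configured.
    #[cfg(feature = "mcs")]
    pub fn recv(src: CPtr, reply: CPtr) -> MessageInfo {
        let _ = (src, reply);
        MessageInfo(0) // a real binding would trap into the kernel here
    }

    #[cfg(not(feature = "mcs"))]
    pub fn recv(src: CPtr) -> MessageInfo {
        let _ = src;
        MessageInfo(0)
    }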