UnknownSyscall Exception Handler for non-seL4 syscall API

Hello,

I posted on the mailing list a moment ago and am not sure whether this post is also necessary, or whether I should have posted here in the first place. I am working on a Linux-like syscall handler in user space and want to implement a trap-and-emulate-style mechanism.

First, some background and motivation. The seL4 system call API uses negative syscall numbers. In theory, this lets me use positive syscall numbers to implement an additional syscall API: the seL4 kernel raises an UnknownSyscall exception and delivers it to the fault handler endpoint, in user space, of the corresponding CAmkES component.

I work with CAmkES and, with the help of templates, was able to create a fault handler similar to the one implemented in GDBMem. I can generally catch the exception in a separate thread of the same component and read the corresponding arguments. What I can’t manage is redirecting the program flow so that the control thread resumes right after the faulting syscall instruction. It generally just retriggers the same exception or a different one (the VMFault exception).

To show what I did, here is the code:

  • My C code: Triggers the system call number 60 with 6 arguments:

int run(void) {
    while (1) {
        register int rax asm("rax") = 60;
        register int rdi asm("rdi") = 1;
        register int rsi asm("rsi") = 2;
        register int rdx asm("rdx") = 3;
        register int r10 asm("r10") = 4;
        register int r8 asm("r8") = 5;
        register int r9 asm("r9") = 6;
        asm volatile (
            "syscall"
            : "+r" (rax)
            : "r" (rdi), "r" (rsi), "r" (rdx), "r" (r10), "r" (r8), "r" (r9)
            : "rcx", "r11", "memory");
        printf("After the syscall instruction!\n");
    }
}

  • My CAmkES code: it connects two threads of the component CompA. The faultIn thread corresponds to the from template, the faultOut thread to the to template.

connector camkesFaultHandlerThread {
from Procedure;
to Procedure;
}

procedure CAmkES_FaultHandlerThread {
void bla();
}

component CompA {
control;
uses CAmkES_FaultHandlerThread faultIn;
provides CAmkES_FaultHandlerThread faultOut;
}

assembly {
composition {
component CompA compA;
connection camkesFaultHandlerThread fault0 (from compA.faultIn, to compA.faultOut);
}
}

  • The from Template: Waits on the fault endpoint object of TCBs in the same
    component

/*- set thread_caps = [] -*/
/*- set fault_ep = alloc("fault", seL4_EndpointObject, read=True, write=True, grantreply=True) -*/

/*- for cap in cap_space.cnode: -*/
    /*- if isinstance(cap_space.cnode[cap].referent, capdl.TCB): -*/
        /*- set cap_name = cap_space.cnode[cap].referent.name -*/
        /*- do thread_caps.append((cap, cap_name)) -*/
    /*- endif -*/
/*- endfor -*/

/*- for cap, cap_name in thread_caps: -*/
    /*- do cap_space.cnode[fault_ep].set_badge(cap) -*/
    /*- do cap_space.cnode[cap].referent.set_fault_ep_slot(fault_ep) -*/
/*- endfor -*/

  • The to Template: gets triggered with every fault, should handle the system call and then redirect program flow

/*- set fault_ep = alloc("fault", seL4_EndpointObject, read=True, write=True, grantreply=True) -*/
/*- set info = c_symbol('info') -*/

int /*? me.interface.name ?*/__run(void) {
    seL4_Word fault_type;
    seL4_Word length;
    seL4_Word delegate_tcb;
    seL4_UserContext regs;
    seL4_MessageInfo_t info;
    while (1) {
        // Block until a thread faults; the badge identifies the faulting TCB
        info = seL4_Recv(/*? fault_ep ?*/, &delegate_tcb);
        seL4_Fault_t fault = seL4_getFault(info);
        fault_type = seL4_MessageInfo_get_label(info);

        if (fault_type == seL4_Fault_UnknownSyscall) {
            printf("faulting PC: %zx\n", seL4_Fault_UnknownSyscall_get_FaultIP(fault));

            //handle system call
            ...

            // set the corresponding registers of the faulting control thread, i.e. the one that caused the UnknownSyscall Exception
            seL4_Fault_UnknownSyscall_set_RAX(fault, 0);
            seL4_Fault_UnknownSyscall_set_RCX(fault, seL4_Fault_UnknownSyscall_get_FaultIP(fault) + 2);
            seL4_Fault_UnknownSyscall_set_R11(fault, seL4_Fault_UnknownSyscall_get_FLAGS(fault));

            length = seL4_MessageInfo_get_length(info);
            seL4_TCB_ReadRegisters(delegate_tcb, false, 0,
                                   sizeof(seL4_UserContext) / sizeof(seL4_Word),
                                   &regs);

            // Which registers should I set for the faultOut thread?, right now simply jump over the syscall instruction
            regs.rip += 2;

            // Write registers back
            seL4_TCB_WriteRegisters(delegate_tcb, false, 0,
                                    sizeof(seL4_UserContext) / sizeof(seL4_Word),
                                    &regs);

            // Resume the caller
            seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, length);
            seL4_Reply(reply);
        } else if (fault_type == seL4_Fault_VMFault) {
            // Why do we get in here?
        }
    }
}

  • The threads in the system before triggering the fault:

Name                 State             IP        Prio  Core
compA:faultOut       running           0x401626  254   0
compA:faultIn        running           0x401626  254   0
compA:fault_handler  blocked on recv   0x401626  255   0
compA:control        running           0x4011f7  254   0
idle_thread          idle              0         0     0
rootserver           inactive          0x4014bb  255   0

  • The threads in the system after triggering the fault:

Name                 State             IP        Prio  Core
compA:faultOut       running           0x406d69  254   0
compA:faultIn        blocked on recv   0x401626  254   0
compA:fault_handler  blocked on recv   0x401626  255   0
compA:control        blocked on reply  0x40146d  254   0
idle_thread          idle              0         0     0
rootserver           inactive          0x4014bb  255   0

So, I have a faulting syscall instruction in the control thread, i.e. the one in the run method. This triggers the UnknownSyscall exception; somehow the faultIn thread then blocks, and the faultOut thread receives on the fault endpoint. I can catch the fault and handle it in the faultOut thread. After handling it, I want to return to the control thread right after the faulting syscall instruction, i.e. at 0x40146d + 2. The registers in the regs struct hold the values of the faultIn thread, whereas the values accessible through seL4_Fault_UnknownSyscall_set/get_X are those of the control thread, i.e. of the actual faulting syscall instruction.

My problem is that I can’t properly redirect the control flow without triggering the same exception or a different one.

Has anyone ever tried something similar? Help would be very much appreciated.

Thank you very much!

Kind Regards,
Lukas

A reply to an seL4_Fault_UnknownSyscall fault can update the registers of the faulting thread according to the seL4_UnknownSyscall_Msg enum layout. Updating the PC in this reply message updates the PC of the faulting thread. If the reply message length is 0, the thread is restarted from the start of the faulting instruction. If the PC is updated, the thread restarts at the new address.

You might find that just writing back the received message with updated argument registers but an unchanged PC causes the kernel to resume the thread at the next PC value, which would save you from having to decode the instruction length, but I’m not 100% sure that it works this way.
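To make the reply-based approach concrete, here is a minimal sketch that models it in plain C, without the seL4 headers. The enum `mock_msg` and the function `build_reply` are hypothetical names used only for illustration; the real message indices come from the seL4_UnknownSyscall_Msg enum in libsel4 and have more entries. It also assumes, as the original post does, that the fault IP points at the 2-byte syscall instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified message layout for illustration only.
 * The real indices are defined by seL4_UnknownSyscall_Msg in libsel4. */
enum mock_msg {
    MOCK_MSG_RAX,
    MOCK_MSG_FAULT_IP,
    MOCK_MSG_COUNT
};

/* The x86_64 `syscall` instruction is encoded as the two bytes 0F 05. */
#define SYSCALL_INSN_LEN 2

/* Copy the received fault message into the reply, store the emulated
 * return value, and advance the PC past the syscall instruction.
 * Returns the reply length in message words. */
size_t build_reply(const uint64_t *fault_msg, uint64_t *reply, uint64_t retval)
{
    assert(fault_msg != NULL && reply != NULL);

    for (size_t i = 0; i < MOCK_MSG_COUNT; i++) {
        reply[i] = fault_msg[i];
    }
    reply[MOCK_MSG_RAX] = retval;
    reply[MOCK_MSG_FAULT_IP] = fault_msg[MOCK_MSG_FAULT_IP] + SYSCALL_INSN_LEN;

    return MOCK_MSG_COUNT;
}
```

In a real handler the reply words would go into the message registers and the returned length into the message info passed to seL4_Reply, so that a single reply both sets the return value and moves the PC, without a separate seL4_TCB_WriteRegisters call.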

Hi Lukas,

It’s not clear to me why you’re setting registers on two different threads: 1) The thread that faulted that you’re changing the state on with the reply to resume the caller, and 2) the delegate_tcb thread by calling seL4_TCB_WriteRegisters? Or are these the same thread?

Also, note that the syscall number goes in rdx on x86_64 when seL4 is configured to use the syscall instruction.

This may present a problem depending on your use case. For example, even though the syscall numbers used by Linux and seL4 are disjoint, on this platform Linux puts the system call number in rax while seL4 puts it in rdx.

To expand on this:

On x86_64, the CPU register that seL4 uses for the system call number (rdx) carries the third argument of a Linux system call. Therefore, if a Linux system call happens to pass, as its third argument, one of the seL4 system call number constants that do not raise an exception, the system call will not be emulated.
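The collision can be sketched as a simple predicate over the value in rdx. This is only a model: `HYPOTHETICAL_SEL4_SYSCALL_MIN` is a placeholder, since the real set of valid seL4 syscall numbers depends on the kernel configuration (see the generated sel4/syscall.h), and `reaches_fault_handler` is a name made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder lower bound, NOT a real seL4 constant. The actual range of
 * seL4 syscall numbers is configuration-dependent. */
#define HYPOTHETICAL_SEL4_SYSCALL_MIN (-16)

static_assert(HYPOTHETICAL_SEL4_SYSCALL_MIN < 0,
              "seL4 syscall numbers are negative");

/* Returns true if a `syscall` executed with this value in rdx would be
 * reported as an UnknownSyscall fault, under the assumption that seL4
 * accepts exactly the numbers in [HYPOTHETICAL_SEL4_SYSCALL_MIN, -1]. */
bool reaches_fault_handler(int64_t rdx)
{
    bool is_sel4_syscall = (rdx >= HYPOTHETICAL_SEL4_SYSCALL_MIN && rdx <= -1);
    return !is_sel4_syscall;
}
```

Under this model, a Linux call whose third argument is 3 is trapped and can be emulated, while one whose third argument is -1 would be handled by the kernel as a native seL4 syscall and never reach the handler.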

In addition, you might stash the rsp register in your syscall test program to prevent it from being clobbered, and then restore it after the syscall. I noticed that the seL4 syscall stubs do this:

Thank you for your fast response.

> It’s not clear to me why you’re setting registers on two different threads: 1) The thread that faulted that you’re changing the state on with the reply to resume the caller, and 2) the delegate_tcb thread by calling seL4_TCB_WriteRegisters? Or are these the same thread?

Yes, you are right. I was working on two different threads. The code in the to template (faultOut thread) should only be triggered for the control thread. I had to change the code in the from template as follows:

/*- for cap, cap_name in thread_caps: -*/
    /*- if "control" in cap_name: -*/
        /*- do cap_space.cnode[fault_ep].set_badge(cap) -*/
        /*- do cap_space.cnode[cap].referent.set_fault_ep_slot(fault_ep) -*/
    /*- endif -*/
/*- endfor -*/

This way, the faultIn thread only receives on the fault endpoint of the control thread. As a result, both the registers read into the regs struct and the message registers (MRs) belong to the control thread.

> Also, note that the syscall number goes in rdx on x86_64 when seL4 is configured to use the syscall instruction.

That is an interesting observation and would explain why seL4_Fault_UnknownSyscall_get_Syscall(fault) returns the value passed in RDX.

For some reason the UnknownSyscall Exception is triggered even if I use the seL4 syscall numbers in RDX that are shown in the generated build_folder/libsel4/include/sel4/syscall.h.

register int rax asm("rax") = 60;
register int rdi asm("rdi") = 1;
register int rsi asm("rsi") = 2;
register int rdx asm("rdx") = -1;
register int r10 asm("r10") = 4;
register int r8 asm("r8") = 5;
register int r9 asm("r9") = 6;
asm volatile (
    "movq %%rsp, %%rbx \n"
    "syscall \n"
    "movq %%rbx, %%rsp \n"
    : "+r" (rax)
    : "r" (rdi), "r" (rsi), "r" (rdx), "r" (r10), "r" (r8), "r" (r9)
    : "%rcx", "%rbx", "r11", "memory");

I suspect sizeof(int) is likely 4, so it’s probably only setting the lower half of the rdx register. You can verify by checking the assembly with objdump -d $PATH_TO_EXECUTABLE.

You might instead use type long or int64_t.
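The difference can be modelled in portable C. This sketch assumes the compiler loads the int with a 32-bit move, which on x86_64 typically zeroes the upper half of the 64-bit register instead of sign-extending it; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int64_t) == 8, "int64_t fills a 64-bit register");

/* Models a 32-bit register write on x86_64 (e.g. to edx): the upper
 * 32 bits of the full register end up zero, so -1 is not sign-extended. */
uint64_t reg_after_int_write(int value)
{
    return (uint64_t)(uint32_t)value;
}

/* Models a full 64-bit register write (e.g. to rdx), as with an
 * operand of type long or int64_t. */
uint64_t reg_after_long_write(int64_t value)
{
    return (uint64_t)value;
}
```

With these definitions, reg_after_int_write(-1) yields 0x00000000FFFFFFFF, a large positive number that the kernel would not recognise as syscall -1, while reg_after_long_write(-1) yields 0xFFFFFFFFFFFFFFFF. Declaring the register variables as long or int64_t gives the second behaviour.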