Procedural generation of the seL4 API

The sel4 library, which abstracts over the seL4 kernel API currently exists across multiple sources including a set of generated bitfield definitions, a generated syscall sum type, a set of invocations generated from type signatures, and multiple code and header files written in C. This means that generating the C implementation requires the use of a number of different tools and implementing the API in any other language requires some combination of even more tools and additional manual work to keep up with changes in non-generated C code.

There has been much internal discussion of how to improve this situation, particularly to allow for API implementations in languages other than C (such as CakeML and Rust) to be generated entirely automatically from the same source as the C implementation (preferably in the same manner) as well as to couple the API documentation from the manual more tightly with the API definition.

To this end, we will be working on unifying the definition and documentation into a single source format and providing the tools necessary to generate the API in just about any language. Initially, this will use a collection of XML files for the source of the definition and a command line tool that applies the parsed structure to a set of user-provided templates to generate the entire implementation in one file. Multiple passes can be used to generate implementations depending on multiple languages or that need to split definitions and compilation units (such as C).

We will probably port the XML source to a more refined IDL at a later date.

Components of the seL4 API

The seL4 library does not currently use any sort of namespacing, however it could almost certainly benefit from it in terms of structuring the generated libraries as well as structuring generated documentation.

The current C implementation of libsel4 contains the following kinds of items:

  • Constant values (as macros)
  • Associated compile-time configuration (as macros)
  • Internal implementation macros
    • Attribute macros
    • Internal debugging macros
  • Base integer types
  • Newtypes
    • Identifier types
    • Capability types
      • Associated invocations
  • Structures
  • Bitfield types and associated functions
    • Tagged-union bitfields
  • Tagged unions
  • System calls
  • System-call wrappers (as in intermediary for the implementation of invocations)
  • Invocations
  • Structs
  • Global and thread-local variables (with accessors)
  • Functions that access particular sections of the IPC buffer to retrieve additional return values from received IPC.

These should probably be unified and revised to some degree in a manner that would be independent of representation on the kernel.

Structure of the seL4 API

Despite the lack of namespacing the the C implementation of the seL4 API, the seL4 Reference Manual informs the following overall structure to the API:

  • Capability Spaces
    • CNode
    • Capability rights
  • Memory objects
    • Untyped
    • Device untyped (vs data untyped)
    • Retyping
  • Inter-process communication
    • Endpoint message passing (synchronous messages)
    • Notification (asynchronous semaphores)
  • Threads of execution
    • TCB (thread control block)
    • Domains
      • DomainSet
    • Faults
    • Virtualisation
      • VCPU (ARM and X86 only)
  • Virtual memory
    • Common architecture-dependent objects
      • Page
      • ASIDControl
      • ASIDPool
    • MMU (ARM and X86 only)
      • IOPageTable
      • IOSpace (Page Directory)
    • X86
      • EPTPD
      • EPTPDPT
      • EPTPT
      • PageDirectory
      • PageTable
      • PDPT (x86_64 only)
    • ARM
      • ASIDControl
      • ASIDPool
  • Hardware I/O
    • Interrupts
      • IRQControl
      • IRQHandler
    • x86 I/O
      • IOPort
      • IOPortControl
  • System bootstrapping
  • API Errors
  • Debugging
    • System calls
  • Debug API
  • Benchmarking
    • System calls

One goal of this project is to rearrange the API into the above structure using namespacing and to use that to structure generated API documentation. This would also allow much of the seL4 Reference Manualto be moved into the documentation as is and for the API documentation to provide context to methods and types by placing them in a well documented namespace.


To ensure compatibility with languages such as C, where items cannot be referenced before they have been declared, the IDL should enforce similar constraints on declaration.

Some enumeration values may need to be stable with differing configurations?

Some items only exist on certain platforms or when certain configuration options are enabled. Some items have different definitions or values on different platforms or when different combinations of configuration options are enabled.

System call semantics

Each system call has slightly different semantics for the 5-7 registers used during system calls as follows:

  1. target capability being invoked / badge of received IPC
  2. seL4_MessageInfo describing the IPC
  3. Message register 0 in+out
  4. Message register 1 in+out
  5. Message register 2 in+out (not present in ia32)
  6. Message register 3 in+out (not present in ia32)
  7. System call identifier

As can be seen, a platform actually only needs a minimum of 3 general purpose registers to perform a system call, although this then requires the IPC buffer to be used to pass any arguments and thus also requires the calling code to look up the address of its IPC buffer.

All invocations are mapped over these semantics. Where there is no associated capability, the 0th register does not need to be set. When the associated system call does not expect arguments or capability transfer, the 1st register doesn’t need to be set. Of any message registers sent or received, the first words should be written to and read from the associated physical registers with further words being written to the IPC buffer.

Invocations use a packing strategy with their arguments where possible to ensure as few words (and memory accesses) as possible are used during the handling of the system call.

Issues with the current model

Some interfaces assume the entire response of a syscall is written into the IPC buffer and stays there (which is unsafe and and unhygienic) and have additional functions to retrieve additional values that are returned. Languages which support multiple returns should return all values and languages which support returning structures should return all values packed into a single structure.

This would prevent situations where subsequent system calls clobber the contents of the thread-local IPC buffer and would ensure that the IPC buffer is only used from the userland side when it is actually needed (like it is in the kernel) rather than introducing unnecessary copying of data back and forth.

Current alternate language implementations


There exist handwritten wrappers of the C API that unmarshal the bytearray from the CakeML FFI on the C side in order to call functions from libsel4. In addition, handwritten CakeML wrappers wrap the FFI and present to present the API functions to CakeML.


I’m not sure what the Robogalia project currently does to maintain its API implementation but it would be helpful to understand their current setup and use case better.

I would really appreciate any feedback on the above plan, particularly:

  • Information on how interfaces for other languages are maintained,
  • possible options for an IDL that would could get or build a parser for that can represent all of the items,
  • ways in which the items that appear in the API could be consolidated without losing functionality or requiring the kernel to change its representation of them,
  • ways in which the proposed structure of the API could be improved, or
  • ways in which content from the manual can be improved before being moved into the API documentation.
1 Like

Gave a presentation internally to update the team on the progress of this.

Slide content is below.

The current seL4 API

The seL4 API

The seL4 API defines the interface between user-space and the kernel.

  • Invocations (specific instances of system calls)
  • Kernel object addresses
  • Common data types (e.g. seL4_Word, seL4_MessageInfo, &
  • Shared memory and data structures (e.g. seL4_BootInfo &
  • Global state management (__sel4_ipc_buffer & bootinfo)

What gets generated?


  • The Syscall variants (with implicit numbering)
    • All 8 system calls.
    • All of the other system calls too.
  • The invocations
    • Invocations for each of the kernel objects
    • (No generic system calls invocations)

The bit-field generator

  • Fault tagged union
    • Tag
    • Variants
  • MessageInfo
  • CapRights
  • CapData

What is implemented by hand?

  • Benchmarking interface types
  • Global state accessors (for bootinfo and __sel4_ipc_buffer)
  • BootInfo type and internal types
  • IPCBuffer type
  • Debug API types
  • IPC message & capability passing limits
  • Error constants
  • Aliases for deprecated functions
  • Fault tagged union deserialising
  • C implementation attribute constants
  • C implementation utilities (Bool, Null, etc.)
  • Direct system call invocations (rather than the kernel object invocations)

The manual

The manual provides some overview of the kernel design.

It supports understanding the API by documenting the kernel model that is acted upon by the API.

Does it support the API sufficiently?

  • A bit too general in some cases, too detailed in some others.
  • Several key details missing.
  • Only generated documentation exists for object invocations and general system call invocations.
  • Manually maintained information for all other kernel types.
  • Omits most of the types and functions available in the API.
  • Maintained separately to the API definition and the kernel implementation.
  • Many sections confuse most readers.
  • FAQ suggests deficiency in the documentation.

The seL4 Interface Description Language

A single representation

The IDL seeks to represent the API in a single specification source.

The same source is used to generate the types, invocations, constants, variables, and system calls across all implementations.

The IDL also couples the documentation directly with the specification. The same definition used to generate all of the language generates all of the documentation for the interfaces.

All of the API items are kept up to date in the documentation. Type signatures and names are always accurate, even if the descriptive text is not.

Using the IDL

The IDL can be shipped as a Rust library or a standalone binary tool.

The library is responsible for:

  • parsing the IDL source (currently very basic XML),
  • constructing a validated in-memory abstract syntax tree, and
  • calculating the layout for all of the types to ensure consistent representation and serialization of types.

Using the Tool

The IDL binary will use this tool to generate a single C source implementation and a set of C source and header files, CakeML implementation, and to dump the validated AST and layout information into JSON for other tools to use.

The Rust library or JSON output may be used for other implementations.

The input language

XML definition

Very simple definition for XML source.

<!ELEMENT invocation
  (condition?, description?, argument*, result?)
<!ATTLIST invocation name    CDATA        #REQUIRED>
<!ATTLIST invocation syscall CDATA        #REQUIRED>
<!ATTLIST invocation message (true|false) "false">

Parse in Rust

Rust’s serde library in makes it trivial to parse the XML by simply defining the structures.

#[derive(Debug, Deserialize, Serialize)]
pub struct Invocation {
    name: Identifier,
    condition: Option<String>,
    description: Option<String>,
    syscall: Path,
    message: bool,
    arguments: Vec<Argument>,
    result: Option<TypeReference>,

XML source

    Identify the type of a capability in the current cspace.
  <argument name="cap" type="seL4::CPtr">
      A capability slot in the current cspace
  <result type="seL4::arch::ObjectType" />

Structured syntax

Considering more ergonomic syntax to use in the future.

/// Identify the type of a capability in the current cspace.
invocation DebugCapIdentify:
  /// A capability slot in the current cspace
  argument cap: seL4::CPtr;
  result seL4::arch::ObjectType;

Implementation details

Building and validating the AST

Rust’s type system helps enforce the invariant that an invalid Ast will never be constructed.

Rust’s lifetime tracking enforces a strict ordering to items being added and evaluated in the AST.


The source structure is walked and a builder is used across the abstract syntax tree, ensuring unique naming and correct referencing whenever an item is added to the tree.

The Builder wraps each node of the tree as it is constructed, checking for name uniqueness before adding any new items.


The Builder is also responsible for invoking the constructor of each item, ensuring it is only ever constructed with a unique name and that it is only ever constructed inside the tree for which it was validated.

The constructor for each item ensures that any references made to types, system calls, or enum variants already exist in the tree and are of the correct kind before actually constructing the item.

Accessing the AST

Accessing the AST works in a similar manner to building it.

A Cursor wraps access to each element of the AST and holds a reference to the root, allowing it to easily and reliably dereference paths as necessary.

As all of the paths have been checked to refer to existing items with the correct kind, the cursor can provide these lookups without failure.

Item layout

Cursors are also able to walk the tree to determine the memory layout of values and the packing requirements for invocations.

This ensures that the binary representation of items packed into the message buffer is consistent across all implementations.

Generating implementations

The only thing the generator has to do is generate compatible definitions in the source language for all of the necessary types, constants, globals, and invocations.

Generating invocations requires packing the argument values correctly into the system and message registers according to the layout rules from the library.

Generating API documentation

Generating documentation is much the same a generating a source code implementation.

All documentation appears as items in the AST or attributes on other items.

The handwritten documentation is formatted as reStructuredText and the documentation generates signatures for all of the items in-line.

Sphinx is then used to generate PDF and HTML API documentation.

Special cases


All invocations depend directly on the MessageInfo and IPCBuffer types described in the API itself.

Each thread has a dedicated IPCBuffer which the invocations require access to. This is currently provided by a GetIPCBuffer function with a number of dedicated functions for optimised field access across platforms.

This will likely become a pointer in the TLS, but all of the same accessor functions will need to be generated for compatibility.

Capability subtyping

Every system call invocation can be invoked upon any capability and many invocations accept arbitrary capabilities in their arguments.

Object invocations can only be invoked for capabilities of specific objects.

It would be beneficial to represent capability pointers to particular kinds of objects to subtype a generic capability pointer type.

Enum subtyping

It is common for a C enum to include the entire definition of an existing enum within itself. This would basically imply that the more general enum is a subtype of the more specific one.

For example, a seL4::ObjectType may subtype a more specific seL4::arch::ObjectType.

Future possibilities

  • Generating IPC interfaces.
  • Generating proof code for data structure representation and aliasing.
  • Dedicated structured syntax; easier to read and maintain.
  • Generate deserialisation code for the kernel.


Generating an interface for new languages just requires a new code generator.

More comprehensive and accurate documentation for the API.

Reliable and extensible tooling for maintaining the implementation of seL4 APIs.

There was a lot of good feedback from this session. Please comment if I have missed any of the discussion or if there are any further questions.

I have paraphrased the key details of the discussion below.

The current manual contains 3 kinds of documentation:

  • API interface documentation,
  • A description of the kernel model, and
  • Documentation regarding the kernel’s implementation of that model.

Should these forms of documentation be split into 3 different documents?


I think that the description of the kernel model is needed as part of the description of the API to provide context and help users of the API build an intuition of what the API represents.

I don’t think that the API documentation should detail how the kernel implements the model. That should probably exist in an external document.

The current API is not currently verified but some previous ideas have been discussed regarding ways we could do this. Most of them involved generating C code for the API from the Isabelle proofs for the model.

With this IDL generating the C source code, would it be possible to generate proofs for a verified implementation?


My hope is that, in a similar manner to the existing bit-field generator, this would be able to generate the proofs for the generated data structures. Ideally, generating these proofs from the AST could be done in the same way documentation and code implementation is.

The proofs would likely apply specifically to the C implementation, so the proof generation would need to be closely aligned with the C code generation.

I will need to discuss more with the proof team to determine the best way to go about making this possible.

For generating structures and bit-fields, does the IDL call out to the existing bit-field generator?


The intent is for the IDL tooling to completely replace the existing bit-field generator. The existing bit-field generator has some maintainability issues so a replacement would mean fewer tools to maintain as well as reducing any issues with that generator.

The existing bit-field generator also generates proofs for the data structures that are used in the kernel so that IDL should be able to generate the same proofs as the bit-field does. Further discussion with the proof team is necessary to ensure that the IDL is built in such a way that makes this not only possible but a minimal maintenance overhead.

Have you considered using other documentation tools such as Doxygen?


We have considered a number of options for the documentation tooling including Doxygen.

When we were considering documentation tooling the main requirements we had were that will could generate good HTML documentation in addition to standalone PDF documentation appropriate for offline use.

Additionally, we wanted documentation to be written in a simple text format that was expressive in addition to being being human-readable. Markdown was requested by a number of people for this reason however providing additional semantic structure such as internal and external cross-references would be difficult.

Using reStructuredText allows us to use a source format that is human-readable and easily maintained as well as embedded in generated source. Sphinx allows us to generate documentation source in the same manner we would generate source code for a programming language and will generate all of the formats we want.

Have you looked at the documentation requirements of functional verification and validation standards such as ISO9001 and ISO26262 to see if this documentation will comply with them?


We were not aware of such standards. We will need to look at what the requirements of these standards to determine if it is necessary or desirable for us to ensure that we them.

Have you considered using the IDL to specify tests that could be generated for different languages?


This was not something that had been considered. This will likely be considerably difficult to do and will definitely increase the scope of the project by an order of magnitude or more.

This is an idea we may want to revisit in the future.

Members of the Robigalia are very interested in the outcomes of this project.


We should definitely discuss how the outcomes here will interact with their work.

I would be very interesting in getting feedback from them regarding the overall goals and the more detailed design I am putting together.

This project has been somewhat put on hold. I would like to see this project continue but it will require a lot of thought regarding how to best structure a solution.

Hi @curtis, Robigalia contributor here.

IIUC, I think we agree that we would like libsel4 to be completely specified by an IDL layer such that we could generate from it a complete, ideally idiomatic rust sel4 library. Furthermore, the IDL should be expressive enough so that we would not need to embed a bunch of assumptions about sel4 into the IDL processing code that would need to be updated for each release like exists currently.

It sounds like you have a more detailed plan or possibly an implementation you’ve been working on – is that something you could share? I would love to work with you on this.

Trying to maintain a Rust interface on top of the sel4 kernel has been a major pain point for us and we would very much like to help improve this situation :smiley:.

Our existing (outdated) work against early versions of the sel4 kernel was a very manual, tedious, and error-prone process. We broke our libsel4 equivalent into two pieces:

  1. sel4-sys, a direct translation of libsel4 into rust:

    • Maintaining our own versions of your python scripts and build system that generate Rust code instead of C

    • Manually translating the non-generated C and assembly code in libsel4 into rust

  2. rust-sel4, an idiomatic rust sel4 interface built on top of sel4-sys

    • Manually maintained

On every sel4 release we had to go through each of your commits that affected either the libsel4 C code or the python scripts and duplicate your work, and then update our idiomatic code to match.

We stopped maintaining this solution around the time you switched to the cmake build system to rethink our approach.

While not directly related to the IDL, it is concerning how fine-grained and cpu-specific the kernel build configuration is. Many build configuration options seem like they could be detected and configured at run-time by the kernel, at least on x86/x64. With the current level of detail in the build configuration, if we wanted to ship a product that could run on a wide variety of x86/x64 cpus we would need to build images for a large combination of possible cpu families and other config options.

Furthermore, it is frustrating that the kernel ABI is dependent on the build configuration. This makes the fine-grained build configuration issue worse because we must also recompile our libraries and downstream dependencies for each kernel configuration. It’s not clear to me if it’s even possible to express this constraint correctly in the Rust build system.


HI @lazycandle55 thanks for reaching out.

I think that if we do end up building a well-designed tool that can generate everything necessary to build the sel4 user-level library then we would have something very close to being able to define generic RPC between threads (the sel4 user library is essentially the same as RPC via an endpoint but the recipient is the kernel rather than another thread).

The current way the user-level library is built and configured is not the most appropriate for dynamic systems or heterogeneous hardware, but I think that can be improved in a tractable manner. I also think that this problem should be resolved before attempting to design an all encompassing IDL.

I hope the following helps lay out some idea for a roadmap we can follow to get to a place we’re all happier with.

Removing configuration from libsel4

(This section is probably a reasonable candidate for an RFC).

There should probably be a set of fixed application binary interfaces for seL4 (one for each seL4 architecture) that are forwards and backwards compatible to some point in time.

libsel4 is presently a collection of type aliases, constants (in the form of preprocessor definitions and enumerations), structures, and functions that wrap system call operations. Some of these implementations are generated from python scripts (, and There is also some effort to appropriately namespace these constants insomuch as is achievable in C.

The ABIs should be made by fixing all of the values and structures that are (or can be made to be) constant across configurations in the ABI and providing any values at boot time via the BootInfo mechanism.


Any enumerations in libsel4 should have all variants given fixed values and should define all possible values for all configurations, even when those values are not accepted by the kernel as valid.

Affected enumerations would include:

Constants & Variables

A number of values that do not change across configuration are defined as C enumerations or C preprocessor definitions. These should be fixed within a given ABI.

A number of values currently defined in libsel4 are dependent on the actual configuration of the kernel that is executing. These include include IRQ numbers, the size of kernel objects, and the size, number, and capacity of address space objects.

Values that are dependent on kernel configuration may continue to be provided via the libsel4 headers but should also be supplied by the kernel using the boot info structures where user-level components can disseminate the information rather than rely on a static set of values for a particular configuration.

Type aliases

There are numerous type aliases in libsel4 but all are constant within an ABI and should all be defined regardless of the kernel configuration.

System calls

Each architecture currently has a number of handwritten implementations for each system call operation which essentially encode the system call interface. These encode the instruction sequence used to invoke a system call as well as the meaning for each register and use of the IPC buffer.

The system call binary interface and type signatures of the wrapper functions do not generally change with regard to configuration but there are some changes for MCS (particularly on ia32). We should simply switch wholesale over to the MCS ABI.


Invocations are generated from the which generates invocation labels and functions to provide implementation of each invocation. The implementations marshal and unmarshal data from the message registers and IPC buffer and call one of the system call wrapper functions.

The inclusion of these definitions is currently dependent on particular configurations (as are the invocation labels that identify which invocation has been used). These should be fixed for an ABI and all invocations should be defined regardless of kernel configuration.

Structures and tagged unions

Structures that aren’t used by the kernel or that are only used by the kernel during boot are currently defined as C structures. Those that are used by the kernel after boot are generally defined as bit-fields and generated with the script. This script also generates proof code used by the C refinement proofs.

There are three structures that depend only on the size of a native machine word (seL4_MessageInfo, seL4_CapRights, and seL4_CNode_CapData) and one architecture-dependent tagged union (seL4_Fault).

The layout and in-memory representation for these needs to be made constant and explicitly defined. Any structures or tagged unions loaded into the IPC buffer need to have that representation also made constant and explicitly defined.

  • All structures and tagged unions generated by the bit-field generator should be pretty well defined already.
  • Most structures used only at user level should not care about binary representation, they are only used to marshal/unmarshal data.
  • Boot info structures are currently defined using C structures but are used by both the kernel and user-level so a more explicit definition of layout may be preferred (they could be made into a bit-fields).

Building an IDL

Building a tool to generate user-level implementations of the seL4 API should become much easier with a fixed set of ABIs that don’t depend on kernel configuration. Even maintaining a in independent hand-crafted implementation would become feasible.

The design of any IDL that is able to generate all components of libsel4 from a single specification would also likely replace the the bit-field generator outright if it can be made to produce the same proof code for Isabelle, although this would not be necessary for it to be useful in its own right (it merely need to be compatible with the existing scripts). Having a written definition of the bitfield generator would probably help here.

With the above the IDL generator should not need to consider configuration beyond a certain target ABI and should only need to deal with generating constants, type aliases (and newtypes), namespacing, enumerations, structures and tagged-unions with bit-fields, and functions that marshal and unmarshal arguments and return values around calls to system call wrappers (although the implementations could also generate inline assembly for a system call).

1 Like

That sounds like it would be very useful.

Looks good to me :smiley:

Is that something I can contribute, or are you working on that?

Would it be appropriate to specify the ABI in the RFC to get consensus on it as part of the RFC process?

So once the ABI is approved and implemented, the kernel would guarantee that anything compiled for sel4 from that point forward will continue to work on all future sel4 kernels, forever?

Say for example we had set an ABI and flipped the switch before MCS was merged. The MCS changes would then all have to be implemented as new syscalls, invocations, etc, keeping all the existing ones and making sure they still work?

It seems like it might be confusing to provide both, although I see the benefit to keeping these constants for people who intend to keep their solution tightly coupled to their kernel configuration.

To make sure I understand this section, you are saying the ABI must specify every structure or tagged union that passes between the kernel and userspace by any means (IPC buffer, bootinfo, etc), but should NOT specify any structures or tagged unions that only exist in libsel4 as a convienence to the user that data from the kernel gets copied into after it is communicated?

Yes, I’d love to see this happen!

This may end up being how the ABI is initially documented although another alternative is for the RFC to simply describe the requirements for the ABI, the process for defining and changing the ABI, and how we distribute a definition of the ABI (human-readable document, machine-readable spec).

It would be unlikely that we could guarantee and form of binary compatibility across major kernel versions. There may be ranges of kernel versions where the API for kernel object invocations change but the API for IPC via Notifications, Endpoints, and Reply objects remains stable. This would allow applications that do not operate on other kernel objects directly to be binary compatible across these ranges of versions. This would almost all user-level components that aren’t part of the operating system.

The strongest guarantee we could probably ensure is that certain identifiers would never be re-used which would allow a user-level component to virtualize seL4 in a manner that does provide a backwards-compatible interface by emulating the removed features.

Correct. As the values are not referenced in shared memory by both the kernel and user-level there is no need for the in-memory representation to be explicitly defined.

The IPC buffer, values marshaled into a combination of general-purpose registers and the IPC buffer, and structures such as BootInfo are communicated between the kernel and user via shared memory and registers. The binary representation of these structures needs to be agreed between the user level and the kernel.

We should probably turn this into a more formal RFC discussion in its own thread. There we can discuss exactly what problem we are trying to solve, what the options are, and how the resolution should be structured.

Continued in this thread.

1 Like