The sel4
library, which abstracts over the seL4 kernel API currently exists across multiple sources including a set of generated bitfield definitions, a generated syscall sum type, a set of invocations generated from type signatures, and multiple code and header files written in C. This means that generating the C implementation requires the use of a number of different tools and implementing the API in any other language requires some combination of even more tools and additional manual work to keep up with changes in non-generated C code.
There has been much internal discussion of how to improve this situation, particularly to allow for API implementations in languages other than C (such as CakeML and Rust) to be generated entirely automatically from the same source as the C implementation (preferably in the same manner) as well as to couple the API documentation from the manual more tightly with the API definition.
To this end, we will be working on unifying the definition and documentation into a single source format and providing the tools necessary to generate the API in just about any language. Initially, this will use a collection of XML files for the source of the definition and a command line tool that applies the parsed structure to a set of user-provided templates to generate the entire implementation in one file. Multiple passes can be used to generate implementations depending on multiple languages or that need to split definitions and compilation units (such as C).
We will probably port the XML source to a more refined IDL at a later date.
Components of the seL4 API
The seL4 library does not currently use any sort of namespacing, however it could almost certainly benefit from it in terms of structuring the generated libraries as well as structuring generated documentation.
The current C implementation of libsel4
contains the following kinds of items:
- Constant values (as macros)
- Associated compile-time configuration (as macros)
- Internal implementation macros
- Attribute macros
- Internal debugging macros
- Base integer types
- Newtypes
- Identifier types
- Capability types
- Associated invocations
- Structures
- Bitfield types and associated functions
- Tagged-union bitfields
- Tagged unions
- System calls
- System-call wrappers (as in intermediary for the implementation of invocations)
- Invocations
- Structs
- Global and thread-local variables (with accessors)
- Functions that access particular sections of the IPC buffer to retrieve additional return values from received IPC.
These should probably be unified and revised to some degree in a manner that would be independent of representation on the kernel.
Structure of the seL4 API
Despite the lack of namespacing the the C implementation of the seL4 API, the seL4 Reference Manual informs the following overall structure to the API:
- Capability Spaces
CNode
- Capability rights
- Memory objects
Untyped
- Device untyped (vs data untyped)
- Retyping
- Inter-process communication
-
Endpoint
message passing (synchronous messages) -
Notification
(asynchronous semaphores)
-
- Threads of execution
-
TCB
(thread control block) - Domains
DomainSet
- Faults
- Virtualisation
-
VCPU
(ARM and X86 only)
-
-
- Virtual memory
- Common architecture-dependent objects
Page
ASIDControl
ASIDPool
- MMU (ARM and X86 only)
IOPageTable
-
IOSpace
(Page Directory)
-
X86
EPTPD
EPTPDPT
EPTPT
PageDirectory
PageTable
-
PDPT
(x86_64 only)
-
ARM
ASIDControl
ASIDPool
- Common architecture-dependent objects
- Hardware I/O
- Interrupts
IRQControl
IRQHandler
- x86 I/O
IOPort
IOPortControl
- Interrupts
- System bootstrapping
- API Errors
- Debugging
- System calls
- Debug API
- Benchmarking
- System calls
One goal of this project is to rearrange the API into the above structure using namespacing and to use that to structure generated API documentation. This would also allow much of the seL4 Reference Manual
to be moved into the documentation as is and for the API documentation to provide context to methods and types by placing them in a well documented namespace.
Constraints
To ensure compatibility with languages such as C, where items cannot be referenced before they have been declared, the IDL should enforce similar constraints on declaration.
Some enumeration values may need to be stable with differing configurations?
Some items only exist on certain platforms or when certain configuration options are enabled. Some items have different definitions or values on different platforms or when different combinations of configuration options are enabled.
System call semantics
Each system call has slightly different semantics for the 5-7 registers used during system calls as follows:
- target capability being invoked / badge of received IPC
-
seL4_MessageInfo
describing the IPC - Message register 0 in+out
- Message register 1 in+out
- Message register 2 in+out (not present in ia32)
- Message register 3 in+out (not present in ia32)
- System call identifier
As can be seen, a platform actually only needs a minimum of 3 general purpose registers to perform a system call, although this then requires the IPC buffer to be used to pass any arguments and thus also requires the calling code to look up the address of its IPC buffer.
All invocations are mapped over these semantics. Where there is no associated capability, the 0th register does not need to be set. When the associated system call does not expect arguments or capability transfer, the 1st register doesn’t need to be set. Of any message registers sent or received, the first words should be written to and read from the associated physical registers with further words being written to the IPC buffer.
Invocations use a packing strategy with their arguments where possible to ensure as few words (and memory accesses) as possible are used during the handling of the system call.
Issues with the current model
Some interfaces assume the entire response of a syscall is written into the IPC buffer and stays there (which is unsafe and and unhygienic) and have additional functions to retrieve additional values that are returned. Languages which support multiple returns should return all values and languages which support returning structures should return all values packed into a single structure.
This would prevent situations where subsequent system calls clobber the contents of the thread-local IPC buffer and would ensure that the IPC buffer is only used from the userland side when it is actually needed (like it is in the kernel) rather than introducing unnecessary copying of data back and forth.
Current alternate language implementations
CakeML
There exist handwritten wrappers of the C API that unmarshal the bytearray from the CakeML FFI on the C side in order to call functions from libsel4
. In addition, handwritten CakeML wrappers wrap the FFI and present to present the API functions to CakeML.
Rust
I’m not sure what the Robogalia project currently does to maintain its API implementation but it would be helpful to understand their current setup and use case better.
I would really appreciate any feedback on the above plan, particularly:
- Information on how interfaces for other languages are maintained,
- possible options for an IDL that would could get or build a parser for that can represent all of the items,
- ways in which the items that appear in the API could be consolidated without losing functionality or requiring the kernel to change its representation of them,
- ways in which the proposed structure of the API could be improved, or
- ways in which content from the manual can be improved before being moved into the API documentation.