Pre-RFC: Boot Interface

Summary

This adds a well-defined interface for the kernel to boot itself in a manner that allows any pre-boot software to be readily interchanged on any platform. elfloader will then be modified to use this interface to boot the seL4 kernel and pass control to the kernel.

Motivation

The seL4 kernel currently makes differing degrees of assumptions depending on the architecture for which it is targeted (x86 assumes multiboot, arm and risc-v assume particulars of elfloader) and the responsibilities of the kernel aren’t consistent across architectures and platforms. This can make it difficult make more general use of the kernel with alternative bootloaders or to pass through boot-time configuration to a system build on seL4.

The kernel is generally responsible for configuring the few devices that it uses as the system runs (serial device in debug configurations, IRQ controller, and hardware MMU) as well as for constructing global kernel mappings and initial user-level mappings, objects, and capabilities.

The kernel should depend minimally on the boot process used or any any one particular boot process. Additionally, any initialisation that can be done prior to kernel boot that doesn’t affect the kernel’s ability to maintain platform or device configuration as the system executes should be delegated to the boot code that executes prior to the kernel.

Guide-level explanation

The kernel will make a set of assumptions about the state of the system when it first accepts control:

  • The kernel image is copied into physical memory
  • That a given set of platform registers have been assigned specific values or refer to memory regions with specific contents.
  • For a given platform and configuration
    • The kernel is executing at some particular privilege level
    • The kernel is with virtual addressing enabled with some specific configuration of the platform’s virtual addressing system
      • x86 may assume specific GDT/IDT entries
      • Specific values for relevant addressing registers
    • Either that the kernel has been entered on only one core or that it has been entered on all available cores

The software that executes prior to the kernel starting supplies the kernel on each node with a numeric identifier for the node that is executing, a boot configuration structure, and a table of mapping regions.

The configuration structure specifies:

  • the starting virtual address of the initial user-level task,
  • the number of entries in the mapping region table, and
  • a single word value that is passed to user-level via the bootinfo struct (generally a pointer to an extended boot information structure mapped into the initial task).

The mapping region table describes a series of physical address regions that are any of:

  • a reserved region that should not be used by either the kernel or be made accessible to user-level,
  • a mandatory kernel code / read-only data and read/write data virtual-physical mapping (which the kernel may expect to fall into fixed locations),
  • a mandatory kernel read/write data
  • a user-level virtual-physical mapping, or
  • an unused physical memory region.

The physical regions must satisfy the following constraints:

  • The physical address range of one region may not overlap the range of any other region
  • User-level mappings must be aligned to the smallest page size and be a multiple of that size
  • The two kernel-level mappings must each fall entirely within a single page of a size defined by the architecture (large on riscv32, ia32, aarch32, and x86_64; huge on riscv64 and aarch64)
  • The mapping region table itself must
    • Not fall within either kernel mapping region
    • Not fall within any user-level mapping region
    • Be aligned to the smallest page size

Any boot information passed to the initial task from the bootloader or boot code prior to seL4 is defined entirely by that bootloader or boot code.

Reference-level explanation

When the kernel starts, if it is responsible for starting additional cores it does so. All cores wait until the boot core has initialised the kernel state and devices.

The boot core will check the region table to ensure that the regions satisfy the necessary constraints. It will then re-map itself using the kernel global mappings. It can then map and initialise any devices it uses directly.

The kernel will then create an initial set of untypeds from the regions of memory not covered by the reserved regions, kernel mappings, user-level mappings, or the region table itself.

From the initial untypeds, the kernel will bootstrap the initial task creating all the necessary kernel objects and mapping the user-level mappings into place.

Finally the kernel will reclaim the memory used by the region table and, if it was marked as reclaimable, the boot portion of the kernel code and produce untypeds.

TODO: Drawbacks

Outline any arguments that have been made against this proposal and discuss why we may not want to accept it. Also discuss any complications that may arise from the proposed change that may require specific consideration.

Rationale and alternatives

  • Most of this is how the kernel already functions
  • Allows for pre-kernel boot software other than elfloader to be used to boot the kernel (e.g. simple embedded firmware)
  • The boot process on x86 currently allows any user-level image to be specified at boot time but this is not currently possible when using elfloader, alternative loaders could be used to overcome this limitation in elfloader
  • Provides a general manner for boot information (such as DTB, multiboot information, or boot parameters) to be passed to the initial task (which is not currently well supported by the kernel)
  • Allows the kernel code to be executed from flash, better supporting memory-constrained applications
  • Clearly documents the assumptions made by the kernel at boot in a manner that can be easily checked

TODO: Prior art

Discuss prior art, both the good and the bad, in relation to this proposal. A few examples of what this can include are:

  • For ecosystem proposals: Does this feature exist in similar systems and what experience have their community had?
  • For community proposals: Is this done by some other community and what were their experiences with it?
  • What lessons can we learn from what other communities have done here?
  • Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.

This section is intended to encourage you as an author to think about the lessons from other systems, provide readers of your RFC with a fuller picture. If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other systems.

Note that while precedent set by other systems is some motivation, it does not on its own motivate an RFC.

TODO: Unresolved questions

  • What needs to be resolved in further discussion before the RFC is approved?
  • What needs to resolved during the implementation of this RFC?
  • What related questions are beyond the scope of this RFC that should be addressed beyond its implementation?

  • Specific formats for informational structures and tables on different platforms
  • How will the assumptions of the kernel be specified/documented?

@RamboBones it would be great to get your opinion here

I very much like this suggestion.

Some random things to consider:

1/ Could we version to boot interface to allow for evolution of exactly what is in it.

2/ If we versioned the boot interface could we simply document existing behaviour as the initial version without needing any elf loader or kernel changes for the first version.

3/ I think it is implied, but I could have misunderstood: would this provide an opportunity to remove some complexity from the kernel boot code, while also allowing the loader to have more control over the placement of data structures in memory? Specific examples could be:

  • the init_freemem function; it seems this could be simplified significantly if the loader needed to specify appropriate regions of memory.
  • Could CONFIG_ROOT_CNODE_SIZE_BITS go away and be replaced with the loader specifying an explicit region of memory for the root cnode instead?

I imagine the version of the boot interface would be tied to the kernel. This would mean a change in the boot interface would result in a change in the kernel version in the same way that a change in the syscall and invocation API or in libsel4 would require a change in kernel version.

That would essentially be the boot interface for the existing kernel version.

The

Yes, the intent is to enable the boot code to be simplified in the kernel. It might be worth extending it to allow the bootloader to specify the final location of the bootinfo struct in virtual memory. It also means that structures that the kernel passes through transparently, such as the bootloader’s device tree, don’t need to be handled explicitly and don’t need to be copied into memory by the kernel.

Another intent is to modify the virtual-physical translation of the kernel to allow it to execute directly from flash without needing to be copied into physical memory to ensure that it is contiguous and also to enable this for user-level. This would benefit particularly memory-constrained embedded devices.

Yes, this should allow routines such as init_freemem to be simplified and rely on less platform-specific code.

The intent was not to go so far as specifying specific caps for the initial task. It’s also possible that the boot code itself could determine how big the initial CNode would need to be without the bootloader specifying.

Any caps and kernel objects for the initial task will still be allocated from the initial untypeds created from the unused physical memory inidicated by the boot loader.

I like the suggestion for a clear interface so that it is possible to substitute our bootloaders with custom ones as others have mentioned a desire to do.

This proposal doesn’t make it clear if this new interface will be the only interface that can be used to boot the kernel. Currently multiple supported interfaces exist such as the multiboot interface used on x86 and the guide level explanation talks about changes that would remove this support. While I am not opposed to the kernel having a single boot interface in the future that calls into verified initialization code, I don’t think it is a good idea to remove existing interfaces right now. I’m in favor of establishing a clear interface designed for third-party boot loaders to be used with, but I don’t think that requires the kernel to only have a singular interface or removal of support for the existence of other entry points or should prevent the addition of future ones if there is a strong enough motivation for them.

I also don’t agree that we should be removing other supported interfaces, but I don’t think those additional interfaces should be provided by the kernel itself and should be moved into some pre-kernel boot code such as elfloader that then uses the kernel’s own boot interface.

I think that any alternate boot interfaces should be provided by some software that adapts that interface to the kernel’s singular boot interface.

Just so I’m clear: boot code means the code in the kernel. And we’d use loader code for whatever passes control to the boot code?

This is probably a future directions thing, and I don’t want to derail incremental improvement here, but CONFIG_ROOT_CNODE_SIZE_BITS is only used in the boot.c code, wouldn’t it be great to turn it from a compile time config to something that the loader can configure appropriately?

Additionally, I think you could (and probably should) have the loader determine the specific memory to be used for the cnode, rather than the boot code needing an allocator, but that’s probably even further down the road.

The loader needs to allocate memory for the initial CNode, BootInfo strucutre, TCB, initial task virtual address space translation tables, scheduling context, and IPC Buffer with the number and size of the translation tables depending on the virtual address regions that need to be mapped at user-level.

I think it’s difficult to avoid having the boot code act as an allocator in some fashion, and getting the loader code to specify all of these essentially turns that loader plus the boot code into the equivalent of the existing verified capdl loader in many ways.

The intent for this interface is to do the minimal work to bootstrap a system only enough to pass control to a task executing within the confines of seL4 that can then bootstrap anything remaining. As such, the extent to which an initial CNode is provided and configured should only be sufficient to get to this point and if a larger CNode is needed, that task can allocate a larger CNode after the fact. The same can be said for any other additional resources.

To add to this, the only reason that the loader needs to be able to specify any passthrough mappings to user-level is that user-level cannot map those regions without losing the data stored there as they are zeroed when retyped from untyped. Were this not the case, it would be possible for userlevel to bootstrap itself from a read-only mapping after boot (although with some additional work).

The only additional configuration I can see being actually necessary is any configuration for the kernel itself that bounds what it does throughout the life of the system (e.g., specifying which memory it can safely use for kernel objects/non-device untyped).