Summary
This proposes to introduce a new object that represents the FPU into seL4.
Motivation
Currently, seL4 treats the FPU as part of the thread context, saved in the TCB on preemption. As FPU state is large, the kernel uses lazy switching for the FPU, meaning on a context switch FPU access is disabled, resulting in an exception when a thread attempts to use it; the kernel will then save and restore the FPU state.
As a result, TCB objects are large (2 KiB on aarch64), and this space is wasted for threads that don’t touch the FPU. This scheme also introduces a timing channel (e.g. cve-2018-3665). By making the FPU access explicit where threads have to present an FPU capability in order to use the FPU, we can reduce TCB size by a half. And by switching to an eager-restore method, we can also mitigate the timing channel.
This will be implemented for all architectures.
Guide-level explanation
Threads which need to use the FPU will now need to have an FPU capability bound to the TCB. To do so, the object must first be created via an seL4_UntypedRetype()
call. Once the FPU capability is created, it can be binded to a TCB (one at a time) with seL4_TCB_BindFPU()
. If the TCB no longer needs the FPU, an unbind call can also be made via seL4_TCB_UnbindFPU()
, which will free the FPU object to be used by another thread. Deleting the TCB object will also unbind the FPU.
Usage of the FPU by TCBs which don’t have the appropriate FPU capability bound to them will raise a fault, transmitted to the thread’s fault handler.
Reference-level explanation (aarch64)
fpu_t
This struct represents the FPU object created from untypeds. It is arch-specific. As an example, the aarch64 FPU registers span 512B + some more for the status/control registers. In order to maintain a compact size, we move a register out into the TCB.
typedef struct fpu {
/* Last quad-word located in TCB */
uint64_t vregs[62];
uint32_t fpsr;
uint32_t fpcr;
/* Backlink from fpu to TCB */
struct tcb *fpuBoundTCB;
} fpu_t;
tcb_fpu_t
This struct resides in the TCB, and stores necessary information about the FPU as well as the last quad-word we took out of the FPU object.
typedef struct tcb_fpu {
/* Object created from retyping an untyped */
fpu_t *tcbBoundFpu;
/* Last quad-word register in the fpu */
uint64_t q31[2];
} tcb_fpu_t;
The existing FPU save/restore methods will be modified to take into account the missing quad-word register in the FPU object.
TODO: Drawbacks
A side-effect of this change is that most threads will end up having an FPU object bound to them, and since FPU access is currently implicit, it will have to be made explicit by the developer (more work). Even if there is no intended FPU usage, the compiler may sometimes generate instructions which “borrow” the FPU registers. A work around for this can be forcing the compiler to use general registers only. This would, however, prevent FPU usage where it is actually needed.
TODO: Rationale and alternatives
- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
TODO: Prior art
Discuss prior art, both the good and the bad, in relation to this proposal. A few examples of what this can include are:
- For ecosystem proposals: Does this feature exist in similar systems and what experience have their community had?
- For community proposals: Is this done by some other community and what were their experiences with it?
- What lessons can we learn from what other communities have done here?
- Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.
This section is intended to encourage you as an author to think about the lessons from other systems, provide readers of your RFC with a fuller picture. If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other systems.
Note that while precedent set by other systems is some motivation, it does not on its own motivate an RFC.
TODO: Unresolved questions
- What needs to be resolved in further discussion before the RFC is approved?
- What needs to resolved during the implementation of this RFC?
- What related questions are beyond the scope of this RFC that should be addressed beyond its implementation?