The current implementation of MCS with ARM virtualization can lead to a kernel bug.
Specifically:
1/ The handleInterruptEntry
code in MCS performs an MCS budget check as the first thing it does.
2/ It is possible for an MCS budget check to requeue the current thread onto the schedule queue. At this point we have the thread both as the active thread and on a scheduling queue.
3/ On ARM with virtualization the interrupt source could be a VPPI or VGICMaintenence interrupt. Such an interrupt is treated as a fault, and an IPC is sent from the VCPU’s thread to it’s fault handler. This occurs as a a blocking synchronous IPC.
4/ After sending a blocking synchronous IPC the VCPU’s thread shall be blocked (either on the send or reply phase). However, as the MCS budget check code could have placed the thread on the scheduling queue at this point the thread is both blocked and on a scheduling queue.
Note: As far as I understand the IPC and scheduling code, threads are assume to not be on the scheduling queue at kernel entry, and therefore there is no code path to remove them from the scheduling queue if the thread is blocked during the kernel.
5/ Sometime later the scheduler will run and pick the blocked thread from the scheduling queue, at which point the kernel crashes.
Obviously this is a nasty case to trigger, and is timing sensitive. The VPPI/VGICMaint interrupt needs to come in just as the current budget is expiring.
As a work around for my immediate use case I’ve updated to the handleInterruptEntry to not perform the MCS budget check in the case where the interrupt could be a VGICMaint or VPPI interrupt. This avoids the problem and I can run without any crashes.
However, it’s not clear what the correct fix is.
I think at least some of the following should be involved:
1/ It should be invariant that on kernel exit (and entry) the current thread is not on a scheduling queue. We should assert such a variant in more places (that would have made this bug easier to find, so will likely make future bugs easier to find!).
2/ It should be invariant that prior to any code that can cause a thread to block it must not be on a scheduling queue. Examples are sendIPC, but likely others. This should also be checked more often. In fact, if the currently active thread is blocked, it should be asserted that it is not currently on the scheduled queue.
3/ The budget check on handleInterruptEntry needs to be rethought. I don’t really have a good proposal here, as I don’t understand the MCS invariants well enough. I would mention that it is odd that the code path here is slightly different to the mcsIRQ code path (or at least odd to me!).
I assume this current behaviour is not caught with formal verification as the VPPI and VGICMaintenence paths are not considered for formal verification.
If someone with more depth of understanding of MCS can provide direction I’m happy to handle to pull request side of things.