MCS charging interaction with ARM VCPU faults causes a kernel crash

The current implementation of MCS with ARM virtualization can lead to a kernel bug.

Specifically:

1/ The handleInterruptEntry code in MCS performs an MCS budget check as the first thing it does.

2/ It is possible for an MCS budget check to requeue the current thread onto the schedule queue. At this point we have the thread both as the active thread and on a scheduling queue.

3/ On ARM with virtualization the interrupt source could be a VPPI or VGICMaintenence interrupt. Such an interrupt is treated as a fault, and an IPC is sent from the VCPU’s thread to it’s fault handler. This occurs as a a blocking synchronous IPC.

4/ After sending a blocking synchronous IPC the VCPU’s thread shall be blocked (either on the send or reply phase). However, as the MCS budget check code could have placed the thread on the scheduling queue at this point the thread is both blocked and on a scheduling queue.

Note: As far as I understand the IPC and scheduling code, threads are assume to not be on the scheduling queue at kernel entry, and therefore there is no code path to remove them from the scheduling queue if the thread is blocked during the kernel.

5/ Sometime later the scheduler will run and pick the blocked thread from the scheduling queue, at which point the kernel crashes.

Obviously this is a nasty case to trigger, and is timing sensitive. The VPPI/VGICMaint interrupt needs to come in just as the current budget is expiring.

As a work around for my immediate use case I’ve updated to the handleInterruptEntry to not perform the MCS budget check in the case where the interrupt could be a VGICMaint or VPPI interrupt. This avoids the problem and I can run without any crashes.

However, it’s not clear what the correct fix is.

I think at least some of the following should be involved:

1/ It should be invariant that on kernel exit (and entry) the current thread is not on a scheduling queue. We should assert such a variant in more places (that would have made this bug easier to find, so will likely make future bugs easier to find!).

2/ It should be invariant that prior to any code that can cause a thread to block it must not be on a scheduling queue. Examples are sendIPC, but likely others. This should also be checked more often. In fact, if the currently active thread is blocked, it should be asserted that it is not currently on the scheduled queue.

3/ The budget check on handleInterruptEntry needs to be rethought. I don’t really have a good proposal here, as I don’t understand the MCS invariants well enough. I would mention that it is odd that the code path here is slightly different to the mcsIRQ code path (or at least odd to me!).

I assume this current behaviour is not caught with formal verification as the VPPI and VGICMaintenence paths are not considered for formal verification.

If someone with more depth of understanding of MCS can provide direction I’m happy to handle to pull request side of things.

Both are invariants on the MCS kernel without HYP extensions, and at least the “all threads in ready- and release-queues are runnable” is proved on the abstract spec level. The current-thread-not-in-queue at entry/exit is also true, but I don’t know if we’ve proved it. They both should (“must be made to”) be true on MCS/ARM_HYP but as you mentioned that configuration has not been under verification yet. It’d be perfectly fine to add asserts to that effect, and if they are broken, that’s definitely a bug.

The usual pattern for the all-in-queue-are-runnable invariant is that it is re-established almost immediately when broken, i.e. it should be true during most of the execution, not only at entry and exit. Threads should only be put into these queues when they are runnable, and they should be removed pretty much immediately when they become not-runnable.

I think you’re right that something needs to be done on handleInterruptEntry, but @curtis, @anna.lyons, @kent-mcleod2, and @corey.lewis probably know more about this than I.

Thanks for confirming Gerwin.

I assume assertions are only executing in release mode and not debug mode, so adding additional assertions like this don’t have any real costs, and are “a good thing” if it aids in code readability?

I’d suggest this be raised as a bug on the seL4 repository on GitHub.

Yes they do help in readability, and have no cost in release mode (they are only running in debug mode)

This discussion moved to: MCS charging interaction with ARM VCPU faults causes a kernel crash · Issue #346 · seL4/seL4 · GitHub

I find that invariant curious. If I read this correctly, it means that when a new thread is picked by the scheduler it would have to be explicitly dequeued from the run queue, despite it cearly being runnable. This seems to be an unnecessary operation. Why?
It’s not really a performance issue as long as it only happens when the scheduler actually runs, but it stil seems strange

I assume you mean the current-thread-not-in-queue. This is one is not actually proved, it only happens to be true (the function switchtToThread removes it from the queue). Might be by accident of implementation, I don’t really remember why that was done. We could try to remove it and see what breaks in the proofs.

Thinking about it (while bike riding :wink: ) I think the described behaviour actually makes sense: It means you never have to dequeue on the IPC fast path. So maybe it should be upgraded to an invariant.