Multiple Domains Per Thread Proof Impact

The first-level time-sliced domain scheduler is a nice fit for some real-time applications, but the limitation that each thread can belong to only one domain complicates the system design when a server needs to serve multiple domains. [1] While it is clear that the seL4 information flow proof depends on this restriction [2], it is not clear how costly relaxing this restriction would be for the main proof.

For example, if the restriction were relaxed so that a thread could belong to up to MAX_DOMAINS_PER_THREAD domains, and the information flow proof were declared valid only for configurations where this value is set to 1, what level of proof effort would be required to re-establish the functional correctness of this scheduling change?

Hoping that someone might have a quick, reasonable guess, since the verification effort for MCS seems to touch on a similar area.

[1] Q. Kang, C. Yuan, X. Wei, Y. Gao and L. Wang, “A User-Level Approach for ARINC 653 Temporal Partitioning in seL4,” 2016 International Symposium on System and Software Reliability (ISSSR), Shanghai, 2016, pp. 106-110, doi: 10.1109/ISSSR.2016.025.
[2] T. Murray et al., “seL4: From General Purpose to a Proof of Information Flow Enforcement,” 2013 IEEE Symposium on Security and Privacy, Berkeley, CA, 2013, pp. 415-429, doi: 10.1109/SP.2013.35.

The main question for that is how it would be implemented. A TCB needs to be able to figure out which domain it is in. Currently this is a one-byte field in the TCB structure. If a thread can belong to multiple domains, this would have to become an array or list.
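To make the shape of that change concrete, here is a minimal sketch of the array variant. All names (`tcbDomains`, `NULL_DOMAIN`, `tcb_in_domain`) are hypothetical and deliberately do not match the actual seL4 sources; it just illustrates how the one-byte field could generalise:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical config constant; with a value of 1 this degenerates to the
 * current single-field representation. */
#define MAX_DOMAINS_PER_THREAD 4
#define NULL_DOMAIN 0xff  /* sentinel marking an unused slot */

typedef struct tcb {
    /* ... other TCB fields ... */
    uint8_t tcbDomains[MAX_DOMAINS_PER_THREAD];
} tcb_t;

/* Check whether a thread belongs to a given domain. With
 * MAX_DOMAINS_PER_THREAD == 1 this is the current single comparison. */
static bool tcb_in_domain(const tcb_t *tcb, uint8_t dom)
{
    for (unsigned i = 0; i < MAX_DOMAINS_PER_THREAD; i++) {
        if (tcb->tcbDomains[i] == dom) {
            return true;
        }
    }
    return false;
}
```

A fixed-size array like this keeps the TCB at a statically known size, which matters given that the kernel does no dynamic allocation.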

A one-element array would be close enough to the current state for verification to not be a big deal, I think. Without looking through the code in detail, I can’t tell whether there are any surprises waiting in other parts of the kernel, i.e. if there is some code that needs to be significantly generalised to deal with the fact that there might be multiple domains for a thread. But my gut feeling so far is that this would not be high effort.

To understand the use case better: you’d want to use this for a trusted server that needs to serve multiple domains?

Would it be possible to run one server in each domain and, for instance, share memory between the domains for coordination? (essentially implementing the shared server as a distributed server)

This might even make it easier to reason about why that server does not break information flow between those domains (I assume we do care about that, because if we don’t care whether the server breaks information flow, we probably don’t need domains in the first place and could go directly with MCS or strict priorities).

Yes, a trusted server serving multiple domains.

Thanks for the distributed server suggestion. It could work, though we would have to think more about it.

To be clear, we are more concerned in this case about strict time-slice partitioning for safety than about information flow. Adapting the domain scheduler to this use case is mostly straightforward, but we are not sure MCS can be adapted without a userspace scheduling component that replaces the sporadic server algorithm.

For safety that makes more sense, although you’d still have to argue that the trusted server does not take up too much time.

It sounds to me like MCS would be the right mechanism for it, I don’t think it’d need user-level scheduling, but I don’t know enough about its practical applications yet to make a recommendation. @anna.lyons or @curtis might be able to say more about this, though.

The present implementation maintains scheduler queues per domain, and threads are selected for execution from the queues of the currently active domain. Any operation that moves a thread into or out of a scheduling queue would need to add it to, or remove it from, the appropriate queue in every domain the thread belongs to, modifying the scheduling state of all of those domains.
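A minimal sketch of that bookkeeping, assuming a per-thread domain array and per-domain link pointers (all names hypothetical, and the queue simplified to singly-linked where the real kernel uses doubly-linked intrusive lists):

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_DOMAINS            4
#define MAX_DOMAINS_PER_THREAD 2
#define NULL_DOMAIN            0xff  /* sentinel for an unused slot */

typedef struct tcb tcb_t;
struct tcb {
    uint8_t tcbDomains[MAX_DOMAINS_PER_THREAD];
    tcb_t  *next[NUM_DOMAINS];  /* one queue link per domain */
};

typedef struct {
    tcb_t *head;
    tcb_t *tail;
} queue_t;

/* One ready queue per domain (priorities omitted for brevity). */
static queue_t readyQueues[NUM_DOMAINS];

/* Enqueue a runnable thread into the ready queue of EVERY domain it
 * belongs to -- the generalised operation described above. */
static void schedule_enqueue(tcb_t *tcb)
{
    for (unsigned i = 0; i < MAX_DOMAINS_PER_THREAD; i++) {
        uint8_t dom = tcb->tcbDomains[i];
        if (dom == NULL_DOMAIN) {
            continue;
        }
        queue_t *q = &readyQueues[dom];
        tcb->next[dom] = NULL;
        if (q->tail) {
            q->tail->next[dom] = tcb;
        } else {
            q->head = tcb;
        }
        q->tail = tcb;
    }
}
```

Dequeue would have to walk the same set of domains, which is where the cross-domain state modification the post mentions comes from.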

I think the implementation would mostly generalise well to this, though there may be some places that need extra work. It’s also possible that this could interact in an undesirable manner with the fact that a task is appended to the end of its priority level’s queue in all domains after it completes execution in any one domain.

As for MCS, the kernel scheduler only enforces the sporadic server algorithm, so any strict time-slicing such as exists in the domain scheduler essentially requires a user-level implementation of the domain scheduler. Having said that, is there a reason why the constraints you get from the sporadic server algorithm wouldn’t give you the kind of guarantees you need for safety?

There are further problems with attempting to make threads part of more than one domain. Each TCB contains the pointers that form the scheduler list (tcbSchedNext, tcbSchedPrev). This is an artifact of the kernel not doing any dynamic memory allocation. Attempting to put threads into multiple queues means you now need N pointer pairs for N potential queues.
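The storage cost is easy to see if you write both layouts down. This is only an illustration with made-up type names; the point is that intrusive list links multiply with the number of queues a TCB can sit in:

```c
#include <stdint.h>

#define MAX_DOMAINS_PER_THREAD 4

/* Current situation: one next/prev pair per TCB, because a thread can sit
 * in at most one scheduler queue at a time. */
typedef struct single_links {
    void *tcbSchedNext;
    void *tcbSchedPrev;
} single_links_t;

/* Multi-domain situation: one next/prev pair per potential queue, paid for
 * by every TCB whether or not it uses the extra domains. */
typedef struct multi_links {
    void *tcbSchedNext[MAX_DOMAINS_PER_THREAD];
    void *tcbSchedPrev[MAX_DOMAINS_PER_THREAD];
} multi_links_t;
```

Since there is no dynamic allocation, that per-TCB overhead is paid up front for the worst case.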

I’d suggest using multiple threads in the same address space that run in different domains and using the fact that the threads can’t be scheduled at the same time to solve concurrency problems.
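A tiny sketch of why that works, under the stated assumptions (single core, and the domain scheduler guaranteeing the per-domain threads never run at the same time); the names and the shared counter are invented for illustration:

```c
#include <stdint.h>

/* State shared between the per-domain threads of one address space. */
static uint64_t requests_served;

/* Body that each per-domain server thread runs during its own domain's
 * time slice. The unsynchronised read-modify-write is safe ONLY because
 * the domain scheduler never runs two of these threads concurrently on a
 * single core -- no locks or atomics needed. */
static void serve_one_request(void)
{
    requests_served++;
}
```

In effect the domain schedule itself acts as the mutual exclusion mechanism, which is the concurrency argument the suggestion relies on.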

I think MCS can handle schedulability and timing quite well, but I am not sure it would be capable of addressing the non-interference requirements. Since threads can “interfere” with each other in all kinds of unintended ways (e.g. caches, memory controllers, etc.), we have to be reasonably sure that only related threads can run in a given time slice, and this is what the domain scheduler provides that MCS does not.

This sounds like the best path forward currently.

However, ARINC 653 does have an extension that allows the schedule to be changed at runtime to one of a set of pre-defined schedules. From this discussion it sounds like supporting multiple candidate domain schedules might not be terribly hard, as long as domain membership does not change, but the proof would then have to handle an array of candidate domain schedules instead of a single pre-defined one.
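As a rough sketch of what that might look like: the kernel's domain schedule is a static array of (domain, length) slices, so the generalisation is an array of such arrays plus an index. The shape below mirrors that idea, but the names (`schedules`, `select_schedule`, `activeSchedule`) and the switching function are hypothetical, not existing kernel API:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  domain;
    uint32_t length;   /* slice length in timer ticks */
} dschedule_t;

#define NUM_SCHEDULES    2
#define SLOTS_PER_SCHED  3

/* Pre-defined candidate schedules, fixed at build time (no dynamic
 * schedule creation, per the discussion above). */
static const dschedule_t schedules[NUM_SCHEDULES][SLOTS_PER_SCHED] = {
    { {0, 10}, {1, 10}, {2, 5} },   /* e.g. normal mode   */
    { {0, 5},  {2, 20}, {1, 5} },   /* e.g. degraded mode */
};

static unsigned activeSchedule;     /* index of the schedule in force */

/* Switch to another pre-defined schedule; would take effect from the
 * next domain slice boundary. */
static int select_schedule(unsigned idx)
{
    if (idx >= NUM_SCHEDULES) {
        return -1;                  /* no such schedule */
    }
    activeSchedule = idx;
    return 0;
}

static const dschedule_t *current_slot(unsigned i)
{
    return &schedules[activeSchedule][i];
}
```

Keeping the candidate set static means the proof only has to quantify over a fixed, known array rather than reason about schedule storage.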

I think that would be quite doable, yes. At least as long as the number of domains remains the same, and we don’t want to create new schedules dynamically (that would also be doable, but would need a mechanism to provide storage for the new schedule).

We could re-use the existing DomainCap as the authority for that operation, possibly extending it slightly (e.g. a flag that says whether you can switch schedules).
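A sketch of what that extension could look like at the decode stage. To be clear, the real DomainCap carries no such flag; both fields and the decode function here are assumptions for illustration only:

```c
#include <stdbool.h>

/* Hypothetical extended DomainCap: the first flag stands for the existing
 * authority (setting a thread's domain), the second is the proposed
 * addition gating schedule switching. */
typedef struct {
    bool canSetDomain;       /* existing authority */
    bool canSwitchSchedule;  /* hypothetical new flag */
} domain_cap_t;

/* Hypothetical decode step for a schedule-switch invocation: reject the
 * operation unless the invoked cap carries the new flag. */
static int decode_switch_schedule(domain_cap_t cap)
{
    if (!cap.canSwitchSchedule) {
        return -1;           /* kernel would report an illegal operation */
    }
    return 0;
}
```

The point of the flag is exactly the one made below: the information flow theorem can then simply require that untrusted components hold no cap with this flag set.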

You leak information when you switch, of course, so in the information flow theorem you’d have to demand you don’t have this cap, but that’s similar to other global operations that you want a trusted component for.