Pre-RFC: Binding notifications to endpoints

Currently, a service that needs to receive many notifications has a problem. The only scalable way to do so is to use an endpoint, but it is a bad idea to make an IPC call to an untrusted endpoint, so this only works if the service is trusted.

I came up with the idea of notification queues, but Curtis Millar came up with a better suggestion: allow notifications to be bound to endpoints, rather than threads. Any number of notifications could be bound to an endpoint, so this scales to arbitrarily many notification objects. The only limit is available memory.

Sadly, I won’t be able to implement this in the near term, and even if I could, I have zero experience with Isabelle, so I wouldn’t be able to help with the verification changes. That said, I do believe that this is a good idea and would be willing to write an RFC for this.

Thoughts?

@kent.mcleod has pointed out that VCPUs currently rely on notifications being bound to threads to emulate IRQ injection. That could possibly be changed out for something more direct. Other than that, I suspect the verification changes would be quite substantial.

That’s unfortunate. Do you believe they are a good idea, considering the amount of effort required? In the meantime, what workaround do you recommend?

It’s hard to say what the verification impact would be, but adding notification binding to TCBs was one of the changes we underestimated by quite a bit, so I’d be wary of this one. We’d need to look at an implementation, preferably in Haskell, to judge the impact.

Before we go there, though: I haven’t fully understood the problem scenario yet, or whether it would be possible to solve it with the current API. Why is using an endpoint the only scalable way to receive many notifications? Is it because the notification data word is limited?

Are you able to elaborate on this @curtis? x86 and Arm have different VCPU designs but I don’t think either would prevent implementations that allow binding notifications to endpoints.

Would introducing notification-endpoint binding be combined with removing TCB-notification binding?

> Is it because the notification data word is limited?

Yeah. The issue is that the fan-out of a notification is limited by the number of badge bits. Increasing this requires a TCB and a notification for every 28 (on 32-bit) or 64 (on 64-bit) channels you want to disambiguate.
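To make the fan-out limit concrete, here is a small illustrative simulation (plain Python, not seL4 code): badges of pending signals are OR-ed into a single notification word, so each distinguishable sender needs its own bit, and the number of TCB/notification pairs grows with the client count.

```python
import math

WORD_BITS = 64  # badge width: 28 on 32-bit seL4, 64 on 64-bit

class Notification:
    def __init__(self):
        self.word = 0
    def signal(self, badge):
        self.word |= badge          # pending badges accumulate by bitwise OR
    def wait(self):
        word, self.word = self.word, 0
        return word

# Two clients signal before the server waits: both bits arrive together.
ntfn = Notification()
ntfn.signal(1 << 0)
ntfn.signal(1 << 5)
assert ntfn.wait() == (1 << 0) | (1 << 5)

def pairs_needed(clients, bits=WORD_BITS):
    """TCB + notification pairs needed to disambiguate `clients` senders."""
    return math.ceil(clients / bits)

assert pairs_needed(1000) == 16   # 1000 distinct clients need 16 pairs
```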

I’m guessing that the likely real-world examples are similar to the ones that motivated more efficient mechanisms than the POSIX poll and select on Unix OSes where you could have many thousands of clients of a server that all use it sporadically (https://en.wikipedia.org/wiki/C10k_problem).

There’s also the interrupt-handling case: if you want to bind multiple IRQ handlers under different badges, you also get constrained by the limited badge space.

We’ve usually assumed, for the first case, that most connections involving a notification will also have shared memory, and that extending beyond the 28/64 badge bits can be handled by sharing badges and using shared memory to check which client has work pending (much like how message-signalled interrupts scale multiple interrupt sources beyond the smaller number of physical interrupt lines).
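A sketch of that shared-badge pattern, as I understand it (an assumed design, simulated in plain Python rather than the seL4 API): many clients share one badge bit, a shared-memory bitmap records which client has work, and the server scans the bitmap after each wait.

```python
SHARED_BADGE = 1 << 0

class Channel:
    def __init__(self, nclients):
        self.ntfn_word = 0
        self.pending = [False] * nclients   # stands in for shared memory
    def client_signal(self, client_id):
        self.pending[client_id] = True      # mark work in shared memory...
        self.ntfn_word |= SHARED_BADGE      # ...then signal the shared badge
    def server_wait(self):
        assert self.ntfn_word & SHARED_BADGE
        self.ntfn_word = 0
        # Scan shared memory to find who has work (this scan is O(n)).
        ready = [i for i, p in enumerate(self.pending) if p]
        for i in ready:
            self.pending[i] = False
        return ready

ch = Channel(1000)
ch.client_signal(3)
ch.client_signal(700)
assert ch.server_wait() == [3, 700]
```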

The hardware interrupt handler case would benefit from notification-endpoint binding, because forwarding an interrupt through multiple notifications and TCBs makes latency worse for interrupt handling.

That still requires O(n) polling, though, just as select() and poll() do. Binding notifications to endpoints allows O(1) polling on arbitrarily many connections, just as epoll(), kqueue(), and completion-based APIs do. A userspace implementation is possible, but would have higher latency and lower throughput.
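To illustrate the scaling difference (all names here are hypothetical, not seL4 API): instead of scanning all n channels poll/select-style, a signalled bound notification would enqueue itself on the endpoint, so the server dequeues ready work in O(1), epoll/kqueue-style.

```python
from collections import deque

class Endpoint:
    def __init__(self):
        self.ready = deque()          # signalled bound notifications

class BoundNotification:
    def __init__(self, endpoint, badge):
        self.endpoint, self.badge, self.queued = endpoint, badge, False
    def signal(self):
        if not self.queued:           # enqueue at most once until delivered
            self.endpoint.ready.append(self)
            self.queued = True

def recv(endpoint):
    ntfn = endpoint.ready.popleft()   # O(1), however many ntfns are bound
    ntfn.queued = False
    return ntfn.badge

ep = Endpoint()
ntfns = [BoundNotification(ep, badge=i) for i in range(100_000)]
ntfns[42].signal()
ntfns[42].signal()                    # coalesces, like notification semantics
ntfns[7].signal()
assert recv(ep) == 42
assert recv(ep) == 7
```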

Those are exactly the examples I am thinking of. For instance, a natural implementation of a socket is a pair of ring buffers (one for each direction) and a pair of notifications (again, one for each direction). However, that won’t scale well with the current kernel APIs.
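For concreteness, here is a minimal single-producer/single-consumer ring buffer, one half of the socket design described above (the other direction would be a mirror-image buffer; the notification pair signals "data available" and "space available"):

```python
class Ring:
    def __init__(self, size):
        self.buf = [None] * size
        self.head = 0   # next slot to write (producer-owned)
        self.tail = 0   # next slot to read (consumer-owned)
        self.size = size
    def push(self, item):
        if (self.head + 1) % self.size == self.tail:
            return False                    # full: caller waits for space
        self.buf[self.head] = item
        self.head = (self.head + 1) % self.size
        return True                         # caller now signals the peer
    def pop(self):
        if self.tail == self.head:
            return None                     # empty: wait on the notification
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.size
        return item

tx = Ring(8)
assert tx.push(b"hello")
assert tx.pop() == b"hello"
assert tx.pop() is None
```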

The idea I had in mind would be to replace notifications with an object that can be queued on an endpoint without blocking the thread that queued it. Adding the ability to ‘bind a notification to an endpoint’ is another alternative that would also work and not involve removing notification objects.

Binding notifications to endpoints would also be a much smaller change to the API. We’d probably just enqueue notifications on their bound endpoint with higher priority than blocked threads or add priorities to notification objects.
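A hypothetical sketch of that delivery order (my reading of the proposal, simulated in Python, not actual kernel code): a receive on the endpoint drains queued notification badges before accepting a message from a blocked sender thread.

```python
from collections import deque

class Endpoint:
    def __init__(self):
        self.pending_ntfns = deque()   # badges of signalled bound notifications
        self.send_queue = deque()      # threads blocked sending on the endpoint

    def recv(self):
        if self.pending_ntfns:         # notifications take priority...
            return ("notification", self.pending_ntfns.popleft())
        if self.send_queue:            # ...then ordinary blocked senders
            return ("message", self.send_queue.popleft())
        return None                    # the receiver would block here

ep = Endpoint()
ep.send_queue.append("client-A request")
ep.pending_ntfns.append(0xBEEF)
assert ep.recv() == ("notification", 0xBEEF)
assert ep.recv() == ("message", "client-A request")
```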

> Those are exactly the examples I am thinking of. For instance, a natural implementation of a socket is a pair of ring buffers (one for each direction) and a pair of notifications (again, one for each direction). However, that won’t scale well with the current kernel APIs.

But if you had a network service providing a socket API to clients using a series of ring buffers and notifications, you only need one notification channel for each direction, which lets the two “processes” notify each other that work is available; you can then use shared memory to indicate which sockets the work relates to, can’t you?

That makes sense. So signalling a notification bound to an endpoint would return immediately, without blocking the signalling thread.

You can, so that might not have been the best example. A better example would be the network service itself: it may very well have hundreds of clients, all of whom distrust both each other and the network service.

More generally, a service should be able to efficiently serve an arbitrarily large number of clients (limited only by available resources) without those clients having to trust the service.