Selecting right strategy for realizing processor diagnostic-self-testing at runtime for safety

ot180 · August 12, 2022, 9:02am

As I’m new to seL4, locking for the right strategy to realize processor diagnostic-self-testing at runtime for a safety application. As I understand “normal” seL4 task operate on EL0 (ARM) and therefore have only limited access to assembler commands and no access to special registers etc. (e.g. verifying cache functionality, mmu functionality, …)
To my current understanding I cannot realize this with a normal seL4 task, but an extension of the kernel is required to establish a, let’s call it system-call - callback, to attach such function to a seL4 system, operating at kernel EL level or below.
What do you think?

axel · August 12, 2022, 9:28pm

Can you be more specific about the functions you need? There are some tracing features available in the kernel that can be enabled. There are also some branches that try to improve things here to support more trace points. There is also a syscall that can be enabled to execute a function in kernel context.

ot180 · August 14, 2022, 6:06pm

Obviously I’m to brief. We are evaluating seL4 for use in highly dependable systems, according to IEC61508 and EN50128 with safety integrity level SIL3/4. From these regulations not only approved software is required, but also detailed runtime diagnostics of all hardware components involved in safety functions.
For the processor unit this means e.g.:

verifying each processor command
verifying each processor register
verifying effectiveness of memory protection/separation (MMU)
verifying caches (L1, L2, L3…) to be free of stuck-at errors (data/addresses)
…
and other components like
verifying system timer (clock) to stay in its allowed drift window of some ppm
verifying memory (DDR ram) for stuck-at/cross-over/SEU errors (data/addresses)
…

Today with our own kernel, we establish the software modules related to these runtime diagnostics (verification) as modules called at kernel level.

If there is a syscall to let such modules executing at kernel level, as you say, this sounds reasonable way.
(Of course I know that such modules need be trusted and separation demands are somewhat eroded. But as told in the formal verification papers, you have to rely on proper hardware function…)

gerwin.klein · August 15, 2022, 10:42am

Interesting requirements. Currently there are no such system calls in seL4, but it sounds like most of these tests (if done in a traditional way) could be implemented in a high-privilege user-level thread without kernel involvement. In fact, some of these are done in the sel4test app for testing the kernel (e.g. MMU/kernel providing separation, some cache tests). All of these sel4test runs are implemented on user level. Same should be possible for RAM, clock, user-level registers and user-level processor commands.

For testing privileged instructions and registers, we would likely need to add a system call that does these. Do you have any pointers to example implementations of such tests on the architecture and platform you are interested in?

(I do feel the need to point out that the traditional way of establishing these things does not actually mean that much. E.g. you are not guaranteed to get separation from the MMU at runtime just because it provided separation in a few test cases. But I can see how it is useful to at least have a diagnostic test to make sure you are not in a state where it just always fails.)

ot180 · August 16, 2022, 6:45am

Yes you are right, I think that as much as possible should be done on a high priority user-level task for each core. Even for user-land memory having a separate non-cacheable view on the physical memory will be a way to go.
For privileged commands and resources, exception handling, kernel memory etc. some kernel level solution might be necessary.
(If you have a bunch of “very traditional” notified-bodies and national regulative authorities, not only reasonable technical solutions but also effort to justify these to all of them are considered.
Nevertheless I still open for new approaches.)

david · October 3, 2022, 8:56am

I might be missing something here but have you considered running your testing procedure as a sort of additional boot loader stage (i.e. run the diagnostics that aren’t feasible to run from inside seL4 as bare metal code and only then proceed to boot into seL4)? If your regulations permit that it might be easier than having to modify the kernel.

Topic		Replies	Views
How do I add a new device driver to seL4? seL4 kernel	0	816	January 3, 2019
Welcome to Discourse	0	440	January 2, 2019
19 November 2019 Release Announcements	0	1223	February 18, 2020
Formal security properties Verification	6	272	March 24, 2023
How can I port seL4 to a new ARM hardware platform? seL4 kernel	2	752	February 27, 2019

Selecting right strategy for realizing processor diagnostic-self-testing at runtime for safety

Related topics