seL4test get stuck in FPU0001 test on qemu-arm-virt with givc3 enable

HI, I’m recently doing some works on a platform with gicv3, and I decide to run some tests on the qemu arm platform for better debugging.
The default qemu configuration sets gicv2 as the default IRQ controller. So I modified some configurations in the plat config files and simulators scripts to switch to gicv3:
First, I added some code in elfloader plat init section to enable the power feilds of gicv3.

// tools/seL4/elfloader-tool/src/plat/qemu-arm-virt/platform_init.c:
+static void wakeup_pe()
+{
+    /* only wake up the core 0 since not using smp */
+    volatile unsigned int* p = (volatile unsigned int*)(0x80a0000 + 0x14);
+    while(*p & 4)
+    {
+        *p = *p & ~2;
+    }
+}

/* Reset the virtual offset for the platform timer to 0 */
void platform_init(void)
{
+    wakeup_pe();
    reset_cntvoff();
}

And the I added the gic-version=3 to the simulator configuration file, to make simulator start qemu arm virt with gicv3.

// tools/seL4/cmake-tool/helpers/simulation.cmake:150
            set(sim_machine "virt,gic-version=3")

Last I modified the plat config file in seL4 kernel.

// kernel/src/plat/qemu-arm-virt/config.cmake:

            if(NOT DEFINED QEMU_MACHINE)
                set(QEMU_MACHINE "virt")
+                list(APPEND QEMU_MACHINE "gic-version=3")
......

    declare_default_headers(
        TIMER_FREQUENCY 62500000
        MAX_IRQ 159
        NUM_PPI 32
        TIMER drivers/timer/arm_generic.h
-        INTERRUPT_CONTROLLER arch/machine/gic_v2.h
+       INTERRUPT_CONTROLLER arch/machine/gic_v3.h
        CLK_MAGIC 4611686019llu
        CLK_SHIFT 58u
        KERNEL_WCET 10u
    )

endif()

add_sources(
    DEP "KernelPlatformQEMUArmVirt"
-     CFILES src/arch/arm/machine/gic_v3.c src/arch/arm/machine/l2c_nop.c
+    CFILES src/arch/arm/machine/gic_v3.c src/arch/arm/machine/l2c_nop.c
)

Now compile and run:

$ ../init-build.sh -DPLATFORM=qemu-arm-virt && ninja
$ ./simulate
$ ./simulate: QEMU command: qemu-system-aarch64 -machine virt,gic-version=3 -cpu cortex-a53 -nographic  -m size=2G  -kernel images/sel4test-driver-image-arm-qemu-arm-virt
........
$ Test FPU0000 passed
$ Starting test 36: FPU0001
$ Running test FPU0001 (Ensure multiple threads can use FPU simultaneously)
# got stuck here

So can some one tells me did I miss some points or are there some bugs in qemu with gicv3? Thanks!

I used gdb to find the points:

Breakpoint 6, fpu_worker (p1=268502608, p2=65536, p3=0, p4=0) at /host/projects/sel4test/apps/sel4test-tests/src/tests/fpu.c:78
78          return num_preemptions;
(gdb) c
Continuing.

Breakpoint 6, fpu_worker (p1=268502600, p2=65536, p3=0, p4=0) at /host/projects/sel4test/apps/sel4test-tests/src/tests/fpu.c:78
78          return num_preemptions;
(gdb) p num_preemptions
$12 = 0
(gdb) 

Here may be always 0 returned from the fpu_worker() threads which means it was never preempted.
Here is the code:

// projects/sel4test/apps/sel4test-tests/src/tests/fpu.c:
static int
fpu_worker(seL4_Word p1, seL4_Word p2, seL4_Word p3, seL4_Word p4)
{
    volatile double *state = (volatile double *)p1;
    int num_iterations = p2;
    static volatile int preemption_counter = 0;
    int num_preemptions = 0;

    while (num_iterations >= 0) {
        int counter_init = preemption_counter;

        /* Do some random calculation (where we know the result). */
        double a = fpu_calculation();

        /* It's workaround to solve precision discrepancy when comparing
         * floating value in FPU from different sources */
        *state = a;
        a = *state;

        /* Determine if we were preempted mid-calculation. */
        if (counter_init != preemption_counter) {
            num_preemptions++;
        }
        preemption_counter++;

        num_iterations--;
    }

    return num_preemptions;
}

So, why the result came up while gicv3 enabled but not with gicv2, How to solve it?

I just had the same with an ODROID_XU4_release_clang_32 sel4test run, Odroid uses GICv2, so it’s not GICv3 related, other than that the timing might be slightly different with GICv3 making it easier to trigger the bug.

Someone else had the same problem: morello: Add initial support for a CPU and a QEMU platform by heshamelmatary · Pull Request #1157 · seL4/seL4 · GitHub

handleFPUFault() runs without the kernel lock held, but that seems okay, as it only touches core specific data and the currently running task.

Edit: Hesham found out that it’s because we don’t get any timer interrupts and opened seL4 doesn't boot with GICv3 on QEMU · Issue #1170 · seL4/seL4 · GitHub to track this issue. That explains why you saw zero pre-emptions.

Odroid failing is unrelated to this. But that board has some other issues too, so it could be either HW issues or the clang locking bug.