Assertion failure in create_initial_thread during kernel boot on new platform

eleanor · August 2, 2023, 10:29pm

I’m trying to get seL4 running on an NVIDIA Jetson Nano and trying to get the stock sel4test project running as a first step. The Nano is very similar to the TX1 (they use the same TRM but the Linux device trees appear to be slightly different). I have the same issue (described below) whether I use the tx1 platform or the new nvnano platform I created (following the steps in the porting instructions and using a new dts and the existing tx1 implementation as a template).

The boot process makes it through the elfloader, and control is successfully passed off to the kernel, but during the execution of create_initial_thread, the following function call leads to an assertion failure:

cteInsert(
        root_cnode_cap,
        SLOT_PTR(pptr_of_cap(root_cnode_cap), seL4_CapInitThreadCNode),
        SLOT_PTR(rootserver.tcb, tcbCTable)
);

The assertion triggered is in cteInsert in cnode.c:

/* Haskell error: "cteInsert to non-empty destination" */
assert(cap_get_capType(destSlot->cap) == cap_null_cap);

destSlot is the third argument to the function, so rootserver.tcb, which is initialized earlier in create_rootserver_objects.

I think I’m working with something that’s close to a supported platform for which verification is in progress (AARCH64 via the TX2), so I’m guessing this is signalling that something is still wrong with my platform configuration, but I’m running low on ideas about what that could be since this is still very early startup, so I thought I’d see if this is something anyone else has seen before, or if there are any suggestions of areas to look at to try and track down what could be setting that memory unexpectedly.

Here’s the startup debug serial output from U-Boot until the failure:

U-Boot 2020.04-g46e4604c78 (Jul 26 2021 - 12:09:42 -0700)

SoC: tegra210
Model: NVIDIA Jetson Nano 2GB Developer Kit
Board: NVIDIA P3541-0000
DRAM:  2 GiB
MMC:   sdhci@700b0000: 1, sdhci@700b0600: 0
Loading Environment from SPI Flash... SF: Detected mx25u3235f with page size 256 Bytes, erase size 4 KiB, total 4 MiB
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0 
Tegra210 (P3541-0000) # fatload mmc 1 0x82000000 sel4test-driver-image-arm-nvnano
5210384 bytes read in 249 ms (20 MiB/s)
Tegra210 (P3541-0000) # go 0x82000000
## Starting application at 0x82000000 ...

ELF-loader started on CPU: ARM Ltd. Cortex-A57 r1p1
  paddr=[80a57000..80f4f10f]
No DTB passed in from boot loader.
Looking for DTB in CPIO archive...found at 80b7f900.
Loaded DTB from 80b7f900.
   paddr=[8023f000..80253fff]
ELF-loading image 'kernel' to 80000000
  paddr=[80000000..8023efff]
  vaddr=[ffffff8080000000..ffffff808023efff]
  virt_entry=ffffff8080000000
ELF-loading image 'sel4test-driver' to 80254000
  paddr=[80254000..8065afff]
  vaddr=[400000..806fff]
  virt_entry=40f0b8
Enabling MMU and paging
Jumping to kernel-image entry point...

Bootstrapping kernel
seL4 failed assertion 'cap_get_capType(destSlot->cap) == cap_null_cap' at /home/eleanor/work/kernel/src/object/cnode.c:426 in function cteInsert
halting...
Kernel entry via Interrupt, irq 0

kent-mcleod2 · August 2, 2023, 10:51pm

Hi,

I had this exact same problem when I was testing out a Jetson Nano. I didn’t get to keep working on the platform beyond getting sel4test passing, so here is my diff for what I needed to change to a tx1 platform to boot on a Jetson Nano board:

I recall KernelAArch64SErrorIgnore was needed to ignore an SError generated shortly after the sel4test app booted (Not sure if it’s required though)
Changing the memory layout definition in overlay-tx1.dts: The kernel assert is generated because the kernel uses the top of available physical memory for creating initial system memory allocations. The Jetson Nano either has no memory, or a reserved region above 0x1_7f200000 and so the available memory for kernel allocations needs to stop there. (This can be confirmed in a jetson nano device tree)

diff --git a/src/plat/tx1/config.cmake b/src/plat/tx1/config.cmake
index e311b23be..4a297009a 100644
--- a/src/plat/tx1/config.cmake
+++ b/src/plat/tx1/config.cmake
@@ -12,6 +12,7 @@ if(KernelPlatformTx1)
     declare_seL4_arch(aarch64)
     set(KernelArmCortexA57 ON)
     set(KernelArchArmV8a ON)
+    set(KernelAArch64SErrorIgnore ON)
     config_set(KernelARMPlatform ARM_PLAT tx1)
     config_set(KernelArmMach MACH "nvidia")
     list(APPEND KernelDTSList "tools/dts/tx1.dts")
diff --git a/src/plat/tx1/overlay-tx1.dts b/src/plat/tx1/overlay-tx1.dts
index 944f13cbc..60c6485e0 100644
--- a/src/plat/tx1/overlay-tx1.dts
+++ b/src/plat/tx1/overlay-tx1.dts
@@ -32,7 +32,7 @@
         * if something strange happens.
         */
        memory {
-               reg = <0x00 0x80000000 0x00 0x7f000000>,
-                               <0x01 0x00000000 0x00 0x80000000>;
+               reg = <0x00000000 0x80000000 0x00000000 0x7ee00000>,
+                               <0x00000001 0x00000000 0x00000000 0x7f200000>;
        };
 };

eleanor · August 3, 2023, 10:20pm

Thanks for the help. That was the issue and I managed to get the kernel to boot on the Nano with that change and by removing the second memory region in the overlay because I’m working on a 2GiB Nano. For the first region, it definitely needs to end before the 0xff000000 that’s in the tx1 overlay, and the 0xfee00000 you’re using works.

Oddly, using the actual Jetson Nano dts from the Linux kernel that I brought in using update-dts.sh to make a new PLATFORM configuration still fails, even with the change to the overlay. I’m not sure why that is yet. I checked in both the generated Linux kernel dts and a decompiled dts from the the dtb that NVIDIA ships with their Tegra4Linux distribution to verify the specific carveouts and I can’t find references to any of these carveouts, but this is definitely progress. Thanks again!

kent-mcleod2 · August 10, 2023, 6:16am

I sometimes find that I need to dump the booted device tree (dtc -O dts -I fs /proc/device-tree) due to overlays applied during system initialization.

Topic		Replies	Views
seL4bench test getting stuck at ((init_arm_kernel_t)kernel_info.virt_entry) seL4 kernel arm	4	66	January 20, 2025
Issue with seL4 & ARMv8 config for MMU seL4 kernel arm	29	141	February 28, 2025
invalidateLocalTLB() is getting stucked in sel4 kerenl New to seL4 arm	0	31	September 1, 2024
seL4tests on Zynq7000 failing New to seL4 arm	6	244	February 3, 2023
Adding rockpro64 to sel4cp hello world example New to seL4 arm	2	176	August 30, 2023

Assertion failure in create_initial_thread during kernel boot on new platform

Related topics