Device Region Untypeds

When reviewing the untyped regions that were created for my platform I discovered that there was a considerable amount of fragmentation.

Following is some debug output from my initial thread that is printing the bootinfo data structure.

The physical RAM ends at 0x00000000a0000000 - 1. So we expect that the kernel provides untyped memory caps to cover the region from 0x00000000a0000000 to the end of the 40-bit physical address space (0x0000010000000000). But, we don’t get there! Instead we get progressively closer, but end up at 0x000000fffffffff0, and the size of the untyped is 4-bits (which is not actually useful given the minimum page size is 12-bits.

I could be missing something here and this is intentional. But I think this is incorrect, and the problem stems from the definition of KernelPaddrUserTop in config.cmake.

For ARM this is defined as:

   math(EXPR KernelPaddrUserTop "(1 << 40) - 1")

(Or similar depending on the size). I think the - 1 is the problem. KernelPaddrUserTop should be boundary exclusive not inclusive. X86 seems to get this correct, but ARM and RISC-V has the -1.

If I change this then my untyped list is much more reasonable. (See second example output below).
Only 10 regions are required, not 44. And the bits never get below 29.

The reason for getting down to 4-bit regions is that create_untypeds_for_region is used for both normal memory and device memory. For normal memory it can make sense to have a region as small as seL4_MinUntypedBits but that doesn’t make sense for device memory.

I think the check in this code:

    if (size_bits >= seL4_MinUntypedBits) {
        if (!provide_untyped_cap(root_cnode_cap, device_memory, reg.start, size_bits, first_untyped_slot)) {
            return false;
        }
    }

Should be against the minimum page size in the case where device_memory is true.

I’d appreciate any feedback on this before proposing patches / pull requests. It’s possible that I’m missing something here and this is intentional for some reason I don’t grok at this point.

untypedList[0x00000014]        = slot: 0x0000002c, paddr: 0x00000000a0000000 - 0x00000000c0000000 (device) bits: 0x0000001d
untypedList[0x00000015]        = slot: 0x0000002d, paddr: 0x00000000c0000000 - 0x0000000100000000 (device) bits: 0x0000001e
untypedList[0x00000016]        = slot: 0x0000002e, paddr: 0x0000000100000000 - 0x0000000200000000 (device) bits: 0x00000020
untypedList[0x00000017]        = slot: 0x0000002f, paddr: 0x0000000200000000 - 0x0000000400000000 (device) bits: 0x00000021
untypedList[0x00000018]        = slot: 0x00000030, paddr: 0x0000000400000000 - 0x0000000800000000 (device) bits: 0x00000022
untypedList[0x00000019]        = slot: 0x00000031, paddr: 0x0000000800000000 - 0x0000001000000000 (device) bits: 0x00000023
untypedList[0x0000001a]        = slot: 0x00000032, paddr: 0x0000001000000000 - 0x0000002000000000 (device) bits: 0x00000024
untypedList[0x0000001b]        = slot: 0x00000033, paddr: 0x0000002000000000 - 0x0000004000000000 (device) bits: 0x00000025
untypedList[0x0000001c]        = slot: 0x00000034, paddr: 0x0000004000000000 - 0x0000008000000000 (device) bits: 0x00000026
untypedList[0x0000001d]        = slot: 0x00000035, paddr: 0x0000008000000000 - 0x000000c000000000 (device) bits: 0x00000026
untypedList[0x0000001e]        = slot: 0x00000036, paddr: 0x000000c000000000 - 0x000000e000000000 (device) bits: 0x00000025
untypedList[0x0000001f]        = slot: 0x00000037, paddr: 0x000000e000000000 - 0x000000f000000000 (device) bits: 0x00000024
untypedList[0x00000020]        = slot: 0x00000038, paddr: 0x000000f000000000 - 0x000000f800000000 (device) bits: 0x00000023
untypedList[0x00000021]        = slot: 0x00000039, paddr: 0x000000f800000000 - 0x000000fc00000000 (device) bits: 0x00000022
untypedList[0x00000022]        = slot: 0x0000003a, paddr: 0x000000fc00000000 - 0x000000fe00000000 (device) bits: 0x00000021
untypedList[0x00000023]        = slot: 0x0000003b, paddr: 0x000000fe00000000 - 0x000000ff00000000 (device) bits: 0x00000020
untypedList[0x00000024]        = slot: 0x0000003c, paddr: 0x000000ff00000000 - 0x000000ff80000000 (device) bits: 0x0000001f
untypedList[0x00000025]        = slot: 0x0000003d, paddr: 0x000000ff80000000 - 0x000000ffc0000000 (device) bits: 0x0000001e
untypedList[0x00000026]        = slot: 0x0000003e, paddr: 0x000000ffc0000000 - 0x000000ffe0000000 (device) bits: 0x0000001d
untypedList[0x00000027]        = slot: 0x0000003f, paddr: 0x000000ffe0000000 - 0x000000fff0000000 (device) bits: 0x0000001c
untypedList[0x00000028]        = slot: 0x00000040, paddr: 0x000000fff0000000 - 0x000000fff8000000 (device) bits: 0x0000001b
untypedList[0x00000029]        = slot: 0x00000041, paddr: 0x000000fff8000000 - 0x000000fffc000000 (device) bits: 0x0000001a
untypedList[0x0000002a]        = slot: 0x00000042, paddr: 0x000000fffc000000 - 0x000000fffe000000 (device) bits: 0x00000019
untypedList[0x0000002b]        = slot: 0x00000043, paddr: 0x000000fffe000000 - 0x000000ffff000000 (device) bits: 0x00000018
untypedList[0x0000002c]        = slot: 0x00000044, paddr: 0x000000ffff000000 - 0x000000ffff800000 (device) bits: 0x00000017
untypedList[0x0000002d]        = slot: 0x00000045, paddr: 0x000000ffff800000 - 0x000000ffffc00000 (device) bits: 0x00000016
untypedList[0x0000002e]        = slot: 0x00000046, paddr: 0x000000ffffc00000 - 0x000000ffffe00000 (device) bits: 0x00000015
untypedList[0x0000002f]        = slot: 0x00000047, paddr: 0x000000ffffe00000 - 0x000000fffff00000 (device) bits: 0x00000014
untypedList[0x00000030]        = slot: 0x00000048, paddr: 0x000000fffff00000 - 0x000000fffff80000 (device) bits: 0x00000013
untypedList[0x00000031]        = slot: 0x00000049, paddr: 0x000000fffff80000 - 0x000000fffffc0000 (device) bits: 0x00000012
untypedList[0x00000032]        = slot: 0x0000004a, paddr: 0x000000fffffc0000 - 0x000000fffffe0000 (device) bits: 0x00000011
untypedList[0x00000033]        = slot: 0x0000004b, paddr: 0x000000fffffe0000 - 0x000000ffffff0000 (device) bits: 0x00000010
untypedList[0x00000034]        = slot: 0x0000004c, paddr: 0x000000ffffff0000 - 0x000000ffffff8000 (device) bits: 0x0000000f
untypedList[0x00000035]        = slot: 0x0000004d, paddr: 0x000000ffffff8000 - 0x000000ffffffc000 (device) bits: 0x0000000e
untypedList[0x00000036]        = slot: 0x0000004e, paddr: 0x000000ffffffc000 - 0x000000ffffffe000 (device) bits: 0x0000000d
untypedList[0x00000037]        = slot: 0x0000004f, paddr: 0x000000ffffffe000 - 0x000000fffffff000 (device) bits: 0x0000000c
untypedList[0x00000038]        = slot: 0x00000050, paddr: 0x000000fffffff000 - 0x000000fffffff800 (device) bits: 0x0000000b
untypedList[0x00000039]        = slot: 0x00000051, paddr: 0x000000fffffff800 - 0x000000fffffffc00 (device) bits: 0x0000000a
untypedList[0x0000003a]        = slot: 0x00000052, paddr: 0x000000fffffffc00 - 0x000000fffffffe00 (device) bits: 0x00000009
untypedList[0x0000003b]        = slot: 0x00000053, paddr: 0x000000fffffffe00 - 0x000000ffffffff00 (device) bits: 0x00000008
untypedList[0x0000003c]        = slot: 0x00000054, paddr: 0x000000ffffffff00 - 0x000000ffffffff80 (device) bits: 0x00000007
untypedList[0x0000003d]        = slot: 0x00000055, paddr: 0x000000ffffffff80 - 0x000000ffffffffc0 (device) bits: 0x00000006
untypedList[0x0000003e]        = slot: 0x00000056, paddr: 0x000000ffffffffc0 - 0x000000ffffffffe0 (device) bits: 0x00000005
untypedList[0x0000003f]        = slot: 0x00000057, paddr: 0x000000ffffffffe0 - 0x000000fffffffff0 (device) bits: 0x00000004

With fix:

untypedList[0x00000014]        = slot: 0x0000002c, paddr: 0x00000000a0000000 - 0x00000000c0000000 (device) bits: 0x0000001d
untypedList[0x00000015]        = slot: 0x0000002d, paddr: 0x00000000c0000000 - 0x0000000100000000 (device) bits: 0x0000001e
untypedList[0x00000016]        = slot: 0x0000002e, paddr: 0x0000000100000000 - 0x0000000200000000 (device) bits: 0x00000020
untypedList[0x00000017]        = slot: 0x0000002f, paddr: 0x0000000200000000 - 0x0000000400000000 (device) bits: 0x00000021
untypedList[0x00000018]        = slot: 0x00000030, paddr: 0x0000000400000000 - 0x0000000800000000 (device) bits: 0x00000022
untypedList[0x00000019]        = slot: 0x00000031, paddr: 0x0000000800000000 - 0x0000001000000000 (device) bits: 0x00000023
untypedList[0x0000001a]        = slot: 0x00000032, paddr: 0x0000001000000000 - 0x0000002000000000 (device) bits: 0x00000024
untypedList[0x0000001b]        = slot: 0x00000033, paddr: 0x0000002000000000 - 0x0000004000000000 (device) bits: 0x00000025
untypedList[0x0000001c]        = slot: 0x00000034, paddr: 0x0000004000000000 - 0x0000008000000000 (device) bits: 0x00000026
untypedList[0x0000001d]        = slot: 0x00000035, paddr: 0x0000008000000000 - 0x0000010000000000 (device) bits: 0x00000027

I think that the - 1 is an artifact from 32-bit architectures where 2^32 would overflow and so UINTPTR_MAX ended up being used as the limit. So that plus other associated overflow issues resulting in the top most address becoming unaddressible and fragmenting the untypeds as this address is approached. Thinking about it now, I wonder if we could just switch to 64bit variables for 32bit and switch to boundary exclusive KernelPaddrUserTop definitions as you suggest. This would fit with seL4_UserTop definitions also.

This change makes sense to me too.

Switching to 64-bit variable fixes this. But shouldn’t we better make the code cope with the “-1” ,as the same problem might happen again at the end of the 64-bit space also, if some (future) platform puts memory or devices there.

I think I would prefer that. Changing all the proofs that deal with KernelPaddrUserTop sounds unpleasant (doable if really needed, but unpleasant). If we can instead change the logic in this loop to work correctly with an inclusive top address, the changes wouldn’t affect the proofs.

Yes, that would indeed make sense.

Just to be clear, x86_64 already has top as exclusive (or at least it appears that way).

It’s surprising to me (but I know nothing on proofs) that changing this could be impactful given it’s already that way on one architecture.

If the proposal is for changing it just for Arm 64bit platforms, then there is actually no impact, and I don’t really mind either way. If it changes Arm 32 or RISC-V, there there would be impact (because the C-level proofs are separate per architecture)

Likely just overlooked in refactoring. The history here is that the arm and x86 code were completely unique (different authors with no attempt at code reuse between architectures). We slowly pulled them back together where it made sense. Likely this define was never made consistent as the proofs do include this constant in address space code as well as boot code.

Thanks! The history is helpful

The 12.1.0 has fixed this on ARM/Aarch64. But on Aarch32 and RISC-V the problem still exists. I will provide more patches then, since I’m running into this at RISC-V now. It seems that for all architectures except ia32 this is a 2^n value, just ia32 has 0xffff0000 there. This make me wonder if ia32 shouldn’t get a special handling here in the code to somehow exclude the top 64k…?