Looking at the CLH big lock in the kernel, I see that it uses sel4_atomic_exchange to add threads to the queue of cores. On x86 this uses GCC’s __atomic_exchange_n; on Arm (aarch32 and aarch64), however, it is hardcoded to use the load/store exclusive instructions.
I’m playing around with the lock on aarch64 and noticed that __atomic_exchange_n is implemented on my toolchain (7.4.0). Even better, if I set KernelArmMachFeatureModifiers to include +lse, it generates code that uses the newer Armv8 Large System Extensions (LSE) instructions, which provide higher-performance atomics.
I’m happy to put up a PR, but I’m curious as to why Arm does not use __atomic_exchange_n. Any ideas? cc @amirreza.zarrabi.
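
For concreteness, here is a minimal sketch of what a builtin-based exchange could look like. It is not the kernel's actual code: clh_node_t and exchange_tail are hypothetical names made up for illustration, and the __ATOMIC_ACQ_REL ordering is an assumption rather than what sel4_atomic_exchange necessarily uses.

```c
/* Hypothetical queue node; not the kernel's actual CLH data structure. */
typedef struct clh_node {
    int locked;
} clh_node_t;

/*
 * Hypothetical wrapper around the GCC builtin: atomically install `node`
 * as the new queue tail and return the previous tail.
 * The memory order is an assumption chosen for this sketch.
 */
static inline clh_node_t *exchange_tail(clh_node_t **tail, clh_node_t *node)
{
    return __atomic_exchange_n(tail, node, __ATOMIC_ACQ_REL);
}
```

With GCC targeting aarch64, I would expect this to lower to a load/store exclusive loop by default, and to a single LSE swap instruction when the target includes +lse (for example -march=armv8-a+lse), so the same C source would cover both cases.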