Why does the CLH lock not use GCC builtins for ARM atomics?

Looking at the CLH big lock in the kernel, it uses sel4_atomic_exchange to add threads to the queue of cores. On x86 this uses GCC's __atomic_exchange_n; on ARM (AArch32 and AArch64), however, it is hardcoded to use the load/store-exclusive instructions.
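For context, here is a rough sketch of the two shapes I mean. The names are mine, not the kernel's; it is only meant to contrast the builtin form that the x86 path relies on with a hand-rolled load/store-exclusive exchange like the ARM path uses.

```c
#include <stdint.h>

/* Sketch: an atomic exchange via the GCC builtin (illustrative name). */
static inline uint64_t exchange_builtin(uint64_t *ptr, uint64_t newval)
{
    return __atomic_exchange_n(ptr, newval, __ATOMIC_ACQ_REL);
}

/* Sketch: roughly what a hand-rolled AArch64 exchange using
 * load/store-exclusive looks like. The store-exclusive can fail,
 * so the operation has to retry in a loop. */
static inline uint64_t exchange_ldxr_stxr(uint64_t *ptr, uint64_t newval)
{
    uint64_t oldval;
    uint32_t failed;
    do {
        asm volatile(
            "ldaxr %0, [%2]\n\t"   /* load-exclusive with acquire          */
            "stlxr %w1, %3, [%2]"  /* store-exclusive with release;
                                      a non-zero status means it failed    */
            : "=&r"(oldval), "=&r"(failed)
            : "r"(ptr), "r"(newval)
            : "memory");
    } while (failed);
    return oldval;
}
```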

I’m playing around with the lock on AArch64 and noticed that __atomic_exchange_n is implemented on my toolchain (GCC 7.4.0). Even better, if I set KernelArmMachFeatureModifiers to include +lse, it generates code that uses the ARMv8.1 Large System Extensions (LSE) instructions, which provide more performant atomics.
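A quick way to see the difference is to compile a trivial exchange with and without +lse and compare the generated assembly. The toolchain name and flags below are just an example from my setup:

```c
/* exchange.c -- compile twice and diff the output:
 *
 *   aarch64-linux-gnu-gcc -O2 -march=armv8-a     -S exchange.c
 *   aarch64-linux-gnu-gcc -O2 -march=armv8-a+lse -S exchange.c
 *
 * Without +lse the builtin expands to an ldaxr/stlxr retry loop;
 * with +lse it should become a single swpal instruction. */
#include <stdint.h>

uint64_t exchange(uint64_t *ptr, uint64_t newval)
{
    return __atomic_exchange_n(ptr, newval, __ATOMIC_ACQ_REL);
}
```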

I’m happy to put up a PR, but I’m curious why the ARM ports do not use __atomic_exchange_n. Any ideas? cc @amirreza.zarrabi.

It is due to the implementation of the exclusive operations on ARM: __atomic_exchange_n has to contain some form of loop so it can check the result of the store-exclusive and confirm that the atomic operation actually took effect. In our case, using __atomic_exchange_n can result in an unbounded delay (sometimes up to seconds), because the IPI flag is touched by other cores and can sit in the same exclusive reservation granule, which keeps clearing the reservation.
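Roughly, the failure mode looks like this. The field names are for illustration only, not the actual kernel structures:

```c
#include <stdint.h>
#include <stdbool.h>

/* On ARM the exclusive monitor tracks a reservation granule (its size is
 * implementation-defined and often as large as a cache line), not a single
 * word, so a write to ANY address in that granule clears the reservation. */
struct clh_node {
    uint64_t value;     /* word updated with load/store-exclusive          */
    bool     ipi_flag;  /* frequently written by other cores; may share
                           the same reservation granule as 'value'         */
};

/* If other cores keep writing ipi_flag, each write clears this core's
 * exclusive reservation on 'value', the store-exclusive inside the
 * builtin's retry loop keeps failing, and the loop can spin for an
 * unbounded time. */
static inline uint64_t exchange_in_shared_granule(struct clh_node *n,
                                                  uint64_t newval)
{
    return __atomic_exchange_n(&n->value, newval, __ATOMIC_ACQ_REL);
}
```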

Thanks for the response!

Is this true for AArch64 with LSE? Why is it not true for x86?

Load/store-exclusive can fail for various reasons, for instance interrupts. AArch64 with LSE uses the “swp” instruction, so my guess is that it will not have this issue.
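For comparison, this is roughly what the LSE form looks like; again an illustrative sketch, not kernel code. Because swpal is a single instruction, there is no status to check and nothing to retry, so it is not affected by other cores clearing an exclusive reservation. It does require an ARMv8.1+ core (and a toolchain built with +lse).

```c
#include <stdint.h>

/* Sketch: atomic exchange via the LSE swpal instruction. */
static inline uint64_t exchange_swp(uint64_t *ptr, uint64_t newval)
{
    uint64_t oldval;
    asm volatile("swpal %2, %0, [%1]"  /* store newval, return old value */
                 : "=&r"(oldval)
                 : "r"(ptr), "r"(newval)
                 : "memory");
    return oldval;
}
```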

x86 uses the “xchg” instruction.