Forward vs. Backward Simulation in the Refinement Proof

I have heard refinement, forward simulation, and backward simulation all mentioned in the broader context of semantic preservation between two processes. The distinction confuses me a bit.

For example, in A formally verified compiler back-end (about CompCert) Leroy equates refinement to backward simulation: “To account for this degree of freedom,
we can consider a backward simulation, or refinement, property:” CompCert eventually chose forward simulation for their proof, but is it considered a refinement in this case?

The TLA+ community also frequently uses refinement in proofs, but it appears to me that they generally use backward simulation based on the refinement section in Leslie Lamport’s Computation and State Machines paper and Leroy’s definition above. I say this because their correctness statements are generally of the form Implementation => Specification instead of the other way around.

In the sel4 refinement paper, forward simulation is shown to imply refinement, and that is what’s chosen for the correctness statement. So my question is, why was forward simulation chosen for the refinement statement in sel4 vs. other kinds of simulation? And if possible, are there discrepancies between any of these definitions in the verification community?

For background, I’m working on a certifying compiler that’s being built around the notion of semantic preservation. When speaking about it, I’ve been finding myself getting confused between the differences in properties of simulations vs. refinement. I chose forward simulation as the correctness property to generate during compilation, inspired by sel4, Cogent, and CompCert, but I want to make sure that I’m both talking about and implementing it correctly.

The terminology around simulations is unfortunately a bit of a mess.

Firstly, equating refinement with simulation is a bit fast and loose. Refinement is not the same as simulation. Refinement is implied by certain forms of simulations, it may also be equivalent to some, depending on the details. Refinement at least is fairly uncontroversial in terminology: it is behaviour inclusion, or equivalently (in the other direction) property preservation. Simulations are more proof techniques to establish that, and there are a lot of different kinds.

Forward simulation implies refinement in a very straightforward way, and if your concrete execution is reasonably easy to work with, can be shown directly for specific programs/specs. For seL4 that was the case.

For compilers, the concrete execution is assembly and it usually not very nice to work with, whereas the abstract execution is often much nicer, e.g. in a language you can do induction on etc. and IIRC, the simulation in CompCert does indeed go from source program execution to assembly program execution.

Without further reasoning, this would be the wrong direction, because you do not get property preservation from it (properties about source programs would not imply properties about target programs). But if both sides are deterministic and terminate equivalently, you do get property preservation, and backward simulation is equivalent to forward simulation.
Since Compcert uses co-induction to talk about potentially non-terminating execution, it has an argument about equivalence of termination between source and assembly program. Since under those conditions forward simulation and backward simulation are the same strength, and backward simulation is much nicer to work with in a compiler setting, that is the one to use.

That’s basically what I’m finding, glad to hear it’s not just me. I’ve been working my way through this paper as well which seems to at least attempt to provide a categorization of them: Forward and Backward Simulations Part I: Untimed Systems. Which, interestingly, does include refinement as a topic.

I’m actually working on a source-to-source compiler and not compiling to something low level like assembly, which sounds similar to the sel4 and Cogent use cases. That’s why I chose forward simulation as the correctness property, but I’m still trying to build a deeper intuition about the tradeoffs of each kind. The info here is super helpful towards that of course. Thanks.