Both options are correct, but they are not quite equivalent, because a stand-alone fence applies more broadly. For what you want to accomplish they are equivalent, but a stand-alone acquire fence orders every atomic load that precedes it, not just the load of your flag, so it could technically affect other code as well (imagine this function being inlined into a caller that does other relaxed loads). An example of how a stand-alone fence differs from acquire/release semantics attached to a single load or store is explained in this post by Jeff Preshing.
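To make the "broader applicability" concrete, here is a minimal sketch under an assumed setup with two hypothetical flags (flag_a, flag_b) and two producers, none of which come from the question. The consumer spins on both flags with relaxed loads and then issues a single acquire fence. That one fence synchronizes with each release store observed by a preceding relaxed load, so both payloads are safely visible. An acquire load, by contrast, only provides ordering for that particular load.

```cpp
#include <atomic>
#include <thread>

// Hypothetical setup: two independent producers, each publishing its own
// plain int through its own flag.
std::atomic<bool> flag_a{false};
std::atomic<bool> flag_b{false};
int data_a = 0;
int data_b = 0;

// Spin on both flags with relaxed loads, then issue ONE stand-alone acquire
// fence. The fence synchronizes with every release store whose value was
// read by a relaxed load sequenced before it, so both payloads are visible.
int wait_for_both() {
    while (!flag_a.load(std::memory_order_relaxed)) { }
    while (!flag_b.load(std::memory_order_relaxed)) { }
    std::atomic_thread_fence(std::memory_order_acquire);
    return data_a + data_b;  // no data race on either payload
}

int run_two_producers() {
    std::thread pa([] {
        data_a = 1;                                     // write data first
        flag_a.store(true, std::memory_order_release);  // then publish it
    });
    std::thread pb([] {
        data_b = 41;
        flag_b.store(true, std::memory_order_release);
    });
    int sum = wait_for_both();
    pa.join();
    pb.join();
    return sum;
}
```

If the fence were replaced by an acquire load of flag_a alone (with flag_b still loaded relaxed), reading data_b would be a data race; the fence is what covers both loads.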
The check-then-fence pattern in option #2 doesn't have a name as far as I know, though it's not uncommon.
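For reference, here is a sketch of the two options as I understand them. The question's exact code isn't reproduced here, so the bodies of poll1/poll2, the ready flag, the payload variable and the publish_and_wait helper are all illustrative assumptions. poll1 uses an acquire load on every iteration; poll2 spins with relaxed loads and issues one stand-alone acquire fence after the flag is seen.

```cpp
#include <atomic>
#include <thread>

// Hypothetical reconstruction of the two options discussed above.
std::atomic<bool> ready{false};
int payload = 0;  // plain, non-atomic data published through `ready`

// Option #1: every load in the spin loop is an acquire load.
void poll1() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin
    }
}

// Option #2 (check-then-fence): spin with relaxed loads, then issue a single
// stand-alone acquire fence once the flag has been observed as true.
void poll2() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin
    }
    std::atomic_thread_fence(std::memory_order_acquire);
}

// Hypothetical helper: publishes `payload` from another thread, waits with
// the given poller and returns the value this thread observes afterwards.
int publish_and_wait(void (*poll)()) {
    ready.store(false, std::memory_order_relaxed);
    payload = 0;
    std::thread producer([] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });
    poll();              // synchronizes with the release store above
    int seen = payload;  // 42 is guaranteed with either poller
    producer.join();
    return seen;
}
```

Both pollers give the same guarantee to the caller; the difference is only in how much ordering is requested per loop iteration.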
In terms of performance, with my g++ 4.8.1 on x64 (Linux), the assembly generated for both options boils down to a single load instruction. This is hardly surprising: on x86(-64), ordinary loads already have acquire semantics and ordinary stores already have release semantics at the hardware level (x86 is known for its fairly strong memory model), so no extra fence instruction is needed.
On ARM, though, memory barriers compile down to actual individual instructions, and the following output is produced (using gcc.godbolt.org with -O3 -DNDEBUG):
You can see that the only difference is where the synchronization instruction (dmb) is placed: inside the loop for poll1, so it executes on every iteration, and after the loop for poll2, so it executes only once. So poll2 really is more efficient in this real-world case :-)
Asked in February 2016.