Memory Barriers

Why Memory Barriers?

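Because the CPU, the compiler, or both can reorder instructions relative to the order in which they were written (program order). Modern processors and compilers reorder instructions all the time as an optimization. Within a single thread, the observed effects of loads and stores on memory locations stay consistent with program order; other threads, however, may observe those loads and stores in a different order unless a barrier prevents it.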

Sequential Consistency

A system is sequentially consistent if the result of any execution is the same as if the read and write operations of all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program [Lamport, 1979].
This means:

  • The instructions executed by one CPU take effect in the order in which they were written.
  • All threads observe the loads and stores to any shared variable in the same single order.

Sequential consistency is especially important in multi-threaded programs: when one thread changes a shared variable, the other threads should see that variable in a consistent and valid state.
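To make the definition concrete, here is a minimal sketch of the classic "store buffering" litmus test, written with C++11 atomics. The function names both_loads_saw_zero and count_violations are made up for illustration. Each thread stores to one variable and then loads the other; if every operation is sequentially consistent (memory_order_seq_cst), at least one thread must see the other thread's store, so the outcome r1 == 0 and r2 == 0 should never appear.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the store-buffering litmus test using seq_cst operations.
// Returns true only if BOTH threads read 0, an outcome that sequential
// consistency forbids.
bool both_loads_saw_zero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    // join() makes the writes to r1 and r2 visible to this thread.
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts the runs in which both loads returned 0.
int count_violations(int runs) {
    int violations = 0;
    for (int i = 0; i < runs; ++i) {
        if (both_loads_saw_zero()) {
            ++violations;
        }
    }
    return violations;
}
```

Calling count_violations(500) from main() should return 0. If the four operations were weakened to memory_order_relaxed, the language would allow both loads to return 0, although a given machine may or may not show that outcome in practice.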

Total Store Ordering

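Modern processors have a buffer, called the store buffer, where they hold updates to memory locations before those updates reach the cache and main memory. Updates do not go to main memory directly because writes to main memory are very slow, and a value sitting in the store buffer can be reused by later loads of the same location on the same CPU. Under total store ordering (TSO), a CPU's stores become visible to other CPUs in program order, but a later load may complete before an earlier store to a different address has left the store buffer.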

Architectures with strong store ordering guarantees: x86, SPARC (in TSO mode)
Architectures with weak store ordering guarantees: ARM, POWER, Alpha, etc.
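The practical consequence of store ordering shows up when one thread publishes data and then sets a flag. The following is a minimal sketch, assuming C++11 atomics; the function name message_passing and the value 42 are illustrative only. The release store and acquire load ask for exactly the ordering this pattern needs: the write to payload must become visible before the write to ready. On a strongly ordered CPU such as x86 the hardware typically keeps the two stores in order anyway, so the annotation mainly restrains the compiler; on weakly ordered CPUs the compiler typically has to emit a barrier or a special load/store instruction as well.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes a plain int through an atomic flag and returns the value the
// consumer thread observed.
int message_passing() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot be reordered after this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the read of payload cannot be reordered before this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Once the consumer sees ready == true, the release/acquire pair guarantees that it also sees payload == 42, so message_passing() returns 42 on both strongly and weakly ordered hardware.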

Types of Barriers:

  • Compiler Barriers
    • These prevent the compiler from reordering instructions; on their own they emit no CPU instruction.
    • Modern C++ (C++11) introduced the following fences:
    • std::atomic_signal_fence() is a compiler-only barrier. std::atomic_thread_fence() also constrains the compiler and additionally emits a hardware barrier where the target CPU needs one. Both take a memory order argument: memory_order_relaxed (which makes the fence a no-op), memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel or memory_order_seq_cst.
  • Mandatory Barriers (Hardware)
    • Example instructions on x86 are mfence, lfence and sfence.
    • General barriers
    • Read-Write barriers
  • SMP (Symmetric Multiprocessing) Conditional Barriers:
    • These are needed only on multiprocessor systems. Example: they are used mainly in the Linux kernel.
  • Acquire/Release Barriers
  • Data Dependency Barriers
  • Device (I/O) Barriers
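As an illustration of the acquire/release barriers listed above, here is a minimal sketch that uses standalone std::atomic_thread_fence() calls instead of attaching the ordering to the atomic operations themselves. The Quote struct, its fields and the function name publish_with_fences are hypothetical and exist only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload made of two plain (non-atomic) fields.
struct Quote {
    int bid;
    int ask;
};

// Publishes a Quote with a release fence and reads it after an acquire
// fence. Returns the spread (ask - bid) computed by the reader thread.
int publish_with_fences() {
    Quote q{0, 0};
    std::atomic<bool> ready{false};
    int spread = -1;

    std::thread writer([&] {
        q.bid = 100;
        q.ask = 101;
        // Release fence: the writes above cannot be reordered after the
        // relaxed store that follows.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the reads below cannot be reordered before the
        // relaxed load that observed ready == true.
        std::atomic_thread_fence(std::memory_order_acquire);
        spread = q.ask - q.bid;
    });
    writer.join();
    reader.join();
    return spread;
}
```

The release fence in the writer synchronizes with the acquire fence in the reader, so the reader sees both fields and publish_with_fences() returns 1. Replacing the two calls with std::atomic_signal_fence() would still stop the compiler from reordering, but it emits no hardware barrier, so it would not be enough to order the two threads.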



Author: Saurabh Purnaye

VP - Low Latency Developer @jpmchase... Linux, C++, Python, Ruby. pursuing certificate in #QuantFinance and Passed CFA L1
