Memory Barriers

Why Memory Barriers?

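Because the CPU, the compiler, or both may reorder instructions relative to the order in which they appear in the program. Modern processors and compilers reorder instructions all the time as an optimization. Within a single thread the observed effects of loads and stores stay consistent with program order. Other threads, however, may observe those memory operations in a different order, and memory barriers exist to constrain that.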

Sequential Consistency

A system is sequentially consistent if the result of any execution is the same as if the read and write operations of all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program [Lamport, 1979].
This means:

  • The instructions executed by a single CPU take effect in the order in which they were written.
  • All threads observe the loads and stores to any shared variable in the same single order.

Sequential consistency matters most in multi-threaded programs: when one thread changes a shared variable, the other threads should see that variable in a consistent and valid state.
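A minimal sketch of what this guarantee rules out, using the classic "store buffering" litmus test. The helper name count_both_zero_seq_cst and the variables x, y, r1 and r2 are made up for illustration. It relies on C++11 std::atomic operations defaulting to memory_order_seq_cst.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times and returns how often both threads read 0.
int count_both_zero_seq_cst(int iterations) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1);      // default order is memory_order_seq_cst
            r1 = y.load();
        });
        std::thread t2([&] {
            y.store(1);
            r2 = x.load();
        });
        t1.join();           // join() makes r1 and r2 safe to read here
        t2.join();

        // In a single total order one of the two stores comes first, so the
        // load that follows the other store must see 1.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero_seq_cst(1000) should return 0 on a conforming implementation. The outcomes (0, 1), (1, 0) and (1, 1) are all allowed, because each corresponds to some interleaving of the two threads.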

Total Store Ordering

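Modern processors have a store buffer, which holds updates to memory locations before they reach the cache and main memory. Updates do not go to main memory directly because writes to main memory are very slow, and a later load of the same address on the same core can be served straight from the store buffer (store-to-load forwarding). Under Total Store Ordering (TSO), a core's stores become visible to other cores in program order, but a store still waiting in the buffer can be overtaken by a later load from a different address.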

Architectures with strong store ordering guarantees: x86, SPARC (TSO).
Architectures with weak store ordering guarantees: ARM, POWER, Alpha, etc.
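The one reordering a store buffer allows even on strongly ordered hardware is a store followed by a load from a different address. The sketch below, with the hypothetical helper count_both_zero_with_fence, uses relaxed atomics so that nothing but the explicit fence orders the two operations. On x86, compilers typically implement a memory_order_seq_cst fence with mfence or a locked instruction, though that depends on the toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: each thread stores to one variable, then loads the
// other. Returns how often both loads returned 0.
int count_both_zero_with_fence(int iterations) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            // Full fence: the store above may not be reordered with the
            // load below.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With both fences in place the function should always return 0. If the two fence lines are removed, the C++ memory model permits r1 == 0 and r2 == 0 together, and store buffering can produce that result even on x86.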

Types of Barriers:

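  • Compiler Barriers
    • These prevent the compiler from reordering instructions. They emit no CPU instruction.
    • Modern C++ (C++11) introduced std::atomic_signal_fence(), which acts as a compiler-only barrier.
    • std::atomic_thread_fence() goes further: it constrains the compiler and, where the target needs it, also emits a hardware barrier. Both functions take a memory order argument: memory_order_relaxed (which has no effect for a fence), memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel or memory_order_seq_cst.
  • Mandatory Barriers (Hardware)
    • Example instructions on x86 are mfence (full barrier), lfence (load barrier) and sfence (store barrier).
    • General barriers, which order both loads and stores
    • Read and write barriers, which order only loads or only stores
  • SMP (Symmetric Multiprocessing) Conditional Barriers
    • These are used mainly in the Linux kernel (for example smp_mb(), smp_rmb() and smp_wmb()), where they reduce to compiler barriers on uniprocessor builds.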
  • Acquire/Release Barriers
  • Data Dependency Barriers
  • Device (I/O) Barriers
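As a rough illustration of the acquire/release entry above, here is a sketch of message passing built on std::atomic_thread_fence(). The function name run_message_passing and the variables payload, ready and seen are invented for this example. The flag itself uses relaxed operations, so the two fences do all the ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: passes one value from a producer thread to a
// consumer thread and returns what the consumer saw.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = 0;

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the read of payload below cannot be moved above
        // the flag load that precedes the fence.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    std::thread producer([&] {
        payload = 42;
        // Release fence: the write to payload above cannot be moved below
        // the flag store that follows the fence.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

The release fence synchronizes with the acquire fence once the consumer reads true from the flag, so the consumer is guaranteed to see payload == 42. Without the two fences, the accesses to payload would be a data race.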

 

 

Author: Saurabh Purnaye

Sr. Developer @NYSE @ICE_Markets. Low Latency Trading, Linux, C++, Python, Ruby. pursuing certificate in #QuantFinance and Passed CFA L1
