Atomics & lock-free data structures c++

  • The modern microprocessor pipeline is 14 stages deep during which the programming instructions reordered all the times for optimization purposes.
  • Linux 2.6 supports full preemption, i.e. the thread can be suspended in between.
  • Atomic types are the locations in main memory to which the access is exclusive to a thread/process.
  • Barriers are used for ordering the accesses to the memory locations.
  • Atomic operations are provided at hardware level in order to make the operations indivisible.
  • The Implementation is highly dependent upon the hardware. x86 has strictest rules around memory ordering.
  • Atomic operations with memory fences prevents reordering of the instructions in order to make the operation indivisible.
  • Atomic operations are expensive because the OS and hardware can not do all the necessary optimizations.

<atomic> header provides various atomic types. Following is non-exhaustive list of atomic types

atomic_bool std::atomic<bool>
atomic_char std::atomic<char>
atomic_schar std::atomic<signed char>
atomic_uchar std::atomic<unsigned char>
atomic_int std::atomic<int>
atomic_uint std::atomic<unsigned>
atomic_short std::atomic<short>
atomic_ushort std::atomic<unsigned short>
atomic_long std::atomic<long>
atomic_ulong std::atomic<unsigned long>
atomic_llong std::atomic<long long>
atomic_ullong std::atomic<unsigned long long>
atomic_char16_t std::atomic<char16_t>
atomic_char32_t std::atomic<char32_t>
atomic_wchar_t std::atomic<wchar_t>

Operations on Atomic types

These operations take a argument for memory order. Which can be one of std::memory_order_relaxed, std::memory_order_acquire,  std::memory_order_release, std::memory_order_acq_rel,  std::memory_order_consume or  std::memory_order_seq_cst

  • load: this is read operation on an atomic type.
  • store: this is a write operation on an atomic type.
  • exchange: this is read-modify-write operation on an atomic type. All the compare operations are used as compare_exchange(expected, desired, <optional memory order>). On successful exchange it return true else it returns false.
    • compare_exchange
    • compare_exchange_weak: these are really for the architectures where the read-modify-write operation is not guaranteed atomic. It can generate spurious errors and it is advised to use in a loop. It has same effect as of compare_exchange_strong on x86 platform.
    • compare_exchange_strong: it is guaranteed to return false on failure and guaranteed to return true on success.
  • fetch_ versions of add, or etc
  • overriden operators like +=, -=, *=, |=

Lock Based implementation of a Multi-producer, Multi-consumer Queue.


It is important to see the lock based data structures before implementing lock-free data structures.

Code for Lock based queue.
std::queue is wrapped in a struct Queue. 
internal_queue variable maintains the queue.
Work to do is an instance of struct Work.
Queue has pop and push functions.
Push function takes the work instance by rvalue instance and 
pushes it onto the internal queue.
Pop function returns a shared pointer to Work. The shared pointer
 is used to avoid the ABA problem.
Pop function is a non blocking function.

If the queue is empty, the thread just yields.
A sleep is added in order to simulate slowness in the work..

Disadvantages of this kind of queue:
 * Whole Queue is locked during any operations in order to make 
sure there is synchronized access.

#include <queue> // std::queue
#include <mutex> // for locking
#include <thread>
#include <memory> // smart pointers
#include <utility> 
#include <functional> // passing and storing lambda 
#include <iostream> // for std::cout
#include <chrono>

static long work_number = 0;

struct Work{
 // take lambda as a work and call lambda to process.
  Work(std::function<void(void)> lambda):f(lambda){};
  void operator()(){
  std::function<void(void)> f; 

struct Queue{
  Queue() = default;
  // Queue is not copyable or copy constructible.
  Queue(const Queue &) = delete;
  Queue & operator=(const Queue &) = delete;
  std::shared_ptr<Work> pop();
  void push(Work &&);
  std::mutex mtx;
  std::queue<Work> internal_queue;

 void Queue::push(Work && work){
  std::lock_guard<std::mutex> lg(mtx);

 std::shared_ptr<Work> Queue::pop(){
  Work w([](){});
    std::lock_guard<std::mutex> lg(mtx);
    if(internal_queue.size() > 0){
      w = std::move(internal_queue.front());
     std::this_thread::yield(); // let the other threads work
 return std::make_shared<Work>(w);

struct Producer{
 Producer(Queue & q):q(q){};
 void produce(){
     Work w([&](){std::cout << "work number : " << ++work_number << " is called.." << "\n";});
 Queue & q;

struct Consumer{
  Consumer(Queue & q):q(q){};
   void consume(){
     std::shared_ptr<Work> w_ptr = q.pop();
     Work * w = w_ptr.get();
 Queue & q;

int main(){
 Queue q;
 std::thread producer_thread1([&q](){
   Producer p(q);

std::thread producer_thread2([&q](){
  Producer p(q);

std::thread consumer_thread1([&q](){
 Consumer c(q);

std::thread consumer_thread2([&q](){
 Consumer c(q);

std::thread consumer_thread3([&q](){
 Consumer c(q);

 return 0;

Lock Free Data Structures

In the above implementation, the data structure is a lock based one and hence no two threads can access is concurrently.

Author: Saurabh Purnaye

Sr. Developer @NYSE @ICE_Markets. Low Latency Trading, Linux, C++, Python, Ruby. pursuing certificate in #QuantFinance and Passed CFA L1

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s