Supercharged Computing

Featured

Linux Performance Optimization

3 main levels for tuning

CPU/BIOS tuning:
- Power
- P-states and C-states
- Hyper-threading
- Frequency/Turbo mode
- Governance
- Termal mode
  Further reading:
  https://community.mellanox.com/docs/DOC-2297
  https://community.mellanox.com/docs/DOC-2817
  https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
OS tuning:
- CPU/NUMA affinity
- IRQ Affinity
- Swapiness
- Scheduling
- HUGE pages
  Further Reading:
  Red Hat Enterprise Linux Network Performance Tuning Guide https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/index https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf
  https://community.mellanox.com/docs/DOC-2135
  https://community.mellanox.com/docs/DOC-2797
Network Tuning:
- TCP/IP tunables
- Kernel Bypass Technologies
  Further Reading
  Kernel-bypass networking for fun and profit
  https://www.youtube.com/watch?v=noqSZorzooc
  Intel PDPK https://www.intel.com/content/www/us/en/communications/data-plane-development-kit.html
  Fundamentals of RDMA: Free training courses on Mellanox Academy site: https://academy.mellanox.com/course/view.php?id=138

Configuration using tuned

tuned is available from RHEL7
Make changes in sysctl as well as /sys directory
there are profiles available for specific needs like network-latency, latency-performance, network-performance, throughput-performance, desktop and balanced etc

Screen Shot 2018-03-14 at 7.49.16 PM

Examples of the profiles

Screen Shot 2018-03-14 at 7.50.11 PM

Latency Performance Settings in Linux Tuned

Settings	Meaning
force_latency=1	processor state C1
governor=performance	CPU is at higher performance state
energy_perf_bias= performance	Higher performance state
min_perf_pct=100	comes from the P-state drivers. The interfaces provided by the cpufreq core for controlling frequency the driver provides sysfs files for controlling P state selection. These files have been added to /sys/devices/system/cpu/intel_pstate.
kernel.sched_min_granularity_ns=10000000	Minimal preemption granularity for CPU-bound tasks
vm.dirty_ratio=10	The generator of dirty data starts writeback at this percentage
vm.dirty_background_ratio=3	Start background writeback (via writeback threads) at this percentage
vm.swappiness=10	The swappiness parameter controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. 0 tells the kernel to avoid swapping processes out of physical memory for as long as possible 100 tells the kernel to aggressively swap processes out of physical memory and move them to swap cache
kernel.sched_migration_cost_ns=5000000	The total time the scheduler will consider a migrated process “cache hot” and thus less likely to be re-migrated
net.core.busy_read=50	This parameter controls the number of microseconds to wait for packets on the device queue for socket reads. It also sets the default value of the SO_BUSY_POLL option.
sysctl.net.core.busy_poll=50	This parameter controls the number of microseconds to wait for packets on the device queue for socket poll and selects
kernel.numa_balancing=0	disable NUMA balancing
net.ipv4.tcp_fastopen=3	Linux supports configuring both overall client and server support via /proc/sys/net/ipv4/tcp_fastopen (net.ipv4.tcp_fastopen via sysctl). The options are a bit mask, where the first bit enables or disables client support (default on), 2nd bit sets server support (default off), 3rd bit sets whether data in SYN packet is permitted without TFO cookie option. Therefore a value of 1 TFO can only be enabled on outgoing connections (client only), value 2 allows TFO only on listening sockets (server only), and value 3 enables TFO for both client and server.

More information about Tunables:

Dependent on NUMA:
- Reclaim Ratios
  - /proc/sys/vm/swappiness
  - /proc/sys/vm/min_free_kbytes
Independent of NUMA:
- Reclaim Ratios
  - /proc/sys/vm/vfs_cache_pressure
  - Writeback Parameters
    - /proc/sys/vm/dirty_background_ratio
    - /proc/sys/vm/dirty_ratio
  - Readahead parameters
    - /sys/block/queue/read_ahead_kb

top utility

available on all the machines
default tool for measurement that has lot of information.

Parameters important for Optimization

Swappiness

Controls how aggressively the system reclaims anonymous memory versus
page cache memory:
- Anonymous memory – swapping and freeing
- File pages – writing if dirty and freeing
- System V shared memory – swapping and freeing
Default is 60
If it is decreased: more aggressive reclaiming of page cache memory
If it is Increased: more aggressive swapping of anonymous memory
Should be set to 0 as per the low latency optimization guide http://wiki.dreamrunner.org/public_html/Low_Latency_Programming/low-latency-programming.html

Memory reclaim watermarks:

Screen Shot 2018-03-14 at 10.54.08 PM

zone_reclaim_mode

Controls NUMA specific memory allocation policy
When set and node memory is exhausted:
- Reclaim memory from local node rather than allocating
  from next node
- Slower initial allocation, higher NUMA hit ratio
When clear and node memory is exhausted:
- Allocate from all nodes before reclaiming memory
- Faster initial allocation, higher NUMA miss ratio
To see current setting: cat /proc/sys/vm/zone_reclaim_mode
Turn ON: echo 1 > /proc/sys/vm/zone_reclaim_mode
Turn OFF: echo 0 > /proc/sys/vm/zone_reclaim_mode
It is recommended that the settings should be off for large in memory database server.

CPU Tuning

p-states
- It’s an Advanced Configuration and Power Interface (ACPI) defined processor performance state.
- P0, P1, P2..p11 etc are values for P-states. For highest performance the p-states should be set to P0

c-states

It allow the CPU package to shut down cores and parts of the CPU microarchitecture to save energy while balancing response times.

C0, C1, C3, C6 etc are the values for c-states. Here is the chart for more information

Mode	Name	What it does	CPUs
C0	Operating State	CPU fully turned on	All CPUs
C1	Halt	Stops CPU main internal clocks via software; bus interface unit and APIC are kept running at full speed.	486DX4 and above
C1E	Enhanced Halt	Stops CPU main internal clocks via software and reduces CPU voltage; bus interface unit and APIC are kept running at full speed.	All socket LGA775 CPUs
C1E	—	Stops all CPU internal clocks.	Turion 64, 65-nm Athlon X2 and Phenom CPUs
C2	Stop Grant	Stops CPU main internal clocks via hardware; bus interface unit and APIC are kept running at full speed.	486DX4 and above
C2	Stop Clock	Stops CPU internal and external clocks via hardware	Only 486DX4, Pentium, Pentium MMX, K5, K6, K6-2, K6-III
C2E	Extended Stop Grant	Stops CPU main internal clocks via hardware and reduces CPU voltage; bus interface unit and APIC are kept running at full speed.	Core 2 Duo and above (Intel only)
C3	Sleep	Stops all CPU internal clocks	Pentium II, Athlon and above, but not on Core 2 Duo E4000 and E6000
C3	Deep Sleep	Stops all CPU internal and external clocks	Pentium II and above, but not on Core 2 Duo E4000 and E6000; Turion 64
C3	AltVID	Stops all CPU internal clocks and reduces CPU voltage	AMD Turion 64
C4	Deeper Sleep	Reduces CPU voltage	Pentium M and above, but not on Core 2 Duo E4000 and E6000 series; AMD Turion 64
C4E/C5	Enhanced Deeper Sleep	Reduces CPU voltage even more and turns off the memory cache	Core Solo, Core Duo and 45-nm mobile Core 2 Duo only
C6	Deep Power Down	Reduces the CPU internal voltage to any value, including 0 V	45-nm mobile Core 2 Duo only

Latency for C states

C-STATE

RESIDENCY

WAKE-UP LATENCY

C0

ACTIVE

ACTIVE

C1

1 μs

1 μs

C3

106 μs

80 μs

C6

345 μs

104 μs

How to Optimize

Use real production system for measurement
Note the state of system before change
Apply the optimizations later going to come in the post.
Compare state of system after applying the optimizations
Tune until you get desired results.
Document the changes.

Types of Tools needed

Tools that gather system information or Hardware
- lscpu
- uname
- systemctl list -units -t service
Tools for measuring Performance
Tools for Microbenchmarking

Optimization include Benchmarking, Profiling and Tracing

Benchmarking
- this is comparing performance with industry standard
- Tools used for benchmarking
  - vmstat used for benchmarking virtual memory
  - iostat used for benchmarking io
  - mpstat multiple processor
Profiling gathering information about hot spots
Tracing

Based on the Benchmarking, Profiling and Tracking the system is tuned as follows;

using echo into /proc
- The settings are not persistent. Stay until reboot.
- /proc files system has pid directories for processes, configuration files and sys directory
  - /proc/sys directory has some important directories like /kernel, /net, /vm , /user etc.
  - /proc/sys/vm has settings for with virtual memory.
Using sysctl command
- sysctl it’s a service
- Persistent settings.
- It’s started at the boot time’s
- It reads settings from a configuration file at /etc directory
- There are some more settings in
  - usr/lib/sysctl.d stores the default
  - /run/sysctl.d stores the runtime
- /etc/sysctl.d stores runtime settings
Using loading/unloading of kernel modules and configure kernel parameters
- Using modinfo to gather information about available parameters
- Using modprobe for modifying parameters
- modprobe modulename key=val
- for persistent settings, /etc/modprobe.d modulename.conf

Detecting and fixing Memory Issues

daan-mooij-674206-unsplash

There are two main tools I like to use for any memory related stuff in my c++ code. Following is an example code I had written as my implementation of a shared pointer. I think my code works as expected but not sure how much memory I am leaking if any.

#include <iostream>

template <typename T>
class MySharedPtr{
  public:
    MySharedPtr(T val){
      resource_ptr = new T(val);
      c_ptr = new int(1);
    };

  MySharedPtr<T>(const MySharedPtr<T> & ptr){
    std::cout << "copy constructor \n";
    resource_ptr = ptr.resource_ptr;
    (*ptr.c_ptr)++;
    this->c_ptr = ptr.c_ptr;
  }

  MySharedPtr<T>& operator=(const MySharedPtr<T> & ptr){
    std::cout << "copy assignment operator\n";
    if(this != &ptr){
      (*ptr.c_ptr)++;
      this->c_ptr = ptr.c_ptr;
      this->resource_ptr = ptr.resource_ptr;
     }
    return *this;
  }

  // delete move constructors...
  MySharedPtr(MySharedPtr && ptr) = delete;
  MySharedPtr & operator=(MySharedPtr && ptr) = delete;

  MySharedPtr & operator *(){
    return resource_ptr;
  }

  MySharedPtr* operator ->(){
    return resource_ptr;
  }

  virtual ~MySharedPtr(){
    std::cout << *c_ptr << '\n';
    if(*c_ptr > 0){
      (*c_ptr)--;
    }else{
      delete c_ptr;
      delete resource_ptr;
    }
  }

  int use_count(){
    return *c_ptr;
  }

  private:
    int * c_ptr;
  T * resource_ptr;
};

int main(){
  int x;
    MySharedPtr<decltype(x)> sp(1);
  {
    MySharedPtr<decltype(x)> sp1(2);
    sp1 = sp;
    std::cout << "use count is " << sp.use_count() << '\n';
  }
  
  sp = sp;
  std::cout << "use count is " << sp.use_count() << '\n';
  return 0;
}

Google Sanitizer

The command:

/usr/local/Cellar/gcc/8.2.0/bin/g++-8 -std=c++17 -g -o2 -fsanitize=address -fno-omit-frame-pointer shared_ptr.cpp -o shared_ptr && ASAN_OPTIONS=detect_leaks=1 ./shared_ptr

The output:

copy assignment operator
use count is 2
2
copy assignment operator
use count is 1
1

=================================================================
==25533==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x526b18 in operator new(unsigned long) (/home/aarna/devel/practice/smart_ptrs+0x526b18)
    #1 0x52b50c in MySharedPtr<int>::MySharedPtr(int) /home/aarna/devel/practice/smart_ptrs.cpp:9:19
    #2 0x52b112 in main /home/aarna/devel/practice/smart_ptrs.cpp:62:28
    #3 0x7f6507f483d4 in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x526b18 in operator new(unsigned long) (/home/aarna/devel/practice/smart_ptrs+0x526b18)
    #1 0x52b47b in MySharedPtr<int>::MySharedPtr(int) /home/aarna/devel/practice/smart_ptrs.cpp:8:26
    #2 0x52b134 in main /home/aarna/devel/practice/smart_ptrs.cpp:64:32
    #3 0x7f6507f483d4 in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x526b18 in operator new(unsigned long) (/home/aarna/devel/practice/smart_ptrs+0x526b18)
    #1 0x52b50c in MySharedPtr<int>::MySharedPtr(int) /home/aarna/devel/practice/smart_ptrs.cpp:9:19
    #2 0x52b134 in main /home/aarna/devel/practice/smart_ptrs.cpp:64:32
    #3 0x7f6507f483d4 in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x526b18 in operator new(unsigned long) (/home/aarna/devel/practice/smart_ptrs+0x526b18)
    #1 0x52b47b in MySharedPtr<int>::MySharedPtr(int) /home/aarna/devel/practice/smart_ptrs.cpp:8:26
    #2 0x52b112 in main /home/aarna/devel/practice/smart_ptrs.cpp:62:28
    #3 0x7f6507f483d4 in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274

SUMMARY: AddressSanitizer: 16 byte(s) leaked in 4 allocation(s).

The explanation: The address sanitizer has precisely pointed out the line numbers. It is very useful in coming up with a fix.

Valgrind

Command:

valgrind --tool=memcheck  --leak-check=full --show-leak-kinds=all --track-origins=yes  ./smart_ptrs

The output:

[aarna@localhost practice]$ valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes ./smart_ptrs
==25564== Memcheck, a memory error detector
==25564== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==25564== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==25564== Command: ./smart_ptrs
==25564==
==25564==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==25564==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==25564==Process memory map follows:
0x000000400000-0x00000055d000 /home/aarna/devel/practice/smart_ptrs
0x00000075c000-0x00000075d000 /home/aarna/devel/practice/smart_ptrs
0x00000075d000-0x000000760000 /home/aarna/devel/practice/smart_ptrs
0x000000760000-0x000001447000
0x000004000000-0x000004022000 /usr/lib64/ld-2.17.so
0x000004022000-0x000004036000
0x00000403f000-0x000004055000
0x000004221000-0x000004222000 /usr/lib64/ld-2.17.so
0x000004222000-0x000004223000 /usr/lib64/ld-2.17.so
0x000004223000-0x000004224000
0x000004224000-0x000004225000
0x000004a24000-0x000004a25000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_core-amd64-linux.so
0x000004a25000-0x000004c24000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_core-amd64-linux.so
0x000004c24000-0x000004c25000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_core-amd64-linux.so
0x000004c25000-0x000004c26000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_core-amd64-linux.so
0x000004c26000-0x000004c34000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
0x000004c34000-0x000004e34000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
0x000004e34000-0x000004e35000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
0x000004e35000-0x000004e36000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
0x000004e36000-0x000004f1f000 /usr/lib64/libstdc++.so.6.0.19
0x000004f1f000-0x00000511e000 /usr/lib64/libstdc++.so.6.0.19
0x00000511e000-0x000005126000 /usr/lib64/libstdc++.so.6.0.19
0x000005126000-0x000005128000 /usr/lib64/libstdc++.so.6.0.19
0x000005128000-0x00000513d000
0x00000513d000-0x00000523e000 /usr/lib64/libm-2.17.so
0x00000523e000-0x00000543d000 /usr/lib64/libm-2.17.so
0x00000543d000-0x00000543e000 /usr/lib64/libm-2.17.so
0x00000543e000-0x00000543f000 /usr/lib64/libm-2.17.so
0x00000543f000-0x000005456000 /usr/lib64/libpthread-2.17.so
0x000005456000-0x000005655000 /usr/lib64/libpthread-2.17.so
0x000005655000-0x000005656000 /usr/lib64/libpthread-2.17.so
0x000005656000-0x000005657000 /usr/lib64/libpthread-2.17.so
0x000005657000-0x00000565b000
0x00000565b000-0x000005662000 /usr/lib64/librt-2.17.so
0x000005662000-0x000005861000 /usr/lib64/librt-2.17.so
0x000005861000-0x000005862000 /usr/lib64/librt-2.17.so
0x000005862000-0x000005863000 /usr/lib64/librt-2.17.so
0x000005863000-0x000005865000 /usr/lib64/libdl-2.17.so
0x000005865000-0x000005a65000 /usr/lib64/libdl-2.17.so
0x000005a65000-0x000005a66000 /usr/lib64/libdl-2.17.so
0x000005a66000-0x000005a67000 /usr/lib64/libdl-2.17.so
0x000005a67000-0x000005a7c000 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
0x000005a7c000-0x000005c7b000 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
0x000005c7b000-0x000005c7c000 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
0x000005c7c000-0x000005c7d000 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
0x000005c7d000-0x000005e40000 /usr/lib64/libc-2.17.so
0x000005e40000-0x00000603f000 /usr/lib64/libc-2.17.so
0x00000603f000-0x000006043000 /usr/lib64/libc-2.17.so
0x000006043000-0x000006045000 /usr/lib64/libc-2.17.so
0x000006045000-0x00000639c000
0x00000639c000-0x00000679c000
0x000058000000-0x00005823b000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/memcheck-amd64-linux
0x00005843b000-0x00005843e000 /opt/rh/devtoolset-7/root/usr/lib64/valgrind/memcheck-amd64-linux
0x00005843e000-0x000059e40000
0x001002001000-0x001008b0a000
0x001008b8c000-0x001008bac000
0x001008bac000-0x001008bae000
0x001008bae000-0x001008cae000
0x001008cae000-0x001008cb0000
0x001008cb0000-0x001008cb1000 /tmp/vgdb-pipe-shared-mem-vgdb-25564-by-aarna-on-localhost.localdomain
0x001008cb1000-0x00100b0fd000
0x00100b1fd000-0x00100b3fd000
0x00100b4fd000-0x00100b5fd000
0x00100b7f2000-0x00100ba1b000
0x00100bbdb000-0x00100bcdb000
0x001ffeffd000-0x001fff001000
0x7ffd1941d000-0x7ffd1943e000 [stack]
0xffffffffff600000-0xffffffffff601000 [vsyscall]
==25564==End of process memory map.
==25564==
==25564== HEAP SUMMARY:
==25564== in use at exit: 32 bytes in 1 blocks
==25564== total heap usage: 1 allocs, 0 frees, 32 bytes allocated
==25564==
==25564== 32 bytes in 1 blocks are still reachable in loss record 1 of 1
==25564== at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==25564== by 0x586454F: _dlerror_run (dlerror.c:141)
==25564== by 0x5864057: dlsym (dlsym.c:70)
==25564== by 0x50233B: __interception::GetRealFunctionAddress(char const*, unsigned long*, unsigned long, unsigned long) (in /home/aarna/devel/practice/smart_ptrs)
==25564== by 0x4DD462: __asan::InitializeAsanInterceptors() (in /home/aarna/devel/practice/smart_ptrs)
==25564== by 0x41A49F: __asan::AsanInitInternal() [clone .part.1] (in /home/aarna/devel/practice/smart_ptrs)
==25564== by 0x400FCA2: _dl_init (dl-init.c:116)
==25564== by 0x4001029: ??? (in /usr/lib64/ld-2.17.so)
==25564==
==25564== LEAK SUMMARY:
==25564== definitely lost: 0 bytes in 0 blocks
==25564== indirectly lost: 0 bytes in 0 blocks
==25564== possibly lost: 0 bytes in 0 blocks
==25564== still reachable: 32 bytes in 1 blocks
==25564== suppressed: 0 bytes in 0 blocks
==25564==
==25564== For counts of detected and suppressed errors, rerun with: -v
==25564== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The explanation:

The output by valgrind shows entire picture of virtual memory and how source files and code is laid out in the memory. It detects the loss of 32 bytes of heap memory still in use at the exit.

The fix:

The output shown by the google address sanitizer was useful at micro-level. It precisely pointed the line number and I was able to apply my first fix to the memory leak in virtual destructor as follows:

virtual ~MySharedPtr(){
  if(*c_ptr > 0){
    (*c_ptr)--;
  }
  
  if(*c_ptr == 0){
    if(resource_ptr != nullptr){
      delete resource_ptr;
      resource_ptr = nullptr;
    }
    delete c_ptr;
    c_ptr = nullptr;
  }
}

I still have following leaks:

=================================================================
==9130==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x1026a53d0 in wrap__Znwm (libasan.5.dylib:x86_64+0x6e3d0)
#1 0x102324526 in MySharedPtr<int>::MySharedPtr(int) shared_ptr.cpp:9
#2 0x10232428a in main shared_ptr.cpp:68
#3 0x7fff7661b014 in start (libdyld.dylib:x86_64+0x1014)

Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x1026a53d0 in wrap__Znwm (libasan.5.dylib:x86_64+0x6e3d0)
#1 0x1023244a4 in MySharedPtr<int>::MySharedPtr(int) shared_ptr.cpp:8
#2 0x10232428a in main shared_ptr.cpp:68
#3 0x7fff7661b014 in start (libdyld.dylib:x86_64+0x1014)

SUMMARY: AddressSanitizer: 8 byte(s) leaked in 2 allocation(s).

Going through the code I fixed my copy assignment operator by deleting the resource pointer and the counter pointer as follows:

MySharedPtr<T>& operator=(const MySharedPtr<T> & ptr){
  std::cout << "copy assignment operator\n";
  if(this != &ptr){
    delete resource_ptr;  // deleting fix 1
    delete c_ptr;         // deleting fix 2
    (*ptr.c_ptr)++;
    this->c_ptr = ptr.c_ptr;
    this->resource_ptr = ptr.resource_ptr;
   }
   return *this;
}

And my code generated error free output:

copy assignment operator
use count is 2
copy assignment operator
use count is 1

Template Metaprogramming Patterns

SFINAE – Substitution Failure is Not An Error

Template meta programming is fun. Lot of simple algorithms can be done at compile time and in very powerful way. The code uses a template meta-programming pattern called as SFINAE – Substitution failure is not an error.

Following are three examples for common problems like Fibonacci, Factorial and GCD calculations using SFINAE.

#include <iostream>

// find fibonacci using template meta-programming
template <long long N>
struct fibonacci{
  static constexpr long long value = fibonacci<N-1>::value + fibonacci<N-2>::value;
};

template<>
struct fibonacci<1>{
  static constexpr long long value = 1;
};

template<>
struct fibonacci<0>{
  static constexpr long long value = 0;
};

// end code for fibonacci

// -------------------------------------------------------------------------------------------
// find gcd using template meta programming
template<unsigned M, unsigned N>
struct gcd{
  static constexpr auto value = gcd<N,M%N>::value;
};

template<unsigned M>
struct gcd<M,0>{
  static_assert(M!=0);
  static constexpr auto value = M;
};
// -------------------------------------------------------------------------------------------

// factorial using template meta-programming
template<long long N>
struct factorial{
  static constexpr long long value = N*(factorial<N-1>::value);
};

template<>
struct factorial<0>{
  static constexpr long long value = 1;
};

int main(){
  std::cout << gcd<15,5>::value << '\n'; // 5
  std::cout << fibonacci<80>::value << '\n'; // 23416728348467685
  std::cout << factorial<20>::value << '\n'; // 2432902008176640000
}

More information about SINAF

CRTP – Curiously Recurring Template Pattern

The CRTP pattern was discovered by an accident. There is another name for this pattern, “Static Polymorphism”.The base class is generally templatized class and derived class is subclassed from base class with type as templatized class. The reason this is called as CRTP is because a method from derived class is called from base class statically. The definition is rather complicated but the code should clear it.

#include <iostream>

struct Exchange{
  std::string name;
};

template <typename T>
struct Order{
  int transact(Exchange * exchange, size_t quantity){
    return static_cast<T *>(this)->transact(quantity);
  }
};

struct BuyOrder: Order<BuyOrder>{
  int transact(Exchange * exchange, size_t quantity){
    buy();
    return quantity;
  }

  void buy(){
    // code for buy goes here..
  }
};

struct SellOrder: Order<SellOrder>{
  int transact(Exchange * exchange, size_t quantity){
    sell();
    return quantity;
  }

  void sell(){
    // code for sell goes here..
  }
};

int main(){
  Exchange e;
  BuyOrder bo;
  bo.transact(e, 100);
  SellOrder so;
  so.transact(e, 100);
}

Modern c++ coding Thread pool

First attempt is to code the thread pool is using simple conditional variables and mutex. Once the multi-producer, multi-consumer thread pool is working as expected, its easy to convert into lock free buffer.

The Threadpool class is a struct with following member variables.

struct Work with following member variables
1. _work to store the lambda/function pointer
2. _args_ary to store the array
3. is_processed boolean flag.
std::array of a fixed size to hold the work.
static const variable to indicate the max_size of work array.
head_idx and tail_idx to guide the next index.
a condition variable cv
mutex m

There are 2 main methods for enqueuing and popping the items out and working on it. Since its a circular queue, next index calculated by adding one to current index and taking remainder from max_size. i.e.

next_idx = (current_index + 1 ) % max_size

Here is an example of SPSC – single producer single consumer locked queue. The queue is bounded and blocking. Means, if either the producer or the consumer is slow, it is going to block using the condition variable and mutex.

#include <iostream>
#include <thread>
#include <array>
#include <functional>
#include <vector>
#include <random>
#include <chrono>

using func_type = std::function<int(int,int)>;
using args_ary = std::array<int,2>;

struct ThreadPool{
  // Enqueue function to get the function and arguments to the function.
  // consumer can invoke function with args passed on to it. 
  bool enqueue(func_type fn, args_ary args){
    size_t next_idx = (head_idx + 1) % max_size;

    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk,[&](){
      return work[next_idx].processed();
    });

    Work w{fn, args};
    work[next_idx] = w;
    head_idx = next_idx;
    cv.notify_all();
    return true;
  }

  // process or pop function to pop the latest record and process it.
  bool process(){
    size_t next_idx = (tail_idx + 1) % max_size;
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk,[&](){
      return !work[next_idx].processed();
    });

    Work w = work[next_idx];
    work[next_idx].set_processed();

    if(!w.invalid()){
      int output = w.process();
      std::cout << output << '\n';
    }
    tail_idx = next_idx;
    cv.notify_all();
    return true;
}

private:
  struct Work{
    private:
      func_type _work;
      args_ary _args;
      bool is_processed = false;

    public:
      Work():is_processed(true){};
      Work(func_type work, args_ary args):_work{work},_args{args},is_processed(false){};
      bool processed(){
        return is_processed;
      }
      int process(){
        return _work(_args[0], _args[1]);
      }
      bool invalid(){
        return _work == nullptr;
      }
      void set_processed(){
        is_processed = true;
      }
   };

   static const size_t max_size = 1000;
   std::array<Work,max_size> work;
   size_t head_idx = 0, tail_idx = 0;
   std::condition_variable cv;
  std::mutex mtx;
};

int random_num(){
  std::mt19937 rng;
  rng.seed(std::random_device()());
  std::uniform_int_distribution<std::mt19937::result_type> dist(1,10000); // distribution in range [1,10000]
  return dist(rng);
}

int main(){
  ThreadPool tp;

  std::thread producer1([&](){
    int count = 0;
    while(true){
      std::function<int(int,int)> fn;
      if (count % 3 == 0){
        fn = [](int a, int b){ std::cout << "Adding \t\t" << a << '\t'<< b << '\t'; return a+b;};
      }else if(count % 3 == 1){
        fn = [](int a, int b){ std::cout << "Subtracting \t" << a << '\t'<< b << '\t'; return a-b;};
      }else{
        fn = [](int a, int b){std::cout << "Multiplying \t" << a << '\t'<< b << '\t'; return a*b;};
      }
      tp.enqueue(fn, args_ary{random_num(),random_num()});
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
      ++count;
     }
  });

  std::thread consumer1([&](){
    while(true){
      tp.process();
    }
  });

  if(producer1.joinable()){
    producer1.join();
  }

  if(consumer1.joinable()){
    consumer1.join();
  }
  return 0;
}

Here is a complete working example for MPMC queue.

#include <iostream>
#include <thread>
#include <array>
#include <functional>
#include <vector>
#include <random>
#include <chrono>

using func_type = std::function<int(int,int)>;
using args_ary = std::array<int,2>;

struct ThreadPool{
  ThreadPool(){
    for(int i = 0; i < max_size; ++i){
      work[i] = Work();
    }
  }

  bool enqueue(func_type fn, args_ary args){
    size_t next_idx = (head_idx + 1) % max_size;
    while(work[next_idx].processed()){
      std::unique_lock<std::mutex> lk(mtx);
      cv.wait(lk,[&](){
        return work[next_idx].processed();
      });

      Work w = work[next_idx];
      if(w.invalid() || w.processed()){
        work[next_idx].set_work(fn);
        work[next_idx].set_args(args);
        work[next_idx].reset();
        cv.notify_all();
        break;
      }
   }
   head_idx = next_idx;
   return true;
}

  bool process(int remainder, int total_threads){
    size_t next_idx = (tail_idx + 1) % max_size;

    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk,[&](){
      return work[next_idx].unprocessed();
    });

    Work w = work[next_idx];
    work[next_idx].set_processed();

    if(!w.invalid()){
      int output = w.process();
      std::cout << output << '\n';
    }
    tail_idx = next_idx;
    cv.notify_all();
    return true;
  }

  private:
    struct Work{
    private:
      func_type _work;
      args_ary _args;
      bool is_processed = false;

    public:
      Work():is_processed(true){}; 
      int process(){
        return _work(_args[0], _args[1]);
      }
    
    bool invalid(){
      return _work == nullptr;
    }
    
    void set_processed(){
      is_processed = true;
    }

    void set_work(func_type work){
      _work = work;
    }

    void set_args(args_ary args){
      _args = args;
    }
    void reset(){
      is_processed = false;
    }

    bool unprocessed(){
      return is_processed == false;
    }

    bool processed(){
      return is_processed == true;
    }
};

  static const size_t max_size = 1000;
  std::array<Work,max_size> work;
  size_t head_idx = 0, tail_idx = 0;
  std::condition_variable cv;
  std::mutex mtx;
};

int random_num(){
  std::mt19937 rng;
  rng.seed(std::random_device()());
  std::uniform_int_distribution<std::mt19937::result_type> dist(1,10000); // distribution in range [1,10000]
  return dist(rng);
}

int main(){
  ThreadPool tp;
  auto adder = [](int a, int b){ std::cout << "Adding \t\t" << a << '\t'<< b << '\t'; return a+b;};
  auto subtractor = [](int a, int b){ std::cout << "Subtracting \t" << a << '\t'<< b << '\t'; return a-b;};
  auto multiplicator = [](int a, int b){std::cout << "Multiplying \t" << a << '\t'<< b << '\t'; return a*b;};

  std::thread producer1([&](){
    while(true){
      tp.enqueue(adder, args_ary{random_num(),random_num()});
      //std::this_thread::sleep_for(std::chrono::milliseconds(10));
    } 
  });

  std::thread producer2([&](){
    while(true){
      tp.enqueue(subtractor, args_ary{random_num(),random_num()}); 
      //std::this_thread::sleep_for(std::chrono::milliseconds(10));
  } 
});

std::thread producer3([&](){
  while(true){
    tp.enqueue(multiplicator, args_ary{random_num(),random_num()}); 
      //std::this_thread::sleep_for(std::chrono::milliseconds(10));
  } 
});

std::thread consumer1([&](){
  while(true){
    tp.process(0, 3);
  }
});

std::thread consumer2([&](){
  while(true){
    tp.process(1, 3);
  }
});

std::thread consumer3([&](){
  while(true){
    tp.process(2, 3);
  }
});


  if(producer1.joinable()){
    producer1.join();
  }

  if(producer2.joinable()){
    producer2.join();
  }

  if(producer3.joinable()){
    producer3.join();
  }

  if(consumer1.joinable()){
    consumer1.join();
  }

  if(consumer2.joinable()){
    consumer2.join();
  }
  if(consumer3.joinable()){
    consumer3.join();
  }

  return 0;
}

Output

Adding      9316 8539 17855
Subtracting 3529 1274 2255
Multiplying 7821 2404 18801684
Adding      9455 8300 17755
Subtracting 9882 5696 4186
Multiplying 202 9795 1978590
Adding      5794 7979 13773
Subtracting 4331 576 3755
Multiplying 963 4548 4379724
Adding      7238 9714 16952
Subtracting 9868 5185 4683
Multiplying 9139 4941 45155799
Adding      6842 7031 13873
Subtracting 3043 2312 731

Lockfree code

A Single producer, single consumer Lockfree queue can be written as follows

#include <iostream>
#include <thread>
#include <array>
#include <functional>
#include <vector>
#include <random>
#include <chrono>

using func_type = std::function<int(int,int)>;
using args_ary = std::array<int,2>;

struct ThreadPool{
  // Constructor
  ThreadPool(){
    for(int i = 0; i < max_size; ++i){
      work[i] = Work();
    }
  }

  bool enqueue(func_type fn, args_ary args){
    size_t current_head_idx = head_idx.load(std::memory_order_acquire);
    std::atomic<size_t> next_idx = (current_head_idx + 1) % max_size;

    Work w = work[next_idx];
    if(w.invalid() || w.processed()){
      work[next_idx].set_work(fn);
      work[next_idx].set_args(args);
      work[next_idx].reset();
      head_idx.store(next_idx, std::memory_order_release);
    }
    return true;
  }

  bool process(){
    size_t current_tail_idx = tail_idx.load(std::memory_order_acquire);
    std::atomic<size_t> next_idx = (current_tail_idx + 1) % max_size;

    if(!work[next_idx].processed()){
      int output = work[next_idx].process();
      std::cout << output << '\n';
      work[next_idx].set_processed();
      tail_idx.store(next_idx);
     }
    return true;
}

  private:
    struct Work{
      private:
        func_type _work;
        args_ary _args;
        bool is_processed = false;

      public:
        Work():is_processed(true){};
        Work(func_type work, args_ary args):_work{work},_args{args},is_processed(false){};
        bool processed(){
          return is_processed == true;
        }

       int process(){
         return _work(_args[0], _args[1]);
       }

       bool invalid(){
         return _work == nullptr;
       }

       void set_processed(){ 
         is_processed = true;
       }

       void set_work(func_type work){
         _work = work;
       }

       void set_args(args_ary args){
         _args = args;
       }
       
       void reset(){
         is_processed = false;
       }
    };

    static const size_t max_size = 1000;
    std::array<Work,max_size> work;
    std::atomic<size_t> head_idx{0}, tail_idx{0};
};

int random_num(){
  std::mt19937 rng;
  rng.seed(std::random_device()());
  std::uniform_int_distribution<std::mt19937::result_type> dist(1,10000); // distribution in range [1,10000]
  return dist(rng);
}

int main(){
  ThreadPool tp;

  std::thread producer1([&](){
    int count = 0;
    while(true){
      std::function<int(int,int)> fn;
      if (count % 3 == 0){
        fn = [](int a, int b){ 
               std::cout << "Adding \t\t" << a << '\t'<< b << '\t'; 
               return a+b;
              };
      }else if(count % 3 == 1){
        fn = [](int a, int b){ 
               std::cout << "Subtracting \t" << a << '\t'<< b << '\t'; 
               return a-b;};
      }else{
        fn = [](int a, int b){
              std::cout << "Multiplying \t" << a << '\t'<< b << '\t'; 
              return a*b;};
        }
       tp.enqueue(fn, args_ary{random_num(),random_num()});
       ++count;
    }
});

  std::thread consumer1([&](){
    while(true){
     tp.process();
    }
  });

  if(producer1.joinable()){
    producer1.join();
  }

  if(consumer1.joinable()){
    consumer1.join();
  }
  return 0;
}

Sizes of data types in c++

This post explains sizes of different data types, structures and classes under various circumstances.

#include <iostream>
#include <atomic>
#include <string>
#include <vector>
#include <set>
#include <queue>
#include <forward_list>
#include <memory>
// An empty class without any data members
class EmptyClass{};

// class with member variable int
class X{
  int a;
};

// class with member variable int and double
class Y{
  int a;
  double b;
};

// class with static variable int
class StaticMember{
  static int a;
};

// packed structure
class __attribute__((packed)) YPacked{
int a;
double b;
};

// packed large structure
class __attribute__((packed)) LargeStruct {
int a;
double b;
float c;
char d;
double l;
double m;
int n;
int o;
double p;
double q;
double t;
double u;
double v;
double w;
char s;
double x;
float y;
int z;
};

// class with a single virtual function
class VFunClass{
  public:
    virtual void fun(){std::cout << "class with virtual function\n";};
};

// class with a virtual destructor.
class VDesttructorClass{
  public:
    virtual ~VDesttructorClass(){
      std::cout << "in the virtual destructor\n";
    }
};

// Multi level class hierarchy
class Base{};

class Derived: public Base{

};

class Derived1: public Base{
};

class Derived2: public Base{
};

class MostDerived: public Derived, public Derived1, public Derived2{};

class VBase{};

class VDerieved: public virtual VBase{

};

class VDerieved1: public virtual VBase{
};


class VMostDerieved: public VDerieved, public VDerieved1{

};

int main(){
  std::string s = "test string";
  std::cout << "sizeof(s) "<< sizeof(s) << '\n'; // 24
  std::cout << "s.size() " << s.size() << '\n'; // 11


  std::vector<int> v{1,2,3,4,5};
  std::cout << "sizeof(v) "<< sizeof(v) << '\n'; // 24
  std::cout << "v.size() " << v.size() << '\n'; // 5

  std::set<int> set{1,2,3,4,5};
  std::cout << "sizeof(set) "<< sizeof(set) << '\n'; // 24
  std::cout << "set.size() " << set.size() << '\n'; // 5

  std::queue<int> queue;
  for(int i = 0; i < 5; ++i) queue.push(i);
  std::cout << "sizeof(queue) "<< sizeof(queue) << '\n'; // 48
  std::cout << "queue.size() " << queue.size() << '\n'; // 5

  std::forward_list<int> fl;
  auto it = fl.before_begin();
  it = fl.emplace_after ( it, 100 );
  it = fl.emplace_after ( it, 200 );
  it = fl.emplace_after ( it, 300 );
  std::cout << "sizeof(fl) "<< sizeof(fl) << '\n'; // 8

// Size of X is 4 because there is a single data member in the class/struct
  std::cout << "sizeof(X) "<< sizeof(X) << '\n'; // 4

// Size of Y should be 12 i.e. addition of int (4) + double (8)= 12
// but its size if 16 because the compiler aligns the class/struct
// by multiples of its max member.
  std::cout << "sizeof(Y) "<< sizeof(Y) << '\n'; // 16

// Size of a packed structure = sum of sizes of member variables
  std::cout << "sizeof(YPacked) "<< sizeof(YPacked) << '\n'; // 12

  std::unique_ptr<int> uniq_ptr = std::make_unique<int>(10);
  std::cout << "sizeof(uniq_ptr) " << sizeof(uniq_ptr) << '\n'; // 8

  std::shared_ptr<int> sh_ptr = std::make_shared<int>(10);
  std::cout << "sizeof(sh_ptr) " << sizeof(sh_ptr) << '\n'; // 16

  std::weak_ptr<int> w_ptr = std::make_shared<int>(10);
  std::cout << "sizeof(w_ptr) " << sizeof(w_ptr) << '\n'; // 16

  std::cout << "sizeof(EmptyClass) "<< sizeof(EmptyClass) << '\n'; // 1 
  std::cout << "sizeof(VFunClass) " << sizeof(VFunClass) << '\n'; // 8
  std::cout << "sizeof(VDesttructorClass) " << sizeof(VDesttructorClass) << '\n'; // 8

// Size of most Derived class in the hierarchy with 3 NON virtual 
// Base classes has 2 * 1 bype per class Hence the size if 3
  std::cout << "sizeof(MostDerived) " << sizeof(MostDerived) << '\n'; // 3

// Size of most Derived class in the hierarchy with 2 virtual Base classes
// has 2 * 8 bype (v_ptr). Hence the size if 16.
std::cout << "sizeof(VMostDerived) " << sizeof(VMostDerived) << '\n'; // 16

// Size of class with only static variable is 1 i.e.
// it does not include size of static in the size of class.
  std::cout << "sizeof(StaticMember) " << sizeof(StaticMember) << '\n'; // 1

  std::atomic<int> i = 10;
  std::cout << "sizeof(std::atomic<int>) " << sizeof(i) << '\n'; // 4

//size of packed atomic structure is 16
  std::atomic<double> d;
  std::cout << "sizeof(std::atomic<double>) " << sizeof(std::atomic<double>) << '\n'; // 16

//size of packed atomic structure is 16
  std::atomic<YPacked> yp;
  std::cout << "sizeof(std::atomic<YPacked>) " << sizeof(std::atomic<YPacked>) << '\n'; // 16
}

CRTP a.k.a. Static Polymorphism

The size of CRTP class is 1 compared to 8 in virtual function/class.

template <typename T>
class Logger{
  public:
    void log(){
      static_cast<T*>(this)->log();
    }
};

class FileLogger: public Logger<FileLogger>{
  public:
    void log(){
      std::cout << "In the class FileLogger\n";
    }
};

int main(){
  std::cout << sizeof(FileLogger) << '\n'; // Size is 1
  FileLogger fl;
  r43fl.log();
}

Sublime Build System

{
"cmd": "g++ -std=c++17 \"${file}\" -o \"${file_path}/${file_base_name}\" && \"${file_path}/${file_base_name}\"",
   "file_regex": "^(..[^:]*):([0-9]+):?([0-9]+)?:? (.*)$",
   "working_dir": "${file_path}",
   "selector": "source.c, source.c++",
   "shell": true,
   "variants":
   [
    
     {
       "name": "Run",
      "cmd": "g++ -std=c++17 \"${file}\" -o \"${file_path}/${file_base_name}\" && \"${file_path}/${file_base_name}\""
     }
   ]
  }

Vector back inserter example + algorithm

The problem is pretty simple: there is a list of time ranges for meetings and need to show the meetings as continuous block in calendar.

70940aec3ea953621db1b8faf619a5ca_2

As per modern c++ I have used std::back_inserter_iterator for vector instead of passing vector by value or by reference. This approach is much cleaner, modern and saves a lot of copying of data compared to old style.

// i/p: [(0, 1), (3, 5), (4, 8), (10, 12), (9, 10)]
// o/p: [(0, 1), (3, 8), (9, 12)]

#include <iostream>
#include <vector>
#include <algorithm>

struct Meeting; // define meeting

using it = std::vector<Meeting>::iterator;
// use an back inserter iterator to pass as function argument instead
// of passing vector by reference or creating vector on heap
using bi = std::back_insert_iterator<std::vector<Meeting>>;

struct Meeting{
  int start_time, end_time;
  // default constructor
  Meeting() = default;
 // parameterized constructor
  Meeting(int start_time, int end_time):start_time{start_time},end_time{end_date}{}
  
  // override < operator used for sorting.
  bool operator <(const Meeting &other) const{
    return start_date < other.start_date;
  }

static void merge(it start, it finish, bi merged_meeting_bi){
  int start_time = start->start_time;
  int end_time = start->end_time;
  while(start < finish){
    start++;
    if(start->start_time > end_time){
      merged_meeting_bi = {start_time, end_time}; // create meeting using parameterized constructor
      start_time = start->start_time; 
      end_time = start->end_time;
    }else{
      end_time = std::max(start->end_time, end_time);
     }
   }
  }
};

int main(){
  std::vector<Meeting> meetings{{0, 1}, {3, 5}, {4, 8}, {10, 12}, {9, 10}};
  std::vector<Meeting> merged_meetings;
  auto bi = std::back_insert_iterator<std::vector<Meeting>>(merged_meetings);
  auto start = meetings.begin();
  auto finish = meetings.end();
  // sort by first time
  std::sort(start, finish);
  
for(const auto &meeting : meetings){
std::cout << meeting.start_time << ' ' << meeting.end_time << '\n';
}
Meeting::merge(start, finish, bi);

for(const auto & meeting: merged_meetings){
std::cout << meeting.start_time << ' ' << meeting.end_time << '\n';
}
return 0;
}

Dutch National Flag – Algorithm + TDD

The algorithm is pretty simple and achievable in O(n) time where n is the number of elements. Instead of using indexes, I am using 3 set of iterators and added bunch of test cases.

#include <gtest/gtest.h>
#include <algorithm> // random_shuffle
#include <vector>
#include <numeric> // accumulate

using it = std::vector<int>::iterator; // using typedef

void dnf(it less, it equal, it greater, int pivot){
  while(equal <= greater){
    if(*equal < pivot){
      std::iter_swap(less, equal);
      less++;
      equal++;
    }else if(*equal == pivot){
      equal++;
    }else{
      std::iter_swap(equal, greater);
      greater--;
    }
  }
}

TEST(RandomTest, dnfTest){
  std::vector<int> v{2, 2, 0, 1, 1, 0, 2, 1, 0, 1, 0, 1, 0, 2, 2};
  int pivot = 1;
  for(int i = 0; i < 1000; i++){
    dnf(v.begin(), v.begin(), v.end()-1, 1);
    std::string output = std::accumulate(std::next(v.begin()), v.end(), 
                               std::to_string(*v.begin()),
                               [](std::string s, int i){return s+= std::to_string(i);});

    EXPECT_EQ(output, "000001111122222");
  }
}

TEST(RandomTestTwoNumbers, dnfTest){
  std::vector<int> v{0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1};
  int pivot = 1;
  for(int i = 0; i < 1000; i++){
    dnf(v.begin(), v.begin(), v.end()-1, 1);
    std::string output = std::accumulate(std::next(v.begin()), v.end(), 
                               std::to_string(*v.begin()),
                               [](std::string s, int i){return s+= std::to_string(i);});

    EXPECT_EQ(output, "000000001111111");
  }
}

TEST(RandomTestTwoNumbersMidPivot, dnfTest){
  std::vector<int> v{0, 0, 0, 0,0,9,9,9,9,9,9,9,9};
  int pivot = 1;
  for(int i = 0; i < 1000; i++){
    dnf(v.begin(), v.begin(), v.end()-1, 1);
    std::string output = std::accumulate(std::next(v.begin()), v.end(), 
                               std::to_string(*v.begin()),
                               [](std::string s, int i){return s+= std::to_string(i);});

    EXPECT_EQ(output, "0000099999999");
  }
}

TEST(RandomTestTwoNumbersHighPivot, dnfTest){
  std::vector<int> v{0, 0, 0, 0,0,9,9,9,9,9,9,9,9};
  int pivot = 10;
  for(int i = 0; i < 1000; i++){
    dnf(v.begin(), v.begin(), v.end()-1, 1);
    std::string output = std::accumulate(std::next(v.begin()), v.end(), 
                               std::to_string(*v.begin()),
                               [](std::string s, int i){return s+= std::to_string(i);});

    EXPECT_EQ(output, "0000099999999");
  }
}

int main(int argc, char**argv, char**envArg)
{
    testing::InitGoogleTest(&argc, argv);
    return(RUN_ALL_TESTS());
}

Iterators – look and say sequence algorithm

Look and say sequence is fun little exercise.. here is the code

#include <iostream>
#include <vector>
#include <string>
#include <numeric>

std::string looknsay(int num){
  // initialize vector with vector of ints.
  // add default seed value to it [1].
  std::vector<std::vector<int>> look_n_say_seq{{1}};
  // iterate over until the number n is reached.
  for(int n = 0; n <= num; n++){
    auto latest_sequence = look_n_say_seq.back();
    std::vector<int> next_sequence;
   // keep a pair of iterators to move across the latest sequence from vector.
    auto i = latest_sequence.begin();
    auto j = latest_sequence.begin();
    while(i < latest_sequence.end()){
      for(;*i==*j && j < latest_sequence.end();++j){}
      // distance will give the count
      next_sequence.push_back(std::distance(i, j));
      // *i will give the actual number to iterate on
      next_sequence.push_back(*i);
      i = j;
    }
    // insert newly created sequence into the vector.
    look_n_say_seq.push_back(next_sequence);
  }
  auto latest_sequence = look_n_say_seq.back();
 // accumulate - join vector of ints to make string and return
  return std::accumulate(std::next(latest_sequence.begin()), 
                         latest_sequence.end(),
                          std::to_string(*latest_sequence.begin()),
                          [](std::string s, int i){return s += std::to_string(i);});
}
int main(){
  std::cout << looknsay(0) << '\n';
  std::cout << looknsay(1) << '\n'; // 11
  std::cout << looknsay(2) << '\n'; // 21
  std::cout << looknsay(3) << '\n'; // 1211
  std::cout << looknsay(4) << '\n'; // 111221
  std::cout << looknsay(5) << '\n'; // 312211
}

Implicit type conversion – tricky scenario

Here are some of the scenarios where the implicit type conversion can be unintuitive and tricky.

implicit conversion between int, float and bool

#include <iostream>

void print(char x){
  std::cout << "in the print method that has char arg\n";

}

void print(int x){
  std::cout << "in the print method that has int arg\n";
}

void print(bool x){
  std::cout << "in the print method that has bool arg\n";
}

int main(){
int x =1;
char ch = 'c';
char * c = &ch;
print(*c); // in the print method that has char arg
print(*c+1); //in the print method that has int arg
return 0;
}

c++ OOP Grammar

The c++ language is a pure object oriented language. c++ has a lot of rules and extensive grammar. It’s very hard to write exhaustive list but I am including here what I understood.

Constructors:

c++compiler provides 5 types of constructors if user has not provide any.
c++ constructors has to be public in order to be able to make instances of the class.
Example:
```
class Vehicle{};
```
1. Default Constructor:
  - c++ compiler provides a default constructor without any parameters.
  - If any of other constructors are defined by user, compiler does not provide the default constructor.
  - if user provides any (parameterized or non-parameterized ) constructor,
    the compiler still provides other 4 constructors.
  - ```
  class Vehicle {
    public: 
      Vehicle(){};
  };
```
2. Copy Constructor:
  - c++ compiler provides a copy constructor that does member wise copy.
  - the parameter must be reference otherwise there is a compiler error.
  - the parameter should be CONST but not necessary.
  - if a copy constructor is provided without default/parameterized constructor, then user must provide default/parameterized constructor.
  - If an object is not created and assigned to another object, then copy assignment is called.
  - ```
  class Vehicle {
    public: 
      Vehicle(const Vehicle &v){};
      Vehicle(const Vehicle& obj){
      std::cout << "copy constructor called..\n";
    }
  };
```
- If a class has reference or const members, then compiler does not provide default constructor. Example:
- ```
class A{
  int & _a;
  public:
    A(int & a):_a{a}{};
    // We need to use initializer list in copy
   // constructor to initialize/copy a reference
   // and a const member variable.
    A(const A& rhs):_a{rhs._a}{};
    A& operator=(const A& rhs){
      _a = rhs._a;
      return *this;
    };
};

class B{
  public:
    int * _b;
    B(int * b):_b{b}{};
};


class C{
  public:
    const int _c;
    C(const int c):_c{c}{};
};


int main(){
  int x = 1;
  A a(x);

  B b(&x);
  B b1 = b;

  std::cout << *b1._b << '\n' ;

  C c(10);
}
```
3. Copy assignment operator
  - c++ compiler provides a copy assignment operator that does member wise copy.
  - the parameter should be CONST but not necessary.
  - if a copy assignment operator is provided without default/parameterized constructor, then default/parameterized constructor is OPTIONAL.
  - If an object is created and then assigned to another object, then copy assignment is called.
  - ```
  class Vehicle {
  public: 
    Vehicle& operator=(constVehicle&& obj){
      std::cout << "move assignment operator called..\n";
      return *this;
    }
  };
```
4. Move constructor
  - c++ compiler provides a move constructor that moves the member variables.
  - If an object is not created and then moved to another object, then move operator is called.
  - if a move constructor is provided without default/parameterized constructor, then user must provide default/parameterized constructor .
5. Move assignment Operator
  - c++ compiler provides a move assignment operator that moves the member variables.
  - If an object is created and then moved to another object, then move operator is called.
  - if a move assignment operator is provided without default/parameterized constructor, then default/parameterized constructor is OPTIONAL.

Following is code with all the examples:

class MyClass{
  public:
  MyClass() = default;

  MyClass(const MyClass& obj){
    std::cout << "copy constructor called..\n";
  }

  MyClass(MyClass && obj){
    std::cout << "move constructor called..\n"; 
  };

  MyClass& operator=(const MyClass& obj){
    std::cout << "copy assignment operator called..\n";
    return *this;
  }

  MyClass& operator=(const MyClass&& obj){
    std::cout << "move assignment operator called..\n";
    return *this;
  }
};

int main(){
  MyClass obj;
  MyClass obj2(obj); // copy assignment operator called..

  MyClass obj3;
  obj3 = obj; // copy assignment operator called..

  MyClass obj4;
  obj4 = std::move(obj); // move assignment operator called..

  MyClass obj5(std::move(obj2)); // move constructor called..

  std::cout << "test \n";
  return 0;
}

Converting Constructor

If a class provides parameterized constructor then it works as converting constructor. It can be used to instantiate the object but it is confusing. Example:

class X{
  int i;
  public:
    X(int i):i(i){}
    void print(){
      std::cout << i << '\n';
    }
};

int main(){
  X x = 20; // confusing!!
  x.print();
  return 0;
}

Use explicit keyword in order to prevent the converting constructor.

Destructor

destructor is provided for class unless user has provided one. Virtual destructors should be provided in order to achieve polymorphic behavior. The subclass object should be deleted with a base class pointer.

With a virtual destructor defined, the subclass destructor is called before base class.

Method overriding

if a sub-class defines a method with same name as from base class, then the method HIDES ALL the methods with the same name from base class (regardless of signatures of the methods.).
if a method is defined as a public method in base class and overridden as private or protected in the sub classes, it still will be publicly available.
Const and non const methods are different from each other.

Methods should be defined as ‘virtual’ in order to achieve polymorphic behavior.
Example:

class BaseClass{
  public:
  virtual void vprint(){
    std::cout << "in the BaseClass\n";
  }
  void print(){
    std::cout << "in the BaseClass\n";
  }
};

class SubClass: public BaseClass{
  public:
  virtual void vprint(){
    std::cout << "in the SubClass\n";
  }
  void print(){
    std::cout << "in the SubClass\n";
  }
};

void global_print(BaseClass & b){
  b.print(); //in the BaseClass
  b.vprint();// in the SubClass
}

int main(){
  BaseClass * b = new SubClass();
  b->print(); //in the BaseClass
  b->vprint(); // in the SubClass
}

override keyword

c++11 introduced override keyword which can be used after method name in order to specify the method is overridden. If the keyword is not used, the virtual method can still be overridden in subclass. The advantages of override keyword:

override keyword ensures the method with same signature is defined in base class else it gives compilation error.
Its good documentation.

Method Overloading

Methods with (same name and ) default values to parameters are consider ambiguous. Example:

   int print(char x){
    std::cout << "in the print method that returns int\n";
   }
   int print(char x = 'c'){
      std::cout << "in the print method that returns int\n";
   }
};

Inheritance

Base class public member variables can not initialized using initializer lists.

class MyBaseClass{
  public:
  int _b;
};

class MyClass : public MyBaseClass{
  public:
  int _m;
  MyClass(int m): _m{m}, _b{0}{} // compiler error: 
};

They should be instantiated by following ways:

class MyBaseClass{
  public:
  int _b;
  MyBaseClass(int b):_b{b}{}
};

class MyClass : public MyBaseClass{
  public:
  int _m;
  MyClass(int m, int b): _m{m}, MyBaseClass(b){}; // base class needs to have a parameterized constructor.
  // or MyClass(int m, int b): _m{m}{
  //    _b = b;
  //  };
};

General method rules

constructors can not be virtual.
destructors should be virtual if the class is going to get inherited and polymorphic behavior is expected.
If a method takes non-const reference then a rvalue constant can not be passed to it. example:

void print(int& i){
  std::cout << i << '\n';
}

int main(){
  print(10);
  return 0;
}

//Solution: use constant reference like const int & i or int && i

Some methods like printing output to console using ostream << operator, the method must be defined as a friend method.

class MyClass{
  int i;
  public:
    MyClass():i{18}{};
    friend std::ostream& operator << (std::ostream & o, MyClass & obj){
      o << obj.i << '\n';
    return o;
    }
};

int main(){
  MyClass m;
  std::cout << m << '\n';
  return 0;
}

class defined with final keyword is not inheritable.

class Class final{}; // this class is not inheritable now

Static Variables

static variables should be initialized outside the class definition. This is generally done in .cpp (implementation file). If the static members of class are not defined, then they generate an error like undefined member or reference. Example:
```
class Class{
public:
  static int count;

  void print(){
    count++;
    std::cout << count << '\n';
  }
};

int Class::count = 10;

int main(){
  Class c;
  c.print();
  std::cout << Class::count<< '\n';
}
```

Static Methods

Static methods can not be virtual because polymorphic behavior is limited to instances.

static methods can not be const else following error is generated.

class StaticConst{
  static print() const {
    std::cout << "inside static const function \n";
  }
};

StaticConst::print(); // static member function cannot have 'const' qualifier

Static methods can have same name and signature as of non-static methods.

Sizeof Rules

if a class/struct does not have any member variables in it, its size is generally 1.
if a class/struct has a virtual method(s) then its size increases by 8 bytes in 64 bit linux/unix system. So a class with no member variables and 1+ any number of virtual methods, the size is 8 bytes.
Static variables size is not considered in the size of class/object.

Example:

class Class{
  public:
  static int count;

  virtual void print(){
    std::cout << count << '\n';
  }
};

int Class::count = 10;

int main(){
  Class c;
  c.print();
  std::cout << sizeof(c)<< '\n'; //8
}

Const member variables

if the const variables are defined in a class, they must be initialized else using them causes compilation error. Example:

class X{
  public:
  const int m, n;

  void print(){
  std::cout << "m = " << m << " n = " << n << '\n';
  }
};

// error: call to implicitly-deleted default constructor of 'X'
  X x;
// default constructor of 'X' is implicitly deleted because field 'm' of const-qualified type 'const int' would not be initialized
    const int m, n;

Automatic type casting

c++ allows method overloading, means the methods can have a same name but should have different signature (different return type and parameters). The automatic type casting has issues in defining method names including constructors.

For an example:

int – float and vice versa
int – double and vice versa
int – char and vice versa
any user defined type conversion

because of the automated type casting, methods(including constructors) with same name and castable types does not compile, example:

following code does not compile:

class MyClass{
  int x;
public:
   int print(char x){
    std::cout << "in the print method that returns int\n";
    return 0;
   }

   char print(char x){
    std::cout << "in the print method that returns char\n";
    return 'c';
   }

   double print(char x){
    std::cout << "in the print method that returns double\n";
    return 0.0;
   }
};

// Solution: use templates in place of defining so many methods with
// automatic type casted methods.

Initializer list

Initializer list must be used when using following types of member variable
- the member variable is const like const int or const char
- the member variable is a reference
- initialization of base class.
- name of variable is same as member function

Default and Delete keywords

c++11 provides default and delete keywords to control the method declaration.

non-parameterized constructors can be defined using default method as

class MyClass{ 
  int x; 
  public:
    MyClass() = default; // this class has both constructors
    MyClass(char x){ };

Delete keyword introduced in c++11 is very useful like
- making objects only movable and not copyable
- implementing very famous design pattern like singleton.
- deleting operator overridden methods for various purposes.
  - delete & operator method to stop users from taking address of objects.

Function Pointers

Way to call a function pointer is little bit cryptic. Example:
There is a struct with one function that returns addition. To store the pointer we can use auto but calling is the function is done as follows:

struct S{
  const int i = 10;
  S(const int _i):i{_i}{};

  int add(int x){
    return x+i;
  }
};

  S s(11);
  auto fptr = &S::add;
  std::cout << (s.*fptr)(11) << '\n'; // prints 22

C-STATE	RESIDENCY	WAKE-UP LATENCY
C0	ACTIVE	ACTIVE
C1	1 μs	1 μs
C3	106 μs	80 μs
C6	345 μs	104 μs