Non Uniform Memory Access

NUMA is a shared memory architecture used in today’s multiprocessing systems. Each CPU is assigned its local memory and can access memory from other CPUs in the system. Local memory access provides the best performance; it provides low latency and high bandwidth. Accessing memory that is owned by the other CPU has a performance penalty, higher latency, and lower bandwidth.

  • Access to local memory is fast, more latency for remote memory
  • Practically all multi-socket systems have NUMA
  • Most servers have 1 NUMA node / socket
  • Some AMD systems have 2 NUMA nodes / socket
  • Sometimes optimal performance still requires manual tuning.

Typical 4 node NUMA system.

Screen Shot 2018-03-14 at 7.51.21 PM

Image credit:

Processors and Memory layout in NUMA system

Screen Shot 2018-03-14 at 11.53.58 PM

Image credit:


Screen Shot 2018-03-14 at 11.58.39 PM

Image credit:

To see the NUMA nodes and which cpus are under numa nodes:

Screen Shot 2018-03-14 at 8.02.02 PM

To see the CPUs under NUMA

Screen Shot 2018-03-14 at 8.05.09 PM

NUMA system represented on my machine.


NUMA related tools

Screen Shot 2018-03-14 at 11.00.34 PM

Numa Settings

  • BalancingTurn off: echo 0 > /proc/sys/kernel/numa_balancing
  • NUMA locality (hit vs miss, local vs foreign)
    • Number of NUMA faults & page migrations
    • /proc/vmstat numa_* fields
  • Location of process memory in NUMA system
    • /proc/<pid>/numa_maps
  • Numa scans, migrations & numa faults by node
    • /proc/<pid>/sched


More info here


Author: Saurabh Purnaye

VP - Low Latency Developer @jpmchase... Linux, C++, Python, Ruby. pursuing certificate in #QuantFinance and Passed CFA L1

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: