The x86 architecture has three types of addresses:
- Logical Address:
- The address of an instruction or a data operand as it appears in machine language.
- It consists of a segment and an offset, i.e. the distance from the segment's start address.
- Linear address or Virtual address:
- A single binary number that identifies a location in virtual memory. It lets a process use main memory independently of other processes, and use more space than physically exists in RAM, because some contents are temporarily moved to a hard disk or internal flash drive.
- Physical Address:
- The address of a memory cell in the computer's RAM.
- The main memory (RAM) available for a computer is limited.
- Many processes share common code in libraries.
- Using virtual addressing, the CPU and the kernel give each process the impression that memory is unlimited.
- Since two of the three address types above are virtual, addresses must be translated from logical to linear and from linear to physical.
- For this purpose each CPU contains a hardware unit called the Memory Management Unit (MMU).
- Segmentation Unit: converts a logical address to a linear address.
- Paging Unit: converts a linear address to a physical address.
- The paging unit translates a linear address to a physical address using two translation tables:
- Page Directory
- Page Table
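
The two-table lookup can be sketched as a toy model. This is only an illustration, assuming the classic 32-bit layout (10-bit directory index, 10-bit table index, 12-bit offset); the dictionaries and the `translate` helper are hypothetical stand-ins for the hardware structures, and a `KeyError` plays the role of a page fault.

```python
PAGE_SHIFT = 12                      # 4 KB pages -> 12 offset bits
INDEX_BITS = 10                      # 10 bits each for directory and table index
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def translate(linear, page_directory):
    """Walk a toy two-level structure and return the physical address.

    page_directory maps a directory index to a page table (a dict);
    a page table maps a table index to a physical frame number.
    A missing entry raises KeyError, standing in for a page fault.
    """
    dir_index = (linear >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    table_index = (linear >> PAGE_SHIFT) & INDEX_MASK
    offset = linear & OFFSET_MASK

    page_table = page_directory[dir_index]   # first lookup: Page Directory
    frame = page_table[table_index]          # second lookup: Page Table
    return (frame << PAGE_SHIFT) | offset


# Hypothetical mapping: linear page 0x08048 lives in physical frame 0x12345.
page_directory = {0x20: {0x48: 0x12345}}
physical = translate(0x08048ABC, page_directory)
```

The offset passes through unchanged; only the upper 20 bits are replaced by the frame number found in the page table.
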
Linear-Address Translation to a 4-KByte Page using IA-32e Paging
Formats of CR3 and Paging-Structure Entries with PAE Paging
Image Credit: By Mdjango, Andrew S. Tanenbaum (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
- Segmentation and paging are largely redundant, so Linux uses segmentation only in a limited way.
- Segmentation can assign a different linear address space to each process
- Paging can map the same linear address space into different physical address spaces.
- Each segment is described by an 8-byte Segment Descriptor.
- These descriptors are stored in the Global Descriptor Table (GDT) or in a Local Descriptor Table (LDT).
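
To make the 8-byte descriptor concrete, here is a minimal sketch that unpacks the main fields and then forms a linear address as base plus offset. The bit positions follow the standard IA-32 descriptor layout; the helper names and the sample value (a flat 4 GB code segment) are assumptions chosen for illustration, not taken from the text above.

```python
def decode_descriptor(raw):
    """Split a 64-bit segment descriptor into its main fields."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)             # limit bits 0-19
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)   # base bits 0-31
    access = (raw >> 40) & 0xFF                                      # P, DPL, S, type
    flags = (raw >> 52) & 0xF                                        # G, D/B, L, AVL

    if flags & 0x8:                       # G flag set: limit counts 4 KB units
        limit = (limit << 12) | 0xFFF

    return {
        "base": base,
        "limit": limit,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0x3,
        # S = 1 (code/data descriptor) and type bit 3 = 1 means a code segment
        "is_code": bool(access & 0x10) and bool(access & 0x08),
    }


def logical_to_linear(descriptor, offset):
    """Segmentation unit step: linear address = segment base + offset."""
    if offset > descriptor["limit"]:
        raise ValueError("offset beyond segment limit")
    return (descriptor["base"] + offset) & 0xFFFFFFFF


# Example value: a flat code segment with base 0 and a 4 GB limit.
flat_code = decode_descriptor(0x00CF9A000000FFFF)
linear = logical_to_linear(flat_code, 0x08048ABC)
```

With a base of 0, the linear address equals the offset, which is why a flat segment setup makes segmentation nearly invisible.
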
Image credit: By John Källén (jkl at commons) (Own work) [Public domain], via Wikimedia Commons
- Code Segment Descriptor
- Data Segment Descriptor
- Task State Segment Descriptor
image credit https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
- Global Descriptor Table (GDT)
- There is one GDT per CPU in Linux.
- There are 18 descriptors in each GDT for various purposes as follows:
- Descriptors for Kernel code and User code.
- Task State Segment (TSS)
- Default Local Descriptor Table (LDT)
- Thread-Local Storage (TLS) segments
- Advanced Power Management (APM)
- Plug and Play (PnP) BIOS services
- A special segment used for handling exceptions.
Image Credit: Lars H. Rohwedder (User:RokerHRO – selfmade work
- The paging unit translates linear addresses into physical ones.
- For efficiency, linear addresses are grouped into fixed-length intervals called pages. Contiguous linear addresses within a page are mapped to contiguous physical addresses.
- Page frames: main memory is divided into fixed-length page frames. Each page frame holds one page. A page is just a block of data, which may be in memory or on disk.
- Page tables: the data structures that map linear addresses to physical addresses.
- Page Size: 4 KB
- Huge Pages: 2 MB and 1 GB.
- Linear address (32 bits) = Directory (10 bits) + Page Table (10 bits) + Offset (12 bits)
- To address more than 4 GB of physical memory, Intel increased the number of address pins from 32 to 36 (Physical Address Extension, PAE), which allows up to 64 GB of RAM.
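
The sizes above follow from simple powers of two. The short calculation below is a sanity check rather than anything architecture-defining; all variable names are made up for this example.

```python
KB, MB, GB = 1024, 1024**2, 1024**3

PAGE_SIZE = 4 * KB
offset_bits = PAGE_SIZE.bit_length() - 1        # log2(4 KB) = 12 offset bits

entries_per_table = 1 << 10                     # a 10-bit index selects 1024 entries
covered_by_page_table = entries_per_table * PAGE_SIZE          # one table maps 4 MB
covered_by_directory = entries_per_table * covered_by_page_table  # whole 4 GB space

# Larger pages simply use more offset bits.
huge_2mb_offset_bits = (2 * MB).bit_length() - 1   # 21
huge_1gb_offset_bits = (1 * GB).bit_length() - 1   # 30

# 36 physical address bits instead of 32.
pae_physical_limit = 1 << 36
```

Note that the 10 + 10 + 12 split adds up to exactly 32 bits, so the two tables together cover the full 4 GB linear address space.
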
- Modern microprocessors typically provide three levels of cache (L1, L2 and L3).
- The caches work on the principles of spatial locality and temporal locality.
- A cache is divided into cache lines, generally 64 bytes each.
- Most caches are N-way set associative.
- The cache unit sits between the paging unit and main memory.
- It includes both a hardware cache memory and a cache controller.
- Each cache line has a tag and some flags that store the status of the line.
- The CPU first looks for an address in the cache before going to main memory.
- Writes use the write-back strategy, which is more efficient than write-through: the CPU updates only the cache line, and main memory is updated later, for example when the line is flushed.
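
As a rough illustration of how a set-associative cache locates a line, the sketch below splits a physical address into tag, set index and byte offset. The geometry (32 KB, 8-way, 64-byte lines) is an assumed example, not a value from the notes above, and `split_address` is a hypothetical helper.

```python
LINE_SIZE = 64                 # bytes per cache line
WAYS = 8                       # assumed associativity (N in "N-way")
CACHE_SIZE = 32 * 1024         # assumed total capacity in bytes
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 64 sets for this geometry


def split_address(addr):
    """Return (tag, set_index, offset) for a physical address."""
    offset = addr % LINE_SIZE                    # byte within the cache line
    set_index = (addr // LINE_SIZE) % NUM_SETS   # which set to search
    tag = addr // (LINE_SIZE * NUM_SETS)         # compared against each way's tag
    return tag, set_index, offset


tag, set_index, offset = split_address(0x12345ABC)
```

The controller only has to compare the tag against the `WAYS` lines of one set, which is what makes the lookup fast. Addresses that differ only in their low six bits fall in the same line, which is how spatial locality pays off.
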
- The Translation Lookaside Buffer (TLB) is a cache that stores recently translated linear-to-physical address mappings.
- The address translation unit first looks in the TLB for the physical address of a given linear address. If it is not found, the hardware walks the page tables to find the page frame.
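
The lookup order described above (TLB first, page tables on a miss) can be modelled in a few lines. This is a simplified sketch: the `TinyTLB` class is hypothetical, a flat dictionary stands in for the real multi-level page tables, and there is no eviction or invalidation.

```python
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TinyTLB:
    """Toy TLB: caches page-number -> frame-number translations."""

    def __init__(self, page_table):
        self.page_table = page_table   # stand-in for the in-memory page tables
        self.entries = {}              # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, linear):
        page = linear >> PAGE_SHIFT
        offset = linear & OFFSET_MASK
        if page in self.entries:
            self.hits += 1                                # fast path: no table walk
        else:
            self.misses += 1
            self.entries[page] = self.page_table[page]    # slow path: walk, then cache
        return (self.entries[page] << PAGE_SHIFT) | offset


tlb = TinyTLB({0x08048: 0x12345})
first = tlb.translate(0x08048ABC)    # miss: goes to the page table
second = tlb.translate(0x08048DEF)   # hit: same page, served from the TLB
```

Two accesses to the same page cost only one table walk, which is the whole point of the TLB.
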
- Linux uses a four-level paging model:
- Page Global Directory
- Page Upper Directory
- Page Middle Directory
- Page Table
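
As an illustration of the four levels, the sketch below splits an address into one index per level. It assumes the x86-64 layout with 4 KB pages, where each level uses 9 index bits and the offset uses 12; other architectures and configurations use different widths, and the function name is made up for this example.

```python
INDEX_MASK = 0x1FF      # 9 bits -> 512 entries per table (x86-64 assumption)
OFFSET_MASK = 0xFFF     # 12 bits -> 4 KB pages


def split_4level(addr):
    """Return the per-level indices of a 48-bit linear address."""
    return {
        "pgd": (addr >> 39) & INDEX_MASK,   # Page Global Directory index
        "pud": (addr >> 30) & INDEX_MASK,   # Page Upper Directory index
        "pmd": (addr >> 21) & INDEX_MASK,   # Page Middle Directory index
        "pte": (addr >> 12) & INDEX_MASK,   # Page Table index
        "offset": addr & OFFSET_MASK,       # byte within the page
    }


# Build an address from known indices and check that it splits back.
sample = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
parts = split_4level(sample)
```

Here `parts` comes back as indices 1, 2, 3, 4 and offset 5, matching how `sample` was assembled.
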
- Each process has its own Page Global Directory and its own set of Page Tables.
- Linear addresses from 0x00000000 to 0xbfffffff can be addressed when the process runs in either User or Kernel Mode.
- Linear addresses from 0xc0000000 to 0xffffffff can be addressed only when the process runs in Kernel Mode.
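
The split between the two ranges can be expressed as a single comparison. In this sketch the constant is named after the Linux `PAGE_OFFSET` macro, using the 32-bit boundary given above; the helper function is hypothetical.

```python
PAGE_OFFSET = 0xC0000000          # start of the kernel-only range
ADDRESS_SPACE = 1 << 32           # 4 GB of linear addresses on 32-bit x86


def is_kernel_only(linear):
    """True if the address is reachable only in Kernel Mode."""
    return PAGE_OFFSET <= linear < ADDRESS_SPACE


user_bytes = PAGE_OFFSET                      # 0x00000000 .. 0xbfffffff
kernel_bytes = ADDRESS_SPACE - PAGE_OFFSET    # 0xc0000000 .. 0xffffffff
```

This yields the familiar 3 GB / 1 GB division of the 4 GB linear address space.
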