================= Memory Management ================= `View slides `_ .. slideconf:: :autoslides: False :theme: single-level Lecture objectives: =================== .. slide:: Memory Management :inline-contents: True :level: 2 * Physical Memory Management * Page allocations * Small allocations * Virtual Memory Management * Page Fault Handling Overview Physical Memory Management ========================== .. slide:: Physical Memory Management :inline-contents: True :level: 2 * Algorithms and data structure that keep track of physical memory pages * Independent of virtual memory management * Both virtual and physical memory management is required for complete memory management * Physical pages are being tracked using a special data structure: :c:type:`struct page` * All physical pages have an entry reserved in the :c:data:`mem_map` vector * The physical page status may include: a counter for how many times is a page used, position in swap or file, buffers for this page, position int the page cache, etc. Memory zones ------------ .. slide:: Memory zones :inline-contents: True :level: 2 * DMA zone * DMA32 zone * Normal zone (LowMem) * HighMem Zone * Movable Zone Non-Uniform Memory Access ------------------------- .. slide:: Non-Uniform Memory Access :inline-contents: True :level: 2 * Physical memory is split in between multiple nodes, one for each CPU * There is single physical address space accessible from every node * Access to the local memory is faster * Each node maintains is own memory zones (.e. DMA, NORMAL, HIGHMEM, etc.) Page allocation --------------- .. slide:: Page allocation :inline-contents: True :level: 2 .. code-block:: c /* Allocates 2^order contiguous pages and returns a pointer to the * descriptor for the first page */ struct page *alloc_pages(gfp_mask, order); /* allocates a single page */ struct page *alloc_page(gfp_mask); /* helper functions that return the kernel virtual address */ void *__get_free_pages(gfp_mask, order); void *__get_free_page(gfp_mask); void *__get_zero_page(gfp_mask); void *__get_dma_pages(gfp_mask, order); .. slide:: Why only allocate pages in chunks of power of 2? :inline-contents: True :level: 2 * Typical memory allocation algorithms have linear complexity * Why not use paging? * Sometime we do need contiguous memory allocations (for DMA) * Allocation would require page table changes and TLB flushes * Not able to use extended pages * Some architecture directly (in hardware) linearly maps a part of the address space (e.g. MIPS) .. slide:: The buddy algorithm :inline-contents: True :level: 2 * Free blocks are distributed in multiple lists * Each list contains blocks of the same size * The block size is a power of two .. slide:: Allocating a block of size N :inline-contents: True :level: 2 * If there is a free block in the N-size list, pick the first * If not, look for a free block in the 2N-size list * Split the 2N-size block in two N-size blocks and add them to the N-size list * Now that we have the N-size list populated, pick the first free block from that list .. slide:: Freeing a block of size N :inline-contents: True :level: 2 * If the "buddy" is free coalesce into a 2N-size block * Try until no more free buddy block is found and place the resulting block in the respective list .. slide:: The Linux implementation :inline-contents: True :level: 2 * 11 lists for blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 pages * Each memory zone has its own buddy allocator * Each zone has a vector of descriptors for free blocks, one entry for each size * The descriptor contains the number of free blocks and the head of the list * Blocks are linked in the list using the `lru` field of :c:type:`struct page` * Free pages have the PG_buddy flag set * The page descriptor keeps a copy of the block size in the private field to easily check if the "buddy" is free Small allocations ----------------- .. slide:: Small allocations :inline-contents: True :level: 2 * Buddy is used to allocate pages * Many of the kernel subsystems need to allocate buffers smaller than a page * Typical solution: variable size buffer allocation * Leads to external fragmentation * Alternative solution: fixed size buffer allocation * Leads to internal fragmentation * Compromise: fixed size block allocation with multiple sizes, geometrically distributed * e.g.: 32, 64, ..., 131056 .. slide:: The SLAB allocator :inline-contents: True :level: 2 * Buffers = objects * Uses buddy to allocate a pool of pages for object allocations * Each object (optionally) has a constructor and destructor * Deallocated objects are cached - avoids subsequent calls for constructors and buddy allocation / deallocation .. slide:: Why SLAB? :inline-contents: True :level: 2 * The kernel will typically allocate and deallocate multiple types the same data structures over time (e.g. :c:type:`struct task_struct`) effectively using fixed size allocations. Using the SLAB reduces the frequency of the more heavy allocation/deallocation operations. * For variable size buffers (which occurs less frequently) a geometric distribution of caches with fixed-size can be used * Reduces the memory allocation foot-print since we are searching a much smaller memory area, compared to buddy which can span over a larger area * Employs cache optimization techniques (slab coloring) .. slide:: Slab architecture :inline-contents: True :level: 2 .. image:: ../res/slab-overview.png .. slide:: Cache descriptors :inline-contents: True :level: 2 * A name to identify the cache for stats * object constructor and destructor functions * size of the objects * Flags * Size of the slab in power of 2 pages * GFP masks * One or mores slabs, grouped by state: full, partially full, empty .. slide:: SLAB descriptors :inline-contents: True :level: 2 * Number of objects * Memory region where the objects are stored * Pointer to the first free object * Descriptor are stored either in * the SLAB itself (if the object size is lower the 512 or if internal fragmentation leaves enough space for the SLAB descriptor) * in generic caches internally used by the SLAB allocator .. slide:: Slab detailed architecture :inline-contents: True :level: 2 .. image:: ../res/slab-detailed-arch.png .. slide:: Generic vs specific caches :inline-contents: True :level: 2 * Generic caches are used internally by the slab allocator * allocating memory for cache and slab descriptors * They are also used to implement :c:func:`kmalloc` by implementing 20 caches with object sizes geometrically distributed between 32bytes and 4MB * Specific cache are created on demand by kernel subsystems .. slide:: Object descriptors :inline-contents: True :level: 2 .. image:: ../res/slab-object-descriptors.png .. slide:: Object descriptors :inline-contents: True :level: 2 * Only used for free objects * An integer that points to the next free object * The last free object uses a terminator value * Internal descriptors - stored in the slab * External descriptors - stored in generic caches .. slide:: SLAB coloring :inline-contents: True :level: 2 .. image:: ../res/slab-coloring.png Virtual memory management ========================= .. slide:: Virtual memory management :inline-contents: True :level: 2 * Used in both kernel and user space * Using virtual memory requires: * reserving (allocating) a segment in the *virtual* address space (be it kernel or user) * allocating one or more physical pages for the buffer * allocating one or more physical pages for page tables and internal structures * mapping the virtual memory segment to the physical allocated pages .. slide:: Address space descriptors :inline-contents: True :level: 2 |_| .. ditaa:: :--no-separation: +------------------+ +------------+ | Address space | | |-------------->+------------+ | descriptor | +------------+ | | +------------------+ | | Page +------------+ | +------------+ tables | | +------------------+--------------+ | ... | +------------+ | | +------------+ | ... | v v | |-------+ +------------+ +------------+ +------------+ +------------+ | | | | Area | | Area | | +------------+ | descriptor | | descriptor | | +------------+ +------------+ | | | +-------------+------------------+ +------>+------------+ | | | | v v +------------+ +------------+ +------------+ | | | Area | | Area | +------------+ | descriptor | | descriptor | | ... | +------------+ +------------+ +------------+ | | | +-----------+-----------+ +------------+ | | v v +------------+ +------------+ | Area | | Area | | descriptor | | descriptor | +------------+ +------------+ .. slide:: Address space descriptors :inline-contents: True :level: 2 * Page table is used either by: * The CPU's MMU * The kernel to handle TLB exception (some RISC processors) * The address space descriptor is used by the kernel to maintain high level information such as file and file offset (for mmap with files), read-only segment, copy-on-write segment, etc. .. slide:: Allocating virtual memory :inline-contents: True :level: 2 * Search a free area in the address space descriptor * Allocate memory for a new area descriptor * Insert the new area descriptor in the address space descriptor * Allocate physical memory for one or more page tables * Setup the page tables for the newly allocated area in the virtual address space * Allocating (on demand) physical pages and map them in the virtual address space by updating the page tables .. slide:: Freeing virtual memory :inline-contents: True :level: 2 * Removing the area descriptor * Freeing the area descriptor memory * Updating the page tables to remove the area from the virtual address space * Flushing the TLB for the freed virtual memory area * Freeing physical memory of the page tables associated with the freed area * Freeing physical memory of the freed virtual memory area .. slide:: Linux virtual memory management :inline-contents: True :level: 2 * Kernel * vmalloc * area descriptor: :c:type:`struct vm_struct` * address space descriptor: simple linked list of :c:type:`struct vm_struct` * Userspace * area descriptor: :c:type:`struct vm_area_struct` * address space descriptor: :c:type:`struct mm_struct`, red-black tree Fault page handling =================== .. slide:: Linux virtual memory management :inline-contents: True :level: 2 .. image:: ../res/page-fault-handling.png