SO2 Lecture 12 - Virtualization

Lecture objectives:

  • Emulation basics
  • Virtualization basics
  • Paravirtualization basics
  • Hardware support for virtualization
  • Overview of the Xen hypervisor
  • Overview of the KVM hypervisor

Emulation basics

  • Instructions are emulated (each time they are executed)
  • The other system components are also emulated:
    • MMU
    • Physical memory access
    • Peripherals
  • Target architecture - the architecture that is emulated
  • Host architecture - the architecture that the emulator runs on
  • For emulation, the target and host architectures can be different
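
As an illustration of the points above, here is a minimal sketch of an emulator loop. The three-instruction target ISA (li, add, halt) and the function name emulate are hypothetical and exist only for this example; a real emulator would also model the MMU, memory and peripherals.

```python
# Hypothetical toy target ISA, invented for this sketch:
#   ("li", rd, imm)   load an immediate into register rd
#   ("add", rd, rs)   rd = rd + rs (32-bit wrap-around)
#   ("halt",)         stop the emulated CPU

def emulate(program):
    """Interpret a target program; each instruction is decoded every time it runs."""
    regs = [0] * 4      # emulated target registers
    pc = 0              # emulated program counter
    executed = 0        # number of emulated instructions
    while True:
        insn = program[pc]
        op = insn[0]
        executed += 1
        if op == "li":
            regs[insn[1]] = insn[2] & 0xFFFFFFFF
        elif op == "add":
            regs[insn[1]] = (regs[insn[1]] + regs[insn[2]]) & 0xFFFFFFFF
        elif op == "halt":
            return regs, executed
        else:
            raise ValueError(f"unknown target instruction: {op}")
        pc += 1


program = [("li", 0, 2), ("li", 1, 40), ("add", 0, 1), ("halt",)]
regs, executed = emulate(program)
```

Because the host only ever runs the interpreter, the target ISA does not need to match the host architecture.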

Virtualization basics

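  • Defined in a paper by Popek & Goldberg in 1974 through three properties of a VMM:
    • Fidelity - software behaves in the VM as it would on the real hardware
    • Performance - most guest instructions run directly on the hardware
    • Security - the VMM stays in full control of the system resources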
../_images/ditaa-91f08f7db4b54069e16694eab8d75c06400fc47b.png

Classic virtualization

  • Trap & Emulate
  • Same architecture for host and target
  • Most of the target instructions are natively executed
  • Target OS runs in non-privileged mode on the host
  • Privileged instructions are trapped and emulated
  • Two machine states: host and guest
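
The trap & emulate flow can be sketched as below. This is a simplified model, not a real VMM: the instruction names, the VirtualCPU class and the PrivilegedTrap exception are assumptions made for the example, and run_native stands in for direct execution on the host CPU.

```python
class PrivilegedTrap(Exception):
    """Models the hardware trap raised by a privileged instruction in non-privileged mode."""
    def __init__(self, insn):
        super().__init__(insn)
        self.insn = insn


PRIVILEGED = {"cli", "sti"}   # hypothetical privileged instructions for this sketch


class VirtualCPU:
    """Guest machine state kept by the VMM."""
    def __init__(self):
        self.acc = 0
        self.interrupts_enabled = True   # virtual interrupt flag
        self.traps = 0


def run_native(vcpu, insn):
    """Stand-in for native execution of an unprivileged instruction."""
    op = insn[0]
    if op in PRIVILEGED:
        raise PrivilegedTrap(insn)       # the hardware would trap here
    if op == "inc":
        vcpu.acc += 1
    else:
        raise ValueError(f"unknown instruction: {op}")


def vmm_emulate(vcpu, insn):
    """VMM trap handler: apply the instruction to the virtual state only."""
    vcpu.traps += 1
    if insn[0] == "cli":
        vcpu.interrupts_enabled = False
    elif insn[0] == "sti":
        vcpu.interrupts_enabled = True


def run_guest(vcpu, program):
    for insn in program:
        try:
            run_native(vcpu, insn)
        except PrivilegedTrap as trap:
            vmm_emulate(vcpu, trap.insn)


vcpu = VirtualCPU()
run_guest(vcpu, [("inc",), ("cli",), ("inc",)])
```

Only the privileged instruction reaches the VMM; the two inc instructions take the native path.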

Software virtualization

  • Not all architectures can be virtualized classically; e.g. x86:
    • CS register encodes the CPL
    • Some instructions don't generate a trap (e.g. popf)
  • Solution: emulate instructions using binary translation
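
A rough sketch of the binary translation idea follows: before a block of guest code runs, instructions that would misbehave silently are replaced with calls into the VMM. The tuple-based instruction encoding, the call_vmm pseudo-instruction and the contents of SENSITIVE are assumptions for illustration; real translators work on machine code.

```python
# Instructions that do not trap in non-privileged mode but must still be
# intercepted (assumed set for this sketch; popf is the example from the slide).
SENSITIVE = {"popf"}


def translate_block(block):
    """Return a copy of the block where sensitive instructions call the VMM instead."""
    translated = []
    for insn in block:
        if insn[0] in SENSITIVE:
            # Replace the instruction with an explicit call into the VMM,
            # which emulates it against the virtual CPU state.
            translated.append(("call_vmm", insn))
        else:
            # Safe instructions are copied unchanged and run natively.
            translated.append(insn)
    return translated


guest_block = [("mov", "eax", 1), ("popf",), ("ret",)]
host_block = translate_block(guest_block)
```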

MMU virtualization

  • "Fake" VM physical addresses are translated by the host to actual physical addresses
  • The guest page tables are not directly used by the host hardware
  • VM page tables are verified then translated into a new set of page tables on the host (shadow page tables)
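
The verify-then-translate step can be sketched with single-level page tables modelled as dictionaries. The names build_shadow, translate and gpa_to_hpa are hypothetical; real shadow paging handles multi-level tables, permissions and accessed / dirty bits.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def build_shadow(guest_pt, gpa_to_hpa):
    """Combine guest PTEs (virtual page -> guest frame) with the VMM map
    (guest frame -> host frame) into a shadow table (virtual page -> host frame)."""
    shadow = {}
    for vpn, gfn in guest_pt.items():
        if gfn not in gpa_to_hpa:
            # Verification: the guest frame is not owned by this VM, so the
            # entry is left unmapped in the shadow table.
            continue
        shadow[vpn] = gpa_to_hpa[gfn]
    return shadow


def translate(shadow, vaddr):
    """What the host MMU does with the shadow table."""
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    return (shadow[vpn] << PAGE_SHIFT) | offset


guest_pt = {0x10: 0x1, 0x11: 0x2, 0x12: 0x99}   # 0x99 is not a valid guest frame
gpa_to_hpa = {0x1: 0x500, 0x2: 0x7A0}
shadow = build_shadow(guest_pt, gpa_to_hpa)
```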

Shadow page tables

 

../_images/ditaa-8632e22c6d89bd18f97c9cef127444486b5077df.png

Lazy shadow sync

  • Guest page table changes are typically batched
  • To avoid repeated traps, checks and transformations, map guest page table entries with write access
  • Update the shadow page table when the TLB is flushed
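
A minimal sketch of this lazy scheme, assuming single-level page tables stored as dictionaries; the class and method names are invented for the example.

```python
class LazyShadowMMU:
    """Guest PTE writes are not trapped; the shadow table is rebuilt on TLB flush."""

    def __init__(self, gpa_to_hpa):
        self.gpa_to_hpa = gpa_to_hpa   # guest frame -> host frame (VMM owned)
        self.guest_pt = {}             # virtual page -> guest frame (guest owned)
        self.shadow = {}               # virtual page -> host frame (used by hardware)
        self.syncs = 0

    def guest_write_pte(self, vpn, gfn):
        # The guest page table is mapped writable, so this write costs no trap
        # and the shadow table is temporarily out of date.
        self.guest_pt[vpn] = gfn

    def guest_tlb_flush(self):
        # The TLB flush is trapped by the VMM, which is the point where the
        # batched changes are checked and translated in one pass.
        self.shadow = {
            vpn: self.gpa_to_hpa[gfn]
            for vpn, gfn in self.guest_pt.items()
            if gfn in self.gpa_to_hpa
        }
        self.syncs += 1


mmu = LazyShadowMMU({1: 100, 2: 200})
mmu.guest_write_pte(0x10, 1)
mmu.guest_write_pte(0x11, 2)
stale_shadow = dict(mmu.shadow)   # still empty before the flush
mmu.guest_tlb_flush()
```

Two page table writes cost a single synchronization, which is the benefit of batching.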

I/O virtualization

 

../_images/ditaa-bb69666d75b9670e542682753fb8cc9b77ff8894.png

Paravirtualization

  • Change the guest OS so that it cooperates with the VMM
    • CPU paravirtualization
    • MMU paravirtualization
    • I/O paravirtualization
  • VMM exposes hypercalls for:
    • enabling / disabling interrupts
    • changing page tables
    • accessing virtualized peripherals
  • VMM uses events to trigger interrupts in the VM
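
The hypercall and event mechanism might look roughly like the sketch below. The hypercall numbers, the ToyVMM class and its methods are hypothetical and do not correspond to the interface of Xen or any other hypervisor.

```python
# Hypothetical hypercall numbers for this sketch.
HC_SET_IRQ, HC_SET_PTE, HC_CONSOLE_WRITE = 0, 1, 2


class ToyVMM:
    def __init__(self, guest_event_handler):
        self.irq_enabled = True
        self.page_table = {}
        self.console = []
        self.pending = []
        self.guest_event_handler = guest_event_handler
        self._table = {
            HC_SET_IRQ: self._set_irq,
            HC_SET_PTE: self._set_pte,
            HC_CONSOLE_WRITE: self._console_write,
        }

    def hypercall(self, nr, *args):
        """Entry point used by the modified guest instead of privileged instructions."""
        return self._table[nr](*args)

    def _set_irq(self, enabled):
        self.irq_enabled = bool(enabled)
        if self.irq_enabled:
            self._deliver_pending()

    def _set_pte(self, vpn, pfn):
        self.page_table[vpn] = pfn

    def _console_write(self, text):
        self.console.append(text)

    def send_event(self, event):
        """VMM side: queue an event; it reaches the guest once interrupts are enabled."""
        self.pending.append(event)
        if self.irq_enabled:
            self._deliver_pending()

    def _deliver_pending(self):
        while self.pending:
            self.guest_event_handler(self.pending.pop(0))


received = []
vmm = ToyVMM(received.append)
vmm.hypercall(HC_SET_IRQ, False)
vmm.send_event("timer")
before_enable = list(received)
vmm.hypercall(HC_SET_PTE, 0x10, 0x500)
vmm.hypercall(HC_SET_IRQ, True)
```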

Intel VT-x

  • Hardware extension that changes x86 so that it can be virtualized "classically" (trap & emulate)
  • New execution mode: non-root mode
  • Each non-root mode instance uses a Virtual Machine Control Structure (VMCS) to store its state
  • VMM runs in root mode
  • VM-entry and VM-exit are used to transition between the two modes

Virtual Machine Control Structure

  • Guest information: state of the virtual CPU
  • Host information: state of the physical CPU
  • Saved information:
    • visible state: segment registers, CR3, IDTR, etc.
    • internal state
  • The VMCS cannot be accessed directly, but certain fields can be read and written with special instructions (VMREAD, VMWRITE)

VM execution control fields

  • Select the conditions that trigger a VM exit; examples:
    • If an external interrupt is generated
    • If an external interrupt is generated and EFLAGS.IF is set
    • If CR0-CR4 registers are modified
  • Exception bitmap - selects which exceptions generate a VM exit
  • I/O bitmap - selects which I/O addresses (IN/OUT accesses) generate a VM exit
  • MSR bitmaps - select which RDMSR or WRMSR instructions generate a VM exit
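
The bitmap checks can be sketched as follows. This is a simplified model: a single 64K-bit bitmap is used for all ports, and the class and function names are assumptions for the example rather than the hardware layout.

```python
NUM_PORTS = 65536


class IOBitmap:
    """One bit per I/O port; a set bit means the access causes a VM exit."""

    def __init__(self):
        self.bits = bytearray(NUM_PORTS // 8)

    def intercept(self, port):
        self.bits[port >> 3] |= 1 << (port & 7)

    def causes_exit(self, port, size=1):
        # A multi-byte access exits if any of the ports it touches is marked.
        last = min(port + size, NUM_PORTS)
        return any(self.bits[p >> 3] & (1 << (p & 7)) for p in range(port, last))


def exception_causes_exit(exception_bitmap, vector):
    """Bit n of the exception bitmap selects exception vector n."""
    return bool((exception_bitmap >> vector) & 1)


io_bitmap = IOBitmap()
io_bitmap.intercept(0x3F8)        # intercept the first serial port data register
exception_bitmap = 1 << 14        # intercept page faults (vector 14)
```

With this configuration an access to port 0x3F8 exits to the VMM, while an access to port 0x60 runs without VMM involvement.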

VM entry & exit

  • VM entry - new instructions (VMLAUNCH, VMRESUME) that switch the CPU to non-root mode and load the VM state from a VMCS; host state is saved in the VMCS
  • Allows injecting interrupts and exceptions in the guest
  • VM exit will be automatically triggered based on the VMCS configuration
  • When a VM exit occurs, host state is loaded from the VMCS and guest state is saved in the VMCS
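
The state swap described above can be modelled as in the sketch below. The VMCS and CPU classes and their field names are simplifications invented for this example; the real VMCS layout is opaque and only accessible through dedicated instructions.

```python
class VMCS:
    def __init__(self, guest_state):
        self.guest_state = dict(guest_state)   # state of the virtual CPU
        self.host_state = {}                   # state of the physical CPU
        self.exit_reason = None
        self.pending_injection = None          # interrupt / exception to inject


class CPU:
    def __init__(self, state):
        self.state = dict(state)
        self.non_root = False


def vm_entry(cpu, vmcs):
    """Switch to non-root mode: save host state, load guest state."""
    vmcs.host_state = dict(cpu.state)
    cpu.state = dict(vmcs.guest_state)
    cpu.non_root = True
    injected, vmcs.pending_injection = vmcs.pending_injection, None
    return injected                            # event delivered to the guest, if any


def vm_exit(cpu, vmcs, reason):
    """Switch back to root mode: save guest state, load host state."""
    vmcs.guest_state = dict(cpu.state)
    cpu.state = dict(vmcs.host_state)
    cpu.non_root = False
    vmcs.exit_reason = reason


cpu = CPU({"rip": 0x1000, "cr3": 0xA000})
vmcs = VMCS({"rip": 0x0, "cr3": 0xB000})
vmcs.pending_injection = 32                    # inject vector 32 at the next entry
injected = vm_entry(cpu, vmcs)
guest_cr3 = cpu.state["cr3"]
cpu.state["rip"] = 0x40                        # the guest makes some progress
vm_exit(cpu, vmcs, "io")
```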

Extended Page Tables

  • Reduces the complexity of MMU virtualization and improves performance
  • Accesses to CR3, INVLPG and page faults no longer require a VM exit
  • The EPT page table is controlled by the VMM
../_images/ditaa-cc9a2e995be74ee99646ea4bf0e551d766fa92ef.png
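
A simplified sketch of the two-stage translation performed by the hardware with EPT. Single-level dictionaries stand in for the multi-level guest page tables and the EPT, and the exception names are invented for the example.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class GuestPageFault(Exception):
    """Delivered to the guest OS; no VM exit is needed."""


class EPTViolation(Exception):
    """Causes a VM exit so that the VMM can fix or reject the access."""


def translate(guest_pt, ept, gva):
    """Guest virtual -> guest physical (guest page table) -> host physical (EPT)."""
    vpn, offset = gva >> PAGE_SHIFT, gva & PAGE_MASK
    if vpn not in guest_pt:
        raise GuestPageFault(hex(gva))
    gfn = guest_pt[vpn]
    if gfn not in ept:
        raise EPTViolation(hex(gfn << PAGE_SHIFT))
    return (ept[gfn] << PAGE_SHIFT) | offset


guest_pt = {0x10: 0x1}      # owned and edited by the guest
ept = {0x1: 0x500}          # owned and edited by the VMM
first = translate(guest_pt, ept, 0x10ABC)

# The guest edits its own page table; the VMM is not involved.
guest_pt[0x11] = 0x1
second = translate(guest_pt, ept, 0x11004)

# A guest frame that the VMM has not mapped triggers an EPT violation.
guest_pt[0x12] = 0x99
try:
    translate(guest_pt, ept, 0x12000)
    violation = False
except EPTViolation:
    violation = True
```

Unlike with shadow page tables, no combined table has to be kept in sync when the guest changes its mappings.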

VPID

  • VM entry and VM exit force a TLB flush, which discards both the VMM and the VM translations
  • To avoid this issue a VPID (Virtual Processor ID) tag is associated with each VM (VPID 0 is reserved for the VMM)
  • All TLB entries are tagged
  • At VM entry and exit just the entries associated with the tags are flushed
  • When searching the TLB just the current VPID is used
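
The effect of tagging can be illustrated with a toy TLB keyed by (VPID, virtual page). The TaggedTLB class is a hypothetical model and ignores capacity, replacement and global pages.

```python
class TaggedTLB:
    def __init__(self):
        self.entries = {}   # (vpid, virtual page number) -> physical frame number

    def insert(self, vpid, vpn, pfn):
        self.entries[(vpid, vpn)] = pfn

    def lookup(self, vpid, vpn):
        # Only entries tagged with the current VPID can hit.
        return self.entries.get((vpid, vpn))

    def flush_vpid(self, vpid):
        # Drop the entries of one tag and keep everything else.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != vpid}


tlb = TaggedTLB()
tlb.insert(0, 0x10, 0xA)    # VPID 0: the VMM
tlb.insert(1, 0x10, 0xB)    # VPID 1: a VM using the same virtual page
vm_hit = tlb.lookup(1, 0x10)
vmm_hit = tlb.lookup(0, 0x10)
other_vm = tlb.lookup(2, 0x10)
tlb.flush_vpid(1)
```

The same virtual page resolves differently per tag, and flushing VPID 1 leaves the VMM entry in place.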

Intel VT-d

  • Direct access to hardware from a VM - in a controlled way
  • The physical device must support multiplexing (e.g. SR-IOV)
    • I/O assignments
    • IRQ routing
  • VT-d protects and translates VM physical addresses using an I/O MMU (DMA remapping)

DMA remapping

../_images/dma-remapping1.png
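
A minimal sketch of what DMA remapping does for a device assigned to a VM. The ToyIOMMU class, the device names and the per-device dictionary tables are assumptions made for the example and do not reflect the VT-d table formats.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class DMAFault(Exception):
    """Raised when a device accesses an address outside its assigned mappings."""


class ToyIOMMU:
    def __init__(self):
        self.domains = {}   # device id -> {I/O page number -> host frame number}

    def map(self, device, io_pfn, host_pfn):
        self.domains.setdefault(device, {})[io_pfn] = host_pfn

    def translate(self, device, dma_addr):
        """Translate an address issued by a device, or block the access."""
        table = self.domains.get(device, {})
        io_pfn, offset = dma_addr >> PAGE_SHIFT, dma_addr & PAGE_MASK
        if io_pfn not in table:
            raise DMAFault(f"{device}: {dma_addr:#x}")
        return (table[io_pfn] << PAGE_SHIFT) | offset


iommu = ToyIOMMU()
iommu.map("nic0", 0x1, 0x500)   # guest frame 0x1 of the owning VM -> host frame 0x500
ok = iommu.translate("nic0", 0x1010)
try:
    iommu.translate("nic0", 0x9000)    # address not assigned to this device
    blocked = False
except DMAFault:
    blocked = True
```

The device is programmed with VM physical addresses, and the remapping hardware both translates them and confines the device to the memory of its VM.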

qemu

  • Uses binary translation via Tiny Code Generator (TCG) for efficient emulation
  • Supports different target and host architectures (e.g. running ARM VMs on x86)
  • Both process and full system level emulation
  • MMU emulation
  • I/O emulation
  • Can be used with KVM for accelerated virtualization

KVM

  • VMM implemented inside the Linux kernel
  • Requires hardware virtualization (e.g. Intel VT-x)
  • Shadow page tables or EPT if present
  • Uses qemu or virtio for I/O virtualization
../_images/ditaa-f8fcc760ef5dad50d1038ed3426d0fcce12fd3e6.png
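
From user space, KVM is driven through ioctl calls on /dev/kvm. The sketch below only queries the API version. The helper name kvm_api_version is hypothetical, and the ioctl number is computed assuming the common Linux encoding (e.g. x86) in which _IO() requests carry no direction or size bits.

```python
import fcntl
import os

KVMIO = 0xAE
# KVM_GET_API_VERSION is _IO(KVMIO, 0x00); with the assumed encoding this is 0xAE00.
KVM_GET_API_VERSION = (KVMIO << 8) | 0x00


def kvm_api_version(path="/dev/kvm"):
    """Return the KVM API version, or None if KVM is not available."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None          # no KVM support or no permission
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None
    finally:
        os.close(fd)


version = kvm_api_version()
```

On a host with KVM enabled the stable API version is 12; further ioctl calls on the same file descriptor create VMs and virtual CPUs.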

Xen

../_images/xen-overview1.png