Assignment 7 - SO2 Virtual Machine Manager with KVM¶
- Deadline: Tuesday, 29 May 2023, 23:00
- This assignment can be done in teams (max 2). Only one team member should submit the assignment, and the names of both students should be listed in a README file.
In this assignment we will work on a simple Virtual Machine Manager (VMM). We will be using the KVM API from the Linux kernel.
The assignment has two components: the VM code and the VMM code. We will be using a very simple protocol to enable the communication between the two components. The protocol is called SIMVIRTIO.
I. Virtual Machine Manager¶
In general, to build a VMM from scratch we will have to implement three main functionalities: initialize the VMM, initialize the virtual CPU and run the guest code. We will split the implementation of the VMM in these three phases.
1. Initialize the VMM¶
A VM will generally be represented by three elements: a file descriptor used to interact with the KVM API (/dev/kvm), a per-VM file descriptor used to configure it (e.g. set its memory), and a pointer to the VM's memory. We provide you with the following structure to start from when working with a VM.
typedef struct vm {
	int sys_fd;
	int fd;
	char *mem;
} virtual_machine;
The first step in initializing the KVM VM is to interact with the [KVM API](https://www.kernel.org/doc/html/latest/virt/kvm/api.html). The KVM API is exposed via /dev/kvm, and we will be using ioctl calls to call it.
The snippet below shows how one can call KVM_GET_API_VERSION to get the KVM API version:
int kvm_fd = open("/dev/kvm", O_RDWR);
if (kvm_fd < 0) {
	perror("open /dev/kvm");
	exit(1);
}
int api_ver = ioctl(kvm_fd, KVM_GET_API_VERSION, 0);
if (api_ver < 0) {
	perror("KVM_GET_API_VERSION");
	exit(1);
}
Let us now briefly go through how a VMM initializes a VM. This covers only the bare bones; a VMM may do lots of other things during VM initialization.
- We first use KVM_GET_API_VERSION to check that we are running the expected version of KVM, KVM_API_VERSION.
- We now create the VM using KVM_CREATE_VM. Note that calling KVM_CREATE_VM returns a file descriptor. We will be using this file descriptor for the next phases of the setup.
- (Intel CPUs only) On Intel-based CPUs, we have to call KVM_SET_TSS_ADDR with address 0xfffbd000.
- Next, we allocate the memory for the VM. We will be using mmap for this, with PROT_READ | PROT_WRITE and MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE. We recommend allocating 0x100000 bytes for the VM.
- We flag the memory as MADV_MERGEABLE using madvise.
- Finally, we use KVM_SET_USER_MEMORY_REGION to assign the memory to the VM.
Make sure you understand which file descriptor to use and when. We use the KVM fd (/dev/kvm) when calling KVM_CREATE_VM, but when interacting with the VM, for example when calling KVM_SET_USER_MEMORY_REGION, we use the VM's file descriptor.
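The initialization steps above can be combined as follows. This is a minimal sketch, not the skeleton's code. The names vm_init and fill_mem_region are hypothetical. The sketch places guest memory in slot 0 at guest physical address 0. For brevity, its error paths return without closing descriptors or unmapping memory.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

typedef struct vm {
	int sys_fd;	/* /dev/kvm: system ioctls */
	int fd;		/* VM fd returned by KVM_CREATE_VM */
	char *mem;	/* host mapping of the guest memory */
} virtual_machine;

/* Describe memory slot 0, mapping host_mem at guest physical address 0. */
void fill_mem_region(struct kvm_userspace_memory_region *region,
		     void *host_mem, size_t size)
{
	memset(region, 0, sizeof(*region));
	region->slot = 0;
	region->guest_phys_addr = 0;
	region->memory_size = size;
	region->userspace_addr = (uint64_t)(uintptr_t)host_mem;
}

/* Error paths leak fds and mappings for brevity. */
int vm_init(virtual_machine *vm, size_t mem_size)
{
	struct kvm_userspace_memory_region region;
	int api_ver;

	vm->sys_fd = open("/dev/kvm", O_RDWR);
	if (vm->sys_fd < 0) {
		perror("open /dev/kvm");
		return -1;
	}

	api_ver = ioctl(vm->sys_fd, KVM_GET_API_VERSION, 0);
	if (api_ver != KVM_API_VERSION) {
		fprintf(stderr, "unexpected KVM API version %d\n", api_ver);
		return -1;
	}

	/* KVM_CREATE_VM is issued on /dev/kvm and returns the VM fd. */
	vm->fd = ioctl(vm->sys_fd, KVM_CREATE_VM, 0);
	if (vm->fd < 0) {
		perror("KVM_CREATE_VM");
		return -1;
	}

	/* Needed on Intel hosts; from here on, ioctls go to the VM fd. */
	if (ioctl(vm->fd, KVM_SET_TSS_ADDR, (unsigned long)0xfffbd000) < 0) {
		perror("KVM_SET_TSS_ADDR");
		return -1;
	}

	vm->mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (vm->mem == MAP_FAILED) {
		perror("mmap guest memory");
		return -1;
	}
	/* Best effort: fails harmlessly if KSM is not available. */
	madvise(vm->mem, mem_size, MADV_MERGEABLE);

	fill_mem_region(&region, vm->mem, mem_size);
	if (ioctl(vm->fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
		perror("KVM_SET_USER_MEMORY_REGION");
		return -1;
	}
	return 0;
}
```

A caller would typically declare virtual_machine vm and then call vm_init(&vm, 0x100000), exiting on a negative return. This uses the 0x100000 bytes recommended above.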
TLDR: API used for VM initialization:
- KVM_GET_API_VERSION
- KVM_CREATE_VM
- KVM_SET_TSS_ADDR
- KVM_SET_USER_MEMORY_REGION
2. Initialize a virtual CPU¶
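We need a virtual CPU (VCPU) to hold the guest's registers and run the guest code. A VCPU is represented by its file descriptor and a pointer to the kvm_run structure it shares with the kernel: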
typedef struct vcpu {
	int fd;
	struct kvm_run *kvm_run;
} virtual_cpu;
To create a virtual CPU we will do the following:
1. Call KVM_CREATE_VCPU on the VM file descriptor to create the virtual CPU. This call returns a file descriptor.
2. Call KVM_GET_VCPU_MMAP_SIZE on the KVM file descriptor (/dev/kvm) to get the size of the memory area shared with the kernel.
3. Allocate the necessary VCPU memory with mmap, passing the VCPU file descriptor to the mmap call (with MAP_SHARED, since the area is shared with the kernel). We can store the result in kvm_run.
TLDR: API used for VCPU initialization:
- KVM_CREATE_VCPU
- KVM_GET_VCPU_MMAP_SIZE
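Here is a minimal sketch of the three VCPU steps. It assumes a single VCPU with id 0, and vcpu_init is a hypothetical name. Each call uses a different descriptor: KVM_CREATE_VCPU goes to the VM fd, KVM_GET_VCPU_MMAP_SIZE goes to the /dev/kvm fd, and mmap uses the new VCPU fd.

```c
#define _GNU_SOURCE
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

typedef struct vm {
	int sys_fd;
	int fd;
	char *mem;
} virtual_machine;

typedef struct vcpu {
	int fd;
	struct kvm_run *kvm_run;
} virtual_cpu;

int vcpu_init(virtual_machine *vm, virtual_cpu *vcpu)
{
	int mmap_size;

	/* VM ioctl; the argument is the VCPU id. */
	vcpu->fd = ioctl(vm->fd, KVM_CREATE_VCPU, 0);
	if (vcpu->fd < 0) {
		perror("KVM_CREATE_VCPU");
		return -1;
	}

	/* System ioctl, issued on /dev/kvm. */
	mmap_size = ioctl(vm->sys_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
	if (mmap_size <= 0) {
		perror("KVM_GET_VCPU_MMAP_SIZE");
		return -1;
	}

	/* kvm_run is shared with the kernel, hence MAP_SHARED on the VCPU fd. */
	vcpu->kvm_run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
			     MAP_SHARED, vcpu->fd, 0);
	if (vcpu->kvm_run == MAP_FAILED) {
		perror("mmap kvm_run");
		return -1;
	}
	return 0;
}
```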
We recommend using 2MB pages when setting up the page tables for long mode, to simplify the address translation process.
3. Running the VM¶
Setup real mode¶
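A newly created VCPU already starts in [Real mode](https://wiki.osdev.org/Real_Mode). However, its registers are in the x86 reset state, where CS points at the reset vector near the top of the 4 GB address space rather than at our guest code. To run our guest code, which we load at address 0, we need to configure several CPU registers.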
- First, we use KVM_GET_SREGS to get the special registers. We use struct kvm_sregs for this task.
- We need to set cs.selector and cs.base to 0. We use KVM_SET_SREGS to set the registers.
- Next, we clear all FLAGS bits via the rflags register. This means setting rflags to 2, since bit 1 must always be set to 1. We also set the RIP register to 0. These are general-purpose registers, so we set them through struct kvm_regs and KVM_SET_REGS.
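The real-mode setup can be sketched as below. The helpers real_mode_sregs and real_mode_regs (hypothetical names) only fill in the structures, while vcpu_setup_real_mode performs the get/set ioctls. Splitting the code this way is a design choice that keeps the register values easy to inspect and test without /dev/kvm.

```c
#include <linux/kvm.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Make CS point at guest address 0 instead of the reset vector. */
void real_mode_sregs(struct kvm_sregs *sregs)
{
	sregs->cs.selector = 0;
	sregs->cs.base = 0;
}

/* Clear all flags except reserved bit 1; start executing at address 0. */
void real_mode_regs(struct kvm_regs *regs)
{
	memset(regs, 0, sizeof(*regs));
	regs->rflags = 2;
	regs->rip = 0;
}

int vcpu_setup_real_mode(int vcpu_fd)
{
	struct kvm_sregs sregs;
	struct kvm_regs regs;

	/* Special registers: read, patch CS, write back. */
	if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0) {
		perror("KVM_GET_SREGS");
		return -1;
	}
	real_mode_sregs(&sregs);
	if (ioctl(vcpu_fd, KVM_SET_SREGS, &sregs) < 0) {
		perror("KVM_SET_SREGS");
		return -1;
	}

	/* General-purpose registers: RFLAGS and RIP. */
	real_mode_regs(&regs);
	if (ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0) {
		perror("KVM_SET_REGS");
		return -1;
	}
	return 0;
}
```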
Setup long mode¶
Real mode is all right for very simple guests, such as the one found in the folder guest_16_bits. But most programs nowadays need 64-bit addresses, so we will need to switch to long mode. The following article from OSDev presents all the necessary information about [Setting Up Long Mode](https://wiki.osdev.org/Setting_Up_Long_Mode).
In vcpu.h, you may find helpful macros such as CR0_PE, CR0_MP, CR0_ET, etc.
Since we will be running a more complex program, we will also create a small stack for it by setting regs.rsp = 1 << 20, the top of the 1 MB of guest memory. Don't forget to set the RIP and RFLAGS registers as well.
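The following sketch shows long-mode setup along the lines of the OSDev article. It rests on three assumptions:
- The macro values below match those in vcpu.h. They are the standard x86 bit positions.
- The page tables live at the hypothetical guest addresses 0x2000-0x4fff, so the guest code must fit below 0x2000.
- A single 2MB page identity-maps the low guest memory.

All function names are hypothetical.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Assumed to match vcpu.h: standard x86 control register bits. */
#define CR0_PE (1u << 0)
#define CR0_MP (1u << 1)
#define CR0_ET (1u << 4)
#define CR0_NE (1u << 5)
#define CR0_WP (1u << 16)
#define CR0_AM (1u << 18)
#define CR0_PG (1u << 31)
#define CR4_PAE (1u << 5)
#define EFER_LME (1u << 8)
#define EFER_LMA (1u << 10)

/* Page table entry flags. */
#define PDE64_PRESENT (1u << 0)
#define PDE64_RW (1u << 1)
#define PDE64_USER (1u << 2)
#define PDE64_PS (1u << 7)	/* 2MB page in a page directory entry */

/* Hypothetical guest physical addresses of the page tables. */
#define PML4_ADDR 0x2000
#define PDPT_ADDR 0x3000
#define PD_ADDR 0x4000

/* Identity-map guest addresses 0x0-0x1fffff with a single 2MB page. */
void setup_page_tables(char *mem)
{
	uint64_t *pml4 = (uint64_t *)(mem + PML4_ADDR);
	uint64_t *pdpt = (uint64_t *)(mem + PDPT_ADDR);
	uint64_t *pd = (uint64_t *)(mem + PD_ADDR);

	memset(mem + PML4_ADDR, 0, PD_ADDR + 0x1000 - PML4_ADDR);
	pml4[0] = PDE64_PRESENT | PDE64_RW | PDE64_USER | PDPT_ADDR;
	pdpt[0] = PDE64_PRESENT | PDE64_RW | PDE64_USER | PD_ADDR;
	pd[0] = PDE64_PRESENT | PDE64_RW | PDE64_USER | PDE64_PS;
}

void setup_long_mode_sregs(struct kvm_sregs *sregs)
{
	struct kvm_segment seg = {
		.base = 0,
		.limit = 0xffffffff,
		.selector = 1 << 3,
		.present = 1,
		.type = 11,	/* code: execute/read, accessed */
		.dpl = 0,
		.db = 0,
		.s = 1,		/* code/data segment */
		.l = 1,		/* 64-bit code */
		.g = 1,		/* 4KB granularity */
	};

	sregs->cr3 = PML4_ADDR;
	sregs->cr4 = CR4_PAE;
	sregs->cr0 = CR0_PE | CR0_MP | CR0_ET | CR0_NE | CR0_WP | CR0_AM | CR0_PG;
	sregs->efer = EFER_LME | EFER_LMA;

	sregs->cs = seg;
	seg.type = 3;		/* data: read/write, accessed */
	seg.selector = 2 << 3;
	sregs->ds = sregs->es = sregs->fs = sregs->gs = sregs->ss = seg;
}

void setup_long_mode_regs(struct kvm_regs *regs)
{
	memset(regs, 0, sizeof(*regs));
	regs->rflags = 2;	/* bit 1 is reserved and must be 1 */
	regs->rip = 0;		/* guest code is loaded at address 0 */
	regs->rsp = 1 << 20;	/* stack grows down from the top of 1MB */
}

int vcpu_setup_long_mode(int vcpu_fd, char *mem)
{
	struct kvm_sregs sregs;
	struct kvm_regs regs;

	if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
		return -1;
	setup_page_tables(mem);
	setup_long_mode_sregs(&sregs);
	setup_long_mode_regs(&regs);
	if (ioctl(vcpu_fd, KVM_SET_SREGS, &sregs) < 0 ||
	    ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0)
		return -1;
	return 0;
}
```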
Running¶
After we set up our VCPU in real or long mode, we can finally start running code on the VM.
- We copy the guest code into the VM memory with memcpy(vm->mem, guest_code, guest_code_size). The guest code will be available in two variables, which are discussed below.
- In an infinite loop, we run the following:
  - We call KVM_RUN on the VCPU file descriptor to run the VCPU.
  - Through the shared memory of the VCPU (kvm_run), we check the exit_reason field to see if the guest has made any requests.
  - We will handle the following VMEXITs: KVM_EXIT_MMIO, KVM_EXIT_IO and KVM_EXIT_HLT.
    - KVM_EXIT_MMIO is triggered when the guest reads from or writes to an MMIO address, i.e. a guest physical address that is not backed by a memory region.
    - KVM_EXIT_IO is triggered when the guest executes a port I/O instruction such as inb or outb.
    - KVM_EXIT_HLT is triggered when the guest executes a hlt instruction.
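The run loop can be sketched as below. The names debug_port_byte and run_vm are hypothetical. The sketch has these limitations:
- Only 1-byte writes to port 0xE9 are printed.
- MMIO exits are only logged.
- An interrupted KVM_RUN (EINTR) is treated as an error instead of being retried.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

typedef struct vcpu {
	int fd;
	struct kvm_run *kvm_run;
} virtual_cpu;

#define DEBUG_PORT 0xE9

/* Return 1 and store the byte if this exit is a 1-byte OUT to port 0xE9. */
int debug_port_byte(const struct kvm_run *run, uint8_t *byte)
{
	if (run->io.direction != KVM_EXIT_IO_OUT || run->io.port != DEBUG_PORT ||
	    run->io.size != 1 || run->io.count != 1)
		return 0;
	/* The data lives inside the shared kvm_run area, at data_offset. */
	*byte = *((const uint8_t *)run + run->io.data_offset);
	return 1;
}

int run_vm(virtual_cpu *vcpu)
{
	struct kvm_run *run = vcpu->kvm_run;
	uint8_t byte;

	for (;;) {
		if (ioctl(vcpu->fd, KVM_RUN, 0) < 0) {
			perror("KVM_RUN");
			return -1;
		}

		switch (run->exit_reason) {
		case KVM_EXIT_HLT:
			return 0;
		case KVM_EXIT_IO:
			if (debug_port_byte(run, &byte))
				putchar(byte);
			break;
		case KVM_EXIT_MMIO:
			fprintf(stderr, "MMIO %s at 0x%llx, len %u\n",
				run->mmio.is_write ? "write" : "read",
				(unsigned long long)run->mmio.phys_addr,
				run->mmio.len);
			break;
		default:
			fprintf(stderr, "unhandled exit reason %u\n",
				run->exit_reason);
			return -1;
		}
	}
}
```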
Guest code¶
The VM that is running is also called the guest. We will be using the guest to test our implementation.
- To test the implementation before implementing SIMVIRTIO, the guest will write the value 42 at address 0x400 and in the RAX register.
- To test a more complicated implementation, we will extend the previous program to also write "Hello, world!\n" on port 0xE9 using the outb instruction.
- To test the implementation of SIMVIRTIO, we will
How do we get the guest code? It is available through the static symbols guest16 and guest16_end, which are populated by the linker script. The guest code size is guest16_end - guest16.
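Loading the guest image can be sketched as below. Declaring guest16 and guest16_end as byte arrays is an assumption about how the skeleton types these linker symbols. load_guest is a hypothetical helper that also refuses images larger than the guest memory.

```c
#include <stddef.h>
#include <string.h>

/* Linker-script symbols; their exact type in the skeleton is an assumption. */
extern const unsigned char guest16[];
extern const unsigned char guest16_end[];

/* Copy the guest image to guest physical address 0; fail if it does not fit. */
int load_guest(char *mem, size_t mem_size,
	       const unsigned char *start, const unsigned char *end)
{
	size_t size = (size_t)(end - start);

	if (size > mem_size)
		return -1;
	memcpy(mem, start, size);
	return 0;
}
```

With the VM from the previous steps, the call would be load_guest(vm.mem, 0x100000, guest16, guest16_end).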
SIMVIRTIO¶
For the communication between the guest and the VMM, we will implement a very simple protocol called SIMVIRTIO. It is a simplified version of virtio, the protocol used by real-world virtualized devices.
Configuration space (fields in layout order):
| Field | Type | Access |
|---|---|---|
| magic value | u32 | R |
| max queue len | u16 | R |
| device status | u8 | R |
| driver status | u8 | R/W |
| queue selector | u8 | R/W |
| Q0 (TX) CTL | u8 | R/W |
| Q1 (RX) CTL | u8 | R/W |
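One way to mirror the configuration space in the VMM is a packed struct. This assumes the fields are laid out in table order with no padding, which gives magic at offset 0, driver status at offset 7 and Q1 CTL at offset 10. simvirtio_config_t and config_access are hypothetical names. config_access serves a guest MMIO access at a given offset and rejects guest writes to the read-only fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct __attribute__((packed)) simvirtio_config {
	uint32_t magic;		/* R */
	uint16_t max_queue_len;	/* R */
	uint8_t device_status;	/* R */
	uint8_t driver_status;	/* R/W */
	uint8_t queue_sel;	/* R/W */
	uint8_t q0_ctl;		/* R/W, TX */
	uint8_t q1_ctl;		/* R/W, RX */
} simvirtio_config_t;

/*
 * Serve a guest access of len bytes at offset into the config space.
 * Reads may cover any field; writes must start at a R/W field.
 */
int config_access(simvirtio_config_t *cfg, size_t offset, uint8_t *data,
		  size_t len, int is_write)
{
	uint8_t *bytes = (uint8_t *)cfg;

	if (offset + len > sizeof(*cfg))
		return -1;
	if (!is_write) {
		memcpy(data, bytes + offset, len);
		return 0;
	}
	if (offset < offsetof(simvirtio_config_t, driver_status))
		return -1;	/* read-only field */
	memcpy(bytes + offset, data, len);
	return 0;
}
```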
Controller queues¶
We provide you with the following structures and methods for the SIMVIRTIO implementation.
typedef uint8_t q_elem_t;
typedef struct queue_control {
	// Ptr to current available head/producer index in 'buffer'.
	unsigned head;
	// Ptr to last index in 'buffer' used by consumer.
	unsigned tail;
} queue_control_t;
typedef struct simqueue {
	// MMIO queue control.
	volatile queue_control_t *q_ctrl;
	// Size of the queue buffer/data.
	unsigned maxlen;
	// Queue data buffer.
	q_elem_t *buffer;
} simqueue_t;
int circ_bbuf_push(simqueue_t *q, q_elem_t data)
{
	// TODO: implement
}
int circ_bbuf_pop(simqueue_t *q, q_elem_t *data)
{
	// TODO: implement
}
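Below is a possible implementation of the two queue functions. It assumes head is the next slot the producer writes and tail is the next slot the consumer reads; the skeleton's comments leave the exact convention open. One slot is always kept free, so a full queue (head + 1 == tail, modulo maxlen) can be told apart from an empty one (head == tail). When the queue memory is shared with the guest, memory barriers may be needed around the index updates. The sketch omits them.

```c
#include <stdint.h>

typedef uint8_t q_elem_t;

typedef struct queue_control {
	unsigned head;	/* next slot the producer writes */
	unsigned tail;	/* next slot the consumer reads */
} queue_control_t;

typedef struct simqueue {
	volatile queue_control_t *q_ctrl;
	unsigned maxlen;
	q_elem_t *buffer;
} simqueue_t;

/* Return 0 on success, -1 if the queue is full. */
int circ_bbuf_push(simqueue_t *q, q_elem_t data)
{
	unsigned head = q->q_ctrl->head;
	unsigned next = (head + 1) % q->maxlen;

	if (next == q->q_ctrl->tail)
		return -1;
	q->buffer[head] = data;
	q->q_ctrl->head = next;
	return 0;
}

/* Return 0 on success, -1 if the queue is empty. */
int circ_bbuf_pop(simqueue_t *q, q_elem_t *data)
{
	unsigned tail = q->q_ctrl->tail;

	if (tail == q->q_ctrl->head)
		return -1;
	*data = q->buffer[tail];
	q->q_ctrl->tail = (tail + 1) % q->maxlen;
	return 0;
}
```

With maxlen set to 4, this queue holds at most 3 elements, because of the free slot.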
Device structures¶
#define MAGIC_VALUE 0x74726976
#define DEVICE_RESET 0x0
#define DEVICE_CONFIG 0x2
#define DEVICE_READY 0x4
#define DRIVER_ACK 0x0
#define DRIVER 0x2
#define DRIVER_OK 0x4
#define DRIVER_RESET 0x8000
typedef struct device {
	uint32_t magic;
	uint8_t device_status;
	uint8_t driver_status;
	uint8_t max_queue_len;
} device_t;
typedef struct device_table {
	uint16_t count;
	uint64_t device_addresses[10];
} device_table_t;
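Here is a sketch of how the VMM might initialize a device_t and register a device address in device_table_t. device_init, device_table_add and MAX_DEVICES are hypothetical; the limit of 10 comes from the size of device_addresses. MAGIC_VALUE spells "virt" in ASCII when stored little-endian, as on x86.

```c
#include <stdint.h>

#define MAGIC_VALUE 0x74726976
#define DEVICE_RESET 0x0

/* Hypothetical limit, matching the size of device_addresses below. */
#define MAX_DEVICES 10

typedef struct device {
	uint32_t magic;
	uint8_t device_status;
	uint8_t driver_status;
	uint8_t max_queue_len;
} device_t;

typedef struct device_table {
	uint16_t count;
	uint64_t device_addresses[MAX_DEVICES];
} device_table_t;

void device_init(device_t *dev, uint8_t max_queue_len)
{
	dev->magic = MAGIC_VALUE;
	dev->device_status = DEVICE_RESET;
	dev->driver_status = 0;	/* the driver has not written anything yet */
	dev->max_queue_len = max_queue_len;
}

/* Return 0 on success, -1 if the table is full. */
int device_table_add(device_table_t *table, uint64_t addr)
{
	if (table->count >= MAX_DEVICES)
		return -1;
	table->device_addresses[table->count++] = addr;
	return 0;
}
```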
We will be implementing the following handlers:
- MMIO (read/write) VMEXIT
- PIO (read/write) VMEXIT
Using the skeleton¶
Debugging¶
Tasks¶
- 30p Implement a simple VMM that runs the code from guest_16_bits. We will be running the VCPU in real mode for this task.
- 20p Extend the previous implementation to run the VCPU in long mode. We will be running the guest_32_bits example.
- 30p Implement the SIMVIRTIO protocol.
- 10p Implement polling as opposed to VMEXITs. We will use the macro USE_POOLING to switch this option on and off.
- 10p Add profiling code. Measure the number of VMEXITs triggered by the VMM.
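For the profiling task, a simple approach is to count every return from KVM_RUN, broken down by exit reason. This sketch uses hypothetical names (exit_stats_t, exit_stats_record, exit_stats_print). It only sees the VMEXITs that KVM forwards to user space; exits handled entirely inside the kernel never reach the loop.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical upper bound; KVM exit reasons are small integers. */
#define MAX_EXIT_REASON 64

typedef struct exit_stats {
	uint64_t total;
	uint64_t by_reason[MAX_EXIT_REASON];
} exit_stats_t;

/* Count one return from KVM_RUN with the given exit_reason. */
void exit_stats_record(exit_stats_t *st, uint32_t reason)
{
	st->total++;
	if (reason < MAX_EXIT_REASON)
		st->by_reason[reason]++;
}

void exit_stats_print(const exit_stats_t *st, FILE *out)
{
	fprintf(out, "VMEXITs seen by the VMM: %llu\n",
		(unsigned long long)st->total);
	fprintf(out, "  KVM_EXIT_IO:   %llu\n",
		(unsigned long long)st->by_reason[KVM_EXIT_IO]);
	fprintf(out, "  KVM_EXIT_MMIO: %llu\n",
		(unsigned long long)st->by_reason[KVM_EXIT_MMIO]);
	fprintf(out, "  KVM_EXIT_HLT:  %llu\n",
		(unsigned long long)st->by_reason[KVM_EXIT_HLT]);
}
```

Call exit_stats_record(&stats, run->exit_reason) right after each successful KVM_RUN, and exit_stats_print(&stats, stderr) before the VMM exits.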
Submitting the assignment¶
The assignment archive will be submitted on Moodle, according to the rules on the rules page.
Tips¶
To increase your chances of getting the highest grade, read and follow the Linux kernel coding style described in the Coding Style document.
Also, use the following static analysis tools to verify the code:
checkpatch.pl
$ linux/scripts/checkpatch.pl --no-tree --terse -f /path/to/your/file.c
sparse
$ sudo apt-get install sparse
$ cd linux
$ make C=2 /path/to/your/file.c
cppcheck
$ sudo apt-get install cppcheck
$ cppcheck /path/to/your/file.c
Penalties¶
Information about assignment penalties can be found on the General Directions page.
In exceptional cases (the assignment passes the tests but does not comply with the requirements) and if the assignment does not pass all the tests, the grade may decrease more than mentioned above.
References¶
We recommend the following reading before starting to work on the assignment:
- [KVM host in a few lines of code](https://zserge.com/posts/kvm/)
TLDR¶
1. The VMM creates and initializes a virtual machine and a virtual CPU.
2. We set up real mode and run the simple guest code from guest_16_bits.
3. We switch to long mode and run the more complex guest from guest_32_bits.
4. We implement the SIMVIRTIO protocol. We describe how it behaves in the following steps.
5. The guest writes the ASCII code for R in the TX queue (queue 0), which results in a VMEXIT.
6. The VMM handles the VMEXIT caused by the previous write in the queue. When the VMM receives the letter R, it initiates the device reset procedure and sets the device status to DEVICE_RESET.
7. After the reset handling, the guest must set the driver status to DRIVER_ACK. After this, the guest writes the letter C to the TX queue.
8. In the VMM, we start the configuration process when the letter C is received. The VMM sets the device status to DEVICE_CONFIG and adds a new entry in the device_table.
9. After the configuration process is finished, the guest sets the driver status to DRIVER_OK.
10. Next, the VMM sets the device status to DEVICE_READY.
11. The guest writes "Ana are mere" in the TX queue and executes a halt.
12. The VMM prints the received message to STDOUT and handles the halt request.
13. Finally, the VMM verifies that the value 42 is stored both at address 0x400 and in register RAX.
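The VMM side of steps 6, 8 and 10 can be sketched as a small state machine. It assumes the following:
- simvirtio_on_tx is called for every byte popped from the TX queue.
- dev_addr is a hypothetical address identifying the device.
- Bytes are treated as message payload (printed) only once the device is DEVICE_READY.

All function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define DEVICE_RESET 0x0
#define DEVICE_CONFIG 0x2
#define DEVICE_READY 0x4
#define DRIVER_OK 0x4

typedef struct device {
	uint32_t magic;
	uint8_t device_status;
	uint8_t driver_status;
	uint8_t max_queue_len;
} device_t;

typedef struct device_table {
	uint16_t count;
	uint64_t device_addresses[10];
} device_table_t;

/* Steps 6 and 8: react to a command byte, or print payload once ready. */
void simvirtio_on_tx(device_t *dev, device_table_t *table, uint64_t dev_addr,
		     uint8_t c)
{
	if (dev->device_status == DEVICE_READY) {
		putchar(c);	/* message payload, e.g. "Ana are mere" */
		return;
	}

	switch (c) {
	case 'R':	/* reset request */
		dev->device_status = DEVICE_RESET;
		break;
	case 'C':	/* configuration request */
		dev->device_status = DEVICE_CONFIG;
		if (table->count < 10)
			table->device_addresses[table->count++] = dev_addr;
		break;
	default:
		break;	/* ignore unknown commands */
	}
}

/* Step 10: once the driver reports DRIVER_OK, mark the device ready. */
void simvirtio_on_driver_status(device_t *dev)
{
	if (dev->driver_status == DRIVER_OK)
		dev->device_status = DEVICE_READY;
}
```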