
General #

Virtualization abstracts hardware into different execution environments
Each guest is completely isolated from the host and from other guests. Good: a virus in one guest cannot easily spread. Bad: resources can’t be shared well
The Open Virtualization Format (OVF) defines a packaging format for a VM that will run on any VMM that supports the format. This is analogous to OCI images for containers
In Type 2 hypervisors, the “disk” each guest manages is really just a file on the host OS
- To copy the guest we can just copy that file
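
As a minimal sketch of the point above, cloning a guest under a Type 2 hypervisor can be as simple as a file copy. The function name `clone_guest_disk` and the file names are hypothetical, and the sketch assumes the guest is powered off so the image is not changing while it is copied; a real clone usually also needs the VM's configuration.

```python
import shutil
from pathlib import Path


def clone_guest_disk(disk_image: Path, clone_path: Path) -> Path:
    """Clone a powered-off guest by copying its disk image file."""
    if not disk_image.is_file():
        raise FileNotFoundError(disk_image)
    # copy2 copies the file contents plus metadata such as timestamps.
    shutil.copy2(disk_image, clone_path)
    return clone_path
```

For example, `clone_guest_disk(Path("guest.img"), Path("guest-clone.img"))` would produce a second image that a new guest could boot from.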

Layers #

The host is the host machine running the virtual machine
The VMM (Virtual Machine Manager) (a.k.a Hypervisor) creates and runs virtual machines by providing an interface that is identical to the host
- Each guest process is provided with a virtual copy of the host (usually each guest is an OS)
- Thus, the VMM imitates the host, so when the guests think they are talking to the host, they are actually talking to the VMM (it sits above the host hardware)

Hardware-based solutions implemented in firmware (found in mainframes) -> called type 0 hypervisors
Operating-system-like software that runs directly on the hardware (e.g. VMware ESX) -> called type 1 hypervisors
Applications that run on normal OSes but provide VMM functionality -> called type 2 hypervisors
VMMs allow snapshots to be made of the guest
Some VMMs provide a live migration feature that allows for migrating a VM from one host to another without interruption
- Good for balancing load dynamically, repairing hardware, etc.

Most cloud infrastructure is built with VMs on bare metal
- Customers can then deploy containers onto these VMs

A virtual CPU (vCPU) does not execute code; it represents the state of the CPU as the guest believes it to be (even though this CPU doesn’t actually exist)
- This is done so that when a guest is context switched onto a real CPU, the vCPU state can be loaded onto it (the vCPU plays much the same role for a guest as a process control block (PCB) does for a process)
Since guests run in physical user mode, they cannot execute privileged instructions directly. Instead, the VMM gives each guest a virtual user mode and a virtual kernel mode (both running in physical user mode). When the guest kernel attempts a privileged instruction, the CPU traps to the VMM. The VMM then performs the operation on the host on behalf of the guest and returns control to the guest -> trap and emulate
- Because of this, privileged instructions are slower (as opposed to non-privileged instructions, which run natively on hardware)
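
The trap-and-emulate loop can be sketched as a toy model. Everything here is hypothetical: the opcode names, the `VCPU` fields, and the use of a Python exception to stand in for a hardware trap. It only illustrates the control flow, not a real VMM.

```python
from dataclasses import dataclass

PRIVILEGED = {"cli", "sti"}  # hypothetical privileged opcodes


class PrivilegedTrap(Exception):
    """Stands in for the hardware trap raised in physical user mode."""


@dataclass
class VCPU:
    mode: str = "user"            # the guest's *virtual* mode
    interrupts_enabled: bool = True
    acc: int = 0                  # a single toy register
    trap_count: int = 0


def run_native(vcpu: VCPU, op: str, arg: int) -> None:
    """Non-privileged instructions run directly; privileged ones trap."""
    if op in PRIVILEGED:
        raise PrivilegedTrap(op)
    if op == "add":
        vcpu.acc += arg
    else:
        raise ValueError(f"unknown opcode: {op}")


def emulate(vcpu: VCPU, op: str) -> None:
    """The VMM performs the privileged operation on the guest's behalf."""
    vcpu.trap_count += 1
    if vcpu.mode != "kernel":
        # A real VMM would reflect a fault back into the guest instead.
        raise PermissionError(f"{op} attempted in virtual user mode")
    vcpu.interrupts_enabled = (op == "sti")


def run_guest(vcpu: VCPU, program) -> None:
    for op, arg in program:
        try:
            run_native(vcpu, op, arg)
        except PrivilegedTrap:
            emulate(vcpu, op)     # trap to the VMM, then resume the guest
```

Running `[("add", 2), ("cli", 0), ("add", 3)]` on a vCPU in virtual kernel mode takes the slow path once (for `cli`), which is why privileged instructions cost more than the two `add` instructions.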
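Binary translation - some CPUs, including x86, have no clean separation between privileged and unprivileged instructions: certain “special” instructions behave differently in kernel mode but do not trap in user mode, so x86 was considered impossible to virtualize with trap and emulate until 1998. With binary translation, while the vCPU is in kernel mode the VMM examines the instructions the guest is about to execute; ordinary instructions run natively, and special instructions are translated into an equivalent sequence of instructions that performs the same task safely
- VMware sped this up greatly by caching translations
- VT-x instructions (virtualization support) were added to x86 hardware in 2005 -> all major CPUs now provide virtualization support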
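
A rough sketch of binary translation with a translation cache follows. The `SPECIAL` table and the `vmm_emulate_*` names are made up for illustration, and instructions are modeled as strings rather than machine code; the point is only that a block is translated once and reused afterwards.

```python
from typing import Dict, Tuple

# Hypothetical mapping from "special" instructions to safe replacements.
SPECIAL: Dict[str, Tuple[str, ...]] = {
    "popf": ("call vmm_emulate_popf",),
    "sgdt": ("call vmm_emulate_sgdt",),
}


class BinaryTranslator:
    def __init__(self) -> None:
        self.cache: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
        self.translations = 0  # how many blocks were actually translated

    def translate(self, block: Tuple[str, ...]) -> Tuple[str, ...]:
        """Return a safe version of a guest kernel-mode code block."""
        cached = self.cache.get(block)
        if cached is not None:
            return cached  # cache hit: no translation work needed
        self.translations += 1
        out = []
        for instr in block:
            # Ordinary instructions pass through unchanged.
            out.extend(SPECIAL.get(instr, (instr,)))
        result = tuple(out)
        self.cache[block] = result
        return result
```

Translating the same block twice does the work only once, which is the effect the caching optimization relies on for hot kernel code paths.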
With hardware support, it is easy to build OS frameworks for thin hypervisors (e.g. the macOS Hypervisor.framework)

At the time of VM creation, the hypervisor assigns the VM parameters (number of CPUs, amount of memory, networking details, storage details, etc.)
When a VM is deleted any disk space is freed up and the VM configuration is removed
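
The create/delete lifecycle can be pictured with a small bookkeeping sketch. `VMConfig`, `Hypervisor`, and their fields are hypothetical names chosen for this example, and only disk space is tracked to keep it short.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VMConfig:
    name: str
    vcpus: int
    memory_mb: int
    disk_gb: int
    network: str


class Hypervisor:
    def __init__(self, disk_capacity_gb: int) -> None:
        self.free_disk_gb = disk_capacity_gb
        self.vms: Dict[str, VMConfig] = {}

    def create_vm(self, cfg: VMConfig) -> None:
        """Record the VM's parameters and reserve its disk space."""
        if cfg.name in self.vms:
            raise ValueError(f"VM {cfg.name!r} already exists")
        if cfg.disk_gb > self.free_disk_gb:
            raise RuntimeError("not enough disk space")
        self.free_disk_gb -= cfg.disk_gb
        self.vms[cfg.name] = cfg

    def delete_vm(self, name: str) -> None:
        """Remove the configuration and free the disk space."""
        cfg = self.vms.pop(name)
        self.free_disk_gb += cfg.disk_gb
```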
To share things like I/O devices between VMs, there is a control partition to which the VMM routes requests.

Type 1 hypervisors are the standard type of hypervisor in datacenters - special-purpose OSes that run natively on hardware and manage the creation of other VMs (e.g. VMware ESX)
- Guests don’t know they are running on anything but native hardware
Can pack more OS’es on one machine and get better utilization
Type 2 Hypervisors run on standard operating systems (i.e. run on Mac, Linux, etc.) and thus provide fewer virtualization features but are easier to use, test out, and get started with

Virtualization can also be done at the programming-environment level, e.g. the JVM
- Programs written in the language have to run in this virtual environment
- The JVM provides APIs that interact with the hardware, and Java code interacts with the JVM-provided APIs. The JVM is compiled for the target hardware, and Java programs run inside of it

The VMM can provide multiple vCPUs to a guest, and then schedules those vCPUs onto actual CPUs
- Guests receive only a portion of the CPU cycles, even though they believe they are receiving all of it
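
To make the "portion of the CPU cycles" point concrete, here is a simple round-robin sketch. Round-robin is an assumption made for illustration (real VMM schedulers are more sophisticated), and `schedule_vcpus` is a hypothetical helper.

```python
from collections import deque
from typing import List


def schedule_vcpus(vcpus: List[str], physical_cpus: int,
                   time_slices: int) -> List[List[str]]:
    """Return which vCPUs run on the physical CPUs in each time slice."""
    queue = deque(vcpus)
    timeline = []
    for _ in range(time_slices):
        # Take as many vCPUs as there are physical CPUs to run them on.
        running = [queue.popleft()
                   for _ in range(min(physical_cpus, len(queue)))]
        timeline.append(running)
        queue.extend(running)  # back of the line for the next turn
    return timeline
```

With four vCPUs (two guests with two vCPUs each) on two physical CPUs, each vCPU runs in only half of the time slices, even though each guest behaves as if it owned its CPUs full time.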

Guests are often configured with more memory in total than the system has. The VMM must present a fixed size of memory to each guest (as those OSes expect fixed memory), then figure out how much real memory to allocate to each guest
- Each guest believes it is maintaining its own page table, but really the VMM maintains a nested page table that maps the guest’s “physical” memory to the host’s physical memory
If guests have identical pages (e.g. two guests running the same OS), the VMM can keep a single copy and point both guests’ mappings at it
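
The two ideas above, nested translation and sharing of identical pages, can be sketched together. The class and method names are hypothetical, pages are modeled as `bytes`, and the sketch omits what a real VMM needs for correctness: a byte-for-byte comparison after the hash match, and copy-on-write when a guest modifies a shared page.

```python
import hashlib
from typing import Dict, Tuple


class MemoryVMM:
    def __init__(self) -> None:
        self.host_frames: Dict[int, bytes] = {}      # host frame -> content
        self.by_hash: Dict[str, int] = {}            # content hash -> frame
        # Nested page table: (guest, guest-physical page) -> host frame.
        self.nested: Dict[Tuple[str, int], int] = {}
        self.next_frame = 0

    def map_page(self, guest: str, guest_page: int, content: bytes) -> int:
        """Back a guest-physical page, reusing a frame if content matches."""
        digest = hashlib.sha256(content).hexdigest()
        frame = self.by_hash.get(digest)
        if frame is None:
            frame = self.next_frame
            self.next_frame += 1
            self.host_frames[frame] = content
            self.by_hash[digest] = frame
        self.nested[(guest, guest_page)] = frame
        return frame

    def translate(self, guest: str, guest_page_table: Dict[int, int],
                  virtual_page: int) -> int:
        """Guest virtual -> guest physical -> host physical."""
        guest_page = guest_page_table[virtual_page]  # the guest's own table
        return self.nested[(guest, guest_page)]      # the VMM's nested table
```

Two guests that map a page with identical contents end up pointing at the same host frame, so only one copy is stored.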

The VMM provides a device driver to each guest, which then maps requests to host I/O, etc.
- Sometimes a guest will bypass the driver and be given exclusive access to an IO device

Live migration is a feature that VMMs can implement but normal OSes cannot
A running guest on one system is copied/moved to another machine without interrupting service (network connections continue, etc.)

Process:

1. Source VMM contacts target VMM and establishes that it is allowed to migrate the guest
2. Target VMM creates a new guest with vCPUs, a nested page table, etc.
3. Source sends all read-only memory pages to target
4. Source sends all read-write pages to target, marking them as clean
5. Pages modified (dirtied) by the guest during step 4 are re-sent
6. When the cycle of steps 4-5 becomes short, the source VMM freezes the guest, sends the final vCPU state and final dirty pages, and tells the target to start running the guest
7. Once the target acknowledges that the guest is running, the source shuts down its copy of the guest
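
The copy/re-send cycle in the process above can be simulated in a few lines. This is a toy model under stated assumptions: memory is a dict of page id to bytes, `writes_per_round` is a hypothetical iterator describing which pages the guest dirties while each round is in flight, and `threshold` and `max_rounds` are illustrative knobs rather than parameters of any real VMM.

```python
from typing import Dict, Iterator, Tuple


def live_migrate(source_pages: Dict[int, bytes],
                 writes_per_round: Iterator[Dict[int, bytes]],
                 threshold: int = 2,
                 max_rounds: int = 20) -> Tuple[Dict[int, bytes], int]:
    """Pre-copy migration: returns (target memory, copy rounds used)."""
    target: Dict[int, bytes] = {}
    to_send = set(source_pages)          # first round: send every page
    rounds = 0
    while len(to_send) > threshold and rounds < max_rounds:
        for page in to_send:
            target[page] = source_pages[page]
        # The guest keeps running during the copy and dirties some pages.
        writes = next(writes_per_round, {})
        source_pages.update(writes)
        to_send = set(writes)            # only dirty pages are re-sent
        rounds += 1
    # The dirty set is small: freeze the guest and send what remains.
    for page in to_send:
        target[page] = source_pages[page]
    return target, rounds
```

The `max_rounds` cap reflects a practical concern: a guest that dirties memory faster than it can be sent would otherwise never reach the freeze step.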

This requires network infrastructure that understands a MAC address can move between hosts without breaking existing connections
Live migration can be used to automatically balance load between VMMs, etc.