COMP3370/COMP5810 Lecture 4: Virtual Machines & Containers

Module Map

The module covers practical aspects, foundational algorithms, technology overview, cloud computing and software development, computing and society, and building a cloud application.

It builds on network technology, network protocols, operating systems, virtualization, Docker, distributed systems, challenges of distribution, distributed algorithms, distributed data structures, distributed file systems, parallelization, and MapReduce.

Learning Objectives
  • Understand how containers and virtual machines ease software deployment.

  • Describe differences between containers and virtual machines.

  • Given a deployment scenario, choose between a container and a virtual machine, weighing factors like resource usage, isolation needs, and deployment speed.

  • Understand the difference between a container engine, a container image, a container orchestrator, and a container, and the role each plays in the containerization ecosystem.

  • Be able to create and run a Docker container from the command line, using basic Docker commands and Dockerfile syntax.

Why Virtual Machines?

The common problem: "works on my machine". Software that runs in the developer's environment often fails to run the same way elsewhere, which is why we need solutions that encapsulate a program together with its dependencies.

Software Deployment Pains

A program's functionality depends on its environment, which includes:

  • A specific operating system (e.g., Windows binaries cannot be executed on Linux), because programs rely on OS-specific APIs and system calls.

  • Environment variables, executable programs, software libraries, and data files, all of which can drift apart between environments.

It's easy to forget the environmental components a program depends on. For example, determining required software libraries involves trial and error on a fresh Ubuntu installation. This process is time-consuming, and different Ubuntu versions may have varying pre-installed libraries. Tools like ldd can help identify shared library dependencies.
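As a small illustration of the ldd remark above, the following sketch extracts the library names from ldd-style output. The function name lib_names and the path /bin/ls are assumptions made for this example, and the exact output format of ldd can differ between systems, so treat the parsing as approximate.

```shell
# Print the names of the shared libraries listed in ldd-style output.
# Reads from stdin, so the parsing step can be tried on saved output too.
lib_names() {
    # Typical line: "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)"
    # Only lines of the form "NAME => PATH" are reported.
    awk '$2 == "=>" { print $1 }'
}

# Usage on a Linux machine (/bin/ls is only an example binary):
#   ldd /bin/ls | lib_names
```

Running this on a fresh installation and on a development machine, then comparing the two lists, is one way to spot libraries that are missing from the fresh system.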

Idea: Instead of using another laptop, work on a simulated computer and then ship that computer bundled with a simulator. This approach ensures consistent execution regardless of the host environment.

Security Concerns

Operating systems are not inherently designed to isolate resources for multiple users. Resource isolation has been a longstanding challenge in OS design.

VMs and containers offer isolation in:

  • Computing resources (e.g., processor, storage): quotas and limits prevent one user from exhausting them.

  • Network (isolating the traffic of different customers), for example with network namespaces and firewalls.

Difference Between Containers & VMs

Containers virtualize the operating system, while VMs virtualize hardware. This fundamental difference impacts performance, resource usage, and isolation levels.

How Containers Work

A container engine runs as a normal application on top of an operating system. Docker, containerd, and CRI-O are popular container engines.

Each container is a normal operating system process and therefore enjoys memory isolation. This process can spawn children, so a container may hold a whole process hierarchy.

Further isolation is obtained by using:

  • Namespaces (pid, net, mnt), which give each container its own view of process IDs, network interfaces, and mount points.

  • Control groups, which limit a container's resource usage (CPU, memory, I/O).

  • Union file systems: a bit like version control for file systems. Layering makes image creation and sharing efficient.
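Two of the mechanisms above can be poked at from a shell. The sketch below is only illustrative: show_namespaces and lookup are hypothetical helper names, the first assumes a Linux /proc file system, and the second is a toy model of how a layered file system resolves a read, not how any real union file system is implemented.

```shell
# Print the pid, net, and mnt namespace identifiers of a process.
# Defaults to the current shell. On Linux, each entry under /proc/<pid>/ns
# is a symlink whose target looks like "pid:[4026531836]".
show_namespaces() {
    pid="${1:-$$}"
    for ns in pid net mnt; do
        printf '%s -> %s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns" 2>/dev/null)"
    done
}

# Toy model of a union file system read: search the layers top-most first
# and report the first layer that contains the file.
# Usage: lookup FILE LAYER_DIR...
lookup() {
    file="$1"
    shift
    for layer in "$@"; do
        if [ -e "$layer/$file" ]; then
            printf '%s\n' "$layer/$file"
            return 0
        fi
    done
    return 1
}
```

Two processes in the same namespace report the same identifier, while a process inside a container typically reports different ones. In the toy lookup, a file in an upper layer hides the file of the same name in a lower layer, which is the sense in which layers resemble successive commits.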

How VMs Work

A virtual machine manager (VMM), aka hypervisor, runs on top of the underlying system (OS and/or hardware). The virtual machines running on top each think they run on their own hardware.

Thus, each virtual machine needs its own operating system, and different VMs may run different operating systems. Running several OS instances side by side adds overhead.

Simulating hardware in software is slow. The underlying OS and hardware may offer virtualization support, in which case the fake hardware is the real hardware plus some protection and exception handlers. Intel VT-x and AMD-V are examples of hardware virtualization extensions.
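On Linux, the CPU flags listed in /proc/cpuinfo indicate whether these extensions are present: vmx corresponds to Intel VT-x and svm to AMD-V. The sketch below, with the hypothetical function name virt_extension, reads cpuinfo-style text from stdin and reports which one it finds. It only inspects the flags and does not tell you whether virtualization is enabled in the firmware.

```shell
# Report the hardware virtualization extension advertised in
# /proc/cpuinfo-style input: prints "VT-x", "AMD-V", or "none".
virt_extension() {
    awk '/^flags/ {
             # Compare whole fields so that e.g. "vme" does not match "vmx".
             for (i = 1; i <= NF; i++) {
                 if ($i == "vmx") { print "VT-x";  found = 1; exit }
                 if ($i == "svm") { print "AMD-V"; found = 1; exit }
             }
         }
         END { if (!found) print "none" }'
}

# Usage on a Linux machine:
#   virt_extension < /proc/cpuinfo
```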

Three Characteristics of VMMs
  • Fidelity: Software on the VMM executes identically to its execution on hardware, barring timing effects. Minor variations may occur due to virtualization overhead.

  • Performance: An overwhelming majority of guest instructions are executed by the hardware without the intervention of the VMM.

  • Safety: The VMM manages all hardware resources, which lets it enforce resource allocation and isolation.

Two Types of VMMs
  • Native (bare metal): favors performance. Examples: Xen, KVM, VMware ESX, MS Hyper-V. These hypervisors run directly on the hardware.

  • Hosted: more practical and less invasive. Examples: VMware Workstation, VirtualBox. These hypervisors run on top of an existing OS.

Comparison: Containers vs. VMs

Containers

  • A container image takes tens of MBs. Smaller images are faster to distribute and cheaper to store.

  • Fast. Containers start and stop quickly because no separate OS has to boot.

  • The underlying OS can be anything, although containers share the host kernel, so running Linux containers on a non-Linux host typically involves a lightweight Linux VM behind the scenes.

  • The simulated OS is usually (some variant of) Linux, although this is not a theoretical limitation. Windows containers are also available.

  • Typical example: Docker

VMs

  • A VM image takes tens of GBs. Larger images consume more storage and take longer to distribute.

  • Slow (especially to boot). VMs require booting a full OS, leading to slower startup times.

  • The underlying OS can be anything. Hypervisors support various guest OSes.

  • Since the VM simulates hardware, one can install any actual OS on top of it, and each VM can run a different one.

  • Typical example: VirtualBox

Docker Concepts
  • Container: a running group of processes, isolated from other processes on the computer; in particular, they have their own file system. Containers provide process and filesystem isolation.

  • Image: a filesystem from which a container can be started. Images are read-only templates used to create containers.

  • Dockerfile: a list of instructions on how to create an image, by applying changes to an existing image. Dockerfiles define the steps to assemble an image.

  • Engine: a service for managing images and containers (e.g., kill a running container, create a new image, etc.). Docker Engine is the core component for container management.

  • Orchestrator (e.g., Kubernetes, Docker Swarm): a tool for deploying containers. Orchestrators automate container deployment, scaling, and management.
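To make the Dockerfile concept concrete, the sketch below writes a minimal build directory containing a Dockerfile and the one file it copies. Everything named here is an assumption chosen for illustration: the helper write_build_context, the script hello.sh, and the base image ubuntu:22.04. It is not taken from the lecture's own material.

```shell
# Create a directory holding a tiny script and a Dockerfile that
# packages it. Usage: write_build_context DIR
write_build_context() {
    dir="$1"
    mkdir -p "$dir"

    # Hypothetical application: a script that prints a greeting.
    printf '#!/bin/sh\necho "hello from a container"\n' > "$dir/hello.sh"

    cat > "$dir/Dockerfile" <<'EOF'
# Start from an existing image (the choice of base image is an assumption).
FROM ubuntu:22.04
# Apply a change on top of it: copy the script into the image.
COPY hello.sh /usr/local/bin/hello.sh
# Default command for containers started from this image.
CMD ["/bin/sh", "/usr/local/bin/hello.sh"]
EOF
}

# Usage (the build step needs Docker installed):
#   write_build_context ./someapp
#   docker image build -t someapp:1.0 ./someapp
```

Each instruction in the Dockerfile adds a layer on top of the base image, which matches the description of a Dockerfile as a list of changes applied to an existing image.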

Docker Command Line Utilities
  • Run a container:

    • Foreground: docker container run -ti someapp:1.0

    • Background: docker container run --detach --name X someapp:1.0

  • Stop a container: docker container rm --force X (this kills the running container and also removes it; docker container stop X stops it without removing it)

  • Containerize an application: docker image build -t someapp:1.0 .
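The commands above can be strung together into one build, run, and clean-up sequence. The sketch below is a hedged example: the helpers run and lifecycle and the DRY_RUN variable are inventions for this illustration, while someapp:1.0 and X are the placeholders already used above. By default it only prints the commands, so it can be read and tried without Docker installed.

```shell
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build an image, start a container in the background, look at its
# output, then stop and remove it.
# Usage: lifecycle [IMAGE] [NAME]
lifecycle() {
    image="${1:-someapp:1.0}"
    name="${2:-X}"
    run docker image build -t "$image" .
    run docker container run --detach --name "$name" "$image"
    run docker container logs "$name"
    run docker container stop "$name"
    run docker container rm "$name"
}

# Usage:
#   lifecycle                  # print the five commands
#   DRY_RUN=0 lifecycle        # execute them (needs Docker and a Dockerfile in .)
```

Stopping and removing are shown as two separate steps here. The single command docker container rm --force X combines them by killing the running container and removing it in one go.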