
Mount Namespaces

When a container process is started, it is attached to its own mount namespace. The directories on disk are then mounted into this new mount namespace.
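To check which mount namespace a process belongs to, Linux exposes a symbolic link at /proc/<pid>/ns/mnt. The sketch below reads that link; the helper name `mount_namespace_id` is illustrative, and the function returns None on systems where the link is not available.

```python
import os
from typing import Optional


def mount_namespace_id(pid: str = "self") -> Optional[str]:
    """Return the mount namespace identifier of a process, e.g. 'mnt:[...]'.

    Returns None when /proc/<pid>/ns/mnt cannot be read (non-Linux system,
    missing /proc, or insufficient permissions).
    """
    try:
        return os.readlink(f"/proc/{pid}/ns/mnt")
    except OSError:
        return None


if __name__ == "__main__":
    print("mount namespace of this process:", mount_namespace_id())
```

Running this on the host and again inside a container should print two different identifiers, since each side lives in its own mount namespace.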

rootfs

Every image has its rootfs packaged inside it. This gives the first level of isolation from the host system and also controls the size of the image. If the application doesn't need many operating system libraries, the rootfs can be kept as small as possible.

In other words, a process in this namespace sees its own /lib, /bin, /sbin, /usr, and /home directories.

only own rootfs

Containers have only their own rootfs; the kernel is shared with the host. Any system call therefore goes through the host kernel.
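One way to observe this split is to compare the userland description, which comes from the rootfs, with the kernel release, which comes from the running kernel. This is a minimal sketch: the helper names `parse_os_release` and `describe_environment` are illustrative, and it assumes the rootfs ships an /etc/os-release file, which very small images may not.

```python
import platform


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the style of /etc/os-release."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def describe_environment(os_release_path: str = "/etc/os-release") -> dict:
    """Report the userland (from the rootfs) and the kernel (from the host)."""
    try:
        with open(os_release_path, encoding="utf-8") as fh:
            userland = parse_os_release(fh.read()).get("PRETTY_NAME", "unknown")
    except OSError:
        userland = "unknown"
    # platform.release() asks the running kernel, not any file in the rootfs.
    return {"userland": userland, "kernel": platform.release()}


if __name__ == "__main__":
    print(describe_environment())
```

Inside a container built from a different distribution than the host, the userland value would be expected to change while the kernel value stays the same as on the host.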

Isolation

  1. The Docker binary mounts all required virtual filesystems, such as /dev, /proc, and /sys, into the container. These mostly expose read-only, dynamically generated kernel information. When a process reads them, the kernel returns whatever information that process's namespaces allow it to see.
  2. The rootfs is part of the image.
  3. The host's old root filesystem is unmounted from the container's mount namespace.
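The set of mounts a process can see is listed in /proc/self/mountinfo. The sketch below parses that format into a mapping of mount point to filesystem type, which makes it easy to confirm that /proc and /sys are virtual filesystems. The function name `parse_mountinfo` and the sample lines are illustrative, not output captured from a real container.

```python
def parse_mountinfo(text: str) -> dict:
    """Map each mount point to its filesystem type.

    Each mountinfo line has the mount point as the fifth field, then a
    variable number of optional fields, a lone "-" separator, and finally
    the filesystem type.
    """
    mounts = {}
    for line in text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # not a well-formed mountinfo line
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 6 or not right_fields:
            continue
        mounts[left_fields[4]] = right_fields[0]
    return mounts


# Illustrative sample in mountinfo format (hypothetical IDs and options).
SAMPLE = (
    "22 27 0:21 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw\n"
    "23 27 0:22 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro\n"
)

if __name__ == "__main__":
    for mount_point, fs_type in parse_mountinfo(SAMPLE).items():
        print(f"{mount_point}: {fs_type}")
```

On a Linux system, passing the contents of /proc/self/mountinfo instead of the sample shows the mounts visible in the current mount namespace.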

Mount Namespace Setup

This is the most important and interesting one. Understanding this makes it easy to understand how Kubernetes pods work.

The OverlayFS mount that Docker uses is prepared on the host system. The Docker binary sets it up when we execute the docker run command.
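As a rough illustration of what preparing an OverlayFS mount involves, the sketch below builds the argument list for a `mount -t overlay` command from a set of layer directories. The function name and all paths are hypothetical; Docker performs the equivalent mount internally rather than through this command, and actually running it requires root.

```python
from typing import List


def overlay_mount_command(
    lower_dirs: List[str], upper_dir: str, work_dir: str, merged_dir: str
) -> List[str]:
    """Build a `mount -t overlay` command line.

    lower_dirs are the read-only image layers (the leftmost entry is the
    topmost layer), upper_dir is the writable container layer, work_dir is
    scratch space required by OverlayFS, and merged_dir is the mount point
    that becomes the container's rootfs.
    """
    if not lower_dirs:
        raise ValueError("at least one lower directory is required")
    options = "lowerdir={},upperdir={},workdir={}".format(
        ":".join(lower_dirs), upper_dir, work_dir
    )
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged_dir]


if __name__ == "__main__":
    # Hypothetical layer paths, for illustration only.
    cmd = overlay_mount_command(
        ["/layers/app", "/layers/base"],
        "/containers/demo/upper",
        "/containers/demo/work",
        "/containers/demo/merged",
    )
    print(" ".join(cmd))
```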

interesting hack
  1. When Docker starts the runc process, it first creates a new mount namespace as a copy of the host's mount namespace.
  2. This means the entire host filesystem is initially visible inside the new mount namespace.
  3. It then changes the root filesystem to point to the OverlayFS mount point.
  4. This step moves the old root mount point to another location.
  5. This backup location is used to bind mount the volumes requested via the -v option.
  6. Finally, it unmounts the backup location to hide the host filesystem.
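The steps above can be sketched as an ordered plan of Linux system calls. This is a dry-run illustration that follows the order described in the list, not runc's actual implementation: it only returns tuples and never touches the system. The function name `plan_rootfs_switch`, the `.old_root` directory name, and the example paths are assumptions made for the sketch; `unshare`, `mount`, `pivot_root`, `chdir`, and `umount2` are the real system calls typically involved.

```python
from typing import List, Tuple

Step = Tuple[str, ...]


def plan_rootfs_switch(
    new_root: str,
    volumes: List[Tuple[str, str]],
    old_root_name: str = ".old_root",
) -> List[Step]:
    """Return the ordered system calls for switching to a container rootfs.

    new_root is the OverlayFS mount point; volumes is a list of
    (absolute host path, container path) pairs, as given with -v.
    """
    steps: List[Step] = [
        # Steps 1-2: new mount namespace, initially a copy of the host's mounts.
        ("unshare", "CLONE_NEWNS"),
        # Keep later mount changes from propagating back to the host.
        ("mount", "none", "/", "MS_REC|MS_PRIVATE"),
        # Steps 3-4: new_root becomes "/", the old root moves under it.
        ("pivot_root", new_root, f"{new_root}/{old_root_name}"),
        ("chdir", "/"),
    ]
    # Step 5: after the pivot, host paths are reachable under /<old_root_name>.
    for host_path, container_path in volumes:
        source = f"/{old_root_name}{host_path}"
        steps.append(("mount", source, container_path, "MS_BIND"))
    # Step 6: lazily detach the old root so the host filesystem disappears.
    steps.append(("umount2", f"/{old_root_name}", "MNT_DETACH"))
    return steps


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for step in plan_rootfs_switch("/containers/demo/merged", [("/data", "/mnt/data")]):
        print(step)
```

Returning a plan instead of executing the calls keeps the sketch safe to run without root while still showing why the order matters: the volumes are bound while the old root is still reachable, and only then is it detached.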

NOTE - The bind-mounted volume remains visible even though it was mounted from the backup location. This is because the kernel still holds references to the actual inode and dentry of the underlying file or directory.
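A loose analogy, runnable without root, is an open file whose name has been removed: the path is gone, but the kernel keeps the inode alive for as long as something still refers to it. The sketch below is not a bind mount; it only demonstrates the same principle that kernel objects outlive the path used to reach them. The function name is illustrative, and the behavior assumes a POSIX system.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, remove its name, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the directory entry (name) is removed...
        if os.path.exists(path):
            raise RuntimeError("path should no longer exist")
        os.lseek(fd, 0, os.SEEK_SET)
        # ...but the inode is still reachable through the open descriptor.
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(read_after_unlink(b"still readable"))
```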