Poor man's Linux containers

Submitted by olaf on 2015-10-29

Lately I was looking into LXC and Docker and how containers work in general. I didn’t like the idea of installing yet another package worth several megabytes and learning how that particular container technology works. So I thought I could just as well learn the basic principles that containers are built on.

The two technologies involved are control groups and namespaces. Control groups limit the resources a process may consume, such as CPU or memory. Namespaces, on the other hand, isolate processes from each other, e.g. by giving each process (or set of processes) its own mount hierarchy, PID numbering or network stack. The two can be used independently of each other or together. Here, I am looking into namespaces first, because right now I am interested in that part only.
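To see which namespaces a process currently belongs to, you can read the links under /proc/PID/ns. The following is only a small sketch; the function name show_namespaces is made up for this example, and it assumes a kernel that exposes these links.

```shell
# Print the mount, PID and network namespace identifiers of a process.
# Defaults to the current shell when no PID is given.
show_namespaces() {
    pid="${1:-$$}"
    for ns in mnt pid net; do
        # Each link reads like "mnt:[<inode number>]"; two processes in the
        # same namespace show the same inode number.
        printf '%s -> %s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns")"
    done
}
```

Calling show_namespaces before and after entering a container should show different identifiers for the namespaces that were unshared.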

Starting

To create separate namespaces, you can use either clone(2) for new processes or unshare(2) for existing ones. Fortunately, you don’t need to write your own code; you can use unshare(1) from util-linux to experiment and see how things work. Util-linux also has nsenter(1), which lets you join existing namespaces.

To create a new environment with its own mount hierarchy (--mount) and its own list of processes (--pid), run

unshare --mount --pid --fork

as root.

In a PID namespace, the first process created becomes the init process (PID 1). By default, unshare simply execs the shell or command it was given, and since unshare(2) does not move the calling process itself into the new PID namespace, only the first child of that command would end up as init. Option --fork tells unshare to do a fork(2) before the exec, so the given command itself becomes the init process.
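A quick way to check the effect of --fork is to let the command print its own process ID. This is just a sketch; pid_inside is a hypothetical helper name, and like the unshare call above it needs root.

```shell
# Print the PID a shell sees for itself when started through unshare.
# Extra arguments are passed on to unshare as options, e.g. --fork.
pid_inside() {
    # Single quotes: $$ is expanded by the inner shell, not by the caller.
    unshare --mount --pid "$@" sh -c 'echo $$'
}
```

With pid_inside --fork the inner shell should report 1, because it is the first process in the new PID namespace. Without --fork it still reports a PID from the original namespace.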

Preparing

You must first prepare everything and only then start whatever program or daemon(s) you want inside the container. These are the preparation steps:

  • switch to the new root directory
  • mount a minimal set of filesystems, e.g. /proc, /dev, /dev/pts
  • clean up the leftover mountpoints

The first step is to make sure that the new root is a mount point, which pivot_root requires. If it is a plain directory, you can use a bind mount, e.g.

mount --bind /path/to/image/root /mnt/container0

Then create a directory where the “old” (currently still active) root will end up

mkdir -p /mnt/container0/mnt/old-root

and switch over to the container

cd /mnt/container0
pivot_root . ./mnt/old-root
cd /

Now set up /proc and the other filesystems

mount -t proc none /proc
mount -t devtmpfs -o size=50k,nr_inodes=2k none /dev
mount -t devpts -o newinstance,gid=5,mode=620,ptmxmode=666 none /dev/pts

Clean up the old root

awk '$2 ~ /old-root/ {print $2;}' /proc/mounts | sort -r | xargs -n 1 umount -l
rmdir /mnt/old-root

This unmounts the old root tree recursively, deepest mount points first. If you have a newer umount(8), umount -l -R /mnt/old-root is also an option.

If you get mount errors during the setup and /etc/mtab is a regular file rather than a link to /proc/mounts, add option -n to the relevant mount or umount commands.
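The preparation steps from this section can be collected into one function, to be called from the shell started by unshare. Treat this as a sketch under a few assumptions: the names run, prepare_container and DRY_RUN are made up for this example, and the cleanup uses the umount -l -R variant, so it needs a umount that supports -R.

```shell
# Run a command, or only print it when DRY_RUN=1.
# run is a function, so "run cd ..." changes the directory of this shell.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# prepare_container IMAGE_ROOT NEW_ROOT
# Stops at the first step that fails.
prepare_container() {
    image_root="$1"
    new_root="$2"
    # Make the new root a mount point and switch over to it.
    run mount --bind "$image_root" "$new_root" &&
    run mkdir -p "$new_root/mnt/old-root" &&
    run cd "$new_root" &&
    run pivot_root . ./mnt/old-root &&
    run cd / &&
    # Mount the minimal set of filesystems.
    run mount -t proc none /proc &&
    run mount -t devtmpfs -o size=50k,nr_inodes=2k none /dev &&
    run mount -t devpts -o newinstance,gid=5,mode=620,ptmxmode=666 none /dev/pts &&
    # Drop the old root tree.
    run umount -l -R /mnt/old-root &&
    run rmdir /mnt/old-root
}
```

Setting DRY_RUN=1 before calling prepare_container /path/to/image/root /mnt/container0 only prints the commands, which is a cheap way to review them before running anything as root.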

Running

Finally, you can start your daemon, like Apache or MySQL, or your init process, like runit, s6 or even systemd. You can also just run a shell to explore this further, e.g.

exec bash -i

Joining

When everything in your brand new container is running in the background and you want to make sure everything is okay, or you’re just curious, use nsenter to join the container. First find the process ID of some process in your container, e.g. Apache, MySQL or your init script, and then run

nsenter --target 1234 --mount --pid
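Before joining, it can be worth checking that the chosen process really lives in a different PID namespace than your current shell. The sketch below compares the namespace links under /proc; in_other_pid_ns and join_container are hypothetical helper names, and the nsenter call is the same one as above.

```shell
# Succeeds if the given PID runs in a different PID namespace than this shell.
# Returns 2 if one of the namespace links cannot be read.
in_other_pid_ns() {
    theirs=$(readlink "/proc/$1/ns/pid") || return 2
    mine=$(readlink "/proc/$$/ns/pid") || return 2
    [ "$theirs" != "$mine" ]
}

# Join the mount and PID namespaces of the given PID, but only if it is
# actually in a separate PID namespace.
join_container() {
    if in_other_pid_ns "$1"; then
        nsenter --target "$1" --mount --pid
    else
        echo "PID $1 is not in a separate PID namespace" >&2
        return 1
    fi
}
```

Reading the namespace links of another user's process generally requires root, which you need for nsenter anyway.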

Now look around, tweak it and play with it.

There’s always more, of course. You can also create a network namespace, for instance if you’re running a network daemon, want to restrict internet access for some dangerous software, or just want to build a network of containers. Control groups are yet another topic to explore; combined with namespaces, they ensure a container can’t take down the whole machine.
