A Case for Docker-in-Docker on Kubernetes (Part I)

If you are reading this blog, you are at least curious about running Docker commands from within a Docker container.

Jérôme Petazzoni’s excellent blog article describes the pitfalls of using Docker-in-Docker (DinD) for Continuous Integration (CI) and testing. He advocates bind mounting the host Docker daemon socket inside the Docker container that needs to issue Docker commands. This approach has been called Docker-outside-of-Docker (DooD).

I am going to make a case for why using DinD is a better alternative to DooD when running Docker from a Kubernetes Pod and describe how we do it at Applatix.

But first, why would you want to run a Docker container from within a Docker container? One reason is that your CI app (e.g. Jenkins) may be containerized and you want to provide a build/test container for each CI job you want Jenkins to run.

Another reason is that you may want to build a Docker container image from inside your containerized CI job. Docker is a very useful tool for running other tools, so it is quite natural to invoke Docker as just another tool to get things done.

Let’s see what happens when you bind mount the host Docker daemon socket into a container and use it from inside. Note that all Docker commands issued from the container are then simply passed to the host Docker daemon.

We first create a container named “run_docker_cmds” that has the host Docker socket mounted in the container.

$> docker run -it --name run_docker_cmds -v /var/run/docker.sock:/var/run/docker.sock docker:latest sh
$> docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS         PORTS        NAMES
001ec7961f3f   docker:latest  "docker-entrypoint.sh"   12 hours ago    Up 3 seconds                run_docker_cmds

Next, we run a Docker command inside this container to start a container named “sleep_60”. Running “docker ps” on the host shows the new container as just another container on the host.

# from inside run_docker_cmds container:
# docker run --rm --name sleep_60 ubuntu:latest sleep 60
$> docker ps
CONTAINER ID    IMAGE          COMMAND                CREATED        STATUS          PORTS      NAMES
e8348bbb0914    ubuntu:latest  "sleep 60"             12 hours ago   Up 2 seconds               sleep_60
001ec7961f3f    docker:latest  "docker-entrypoint.sh" 12 hours ago   Up 48 seconds              run_docker_cmds

That was easy! Note, however, that the new container is a “sibling” of the container from which we started it, not a child. If you try to give a sibling container the same name as the container that launched it, creation fails with a name conflict.

This highlights a general problem of using the host’s Docker daemon from inside a container: poor isolation, resulting in containers interfering with each other.

$> docker run --rm --name run_docker_cmds ubuntu:latest sleep 60
docker: Error response from daemon: Conflict. The name "/run_docker_cmds" is already in use by container
001ec7961f3fd25f6c3e4ad23674993d78ff8d085c94d83b7daf634b17ec0216. You have to remove (or rename) that container
to be able to reuse that name.

Is this really a problem in practice? Can’t we just pick names that don’t conflict? It turns out that on a general platform for running jobs, it takes a lot of work to get the authors of those jobs to coordinate and avoid name collisions.

At Applatix, we use Kubernetes as our platform for orchestrating containers in the public cloud. Our users specify jobs as workflows, where each step of the workflow is a Docker container (see Figure 1 below). To make it easy for our users, we allow them to use Docker commands from their containers.

Figure 1: A workflow running on Applatix

This allows users to pull, build, or run Docker images and to reuse their existing container-based scripts as steps in the workflow. Each step in the workflow is then converted to a Kubernetes Pod (a Pod can have one or more containers). Users can even run Docker Compose applications as steps in our workflow.
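To make the DooD setup concrete, a workflow-step Pod that exposes the node’s Docker daemon might look like the sketch below. This is an illustrative assumption, not our actual Pod spec: the step name and build command are hypothetical, but the hostPath mount of the daemon socket is the mechanism that makes the step’s Docker commands run against the node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workflow-step-build     # hypothetical step name
spec:
  containers:
  - name: step
    image: docker:latest        # any image that ships a docker CLI
    command: ["sh", "-c", "docker build -t my_image ."]   # illustrative step command
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock   # the node's Docker daemon socket
```

Note that any container the step starts this way appears as a sibling on the node, outside of Kubernetes’ view of the Pod.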

To continue our discussion, let’s say a user wants to run a named Docker container running Apache from within a step of her workflow.

$> docker run --rm --name my_webserver httpd:latest

If the user runs multiple instances of this workflow and this step overlaps in time with the same step from another workflow, one of the workflows will fail to create the container due to a name conflict.
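The kind of bookkeeping job authors would need to avoid this can be sketched as follows; the “my_webserver” name and the random-suffix scheme are illustrative assumptions, not Applatix’s actual mechanism:

```shell
# Sketch of the coordination burden DooD pushes onto job authors: every
# container name must be unique across all concurrently running workflows.

# Derive a unique ID per workflow run (random 8-byte hex string here).
RUN_ID_A=$(head -c8 /dev/urandom | od -An -tx1 | tr -d ' \n')
RUN_ID_B=$(head -c8 /dev/urandom | od -An -tx1 | tr -d ' \n')

# Two overlapping runs of the same step now get distinct container names:
NAME_A="my_webserver-${RUN_ID_A}"
NAME_B="my_webserver-${RUN_ID_B}"
echo "${NAME_A}"
echo "${NAME_B}"

# Each run would then launch, e.g.:
#   docker run --rm --name "${NAME_A}" httpd:latest
```

Pushing this suffixing discipline into every user’s scripts is exactly the coordination cost the article is describing.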

There are many other problems caused by poor isolation. For example, by issuing commands directly to the host Docker daemon, users can create privileged containers or bind mount arbitrary host paths, potentially wreaking havoc on the host.

To summarize, exposing the host’s Docker daemon allows user containers to create “sibling” containers on the host. Since these containers acquire named resources (such as container names) from a single host-wide namespace, those resources must be managed to avoid conflicts.

Any CI/CD system that allows users to create arbitrary Docker containers will quickly run into manageability, security, and stability problems caused by poor isolation.

Each approach has its advantages and disadvantages, but when it comes to running Docker containers from Kubernetes Pods, we think the DinD approach is better. To find out why, stay tuned for the next post.

(UPDATE: See Part 2 of this post here.)
