Docker - Peeling back the layers

A few weeks ago, I took my first real look at Docker, and learned just enough to write a bad analogy. Now I'm taking a closer look at the anatomy of a docker image, which I've always heard consists of layers, but I've never quite understood what that meant. Time to find out!

After a few weeks, I'm back on the Docker train. Or is it the Docker whale? Who trains whales to haul cargo anyway? And why don't we see more protests about this? Where's PETA?? So many important questions, but at least the whale's smiling.


Old Skool VMs 🤘

My typical approach to using Linux is to download an ISO image, like Ubuntu or Alpine, and spin up a VM in VirtualBox. Then I'll install whatever I need on top of that, like RabbitMQ.

Checkin' out RabbitMQ on a fresh-off-the-press Ubuntu VM

It works, but what if you want to:

  • roll back to before the last update? the last 10 updates?
  • make sure an entire team (or a new hire) could recreate the same exact setup?
  • deploy several slightly different versions for different environments?

You could take a snapshot of the entire VM after every significant update, but that would waste a ton of space and be stupid-kludgy to manage.


Back to Docker 🐳

Instead of creating a VM and saving that behemoth, you can write a script that defines how your environment should be built, and then commit changes to that script to a repo like any other piece of code you've written.

If you want the latest stable version of Ubuntu from Docker Hub, where all kinds of "images" are stored and made publicly available, just do a docker pull for the ubuntu image you're interested in. Each short hash in the output identifies a layer: a set of changes stacked on top of the ones before it. Docker has to pull all of those layers to get you the complete image.

$ docker pull ubuntu:19.10

19.10: Pulling from library/ubuntu
4fc5deeb8d45: Pull complete
d70e07bddb7c: Pull complete
a4bf564c51f3: Pull complete
111c7f4c8fb3: Pull complete
Digest: sha256:98051557b93f45de6ab02001287be81a693df09fe71a1d9fb45056af2671e17d
Status: Downloaded newer image for ubuntu:19.10
docker.io/library/ubuntu:19.10
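
To make "layers are changes" a little more concrete, here's a toy model in Python. It is not how Docker actually stores anything, just a sketch of the idea: each layer is a dict of file changes, and the image is whatever you get by applying them in order. The file paths, contents, and the DELETED marker are all made up for illustration.

```python
# Toy model: a layer is a set of file changes; an image is those changes
# applied in order. File paths and contents are invented for illustration.
DELETED = object()  # marker meaning "this layer removes the file"


def flatten(layers):
    """Apply each layer's changes in order and return the resulting file tree."""
    files = {}
    for layer in layers:
        for path, content in layer.items():
            if content is DELETED:
                files.pop(path, None)   # a later layer can remove a file
            else:
                files[path] = content   # or add / overwrite one
    return files


base = {"/etc/os-release": "ubuntu", "/tmp/setup.log": "scratch"}
cleanup = {"/tmp/setup.log": DELETED}
config = {"/etc/os-release": "ubuntu 19.10", "/etc/app.conf": "enabled"}

image = flatten([base, cleanup, config])
print(sorted(image))
```

Drop the last layer from the list and you get the image as it looked one step earlier, which is the "rollback" the VM snapshots made so painful.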

Then you could do a docker run against that image with whatever command you wanted - even something as silly as just printing the date. It creates a container, runs the command in it, then outputs the result. Nifty. And super useful. 😏

$ docker run ubuntu:19.10 date
Wed Nov 27 19:09:16 UTC 2019

BTW, if you want to see the images you've downloaded, or the containers you've created from those images, here are a couple of handy commands to keep in mind.

$ docker images
REPOSITORY          TAG           IMAGE ID            CREATED             SIZE
ubuntu              19.10         c351ab52170e        16 hours ago        72.9MB
ruby                2.6           d98e4013532b        5 weeks ago         840MB
hello-world         latest        fce289e99eb9        11 months ago       1.84kB

$ docker ps -a
CONTAINER ID   IMAGE          COMMAND      CREATED         STATUS
d39fc40ff8aa   ubuntu:19.10   "date"       20 seconds ago  Exited (0) 19 seconds ago
2421438b1d06   ubuntu:19.10   "/bin/bash"  8 minutes ago   Exited (0) 8 minutes ago
40bd23d0b84a   hello-world    "/hello"     4 days ago      Exited (0) 4 days ago

RabbitMQ

That's all well and good if all you want is a base Ubuntu image, but what if you want something running on top of that, like RabbitMQ? As luck would have it, there are plenty of images for that on Docker Hub, depending on what version of Rabbit you'd like and even what version of Linux you'd like to run it on top of!

For example, pulling down 3.8.1-alpine currently gets you an image whose Dockerfile starts with FROM alpine:3.10, which means it's based on the Alpine image. I wouldn't recommend that one though, since a comment in the same Dockerfile says "Alpine Linux is not officially supported by the RabbitMQ team -- use at your own risk!"

Instead, you could pull down 3.8.1, which starts FROM ubuntu:18.04. Then it runs other commands to set up OpenSSL, OTP (Erlang), and finally RabbitMQ. If you decide you need to administer Rabbit, the 3.8.1-management image builds on top of 3.8.1, so it does all the above and then enables and sets up the RabbitMQ management plugin too!
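
The "builds on top of" relationships from the last couple of paragraphs can be written down as a tiny lookup table. This is only a sketch of the relationships described here (the BASE_OF table and lineage function are my own names, not anything Docker provides), and it stops at the Ubuntu and Alpine images rather than following them any further.

```python
# Base-image relationships as described in the text. None marks where this
# sketch stops following the chain.
BASE_OF = {
    "rabbitmq:3.8.1-management": "rabbitmq:3.8.1",
    "rabbitmq:3.8.1": "ubuntu:18.04",
    "rabbitmq:3.8.1-alpine": "alpine:3.10",
    "ubuntu:18.04": None,
    "alpine:3.10": None,
}


def lineage(image):
    """Return the image followed by each image it builds on, nearest first."""
    chain = []
    while image is not None:
        chain.append(image)
        image = BASE_OF.get(image)
    return chain


print(" -> ".join(lineage("rabbitmq:3.8.1-management")))
# prints: rabbitmq:3.8.1-management -> rabbitmq:3.8.1 -> ubuntu:18.04
```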

You can get a better sense of what's happening if you pull each image in the chain one at a time, instead of just pulling the final image by itself. Here I pulled Ubuntu, then the RabbitMQ image that depends on it, and finally the management image that builds on that. Notice how each subsequent pull reports that some of the required layers already exist, so they aren't pulled down again.

$ docker pull ubuntu:18.04

18.04: Pulling from library/ubuntu
7ddbc47eeb70: Pull complete
c1bbdc448b72: Pull complete
8c3b70e39044: Pull complete
45d437916d57: Pull complete
Digest: sha256:6e9f67fa63b0323e9a1e587fd71c561ba48a034504fb804fd26fd8800039835d
Status: Downloaded newer image for ubuntu:18.04
docker.io/library/ubuntu:18.04

$ docker pull rabbitmq:3.8.1

3.8.1: Pulling from library/rabbitmq
7ddbc47eeb70: Already exists
c1bbdc448b72: Already exists
8c3b70e39044: Already exists
45d437916d57: Already exists
916459a32f87: Pull complete
aba97e76a6d7: Pull complete
6cfc7646d503: Pull complete
5e8c71984192: Pull complete
16722d38aada: Pull complete
b3a7c7a8fb05: Pull complete
Digest: sha256:1c000709124c4c3e3da1657ef81de7a300e92cee249163831df141ddb4145762
Status: Downloaded newer image for rabbitmq:3.8.1
docker.io/library/rabbitmq:3.8.1

$ docker pull rabbitmq:3.8.1-management

3.8.1-management: Pulling from library/rabbitmq
7ddbc47eeb70: Already exists
c1bbdc448b72: Already exists
8c3b70e39044: Already exists
45d437916d57: Already exists
916459a32f87: Already exists
aba97e76a6d7: Already exists
6cfc7646d503: Already exists
5e8c71984192: Already exists
16722d38aada: Already exists
b3a7c7a8fb05: Already exists
6b5a827c0c9c: Pull complete
32cfea652b55: Pull complete
Digest: sha256:5e144493152208e189763a61b81db5600a53531826c48ceffecf8a2f7efbac19
Status: Downloaded newer image for rabbitmq:3.8.1-management
docker.io/library/rabbitmq:3.8.1-management

Is it becoming obvious how convenient this system is? Instead of one giant steaming stew of a machine that "just works" but is impossible to exactly replicate, you're forced to slow down and document the exact steps to reproduce a given machine. Complex images are really layers over previous images, which could themselves be layers over even more previouser (??) images.
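
Here's a rough simulation of that "Already exists" behavior, using the layer IDs copied straight from the pull output above. It's a sketch under a big simplifying assumption: a plain Python set stands in for whatever Docker really keeps on disk, and the pull and count functions are my own invention, not Docker's API.

```python
# Layer IDs copied from the pull output above, in order.
UBUNTU = ["7ddbc47eeb70", "c1bbdc448b72", "8c3b70e39044", "45d437916d57"]
RABBITMQ = UBUNTU + ["916459a32f87", "aba97e76a6d7", "6cfc7646d503",
                     "5e8c71984192", "16722d38aada", "b3a7c7a8fb05"]
MANAGEMENT = RABBITMQ + ["6b5a827c0c9c", "32cfea652b55"]


def pull(layers, local):
    """Simulate a pull: report each layer's status and add new ones to `local`."""
    report = []
    for layer in layers:
        if layer in local:
            report.append((layer, "Already exists"))
        else:
            local.add(layer)
            report.append((layer, "Pull complete"))
    return report


def count(report, status):
    """Count how many layers in a report ended up with the given status."""
    return sum(1 for _, s in report if s == status)


local = set()  # stands in for the layers already on this machine
r1 = pull(UBUNTU, local)
r2 = pull(RABBITMQ, local)
r3 = pull(MANAGEMENT, local)

for name, report in [("ubuntu:18.04", r1),
                     ("rabbitmq:3.8.1", r2),
                     ("rabbitmq:3.8.1-management", r3)]:
    print(name, count(report, "Pull complete"), "pulled,",
          count(report, "Already exists"), "reused")
```

Three images, 26 layer references between them, but only 12 distinct layers ever get downloaded.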

Scripting

In order to build your own images, with your own commands and setup and whatever else, you need to create a file (typically named "Dockerfile") with everything in it.

For example, I created a file with a single line in it:

from rabbitmq:3.8.1-management

When I wipe out the 3.8.1-management image I pulled down before and then run docker build in the directory containing that Dockerfile, the output looks very familiar.

$ docker build .

Sending build context to Docker daemon  2.048kB
Step 1/1 : from rabbitmq:3.8.1-management
3.8.1-management: Pulling from library/rabbitmq
7ddbc47eeb70: Already exists
c1bbdc448b72: Already exists
8c3b70e39044: Already exists
45d437916d57: Already exists
916459a32f87: Already exists
aba97e76a6d7: Already exists
6cfc7646d503: Already exists
5e8c71984192: Already exists
16722d38aada: Already exists
b3a7c7a8fb05: Already exists
6b5a827c0c9c: Pull complete
32cfea652b55: Pull complete
Digest: sha256:5e144493152208e189763a61b81db5600a53531826c48ceffecf8a2f7efbac19
Status: Downloaded newer image for rabbitmq:3.8.1-management
 ---> 36ed80b6a1b1
Successfully built 36ed80b6a1b1

In a real Dockerfile (unlike my tiny example above), each step would add setup and configuration, environment variables, additional commands, yadda yadda. You can add comments to the file too, so it's clear why you did what you did.
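
To show what "steps plus comments" looks like, here's a small Python sketch that reads a made-up Dockerfile and lists its steps. The Dockerfile text is hypothetical (EXAMPLE_SETTING is not a real RabbitMQ option), and parse_steps is a deliberately naive reader of my own, not Docker's parser: it ignores line continuations and anything fancier. It uppercases instruction names because Dockerfile instructions aren't case-sensitive, which is why my lowercase "from" worked.

```python
# A made-up Dockerfile, held in a string so the sketch is self-contained.
# EXAMPLE_SETTING is a hypothetical variable, not a real RabbitMQ option.
DOCKERFILE = """\
# Start from the image that already has the management plugin enabled
from rabbitmq:3.8.1-management

# Hypothetical setting, for illustration only
ENV EXAMPLE_SETTING=demo
"""


def parse_steps(text):
    """Return (INSTRUCTION, arguments) pairs, skipping blank lines and comments."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments document the file but are not build steps
        instruction, _, args = line.partition(" ")
        steps.append((instruction.upper(), args.strip()))
    return steps


for number, (instruction, args) in enumerate(parse_steps(DOCKERFILE), start=1):
    print(f"Step {number}: {instruction} {args}")
```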

It's all one big piece of documentation on how to exactly replicate an environment, except the documentation itself can be run to create that environment... and you can commit it to a repo to track the changes.

Tip of the Iceberg

I've only touched the very tip of the iceberg so far, but I feel like I have a much better understanding now of all this "containers are built from layers" stuff I've heard about. I hope it helped you too!

If you came here looking for tips on icebergs, I have none. If you're reading this on a cruise ship and you're a member of the bridge, glance up once in a while and avoid them. Just sayin'.