After a few weeks, I'm back on the Docker train. Or is it the Docker whale? Who trains whales to haul cargo anyway? And why don't we see more protests about this? Where's PETA?? So many important questions, but at least the whale's smiling.
A few weeks ago, I took my first real look at Docker, and learned just enough to write a bad analogy. Now I'm taking a closer look at the anatomy of a Docker image, which I've always heard consists of layers, though I've never quite understood what that meant.
Old Skool VMs 🤘
It works, but what if you want to:
- roll back to before the last update? The last 10 updates?
- make sure an entire team (or a new hire) could recreate the same exact setup?
- deploy several slightly different versions for different environments?
You could take a snapshot of the entire VM after every significant update, but that would waste a ton of space and be stupid-kludgy to manage.
Back to Docker 🐳
Instead of creating a VM and saving that behemoth, you can create a script that defines how your VM should be built, and then commit changes to that script to a repo like any other piece of code you've written.
If you want the latest stable version of Ubuntu from Docker Hub, where all kinds of "images" are stored and made publicly available, just do a docker pull for the ubuntu image you're interested in. Each hash you see identifies a layer: a set of changes stacked on top of the layer before it. Docker has to pull all of those layers to give you the complete image.
$ docker pull ubuntu:19.10
19.10: Pulling from library/ubuntu
4fc5deeb8d45: Pull complete
d70e07bddb7c: Pull complete
a4bf564c51f3: Pull complete
111c7f4c8fb3: Pull complete
Digest: sha256:98051557b93f45de6ab02001287be81a693df09fe71a1d9fb45056af2671e17d
Status: Downloaded newer image for ubuntu:19.10
docker.io/library/ubuntu:19.10
Then you could do a docker run against that image with whatever command you wanted - even something as silly as just printing the date. It creates a container, runs the command in it, then outputs the result. Nifty. And super useful. 😏
$ docker run ubuntu:19.10 date
Wed Nov 27 19:09:16 UTC 2019
BTW, if you want to see the images you've downloaded, or the containers you've created from those images, here are a couple of handy commands to keep in mind.
$ docker images
REPOSITORY    TAG      IMAGE ID       CREATED         SIZE
ubuntu        19.10    c351ab52170e   16 hours ago    72.9MB
ruby          2.6      d98e4013532b   5 weeks ago     840MB
hello-world   latest   fce289e99eb9   11 months ago   1.84kB

$ docker ps -a
CONTAINER ID   IMAGE          COMMAND       CREATED          STATUS
d39fc40ff8aa   ubuntu:19.10   "date"        20 seconds ago   Exited (0) 19 seconds ago
2421438b1d06   ubuntu:19.10   "/bin/bash"   8 minutes ago    Exited (0) 8 minutes ago
40bd23d0b84a   hello-world    "/hello"      4 days ago       Exited (0) 4 days ago
That's all well and good if all you want is a base Ubuntu image, but what if you want something running on top of that like RabbitMQ? As luck would have it, there are plenty of images for you here, depending on what version of Rabbit you'd like and even what version of Linux you'd like to run it on top of!
For example, pulling down 3.8.1-alpine currently gets you this image, which starts by doing a
FROM alpine:3.10
which means it's based off the Alpine image. I wouldn't recommend that though, since there's also a comment in the Dockerfile that says "Alpine Linux is not officially supported by the RabbitMQ team -- use at your own risk!"
Instead, you could pull down 3.8.1 which starts
FROM ubuntu:18.04
instead. Then it runs other commands to set up OpenSSL, OTP (Erlang), and finally RabbitMQ. If you decide you need to administer Rabbit, the 3.8.1-management image builds on top of 3.8.1, so it does all the above and then enables and sets up the RabbitMQ management plugin too!
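One way to picture that "builds on top of" relationship is as a chain of FROM references. Here's a small sketch, assuming each image records a single parent. The `BASE_IMAGE` table and the `lineage` function are made-up names for illustration (not a Docker API), and the entries are only the relationships described above.

```python
# Toy model of image ancestry (hypothetical data structure, not a Docker API).
# Each entry maps an image to the image it builds on, as described above.
BASE_IMAGE = {
    "rabbitmq:3.8.1-management": "rabbitmq:3.8.1",
    "rabbitmq:3.8.1": "ubuntu:18.04",
    "rabbitmq:3.8.1-alpine": "alpine:3.10",
}


def lineage(image):
    """Follow parent references until reaching an image with no recorded parent."""
    chain = [image]
    while chain[-1] in BASE_IMAGE:
        chain.append(BASE_IMAGE[chain[-1]])
    return chain


print(" -> ".join(lineage("rabbitmq:3.8.1-management")))
# prints: rabbitmq:3.8.1-management -> rabbitmq:3.8.1 -> ubuntu:18.04
```

The chain stops wherever the table runs out of parents, so it only goes as deep as the images mentioned in this post.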
You can get a better sense of what's happening if you pull each image in the chain one at a time, instead of just pulling the final image by itself. Here I pulled Ubuntu, then the RabbitMQ image that depends on it, and finally the management image that builds on that. Notice how each subsequent pull reports that some of the required layers already exist, so they aren't pulled down again.
$ docker pull ubuntu:18.04
18.04: Pulling from library/ubuntu
7ddbc47eeb70: Pull complete
c1bbdc448b72: Pull complete
8c3b70e39044: Pull complete
45d437916d57: Pull complete
Digest: sha256:6e9f67fa63b0323e9a1e587fd71c561ba48a034504fb804fd26fd8800039835d
Status: Downloaded newer image for ubuntu:18.04
docker.io/library/ubuntu:18.04

$ docker pull rabbitmq:3.8.1
3.8.1: Pulling from library/rabbitmq
7ddbc47eeb70: Already exists
c1bbdc448b72: Already exists
8c3b70e39044: Already exists
45d437916d57: Already exists
916459a32f87: Pull complete
aba97e76a6d7: Pull complete
6cfc7646d503: Pull complete
5e8c71984192: Pull complete
16722d38aada: Pull complete
b3a7c7a8fb05: Pull complete
Digest: sha256:1c000709124c4c3e3da1657ef81de7a300e92cee249163831df141ddb4145762
Status: Downloaded newer image for rabbitmq:3.8.1
docker.io/library/rabbitmq:3.8.1

$ docker pull rabbitmq:3.8.1-management
3.8.1-management: Pulling from library/rabbitmq
7ddbc47eeb70: Already exists
c1bbdc448b72: Already exists
8c3b70e39044: Already exists
45d437916d57: Already exists
916459a32f87: Already exists
aba97e76a6d7: Already exists
6cfc7646d503: Already exists
5e8c71984192: Already exists
16722d38aada: Already exists
b3a7c7a8fb05: Already exists
6b5a827c0c9c: Pull complete
32cfea652b55: Pull complete
Digest: sha256:5e144493152208e189763a61b81db5600a53531826c48ceffecf8a2f7efbac19
Status: Downloaded newer image for rabbitmq:3.8.1-management
docker.io/library/rabbitmq:3.8.1-management
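If it helps, the "Already exists" lines can be pictured as a set lookup. This is only a toy sketch, assuming the local store behaves like a set of layer IDs and a pull downloads just the IDs that are missing. The `pull` function and `local_layers` set are hypothetical names, not how Docker is actually implemented; the layer IDs are copied from the output above.

```python
# Toy model of layer reuse during `docker pull` (not Docker's real implementation).
# Layer IDs are copied from the pull output above.
UBUNTU_18_04 = ["7ddbc47eeb70", "c1bbdc448b72", "8c3b70e39044", "45d437916d57"]
RABBITMQ_3_8_1 = UBUNTU_18_04 + [
    "916459a32f87", "aba97e76a6d7", "6cfc7646d503",
    "5e8c71984192", "16722d38aada", "b3a7c7a8fb05",
]
RABBITMQ_3_8_1_MANAGEMENT = RABBITMQ_3_8_1 + ["6b5a827c0c9c", "32cfea652b55"]


def pull(image_layers, local_layers):
    """Return the layers that must be downloaded, and record them as local."""
    downloaded = [layer for layer in image_layers if layer not in local_layers]
    local_layers.update(downloaded)
    return downloaded


local_layers = set()  # hypothetical stand-in for the layers already on disk
for name, layers in [
    ("ubuntu:18.04", UBUNTU_18_04),
    ("rabbitmq:3.8.1", RABBITMQ_3_8_1),
    ("rabbitmq:3.8.1-management", RABBITMQ_3_8_1_MANAGEMENT),
]:
    downloaded = pull(layers, local_layers)
    reused = len(layers) - len(downloaded)
    print(f"{name}: {len(downloaded)} pulled, {reused} already exist")
```

Running it reports 4 pulled and 0 reused for Ubuntu, 6 pulled and 4 reused for RabbitMQ, and 2 pulled and 10 reused for the management image, which lines up with the counts in the real output.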
Is it becoming obvious how convenient this system is? Instead of one giant steaming stew of a machine that "just works" but is impossible to exactly replicate, you're forced to slow down and document the exact steps to reproduce a given machine. Complex images are really layers over previous images, which could themselves be layers over even more previouser (??) images.
In order to build your own images, with your own commands and setup and whatever else, you need to create a file (typically named "Dockerfile") with everything in it.
For example, I created a file with a single line in it:
from rabbitmq:3.8.1-management
And when I wipe out the 3.8.1-management image I pulled down before, and then build from the Dockerfile, we get output that looks very familiar.
$ docker build .
Sending build context to Docker daemon 2.048kB
Step 1/1 : from rabbitmq:3.8.1-management
3.8.1-management: Pulling from library/rabbitmq
7ddbc47eeb70: Already exists
c1bbdc448b72: Already exists
8c3b70e39044: Already exists
45d437916d57: Already exists
916459a32f87: Already exists
aba97e76a6d7: Already exists
6cfc7646d503: Already exists
5e8c71984192: Already exists
16722d38aada: Already exists
b3a7c7a8fb05: Already exists
6b5a827c0c9c: Pull complete
32cfea652b55: Pull complete
Digest: sha256:5e144493152208e189763a61b81db5600a53531826c48ceffecf8a2f7efbac19
Status: Downloaded newer image for rabbitmq:3.8.1-management
 ---> 36ed80b6a1b1
Successfully built 36ed80b6a1b1
A real Dockerfile would (unlike my tiny example above) include setup and configuration steps, environment variables, additional commands, yadda yadda. You can even add comments to the file too, so it's clear why you did what you did.
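To make that a bit more concrete, here's a sketch of what a slightly bigger Dockerfile might look like, plus a few lines of Python that list the steps a build would report (like the "Step 1/1" in the output above). Everything in the Dockerfile past the FROM line is made up for illustration, and the step counter is a simplification that assumes one instruction per line with no line continuations.

```python
# Hypothetical Dockerfile contents; only the FROM line comes from the example above.
DOCKERFILE = """\
# Start from the image that already has the management plugin enabled.
FROM rabbitmq:3.8.1-management

# Illustrative setting (assumed, not from the original example).
ENV APP_ENVIRONMENT=staging

# Illustrative extra setup step (assumed as well).
RUN echo "extra setup goes here"
"""


def build_steps(dockerfile_text):
    """List instruction lines, skipping blank lines and # comments.

    Simplification: assumes each instruction fits on one line.
    """
    steps = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            steps.append(stripped)
    return steps


steps = build_steps(DOCKERFILE)
for number, step in enumerate(steps, start=1):
    print(f"Step {number}/{len(steps)} : {step}")
```

The comments explain the why, the instructions do the work, and only the instructions show up as numbered steps.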
It's all one big piece of documentation on how to exactly replicate an environment, except the documentation itself can be run to create that environment... and you can commit it to a repo to track the changes.
Tip of the Iceberg
I've only touched the very tip of the iceberg so far, but I feel like I have a much better understanding now of all this "containers are built from layers" stuff I've heard about. I hope it helped you too!
If you came here looking for tips on icebergs, I have none. If you're reading this on a cruise ship and you're a member of the bridge, glance up once in a while and avoid them. Just sayin'.