If you're just starting out with Docker, it's super easy to follow the examples, get started and run a few things. However, moving to the next step, making your own Dockerfiles, can be a bit confusing. One of the more common points of confusion seems to be:
Where are my Docker images stored?
I know this certainly left me scratching my head a bit. Even worse, as a n00b, the last thing you want to do is publish your tinkering on the public Docker Index.
Checkout my awesome new Docker image
Yeah. Not really what I want to do.
Even worse, for quite a while there was no way to delete something that you had published, so your shamefully awkward learning process was up there for good. Luckily, deleting published Docker repositories is now quite easy.
So let me start with this small assurance: nothing you do will become public, especially if:
- You haven't made an account on the public index.
- You haven't run docker login to authenticate via the command-line client.
- You don't run docker push to push an image up to the index.
One of the things that contributes to much of the confusion around Docker is the language that's used. There's a lot of terminology that seems to overlap, is a bit ambiguous, is used somewhat incorrectly, or has a well-established meaning that is different from how Docker uses it.
I'll try to clear those up here, in a quick vocabulary lesson.
Image vs Dockerfile
This one is the least confusing, but it's an important distinction. Docker uses images to run your code, not the Dockerfile. The Dockerfile is used to build the image when you run docker build.
If you go browsing around on the Docker Index, you'll see lots of images listed there, but weirdly, you can't see the Dockerfile that built them. The image is an opaque asset that is compiled from the Dockerfile.
When you run docker push to publish an image, it's not publishing your source code; it's publishing the image that was built from your source code.
Registry vs Index
The next weird thing is the idea of a Registry and an Index, and how these are separate things.
An index manages user accounts, permissions, search, tagging, and all that nice stuff that's in the public web interface.
A registry stores and serves up the actual image assets, and it delegates authentication to the index.
When you run docker search, it's searching the index, not the registry. In fact, it might be searching multiple registries that the index is aware of.
When you run docker push or docker pull, the index determines if you are allowed to access or modify the image, but the registry is the piece that stores it or sends it down the wire to you after the index approves the operation. Also, the index figures out which registry that particular image lives in and forwards the request appropriately.
Beyond that, when you're working locally and running commands like docker images, you're interacting with something that is neither an index nor a registry, but a little of both.
Repository
Docker's use of the word 'repository' is similar to its use at GitHub and other source control systems, but also, kind of not.
Three common head-scratching questions are:
- What's the difference between a repository and a registry?
- What's the difference between a repository and an image?
- What's the difference between a repository and an index username?
In fact, this is a problem, because a repository is all of those things and not really any of them either. Further, when you run docker images you get output like this:
So, the list of images seems to be a list of repositories? Huh? Actually the images are the GUIDs, but that's not how you interact with them.
Let's start over with this.
When you run docker build or docker commit, you can specify a name for the image. The name is usually in the format username/image_name, but it doesn't have to be. It could be anything, and it could even be the same as something well known and published.
However, when the time comes to docker push, the index will look at the name and check whether it has a matching repository. If it does, it will check whether you have access to that repository and, if so, allow you to push the new version of the image to it. So, a registry holds a collection of named repositories, which are themselves collections of images tracked by GUIDs. This is also where tags come in. You can tag an image, store multiple versions of that image with different GUIDs in a single named repository, and access different tagged versions of an image with a special syntax like
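To make the repository and tag relationship concrete, here is a minimal shell sketch that splits a reference of the form repository:tag into its two parts. The function name split_ref is made up for illustration, and this is not how Docker itself parses names; it only assumes that a missing tag means latest, which is Docker's default.

```shell
# Split an image reference "repository[:tag]" into repository and tag.
# Hypothetical helper for illustration; not Docker's own parser.
split_ref() {
    ref=$1
    last=${ref##*/}              # last path component, e.g. "scooby_snacks:v1"
    case $last in
        *:*) repo=${ref%:*}; tag=${last##*:} ;;   # explicit tag present
        *)   repo=$ref;      tag=latest ;;        # no tag: assume "latest"
    esac
    printf '%s %s\n' "$repo" "$tag"
}

split_ref ubuntu:12.04            # prints: ubuntu 12.04
split_ref thoward/scooby_snacks   # prints: thoward/scooby_snacks latest
```

Looking only at the last path component for a colon keeps a registry address with a port number from being mistaken for a tag.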
If we look at the output from docker images again, now it makes a little more sense. We have five different versions of the image named ubuntu, each one tagged slightly differently. The repository holds all of those under the name ubuntu. So, while it may seem like ubuntu is an image name, it's actually a repository name, indicating where it came from, or where it should go during a push.
Further, the repository name has a specific schema to it. An index can parse out the username from the first part, and figure out where it is.
So, this is the confusing part: Suppose there's a Docker image called
The official 'repository name' is thoward/scooby_snacks, even though we would normally think of the repository as just being scooby_snacks (e.g., on GitHub or elsewhere).
In fact, when the Docker documentation refers to a repository, it sometimes means the whole thing, username included, and sometimes only means the part after the username.
That's because some repositories don't have usernames (like ubuntu). The username is very important to handle separately, because it's used for authentication by the index, so that part of the repository name has its own semantics separate from the name, when it's there.
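The username and name split can be sketched in a few lines of shell. The helper split_repo is hypothetical, and it deliberately ignores anything beyond the simple username/name form described here.

```shell
# Split a repository name into an optional username and the bare name.
# Hypothetical helper; only handles the plain "username/name" form.
split_repo() {
    repo=$1
    case $repo in
        */*) user=${repo%%/*}; name=${repo#*/} ;;   # "thoward/scooby_snacks"
        *)   user=;            name=$repo ;;        # root-level, e.g. "ubuntu"
    esac
    printf 'user=%s name=%s\n' "$user" "$name"
}

split_repo thoward/scooby_snacks   # prints: user=thoward name=scooby_snacks
split_repo ubuntu                  # prints: user= name=ubuntu
```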
Local Storage on the Docker Host
So far I've been explaining the intricacies of remote storage, and how that relates to the confusing vocabulary, but running docker images shows you only what is local to your machine.
Where is this stuff? The first place to look is in
Open up the file repositories to find a JSON list of the repositories on your host:
Hey, that matches the output from
Checkout what's in
Not terribly friendly, but we can see how Docker is keeping track of these, based on the repositories JSON file, which holds a mapping of repository names and tags to the underlying image GUIDs.
We have two images from the ubuntu repository, with the tags 12.04, precise, and latest all corresponding to the image whose ID is 8dbd9e392a96 in its short form.
So what's actually stored there?
The entries here are:
- json: holds metadata about the image
- layersize: just a number, indicating the size of the layer
- layer/: sub-directory that holds the rootfs for the container image
Pretty easy. This is the magic behind being able to refer to an image by its repository name, even if you're not interacting with the remote Docker Index or Docker Registry. Once you've pulled it down to your workstation, Docker can work with it by name using these files. This is also where things go when you're developing a new Dockerfile.
Let's try an example. Make a Dockerfile with the following contents:
This basically doesn't do anything except say that we're including the ubuntu image as our base layer, but that's enough to get started.
Next, run the command docker build -t scooby_snacks . to build it. That looks in the directory we specified (.) for a file called Dockerfile, builds it, and uses the name scooby_snacks for the repository.
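Here is a rough sketch of that step as a shell function. The one-line Dockerfile is an assumption based on the description above (only the ubuntu base image), and the DOCKER variable is something I made up so the commands can be previewed with DOCKER=echo on a machine without a Docker daemon.

```shell
# Write a minimal Dockerfile into a directory and build it under a name.
# DOCKER is a hypothetical override (e.g. DOCKER=echo) for a dry run.
build_minimal_image() {
    name=$1
    dir=$2
    docker_cmd=${DOCKER:-docker}
    mkdir -p "$dir" || return 1
    # Assumed contents: nothing but the ubuntu base layer.
    printf 'FROM ubuntu\n' > "$dir/Dockerfile"
    # -t sets the repository name; the last argument is the directory
    # Docker searches for a file called Dockerfile.
    "$docker_cmd" build -t "$name" "$dir"
}

# Example: build_minimal_image scooby_snacks ./scooby_snacks
```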
Oh no! It said 'Uploading context'… Did we just upload it to the public registry?
Whew! Not there. So why did it say that?
I have no idea, but you can ignore it. Where did it really end up?
Well, looks like Docker just 'uploaded' it to
It should also show up in
There it is! Docker was smart enough to realize that we didn't change anything, so it kept the same image id, and didn't bother copying the ubuntu image. Pretty sweet.
Next, we'll make a small change so that Docker will have to build a new layer.
Change the Dockerfile to have these contents:
Run docker build -t scooby_snacks . to rebuild.
There should be a new directory under
Docker gave it a new image ID:
It has also been updated in
Let's see what /var/lib/docker/graph/91acef3a5936769f763729529e736681e5079dc6ddf6ab0e61c327a93d163df9 looks like now:
Our tiny change had a big impact! Notice that Docker only kept the differences from the base image. This is the key to the layer concept.
We can now run our new image and try it out. We'll just run an interactive bash prompt for now.
The effects of our RUN touch scooby_snacks.txt command in the Dockerfile are exactly as expected.
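If you'd rather check the result without an interactive prompt, something like the following sketch should work. The two-line Dockerfile is my reconstruction from the RUN command quoted above, and DOCKER is again a hypothetical override for dry runs. It assumes the base image does not change the working directory between the build step and the ls.

```shell
# Rebuild with the extra RUN step, then list the file in a throwaway container.
# DOCKER is a hypothetical override (e.g. DOCKER=echo) for a dry run.
rebuild_and_check() {
    dir=$1
    docker_cmd=${DOCKER:-docker}
    mkdir -p "$dir" || return 1
    # Assumed contents: ubuntu base layer plus the touch command.
    printf 'FROM ubuntu\nRUN touch scooby_snacks.txt\n' > "$dir/Dockerfile"
    "$docker_cmd" build -t scooby_snacks "$dir" || return 1
    # --rm removes the container once ls exits.
    "$docker_cmd" run --rm scooby_snacks ls -l scooby_snacks.txt
}

# Example: rebuild_and_check ./scooby_snacks
```

For the interactive version, docker run -i -t scooby_snacks /bin/bash gives you a prompt inside the container instead.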
Until now, we've been doing everything locally and not interacting with the outside world at all. This is great: we can work up a perfect Dockerfile before we go live. That said, I'm pretty happy with this one now, and I'm ready to publish it.
If you haven't already, make sure you make an account, and then log in with docker login.
Publish the image with docker push scooby_snacks
Oops. Docker Index won't let us publish without our username in the repository name. No big deal.
Rebuild this with the correct username using docker build -t thoward/scooby_snacks .
Nice! The message 'Using cache' means Docker was smart enough to know that we didn't really change the image, so it didn't bother rebuilding it.
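As an aside, rebuilding isn't the only way to get a username-qualified name. This is not what I did in this walkthrough, but docker tag can attach a second name to an image that already exists locally. A small sketch, with a made-up helper name and the same hypothetical DOCKER override for dry runs:

```shell
# Give an existing local image an additional "username/name" repository name.
# DOCKER is a hypothetical override (e.g. DOCKER=echo) for a dry run.
tag_with_username() {
    local_name=$1
    user=$2
    docker_cmd=${DOCKER:-docker}
    "$docker_cmd" tag "$local_name" "$user/$local_name"
}

# Example: tag_with_username scooby_snacks thoward
```

Either way, both names end up pointing at the same image ID.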
Let's try publishing again, but this time with the correct repository name:
Now that it's published, it should show up with
There it is. Next, let's clean up a bit and delete the old root level one:
Ok, that one is gone.
Also, to be honest, this is not a very interesting image to share publicly, and we don't want to look like n00bs, so let's delete it as well.
This time, since Docker realized it was the last reference to that image ID, docker rmi has an additional message indicating that it deleted it instead of just 'untagging' it.
But wait! It is still public at the Docker Index, isn't it? Let's check:
Hmm... Well, this is handy: before we delete it, we can try docker pull and fetch it down like a 'real' image and run it.
Deleting a Published Repository
Unfortunately, to delete it from the public index/registry, we have to use the web interface, not the command-line.
First, log in via the web, then navigate to the repository at https://index.docker.io/u/thoward/scooby_snacks/.
Click on the 'Settings' tab, then the 'Delete Repository' tab, then the 'Delete Repo' button.
Back on the command-line we can verify it's gone with docker search scooby_snacks
But of course, since we never deleted the local version of it after we pulled it back down, it's still going to show up in docker images, since we have a local copy:
So to completely remove it we need to run docker rmi again.
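If you end up with several local names to clear out, a tiny loop saves some typing. This is only a sketch: the helper name is invented, and DOCKER is a hypothetical override for previewing the commands.

```shell
# Remove each named local image, reporting any name that could not be removed.
# DOCKER is a hypothetical override (e.g. DOCKER=echo) for a dry run.
remove_local_copies() {
    docker_cmd=${DOCKER:-docker}
    for name in "$@"; do
        "$docker_cmd" rmi "$name" || echo "could not remove $name" >&2
    done
}

# Example: remove_local_copies scooby_snacks thoward/scooby_snacks
```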
Not to worry, we can always rebuild it with our
Important Security Lesson
It's really important to consider the security implications of what we just saw though.
Even if a Docker image is deleted from the Docker Index, it may still be out there on someone's machine. There's no way to change that.
Also, as we saw when looking at the files we have locally, it's not quite an 'opaque binary' image. All the information from the Dockerfile was in the JSON file for the image, and the artifacts of those commands are in the layer, as accessible as a filesystem. If you accidentally published a password or key, or some other critical secret, there's no getting it back, and people can find it as easily as they can find anything else in a published open source code base.
Be very careful about what you're publishing. If you do accidentally publish a secret, take it down right away and update credentials on whatever systems it might have compromised.
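One cheap habit is to grep the build directory for obvious secrets before building anything you plan to push. The sketch below is only a first line of defense: the function name and the patterns are my own illustrative choices, and they will miss plenty of real secrets.

```shell
# List files under a build directory that contain likely secrets.
# Patterns are illustrative only; -I skips binary files, -l prints file names.
scan_for_secrets() {
    dir=$1
    grep -rIlE -i 'password|secret|api[_-]?key|BEGIN [A-Z ]*PRIVATE KEY' "$dir"
}

# Example:
#   if scan_for_secrets .; then
#       echo "Review the files above before running docker build" >&2
#   fi
```

grep exits non-zero when nothing matches, so the function can be used directly as the condition of an if.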
Docker can be a bit confusing with its terminology, but once you wrap your head around the basic workflow described here, it should be very easy to be in control of what you're building, knowing exactly when and how you share that with the world.