Understanding Images and Containers

The App Devs Primer on Images and Containers, Part I

Mar 27, 2024 • Evan "Hippy" Slatis

•

Overview

OCI Images are binary software packages.

Images are binary software packages designed to contain a complete set of dependencies needed to run an application, up to and including the base operating system.
Containers are processes launched from images.

Containers are runtime processes created from images that run in isolation from each other with their own filesystems, cpu, and memory.

⬆ Table of Contents

Audience and scope

This is the first of a series of three primers and two short labs that will describe and teach Open Container Initiative (OCI, or “Docker”) image and container technology. These primers are not meant to give anyone a complete or thorough understanding of the technology, but an overview of the concepts, how it benefits application developers, and the practical knowledge needed to use it.

Not everything contained in Part I will be necessary for an application developer to do their job, but it’s presented because it helps in the overall understanding of the technology and what it’s doing; e.g. it’s important to understand image layers, because they will influence how Dockerfiles (image build scripts) are written, but not so much blobs. The reason we take a sentence or two to explain blobs is that they show up on the command line when pulling images, so understanding what blobs are helps to understand what the CLI is doing when executing certain commands. For topics that are not germane to day-to-day work we try to be as brief as possible, and links to other resources are provided for those that want a more in depth understanding.

🛈	Estimated time to read all three primers and complete labs: ~3 hrs, depending on experience.

⬆ Table of Contents

OCI or Docker?

The technology for creating images and containers has existed for decades, but Docker, Inc. created the original specification for their images and containers to make them easily accessible to the average developer. Eventually Docker and other members of the industry helped create an independent community to define and maintain a standard specification, The Open Container Initiative.

⬆ Table of Contents

Images

OCI images are nothing more than a binary packaging format for software, loosely comparable to a Java JAR, Python wheel, or npm package. JAR files are simply zip files with a different extension, and Python wheels are distributed as gzipped tarballs, for example. Like Python wheels, images are packaged as gzipped tarballs (tar.gz).

Each image has its own, independent filesystem that contains a complete set of dependencies needed to run an application. In practice this usually includes a complete facade of an operating system. Because images contain so much extra software than traditional packaging strategies, they will be much larger in size.

Images are immutable, and cannot be changed once they are built. A new image will need to built to realize any changes to an application’s code or internal configuration.

Images are defined by and built with scripts known as Dockerfiles. These scripts typically live in the root directory of an application’s source code repository and are used to build an application image during development and at the end of an application’s continuous integration pipeline.

🛈	While the OCI specification actually denotes Containerfiles to be the default, Docker isn’t fully OCI compliant as of the writing of this primer. Because most publicly available projects still use Dockerfiles, for convenience and historical purposes these primers will only refer to Dockerfiles.

⬆ Table of Contents

Image layers

Images are comprised of a set of read-only layers stacked one on top of the other. Image layers represent a change to an image’s filesystem. This is loosely similar to a Git commit, which represents an atomic change to a set of files in a Git repository. The utilities used to build and pull images can cache image layers, so similar images that use the same layers can be built faster and don’t have to repeated download or rebuild the layer for every new image that shares it.

⬆ Table of Contents

Blobs

Image layers are stored in low-level, unnamed tar.gz files called blobs, which we only mention because you will see this term when pulling images on the command line. Blobs are referenced internally by their digests or hash.

⬆ Table of Contents

Base images

Like classes in object-oriented languages, images can be built upon to create new images. Base images are images used to build other images. The new image’s layers are stacked on top of the base to create the new image.

Taking the object-oriented analogy a step further, images and their base images have a loose inheritance structure. When a new image is built on top of another image, it initially "inherits" everything from its base image, and it will keep the properties of the base image unless they are specifically overridden by the new image build.

⬆ Table of Contents

Minimal base images

Base images tend to be larger than they need to be, because to be reusable to a wide variety of applications they necessarily contain extra software that won’t be used by some particular application. Larger images means it takes longer to download from remote repositories, and takes up more system resources. There are many different communities dedicated to building minimal OCI images of one type or another for use as base images. Minimizing an application’s image size is a best practice, so it’s best to use these smaller base images when available.

⬆ Table of Contents

Scratch

The most fundamental of base images is called scratch. The scratch image has a link to what looks like an image on Docker Hub, but scratch doesn’t actually exist anymore. Building an image "from scratch" will not add a layer to the final image, and just means the new image is starting out with an empty filesystem and only a root directory, /.

While using scratch as a base image is the most minimal of images, building these types of images requires an advanced knowledge and understanding of gathering and installing the minimum amount of low-level dependencies for an application to run. In almost all cases this isn’t practical for an application developer, and how to do so is outside the scope of this primer, but we include this information here for completeness.

⬆ Table of Contents

Minimalist image examples

The following are a couple of examples of commonly used, minimalist base images:

Alpine
Red Hat’s Universal Base Image (UBI)

⬆ Table of Contents

So what about virtual machines?

VM and OCI images are completely different technologies with very different goals. Virtual machine images are designed to virtualize hardware. OCI images are only designed to package software. In particular, virtual machines images are

Much larger in size
Take much longer to download
Take much longer to start

Unless direct access to hardware is needed, in every metric packaging and running an application in an OCI image versus a VM results in a vast improvement in performance and resource usage.

⬆ Table of Contents

Image warehousing

Images are stored and shared remotely in registries and repositories.

⬆ Table of Contents

Image registries

An image registry is a digital warehouse for images. Image registries are analogous to source-code-hosting facilities and servers like GitHub or GitLab, but for images.

The following are examples of different public Image registries, and are free to use by the public:

Docker Hub
The original Docker registry. Docker also has a free image available for local, development use.
Red Hat Quay
Red Hat’s public image registry, which hosts many of Red Hat’s publicly available images, along with providing opportunities for individuals to host their own images. It is also available for download for free use, since everything Red Hat produces is OSS.

GitHub and GitLab also have options for storing images with Git repositories, but their primary purpose as a platform is for Git version control.

⬆ Table of Contents

Image repositories

Within a registry, images are stored in image repositories. These are analogous to Git repositories. They are identified with a user-defined, human readable name, and help with organizing images into meaningful, human-readable collections. The repository name and image name are synonymous.

⬆ Table of Contents

Image Manifests and Digests

The image manifest is a JSON document containing information about the images metadata and layers.

The _image digest is a unique unique sha256 hash based on the image manifest when the image is created.

Repeating the same image build with no other changes will result in the exact same hash. Image digests have the form of <algorithm>:<64 digit hex>; e.g.:

sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b

Although you’ll almost always pull images using a repository name and tag, you can also use its digest, and you will occasionally see images referenced by their digest in order to guarantee which build of an image is being pulled.

🛈	Images also have an "ID", but you can safely ignore them. They are local values used by the OCI tooling and of no real importance to end users.

⬆ Table of Contents

Image Tags

While the image digest guarantees a unique identifier between different image builds, it is not very meaningful or convenient to human beings trying to understand what each specific image was built for, which is the purpose of image tags. Image tags allow developers to easily name and identify each separate image within a repository. Image tags are analogous to Git tags in that they attach a human readable identifier to a particular hash.

Tags have a many-to-one relationship with images and their digests; i.e. a single image digest can be referenced by any number of tags, or none at all. A common example are builds that are tagged for deployment across multiple development environments; e.g. as an image is built, deployed, and promoted from dev (DEV) to functional testing (QA or QAT) to user-acceptance testing (UAT) environments as part of a typical software development lifecycle, the image might gather the tags of dev, qat, and uat, respectively.

⬆ Table of Contents

latest

By default, any untagged image pushed to a repository is automatically given the tag latest. This tag can also be given to an image explicitly. Conversely, if you pull an image with specifying the tag, it will try and pull latest by default. Just like with application build and dependency management tools (e.g. Maven, Pip, or npm), best practices are that you should always refer to a specific image tag (or digest) and not to latest, or unpredictable builds and behavior could result. During development and/or local testing it can be a convenient shortcut, though.

⬆ Table of Contents

Image repositories are NOT version controlled

Because we expect software developers to be very familiar with version control systems we have made use of those analogies to better communicate some of the concepts inherent in Image registries and repositories; however, it is important to remember that image repositories do NOT track changes to images or tags.

This can become confusing to developers new to OCI image technology, because the workflows and terms appear very similar on the surface. As changes to source code are "pushed" to a Git repository, a new build is typically kicked off, which in turn results in an image incorporating the new application artifacts being "pushed" to its image repository and tagged exactly the same as the last build; e.g. as my_app:dev. Looking at the source branch in the Git repository and image tag in the image repository, both have new hashes associated with them. The difference is the Git commit hash represents a change in the source code, whereas the image digest represents a completely different image with no relation to the previous one.

In this sense Git tags and image tags are very much alike, because if you "move" a Git tag from one commit to the another, no record is kept in Git of the commit hash it previously pointed to. In Git this is generally considered a bad practice, but, again, Git is a distributed version control system and not a centralized digital warehouse for storing OCI images.

⬆ Table of Contents

Example: eclipse-temurin on Docker Hub

As mentioned above, Docker Hub is a public image registry that hosts a number of official Image repositories free for download, one of which is eclipse-temurin. eclipse-temurin is one of many Image repositories containing official builds of OpenJDK images for building or running Java applications. A quick look at their tags reveals dozens of OpenJDK images that hold different combinations of Java and operating system versions; e.g.:

The first represents Java 11 installed on an Alpine image, a family of OCI images based on Alpine Linux. The latter two represent the Java 11 and 17 JDKs installed on Red Hat’s UBI 9 image, respectively.

⬆ Table of Contents

Containers

Whereas images are executable binary files, containers are the processes running on the host created from images. Running a container is actually a two step process, create and start.

Create
Creating a container takes an image and gives it a unique ID and filesystem, which facilitates being able to run the same image multiple times.
Start
Starting a container will launch a process on the host machine. The application running inside the container will be isolated from the rest of the system and behave as if it’s running in its very own host with it’s own filesystem.

⬆ Table of Contents

The advantages of containers

From the software developer’s perspective, containers have two very important traits that give them advantage over other forms of packaging and deploying your application:

Consistency
Scalability

Both are related to each other, and together they allow projects to use one of the most promising innovations to come to software development in recent years, the principle of “Build once, deploy many”.

⬆ Table of Contents

Consistency

No more “But it works on my machine”

Because images are immutable and include all of the dependencies needed to run your software from the operating system on up, the environment you deploy your application has now become consistent. This means whether you launch an image as a container in a development, test, or any number of production environments, the container will run exactly the same way. As a application developer, you won’t have to worry about where your application is running, because it’s always the same. That’s the benefit of packaging your software along with its complete runtime environment, rather than just your application without the total set of dependencies needed to run it.

This consistency means that in almost all cases when an issue is found in one environment (e.g. production), you can be reasonably confident that you’ll be able to reproduce the issue in some other environment (e.g. development or test) so you can focus on fixing it. In general, your project should never get mired in and stumped by the dreaded “Works on my machine” problem again due to differences in how the application is deployed.

☠

Just because containers ensure the environment your application is deployed to is always the same, that does not mean it will have the same configuration; e.g. development and production are rarely if ever deployed with the same configuration. Containers can drastically minimize the problem, but cannot eliminate the issue completely.

⬆ Table of Contents

Scalability

From the host system’s perspective, each container is an independent process. From the application’s perspective, each container it runs in is a separate host system with its own, independent filesystem. As mentioned earlier, containers are relatively lightweight and can startup rather quickly, which means that if you designed your software appropriately you can easily and quickly spin up as many copies of your application as your host machine will allow. This is called horizontal scaling.

It’s also very easy to scale back down again by simply stopping the unneeded containers. They’re just running processes.

To take advantage of scaling easily, applications still need to be designed properly to handle it. We suggest following the REST architectural style in particular to most easily take full advantage of this feature.

🛈	*Vertical scaling* involves increasing a container’s internal resources such CPU or memory, and while it can be done it is not a design or deployment issue that application developers would be concerned with, and is therefore outside the scope of these primers.

⬆ Table of Contents

Are containers secure?

TL;DR: no.

Containers run in isolation from the point of view of the host system and each other, but they are not built with security in mind. We will discuss how to mitigate this issue in more detail when we describe how to write image definitions using Dockerfiles in a later primer.

⬆ Table of Contents

Summary

These are the main points presented in this primer:

Images are just another format for packaging applications.
- They contain all the software dependencies needed to run the application.
- They are stored and shared from registries and repositories.
Containers are application runtime processes created from images.
- They run in isolation from each other and the host system.
- They provide consistency across all deployment environments.
- They are easily scalable.
- They do not run in a secure sandbox.

⬆ Table of Contents

Next Steps

Part II of this series will cover the Podman and Docker utilities for building and managing your images and containers.

Part III of this series will cover writing Dockerfiles for building your own images.

⬆ Table of Contents

Further Reading

Comments