Writing and Using Dockerfiles

The App Devs Primer on Images and Containers, Part III

Apr 2, 2024 • Evan "Hippy" Slatis

Overview

Dockerfiles are scripts that define your images and how they are built. They can define a base image, user ID, persistent volume, what software to install in the image, and more.

🛈	While the OCI specification actually denotes Containerfiles to be the default, Docker isn’t fully OCI compliant as of the writing of this primer. Because most publicly available projects still use Dockerfiles, for convenience and historical purposes these primers will only refer to Dockerfiles.

🛈	Because we feel that Podman is the better tool technologically, this primer will only reference Podman; however, because Docker and Podman are conceptually and semantically equivalent, wherever podman is mentioned in this document or the code, docker can be safely substituted with no change in meaning or outcome.

Prerequisites

This primer assumes the reader is familiar with the basic concepts behind OCI images and containers and Podman/Docker CLI commands. If not, we suggest reading Part I of this series to cover the basic concepts behind OCI images and containers and/or Part II that covers the Podman/Docker CLI.

The lab at the end utilizes a simple Spring Boot application for testing. Since the purpose of the lab is to learn about Dockerfiles and not Java, the lab comes pre-built and installing Java is not necessary.

General format and information

Dockerfiles have the following format:

# This is a comment
<INSTRUCTION> [arguments]

Leading whitespace is ignored, but whitespace in instruction arguments is not. By convention, Dockerfile instructions are written in ALL CAPS, but the language is case insensitive.

File naming convention

By convention, Dockerfiles are named "Dockerfile" without an extension.

podman build .

The build command above will automatically look for a file named "Dockerfile" in the current directory. To deviate from convention, Podman has the flag -f or --file, which can take a path or a URL as an argument; e.g.

podman build -f Dockerfile.my_ext

JSON vs freeform text

Instructions have two general forms:

JSON array

Executable INSTRUCTIONs refer to this as "exec" style.
```
<INSTRUCTION> <OPTIONS> ["value1", ..., "valueN"]
```
⚠	You must use double-quotes for values.

Free form text

Executable INSTRUCTIONs refer to this as "shell" style.
```
<INSTRUCTION> <OPTIONS> value1 ... valueN
```
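
The practical difference between the two styles is whether a shell processes the arguments. As a rough illustration outside of any container, the plain shell sketch below compares passing an argument literally with handing a command line to sh -c. GREETING is a hypothetical variable used only here and is not part of this primer.

```shell
# GREETING is a hypothetical variable used only for this illustration.
export GREETING="hello"

# "exec" style analogy: the program receives its arguments exactly as written,
# so the text ${GREETING} is NOT expanded.
exec_style=$(printf '%s\n' '${GREETING}')

# "shell" style analogy: the command line is handed to `sh -c`,
# which expands ${GREETING} before running the command.
shell_style=$(sh -c 'printf "%s\n" "${GREETING}"')

echo "exec style:  ${exec_style}"
echo "shell style: ${shell_style}"
```

This is why variable references inside a JSON array are not expanded by a shell when the container starts.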

Squashing image layers

Every instruction in a Dockerfile adds a layer to an image, although only those that actually modify the image filesystem increase the image size. Layers are cached by Podman for faster builds and downloads.

Podman has the --squash flag when building images to compress all new layers into a single layer, and it’s turned on by default. It is still considered good practice to use as few instructions in your Dockerfile as possible. For example, consider RUN statements that upgrade everything in the base image, install wget, and then clean the dnf cache to reduce the final image size:

# bad practice, creates three new layers
RUN dnf upgrade -y --no-docs --refresh
RUN dnf install -y --no-docs wget
RUN dnf clean all

# good practice, creates a single layer
RUN dnf upgrade -y --no-docs --refresh && \
    dnf install -y --no-docs wget && \
    dnf clean all

docker build --squash will also work, but it’s considered experimental as of this writing, which means the Docker daemon will need to be started with the --experimental flag to use it. Docker’s documentation also offers a short discussion on some of the limitations of squashing an image build, but since the pros almost always outweigh the cons, it’s best to leave the option on.

To compress everything including the base image layers into a single layer Podman also offers a --squash-all flag, and the resulting image of a build will be a single layer. This functionality is not currently available with Docker.
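
The && operator in the combined RUN statement shown earlier does more than save layers: the chain stops at the first command that fails, so a failed upgrade is never followed by an install. A minimal sketch of that behavior in plain shell, using a hypothetical step function in place of the real dnf commands:

```shell
# step is a hypothetical stand-in for a dnf command; it logs its name
# and fails when asked to run the "install" step.
step() {
    echo "running: $1"
    [ "$1" != "install" ]
}

# With &&, "clean" is never reached because "install" fails first.
if step upgrade && step install && step clean; then
    chain_result="succeeded"
else
    chain_result="failed"
fi
echo "chain ${chain_result}"
```

In a real build, the failing command would stop the RUN instruction and therefore the build.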

Heredoc

More recently, both Podman and Docker have begun to support heredoc in Dockerfiles. Revisiting the RUN example above:

RUN <<EOF
  set -e # exit immediately if commands below fail
  dnf upgrade -y --no-docs --refresh
  dnf install -y --no-docs wget
  dnf clean all
EOF

COPY and ADD commands also support heredoc. Most Dockerfiles you’ll review on the internet will look like the previous example and not use heredoc, because heredoc was introduced only a few years ago for Docker, and only late in 2023 for Podman.

💡	When possible, use heredoc as a cleaner and more readable option to reduce image size.

☠	There were reports that the layer cache wasn’t being invalidated when the heredoc was changed. It should be fixed in the latest releases, but in case you run into this issue, just change the heredoc name or use the older notation.
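
The set -e line in the heredoc example matters because, unlike a chain of && operators, commands on separate lines keep running after a failure unless the shell is told otherwise. The following sketch runs a heredoc through sh outside of any container build and shows the script stopping at the first failing command. Here false stands in for a failing command such as a broken package install.

```shell
# Feed a heredoc to sh, much as a heredoc RUN instruction hands its body to a shell.
heredoc_status=0
heredoc_output=$(sh <<'EOF'
set -e # exit immediately if commands below fail
echo "step 1"
false
echo "step 2"
EOF
) || heredoc_status=$?

# Only "step 1" was printed, and the script reported a non-zero exit status.
echo "${heredoc_output}"
echo "exit status: ${heredoc_status}"
```

Removing the set -e line from the heredoc lets "step 2" run and the script exit with status 0, which in a build would hide the failure.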

Common Dockerfile Instructions

Around half of the instructions available in a Dockerfile are summarized below. They are the instructions that are most commonly used, and the ones we’ll use in the lab below. They are presented roughly in the order they’d be used in a typical Dockerfile rather than alphabetically, because the purpose of this primer is to teach how to write and use Dockerfiles, not to be a reference document.

💡	For convenience, each header below links to its instruction’s official Dockerfile reference documentation.

FROM

FROM <image>[:<tag>]

Defines the base image of the build. Usually the first line of every Dockerfile. If no tag is given, latest is assumed. FROM can be parameterized with an ARG statement, but otherwise FROM must be the first statement in a Dockerfile.

# the following two statements are equivalent
FROM busybox
FROM busybox:latest

# Use the alpine-based Python image
FROM python:alpine

# A parameterized FROM statement
ARG VERSION_PARAMETER
FROM python:${VERSION_PARAMETER}

ARG

ARG <name>[=<default value>]

Defines a build variable. Variables can be assigned a default value or left empty to act as parameters to be passed through the command line when starting a build. These key/value pairs are only available during the build, and will not be part of the image when the build is completed. An argument's scope begins at the line on which it is declared.

☠	Arguments must be declared in the Dockerfile. References to arguments declared on the command line but not in the Dockerfile will be empty.

# defines a build parameter optionally set on the command line
# that can be used when the build is executing
ARG MY_COMMAND_LINE_PARAMETER

# defines a variable value that can be used when the build is executing
# defining this variable on the command line will override it
ARG MY_DOCKERFILE_VARIABLE=someValue

Referencing an argument in a Dockerfile has the same form as in a shell, i.e. ${<arg name>}. If you need to pass an argument to a running container, assign it to an environment variable using an ENV statement.

# Referencing the ARG later in Dockerfile and passing it to the container
ENV MY_FORMER_COMMAND_LINE_PARAMETER=${MY_COMMAND_LINE_PARAMETER}
RUN echo ${MY_COMMAND_LINE_PARAMETER}

Using the CLI to pass in argument values

Use the --build-arg or --build-arg-file options to pass arguments to the Dockerfile:

# Assign N number of arguments on the command line
podman build --build-arg MY_ARG1=my_value1 ... --build-arg MY_ARGn=my_valueN .

Or using a properties/env file of arg=value lines:

# read in a list of arguments and their values from a file
podman build --build-arg-file argfile.conf .
podman build --build-arg-file argfile.env .
podman build --build-arg-file argfile.properties .
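
As a minimal sketch of the file-based approach, the snippet below writes a small argument file and prints the matching build command rather than running it. The file name argfile.conf comes from the example above; the argument names and values are hypothetical.

```shell
# Work in a scratch directory so nothing in the current project is touched.
arg_dir=$(mktemp -d)

# One arg=value pair per line; MY_ARG1 and MY_ARG2 are hypothetical names.
cat > "${arg_dir}/argfile.conf" <<'EOF'
MY_ARG1=my_value1
MY_ARG2=my_value2
EOF

arg_count=$(grep -c '=' "${arg_dir}/argfile.conf")
echo "wrote ${arg_count} arguments to ${arg_dir}/argfile.conf"

# The build itself would then be started with (printed here, not executed):
echo "podman build --build-arg-file ${arg_dir}/argfile.conf ."
```

Each name in the file still has to be declared with an ARG statement in the Dockerfile, otherwise references to it will be empty.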

CAUTION

⚠	Do NOT use ARGs to pass in build secrets.

The values can be exposed through podman history. If accessing build secrets is required, more information is available here.

⚠	Always declare your ARG statements BEFORE RUN statements.

Declare ARGs towards the top of the file and before any RUN statements. ARG statements are not cached in the image or intermediate layers; e.g. if a build parameter is changed on the command line, podman build will not notice a change to a cached image layer until the ARG is realized in an instruction that is persisted to the image. Because RUN statements intrinsically realize all ARG values as part of their environment variables and affect the final image, any change to an ARG declared before a RUN statement will invalidate the layer cache for that RUN statement.

ENV

ENV <name>=<value> ...

# legacy form: the equals operator is optional when setting a single variable
ENV <name> <value>

Defines an environment variable and value.

# defines the locale of the host in a Linux-based container
ENV LANG=en_US.UTF-8

# define multiple key/value pairs; escape spaces in values with backslashes
ENV FOO=BAR BAZ=value\ with\ spaces

ENV values are available both during the build and in the running container.

ENV MY_VAR=my_value
# print MY_VAR during the build
RUN echo ${MY_VAR}
# print MY_VAR when the container is run; the shell form is used because
# the exec (JSON array) form does not start a shell to expand ${MY_VAR}
ENTRYPOINT echo ${MY_VAR}

Environment variables ALWAYS override argument values of the same name.

# The output will be "baz"
ARG FOO=bar
ENV FOO=baz
ARG FOO=something_else
RUN echo ${FOO}

Environment variables also support a limited number of bash parameter expansion modifiers.

# if MY_ENV_VARIABLE is not defined, use the value some_default_value
${MY_ENV_VARIABLE:-some_default_value}
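
The modifier above behaves as it does in a POSIX shell, so it can be tried out directly in a terminal. The sketch below also shows the :+ modifier, which the Dockerfile reference lists alongside :- as a supported form. MY_ENV_VARIABLE is the placeholder name from the example, and the values are hypothetical.

```shell
unset MY_ENV_VARIABLE

# :- substitutes the default only when the variable is unset or empty
unset_result="${MY_ENV_VARIABLE:-some_default_value}"

MY_ENV_VARIABLE="explicit_value"
set_result="${MY_ENV_VARIABLE:-some_default_value}"

# :+ is the inverse: the alternate text is used only when the variable IS set
alt_result="${MY_ENV_VARIABLE:+alternate_value}"

echo "unset: ${unset_result}"
echo "set:   ${set_result}"
echo "alt:   ${alt_result}"
```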

USER

USER <username>

Set the current USER to use in the image for subsequent Dockerfile instructions. The last USER instruction in the Dockerfile will be the UID used to launch a container based on the image.

# root user
USER root

# user with UID 1001: best practices
USER 1001

Root is typically declared towards the top of the Dockerfile to carry out tasks such as installing software or changing file permissions for the image.

Container security

It goes without saying that good security practices should always be followed when writing code, and Dockerfiles are no exception. Applications running in containers may look and feel like they’re running in isolation, but this is not guaranteed.

⚠	The OCI runtime container environment is isolated, but is NOT secure. Unless otherwise needed, best practice is to declare an unprivileged USER UID before defining an ENTRYPOINT; e.g. `USER 1001`.

COPY and ADD

ADD  <src> ... <dest>
ADD  ["<src>", ... "<dest>"]

COPY  <src> ... <dest>
COPY  ["<src>", ... "<dest>"]

The instructions are defined as follows:

COPY - copy a file or directory into an image.
ADD - copy a local or remote file or directory into an image.

Each takes a list of paths, copying the first n-1 files or directories to the last listed file or directory. Every local <src> must be within the build context.

# COPY or ADD appConfig.yaml from the target directory to /app/config.yaml in the image
ADD ["target/appConfig.yaml", "/app/config.yaml"]
COPY target/appConfig.yaml /app/config.yaml

# COPY or ADD the file file1.txt and directory dir1 to the app directory in the image;
# with more than one <src>, <dest> must be a directory and end with a slash
ADD file1.txt dir1 /app/
COPY ["file1.txt", "dir1", "/app/"]
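
One detail worth knowing for the second example: according to the Dockerfile reference, when a <src> is a directory, its contents are copied rather than the directory itself. A rough shell analogy using cp in a scratch directory shows the layout to expect in the app directory. The file nested.txt is hypothetical, and cp only imitates the behavior, not the build tool's implementation.

```shell
# Build a throwaway "build context" containing file1.txt and dir1/.
ctx=$(mktemp -d)
mkdir -p "${ctx}/dir1" "${ctx}/app"
echo "one" > "${ctx}/file1.txt"
echo "two" > "${ctx}/dir1/nested.txt"   # hypothetical file inside dir1

# Analogy for `COPY file1.txt dir1 /app/`:
cp "${ctx}/file1.txt" "${ctx}/app/"      # the file is copied as-is
cp -R "${ctx}/dir1/." "${ctx}/app/"      # only the CONTENTS of dir1 are copied

# List the result: both files sit directly in app, with no dir1 subdirectory.
ls "${ctx}/app"
```

To keep the directory itself, name it in the destination, e.g. a destination of /app/dir1/.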

Differences between COPY and ADD

COPY can only copy local files into images, but ADD has some additional capabilities:

ADD can copy remote files from URLs
ADD can pull Git repositories
ADD will automatically unpack local compressed tar files

Using ADD can have unpredictable side effects. RUN statements can use git clone, curl, or wget for fetching remote files, and tar, unzip, bzip2, or gzip for decompression.

☠	Use a combination of COPY and RUN and avoid using ADD.
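
To make the recommendation concrete, the explicit unpacking step that replaces ADD's automatic behavior is an ordinary tar command. The sketch below exercises it in a scratch directory; the archive and file names are hypothetical. In a Dockerfile, the archive would first be brought in with COPY and then extracted in a RUN statement.

```shell
work=$(mktemp -d)
mkdir -p "${work}/src" "${work}/dest"
echo "payload" > "${work}/src/data.txt"

# Create a compressed archive (a stand-in for one that COPY placed in the image).
tar -czf "${work}/bundle.tar.gz" -C "${work}/src" data.txt

# Explicit extraction: what happens, and where, is visible in the command itself.
tar -xzf "${work}/bundle.tar.gz" -C "${work}/dest"

extracted=$(cat "${work}/dest/data.txt")
echo "extracted: ${extracted}"
```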

RUN

RUN <command> ...
RUN ["<command>", ...]

RUN defines commands that are run during the build. It can be used to install software, create users, set file permissions, etc.

# Install wget on an RPM-based Linux image
RUN dnf install -y --no-docs wget

Compare this with ENTRYPOINT and CMD, which define commands and arguments that only execute when the container is launched.

EXPOSE

EXPOSE is functionally a no-op, but it’s widely used to document which ports, if any, are expected to be used by the container. It can document whether they are UDP or TCP, with TCP being the default if no protocol is specified.

# TCP ports
EXPOSE 8080
EXPOSE 8080/tcp

# UDP port
EXPOSE 8080/udp

WORKDIR

WORKDIR <path>

Set the container’s working directory. Like other instructions, WORKDIR can be used multiple times in a Dockerfile. The initial WORKDIR is whatever was the last value defined in the base image. If no WORKDIR has ever been set, it’s the root directory, /. Relative paths in a Dockerfile are relative to the current working directory.

WORKDIR /app
WORKDIR resources
RUN pwd

Output of the above will be /app/resources.
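
WORKDIR resolves relative paths much as cd does in a shell. The sketch below replays the example in a scratch directory instead of the image root. The scratch directory and the workdir_path variable are only for illustration.

```shell
# Unlike cd, WORKDIR creates a missing directory, so mkdir is only
# needed for this shell analogy.
root=$(mktemp -d)
mkdir -p "${root}/app/resources"

# WORKDIR /app       -> absolute path (here rooted in the scratch directory)
cd "${root}/app"
# WORKDIR resources  -> relative to the current working directory
cd resources

# Drop the scratch prefix so the result reads like a path inside the image.
current=$(pwd -P)
workdir_path="/app${current##*/app}"
echo "${workdir_path}"
```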

ENTRYPOINT and CMD

"exec" form:

CMD ["<command>", "<param1>", ...]
CMD ["<param1>", ...]

ENTRYPOINT ["<command>", "<param1>", ...]

"shell" form:

CMD <command> <param1> ...
CMD <param1> ...

ENTRYPOINT <command> <param1> ...

ENTRYPOINT and CMD define the commands and parameters that run and are used when the container is launched.

These instructions are defined as follows:

ENTRYPOINT - defines the executable to start when your container launches. If no ENTRYPOINT is defined, the CMD is used as the command to run instead.

The "shell" form of ENTRYPOINT will always launch a shell (/bin/sh -c) before acting on the arguments. The "exec" (JSON array) form must be used if a shell isn’t wanted or needed.
CMD - The list of arguments passed to the ENTRYPOINT. Can easily be overridden on the command line.

Only defining CMD will work, because without an ENTRYPOINT the CMD itself is executed. Note that the "shell" form of CMD is also wrapped in /bin/sh -c, which creates unnecessary overhead with an extra sh process.

# runs "echo -n howdy" in the container, overriding
# the CMD instruction if it was defined
podman run busybox echo -n howdy

# overrides the ENTRYPOINT with the `echo` command and passes `-n` and `howdy` as its arguments
podman run --entrypoint "echo" busybox -n howdy

Both CLI examples above have the same result for an image such as busybox that defines no ENTRYPOINT. Neither launches an extra shell, because the arguments are passed directly to the executable.

💡	Use ENTRYPOINT to define an executable and arguments that a container should always launch with, and use CMD for arguments that are expected to be commonly overridden by any users of the image.
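
One way to picture how the two instructions combine is as simple list concatenation: the ENTRYPOINT comes first, followed by either the CMD or, when present, the arguments given to podman run. The function below is only a mental model written in shell, not Podman's implementation, and compose_command is a hypothetical name. It assumes an exec-form ENTRYPOINT of echo -n and a CMD of hello.

```shell
# compose_command prints the command a container would start with.
# Arguments: any extra arguments passed on the `podman run` command line.
compose_command() {
    entrypoint="echo -n"   # assumed ENTRYPOINT ["echo", "-n"]
    default_cmd="hello"    # assumed CMD ["hello"]
    if [ "$#" -gt 0 ]; then
        # command-line arguments replace CMD but keep the ENTRYPOINT
        echo "${entrypoint} $*"
    else
        echo "${entrypoint} ${default_cmd}"
    fi
}

default_run=$(compose_command)          # like: podman run <image>
override_run=$(compose_command howdy)   # like: podman run <image> howdy

echo "default:  ${default_run}"
echo "override: ${override_run}"
```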

💡	For complex ENTRYPOINTs, consider creating an executable shell script and copying it into the image. Define an ENTRYPOINT that calls the shell script; e.g. `ENTRYPOINT my-complex-script.sh` or `ENTRYPOINT ["sh", "-c", "my-complex-script.sh"]`.
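
A common shape for such a script is to do its setup and then hand control to the real command with exec, so the application replaces the shell rather than running beneath it. The sketch below writes a hypothetical my-complex-script.sh to a scratch directory and runs it locally. The APP_MODE variable and its default are invented for illustration.

```shell
unset APP_MODE   # start from a clean environment for the demonstration
script_dir=$(mktemp -d)

cat > "${script_dir}/my-complex-script.sh" <<'EOF'
#!/bin/sh
set -e
# Setup work goes here; APP_MODE is a hypothetical setting with a default.
export APP_MODE="${APP_MODE:-production}"
# Replace this shell with the command passed as arguments (the CMD).
exec "$@"
EOF
chmod +x "${script_dir}/my-complex-script.sh"   # needed when used as an ENTRYPOINT

# Locally, the trailing arguments play the role of CMD.
script_output=$(sh "${script_dir}/my-complex-script.sh" sh -c 'echo "mode=${APP_MODE}"')
echo "${script_output}"
```

In an image, a script like this would typically be named in an exec-form ENTRYPOINT, with CMD holding the default command it should exec.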

Compare this with RUN, which defines commands that only execute during the build.

Practical Lab (~20 mins)

The following is a brief lab designed as an introduction to writing and building images with Dockerfiles, and running images with exposed ports.

💡	All text with a dollar sign or hash prompt and gray background is meant to be typed by the user in a Bash-compatible *nix shell. Just click through and launch the terminal. Windows users can use the Windows Subsystem for Linux: run a full update at the Bash terminal and install Podman if it isn’t already installed. Alternatively, Windows users can use a Windows terminal if Docker or Podman is installed; the labs are not tested or designed for use in Windows terminals, but they should work.

Local terminal prompt: $ some_command
Container terminal prompt: / # some_command

The results of the command will have a similar gray background format and be prefaced with the label "Output:"; e.g.

Output:

some_command output

Clone the Git repository from GitHub:
```
$ git clone https://github.com/hippyod-labs/container-lab
```
The downloaded application is a simple Spring Boot application.
Create the Dockerfile in your preferred editor or Vim.
```
$ cd container-lab
$ vim Dockerfile
```
The Dockerfile:
```
# (a)
FROM docker.io/eclipse-temurin:17-jre-ubi9-minimal

# (b)
ARG SPRING_PROFILES_ACTIVE=prod
ENV SPRING_PROFILES_ACTIVE=${SPRING_PROFILES_ACTIVE}

# (c)
USER root

# (d)
RUN <<EOF
    set -ex
    microdnf upgrade -y --nodocs --refresh
    microdnf clean all
    mkdir -p /mnt/logs
    chmod 777 /mnt/logs
EOF

# (e)
ENV JAVA_APP_DIR=/app
COPY target/*.jar ${JAVA_APP_DIR}/app.jar

# (f)
EXPOSE 8081

# (g)
USER 1001

# (h)
WORKDIR ${JAVA_APP_DIR}

# (i)
ENTRYPOINT ["java", "-jar", "app.jar"]
```
Features to pay attention to:
1. The FROM statement defines the base image.
2. Declare the build parameter SPRING_PROFILES_ACTIVE with a default value and pass it to the container environment. Declare it before the RUN statement to make sure changes to the value from the command line are acknowledged on subsequent builds.
3. The initial USER during image building is root so changes to the image can be easily made.
4. A heredoc RUN instruction upgrades all installed software in the image to the latest available, cleans the cache downloaded during the upgrade to reduce the final image size, and creates a logs directory, /mnt/logs with read-write permissions for everyone.
  
  💡
  It’s a good idea to upgrade all software in the base image and rebuild regularly to patch any vulnerabilities.
  
  💡
  Remove any intermediate files created by a RUN instruction to reduce image size.
5. The result of a Maven build in the project is a JAR file, and the Dockerfile will COPY it into the /app directory in the image.
  
  💡
  Per best practices, the JAR file is NOT copied to the root directory.
6. EXPOSE doesn’t actually do anything, but it does document for consumers of the image that the container port will be a TCP port at 8081.
7. With the image filesystem fully defined and all software installed, set the USER to an unprivileged UID.
  
  ☠
  Containers are isolated, but not secure. Running with an unprivileged UID helps protect the host system.
8. Set the working directory to /app.
9. The ENTRYPOINT runs in the preferred "exec" format, which means Java runs directly at launch without a separate shell process being created.
Save the Dockerfile and return to the terminal.

Build the image:

$ podman build --tag containerlab .

🛈	If the build fails with the error `Unknown instruction: "SET"`, you’re probably running an older version of Podman that does not recognize heredoc. Please update your installation to the latest version per the Podman installation page.

Output:

STEP 1/11: FROM docker.io/eclipse-temurin:17-jre-ubi9-minimal
Trying to pull docker.io/library/eclipse-temurin:17-jre-ubi9-minimal...
Getting image source signatures
Copying blob 00038fe29d65 done   |
Copying blob 2895d6faeea8 done   |
Copying blob fac16dd16cc7 done   |
Copying blob c94b45a4f4f6 done   |
Copying blob 440448b8b996 done   |
Copying config 4f135ec10c done   |
Writing manifest to image destination
STEP 2/11: ARG SPRING_PROFILES_ACTIVE=prod
--> 37fc2072d95d

<redacted>

STEP 10/11: WORKDIR ${JAVA_APP_DIR}
--> 0bd77c959e0f
STEP 11/11: ENTRYPOINT ["java", "-jar", "app.jar"]
COMMIT containerlab
--> 9a31eac84c6c
Successfully tagged localhost/containerlab:latest
9a31eac84c6c1a08cdd216e573fe34b6db4b1dd6146b2528919f2ee6988431bf

Run a container based on the image in detached mode.
```
$ podman run --detach --publish 8081:8081 --name my_first_container containerlab
```
- --detach - run the container detached, meaning the container will run in the background.
- --publish - publish the container port (second value) to the host port (first value).
- --name - name the container. A random name will be generated by Podman if this is missing; e.g. agitated_darwin.
  
  This demonstrates how to externalize a port on the container for external consumption, and how to run a container in background on the host. More information on the above podman run options can be found here.
Output:
```
cc4b78dfd216f1291c0d584dfc11532e9f2c3b3aa2ca77eb99558fd1a1ac1c75
```
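
The value given to --publish reads as host port first, container port second. A tiny shell sketch that splits such a mapping makes the order explicit. The mapping 8082:8081 is the one used for the second container later in this lab, and the variable names are hypothetical.

```shell
mapping="8082:8081"

host_port="${mapping%%:*}"        # everything before the colon
container_port="${mapping##*:}"   # everything after the colon

echo "browser connects to localhost:${host_port}"
echo "application listens on ${container_port} inside the container"
```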

List the container to prove it's running.

$ podman ps

Output:

CONTAINER ID  IMAGE                          COMMAND     CREATED         STATUS         PORTS                   NAMES
72e09ec47078  localhost/containerlab:latest              15 seconds ago  Up 15 seconds  0.0.0.0:8081->8081/tcp  my_first_container

Open a browser, and go to URL http://localhost:8081/. The web page should be showing the logs being printed with the default greeting defined in container-lab/src/main/resources/application.properties, and will update every two seconds:
```
GREETING #1 : Hello, world
GREETING #2 : Hello, world
GREETING #3 : Hello, world
GREETING #4 : Hello, world
```

Stop and destroy the container.

$ podman stop my_first_container
$ podman rm my_first_container

Output:

my_first_container

my_first_container

Rebuild the image, but this time set the default active Spring profile to Spanish.

$ podman build --build-arg SPRING_PROFILES_ACTIVE=es --tag containerlab .

Output:

<redacted>

STEP 10/11: WORKDIR ${JAVA_APP_DIR}
--> 0a979925ee8c
STEP 11/11: ENTRYPOINT ["java", "-jar", "app.jar"]
COMMIT containerlab
--> b8f578632e24
Successfully tagged localhost/containerlab:latest
b8f578632e244318754a2a94259ad813e0e5f17e97954abd78095c59ef9e09c1

Run the container for a second time, open a browser, and go to URL http://localhost:8081/.
```
$ podman run --detach --publish 8081:8081 --name my_first_container containerlab
```
Output:
```
GREETING #1 : Hola
GREETING #2 : Hola
GREETING #3 : Hola
GREETING #4 : Hola
```
This demonstrates how image configuration can be parameterized and changed via the build.
Start a second container, but override the build-configured Spring profile, then open a browser and go to URL http://localhost:8082/.

☠
Be careful to use the different port number. We have to use a separate port since the first container is already using 8081.
```
$ podman run --detach --env SPRING_PROFILES_ACTIVE=dev --publish 8082:8081 --name my_second_container containerlab
```
Output:
```
GREETING #1 : Hello Dev
GREETING #2 : Hello Dev
GREETING #3 : Hello Dev
GREETING #4 : Hello Dev
```
This demonstrates how an image can be run with one or more different configurations without having to rebuild the image. In theory we’re looking at the development deployment versus the Spanish production deployment.
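
The precedence at work here can be imitated in a plain shell: the value stored in the image by ENV acts as the default environment, and --env replaces it for a single container. In this sketch, show_profile is a hypothetical stand-in for the application reading its environment.

```shell
# show_profile is a hypothetical stand-in for the app reading its environment.
show_profile() {
    sh -c 'echo "active profile: ${SPRING_PROFILES_ACTIVE}"'
}

# Value baked into the image at build time (ENV in the Dockerfile).
export SPRING_PROFILES_ACTIVE="es"
image_default=$(show_profile)

# Value supplied for one container only, like `podman run --env ...`.
container_override=$(env SPRING_PROFILES_ACTIVE=dev sh -c 'echo "active profile: ${SPRING_PROFILES_ACTIVE}"')

echo "${image_default}"
echo "${container_override}"
```

The baked-in value is untouched by the override, which is why the first container keeps its Spanish greeting.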
Stop both containers.
```
$ podman stop my_first_container my_second_container
```
Output:
```
my_first_container
my_second_container
```
If you look in either browser window, both pages are now blank.

Clean up the system by removing the unused containers.

$ podman rm my_first_container my_second_container

Output:

my_first_container
my_second_container

Finish cleanup of the system by removing the unused images.

$ podman rmi containerlab eclipse-temurin:17-jre-ubi9-minimal

Output:

Untagged: localhost/containerlab:latest
Untagged: docker.io/library/eclipse-temurin:17-jre-ubi9-minimal
Deleted: b8f578632e244318754a2a94259ad813e0e5f17e97954abd78095c59ef9e09c1
Deleted: 0a979925ee8c45cdd3dbbbfe804799f75450bb306beed28602b05cbdc11e703a
Deleted: fbc98b58f0860ccbfbc9b7ecf014cd23c0113041cff6526c7696513941222dda
Deleted: 8d727d8b1b001ca96f32cc61e2848f05430fcb15fdf4c778cc2b247741fb11bb
Deleted: 713e582edf7d1d8e18788f3e2d9e27e13e1007f6d7aa255b871965e58b1311ce
Deleted: 48c193f4b35eb1f003e05a2fcd9409494f1508d1d1526843afb663c4550b2c41
Deleted: 228dcf3298beb7b0caaf4e7149eafe745e7f4e0a2299d313c00d5cc1de2962c6

The above output in the final two steps confirms all lab containers and images have been removed from the local system.

This concludes the lab and the series of primers for OCI images and containers.

Further reading

Dockerfile reference
