Writing and Using Dockerfiles
The App Devs Primer on Images and Containers, Part III
Comment •Overview
Dockerfiles are scripts that define your images and how they are built. They can define a base image, user ID, persistent volume, what software to install in the image, and more.
🛈
|
While the OCI specification actually denotes Containerfiles to be the default, Docker isn’t fully OCI compliant as of the writing of this primer. Because most publicly available projects still use Dockerfiles, for convenience and historical purposes these primers will only refer to Dockerfiles. |
🛈
|
Because we feel that Podman is the better tool technologically, this primer will only reference Podman; however, because Docker and Podman are conceptually and semantically equivalent, wherever podman is mentioned in this document or the code, docker can be safely substituted with no change in meaning or outcome.
|
Prerequisites
This primer assumes the reader is familiar with the basic concepts behind OCI images and containers and Podman/Docker CLI commands. If not, we suggest reading Part I of this series to cover the basic concepts behind OCI images and containers and/or Part II that covers the Podman/Docker CLI.
The lab at the end utilizes a simple Spring Boot application for testing. Since the purpose of the lab is to learn about Dockerfiles and not Java, the lab comes pre-built and installing Java is not necessary.
General format and information
Dockerfiles have the following format:
# This is a comment <INSTRUCTION> [arguments]
Leading whitespace is ignored, but whitespace in instruction arguments is not. By convention, Dockerfile instructions are written in ALL CAPS, but the language is case insensitive.
File naming convention
By convention, Dockerfiles are named "Dockerfile" without an extension.
podman build .
The build command above will automatically look for a file named "Dockerfile" in the current directory. To deviate from convention, Podman has the flag -f
or --file
, which can take a path or a URL as an argument; e.g.
podman build -f Dockerfile.my_ext
JSON vs freeform text
Instructions have two general forms:
-
JSON array
Executable INSTRUCTIONs refer to this as "exec" style.
<INSTRUCTION> <OPTIONS> ["value1", ..., "valueN"]
⚠You must use double-quotes for values. -
Free form text
Executable INSTRUCTIONs refer to this as "shell" style.
<INSTRUCTION> <OPTIONS> value1 ... valueN
Squashing image layers
Every instruction in a Dockerfile adds a layer to an image, although only those that actually modify the image filesystem increase the image size. Layers are cached by Podman for faster builds and downloads.
Podman has the --squash
flag when building images to compress all new layers into a single layer, and it’s turned on by default, but it is still considered good practice to use as few instructions in your Dockerfile as possible; e.g. looking at an example RUN statement that will upgrade everything in the base image, install wget, and then clean the dnf
cache to reduce the final image size:
# bad practice, creates three new layers RUN dnf upgrade -y --no-docs --refresh RUN dnf install -y --no-docs wget RUN dnf clean all
# good practice, creates a single layer RUN dnf upgrade -y --no-docs --refresh && \ dnf install -y --no-docs wget && \ dnf clean all
docker build --squash
will also work, but it’s considered experimental as of this writing, which means the Docker daemon will need to be started with the --experimental
flag to use it. Docker’s documentation also offers a short discussion on some of the limitations of squashing an image build, but since the pros almost always outweigh the cons, it’s best to leave the option on.
To compress everything including the base image layers into a single layer Podman also offers a --squash-all
flag, and the resulting image of a build will be a single layer. This functionality is not currently available with Docker.
Heredoc
Most recently both Podman and Docker have begun to support heredoc in Dockerfiles. Revisiting the RUN example above:
RUN <<EOF set -e # exit immediately if commands below fail dnf upgrade -y --no-docs --refresh dnf install -y --no-docs wget dnf clean all EOF
COPY and ADD commands also support heredoc. Most Dockerfiles you’ll review on the internet will look like the previous example and not use heredoc, because heredoc was introduced only a few years ago for Docker, and only late in 2023 for Podman.
💡
|
When possible, use heredoc as a cleaner and more readable option to reduce image size. |
☠
|
There were reports that the layer cache wasn’t being invalidated when the heredoc was changed. It should be fixed in the latest releases, but in case you run into this issue, just change the heredoc name or use the older notation. |
Common Dockerfile Instructions
Around half of the available instructions available in a Dockerfile are summarized below. They are the instructions that are most commonly used, and the ones we’ll use in the lab below. They are roughly presented in the order they’d be used in a typical Dockerfile rather than alphabetically, because the purpose of this primer is for learning how to write and use Dockerfiles and not to be a reference document.
💡
|
For convenience, each header below links to its instruction’s official Dockerfile reference documentation. |
FROM
FROM <image>[:<tag>]
Defines the base image of the build. Usually the first line of every Dockerfile. If no tag is given, latest is assumed. FROM can be parameterized with an ARG statement, but otherwise FROM must be the first statement in a Dockerfile.
# the following two statements are equivalent FROM busybox FROM busybox:latest
# Use the alpine-based Python image FROM python:alpine
# A parameterized FROM statement ARG VERSION_PARAMETER FROM python:${VERSION_PARAMETER}
ARG
ARG <name>[=<default value>]
Defines a build variable. Variables can be assigned a default value or left empty to act as parameters to be passed through the command line when starting a build. These key/values pairs are only available during the build, and will not be part of the image when the build is completed. Argument scope is from the line in which it is declared.
☠
|
Arguments must be declared in the Dockerfile. References to arguments declared on the command line but not in the Dockerfile will be empty. |
# defines a build parameter optionally set on the command line # that can be used when the build is executing ARG MY_COMMAND_LINE_PARAMETER
# defines a variable value that can be used when the build is executing # defining this variable on the command line will override it ARG MY_DOCKERFILE_VARIABLE=someValue
Referencing a argument in a Dockerfile has the same form as Shell, i.e. ${<arg name>}
. If you need to pass an argument to running container, assign it to an environment variable using an ENV statement.
# Referencing the ARG later in Dockerfile and passing it to the container ENV MY_FORMER_COMMAND_LINE_PARAMETER=${MY_COMMAND_LINE_PARAMETER} <<RUN>> echo ${MY_COMMAND_LINE_PARAMETER}
Using the CLI to pass in argument values
Use the --build-arg
or --build-arg-file
options to pass arguments to the Dockerfile:
# Assign N number of arguments on the command line podman --build-arg MY_ARG1=my_value1 ... --build-arg MY_ARGn=my_valueN
Or using a properties/env file of arg=value lines:
# read in a list of arguments and their values from a file podman --build-arg-file argfile.conf podman --build-arg-file argfile.env podman --build-arg-file argfile.properties
CAUTION
⚠
|
Do NOT use ARGs to pass in build secrets. |
The values can be exposed through podman history
. If accessing build secrets is required, more information is available here.
⚠
|
Always declare your ARG statements BEFORE RUN statements. |
Declare ARGs towards the top of the file and before any RUN statements. ARG statements are not cached in the image or intermediate layers; e.g. if a build parameter is changed on the command line podman build
will not notice a change to a cached image layer until the ARG is realized in an instruction that is persisted to the image. Because RUN statements intrinsically realize all ARG values as part of their environment variables and affect the final image, and changes to ARGs declared before the RUN statement will be invalidate the layer cache.
ENV
ENV <name>=<value> ...
# the equals operator is optional ENV <name> <value> ...
Defines an environmental variable and value.
# defines the locale of the host in a Linux-based container ENV LANG=en_US.UTF-8
# define mutliple key/value pairs; use backslashes if values contain spaces ENV FOO=BAR BAZ=value\ with\ spaces
ENV values are available both during the build and in the running container.
ENV MY_VAR=my_value # print MY_VAR during the build RUN echo ${MY_VAR} #print MY_VAR when the container is run ENTRYPOINT ["echo", "${MY_VAR}"]
Environment variables ALWAYS override argument values of the same name.
# The output will be "baz" ARG FOO=bar ENV FOO=baz ARG FOO=something_else RUN echo ${FOO}
Environment variables also support a limited number of bash parameter expansion modifiers.
# if MY_ENV_VARIABLE is not defined, use the value some_default_value ${MY_ENV_VARIABLE:-some_default_value}
USER
USER <username>
Set the current USER to use in the image for subsequent Dockerfile instructions. The last USER instruction in the Dockerfile will be the UID used to launch a container based on the image.
# root user USER root
# user with UID 1001: best practices USER 1001
Root is typically declared towards the top of the Dockerfile to carry out tasks such as installing software or changing file permissions for the image.
Container security
It goes without saying that good security practices should always be followed when writing code, and Dockerfiles are no exception. Applications running in containers may look and feel like they’re running in isolation, but this is not guaranteed.
⚠
|
The OCI runtime container environment is isolated, but is NOT secure. Unless otherwise needed, best practice is to declare a unprivileged USER UID before defining a ENTRYPOINT; e.g. USER 1001 .
|
COPY and ADD
ADD <src> ... <dest> ADD ["<src>", ... "<dest>"]
COPY <src> ... <dest> COPY ["<src>", ... "<dest>"]
The instructions are defined as follows:
They each takes a list of paths, moving a list of n-1 files or directories to the last listed file or directory. <src>'s must be within the build context.
# COPY or ADD appConfig.yaml from the target directory to /app/config.yaml in the image ADD ["target/appConfig.yaml", "/app/config.yaml"] COPY target/appConfig.yaml /app/config.yaml
# COPY or ADD the file file1.txt and directory dir1 to the app directory in the image ADD file1.txt dir1 /app COPY ["file1.txt", "dir1", "/app"]
Differences between COPY and ADD
COPY can only copy local files into images, but ADD as some additional capabilities:
-
ADD can copy remote files from URLS
-
ADD can pull Git repositories
-
ADD will automatically unpack compressed tar file
Using ADD can have unpredictable side effects. RUN statements can use git pull
, curl
, or wget
for fetching remote files, and zip
, bzip2
, or gzip
for decompression.
☠
|
Use a combination of COPY and RUN and avoid using ADD. |
RUN
RUN <command> ... RUN ["<command", ...]
Run defines commands that are run during the build. It can be used to install software, create users, set file permissions, etc.
# Install wget on an RPM-based Linux image RUN dnf install -y --no-docs wget
Compare this with ENTRYPOINT and CMD, which defines commands and arguments that only execute when the container is launched.
EXPOSE
EXPOSE is functionally a no-op, but it’s widely used to document which ports, if any, are expected to be used the by the container. It can document whether they are UDP or TCP, with TCP being the default if not used.
# TCP ports EXPOSE 8080 EXPOSE 8080/tcp
# UDP port EXPOSE 8080/udp
WORKDIR
WORKDIR <path>
Set the container’s working directory. Like other instructions, WORKDIR can be used multiple times in a Dockerfile. The initial WORKDIR is whatever was the last value defined in the base image. If no WORKDIR has ever been set, it’s the root directory, /
. Relative paths in a Dockerfile are relative to the current working directory.
WORKDIR /app WORKDIR resources RUN pwd
Output of the above with be /app/resources
.
ENTRYPOINT and CMD
-
"exec" form:
CMD ["<command>", "<param1>", ...] CMD ["<param1>, ..."]
ENTRYPOINT ["<command>", "param1", ...]
-
"shell" form:
CMD <command> <param1> ... CMD <param1> ...
ENTRYPOINT <command>, param1 ...
ENTRYPOINT and CMD define the commands and parameters that run and are used when the container is launched.
These instructions are defined as follows:
-
ENTRYPOINT - defines the executable to start when your container launches. Defaults to
/bin/sh -c
.The "shell" form of ENTRYPOINT will always launch a shell before acting on the arguments. The "exec" (JSON array) form must be used if a shell isn’t wanted or needed.
-
CMD - The list of arguments passed to the
ENTRYPOINT
. Can easily be overridden on the command line.
Since ENTRYPOINT defaults to launching a shell and taking a script as an argument, only defining CMD will work, but it creates unnecessary overhead with an extra sh
process.
# passes "echo", "-n", and "howdy" as arguments to the ENTRYPOINT, overriding # the CMD instruction if it was defined podman run busybox echo -n howdy
# overrides the ENTRYPOINT with the `echo` command and passes `-n` and `howdy` to the CMD instruction podman run --entrypoint "echo" busybox -n howdy
While both CLI examples above have the same result, the first one will launch an extra, extraneous shell.
💡
|
Use ENTRYPOINT to define an executable and arguments that a container should always launch with, and use CMD for arguments that are expected to be commonly overridden by any users of the image. |
💡
|
For complex ENTRYPOINTs, consider creating an executable shell script and copying it into the image. Define an ENTRYPOINT that calls the shell script; e.g. ENTRYPOINT my-complex-script.sh or ENTRYPOINT["sh", "-c", "my-complex-script.sh"].
|
Compare this with RUN, which defines commands that only execute during the build.
Practical Lab (~20 mins)
The following is a brief lab designed as an introduction to writing and building images with Dockerfiles, and running images with exposed ports.
💡
|
All text with a dollar sign or hash prompt and gray background is meant to be typed by the user in a BASH compatible *nix shell. Just click through and launch the terminal. Windows users running can use the Window Subsystem for Linux, run a full update at the Bash terminal and install Podman if it isn’t already installed. Alternative, Windows users can use a Windows terminal if Docker or Podman is installed, and while the labs are not tested or designed for use in Windows terminals, they should work.
Local terminal prompt: $some_command #
Container terminal prompt: / # some_command
The results of the command will have a similar gray background format and be prefaced with the label "Output:"; e.g. Output: some_command output
|
-
Clone the Git repository from GitHub:
$ git clone https://github.com/hippyod-labs/container-lab
The downloaded application is a simple Spring Boot application.
-
Create the Dockerfile in your preferred editor or Vim.
$ cd container-lab $ vim Dockerfile
The Dockerfile:
# (a) FROM docker.io/eclipse-temurin:17-jre-ubi9-minimal # (b) ARG SPRING_PROFILES_ACTIVE=prod ENV SPRING_PROFILES_ACTIVE=${SPRING_PROFILES_ACTIVE} # (c) USER root # (d) RUN <<EOF set -ex microdnf upgrade -y --nodocs --refresh microdnf clean all mkdir -p /mnt/logs chmod 777 /mnt/logs EOF # (e) ENV JAVA_APP_DIR=/app COPY target/*.jar ${JAVA_APP_DIR}/app.jar # (f) EXPOSE 8081 # (g) USER 1001 # (h) WORKDIR ${JAVA_APP_DIR} # (i) ENTRYPOINT ["java", "-jar", "app.jar"]
Features to pay attention to:
-
The
FROM
statement defines the base image. -
Declare the build parameter
SPRING_PROFILES_ACTIVE
with a default value and pass it to the container environment. Declare it before the RUN statement to make sure changes to the value from the command line are acknowledged on subsequent builds. -
The initial
USER
during image building isroot
so changes to the image can be easily made. -
A heredoc
RUN
instruction upgrades all installed software in the image to the latest available, cleans the cache downloaded during the upgrade to reduce the final image size, and creates a logs directory,/mnt/logs
with read-write permissions for everyone.💡It’s a good idea to upgrade all software in the base image and rebuild regularly to patch any vulnerabilities. 💡Remove any intermediate files created by a RUN instruction to reduce image size. -
The result of a Maven build in the project is a JAR file, and the Dockerfile will COPY it into the
/app
directory in the image.💡Per best practices, the JAR file is NOT copied to the root directory. -
EXPOSE doesn’t actually do anything, but it does document for consumers of the image that the container port will be a TCP port at 8081.
-
With the image filesystem fully defined and all software installed, set the
USER
to an unprivileged UID.☠Containers are isolated, but not secure. Running with an unprivileged UID helps protect the host system. -
Set the working directory to
/app
. -
The ENTRYPOINT runs in the preferred "exec" format, which means Java runs directly at launch without a separate shell process being created.
Save the Dockerfile and return to the terminal.
-
-
Build the image:
$ podman build --tag containerlab .
🛈If the build fails complaining about the Unknown instruction: "SET"
, it means you’re probably running on an older version of Podman that does not recognize heredoc. Please update your installed version to the the latest version per the Podman installation page.Output:
STEP 1/11: FROM docker.io/eclipse-temurin:17-jre-ubi9-minimal Trying to pull docker.io/library/eclipse-temurin:17-jre-ubi9-minimal... Getting image source signatures Copying blob 00038fe29d65 done | Copying blob 2895d6faeea8 done | Copying blob fac16dd16cc7 done | Copying blob c94b45a4f4f6 done | Copying blob 440448b8b996 done | Copying config 4f135ec10c done | Writing manifest to image destination STEP 2/11: ARG SPRING_PROFILES_ACTIVE=prod --> 37fc2072d95d <redacted> STEP 10/11: WORKDIR ${JAVA_APP_DIR} --> 0bd77c959e0f STEP 11/11: ENTRYPOINT ["java", "-jar", "app.jar"] COMMIT containerlab --> 9a31eac84c6c Successfully tagged localhost/containerlab:latest 9a31eac84c6c1a08cdd216e573fe34b6db4b1dd6146b2528919f2ee6988431bf
-
Run a container based on the image in detached mode.
$ podman run --detach --publish 8081:8081 --name my_first_container containerlab
-
--detach
- run the container detached, meaning the container will run in the background. -
--publish
- publish the container port (second value) to the host port -
--name
- name the container. A random name will be generated by Podman if this is missing; e.g.agitated_darwin
.This demonstrates how to externalize a port on the container for external consumption, and how to run a container in background on the host. More information on the above
podman run
options can be found here.
Output:
cc4b78dfd216f1291c0d584dfc11532e9f2c3b3aa2ca77eb99558fd1a1ac1c75
-
-
List the container to prove its running.
$ podman ps
Output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 72e09ec47078 localhost/containerlab:latest 15 seconds ago Up 15 seconds 0.0.0.0:8081->8081/tcp my_first_container
-
Open a browser, and go to URL
http://localhost:8081/
. The web page should be showing the logs being printed with the default greeting defined incontainer-lab/src/main/resources/application.properties
, and will update every two seconds:GREETING #1 : Hello, world GREETING #2 : Hello, world GREETING #3 : Hello, world GREETING #4 : Hello, world
-
Stop and destroy the container.
$ podman stop my_first_container $ podman rm my_first_container
Output:
my_first_container
my_first_container
-
Rebuild the image, but this time set the default active Spring profile to Spanish.
$ podman build --build-arg SPRING_PROFILES_ACTIVE=es --tag containerlab .
Output:
<redacted> STEP 10/11: WORKDIR ${JAVA_APP_DIR} --> 0a979925ee8c STEP 11/11: ENTRYPOINT ["java", "-jar", "app.jar"] COMMIT containerlab --> b8f578632e24 Successfully tagged localhost/containerlab:latest b8f578632e244318754a2a94259ad813e0e5f17e97954abd78095c59ef9e09c1
-
Run the container for a second time, open a browser, and go to URL
http://localhost:8081/
.$ podman run --detach --publish 8081:8081 --name my_first_container containerlab
Output:
GREETING #1 : Hola GREETING #2 : Hola GREETING #3 : Hola GREETING #4 : Hola
This demonstrates how image configuration can be parameterized and be changed via the build.
-
Start a second container, but override the build configured Spring profile, open a browser, and go to URL
http://localhost:8082/
.☠Be careful to use the different port number. We have to use a separate port since the first container is already using 8081. $ podman run --detach --env SPRING_PROFILES_ACTIVE=dev --publish 8082:8081 --name my_second_container containerlab
Output:
GREETING #1 : Hello Dev GREETING #2 : Hello Dev GREETING #3 : Hello Dev GREETING #4 : Hello Dev
This demonstrates how an image can be run with one or more different configurations without having to rebuild the image. In theory we’re looking at the development deployment versus the Spanish production deployment.
-
Stop both containers.
$ podman stop my_first_container my_second_container
Output:
my_first_container my_second_container
If you look in either browser window, both pages are now blank.
-
Cleanup the system by removing the unused containers.
$ podman rm my_first_container my_second_container
Output:
my_first_container my_second_container
-
Finish cleanup of the system by removing the unused images.
$ podman rmi containerlab eclipse-temurin:17-jre-ubi9-minimal
Output:
Untagged: localhost/containerlab:latest Untagged: docker.io/library/eclipse-temurin:17-jre-ubi9-minimal Deleted: b8f578632e244318754a2a94259ad813e0e5f17e97954abd78095c59ef9e09c1 Deleted: 0a979925ee8c45cdd3dbbbfe804799f75450bb306beed28602b05cbdc11e703a Deleted: fbc98b58f0860ccbfbc9b7ecf014cd23c0113041cff6526c7696513941222dda Deleted: 8d727d8b1b001ca96f32cc61e2848f05430fcb15fdf4c778cc2b247741fb11bb Deleted: 713e582edf7d1d8e18788f3e2d9e27e13e1007f6d7aa255b871965e58b1311ce Deleted: 48c193f4b35eb1f003e05a2fcd9409494f1508d1d1526843afb663c4550b2c41 Deleted: 228dcf3298beb7b0caaf4e7149eafe745e7f4e0a2299d313c00d5cc1de2962c6
The above output in the final two steps confirms all lab containers and images have been removed from the local system.
This concludes the lab and the series of primers for OCI images and containers.