TL;DR
The current semantics of --from intrinsically induce pathological coupling between build stages. Their intimate binding to build stage implementation opposes the principle of encapsulation, which is necessary to permit reuse and to reason, in isolation, about an individual stage's behavior. By defeating encapsulation, --from thwarts applying current Dockerfile reuse features, such as ONBUILD, and inhibits the introduction of future reuse mechanisms.
To avoid the harmful traits associated with --from, the existing Build Context abstraction should be adapted so its content can be extended by mounting a stage's image file path into it, instead of introducing a new stage/image reference concept to Dockerfile development. By extending its content and introducing a mapping mechanism to the existing Build Context abstraction, the --from syntax can be eliminated, current reuse features restored, and the introduction of new reuse mechanisms unencumbered.
Issue: Tight, Pathological Coupling
The design of --from ensures that the COPY instruction tightly couples itself to the implementation of other build stages. Tight coupling results from --from's deliberately crafted facility to directly reference artifacts of other build stages within a given Dockerfile, by stage names/positions and by their physical locations (paths) in those other images.
This pathological coupling, which encourages the internals of any build stage to bind intimately to any other stage within a Dockerfile, eliminates the interface boundary between stages. The absence of an interface boundary negates encapsulation, preventing human developers and algorithms from treating an individual build stage as a "black box" when defining or analyzing its behavior.
This issue expresses itself by:
- Increasing the difficulty of implementing future features that encourage Dockerfile reuse, due to the absence of encapsulation, and discouraging the use of existing ones (ONBUILD).
- Dramatically increasing the amount of manual code a developer produces, because existing "boilerplate" code cannot be reused due to its direct, rigid binding to a particular artifact (file) instance.
- Allowing simple changes, like renaming a directory containing a set of artifacts or inserting/removing a build stage, to ripple through every Dockerfile command that references that directory or build stage.
Issue: Precludes ONBUILD Trigger Support
ONBUILD trigger support enables a developer to declaratively encode an image's transform behavior: the operations responsible for converting a set of input artifacts to output ones. This declarative code includes a specification of an input interface followed by the command(s) that execute a transform. The input interface definition emerges from the union of the source file artifact (directory/filename) references specified by the triggered ADD/COPY Dockerfile commands, and is statically defined during the construction of the ONBUILD image. The transform consists of one or more RUN commands.
Example
Create a golang compiler image that executes ONBUILD commands to automatically produce a golang executable image, but not run it. Define the input interface, the Build Context path from which golang source file(s) are copied, as /golang/app. Name the compiler image exgolang. Create the Dockerfile for this image by modifying a copy of the Dockerfile for the Docker Hub golang:1.7-onbuild image.
Dockerfile Contents:
```
FROM golang:1.7
RUN mkdir -p /go/src/app
WORKDIR /go/src/app
# Union the source argument of each COPY/ADD to determine the trigger's 'input interface'.
# Only one COPY instruction with single source argument of '/golang/app'. Therefore,
# this trigger's 'input interface' is '/golang/app'.
ONBUILD COPY /golang/app /go/src/app
ONBUILD RUN go-wrapper download
ONBUILD RUN go-wrapper install
```
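The comment in the Dockerfile above derives the trigger's input interface by taking the union of the source arguments of every ONBUILD COPY/ADD. The Python sketch below illustrates that rule. It assumes simple whitespace-separated shell-form instructions (no JSON-array form, no paths containing spaces), and the name input_interface is hypothetical. It is not part of Docker.

```python
def input_interface(dockerfile_text: str) -> set[str]:
    """Union of the source paths of every ONBUILD COPY/ADD (the trigger's input interface).

    Simplified model: assumes whitespace-separated shell-form instructions only.
    """
    interface: set[str] = set()
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        # Only ONBUILD COPY/ADD lines with at least one source and a destination count.
        if len(tokens) < 4 or tokens[0].upper() != "ONBUILD":
            continue
        if tokens[1].upper() not in ("COPY", "ADD"):
            continue
        # Drop option flags such as --chown=... so that only paths remain.
        paths = [t for t in tokens[2:] if not t.startswith("--")]
        # Every path except the last is a source; the last is the destination.
        interface.update(paths[:-1])
    return interface


EXGOLANG_DOCKERFILE = """\
FROM golang:1.7
RUN mkdir -p /go/src/app
WORKDIR /go/src/app
ONBUILD COPY /golang/app /go/src/app
ONBUILD RUN go-wrapper download
ONBUILD RUN go-wrapper install
"""

print(input_interface(EXGOLANG_DOCKERFILE))  # {'/golang/app'}
```

Under this model, the input interface plays the role of a parameter list. RUN triggers contribute nothing to it.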
To reuse the defined trigger behavior, simply encode a FROM statement that references the name of the image configured with ONBUILD commands (FROM exgolang). By promoting the DRY principle, ONBUILD triggers dramatically increase an image's build-time utility, reliability, and adaptability. At the same time, they eliminate or greatly decrease the code other developers need in order to employ this image in their own Dockerfiles. Given this understanding, an ONBUILD trigger definition is remarkably akin to a function definition.
Example
Using the exgolang image created above, generate a golang server executable from the source file server.go located in /golang/app/.
Build Context
```
Dockerfile
golang/app/
    server.go
```
Dockerfile
```
FROM exgolang
```
Docker build command:
> docker build -t server .
When executed by docker build, the single-instruction Dockerfile above:
- Copies golang source from the Build Context's /golang/app directory into the image directory /go/src/app.
- Downloads any dependent golang packages.
- Runs the compiler on server.go, generating the executable file /go/bin/app in the resultant image's file system.
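The function analogy can be made concrete with a Python sketch that models the exgolang triggers as a function of the Build Context. It rests on these assumptions:
- A Build Context or image file system is modeled as a dict of relative paths to file contents.
- The go-wrapper download step is omitted.
- Compilation is faked by recording which sources were built.

The name exgolang_triggers is hypothetical.

```python
# Hypothetical model: a Build Context or image file system is a dict of
# relative paths (no leading '/') to file contents.
Files = dict[str, str]


def exgolang_triggers(build_context: Files) -> Files:
    """Model of the exgolang ONBUILD triggers executed by 'FROM exgolang'."""
    parameter = "golang/app/"  # the trigger's fixed input interface
    image: Files = {}
    # ONBUILD COPY /golang/app /go/src/app
    for path, content in build_context.items():
        if path.startswith(parameter):
            image["go/src/app/" + path[len(parameter):]] = content
    # ONBUILD RUN go-wrapper install (stand-in: record which sources were compiled)
    sources = sorted(p for p in image if p.endswith(".go"))
    image["go/bin/app"] = "binary built from " + ", ".join(sources)
    return image


build_context = {
    "Dockerfile": "FROM exgolang",
    "golang/app/server.go": "package main",
}
image = exgolang_triggers(build_context)
print(image["go/bin/app"])  # binary built from go/src/app/server.go
```

In this model, the Dockerfile in the Build Context is never copied, because it lies outside the trigger's input interface.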
As described and demonstrated by example, images incorporating ONBUILD statements are analogous to function definitions. This similarity extends to the equivalence between an ONBUILD image's input interface and a function's parameter list. Just as the statements within a function body bind to its parameters, an ONBUILD image's body (the series of ONBUILD statements) binds, or couples, to the file paths referenced by each instruction. For example, the COPY issued by the trigger statement ONBUILD COPY /golang/app /go/src/app binds to the source file path /golang/app. This file path is equivalent to a parameter defined for a function and performs a similar role, as it represents an interface element. Given this equivalence, why isn't there a mapping mechanism, like the one implemented for functions, that maps the arguments specified by an invocation statement to parameters?
When formulating ONBUILD support, the design omitted an argument-to-parameter mapping mechanism on the trigger invocation statement: FROM. Although this mapping mechanism is intrinsic to function invocation, I speculate that when trigger support was implemented, the multistage build feature was a distant, future consideration. Meanwhile, the single-stage limitation of Dockerfiles masked this issue, as the Build Context could be structured to mirror the input interface required by a single stage's ONBUILD triggers. In other words, the Build Context file path (argument) names exactly match the (parameter) names required by the ONBUILD ADD/COPY instructions. However, the introduction of multistage builds starkly exposes the absence of an argument-to-parameter mapping mechanism.
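The missing argument-to-parameter mapping can be illustrated with an ordinary Python analogy. Today a trigger behaves like a function that reads a hard-wired name from a global namespace, the shared Build Context. A mapping mechanism would instead let the caller (here, FROM) choose which path binds to the parameter. Both functions below are hypothetical illustrations, not Docker behavior.

```python
# The Build Context acts like one global namespace shared by every stage.
BUILD_CONTEXT = {
    "golang/app/server.go": "webserver source",
    "golang/app2/server.go": "logger source",
}


def trigger_today() -> str:
    # Like ONBUILD COPY /golang/app ...: the body is hard-wired to one context
    # path, so every invocation (every FROM exgolang) reads the same file.
    return BUILD_CONTEXT["golang/app/server.go"]


def trigger_with_mapping(app_dir: str) -> str:
    # With argument-to-parameter mapping, the invoking statement decides which
    # Build Context path is bound to the trigger's parameter.
    return BUILD_CONTEXT[app_dir + "/server.go"]


print(trigger_today())                      # webserver source
print(trigger_with_mapping("golang/app2"))  # logger source
```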
Multistage support forces the once "elemental" Build Context, whose content and structure were dictated by the needs of a single FROM, to become a composite one that must comply with the dependencies of two or more FROM statements. The transformation from an elemental to a composite Build Context not only diminishes trigger support but also affects non-trigger statements that follow a FROM. For that reason, these problems are discussed under the topic Issue: Ignores Aggregate Build Context below. Besides this issue of composite Build Contexts, the pathological coupling introduced by --from impedes applying ONBUILD triggers.
COPY trigger instructions are currently bound, at the time of their creation, to a Build Context file path. If a triggered COPY were to include --from, which stage name/position should it bind to, given that it must resolve the stage name within the context of all other existing and future Dockerfiles? Some mechanism would be needed to rebind the source file path references specified by ONBUILD COPY instructions within the scope of their invocation. Without one, it is very difficult to reuse existing trigger-enabled images within a multistage Dockerfile even once, let alone twice.
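The following simplified Python model shows why a --from inside a trigger is hard to bind. The reference can only be resolved against the stages of whichever Dockerfile later invokes the image, and those stages are unknown when the trigger is created. The resolve_from function and the trigger in the comment are hypothetical. The model raises an error for an unknown name, whereas Docker falls back to treating an unknown --from name as an image reference, which can silently bind to something unintended.

```python
class UnknownStage(Exception):
    """Raised when a --from reference matches no stage in the invoking Dockerfile."""


def resolve_from(stage_ref: str, stages: list[str]) -> int:
    """Resolve a --from value by stage index or stage name (simplified model)."""
    if stage_ref.isdigit():
        index = int(stage_ref)
        if index < len(stages):
            return index
    elif stage_ref in stages:
        return stages.index(stage_ref)
    raise UnknownStage(stage_ref)


# A hypothetical trigger "ONBUILD COPY --from=builder /out /app" is baked into an
# image once, but must resolve "builder" in every Dockerfile that uses the image.
print(resolve_from("builder", ["builder", "runtime"]))  # 0
try:
    resolve_from("builder", ["deps", "compile", "runtime"])
except UnknownStage as err:
    print("cannot bind trigger to stage:", err)  # cannot bind trigger to stage: builder
```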
Issue: Ignores Aggregate Build Context
Before multistage support, Dockerfile semantics assumed a single FROM statement. The expected Build Context therefore reflected only those source artifacts located in the directory structure required by the ADD/COPY commands immediately following FROM. Incorporating many FROM statements within a single Dockerfile requires a means to first compose/aggregate the Build Context from the more elemental ones needed by each FROM. This composite/aggregate must then be partitioned to supply the specific (elemental) Build Context expected by an individual FROM (stage).
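The compose-then-partition requirement can be sketched in Python. This sketch assumes that each elemental Build Context is a dict of relative paths to contents, and that the aggregate keeps each one under its own directory prefix. The compose and partition helpers are hypothetical, not Docker features.

```python
Files = dict[str, str]


def compose(elemental: dict[str, Files]) -> Files:
    """Aggregate per-stage (elemental) Build Contexts, each under its own prefix."""
    aggregate: Files = {}
    for prefix, context in elemental.items():
        for path, content in context.items():
            aggregate[f"{prefix}/{path}"] = content
    return aggregate


def partition(aggregate: Files, prefix: str) -> Files:
    """Recover the elemental Build Context stored under prefix."""
    cut = prefix.rstrip("/") + "/"
    return {p[len(cut):]: c for p, c in aggregate.items() if p.startswith(cut)}


elemental = {
    "golang/app": {"server.go": "app source"},
    "golang/app2": {"server.go": "app2 source"},
}
aggregate = compose(elemental)
print(sorted(aggregate))  # ['golang/app/server.go', 'golang/app2/server.go']
print(partition(aggregate, "golang/app2"))  # {'server.go': 'app2 source'}
```

The partition step only strips the prefix. Restructuring or renaming (for example, presenting golang/app2 as golang/app to a trigger) would need a further mapping step.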
Example
Using the exgolang image created above, attempt to generate three golang server executables from an Aggregate Build Context. Note that issues related to partitioning the Aggregate Build Context apply broadly to any multistage Dockerfile, regardless of whether it uses ONBUILD.
Build Context
```
Dockerfile
golang/app/
    server.go
golang/app2/
    server.go
golang/app3/
    server.go
```
Dockerfile
```
FROM exgolang
# the following stage will simply recompile golang/app/server.go instead of golang/app2/server.go
FROM exgolang
# the following stage will simply recompile golang/app/server.go instead of golang/app3/server.go
FROM exgolang
```
Docker build command:
/server > docker build -t servers .
Unfortunately, the multistage build design does not address Aggregate Build Context issues. It provides no mechanism that both partitions and restructures the Aggregate Build Context to supply the elemental Build Context needed by a specific FROM. Therefore, executing the above docker build command copies the same golang source, /server/golang/app/server.go, into three distinct images. It then runs the compiler and generates the same server executable, writing it to each image's /go/bin directory.
Additionally, when stages reference ONBUILD triggers, current multistage Dockerfile support not only inhibits their use, but when "it works" the outcome can be dangerous. This is especially true when the trigger assumes a Build Context interface of "." (the everything interface), as in COPY . /go/src. In this situation, the entire Aggregate Build Context would be accessible to every stage, thereby polluting an individual stage's source artifact set with artifacts from all other stages.
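The example above can be replayed with a small Python model. It assumes the Aggregate Build Context is a dict of relative paths, and it reduces COPY to prefix matching (the helper copy_from_context is hypothetical). Because every stage sees the same unpartitioned context, all three exgolang stages copy golang/app. A trigger of the form COPY . /go/src receives every stage's sources.

```python
AGGREGATE = {
    "Dockerfile": "FROM exgolang\nFROM exgolang\nFROM exgolang",
    "golang/app/server.go": "app source",
    "golang/app2/server.go": "app2 source",
    "golang/app3/server.go": "app3 source",
}


def copy_from_context(context: dict[str, str], src: str, dst: str) -> dict[str, str]:
    """Simplified COPY src dst, where src is a directory in the context or '.'."""
    prefix = "" if src == "." else src.strip("/") + "/"
    target = dst.strip("/") + "/"
    return {target + p[len(prefix):]: c for p, c in context.items() if p.startswith(prefix)}


# Each of the three stages runs the same trigger against the same context.
stages = [copy_from_context(AGGREGATE, "/golang/app", "/go/src/app") for _ in range(3)]
print(all(s == {"go/src/app/server.go": "app source"} for s in stages))  # True

# A trigger with the "everything" interface pulls in all other stages' sources.
polluted = copy_from_context(AGGREGATE, ".", "/go/src")
print(sorted(polluted))
```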
Issue: Complexity due to added Dockerfile abstractions
Any worthwhile program must apply coupling to map its abstractions to an implementation. However, it's important to minimize coupling whenever possible. One way to reduce coupling is to limit the abstractions to only those essential to realizing the encoded algorithm's objective.
The purpose of a Dockerfile is to provide the scaffolding needed to deliver source artifact(s) to a transform that then produces output artifact(s). The transforms, executed by the RUN command, read and write files within a file system, so the source artifacts must eventually be mapped as files within a file system. Perhaps to align with this necessity, the Build Context abstraction responsible for providing source artifacts was also designed to represent them as files within a file system. This design choice matched the Build Context's representation to the one required by the underlying transforms (files in a file system). It resulted in Dockerfile commands, like COPY, whose syntax and behavior nearly mirror those of a corresponding OS command, such as cp. It also facilitated Dockerfile adoption by leveraging a developer's existing understanding of that command.
The introduction of COPY --from adds a new abstraction, the stage/image reference, to Dockerfile coding. This additional abstraction necessitated changing COPY's interface and weaving the resolution of stage/image references into its implementation, so that COPY's binding mechanisms could differentiate between Build Context sources and other stage/image sources. Besides adding some complexity to applying COPY, the stage/image reference abstraction has implications for features that rely on COPY's behavior. When assessing these implications, one hopes for beneficial or neutral outcomes. However, in this situation, the rigid binding of --from to a particular stage/image precludes the use of COPY --from in any current or future reuse mechanism, such as ONBUILD. This negative outcome not only prevents reuse mechanisms, like ONBUILD, from referencing other stages/images. It also diminishes the utility of --from, as it can't be applied in all valid contexts of the COPY instruction.
An often cited strength of Unix-derived OSes is their insistence on mapping various abstractions, like hard drives and IPC, to a file. Many of those abstractions would have offered only a slightly different interface if given a concrete OS concept of their own. Rather than add that complexity, Unix designers mapped new abstractions (especially devices) to a single one: the file. Once mapped, the majority of the code written to manage/manipulate this single abstraction (the file) immediately applies to the new one. Since image/stage references are essentially file path references, perhaps --from's stage/image reference abstraction should not be exposed explicitly. It could instead be mapped to an existing abstraction: the Build Context.
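The contrast between two abstractions and one can be sketched in Python. This assumes that contexts and stage file systems are dicts of relative paths. The helper mount_into_context is a hypothetical stand-in for "mounting a stage's image file path into the Build Context". Once a stage's output is exposed as a context path, a single lookup serves both kinds of source.

```python
from typing import Optional

Files = dict[str, str]


def copy_with_from(context: Files, stages: dict[str, Files], src: str,
                   from_stage: Optional[str] = None) -> str:
    """Today: COPY needs two resolution paths, one per abstraction."""
    if from_stage is None:
        return context[src]          # Build Context file path
    return stages[from_stage][src]   # stage/image reference plus a path inside it


def mount_into_context(context: Files, stage_fs: Files,
                       stage_path: str, context_path: str) -> Files:
    """Hypothetical: expose a stage's file as a path in a new Build Context."""
    extended = dict(context)
    extended[context_path] = stage_fs[stage_path]
    return extended


def copy_plain(context: Files, src: str) -> str:
    """Recommended: one resolution path, because everything is a context path."""
    return context[src]


context = {"script.sh": "#!/bin/sh"}
stages = {"webserver": {"go/bin/server": "webserver binary"}}

via_from = copy_with_from(context, stages, "go/bin/server", from_stage="webserver")
extended = mount_into_context(context, stages["webserver"], "go/bin/server", "final/bin/webserver")
print(via_from == copy_plain(extended, "final/bin/webserver"))  # True
```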
Recasting stage/image references as file paths in the Build Context confers the following benefits:
- Reduces complexity by eliminating the explicit stage/image reference abstraction and the --from option. COPY reverts to its prior, simpler syntax.
- Limits artifact coupling to Build Context file paths only, which existed before multistage support.
- Existing or future mechanisms that apply to a Build Context within a Dockerfile, like partitioning, renaming, and restructuring, immediately apply to artifacts contributed by other stages within that Dockerfile, without additional code.
Issue: Extra Build Stage & Redundant COPYing
Suppose the objective of a multistage build is to create a single layer representing a runtime image, and the resultant build artifacts must be assembled from more than one build stage or image. In that case, the current semantics of COPY --from require an extra build stage and redundant COPYing.
Example
Applying the current semantics of COPY --from, create a golang webserver whose stdout and stderr are redirected to a remote logging facility, as a single layer in the resulting image.
```
FROM golang:nanoserver as webserver
COPY /web /code
WORKDIR /code
RUN go build webserver.go
FROM golang:nanoserver as remotelogger
COPY /remotelogger /code
WORKDIR /code
RUN go build remotelogger.go
# extra build stage and physical copying due to semantics of COPY --from in order
# to generate a single layer in the next build stage
FROM scratch as extra_redundant_copying
COPY --from=webserver /code/webserver.exe /redundant/webserver.exe
COPY --from=remotelogger /code/remotelogger.exe /redundant/remotelogger.exe
COPY /script/pipem.ps1 /redundant/
FROM microsoft/nanoserver
COPY --from=extra_redundant_copying /redundant /
CMD ["\\pipem.ps1"]
EXPOSE 8080
```
The above situation generalizes to N extra build stages and X redundant copy operations when there's a desire to create a resultant image of N layers where each layer requires artifacts from more than a single stage.
Recommendations:
- Eliminate direct coupling to artifacts within images from other build stages by removing --from as an option to COPY.
- Support a mapping mechanism that partitions, restructures, and renames file paths defined in the Aggregate (Global) Build Context, so that the mapped result matches the (Local) Build Context required by an individual stage. A mapping mechanism with these qualities has already been proposed and explored by #12072. In a nutshell, the mechanism, implemented by the keyword CONTEXT, mounts the desired Aggregate Build Context file paths into the Build Context created for an individual stage, similar to the docker run -v option.
- Support a mechanism that allows a build stage to extend the Aggregate Build Context with the output artifacts produced by that stage. Proposal #12415 offers a solution, MOUNT, that's analogous to CONTEXT. However, MOUNT mounts an image's file path into the Aggregate Build Context instead of into the stage's Local Build Context.
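The intended behavior of CONTEXT and MOUNT can be sketched in Python as "src:dst" projections in the style of docker run -v. This is only an interpretation of the proposals. It assumes contexts and stage file systems are dicts of relative paths, and apply_context, apply_mount, and _project are hypothetical helpers, not an implementation of #12072 or #12415.

```python
Files = dict[str, str]


def _project(files: Files, spec: str) -> Files:
    """Copy the file or directory named by 'src:dst' to dst, like docker run -v src:dst."""
    src, dst = (part.strip("/") for part in spec.split(":"))
    out: Files = {}
    for path, content in files.items():
        if path == src:                       # src names a single file
            rel = ""
        elif not src:                         # src is the root: take everything
            rel = path
        elif path.startswith(src + "/"):      # path lies inside the src directory
            rel = path[len(src) + 1:]
        else:
            continue
        out["/".join(part for part in (dst, rel) if part)] = content
    return out


def apply_context(aggregate: Files, *specs: str) -> Files:
    """Hypothetical CONTEXT: build a stage's Local Build Context from the aggregate."""
    local: Files = {}
    for spec in specs:
        local.update(_project(aggregate, spec))
    return local


def apply_mount(aggregate: Files, stage_fs: Files, spec: str) -> Files:
    """Hypothetical MOUNT: extend the aggregate with a path from a finished stage."""
    return {**aggregate, **_project(stage_fs, spec)}


aggregate = {
    "script.sh": "start script",
    "go/src/webserver/server.go": "webserver source",
    "go/src/logger/server.go": "logger source",
}
print(apply_context(aggregate, "/go/src/webserver/:/"))  # {'server.go': 'webserver source'}
extended = apply_mount(aggregate, {"go/bin/app": "webserver binary"}, "/go/bin/app:/final/bin/webserver")
print(extended["final/bin/webserver"])  # webserver binary
```

CONTEXT only reads from the aggregate, and MOUNT only adds to it. As a result, a stage never sees another stage's file system directly.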
Compared to the currently implemented multistage design, applying the recommendations above would:
- Promote encoding Dockerfiles with current and future reusable build mechanisms.
- Seamlessly integrate with existing Dockerfile abstractions, such as the Build Context and ONBUILD triggers.
- Dramatically reduce the Dockerfile code required to reuse an image when building a new one.
- Eliminate the necessity of encoding extra build stages and the overhead of redundant copying.
- Foster the innately declarative CONTEXT and MOUNT mechanisms proposed in the issues referenced above.
Comparison: Current Multistage Design vs. Recommended
The examples below encode the same scenario in both designs, concretely contrasting the benefits of the recommended approach with the existing multistage design.
Scenario
Using already available Docker Hub images, construct a container composed of three independent golang executables. One executable implements a webserver, another a logging device that relays messages to a remote server, while the third reports on the webserver's health.
Initial Build Context
The initial Build Context common to both examples.
Build Context (initial aggregate/global context)
```
Dockerfile
script.sh
go/src/webserver/
    server.go
go/src/logger/
    server.go
go/src/health/
    server.go
```
Example: Current Multistage Design
```
FROM golang:1.7 AS webserver
COPY /go/src/webserver /go/src/webserver
WORKDIR /go/src/webserver
RUN go-wrapper download \
 && export GOBIN=/go/bin \
 && go-wrapper install server.go
FROM golang:1.7 AS logger
COPY /go/src/logger /go/src/logger
WORKDIR /go/src/logger
RUN go-wrapper download \
 && export GOBIN=/go/bin \
 && go-wrapper install server.go
FROM golang:1.7 AS health
COPY /go/src/health /go/src/health
WORKDIR /go/src/health
RUN go-wrapper download \
 && export GOBIN=/go/bin \
 && go-wrapper install server.go
FROM scratch AS requiredExtra
COPY --from=webserver /go/bin/server /final/bin/webserver
COPY --from=logger /go/bin/server /final/bin/logger
COPY --from=health /go/bin/server /final/bin/health
COPY /script.sh /start.sh
FROM alpine
COPY --from=requiredExtra /final /start.sh /
ENTRYPOINT /start.sh
EXPOSE 8080
```
Example: Recommended Multistage Design
```
# 1: webserver stage
FROM golang:1.7-onbuild CONTEXT /go/src/webserver/:/ MOUNT /go/bin/app:/final/bin/webserver
# 2: logger stage
FROM golang:1.7-onbuild CONTEXT /go/src/logger/:/ MOUNT /go/bin/app:/final/bin/logger
# 3: health stage
FROM golang:1.7-onbuild CONTEXT /go/src/health/:/ MOUNT /go/bin/app:/final/bin/health
# 4: runtime stage
FROM alpine CONTEXT /final/bin:/bin /script.sh:/start.sh
# 5: single layer
COPY . /
ENTRYPOINT /start.sh
EXPOSE 8080
```
Differences
Compared to the Current Multistage Design, the Recommended Multistage Design:
- Encourages more declarative solutions by:
  - leveraging reuse features, such as ONBUILD, that minimize developer-produced code, and
  - declaring external data dependencies via CONTEXT and MOUNT, separately from Dockerfile operations like COPY.
- Seamlessly leverages current ONBUILD images.
- Eliminates harmful coupling by replacing direct, rigid physical stage/image references with Build Context file paths that can be rebound, through a standard mapping mechanism, when running the Dockerfile.
- Addresses the issue of partitioning, structuring, and renaming Aggregate Build Context artifacts, using a syntax and behavior similar to docker run -v.
- Eliminates the complexity of --from and stage/image reference support by replacing both with a mapping mechanism that encourages encapsulation.
- Eliminates encoding extra build stage(s) and redundant copying.
- Clearly delineates the input and output artifacts, aiding developer comprehension.
- Simplifies DAG analysis, as only FROM instructions need to be parsed to reveal the data dependencies between stages.
Example: Recommended Multistage Design: Explained
- CONTEXT partitions the initial Aggregate Build Context to present the Local Build Context required by the FROM image. For this stage, the webserver's golang source, server.go, is the only file that appears in the "root" dir of the Local Build Context. Once this stage finishes, MOUNT associates the file /go/bin/app, located in the last container created by this stage, with the Aggregate Build Context as /final/bin/webserver.

Local Build Context
```
server.go
```
Aggregate Build Context
```
Dockerfile
script.sh
go/src/webserver/
    server.go
go/src/logger/
    server.go
go/src/health/
    server.go
final/bin/
    webserver
```
- CONTEXT partitions the Aggregate Build Context to present the Local Build Context required by the FROM image. For this stage, the logger's golang source, server.go, is the only file that appears in the "root" dir of the Local Build Context. Once this stage finishes, MOUNT associates the file /go/bin/app, located in the last container created by this stage, with the Aggregate Build Context as /final/bin/logger.

Local Build Context
```
server.go
```
Aggregate Build Context
```
Dockerfile
script.sh
go/src/webserver/
    server.go
go/src/logger/
    server.go
go/src/health/
    server.go
final/bin/
    webserver
    logger
```
- CONTEXT partitions the Aggregate Build Context to present the Local Build Context required by the FROM image. For this stage, the health reporter's golang source, server.go, is the only file that appears in the "root" dir of the Local Build Context. Once this stage finishes, MOUNT associates the file /go/bin/app, located in the last container created by this stage, with the Aggregate Build Context as /final/bin/health.

Local Build Context
```
server.go
```
Aggregate Build Context
```
Dockerfile
script.sh
go/src/webserver/
    server.go
go/src/logger/
    server.go
go/src/health/
    server.go
final/bin/
    webserver
    logger
    health
```
- CONTEXT partitions the Aggregate Build Context, as extended by stages 1-3, by isolating the contents of the /final/bin/ directory and projecting (renaming) it as /bin/. Additionally, the shell script script.sh is renamed to start.sh.

Local Build Context
```
start.sh
bin/
    webserver
    logger
    health
```
- Creates a single layer by COPYing the Local Build Context into the root directory of the alpine image.
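The four stages explained above can be replayed with a small Python model, to check that the final stage's Local Build Context comes out as listed. The model makes these assumptions:
- Contexts and file systems are dicts of relative paths.
- golang_onbuild is a hypothetical stand-in for the golang:1.7-onbuild triggers, and it fakes compilation.
- project approximates the proposed CONTEXT/MOUNT src:dst semantics.

```python
Files = dict[str, str]


def project(files: Files, spec: str) -> Files:
    """Copy the file or directory named by 'src:dst' to dst (docker run -v style)."""
    src, dst = (part.strip("/") for part in spec.split(":"))
    out: Files = {}
    for path, content in files.items():
        if path == src:                   # src names a single file
            rel = ""
        elif not src:                     # src is the root: take everything
            rel = path
        elif path.startswith(src + "/"):  # path lies inside the src directory
            rel = path[len(src) + 1:]
        else:
            continue
        out["/".join(part for part in (dst, rel) if part)] = content
    return out


def golang_onbuild(local_context: Files) -> Files:
    """Stand-in for golang:1.7-onbuild: COPY . /go/src/app, then 'build' /go/bin/app."""
    image = {"go/src/app/" + path: content for path, content in local_context.items()}
    image["go/bin/app"] = "binary of " + local_context["server.go"]
    return image


aggregate: Files = {
    "Dockerfile": "Dockerfile contents",
    "script.sh": "start script",
    "go/src/webserver/server.go": "webserver",
    "go/src/logger/server.go": "logger",
    "go/src/health/server.go": "health",
}

# Stages 1-3: CONTEXT /go/src/<name>/:/ then, once the stage finishes,
# MOUNT /go/bin/app:/final/bin/<name> extends the Aggregate Build Context.
for name in ("webserver", "logger", "health"):
    local = project(aggregate, f"/go/src/{name}/:/")
    aggregate.update(project(golang_onbuild(local), f"/go/bin/app:/final/bin/{name}"))

# Stage 4: CONTEXT /final/bin:/bin /script.sh:/start.sh; COPY . / would then copy
# this Local Build Context into alpine's root as a single layer.
final_local: Files = {}
for spec in ("/final/bin:/bin", "/script.sh:/start.sh"):
    final_local.update(project(aggregate, spec))

print(sorted(final_local))  # ['bin/health', 'bin/logger', 'bin/webserver', 'start.sh']
```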