
Comments (13)

WhisperingChaos avatar WhisperingChaos commented on August 24, 2024 1

@macropin

Thank you for your kind compliment on my critique, although I'm not quite sure what you expect to experience by subscribing to this thread.

I do appreciate that the core maintainers/developers were polite enough to respond to my arguments, given the competition for their time from other community posts and their driving desire to improve Docker by actually writing code. However, it's evident to me that even if they suspected that some of the technical arguments presented above were valid, they believe the already encoded multi-stage mechanisms address the concerns well enough for the common use cases.

from buildkit.

dnephin avatar dnephin commented on August 24, 2024

Issue: Tight, Pathological Coupling

I think moby/moby#32100 would fix this

Issue: Extra Build Stage & Redundant COPYing

Would be fixed by moby/moby#32507 and moby/moby#32904 . Copy and many other metadata operations can be implemented without creating any layers.

Comparison: Current Multistage Design vs. Recommended

This seems like a really specific use case, and I don't think this reflects the general problem that is solved by multi stage builds.

I would personally put those 4 into 4 separate Dockerfiles. They are building different applications, not a single one. You could use docker-compose build to build them all at once.


WhisperingChaos avatar WhisperingChaos commented on August 24, 2024

I think moby/moby#32100 would fix this

I'll look into this.

Would be fixed by moby/moby#32507 and moby/moby#32904 . Copy and many other metadata operations can be implemented without creating any layers.

As far as I can tell from exploring --from using the Docker Hub image 17.05.0-ce, only a single stage/image can be referenced by COPY --from, as --from cannot be specified more than once for a given COPY. I imagine this constraint also applies to RUN's --mount option, although that is difficult to discern from reading moby/moby#32507. Therefore, attempts to coalesce files sourced from two different stages/images into a single layer cannot be accomplished with either a single RUN or COPY instruction. Also, once COPY --from completes, it creates a new layer. Could you therefore explain how "Copy and many other metadata operations can be implemented without creating any layers"?

Notice the creation of a new layer for each COPY --from executed below:

Step 1/10 : FROM scratch as sp1
 ---> 
Step 2/10 : COPY afile /test/
 ---> 762a6ad71240
Removing intermediate container d14eb99a8af9
Step 3/10 : FROM scratch as sp2
 ---> 
Step 4/10 : COPY bfile /test/
 ---> f1971116fd80
Removing intermediate container 51a88fa672ce
Step 5/10 : FROM scratch as sp3
 ---> 
Step 6/10 : COPY cfile /test/
 ---> 1ee3cebffbff
Removing intermediate container 9894fe8756ef
Step 7/10 : FROM alpine
latest: Pulling from library/alpine
cfc728c1c558: Pull complete 
Digest: sha256:c0537ff6a5218ef531ece93d4984efc99bbf3f7497c0a7726c88e2bb7584dc96
Status: Downloaded newer image for alpine:latest
 ---> 02674b9cb179
Step 8/10 : COPY --from=sp1 /test /test
 ---> f12c4787c339
Removing intermediate container 3c0b5995b66f
Step 9/10 : COPY --from=sp2 /test /test
 ---> 6e620387b951
Removing intermediate container 9ac28d25abc4
Step 10/10 : COPY --from=sp3 /test /test
 ---> 0b84156ccf30

This seems like a really specific use case, and I don't think this reflects the general problem that is solved by multi stage builds.

I disagree. One of the primary objectives of Multistage build is the separation of build time concerns from the run time image. Essentially, the example manufactures three different artifacts needed by the run time image using three different stages that focus exclusively on providing the environments needed to construct each artifact. Once finished, the last stage transfers the artifacts (golang executables) from their no longer necessary build environments, combining them to create the final run time image.

At a minimum, 2 stages are required when a run time artifact must be built, instead of simply copied from the Build Context. In this situation, the first stage is polluted by the build environment needed to construct the run time artifact, while the second stage extracts the constructed run time artifact from its build environment by transferring it into the run time image. Therefore, it's not unreasonable to expect scenarios where more than one run time artifact must be built to satisfy the run time image requirements.
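
A minimal sketch of that two-stage minimum, assuming a hypothetical Go module at the root of the Build Context with its main package in ./cmd/app (the image tags, paths, and stage name are illustrative and not taken from the example above):

```dockerfile
# Stage 1: build environment. Everything installed or generated here stays
# in this stage unless it is explicitly copied out.
FROM golang:1.21 AS build
WORKDIR /src
# Assumption: the Build Context holds a Go module with a main package in ./cmd/app.
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Stage 2: run time image. Only the built artifact crosses the stage boundary.
FROM alpine
COPY --from=build /out/app /bin/app
ENTRYPOINT ["/bin/app"]
```

Each additional run time artifact that has to be built would add another stage shaped like the first one, plus one more COPY --from in the last stage.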

Finally, one could easily create another example involving the building of two dynamic C++ libraries, with the third stage creating an executable that depends on them. This can be accomplished by furnishing the appropriate source code and substituting the ONBUILD golang images with corresponding ONBUILD C++ ones.

Any feedback regarding the Issue: Ignores Aggregate Build Context?


dnephin avatar dnephin commented on August 24, 2024

Therefore, attempts to coalesce files sourced from two different stages/images into a single layer cannot be accomplished with either a single RUN or COPY instruction

Why are intermediate layers a problem? Layers from a previous stage are not in the final stage, so it shouldn't matter how many layers you have in intermediate stages. You can grab them all as a single layer in the final stage.

Also moby/moby#32904 will allow for COPY to work without creating any image or container. So effectively no layers, or at least none of the problems caused by extra layers

it's not unreasonable to expect scenarios where more than one run time artifact must be built to satisfy the run time image requirements.

This does seem reasonable, and I believe that works fine, as you demonstrate in your example.

Any feedback regarding the Issue: Ignores Aggregate Build Context?

I don't really see the issue. You can do something like this to merge contexts:

FROM alpine as base
COPY . .

FROM base as app
# this stage now has everything from the original context

You can filter by starting from a fresh base and using COPY --from=base instead of FROM base.
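
As a rough sketch of that filtering variant (the /context directory and the app/ subdirectory are assumptions made for illustration):

```dockerfile
# Hold the whole Build Context in a stage of its own.
FROM alpine AS base
COPY . /context/

# Start from a fresh image instead of FROM base, then pull in only the
# subset this stage needs (assumed here to live under app/ in the context).
FROM alpine AS app
COPY --from=base /context/app/ /src/
```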

Also EXPORT from moby/moby#32100 would make this a little more declarative.


WhisperingChaos avatar WhisperingChaos commented on August 24, 2024

Why are intermediate layers a problem? Layers from a previous stage are not in the final stage, so it shouldn't matter how many layers you have in intermediate stages. You can grab them all as a single layer in the final stage.

Agreed, there shouldn't be a problem. However, I currently don't know how to "grab them all as a single layer". As far as I can tell, two COPY --from commands would be required to transfer artifacts from two different intermediate stages and currently, a commit is performed after each COPY. After reading your comments and moby/moby#32904 a couple more times I now believe I understand your replies - that once moby/moby#32904 becomes available, a commit won't be issued after a COPY instruction, therefore, multiple COPY --from can be executed without generating additional layers.

I don't really see the issue.

How would one write the golang example without rewriting the golang ONBUILD image?


dnephin avatar dnephin commented on August 24, 2024

I currently don't know how to "grab them all as a single layer"

This line from your example should accomplish that. It will be a single layer in the final image:

COPY --from=requiredExtra /final /start.sh /

How would one write the golang example without rewriting the golang ONBUILD image?

That golang example is already working, right? So is the problem that it's so verbose? and each stage seems to be very similar?


WhisperingChaos avatar WhisperingChaos commented on August 24, 2024

I currently don't know how to "grab them all as a single layer"

This line from your example should accomplish that. It will be a single layer in the final image:

COPY --from=requiredExtra /final /start.sh /

Yes, of course: within the example, COPY --from created only a single layer in the run time image. It works as intended due to the example's design. The stage named requiredExtra, referenced by COPY --from above, issued a series of four COPY operations to locate all the artifacts in the (same) file system allocated to requiredExtra, and this stage executes before the final one containing COPY --from=requiredExtra /final /start.sh /

However, according to what I now understand from our posts, once moby/moby#32904 is merged, one should be able to eliminate requiredExtra and simply issue four independent COPY --from instructions in the final stage, as they should only generate a single layer:

FROM alpine
COPY --from=webserver /bin/server /bin/webserver
COPY --from=logger    /bin/server /bin/logger
COPY --from=health    /bin/server /bin/health
COPY /script.sh /start.sh
ENTRYPOINT /start.sh

Therefore, the above should generate exactly 2 layers:

  • 1 from the single layer "alpine" image,
  • 1 generated by executing the four COPY instructions and ENTRYPOINT.

Let me know if my understanding above is incorrect.

That golang example is already working, right?

Yes. The Example: Current Multistage Design should work.

So is the problem that it's so verbose? and each stage seems to be very similar?

Yes & Yes. Due to the Aggregate Build Context and lack of mechanisms to partition/map it so each stage can be defined with its own Local Build Context, one can't use the current golang on-build trigger image to implement any stage. There are essentially two reasons for the repetitive code:

  • COPY has to perform this partitioning/mapping from the Aggregate Build Context to the stage's file system.
  • Since the source COPY file path has to be different for each stage, in order to perform this mapping/partitioning, the remaining commands, that could have been encapsulated in an ONBUILD trigger, cannot be encoded that way because the golang source must be COPYed before running the remaining commands.
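
To illustrate the repetition described in the two points above, here is a hedged sketch of two such stages. The stage names and /bin/server follow the final stage shown earlier; the webserver/ and logger/ subdirectories, the image tag, and the presence of a Go module in each subdirectory are assumptions:

```dockerfile
# Each stage differs only in the source path of its COPY, yet the remaining
# instructions must be repeated because they have to run after that COPY.
FROM golang:1.21 AS webserver
WORKDIR /go/src/app
# Assumption: webserver/ in the Aggregate Build Context is a Go module.
COPY webserver/ .
RUN CGO_ENABLED=0 go build -o /bin/server .

FROM golang:1.21 AS logger
WORKDIR /go/src/app
# Assumption: logger/ in the Aggregate Build Context is a Go module.
COPY logger/ .
RUN CGO_ENABLED=0 go build -o /bin/server .
```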


tonistiigi avatar tonistiigi commented on August 24, 2024

Let me know if my understanding above is incorrect.

It makes it possible for the builder to squash these layers, but we do not want to do that.

You should not care about the number of layers, and in the future not even know how many layers there were. Multiple layers that don't share contents do not perform any worse than a single one. In that case, multiple layers perform much better as they can reuse the data from previous builds. Checking for deduplication is a separate issue. If these copies share sources then it is not how multi-stage builds should be used.

ONBUILD

What you are asking for is basically FROM foo WITH bar AS baz, which would allow setting any source (dir, image, stage, git) as the main context for the stage. I'm open to considering this, although it may be hard to justify the new syntax if it only helps the ONBUILD case.


WhisperingChaos avatar WhisperingChaos commented on August 24, 2024

You should not care about the number of layers...

Thanks for reminding me, as the original reason for eliminating layers was to flush build time artifacts from the run time image. Since multistage builds properly separate build and run time concerns, you're right, layer count doesn't matter.

FROM foo WITH bar AS baz

I believe I understand this reference. bar becomes the stage's Build Context.

For me, the Build Context represents the essential abstraction for resolving a stage's file path references. A stage acquires input artifacts for its transforms and shares output artifacts via its Build Context. It's just a simple file system whose content and structure are unique to a given stage. To remain simple, the file paths do not directly expose concepts of an image or stage reference. Therefore, in order to include other abstractions like image or stage file paths, these abstractions must be mapped to a Build Context file path. This is analogous to how the Unix file system works.

In Unix, network files, in memory file systems, RAID arrays, ... can be mounted into the local file system, permitting processes to read and write to these hidden abstractions using simple file path references to the local file system, concealing the complexity of where/how these files are actually stored. Additionally, the simple file path references present a static interface that can be rebound to a different hidden abstraction. For example, a simple file path reference can be bound to a RAID array then rebound to another hidden abstraction, like an in memory file system. After rebinding, the processes referencing this file path wouldn't know/care about the change.

So what's my point? I would suggest eliminating the notion of stage/file references from COPY --from and RUN --mount and limiting their binding to the file path references offered by a stage's Build Context. This provides a simple, static interface that developers create and code to when designing a stage. Then separately provide an ability to rebind these simple file path references to the necessary abstraction when running a stage. I don't believe this suggestion is new to you. Your initial proposal suggested docker build docker://image-reference[::/subdir], which bound the image file path reference to the Build Context. This feature allowed seamless rebinding to a different image, as long as this image reflected the required Build Context.

A couple final points:

  • If you solve the issues precluding the use of current ONBUILD triggers within a multistage build, I think it will focus your attention on the intricacies of binding abstractions, which will hopefully provide a model for exploring mechanisms that result in a cohesive solution to ONBUILD and other reuse features.
  • Concerning RUN --mount, I'm not suggesting eliminating --mount, instead, simply limit its ability to reference file paths provided by only the stage's Build Context.
  • The CONTEXT/MOUNT mapping features that I've suggested are a bit more flexible than typical Unix mount. They support the assembly of file path contents from multiple sources. For example, given a directory x, one could contribute files from more than one file path (directory) to create x's content. Since MOUNT can be associated to any FROM it provides a facility to include any stage or image reference. If you wish to explore this further, let me know.


tonistiigi avatar tonistiigi commented on August 24, 2024

What you are calling context isn't really any different from any of the other sources that build can use like images, stages, tar archives, git repos. It is just the source that happens to contain the files from the working dir of the client. An important property of these sources that makes the core of the builder work is that they are all immutable.


WhisperingChaos avatar WhisperingChaos commented on August 24, 2024

What you are calling context isn't really any different from any of the other sources that build can use like images, stages, tar archives, git repos.

Exactly the point! A Build Context is an abstraction, just like the *nix file system is an abstraction allowing various kinds of resources to present themselves as simple file path(s) that can be traversed, read, renamed ... using a standard interface. Therefore, instead of limiting the notion of a "Build Context" to a concrete definition, the "source that happens to contain the files from the working dir of the client", extend it to include images, stages, tar archives, and git repos by reflecting these things as Build Context file paths.

Prior to the introduction of --from the Build Context was the sole abstraction used to provide source artifacts to a Dockerfile. What's being suggested by this thread is to remain faithful to this purpose by mapping other file path resources, such as images, stages, tar archives, and git repos to Build Context file paths.

An important property of these sources that makes the core of the builder work is that they are all immutable.

What's suggested by MOUNT is a mechanism that extends the Build Context without "writing" to it. Therefore, the Aggregate Build Context becomes analogous to an insert only database. In fact, it behaves similarly to RUN --mount which permits the extension of an (immutable) image's file system. Of course, MOUNT can suffer from the same write issue that RUN --mount introduces: the mount destination directory may contain artifacts contributed by the image, thereby, logically deleting these artifacts while the build runs, and once complete, these artifacts become visible again. However, precautions can be encoded to avoid possible side effects introduced by this behavior should it be considered dangerous.

Finally, CONTEXT produces a view (perspective) derived from the Aggregate Build Context that satisfies a given step's source artifact needs. A content hash can be calculated for this view to monitor its immutability separately from the Aggregate Build Context. Therefore, extending the Aggregate Build Context won't cause a cache miss for a particular step unless the step's view is itself altered, its CONTEXT definition is updated, or one of its necessary source artifacts changes.


WhisperingChaos avatar WhisperingChaos commented on August 24, 2024

Any feedback regarding the Issue: Ignores Aggregate Build Context?
I don't really see the issue.

In addition to my initial reply, the below discusses a few more reasons why the suggested workaround is problematic:

  • The COPY . . encoded by the workaround converts an Aggregate Build Context file system to an image file system in order to utilize the extended binding mechanisms COPY --from and RUN --mount that offer an ability to rename a single resource with each invocation. Although limited, this ability is important, as it allows the Aggregate Build Context to be "reshaped" into a form that could be consumed by other build stages that bind themselves to their own specific set of Build Context file paths. Unfortunately, unless there's a mechanism to convert the image file system references back to the Build Context file system abstraction, the code relying on the Build Context file system can't be used and would have to be rewritten to employ COPY --from/RUN --mount. At one time, docker build docker://image-reference[::/subdir] was proposed which would allow an image file system to be reflected as a Build Context but I'm uncertain of its implementation status.

  • Although RUN --mount permits binding the top level directory to a different name or allows an individual file to be renamed, this ability is limited to a single rename operation for a given RUN --mount command. Unfortunately, if there's either more than one source image/stage that must contribute artifacts to a specific transform (RUN) or more than one rename operation required to compose the proper interface needed by it then a series of other RUN --mount or COPY --from operations must first be executed before running the desired transform. These additional RUN --mount/COPY --from essentially "reshape" the source artifacts contributed by other images/stages to reflect the input interface required by the consuming transform. Therefore, the simplicity of --from and --mount encourages the shape (file path names) of artifacts offered by stage(s)/image(s) to mirror those required by the transforms (stage(s)) that consume them in order to reduce redundant COPY --from and/or RUN --mount operations. This promoted, tight coupling between stages/images reduces the ability to reuse code and increases its rigidity. Essentially, --from and --mount encode mapping mechanisms that are too simple.
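
The "reshaping" described above can be sketched as follows. The stage names, paths, and file contents are hypothetical and chosen only to show two producers whose output paths do not match what the consumer expects:

```dockerfile
# Two hypothetical producer stages, each exposing its artifact under its own path.
FROM alpine AS producer_a
RUN mkdir -p /out && echo "a" > /out/a.txt

FROM alpine AS producer_b
RUN mkdir -p /build/result && echo "b" > /build/result/b.txt

# The consuming transform expects both inputs under /input. Each COPY --from
# names a single source stage, so reshaping takes one instruction per source
# before the transform itself can run.
FROM alpine AS consumer
COPY --from=producer_a /out/a.txt /input/a.txt
COPY --from=producer_b /build/result/b.txt /input/b.txt
RUN cat /input/a.txt /input/b.txt > /combined.txt
```

Avoiding the two extra COPY instructions would require both producers to publish their artifacts under paths the consumer already expects, which is the coupling in question.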

All the issues above apply to the provided workaround because the mapping mechanisms offered by --from/--mount are inflexible and not declarative when compared to the ones provided by CONTEXT/MOUNT.


macropin avatar macropin commented on August 24, 2024

Came here to complain about lack of global args w/ multistage builds... and WhisperingChaos's critique did not disappoint! 5/5 will subscribe.

