Comments (20)

runcom commented on July 4, 2024

Makes sense, but right now we already have https://github.com/vbatts/docker-utils#dockertarsum; could you have a look at it?

from skopeo.

rhatdan commented on July 4, 2024

@vbatts PTAL

giuseppe commented on July 4, 2024

How is the sha256 checksum computed for each layer? I have tried both:

docker save busybox | dockertarsum

and

docker save busybox | tar xO d51a083a3b01fe8c58086903595b91fc975de59a9e9ececec755df384a181026/layer.tar | dockertarsum

but I still don't get the same hash that I see in the v2 registry image manifest (under fsLayers).

giuseppe commented on July 4, 2024

I got a bit confused by all the different IDs around, so this feature is probably not needed in Skopeo.

My idea was to skip downloading layers that we have already imported into the OSTree repository from a docker save'd tarball, e.g.:

docker save -o busybox.tar busybox
atomic pull --tar=busybox.tar
atomic pull --docker busybox.tar # Does a pull from the registry

The second pull would not download anything. To do that, though, I would have to map each imported layer to the same sha256 used by fsLayers.

It seems the sha256 listed under fsLayers is just the sha256sum of the binary blob you download from the v2 registry. Given that the blob is compressed, I don't know whether it is possible to reproduce it starting from a docker save'd image.
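A minimal Python sketch of that observation, assuming the blob is a plain gzip stream (make_tar and sha256_digest are illustrative helpers, and the one-file archive only stands in for a real layer): hashing the compressed blob and hashing the tar inside it give two different digests.

```python
import gzip
import hashlib
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build a small uncompressed tar archive in memory (stand-in for a layer.tar)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)  # TarInfo defaults to mtime=0, uid/gid 0
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256_digest(data: bytes) -> str:
    """Return a digest string in the "sha256:<hex>" form used by registries."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


layer_tar = make_tar({"etc/hello.txt": b"hello\n"})
blob = gzip.compress(layer_tar, mtime=0)  # roughly what a registry stores and serves

print("compressed blob :", sha256_digest(blob))
print("uncompressed tar:", sha256_digest(layer_tar))
```

Decompressing the blob gives back the tar byte for byte, but the manifest digest corresponds to the compressed bytes, so reproducing it from the tar alone means reproducing the exact compressed stream.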

Is there a way to do that, or any plan to have a v2-like layout for docker save'd images?

rhatdan commented on July 4, 2024

@vbatts ^^

cgwalters commented on July 4, 2024

We'd need to look at how the Docker Engine handles this now. The core problem that @vbatts was trying to solve with tarsum was similar to pristine-tar: being able to regenerate a tarball with the same checksum from on-disk content.

But the Docker people just went with straight sha256, so the Engine must do something here. One thing I had suggested was that, if you were re-uploading a layer, the Engine would keep track of the registry it was downloaded from and, rather than re-synthesizing it on the client, just re-fetch it from the server. Was something like that implemented?

cgwalters commented on July 4, 2024

That said, do we actually need the ability to do docker save and have it avoid redownloads for anything?

Oh actually, this gets into the whole "docker save doesn't do v2" problem which the OSBS people hit...I don't have a link handy.

runcom commented on July 4, 2024

@giuseppe the sha256 stored in the manifest is just the sha256sum of the downloaded layer's tar file. For example, I downloaded layer a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar from docker.io/busybox:

sha256sum a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar

If I add a random file to that tar, I get a different sha256 sum:

sha256sum a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar
4589eafb71a90c4acbd360e68de3517b5d7844dbaaf5fb3ca9fe841f3ae1e754  a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar
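runcom's check can be sketched in Python (verify_blob and build_tar are hypothetical helpers, and the in-memory tars stand in for the real layer file): a content-addressed blob is accepted only when the sha256 of its bytes equals the hex name it was stored under, so adding any file to the archive changes the digest and the check fails.

```python
import hashlib
import io
import tarfile


def build_tar(files: dict[str, bytes]) -> bytes:
    """Create an uncompressed tar in memory; TarInfo defaults keep metadata fixed."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def verify_blob(data: bytes, expected_hex: str) -> bool:
    """Hypothetical helper: True only if sha256(data) equals the expected hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


original = build_tar({"bin/hello.sh": b"#!/bin/sh\necho hello\n"})
name = hashlib.sha256(original).hexdigest()  # the blob's content-addressed file name

# Same archive plus one extra ("random") file: different bytes, different digest.
modified = build_tar({"bin/hello.sh": b"#!/bin/sh\necho hello\n", "random.txt": b"surprise\n"})

print(verify_blob(original, name))  # True
print(verify_blob(modified, name))  # False
```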

runcom commented on July 4, 2024

> Oh actually, this gets into the whole "docker save doesn't do v2" problem which the OSBS people hit...I don't have a link handy.

This is one of the issues, though.

giuseppe commented on July 4, 2024

@runcom, yes, I realized that afterwards. Probably the reason for calculating the sha256 over the compressed blob (so it can be checked before decompression) was to reduce the number of operations done on unverified data; but now there is the issue of how to get the same sha256 given the same tarball.
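The verify-first ordering described here can be sketched in Python, assuming gzip-compressed blobs addressed as "sha256:<hex>" (open_layer_blob and DigestMismatch are hypothetical names): the hash is checked over the compressed bytes, and the decompressor only ever sees data that already passed the check.

```python
import gzip
import hashlib


class DigestMismatch(Exception):
    """Raised when a blob does not match the digest it was requested by."""


def open_layer_blob(blob: bytes, expected_digest: str) -> bytes:
    """Verify a compressed blob against "sha256:<hex>", then decompress it."""
    algo, _, expected_hex = expected_digest.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo!r}")
    actual_hex = hashlib.sha256(blob).hexdigest()  # hash the compressed bytes
    if actual_hex != expected_hex:
        raise DigestMismatch(f"expected {expected_hex}, got {actual_hex}")
    return gzip.decompress(blob)  # only reached for verified data


payload = b"layer contents"
blob = gzip.compress(payload, mtime=0)
digest = "sha256:" + hashlib.sha256(blob).hexdigest()

print(open_layer_blob(blob, digest) == payload)  # True
try:
    open_layer_blob(blob + b"\x00", digest)  # tampered: one extra byte
except DigestMismatch as err:
    print("rejected:", err)
```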

@cgwalters yes, I agree that it is not a blocking issue; the only additional cost is re-downloading the image if the two methods are used together (importing a docker save'd image and then fetching it from a registry).

cgwalters commented on July 4, 2024

BTW, the discussion of compression, checksums, and reproducibility mirrors the rationale for https://github.com/cgwalters/git-evtag (and we should probably have an OSTree equivalent).

vbatts commented on July 4, 2024

tarsum is not used.

vbatts commented on July 4, 2024

I'm trying to figure out what issue we're solving here. Does something still need attention?

giuseppe commented on July 4, 2024

@vbatts The issue is that the checksums for layers imported from a docker save'd tarball are different from those shown in the manifest file (under fsLayers). If we first import into the OSTree repository from a tarball and later fetch the same image from the registry, we will need to download each layer again, even though it is the same image at the same version.

Is there a way to get the same sha256 checksums listed under fsLayers from the image.tar file obtained with docker pull IMAGE && docker save -o image.tar IMAGE?

runcom commented on July 4, 2024

We could try to convert the manifest to the docker save format and generate the hash (I can probably take a look at it). Vincent, is it feasible this way?

vbatts commented on July 4, 2024

The checksum of the blob in fsLayers is of the layer.tar itself, i.e.:

vbatts@f ~ (master) $ docker pull fedora:latest
latest: Pulling from library/fedora
a3ed95caeb02: Already exists
236608c7b546: Pull complete 
Digest: sha256:1fa98be10c550ffabde65246ed2df16be28dc896d6e370dab56b98460bd27823
Status: Downloaded newer image for fedora:latest
vbatts@f ~ (master) $ docker save fedora:latest | tar -tv
drwxr-xr-x 0/0               0 2016-03-04 18:40 768d4f50f65f00831244703e57f64134771289e3de919a576441c9140e037ea2/  
-rw-r--r-- 0/0               3 2016-03-04 18:40 768d4f50f65f00831244703e57f64134771289e3de919a576441c9140e037ea2/VERSION  
-rw-r--r-- 0/0             388 2016-03-04 18:40 768d4f50f65f00831244703e57f64134771289e3de919a576441c9140e037ea2/json  
-rw-r--r-- 0/0            1024 2016-03-04 18:40 768d4f50f65f00831244703e57f64134771289e3de919a576441c9140e037ea2/layer.tar  
drwxr-xr-x 0/0               0 2016-03-04 18:40 9a233237d70560774705931fc55fe1a3a4619cccf2d0a76671256080c2af6fdb/  
-rw-r--r-- 0/0               3 2016-03-04 18:40 9a233237d70560774705931fc55fe1a3a4619cccf2d0a76671256080c2af6fdb/VERSION  
-rw-r--r-- 0/0            1195 2016-03-04 18:40 9a233237d70560774705931fc55fe1a3a4619cccf2d0a76671256080c2af6fdb/json  
-rw-r--r-- 0/0       212476928 2016-03-04 18:40 9a233237d70560774705931fc55fe1a3a4619cccf2d0a76671256080c2af6fdb/layer.tar  
-rw-r--r-- 0/0            1667 2016-03-04 18:40 ddd5c9c1d0f2a08c5d53958a2590495d4f8a6166e2c1331380178af425ac9f3c.json
-rw-r--r-- 0/0             279 1970-01-01 00:00 manifest.json  
-rw-r--r-- 0/0              89 1970-01-01 00:00 repositories
vbatts@f ~ (master) $ docker save fedora:latest | tar xO 768d4f50f65f00831244703e57f64134771289e3de919a576441c9140e037ea2/layer.tar | sha256sum
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  -
vbatts@f ~ (master) $ docker save fedora:latest | tar xO 9a233237d70560774705931fc55fe1a3a4619cccf2d0a76671256080c2af6fdb/layer.tar | sha256sum
4f9e31a2233f97f1fe18c26d44effd10a5ea3d9839299cf88003c85aea75391c  -
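The two tar xO ... | sha256sum pipelines above can be sketched in Python to hash every layer.tar in one pass. This assumes the legacy docker save layout shown in the listing (<id>/layer.tar entries), and layer_digests is an illustrative name; as the thread goes on to note, these are digests of uncompressed tars, so they are not expected to match the fsLayers blob digests.

```python
import hashlib
import tarfile
from typing import BinaryIO


def layer_digests(saved: BinaryIO) -> dict[str, str]:
    """Map each "<id>/layer.tar" entry of a docker-save archive to its sha256 hex digest.

    The outer archive is read as a stream (mode "r|"), and each layer.tar is
    hashed in 1 MiB chunks, so large images are never loaded into memory.
    """
    digests: dict[str, str] = {}
    with tarfile.open(fileobj=saved, mode="r|") as outer:
        for member in outer:
            if not (member.isfile() and member.name.endswith("/layer.tar")):
                continue  # skip VERSION, json, manifest.json, repositories, ...
            h = hashlib.sha256()
            f = outer.extractfile(member)
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
            digests[member.name] = h.hexdigest()
    return digests
```

For example, passing the file object from open("image.tar", "rb") to layer_digests should return the same hex values that the sha256sum pipelines print, keyed by entry name.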

Though, now that I'm thinking about it, they do compress it before pushing. While gzip (deflate) is deterministic, its output still varies by implementation. To get the same stream, you'd have to use golang's 'compress/gzip' library with the same compression level that the Docker engine uses. Making an executable for that would only take a couple of lines.
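A Python illustration of that point, with one caveat: Python's gzip module is backed by zlib, which is a different deflate implementation from golang's 'compress/gzip', so this cannot reproduce the Docker engine's bytes and only shows the general behaviour. With the header timestamp pinned (mtime=0), compressing the same content twice at the same level gives identical digests, while another level yields a different blob for the same content. blob_digest is an illustrative helper.

```python
import gzip
import hashlib


def blob_digest(tar_bytes: bytes, level: int) -> str:
    """Compress with a fixed header mtime and return the sha256 of the gzip stream."""
    blob = gzip.compress(tar_bytes, compresslevel=level, mtime=0)
    return hashlib.sha256(blob).hexdigest()


content = b"usr/bin/example\n" * 4096  # stand-in for an uncompressed layer.tar

same_a = blob_digest(content, level=6)
same_b = blob_digest(content, level=6)
stored = blob_digest(content, level=0)  # level 0: deflate "stored" blocks, no compression

print(same_a == same_b)  # True: same implementation and level, same digest
print(same_a == stored)  # False: different level, different blob digest
```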

giuseppe commented on July 4, 2024

Thanks for confirming. I am going to add a helper process to atomic that can be used to compute the sha256 for the tarballs.

I am going to close this issue, as it is not related to Skopeo anyway.

cgwalters commented on July 4, 2024

Ugh, so everything (including the Docker engine) that interacts with Docker images now needs to use a specific implementation of compress/flate forever? And if anyone improves the implementation in golang, then every project will have to carry a forked copy of the current version.

vbatts commented on July 4, 2024

@cgwalters compression is the bane of a lot of this. Golang's compress/gzip is consistent with itself (for each compression level), just as GNU gzip is (given the --no-name flag). But there are numerous implementations of the deflate algorithm, and non-trivial pitfalls in unpacking and repacking content even with the same implementation.
So yes, many of the content-addressability assumptions that folks are depending on rely on a specific setup of golang's compress/gzip.
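The --no-name remark concerns the gzip header rather than the deflate data: by default GNU gzip stores the original file name and modification time in the header, so identical content can still produce different blobs. A Python sketch of the same effect (zlib-backed, so again not golang's implementation) varies only the header timestamp:

```python
import gzip
import hashlib

data = b"identical layer content\n" * 100

# Same payload and level; only the timestamp written into the gzip header differs.
blob_stamped = gzip.compress(data, compresslevel=9, mtime=1457116800)  # arbitrary timestamp
blob_zeroed = gzip.compress(data, compresslevel=9, mtime=0)

same_payload = gzip.decompress(blob_stamped) == gzip.decompress(blob_zeroed)
same_digest = hashlib.sha256(blob_stamped).digest() == hashlib.sha256(blob_zeroed).digest()

print(same_payload)  # True: the decompressed bytes are identical
print(same_digest)   # False: header bytes differ, so the blob digests differ
```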

vbatts commented on July 4, 2024

@cgwalters further, this is my big argument for addressing based on the digest of the uncompressed tar, even though doing so invalidates many APIs that deal with digests of opaque blobs, and runs against the desire not to transport huge uncompressed archives.
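A minimal Python sketch of that trade-off, assuming gzip-compressed blobs (blob_digest and uncompressed_digest are illustrative names): two differently compressed copies of the same tar have different blob digests, while hashing the decompressed stream gives one stable identifier.

```python
import gzip
import hashlib
import zlib

CHUNK = 64 * 1024


def blob_digest(blob: bytes) -> str:
    """sha256 of the compressed bytes: how a registry addresses an opaque blob."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def uncompressed_digest(blob: bytes) -> str:
    """sha256 of the decompressed tar stream, computed chunk by chunk."""
    h = hashlib.sha256()
    d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    for i in range(0, len(blob), CHUNK):
        h.update(d.decompress(blob[i:i + CHUNK]))
    h.update(d.flush())
    return "sha256:" + h.hexdigest()


tar_bytes = b"pretend this is layer.tar\n" * 1000
stored = gzip.compress(tar_bytes, compresslevel=0, mtime=0)  # deflate "stored" blocks
best = gzip.compress(tar_bytes, compresslevel=9, mtime=0)    # maximum compression

print(blob_digest(stored) == blob_digest(best))                  # False
print(uncompressed_digest(stored) == uncompressed_digest(best))  # True
```

Note that the uncompressed digest can only be checked after decompressing, which is exactly the work on unverified data that giuseppe mentions earlier in the thread.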
