Giter Club home page Giter Club logo

Comments (10)

ialidzhikov avatar ialidzhikov commented on June 6, 2024 1

Hi @milosgajdos 👋

I tried to reach over Slack but let me also try to reach here.

In the context of this issue, I was wondering how deletion of an image works in the normal registry cache case. I read this morning about garbage collection binary (https://distribution.github.io/distribution/about/garbage-collection/). Do I understand right that delete requests against a normal registry (not a pull through cache) actually only remove the corresponding manifests/layers (which are actually only references to the blobs) and do NOT delete blobs. According to my understanding blobs seems to be deleted only by the garbage collection binary. Can you confirm this? I see that the GC binary only deletes blob that are no longer referenced.
So, I was wondering whether we could reuse the same approach for the proxy. I mean to adapt the proxy to do not delete blobs directly but to run the GC binary (as sidecar or as cronjob) to delete the blobs. WDYT?

The part that deletes the blob in the context of the above issue is

if err != nil {
return err
}
. Another approach would be to somehow execute this v.RemoveBlob call after we make sure that the corresponding digest is not referenced by any other manifest in the repository (or in the scheduler-state.json file).


Another approach would be to somehow execute this v.RemoveBlob call after we make sure that the corresponding digest is not referenced by any other manifest in the repository (or in the scheduler-state.json file).

@milosgajdos do you know of a good way how such check can be performed? I was thinking about listing all manifests in the repo, then all of their references. But I am unsure about this approach as meanwhile a new image can be added that references the blob that just gets deleted.

from distribution.

ialidzhikov avatar ialidzhikov commented on June 6, 2024

cc @milosgajdos @thaJeztah

from distribution.

milosgajdos avatar milosgajdos commented on June 6, 2024

Yeah, the proxy hasnt received my effort. We might need to remove it from v4 completely unless there is some sizeable effort the community is willing to put into it. If it were up to me I'd remove it from v3 as well, just like we removed the client that has received similar amount of attention.

from distribution.

ialidzhikov avatar ialidzhikov commented on June 6, 2024

Do I understand right that delete requests against a normal registry (not a pull through cache) actually only remove the corresponding manifests/layers (which are actually only references to the blobs) and do NOT delete blobs. According to my understanding blobs seems to be deleted only by the garbage collection binary. Can you confirm this?

As far as I see from the code, deleting a layer (DELETE /v2/<name>/blobs/<digest>) only removes the layer but does NOT delete the blob. The blobHandler calls blobs.Delete

err := blobs.Delete(bh, bh.Digest)
. The corresponding interface implementation is
func (lbs *linkedBlobStore) Delete(ctx context.Context, dgst digest.Digest) error {
if !lbs.deleteEnabled {
return distribution.ErrUnsupported
}
// Ensure the blob is available for deletion
_, err := lbs.blobAccessController.Stat(ctx, dgst)
if err != nil {
return err
}
err = lbs.blobAccessController.Clear(ctx, dgst)
if err != nil {
return err
}
return nil
}
. OnBlobExpire of the proxyRegistry does this:
// Clear the repository reference and descriptor caches
err = blobs.Delete(ctx, r.Digest())
if err != nil {
return err
}
err = v.RemoveBlob(r.Digest().String())
if err != nil {
return err
}
. Hence it also calls blobs.Delete. When I debugged this, blobs.Delete only deletes the layer/the reference. The v.RemoveBlob call is the actual call that deletes the blob.

When I also check for usages of vacuum.Delete, it is only called by the garbage collector (

err = vacuum.RemoveBlob(string(dgst))
) and from the OnBlobExpire of the proxyRegistry.

If some of the maintainers can confirm my observations, then that would be great.


If all of the above is true, then IMO the best option would be to require the garbage collector to be ran as sidecar or cronjob. I suggest to drop

err = v.RemoveBlob(r.Digest().String())
if err != nil {
return err
}
from OnBlobExpire and let the garbage collector delete unreferenced blobs.
WDYT?

cc @thaJeztah @Jamstah (I am tagging other folks in case @milosgajdos is OOO)

from distribution.

ialidzhikov avatar ialidzhikov commented on June 6, 2024

According to my testing, the proxy does not delete blobs for manifests. The corresponding handling is:

s.OnManifestExpire(func(ref reference.Reference) error {
var r reference.Canonical
var ok bool
if r, ok = ref.(reference.Canonical); !ok {
return fmt.Errorf("unexpected reference type : %T", ref)
}
repo, err := registry.Repository(ctx, r)
if err != nil {
return err
}
manifests, err := repo.Manifests(ctx)
if err != nil {
return err
}
err = manifests.Delete(ctx, r.Digest())
if err != nil {
return err
}
return nil
})

I pulled alpine:3.19.0 via the pull through cache. I took a snapshot of the file system with tree. Then I adapted the scheduler-state.json file to expire all blobs/manifest. Below I attach the diff of the file system before and after expiration of blobs and manifests:

 .
 ├── docker
 │   └── registry
 │       └── v2
 │           ├── blobs
 │           │   └── sha256
 │           │       ├── 1d
-│           │       │   └── 1dc785547989b0db1c3cd9949c57574393e69bea98bfe044b0588e24721aa402
-│           │       │       └── data
 │           │       ├── 51
 │           │       │   └── 51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48
 │           │       │       └── data
 │           │       ├── a7
 │           │       │   └── a70bcfbd89c9620d4085f6bc2a3e2eef32e8f3cdf5a90e35a1f95dcbd7f71548
 │           │       │       └── data
 │           │       └── c3
-│           │           └── c303524923177661067f7eb378c3dd5277088c2676ebd1cd78e68397bb80fdbf
-│           │               └── data
 │           └── repositories
 │               └── library
 │                   └── alpine
 │                       ├── _layers
 │                       │   └── sha256
 │                       │       ├── 1dc785547989b0db1c3cd9949c57574393e69bea98bfe044b0588e24721aa402
-│                       │       │   └── link
 │                       │       └── c303524923177661067f7eb378c3dd5277088c2676ebd1cd78e68397bb80fdbf
-│                       │           └── link
 │                       ├── _manifests
 │                       │   ├── revisions
 │                       │   │   └── sha256
 │                       │   │       ├── 51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48
-│                       │   │       │   └── link
 │                       │   │       └── a70bcfbd89c9620d4085f6bc2a3e2eef32e8f3cdf5a90e35a1f95dcbd7f71548
-│                       │   │           └── link
 │                       │   └── tags
 │                       │       └── 3.19.0
 │                       │           ├── current
 │                       │           │   └── link
 │                       │           └── index
 │                       │               └── sha256
 │                       │                   └── 51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48
 │                       │                       └── link
 │                       └── _uploads
 └── scheduler-state.json

-33 directories, 11 files
+31 directories, 5 files

You can see that blobs for manifests are not removed. I don't know if it is expected (implemented like this on purpose) or not.

So currently there is already a discrepancy how blobs and manifests are deleted. Blobs are deleted by removing the soft link and the blob content. Manifest are deleted by removing the soft link only.

Requiring the garbage-collect to the proxy would also solve the above problem.

from distribution.

ialidzhikov avatar ialidzhikov commented on June 6, 2024

There is the following note about the garbage collection which is a little bit worrying:

Note: You should ensure that the registry is in read-only mode or not running at all. If you were to upload an image while garbage collection is running, there is the risk that the image’s layers are mistakenly deleted leading to a corrupted image.

I am not sure if this constraint is applicable for the proxy. 🤔 WDYT? Is it?

from distribution.

milosgajdos avatar milosgajdos commented on June 6, 2024

I'm on sabbatical for a couple of months -- I only occasionally check my GH notifications on the phone and do what's only possible via the GH app. There are a lot of other maintainers who can pick this up.

There is the following note about the garbage collection which is a little bit worrying

Why is this worrying? This makes perfect sense: take GC-collected languages -- there is a very tiny "stop-the-world" moment when the unreferenced memory is cleaned up. Until that's done nothing can mess about with the memory that's about to be cleaned; the key is to avoid race conditions.

I should also let you know that the GC as it is in this code is in a dire state - it was a cute OG attempt but it doesn't scale and has a boat load of bugs in it. Many orgs have their own in-house registry GC solutions for that very reason.

from distribution.

ialidzhikov avatar ialidzhikov commented on June 6, 2024

Why is this worrying? This makes perfect sense: take GC-collected languages -- there is a very tiny "stop-the-world" moment when the unreferenced memory is cleaned up. Until that's done nothing can mess about with the memory that's about to be cleaned; the key is to avoid race conditions.

My first idea to tackle the issue was to follow what it is followed in the normal registry case. I asked many times above whether a maintainer can confirm my observations. Again, as far I see for a normal registry, you cannot delete a blob directly, you can only delete the image layers (references to blobs) and the actual deletion of blobs happens via the GC. Can you confirm this?
My initial idea was to run the GC as a cronjob - for example once every day. But it is now getting worse with the requirement the registry to be in readonly mode. I am not sure if proxy at all respects the corresponding setting. I think for our use-case it is not okay to put the registry in readonly mode everyday.

I'm on sabbatical for a couple of months. [...] There are a lot of other maintainers who can pick this up.

Ok, enjoy. Who from the maintainers can support me with this issue?

I should also let you know that the GC as it is in this code is in a dire state - it was a cute OG attempt but it doesn't scale and has a boat load of bugs in it. Many orgs have their own in-house registry GC solutions for that very reason.

:)


I was also thinking that we see occurrences of this issue mainly due to the hard-coded TTL in v2.8.3. As you know, it is hard-coded to 7d in v2.8.3. In v3.0.0-alpha.1 it is configurable and we could mitigate the number of occurrences of this issue by setting a higher TTL (for example 90d). Configurable TTL also means increased proxy effectiveness. Right now, with 7d its effectiveness is low because it deletes its blobs every 7d.
But we again come to the question with the releases for distribution/distribution. I asked many times in #4156 for a GA release. I also asked this morning in Slack: https://cloud-native.slack.com/archives/C01GVR8SY4R/p1705303309837879
The open PR for cherry-picking configurable TTL to release/2.8 was closed in favor of release v3.0.0: #4090 (comment)
Can someone cut v3.0.0 so that we could mitigate this issue?

from distribution.

ialidzhikov avatar ialidzhikov commented on June 6, 2024

I see that there is #2367 which is the same issue as this one.

from distribution.

ialidzhikov avatar ialidzhikov commented on June 6, 2024

The mitigation #2367 (comment) works for us. I will close this issue as duplicate of #2367.

from distribution.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.