
Comments (15)

tigarmo commented on July 29, 2024

In the short term I think the simplest fix would indeed be to just cache the retrieved Docker Hub image and re-use it until the project is cleaned.


cjdcordeiro commented on July 29, 2024

@cjdcordeiro would it be possible to add a flag to specify the base image, or to allow the base: field to use an FQDN like public.ecr.aws/ubuntu/ubuntu:22.04_stable?

In theory it could, but it would actually require some work, as we want to make sure Rockcraft is absolutely sure of what's being used as a base (i.e. that it is an official Ubuntu image). So some extensive validation would need to take place.

In the short term I think the simplest fix would indeed be to just cache the retrieved Docker Hub image and re-use it until the project is cleaned.

That would probably be the easiest for the time being, although I'm a bit afraid of ending up with outdated ROCKs, because if the base is not refreshed, one can go for weeks without getting the latest security updates...


cjdcordeiro commented on July 29, 2024

I feel like that 100 pulls/6h is only part of the story, because I remember that I hit the limit way before I did 100 pulls.

oh yes 😛 I've had similar problems in the past, and ended up going down the rabbit hole of how docker pull works. Nowadays, however, Docker is a bit more explicit about how it works. In short, when you ask for an image, the client will pull its manifest. If we're talking about a multi-architecture image (which is our case), then it pulls 2 manifests: the image index (with the manifest list) and the actual image manifest. So in reality, Rockcraft might only have 50 pulls/6h/IP.
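
For illustration, here is a minimal sketch of that two-request pattern against Docker Hub's registry API, in Python with the requests library (the image and tag are just examples, and this is not Rockcraft code):

    import requests

    IMAGE = "library/ubuntu"  # example multi-arch image
    TAG = "22.04"

    # Anonymous pull token for the repository.
    token = requests.get(
        "https://auth.docker.io/token",
        params={"service": "registry.docker.io", "scope": f"repository:{IMAGE}:pull"},
    ).json()["token"]

    headers = {
        "Authorization": f"Bearer {token}",
        # Accept both OCI and Docker manifest/index media types.
        "Accept": ", ".join([
            "application/vnd.oci.image.index.v1+json",
            "application/vnd.docker.distribution.manifest.list.v2+json",
            "application/vnd.oci.image.manifest.v1+json",
            "application/vnd.docker.distribution.manifest.v2+json",
        ]),
    }

    # Request 1: the image index (manifest list) -- counted by the rate limiter.
    index = requests.get(
        f"https://registry-1.docker.io/v2/{IMAGE}/manifests/{TAG}", headers=headers
    )
    print("remaining quota:", index.headers.get("ratelimit-remaining"))

    # Pick the amd64/linux entry out of the index.
    digest = next(
        m["digest"]
        for m in index.json()["manifests"]
        if m["platform"]["architecture"] == "amd64" and m["platform"]["os"] == "linux"
    )

    # Request 2: the actual image manifest -- also counted by the rate limiter.
    manifest = requests.get(
        f"https://registry-1.docker.io/v2/{IMAGE}/manifests/{digest}", headers=headers
    )
    print("amd64 manifest digest:", digest, manifest.status_code)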

edit: how about 8 hours so it fits the whole workday ;)

I don't think there's a right number 🤷 Why? Well, this limit could be hit mainly for 2 reasons: 1) you're building the same ROCK a lot 😁, or 2) you're building many ROCKs behind the same IP (which is probably the case for most, when building multiple ROCKs as part of a CI/CD pipeline). If we were only addressing 1), then it would be easy -> 6h / 50 = 1 ROCK build every 7.2 minutes, and that would be your cache interval. But if you have multiple ROCKs being built... they aren't aware of each other, so 🤷

The rule of thumb here, though, should be to keep a low interval: low enough to keep rebuilds fresh with the most recent Ubuntu updates... but NOT so low that an IP still hits the limit when multiple ROCKs are being built. My educated guess -> somewhere between 1h and 3h (allowing for an average of 8 to 25 builds per cache interval).
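
A quick back-of-the-envelope check of those numbers, assuming the ~50 effective pulls/6h/IP figure above (illustrative arithmetic only, not Rockcraft code):

    # Each ROCK needs at most one fresh pull per cache interval (TTL), so this
    # is roughly how many distinct ROCKs one IP can build per interval without
    # being throttled.
    EFFECTIVE_PULLS_PER_6H = 50

    for ttl_hours in (1, 2, 3, 6):
        builds_per_interval = EFFECTIVE_PULLS_PER_6H * ttl_hours / 6
        print(f"TTL {ttl_hours}h -> ~{builds_per_interval:.0f} builds per interval")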

Doing it off time is a little flawed in my opinion, as it could cache the image right before an update is pushed, which wouldn't be refreshed until hours later...

Precisely my point above in #184 (comment). However, IMO this is still the best immediate fix, for a few reasons:

  • quick to implement
  • the official Ubuntu image in DH is only updated once a month... yes, once a month. Even our regular images (in ECR, ACR, etc.) are only updated, at most, twice a day. So this (temporary) 6h-based cache isn't off-putting
  • it is future-proof. I.e., while the 6h timer isn't the best solution, the underlying caching mechanism is desired, and once we agree on a CLI option for something like rockcraft pack --no-cache, we can get rid of the 6h boilerplate.

It would be better if it could compare a checksum or something of the cached image versus what is in Docker Hub to decide whether to pull or not.

The devil is in the details :) To compare image digests, you need the image manifest, so you need to pull it.
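
To make that concrete, here is a minimal sketch of the digest-comparison idea in Python (the cache path is hypothetical, and note that skopeo inspect still has to fetch the manifest to learn the digest, so the check itself spends the same quota):

    import json
    import subprocess

    def remote_digest(image="docker://ubuntu:22.04"):
        # skopeo inspect fetches the manifest (plus the index, for multi-arch
        # images) in order to report the digest, so it counts against the
        # anonymous pull quota just like a real pull would.
        out = subprocess.run(
            ["skopeo", "inspect", image],
            check=True, capture_output=True, text=True,
        ).stdout
        return json.loads(out)["Digest"]

    def cached_digest(path="/root/images/ubuntu/.digest"):
        # Hypothetical file where a previously pulled digest was recorded.
        try:
            with open(path) as f:
                return f.read().strip()
        except FileNotFoundError:
            return None

    if remote_digest() != cached_digest():
        print("cached base is stale (or missing); a full copy is needed")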


twovican commented on July 29, 2024

Having a similar issue; from the logs:

2023-02-07 12:01:20.925 :: 2023-02-07 11:01:20.570 Failed to copy image: Command '['skopeo', '--insecure-policy', '--override-arch', 'amd64', 'copy', 'docker://ubuntu:20.04', 'oci:/root/images/ubuntu:20.04']' returned non-zero exit status 1. (time="2023-02-07T11:01:20Z" level=fatal msg="initializing source docker://ubuntu:20.04: reading manifest 20.04 in docker.io/library/ubuntu: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit")


tigarmo commented on July 29, 2024

I've had that issue happen to me too. I think the current behavior of always pulling from Docker Hub was a stopgap/MVP and we'll move to something else. @sergiusens @cjdcordeiro do you know the plans here?


cjdcordeiro commented on July 29, 2024

No immediate plan, I'm afraid. Ideally, we should have our own store to avoid these 3rd-party dependencies. An alternative would be to use the Ubuntu rootfs tarball instead, but this raises concerns about tracing the underlying image build/digest.


twovican commented on July 29, 2024

@cjdcordeiro would it be possible to add a flag to specify the base image, or to allow the base: field to use an FQDN like public.ecr.aws/ubuntu/ubuntu:22.04_stable?


tigarmo commented on July 29, 2024

I suppose we could add a simple "timestamp" file on the fetched bundle, so that if it's older than, say, 2 weeks we re-fetch?
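
For what it's worth, a minimal sketch of that timestamp idea in Python (the cache path and the TTL value are placeholders, not Rockcraft's actual layout):

    import time
    from pathlib import Path

    # Placeholder cache location and TTL.
    CACHE_DIR = Path.home() / ".cache" / "rockcraft" / "images" / "ubuntu-22.04"
    STAMP = CACHE_DIR / ".fetched-at"
    TTL = 6 * 60 * 60  # seconds; re-fetch if the cached base is older than this

    def base_is_fresh():
        # True if the cached base exists and its timestamp is within the TTL.
        return STAMP.exists() and (time.time() - STAMP.stat().st_mtime) < TTL

    def mark_fetched():
        # Record the time of the last successful fetch from Docker Hub.
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        STAMP.touch()

    if not base_is_fresh():
        # ... run: skopeo copy docker://ubuntu:22.04 oci:<cache dir> ...
        mark_fetched()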


cjdcordeiro commented on July 29, 2024

I suppose we could add a simple "timestamp" file on the fetched bundle, so that if it's older than, say, 2 weeks we re-fetch?

That's a bit too long.

The problem here is related to DH's pull rate limit, which is 100 pulls/6h for anonymous users. So that's the timestamp we can work with, i.e. refresh if older than 6h :)


twovican commented on July 29, 2024

Is that a good way to solve this problem, though? It seems like if Docker Hub decides to change from 6h to 3h we will have to push an update, right?


cjdcordeiro commented on July 29, 2024

This is not a solution but rather a short-term easy fix. Also because we shouldn't be pulling from docker://ubuntu:... as those images are neither built by us nor updated as frequently as the ones on ECR, ACR, etc.

While we come up with a better plan for handling ROCKs' bases, this short-term fix is actually a good thing because:

  1. it overcomes the DH pull rate limit. @twovican if DH changes that to be 100 pulls every 3 hours, that's even better :) the more pulls we're allowed per day, the less likely we are to hit the limit. The opposite is worse: if DH makes it 100 pulls/day, a 6h cache means Rockcraft pulls only about 4 times a day (once every 6 hours), so we're good.
  2. it puts in place mechanisms that we can leverage in the future, like adding pack options to control whether or not to use cached artifacts such as the base


tigarmo commented on July 29, 2024

I feel like that 100 pulls/6h is only part of the story, because I remember that I hit the limit way before I did 100 pulls.
It was at the Engineering Sprint back in November, so maybe the IP was getting shared?
Regardless, I think 6h is probably a fine interval to pick. It'll get around the rate-limiting issue but also speed up the iterative development of rocks day-to-day.
edit: how about 8 hours so it fits the whole workday ;)


jardon commented on July 29, 2024

Doing it off time is a little flawed in my opinion, as it could cache the image right before an update is pushed, which wouldn't be refreshed until hours later; but for development of a rock I likely don't care.

It would be better if it could compare a checksum or something of the cached image versus what is in Docker Hub to decide whether to pull or not.

But at least doing it based on time would unblock local development even if the method is not ideal. I won't complain as long as I no longer hit my pull rate limit.


sergiusens commented on July 29, 2024

We don't need to pull from a registry or hub; we could pull from the source too. I would rather remove this dependency.


sergiusens commented on July 29, 2024

To improve the experience, we would need to focus on our reusability story in craft-providers; this all happens inside the LXD or Multipass environment, so we want some form of global cache.

