Comments (20)

iskunk commented on August 28, 2024

@iskunk Would it be possible to subscribe to the debian-security mailing list as a means of checking if stable has a new release? I'm not sure how different that would be in terms of mail volume from the main Debian tracker.

There's no need---the tracker subscription tells you about new uploads to stable-security as well as unstable. Here is the acceptance message for 127.0.6533.88-1~deb12u1 from Wednesday, for example.

If it's stable release builds you want, you'll get 'em. This automation will cover at least stable and unstable, as well as oldstable once chromium is being maintained for it again.

A word on progress: I have something working almost all the way, despite all the obstacles GitHub and OBS impose. The lack of a straightforward persistent disk volume on GitHub has been especially time-consuming, since Caches are a poor fit for the job and rewriting everything to use them added a lot of gratuitous complexity.

networkException commented on August 28, 2024

Can we aim for that kind of hybrid approach, rather than polling alone?

sure, I don't want to block the automation on this discussion

iskunk commented on August 28, 2024

Could you somehow use workflow artifacts for that?

They're an even worse fit. Artifacts are great for when your workflow has some specifically associated output, like a compiled binary or log file. (I'm using them for the .dsc and .debian.tar.xz output files.) That's what the mechanism was made for. But here, what I need for this job is some state that persists across workflow runs, specifically:

  1. A small text file recording which package versions have been processed already;
  2. A file-download cache, including the >900 MB orig source tarballs (since we really don't want to be downloading those more than once).

With artifacts or caches, the first thing you have to do is determine the specific artifact/cache that is the latest one (you don't care about older versions, after all). With caches, you have at least an easy way of doing this, using partial key matching; with artifacts, you have to run a query through the workflow runs. (Also, artifacts being directly associated with specific workflow runs is not helpful.)

Then you have to download the artifact/cache, read/write to your files as needed, then upload a new version of the cache or save a new artifact. Oh, and then you have to expire old versions of these artifacts/caches, lest you run over your project disk quota.

Compare all this song and dance to just having your project disk space available in some persistent volume mounted on the runner. You read, you write, you're done. The problem is fundamentally that artifacts and caches were designed for specific scenarios, and what I need to do is a different scenario that they don't address. And making my usage fit their model is incredibly awkward.
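
To illustrate, the cache dance boils down to something like the following rough sketch (not my actual workflow; the paths, key names, and helper script are placeholders):

    # Restore the newest saved state via partial key matching, update it,
    # then save it back under a fresh key.
    - name: Restore previous state
      uses: actions/cache/restore@v4
      with:
        path: .state
        key: debian-monitor-${{ github.run_id }}
        restore-keys: |
          debian-monitor-
    - name: Read and update the persistent state
      run: ./scripts/check-new-versions.sh .state/processed-versions.txt
    - name: Save new state under a fresh key
      uses: actions/cache/save@v4
      with:
        path: .state
        key: debian-monitor-${{ github.run_id }}

And each run leaves yet another cache entry behind, hence the separate expiry step mentioned above.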

PF4Public commented on August 28, 2024

@iskunk Your proposal is rather convoluted. Would it really be easy on the maintenance side? As far as I remember, Debian has always been behind in updating to the latest Chromium; perhaps the version numbers might not align perfectly. All this might bring maintenance nightmares :( Have you attempted a kind of toy implementation yourself?

iskunk commented on August 28, 2024

Hi @PF4Public,

Note that I have had all of this running since about February. The differences are that (1) it's running on my own server, (2) it handles more than just Chromium, and (3) it provides packages (including u-c) for XtraDeb. The point of this proposal is to set up a similar process for u-c here on a public runner, feeding the OBS builds.

Debian's releases do lag Google's by a bit, but that's unavoidable. I have not seen any issue with version misalignments... Debian obviously packages the Linux release rather than the Windows one, but u-c has always put out a tag matching the Linux release, so there hasn't been a problem. I do notice that sometimes, the Debian package comes out before the u-c tag, and more often it's the other way around, so the automation obviously needs to handle both scenarios. But that's not hard to deal with, when you know to expect it.

As for maintenance work, >90% of it has been in a step not even covered here: the Debian-to-Ubuntu conversion. That's non-trivial, and it often breaks with new major versions. But that's not in scope of the basic automation proposal here, because it's targeting Debian alone. Ubuntu support can easily be added later.

That aside, there hasn't been much. Sometimes, the u-c-debian/convert/ framework needs to be updated because of a conflict between the Debian and u-c patches, but that has come up very occasionally (last times were for 121.0.6167.85 and 119.0.6045.105, and we're now at 123.0.6312.105). Sometimes, the Debian tracker notification comes in corrupted, and fails the PGP check, but I have an infrequent polling check to cover any sort of e-mail failure. Sometimes, there's a bug in my automation scripts, but as time goes on, the number of those converges asymptotically to zero.

If you feel the above is more complicated than it needs to be, I'm open to counter-proposals, of course. But two important qualities that I feel should be retained are

  • That the system be fully automated, so that in the common/ideal case, u-c releases can go out the door without any human intervention. This not only reduces the maintainer burden, but allows new versions to be provided with as little turnaround time as possible. (This naturally implies having strong authentication in the early steps, to prevent a bad actor from fooling the automation into serving their ends. The security must be no worse than Debian's.)
  • Polling should be used as sparingly as possible. Not only does polling for a new release imply a worst-case latency of up to the polling period, it also uses the resources of a public project in a manner that can be avoided, and is thus not a friendly behavior if done frequently.

Also, separately from the discussion of how this should be implemented, I currently don't have a way of prototyping it. Is there a way that my fork of the repo can get access to a GitHub runner? (The GH docs indicate that self-hosted runners should only be used with private repos, since otherwise a malicious actor who creates a fork can run bad stuff on the self-hosted runner.)

PF4Public commented on August 28, 2024

If you feel the above is more complicated than it needs to be, I'm open to counter-proposals, of course.

Sorry if I sounded that way; I didn't mean it. Given that we don't have someone who permanently maintains Debian, your automation looks promising. You did a great job describing your idea, but one thing is not entirely clear to me: what does it mean for ungoogled-chromium in terms of the actions needed? Could you please give a quick overview, step by step, of what needs to be done (in this repository in particular) in order to implement your idea? I'm not very familiar with OBS; perhaps that is why your idea is not obvious to me.

iskunk commented on August 28, 2024

Thanks, I'm not a fan of making things any more complex than they need to be. And it's especially important here, since transparency is ultimately the whole point.

I was told that GitHub-hosted runners are freely available, and I checked the "Actions" section of my fork of the repo again. Maybe it wasn't there before, maybe I overlooked it, but I'm now seeing a "GitHub-hosted runners" tab. So that much unblocks my work for now; I can actually start putting together a prototype of this.

The other thing I need for now is just feedback on the proposal. In the absence of that, I can only presume that everyone's OK with the approach described above. This is the time for any objections and/or course corrections at the design level.

Once the prototype is complete, it should be a matter of (1) submitting the new scripts as a PR, and (2) having the u-c principals create and configure their own ProxiedMail account (or whatever service ends up getting used). Once both are in, everything should (in theory) be armed and ready for automated uploads.

PF4Public commented on August 28, 2024

I can actually start putting together a prototype of this.

That'd be great!

The other thing I need for now is just feedback on the proposal. In the absence of that, I can only presume that everyone's OK with the approach described above. This is the time for any objections and/or course corrections at the design level.

Exactly with that in mind, it would be helpful to see an overview of your proposed workflow. It would make it easier to identify all the moving parts.

having the u-c principals create and configure their own ProxiedMail account (or whatever service ends up getting used)

I have my concerns here. Is there any other way of triggering this, apart from using mail? It would be much easier if your workflow could be adapted to run on the existing infrastructure (GitHub Actions, OBS) without involving anything extra. Perhaps a polling GitHub action could be an acceptable substitute for a mail service?

iskunk commented on August 28, 2024

Exactly with that in mind, it would be helpful to see an overview of your proposed workflow. It would make it easier to identify all the moving parts.

The overview is up there; the implementation is just going to be an elaboration of that.

I have my concerns here. Is there any other way of triggering this, apart from using mail?

Debian does not provide any means of notification of new package uploads (that I'm aware of) apart from their tracker. It would be nice if they offered e.g. a webhook service.

Perhaps a polling GitHub action could be an acceptable substitute for a mail service?

I have polling as a fallback mechanism, for when the e-mail notification fails. But that would be something like every four hours. I could poll Debian more frequently, but it would be abusive to poll them, say, every five minutes. Anywhere in that range, you have a tension between (1) checking more frequently, reducing average turnaround time on updates, but using more resources, and (2) checking less frequently, being more "polite" and parsimonious, but updates (including urgent security updates) take longer to go out.

Is it better to deal with that tradeoff than to add a new element of infrastructure?

Bear in mind, this has worked remarkably well in my production setup. For example, Debian recently released version 123.0.6312.105-2 of their sid/unstable Chromium package. I received the source-accepted e-mail from their tracker this past Sunday at 18:47 UTC. My automation kicked in, uploaded the Ubuntu-converted package to Launchpad, and I got the e-mail notice of Ubuntu's package acceptance (which comes in well after the upload is complete) at 19:27 UTC.

Note that, for non-security uploads, Debian's builds start after the acceptance notice goes out (I download the source package from their incoming server that is used by the build nodes, not the regular package repos). Meaning, my package started building while Debian's builds were in progress. That's the kind of rapid response that I aspired to, and that I am advocating here.
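
For illustration, that fetch amounts to little more than this (a rough sketch; the exact path layout under incoming.debian.org is from memory, so treat it as an assumption):

    # Download the just-accepted source package (.dsc plus the files it
    # references) from the incoming area used by the buildds, rather than
    # waiting for it to reach the regular mirrors.
    VERSION=123.0.6312.105-2
    dget "https://incoming.debian.org/debian-buildd/pool/main/c/chromium/chromium_${VERSION}.dsc"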

(It doesn't always happen, of course, but that's generally down to the Debian-to-Ubuntu conversion breaking. Then I have to go in and fix it, and sometimes that takes more effort/time than others. It's a contingency for which no automated solution exists.)

PF4Public commented on August 28, 2024

Is it better to deal with that tradeoff than to add a new element of infrastructure?

I'd say "yes". We already poll googleapis in main repo every hour without significant hurdles: https://github.com/ungoogled-software/ungoogled-chromium/blob/master/.github/workflows/new_version_check.yml

That's the kind of rapid response that I aspired to

That's understandable, but a much longer delay would also be acceptable here.

iskunk commented on August 28, 2024

Polling Google is a whole different ballgame. Every five minutes would be fine then. It's an issue with Debian because they're a shoestring operation.

I can leave out the e-mail portion, but I don't understand the reluctance to make such an easy optimization.

PF4Public commented on August 28, 2024

I don't understand the reluctance to make such an easy optimization.

Mainly the bus factor that arises from the need to register at some third-party service.

iskunk commented on August 28, 2024

Mainly the bus factor that arises from the need to register at some third-party service.

Nothing requires that one person hang onto the login credentials. They could be stored as a secret, or maybe in some kind of private area accessible only to org members. (I'm not clear on what GitHub provides in this vein.)

Anyway, I've been digging into the GitHub Workflows docs, and experimenting. (This is something I need to learn anyway for work, so it's a useful exercise.) There are a few problems with the polling approach particular to GitHub:

  • From their doc: "In a public repository, scheduled workflows are automatically disabled when no repository activity has occurred in 60 days."

    This repo can easily go that long without being touched, given how the automation is being designed to minimize intervention. GitHub is clearly not amenable to the "set and forget" kind of cron job.

  • Speaking of cron, that is apparently the only syntax GitHub supports for scheduling the workflow. I'd want this to run (say) every three hours, and I don't care when within that three-hour window it runs, as long as over time the interval averages out. But because of the cron syntax, I can't express that margin of flexibility; I have to choose a specific minute when to run.

    Many folks seem to have the problem of their scheduled workflows starting well after the specified time (or not at all), and this is often a result of scheduling their workflows at the top of the hour, without realizing that it's a peak load period for exactly that reason. I could choose e.g. 17 minutes past the hour, and that should be an improvement. But is that really when the load is lowest? I have to guess, and if I guess wrong, the workflow might not run when we need it to. (A minimal sketch of such a schedule follows this list.)

  • Even though the poll operation is very small, it still requires spinning up a full-size runner that could do much more. There is no ubuntu-micro option or the like. Which only exacerbates the previous point.

  • And even though the poll operation is quick, taking only a couple seconds at most, GitHub rounds up the runner usage to the nearest minute. Those two seconds get "billed" as one minute. So that makes this approach artificially expensive, and as the linked issue shows, GitHub hasn't exactly been falling over themselves to address that.
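
For reference, the closest the cron syntax gets to "roughly every three hours, off the peak" is pinning an arbitrary off-peak minute, e.g.:

    # Run at :17 past the hour, every third hour (the minute is a guess at an
    # off-peak slot, nothing more).
    on:
      schedule:
        - cron: '17 */3 * * *'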

iskunk commented on August 28, 2024

After further investigation, I've encountered some obstacles with the approach I had in mind:

  • The ProxiedMail service makes it possible for a received e-mail message to trigger a webhook. But the webhook payload format is fixed, intended for use with an endpoint customized to consume that format. (They provide sample code to integrate into "your Web application.") If you want to hook into an existing endpoint with its own defined format, like GitHub's repository_dispatch event, you're out of luck.
  • Zapier is an automation integration service, similar to IFTTT. You can put together an e-mail-to-webhook setup with them. But the webhook piece is considered premium functionality, not available in their free tier. The next tier up is US$20 per month.
  • IFTTT itself provides only a single global address for e-mail automation. Which automation to invoke is identified by the sender, not the recipient address. So this service cannot be driven by a third-party e-mail source.
  • I have not found any other online service that can provide the necessary e-mail-to-webhook functionality.

So it seems that a different approach is going to be needed.

The traditional solution to this problem is a Unix shell account that can receive mail (running in an always-on environment, obviously). Set up procmail(1) or a more modern equivalent, a cron job, and a bit of scripting, and that'll take care of this completely. I can provide the whole setup.
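
As a sketch of what that would look like (the header patterns and hook-script name are placeholders; the PGP verification and version checks live in the script):

    # ~/.procmailrc -- hand Debian tracker notifications to a hook script
    :0
    * ^From:.*tracker\.debian\.org
    * ^Subject:.*chromium
    | $HOME/bin/debian-tracker-hook.sh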

The question, then, is this: Do any core team members have a Linux/Unix shell account on which they are willing to host this monitoring function?

If no one does, then there are a few places online that will provide such accounts for free. One option I came across is freeshell.de. From their About page: "Our primary focus is on fostering anonymity, the free flow of information, and the promotion of freedom of speech." And obtaining an account only requires sending in a postcard.

(Note that it's not a big deal if someone who owns the operative shell account leaves the project for whatever reason. As long as someone else steps in, the GitHub endpoint can be triggered from a different source. We can always revoke the original auth token and issue a new one.)

Finally, as a remaining option, I could just hook in the new automation to my own Debian-monitoring system. It defeats the purpose of having an independent setup on the u-c side, but maybe that's not as big a deal as I think. (This automation would allow for a manual invocation in any event, so that will remain as a lowest-common-denominator option, even though in practice no one's going to want to do things that way.)

networkException commented on August 28, 2024

Perhaps a polling GitHub action could be an acceptable substitute for a mail service?

I would like to point out that we already use this in the main repository for notifying us about a new release, notably only every six hours.

If being up to date is really a concern it can run every 10 minutes for all I care, but generally (also for security releases) I refer to the expectations Google sets in their own release blog: "Stable channel will [...] roll out over the coming days/weeks"

From their doc: "In a public repository, scheduled workflows are automatically disabled when no repository activity has occurred in 60 days."

I assume the action will self commit version bumps? That should be enough

Even though the poll operation is very small, it still requires spinning up a full-size runner that could do much more. There is no ubuntu-micro option or the like. Which only exacerbates the previous point.

Yes this is the whole "We rewrote our backend to be a monolithic stack and it outperformed the previous ludicrous AWS pipelines / lambda functions setup" situation but honestly for GitHub us spawning another VM in their Azure cluster is just a rounding error.

Those two seconds get "billed" as one minute. So that makes this approach artificially expensive

Perhaps I'm missing something, but to my knowledge the policy has been (for a long while) and still is that open source projects (just public ones, source-available I guess) don't pay for CI. Otherwise we couldn't afford any of this; there's nobody paying for the project (and accepting money would be dangerous given the current project name).

On the one hand, that means builds are obnoxiously slow; on the other hand, spawning a VM and pulling some OCI image over their prebuilt Ubuntu ISO doesn't hurt either.

I might have missed something crucial here (it's 2:30am) but please consider using this approach. I have no issue with hosting long-running processes on my personal infra (unless it's Chromium builds, because I don't have an EPYC in a colocation), but not if it can be avoided by using easily traceable, publicly configured CI.

iskunk commented on August 28, 2024

If being up to date is really a concern it can run every 10 minutes for all I care, but generally (also for security releases) I refer to the expectations Google sets in their own release blog: "Stable channel will [...] roll out over the coming days/weeks"

Choosing a polling period is often more of an art than a science, given the many considerations involved. But the way GitHub "bills" the time certainly pushes it to longer intervals. Which aren't ideal from a security standpoint.

I assume the action will self commit version bumps? That should be enough

There's nothing to commit for normal version updates. Nothing in the repo changes that is specific to a Chromium release (unless it's the occasional update to fix breakage, as I've already been doing).

but honestly for GitHub us spawning another VM in their Azure cluster is just a rounding error.

It's an issue if there's a lot of contention, hence the "street knowledge" of not running scheduled jobs at the top of the hour. While I don't like using more resources than necessary on principle, this can be a practical consequence of that.

Perhaps I'm missing something, but to my knowledge the policy has been (for a long while) and still is that open source projects (just public ones, source-available I guess) don't pay for CI.

The CI is free, but it's not unlimited. (Huh, I thought I'd seen 6K minutes per month, but now I see 2K. That sucks.)

I have no issue with hosting long-running processes on my personal infra (unless it's Chromium builds, because I don't have an EPYC in a colocation), but not if it can be avoided by using easily traceable, publicly configured CI.

Keep in mind the role that this portion of the automation plays in the big picture. All this does is start the GitHub workflow, at such a time when a new version is available. It's not providing any content that will affect the output of the build. So in the sense of tracing/auditing, there isn't really much there to look at. Now, obviously, the GitHub workflow proper that prepares the source is a different story. But this? This is equivalent to a team member monitoring the Debian notifications, and manually invoking the workflow when appropriate. What's actually going on behind the scenes doesn't matter, only the time that the call is received.

I can push the remote-Unix-side automation to a separate area of my branch, if you like. Note that all this would do is (1) receive e-mail from Debian's tracker, (2) run an infrequent cron job, (3) make intermittent HTTP requests to Debian's repos, and (4) make a REST call to GitHub with an auth token to kick off the workflow. Nothing compute- nor I/O-intensive is involved; rather, the critical bit is that this is running on a 24/7 system that can reliably receive mail (so no creaky old Linux box running in the corner of someone's flat).
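
Step (4) is the only part that touches GitHub, and it is a single authenticated REST call against the repository_dispatch endpoint, roughly like this (the owner/repo, event type, and payload fields here are placeholders):

    # Kick off the GitHub workflow from the monitoring host.
    curl -sS -X POST \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $GITHUB_DISPATCH_TOKEN" \
      https://api.github.com/repos/OWNER/ungoogled-chromium-debian/dispatches \
      -d '{"event_type": "debian-upload", "client_payload": {"suite": "unstable", "version": "123.0.6312.105-2"}}'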

networkException commented on August 28, 2024

The CI is free and minutes are not limited; all the pricing that page lists is for private repos: "GitHub Actions usage is free for standard GitHub-hosted runners in public repositories, and for self-hosted runners"

Only the job (6h) / workflow (72h) limits apply, you can spawn as many workflows that run for just a minute as you want

iskunk commented on August 28, 2024

The docs don't do a good job of delineating between "free" and "unlimited," but I found a reference on the GitHub blog that clarifies "GitHub Actions are unlimited for public repositories and the Free plan also provides 2,000 minutes of Actions each month for use with your private projects." That explains how it's possible to build u-c on GitHub, even if it can't be done all in one go.

That makes the polling approach more feasible, but the other issues remain. Fundamentally, the problem is that a GitHub Action is just an awkward fit for this (monitoring). Ideally, you want a lightweight, always-on process, rather than a heavyweight, intermittent one.

I can have polling as a fallback measure, something like every 4-6 hours, so that updates will run even if external infrastructure goes offline. But in the normal course of things, it would be best for this repo's workflows to be started by external events, which GitHub doesn't natively support. Can we aim for that kind of hybrid approach, rather than polling alone?
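
Concretely, the hybrid approach just means the workflow declares both kinds of trigger, along these lines (the event name and schedule are illustrative):

    # Push-style trigger in the common case, infrequent polling as a fallback,
    # and manual dispatch as the lowest common denominator.
    on:
      repository_dispatch:
        types: [debian-upload]
      schedule:
        - cron: '43 */6 * * *'
      workflow_dispatch: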

SugaryHull commented on August 28, 2024
  1. tracker.debian.org sends out a chromium source-upload notification to an e-mail address that we have previously subscribed on the site

@iskunk Would it be possible to subscribe to the debian-security mailing list as a means of checking if stable has a new release? I'm not sure how different that would be in terms of mail volume from the main Debian tracker.

PF4Public commented on August 28, 2024

persistent disk volume on GitHub

Could you somehow use workflow artifacts for that?
