[RFC] Deprecate Peril · artsy/readme (closed)

Comments (25)

mdole commented on May 26, 2024

@ashfurrow thanks for the thoughtful feedback and for volunteering to help with the infrastructure issues!

Moving Peril back to Artsy's Heroku account sounds like an excellent way to allow us to continue to make use of Peril while we contemplate other routes.

To your point about GH Actions + third-party services: totally hear you that it's questionable to bank on a service that hasn't launched yet, and yeah I don't love the idea of adding many smaller apps to our org that would have to be maintained and learned. As @zephraph noted, there are a lot of unknowns.

That being said, I still worry about the long-term viability of Peril since it is not being actively worked on and does not have a critical mass of users/contributors.

Feels to me like it would make more sense to have that conversation once GH Actions launches and we get a sense for whether or not it would actually accomplish what we'd want it to. How does this sound?

Now: @zephraph + @ashfurrow move Peril to Artsy's Heroku account, enabling us to more easily maintain and debug it. We continue to use Peril as we have been using it, including adding new functionality as desired. If that move goes smoothly, we close this issue for the time being.
Once GH Actions launches: @mdole + @zephraph create a task force to explore it and, if it turns out to be viable, reopen this issue with a more concrete plan.

ashfurrow commented on May 26, 2024

I worked on this at Peer Lab, where Orta was around to answer questions, and I made it pretty far! I've got the API stood up on Heroku and the dashboard working; there's just one integration point that isn't working yet (I haven't even gotten to look at it, but it should be straightforward). It's looking very likely that we can get full Peril functionality without much effort; all the changes I've had to make so far are here.

I'd like to point out that, since this RFC was opened, we've seen a healthy level of activity on the Peril repo. @zephraph and I have both had our PRs merged quickly, to improve logging and stability, as well as other Peril users contributing improvements and of course, Orta too. When we have opened issues/PRs, they've been quickly addressed.

Given the active development on Peril, the promising early results of hosting our own installation, and (as @damassi pointed out) the tasks and org-wide functionality (which we cannot replicate with Danger alone), I feel more strongly than ever that Peril is worth the investment for Artsy. It provides a lot of value, both material and cultural; most importantly, it enables a culture of automating our own culture, which I go into a lot of detail about in this talk. Everyone on the team being able to hack on the team culture is a big cultural differentiator for Artsy, and so I remain a "no" vote on deprecating Peril. I recognize that, as-is, our use of Peril has problems, but it would be much more costly to replace Peril altogether than it would be to fix those specific problems.

ashfurrow commented on May 26, 2024

I'm a 👎 on this. Peril is existing infrastructure that (mostly, I'll get to that) already works, whereas GitHub actions and unspecified third-party services don't inspire my confidence. Peril also applies organization-wide, where Danger and GitHub Actions (as I understand them) are per-repo, which would require significant setup.

It seems like the reasoning around deprecating Peril mainly comes down to infrastructural complexity, and with that I agree. We're running Peril on Orta's "hosted Peril" infrastructure, which indeed never fully shipped. An alternative solution to deprecating Peril would be to move Artsy's Peril installation back to Artsy's Heroku account, where we have access to logs and so on, to make debugging easier. (Heroku infrastructure is described here.) This would remove most of the complexity of hosting Peril and bring Peril back under our infrastructure; reading through the Platform meeting notes, it seems like this was the big sticking point.

GitHub is arguably our most important tool for collaborating with each other, and Peril enables a whole kind of automation built on top of that tool. I would much rather have a single tool to automate over GitHub than many smaller ones. Let me know if I can clarify any of this. Edit: forgot to say that I don't necessarily agree with categorizing Peril as non-core; our engineering culture is a huge differentiator for our team, even just in terms of hiring, and Peril powers a lot of that culture.

ashfurrow commented on May 26, 2024

@mdole that sounds great – I've added myself to the Platform Practice meeting calendar invitation, so as to not forget to show up this time 😅 I'd be happy to help in any way I can.

And thanks, Matt, for opening this RFC. It's sometimes difficult to have these conversations, but I learned a lot about how other engineers at Artsy see Peril. And this discussion has motivated me to dive in and contribute more to the OSS project itself, which I'm also grateful for 🙇

mdole commented on May 26, 2024

Resolution

@ashfurrow and @zephraph stepped up and created a path that would allow us to easily move Peril to Artsy’s infrastructure, addressing much of the rationale for this RFC’s existence in the first place. The discussion also brought up the fact that we shouldn’t put too much faith in GitHub Actions without actually knowing what it’s capable of, which we won’t until after launch.

For now, the solution that makes the most sense to Justin and myself is to maintain Peril in its current form. We also want to figure out how we make it sustainable going forwards, and plan to discuss in this week’s Platform Practice.

To return to our original reasons for opening this RFC:

- Debugging is incredibly difficult; access to logs is limited
  - Ash + Justin successfully proved Peril can be moved to Artsy-hosted Heroku infrastructure, giving us access to logs.
  - PRs were submitted (and merged) to Peril to improve logging.
- Peril is hosted on infrastructure not owned or widely accessible by Artsy engineering
  - Viable next steps for hosting Peril on Artsy’s Heroku account to be presented to Platform via @ashfurrow.
- Concern that Peril may not be actively developed in the future
  - Still somewhat true, but as noted in the discussion above, it is still used by several orgs and is not in need of active development.
- In the framework Sam discussed at a recent H2 meeting, Peril falls in the Outsource bucket: it is not mission critical and it is a context project
  - Opinions differ on this one. Regardless of exactly where it falls, Peril does still serve an important purpose for our org, and there isn’t a service available that would serve that purpose as fully. If we can continue to take advantage of Peril without over-investing our time and energy into it (which it seems likely we’ll be able to do given the progress made by Ash and Justin in recent weeks), it seems worth our while to maintain.

Level of Support

7: RFC Rejected, with Conflicting Feedback.

Additional Context:

Really appreciate the spirited discussion - the net result was a huge positive, as Artsy engineers stepped up to the plate and took ownership over our implementation.

While this RFC has been rejected, there is still room for future exploration of services in this space. Once GH Actions launches, we can start a discussion around what gaps it might be able to fill in our current automation infrastructure.

Next Steps

We’ll wrap up the discussion in Platform Practice this week. To be resolved:

- Overview of Peril setup and migration onto Artsy’s Heroku account
- Who maintains Peril going forwards?
- How do we make sure more than 1-2 engineers understand and feel comfortable working on it?
- Where do engineers go for questions or support?

Possible suggestions:

- Create a new "Automation" (or "GH Automation"? "Cultural Automation"?) section in README; create sub-docs for Peril and other services we use
- Contribute documentation directly to Peril as well?
- New working group for Peril, to help empower developers to understand config, deployment, etc.

ashfurrow commented on May 26, 2024

@mdole that sounds like a good plan! My only piece of feedback would be to be careful about judging Peril's viability by the lack of active work on it. As I understand it, Peril is "done", so it wouldn't necessarily have a lot of ongoing work, you know? The infrastructure for hosted Peril is incomplete, but Peril itself is more-or-less stable. Two open source communities that I help manage use Peril, along with Gatsby, Fastlane, CocoaPods, and WordPress. So we're definitely not on our own as Peril users 😄

ashfurrow commented on May 26, 2024

Okay, so Justin and I looked deeper into this and discussed options with Orta. Moving back to Heroku has some unknowns, with possible solutions. We're not quite ready to make that move. Justin and I are going to investigate more (see below).

Revisiting the RFC and notes from the meeting, it seems like there are two main problems we're trying to address:

- Difficulty in debugging Peril rules.
- A desire to self-host/own our own infrastructure.

Justin and I are interested in taking shared ownership of Artsy's Peril installation, with an immediate focus on addressing these two concerns. We have some ideas on how to improve Peril's logging generally but also how we might solve the problems specific to Artsy (maybe with Heroku hosting, others are discussing something similar).

Our next steps are to spec out what work would be necessary to host our own full Peril (versus staying on Orta's infrastructure) and to improve Peril's logging (to address debuggability issues). Since Artsy owns its dependencies, we would be investing time into Peril. I believe that Peril is core to Artsy Engineering's culture, and I think that we at least owe it to ourselves to investigate how much work would be necessary to fix our problems before we deprecate it. If we run into a problem with Peril, then we should be opening an issue like any other open source project that we use.

dblandin commented on May 26, 2024

> Our next steps are to spec out what work would be necessary to host our own full Peril (versus staying on Orta's infrastructure), and improving Peril's logging (to address debugability issues).

@ashfurrow I'd be happy to help out with this effort!

mdole commented on May 26, 2024

amazing! @ashfurrow thank you for putting in so much time, effort, and care. given the progress that's been made (appreciate the links), I would like to do a fresh writeup of our current status this week and have a brief chat at Platform Practice on Thursday - feels like we should consider closing this RFC if the main issues have been addressed. @zephraph does that sound good to you? want to work on the writeup together?

zephraph commented on May 26, 2024

If it's that quick, I definitely think it'd be prudent for us to give it a go. Having Peril integrated into a single environment (rather than running on Lambdas spread across a different environment) might go a long way toward addressing the debuggability issues we've had in the past.

At the very least, it gives us more context than we currently have and reduces the risk of having our setup on infra that we don't control.

My biggest hesitation currently is that there are generally a lot of unknowns on both sides (continuing with Peril or finding a path to a different solution). I'd like to reduce some of those unknowns so we can make a more informed decision.

joeyAghion commented on May 26, 2024

> An alternative solution to deprecating Peril would be to move Artsy's Peril installation back to Artsy's Heroku account

From what I understand, those instructions do not work as they predate Peril being updated to work on AWS Lambda.

To the critical obstacles listed above, I'd add that Peril's security model is hard to align with some of our private projects, or with the need to isolate privileges across projects and environments.

Also I want to clarify that we have no real interest in self-hosting or owning our own infrastructure. However it's unacceptable to house important engineering functions and sensitive data on infrastructure controlled by an unaffiliated private party.

Based on Peril's vision and discussions, it's clear that it won't be a self-service solution without a lot of work. That's not to say it's not a useful tool. In fact, it is so useful that it frequently comes up as a solution to real day-to-day needs that we experience within engineering. That's why this RFC is extra-important: to clarify that Peril should not be the default tool when there are alternatives available, even if they are less powerful (like Danger).

I think engineering culture is so important (it's what I wake up and fall asleep thinking about), but to be clear Peril is not at all "core" in the terms we've used recently to aid prioritization. "Core" is defined as differentiating Artsy from the perspective of customers and leading to purchases. Is that uncomfortable and strict? Sure, but it needs to be if it's going to enable hard trade-offs.

I take "own your dependencies" to mean that we should not reinvent wheels and I'm certainly not suggesting that we build a Peril alternative. But I think we can and should evaluate it critically and weigh any investment carefully.

jonallured commented on May 26, 2024

Sad face! But also I think this is a very pragmatic direction to head so I'm 👍.

ansor4 commented on May 26, 2024

Would we be interested in porting a low-risk service over to the GitHub Actions beta? It could influence our decision to use GH Actions in the future, but I could see Actions changing in ways that would make our findings moot.

mdole commented on May 26, 2024

@ansor4 def worth thinking about - I'm not sure how that would work since the Actions beta is only for individuals and not for orgs, but if someone with access to the beta wants to try it out I'm all for it. or maybe @zephraph could speak to the possibilities more eloquently since I believe he's experimented with Actions a bit.

zephraph commented on May 26, 2024

I do have access to the GitHub Actions beta... I could port one of the scheduled Peril tasks (like a Slack message) to Actions via scheduled actions. Would be down to pair with someone on that.

Edit: It's worth noting that we just don't know what the ultimate capabilities of actions will be. We don't know how much of what Peril covers could be handled by actions.

zephraph commented on May 26, 2024

@ashfurrow what do you think the LOE (level of effort) of getting us switched over onto a Heroku instance would be?

ashfurrow commented on May 26, 2024

@zephraph that's a good question. The GitHub app that we used to use still exists so there's not much additional setup on GitHub's end. We'd provision a new Heroku app, configure it with the GitHub app's credentials, and point the GitHub app at the Heroku install. Then we'd need to deactivate the hosted Peril app (to avoid having two Perils running). We could, additionally, send logs to Artsy's main Papertrail from the Heroku env vars. I'd say between 30–60 minutes? I'd be happy to help – I've hosted Peril installs on Heroku for a number of open source organizations, so I know my way around.
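To make the "configure it with the GitHub app's credentials" step concrete, here is a minimal sketch of a preflight check a self-hosted install might run before booting. It assumes the credentials and the optional log destination arrive as Heroku config vars; the variable names (`GITHUB_APP_ID`, `GITHUB_APP_PRIVATE_KEY`, `GITHUB_WEBHOOK_SECRET`, `PAPERTRAIL_LOG_URL`) are hypothetical placeholders, not Peril's documented configuration.

```typescript
// Hypothetical preflight check for a self-hosted, Peril-style Heroku app.
// All variable names are illustrative assumptions, not Peril's real config keys.
const REQUIRED_VARS = [
  "GITHUB_APP_ID",          // hypothetical: ID of the existing GitHub app
  "GITHUB_APP_PRIVATE_KEY", // hypothetical: the app's signing key
  "GITHUB_WEBHOOK_SECRET",  // hypothetical: secret used to verify webhook payloads
] as const;

// Hypothetical: optional log shipping (e.g. to Papertrail); missing it only warns.
const OPTIONAL_VARS = ["PAPERTRAIL_LOG_URL"] as const;

interface PreflightResult {
  ok: boolean;        // true when every required variable is present and non-empty
  missing: string[];  // required variables that are unset or empty
  warnings: string[]; // human-readable notes about unset optional variables
}

function preflight(env: Record<string, string | undefined>): PreflightResult {
  // Treat both undefined and "" as missing.
  const missing: string[] = REQUIRED_VARS.filter((name) => !env[name]);
  const warnings: string[] = OPTIONAL_VARS.filter((name) => !env[name]).map(
    (name) => `${name} is unset; logs will only be in Heroku's own log stream`,
  );
  return { ok: missing.length === 0, missing, warnings };
}
```

In a Node process this would be called with `process.env` at startup, exiting early when `ok` is false. Failing fast here makes a misconfigured install obvious before webhooks start arriving, which matters when the hosted app is being deactivated at the same time.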

ashfurrow commented on May 26, 2024

Agreed – I think it would.

ashfurrow commented on May 26, 2024

It seems like moving Peril off of Orta's AWS infrastructure and onto our own Heroku account is pretty uncontroversial (even if we do deprecate Peril long-term). To avoid interrupting anyone's work, Justin and I are planning to work on this on Saturday at Peer Lab (unless anyone has objections).

mdole commented on May 26, 2024

Good point! I should've done my research more thoroughly :)

ashfurrow commented on May 26, 2024

@joeyAghion Thanks for clarifying there. That's a great point about unaffiliated third parties. To fill in the technical details, you're correct about Heroku (which is why Justin and I didn't switch over during the weekend). However, as I linked to above, there is interest in adding Lambda-level functionality to self-hosted Heroku installations, which is what Justin is looking into.

I suppose you're right about the "core"-ness of Peril. However, that framework taken to its extreme conclusion would lead to a radically different team. A strictly, purely customer-focused Artsy Engineering team isn't one that open sources code, or blogs, or even cares about psychological safety. We do those things because ultimately they help us ship a better product. They aren't "core", but they are "core to Engineering", and that's what I was talking about (what differentiates Artsy Engineering from other engineering teams).

What I'm proposing is that Justin and I are given some time to explore self-hosting. If we can use full-feature Peril as a Node server, written in TypeScript, sending logs to Papertrail, then those are all technologies we're already super-familiar with.

joeyAghion commented on May 26, 2024

Some of the things you mention are more about how we do things than what we do. Many are practically non-negotiable if we want to act humanely and professionally. To me, there's a big difference between those and extending Peril to be more conventional and self-service, which is a technical undertaking that will displace other projects given our limited capacity.

I only ask that we think practically about any further investment and next steps. There were a number of frustrating starts and stops recently that attempted to modify Peril for our purposes, such as giving it access to more internal resources (like APIs or Datadog) and iterating on the schema validation tooling.

ashfurrow commented on May 26, 2024

I understand those frustrations – @mdole has helpfully filled me in on some context.

Certainly there's a difference between those things. I said "extreme conclusion" because my point was: we need balance. I'm all for practicality, too! It just seems, to me, that deprecating Peril with no concrete replacement at hand is unbalanced. (Edit: upon reflection, Joey's call to practicality really resonated with me, and I definitely want to emphasize that I am on-board with thoughtful next steps 👍)

zephraph commented on May 26, 2024

My perspective on this has shifted over time.

When these conversations originally started happening, it was really around the difficulties that we were having. I'd made plans to tackle some of these challenges, but those fell through when other infrastructure work kicked into high gear. In the meantime people were having very real issues and wanted to know firmly what our next steps were.

When @mdole and I dug through what our Peril rules were covering, we both concluded that other apps (or potentially Actions) could shore up parts of those needs if required. Given our state at the time and our limited capacity to address issues, the hold recommendation was made. The attendees of the Platform Practice meeting wanted a firm statement on a course of action, and that's where this RFC came from.

I was thinking about this problem in terms of immediate needs. We also generally put a lot of stock in GitHub Actions, but we really don't have a firm idea of the capabilities or cost of that service, and it still doesn't meet our more immediate needs. There are also a lot of other things we didn't do or consider, like digging deeper to get a sense of the LOE of self-hosting, or just asking Orta for assistance.

This RFC is very pragmatic, but in hindsight it misses some nuance. Talking to @ashfurrow over the weekend really helped drive home the perspective that having a single, approachable tool for adding these automation steps empowers our team to rapidly develop high-impact tooling. More practically, we already have a lot of fragmentation at Artsy, and moving to many different tools to handle the problems that Peril handles just means we have more things to configure (and debug when something goes wrong).

Now, ultimately, I agree with @joeyAghion. We need to be practical, mindful of our limited capacity, and ensure we're making healthy business decisions with our time.

That said, as a next step for Peril, I think it's worth trying to get us in a stable state. That ensures we don't have to sink a lot of time in rebuilding or replicating infrastructure that we already have. If successful we have a single tool (which we have knowledge of already) that we can continue to invest in to build low effort, high impact automation. The early indications are that the LOE isn't as high as we initially feared. If we can achieve this with reasonable effort, it's worth it.

damassi commented on May 26, 2024

For those reading along, Peril's tasks and org-wide functionality will be of interest:

https://github.com/artsy/peril-settings/tree/master/tasks
https://github.com/artsy/peril-settings/tree/master/org
