Comments (9)
> OR, in the injector, remove the finalizer before writing to sysrq and re-add the finalizer on failure (since it is the only part of the code that will still be running if the injection fails) so we can catch that later

I like this option most. I'll draft up a PR for Nikos to test.
from chaos-controller.
Oh I see. This isn't a problem for us because the node gets replaced on any nodeFailure disruption. That is why our only handling is to check whether the target node is gone and, if so, go ahead and clean up the stuck pod.
The only fix I can think of is to remove the chaos pod finalizer from within the pod before we trigger the kernel panic, though that runs the risk that the finalizer is gone and the disruption doesn't occur.
from chaos-controller.
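The cleanup check described in the comment above can be sketched roughly as follows. This is a minimal, language-agnostic illustration in Python, not the controller's actual code: the `ChaosPod` type, the `cleanup_stuck_pod` helper, and the finalizer name are all hypothetical, and the Kubernetes API is replaced by plain in-memory values.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical finalizer name, used only for illustration.
CHAOS_POD_FINALIZER = "example.com/chaos-pod"


@dataclass
class ChaosPod:
    """Simplified stand-in for a chaos pod object (assumption, not the real type)."""
    name: str
    node_name: str
    finalizers: List[str] = field(default_factory=list)


def cleanup_stuck_pod(pod: ChaosPod, existing_nodes: Set[str]) -> bool:
    """Drop the finalizer when the pod's target node no longer exists.

    Returns True if the finalizer was removed, False otherwise.
    """
    if pod.node_name in existing_nodes:
        # The node is still around: leave the pod to the normal cleanup path.
        return False
    if CHAOS_POD_FINALIZER not in pod.finalizers:
        # Nothing to remove.
        return False
    pod.finalizers.remove(CHAOS_POD_FINALIZER)
    return True
```

Note that this only helps when the failed node is actually replaced. If the node reboots and rejoins under the same name, the check never fires and the pod stays stuck.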
Interesting.
I'm curious why the node gets replaced for you.
Does its health check fail and auto-scaling kick in?
If this is the case, what is the difference between the 2 node failures for you from a behavior perspective?
I think it is sensible to remove the finalizer once the Disruption expires?
from chaos-controller.
> I think it is sensible to remove the finalizer once the Disruption expires?

Hmm, for node failures specifically, that seems fine? There's nothing to clean, really. @Devatoria ?

> If this is the case, what is the difference between the 2 node failures for you from a behavior perspective?

Almost nothing, afaict

> I'm curious why the node gets replaced for you.

I think it's cloud provider health checks failing
from chaos-controller.
@ptnapoleon Would it be worth adding a bit more logic on node disruption cleanup? I see multiple options here:
- in the controller, ensure the node is `NotReady` if the chaos pod exit code is not 0 (we can conclude that the node is disrupted) before removing the finalizer
- OR, in the injector, remove the finalizer before writing to sysrq and re-add the finalizer on failure (since it is the only part of the code that will still be running if the injection fails) so we can catch that later

Wdyt?
from chaos-controller.
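To make the two options above concrete, here is a rough sketch of each. It is written in Python purely for illustration and is not taken from the chaos-controller code base: the function names, the callback parameters, and the use of `OSError` to signal a failed sysrq write are all assumptions.

```python
from typing import Callable


def may_remove_finalizer(exit_code: int, node_ready: bool) -> bool:
    """Option 1 (controller side, hypothetical helper).

    A clean exit needs no extra check. On a non-zero exit code, only remove
    the finalizer once the node is NotReady, i.e. the disruption took effect.
    """
    if exit_code == 0:
        return True
    return not node_ready


def inject_node_failure(
    remove_finalizer: Callable[[], None],
    trigger_panic: Callable[[], None],
    add_finalizer: Callable[[], None],
) -> bool:
    """Option 2 (injector side, hypothetical helper).

    Remove the finalizer first, then trigger the panic. If triggering fails,
    the process is still alive, so it can re-add the finalizer for later cleanup.
    """
    remove_finalizer()
    try:
        # On a real node a successful write never returns: the kernel panics.
        trigger_panic()
    except OSError:
        add_finalizer()
        return False
    return True
```

In the second option the return value only matters in tests with fake callbacks. On a real node, a successful injection takes the process down together with the node.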
I'm testing #503. It seems simple and safe. Altering the finalizer logic by having chaos pods mutate their own finalizer was causing reconcile issues in the controller.
from chaos-controller.
This seems to be working locally and on staging for us. Let me know if it's sufficient for you!
from chaos-controller.
Thanks Philip!
I will test this today in our clusters and will let you know.
from chaos-controller.
@ptnapoleon perfect, this works as expected. Thank you Philip.
from chaos-controller.