Comments (6)
FYI: These are the metrics which are implemented by default: docs
from risingwave-operator.
TODOs:
- Currently, the metrics are labeled as coming from the `kube-rbac-proxy` and not from the `manager`. We need to change that.
- Use `NewCounterVec` with the other metrics.
- Use `NewCounterVec` to implement the other attributes, e.g., the API version.
from risingwave-operator.
@arkbriar I have a couple of questions. Could you have a look, please?
- Should `webhook_request_pass_count` be incremented during calls to the mutating webhook? IMHO a request to a mutating webhook always passes and only a request to a validating webhook can be rejected.
- I do not fully understand `controller_reconcile_requeue_after`. What is the difference to `controller_reconcile_duration`? To be clear: we want to count the ms when we call `RequeueAfter`, right?
- I am not sure how much sense `controller_reconcile_panic_count` makes at the moment, since we do not have any calls to `panic` in the reconciler.
- What collector do you refer to in the issue?
from risingwave-operator.
- Should `webhook_request_pass_count` be incremented during calls to the mutating webhook? IMHO a request to a mutating webhook always passes and only a request to a validating webhook can be rejected.
Yes, it should be incremented. The mutating webhook can reject a request by returning an error.
- I do not fully understand `controller_reconcile_requeue_after`. What is the difference to `controller_reconcile_duration`? To be clear: we want to count the ms when we call `RequeueAfter`, right?
Yes, you're right! IMO, `controller_reconcile_requeue_after` should be a Histogram and should only be updated when the result contains a non-zero `RequeueAfter`, e.g., `Result{RequeueAfter: time.Second}`. It's quite different from `controller_reconcile_duration`, which records the elapsed time of each execution of the `Reconcile` method. `controller_reconcile_duration` is similar to `controller_runtime_reconcile_time_seconds`, but with additional labels.
- I am not sure how much sense `controller_reconcile_panic_count` makes at the moment, since we do not have any calls to `panic` in the reconciler.
Panics can be triggered implicitly, e.g., by a division by zero. A panic always means a bug that needs to be fixed ASAP, but we also don't want a panic caused by one object to affect the others. So the best approach is to recover from the panic, keep the controller running, and of course record the panic so that we can set up an alert on it. Currently, no recovery is implemented, and I think it's easy to add one while adding the `controller_reconcile_panic_count` metric.
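A minimal sketch of such a recovery, assuming a simplified reconciler signature (`reconcileFunc`) and a plain integer in place of the `controller_reconcile_panic_count` counter; both are hypothetical. The deferred `recover` turns the panic into an error for the affected object only, so other objects keep being reconciled.

```go
package main

import "fmt"

// panicCount is a toy stand-in for the controller_reconcile_panic_count counter.
var panicCount int

// reconcileFunc is a simplified reconciler signature used only for this sketch.
type reconcileFunc func(name string) error

// safeReconcile runs fn and converts any panic into an error, counting it
// so that an alert can fire, instead of crashing the whole controller.
func safeReconcile(fn reconcileFunc, name string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			panicCount++
			err = fmt.Errorf("panic while reconciling %s: %v", name, r)
		}
	}()
	return fn(name)
}
```

Returning an error instead of silently swallowing the panic keeps the failure visible; in a controller-runtime reconciler, a returned error would typically cause the request to be requeued with backoff.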
- What collector do you refer to in the issue?
Oh, I mean the metric collectors, i.e., the code that records the metrics. And by proxy I mean that we can use the proxy pattern to do that, like the following:
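```go
import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
)

type MutatingWebhook interface {
	Default(context.Context, runtime.Object) error
}

// MutatingWebhookMetricsRecorder proxies a MutatingWebhook and records metrics
// around each call. RecordBefore and RecordAfter (not shown) update the metrics.
type MutatingWebhookMetricsRecorder struct {
	// extra labels
	// ...

	// webhook is the wrapped webhook that does the actual work.
	webhook MutatingWebhook
}

func (r *MutatingWebhookMetricsRecorder) Default(ctx context.Context, obj runtime.Object) error {
	r.RecordBefore()
	defer r.RecordAfter()
	return r.webhook.Default(ctx, obj)
}
```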
from risingwave-operator.
Closing because the targets are all implemented! Thanks @CAJan93 for your efforts in this!
from risingwave-operator.
Thanks for closing. My pleasure!
from risingwave-operator.