Comments (6)

CAJan93 commented on August 15, 2024

FYI: These are the metrics which are implemented by default: docs

from risingwave-operator.

CAJan93 commented on August 15, 2024

TODOs:

  • Currently, metrics are labeled as coming from the kube-rbac-proxy rather than from the manager. We need to change that.
  • Use NewCounterVec with the other metrics.
  • Use NewCounterVec to implement the other attributes, e.g. the API version.

CAJan93 commented on August 15, 2024

@arkbriar I have a couple of questions. Could you have a look, please?

  • Should webhook_request_pass_count be incremented during calls to the mutating webhook? IMHO a request to a mutating webhook always passes, and only a request to a validating webhook can be rejected.
  • I do not fully understand controller_reconcile_requeue_after. How does it differ from controller_reconcile_duration? To be clear: we want to record the ms when we call RequeueAfter, right?
  • I am not sure how much sense controller_reconcile_panic_count makes at the moment, since we do not have any calls to panic in the reconciler.
  • What collector do you refer to in the issue?

arkbriar commented on August 15, 2024

  • Should webhook_request_pass_count be incremented during calls to the mutating webhook? IMHO a request to a mutating webhook always passes, and only a request to a validating webhook can be rejected.

Yes, it should be incremented. The mutating webhook can reject a request by returning an error.

  • I do not fully understand controller_reconcile_requeue_after. How does it differ from controller_reconcile_duration? To be clear: we want to record the ms when we call RequeueAfter, right?

Yes, you're right! IMO, controller_reconcile_requeue_after should be a Histogram and should only be updated when the result contains a non-zero RequeueAfter, e.g., Result{RequeueAfter: time.Second}. It's quite different from controller_reconcile_duration, which records the elapsed time of each execution of the Reconcile method. controller_reconcile_duration is similar to controller_runtime_reconcile_time_seconds, but with additional labels.

  • I am not sure how much sense controller_reconcile_panic_count makes at the moment, since we do not have any calls to panic in the reconciler.

Panics can be triggered implicitly, e.g., divide-by-zero panics. A panic always indicates a bug that needs to be fixed ASAP, but we also don't want panics triggered by some objects to affect others. So the best idea is to recover from the panic and let the controller keep running, and of course record it so that we can set an alert on it. Currently, there's no recovery implemented, and I think it's easy to add one while adding the controller_reconcile_panic_count metric.

  • What collector do you refer to in the issue?

Oh, I mean the metric collectors, i.e., the code that records the metrics. And by proxy I mean we can use a proxy pattern to do that, like the following:

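// MutatingWebhook is the interface that both the real webhook and the recorder implement.
type MutatingWebhook interface {
	Default(context.Context, runtime.Object) error
}

// MutatingWebhookMetricsRecorder wraps a MutatingWebhook and records metrics around each call.
type MutatingWebhookMetricsRecorder struct {
	// extra labels
	// ...

	// webhook is the wrapped implementation.
	webhook MutatingWebhook
}

// Default records metrics before and after delegating to the wrapped webhook.
// RecordBefore and RecordAfter are left undefined in this sketch.
func (r *MutatingWebhookMetricsRecorder) Default(ctx context.Context, obj runtime.Object) error {
	r.RecordBefore()
	defer r.RecordAfter()

	return r.webhook.Default(ctx, obj)
}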

arkbriar commented on August 15, 2024

Closing because the targets are all implemented! Thanks @CAJan93 for your efforts in this!

CAJan93 commented on August 15, 2024

Thanks for closing. My pleasure.
