
Comments (8)

dolik-rce avatar dolik-rce commented on August 30, 2024

This might work in theory... I see a few practical problems:

  1. In step 1, the only way to clear the dict A is to iterate over get_keys, adding even more overhead.
  2. In step 2, copying the previous contents from A to B
    a. should not overwrite gauges, only increase counters
    b. requires (again) iterating over get_keys
  3. Switches between writing to A/B would have to be atomic, i.e. happening in all nginx workers at the same time.
  4. You'd have to make sure that there is always at most one nginx worker calling collect. That means using locks and potentially blocking other requests asking for metrics. This might actually be an acceptable trade-off, since metrics are usually collected just a few times per minute.

The first two problems are probably easily solvable (e.g. by doing everything in one iteration). The other two have solutions too... Locks can be implemented on top of a dict. Atomicity can also be ensured by using some indirection and a dict. The only question is how much slower it would be and how much trouble that is for users.
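For illustration, a lock on top of a shared dict can be built from an atomic `add` operation, which succeeds only if the key does not exist yet (this is roughly what lua-resty-lock does on top of `ngx.shared.DICT`). The sketch below is a Python model of that idea; the `SharedDict` class and all names are hypothetical stand-ins, not a real API:

```python
import threading

class SharedDict:
    """Toy stand-in for ngx.shared.DICT, supporting atomic add/delete."""
    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value):
        # Atomic: succeeds only if the key does not exist yet.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)

def with_lock(shared, lock_key, fn):
    """Spin until we win the atomic add, run fn, then release the lock."""
    while not shared.add(lock_key, 1):
        pass  # a real implementation would sleep and honor a timeout/TTL
    try:
        return fn()
    finally:
        shared.delete(lock_key)

d = SharedDict()
result = with_lock(d, "__collect_lock", lambda: "collected")
print(result)  # -> collected
```

A production version would set an expiry on the lock key so a crashed worker cannot hold the lock forever.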

Anyway, I would be happy to test it, if you have time to code it.

from nginx-lua-prometheus.

knyar avatar knyar commented on August 30, 2024

That's a good idea, thanks for suggesting it.

As you mentioned, actually merging results back is the tricky bit; while counters can just be incremented atomically, for gauges there will be a race between the thread copying old data from A to B and other threads changing gauge values. One way to avoid this is to actually have three dicts - two for counters, and a single one for gauges (that will not be swapped). I suspect most users will have significantly more counters than gauges (especially given that histograms are implemented with counters), so this will have the desired performance benefit.
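As a rough single-process sketch of that three-dict layout (all names are illustrative, and the cross-worker atomicity concerns discussed in this thread are deliberately ignored): counters alternate between two dicts so collect can drain an inactive one, while gauges live in a single dict that is never swapped.

```python
class Metrics:
    def __init__(self):
        self.counters = [{}, {}]   # two counter dicts, swapped on collect
        self.gauges = {}           # single gauge dict, never swapped
        self.active = 0            # index of the dict currently written to

    def inc(self, name, delta=1):
        d = self.counters[self.active]
        d[name] = d.get(name, 0) + delta

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def collect(self):
        old = self.active
        self.active = 1 - old      # swap: writers move to the other dict
        snapshot = self.counters[old]
        # Merge drained counter values back so totals keep accumulating.
        for name, value in snapshot.items():
            self.inc(name, value)
        self.counters[old] = {}
        return dict(self.counters[self.active]), dict(self.gauges)

m = Metrics()
m.inc("requests_total", 3)
m.set_gauge("connections", 7)
counters, gauges = m.collect()
print(counters, gauges)  # -> {'requests_total': 3} {'connections': 7}
```

Because gauges are read in place and never merged, the copy-vs-write race described above applies only to the counter dicts, which the swap avoids.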

Note that shared dictionaries are defined outside of the library, so this will not be backwards compatible; users will need to provide several dicts instead of one.

Finally, we'll need to implement locking to make sure collect is not called concurrently. There are libraries (example) implementing locks on top of dictionaries, but this will add additional complexity.

We'd probably also want to have some sort of performance benchmark added first, and then use it to validate performance improvements from this change.

In general, this is not something I am planning to implement myself, but will be happy to collaborate on design and review the code.


knyar avatar knyar commented on August 30, 2024

Oh nice, @dolik-rce beat me to the first response with a very similar set of concerns :) Thanks!

I agree that switching between dicts atomically is another tricky bit about this.

My previous thinking about resolving the performance issue was to add support for other key/value stores (besides builtin dict) that provide a similar API - for example, an external Redis instance. I suspect this actually might be easier to implement correctly than doing the double dictionary dance.


juancampa avatar juancampa commented on August 30, 2024

Thanks for reading and writing back your concerns @knyar @dolik-rce, that's really helpful. We're thinking about using this library and just wanted to make sure that, if we ever hit a performance wall, there are ways around it. 🙌

I won't be working on this any time soon, unless it affects us, which it hasn't yet. So I'm gonna go ahead and close this one.

Thanks again!


dolik-rce avatar dolik-rce commented on August 30, 2024

My previous thinking about resolving the performance issue was to add support for other key/value stores (besides builtin dict) that provide a similar API

Well, I actually use a patched version of this module that takes yet another approach to avoid get_keys... If you store the keys as values in the dictionary, under well-known keys (e.g. using a sequence of integers), it is quite simple to read them back later using just non-blocking get calls. The trade-off is that you use more space in the dictionary. The only tricky bit in this approach is storing the keys atomically, which I solved by using locks. But thinking about it now, it might be possible to do it using just the properties of the dict. I'll have to look at it again 😄


knyar avatar knyar commented on August 30, 2024

That's also a nice idea, @dolik-rce. I think you don't need locking if you can accept the same key being stored multiple times when several threads are racing to create a new metric, and then ignore duplicate names while reading the keys.

I will be curious to see your current implementation. The way I would probably do it is:

  1. reserve part of the dict key space for tracking metric names - for example, keys starting with __metric.
  2. store metric names using keys like __metric_{idx} (where idx is an integer counter).
  3. create a __metric_last integer key which will store the index of the last metric name.

Then, while writing a value:

  1. check if there is an existing value stored for this metric.
  2. if there is no value (metric does not exist), increment __metric_last. incr will return the new value (idx).
  3. write metric value.
  4. write metric name as __metric_{idx}.

If multiple threads race between steps 1 & 3, the same metric name will be written with several indexes.

While collecting metrics:

  1. use a separate lua table to keep track of metric names that have been processed.
  2. iterate from 1 to __metric_last to get all metric names. Discard duplicates.
  3. read all metric values.
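The write and collect paths above can be sketched as follows. This is a Python model, not OpenResty code: `SharedDict` is a toy single-process stand-in for `ngx.shared.DICT` (its `incr` mimics the atomic increment-with-init behavior), and `observe`/`collect` are hypothetical names.

```python
import threading

class SharedDict:
    """Toy single-process stand-in for ngx.shared.DICT."""
    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def incr(self, key, delta=1, init=0):
        # Atomic increment, creating the key from `init` if missing.
        with self._mutex:
            self._data[key] = self._data.get(key, init) + delta
            return self._data[key]

def observe(shared, name, delta):
    """Write path from the steps above."""
    if shared.get(name) is None:
        # Metric is new: reserve an index and register its name.
        idx = shared.incr("__metric_last", 1, init=0)
        shared.set("__metric_%d" % idx, name)
        # A racing worker may register the same name under another
        # index; collect() below simply discards the duplicate.
    shared.incr(name, delta, init=0)

def collect(shared):
    """Collect path: walk the index, discard duplicates, read values."""
    seen, result = set(), {}
    last = shared.get("__metric_last") or 0
    for idx in range(1, last + 1):
        name = shared.get("__metric_%d" % idx)
        if name is None or name in seen:
            continue
        seen.add(name)
        result[name] = shared.get(name)
    return result

d = SharedDict()
observe(d, "requests_total", 2)
observe(d, "requests_total", 3)
print(collect(d))  # -> {'requests_total': 5}
```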

What do you think?


dolik-rce avatar dolik-rce commented on August 30, 2024

That is actually almost exactly the algorithm I use; good to see you arrived at the same conclusions :-) The only part I hadn't thought of is resolving the race condition when collecting the metrics (I actually found there is one just last week, while looking at the code because of this issue). Additionally, I used a table to keep all the metric keys in each worker. This table can be updated quickly by iterating only over the newer indexes (since we only add metrics, never modify or remove them), and I used it to implement a nice and fast prometheus::get_keys() method.

I think it could actually be even slightly simpler if you omit __metric_last. You don't actually need it, since you can iterate over the __metric_{idx} keys in a while loop and stop at the first nil. Inserting would be slightly more complicated, but I think it might be possible to do it in a safe and lock-less way. I'd keep a variable tracking the last known metric count and just retry safe_add until it succeeds. In most cases all the metrics are created very soon after the server starts, so this would not add much overhead once the application has warmed up.
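The safe_add-based variant could look roughly like this. Again a Python model with hypothetical names; `safe_add` mimics the `ngx.shared.DICT:safe_add` semantics of failing when the key already exists (without evicting other entries).

```python
class SharedDict:
    """Toy stand-in for ngx.shared.DICT safe_add/get semantics."""
    def __init__(self):
        self._data = {}

    def safe_add(self, key, value):
        # Succeeds only if the key is absent; never evicts other entries.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

def register_key(shared, name, last_known=0):
    """Lock-less registration: probe forward until safe_add wins.
    Returns the index used (also the new last-known count)."""
    idx = last_known + 1
    while not shared.safe_add("__metric_%d" % idx, name):
        if shared.get("__metric_%d" % idx) == name:
            return idx  # another worker already registered this name here
        idx += 1
    return idx

def read_keys(shared):
    """Iterate until the first nil; no __metric_last key needed."""
    keys, idx = [], 1
    while True:
        name = shared.get("__metric_%d" % idx)
        if name is None:
            return keys
        keys.append(name)
        idx += 1

d = SharedDict()
register_key(d, "a")
register_key(d, "b", last_known=1)
print(read_keys(d))  # -> ['a', 'b']
```

This matches the trade-off described above: registration may probe a few occupied slots at startup, but once all metrics exist the hot path never touches the index keys.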


knyar avatar knyar commented on August 30, 2024

FYI, there have been some recent performance improvements for counter metrics that should decrease lock contention: b7c8feb...d8c2813

Counters are now incremented independently by each worker process, with the deltas flushed to the shared dictionary asynchronously at regular intervals. The changes are now in master, and I will cut a new release in a few days.
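The per-worker delta-flushing scheme can be modeled like this (a Python sketch with illustrative names; in the actual library the flush happens on an ngx.timer and the shared table is an `ngx.shared.DICT`):

```python
class Worker:
    """Per-worker counter buffer with periodic flush to shared storage."""
    def __init__(self, shared):
        self.shared = shared   # dict shared by all workers
        self.local = {}        # worker-private deltas, no shared lock needed

    def inc(self, name, delta=1):
        # Hot path: touches only worker-local memory.
        self.local[name] = self.local.get(name, 0) + delta

    def flush(self):
        # Cold path, run periodically: push accumulated deltas.
        for name, delta in self.local.items():
            self.shared[name] = self.shared.get(name, 0) + delta
        self.local.clear()

shared = {}
workers = [Worker(shared) for _ in range(2)]
workers[0].inc("requests_total", 3)
workers[1].inc("requests_total", 4)
for w in workers:
    w.flush()
print(shared)  # -> {'requests_total': 7}
```

Contention on the shared dictionary now scales with the flush frequency rather than with the request rate, at the cost of counter values being slightly stale between flushes.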

Even though the implementation is different from the one described in this issue, I think it largely addresses the performance concerns. I will close this issue now, but please feel free to reopen (or create a new issue or a PR) if you would like to suggest further improvements.

