Comments (7)
We (@hellerve and I) are working on a fix.
from influxdb_exporter.
Hmm, I am not sure I agree with this. The sample time is in the protocol because it could be different from "now". That only works so-so with Prometheus, but the exporter does have rudimentary support for that.
If we start keeping track of submission time, we impose on the exporter (and its operators) the complexity of a second, hidden timestamp. It is not immediately clear to me when which timestamp would take effect: for example, why would we not ignore the submitted timestamps when --timestamps is specified?
I know keeping exact time is difficult, but with expiration on the order of seconds or minutes, is it unreasonable to expect to keep system clocks in sync enough? Does every system that receives time need to handle wibbly wobbly?
How would influxdb handle these timestamps?
At the moment the influxdb_exporter uses the date supplied by the data source to determine whether entries should be deleted or left in place. That works fine as long as all the devices share the same system time.
But if, for example, your device lives in "the future", data will accumulate until your server runs out of memory. If the device lives in the past, data will never reach Prometheus, because it will be deleted before Prometheus can even ask for it.
The change suggested in #62 doesn't affect the timestamp of the data itself; it just introduces a new value, based on the system time, for the decision whether to keep or discard the data.
I guess this boils down to a fundamental question: is a data point “fresh” when the supplier says it is, or is it fresh when it arrives in the system? I think fundamentally both approaches are valid, and both introduce weird error cases.
If we assume that a data point is fresh when whoever sends it says it is, time drift fundamentally changes how we look at the underlying data: it might be expired by the time it even arrives in our system. We trust the data and take a hands-off approach.
If we assume that a data point is fresh when it arrives in our system, we assume that only current data points will ever make it into a request to us. This leads to a different error case, where we impose semantics on someone else’s data (and it leads to two timestamps that should be equivalent or at least close but might not be, because the world is big and messy). We don’t trust the data, and take a hands-on approach.
In both cases, we fix an issue with the other approach, and it might not be possible to get to a “best of both worlds” situation here. Also: in both cases, we should probably at least document the potential error case.
I know keeping exact time is difficult, but with expiration on the order of seconds or minutes, is it unreasonable to expect to keep system clocks in sync enough? Does every system that receives time need to handle wibbly wobbly?
I also want to add to this question for a second. The reason we found this bug was because playing around with this system we got metrics sent from a (lab) device that was a little wonky—as lab devices often are. It sent us data with a timestamp from last year!
"This shouldn't happen! This device shouldn't even be operational!" was our first reaction, too. But it was, and it ran without any problems, except that the metrics didn't show up. Of course this needs to be fixed on the device, but I'm saying this to illustrate that there are weird systems in this world that we have to deal with in some way. Both approaches above are valid, but we have to acknowledge that they might fail in some cases.
@hellerve thank you, you have perfectly summarized the core of the issue.
After thinking about this for a while, I would like to leave things as they are. As you pointed out, both approaches are valid. However, we already use and handle the client-supplied timestamps (if --timestamps is enabled). That would not change, so we would then be treating the same sample differently in different contexts.
In the end, there is no perfect way to deal with clients whose clocks are significantly off; all we can hope for is to deal with them consistently.