Comments (7)
We (@hellerve and I) are working on a fix.
from influxdb_exporter.
Hmm, I am not sure I agree with this. The sample time is in the protocol because it could be different from "now". That only works so-so with Prometheus, but the exporter does have rudimentary support for that.
If we start keeping track of submission time, we impose on the exporter (and its operators) the complexity of a second, hidden timestamp. It is not immediately clear to me when which timestamp would take effect: for example, why would we not ignore the submitted timestamps when --timestamps is specified?
I know keeping exact time is difficult, but with expiration on the order of seconds or minutes, is it unreasonable to expect to keep system clocks in sync enough? Does every system that receives time need to handle wibbly wobbly?
How would influxdb handle these timestamps?
At the moment the influxdb_exporter uses the date supplied by the data source to determine whether entries should be deleted or left in place. That works fine as long as all the devices share the same system time.
But if, for example, your device lives in "the future", data will accumulate until your server runs out of memory. If the device lives in the past, data will never reach Prometheus, because it will be deleted before Prometheus can even ask for it.
The change suggested in #62 doesn't affect the timestamp of the data itself; it just introduces a new value, based on the system time, for the decision whether to keep or discard the data.
I guess this boils down to a fundamental question: is a data point “fresh” when the supplier says it is, or is it fresh when it arrives in the system? I think fundamentally both approaches are valid, and both introduce weird error cases.
If we assume that a data point is fresh when whoever sends it says it is, time drift fundamentally changes how we look at the underlying data: it might be expired by the time it even arrives in our system. We trust the data and take a hands-off approach.
If we assume that a data point is fresh when it arrives in our system, we assume that only current data points will ever make it into a request to us. This leads to a different error case, where we impose semantics on someone else’s data (and it leads to two timestamps that should be equivalent or at least close but might not be, because the world is big and messy). We don’t trust the data, and take a hands-on approach.
In both cases, we fix an issue with the other approach, and it might not be possible to get to a “best of both worlds” situation here. Also: in both cases, we should probably at least document the potential error case.
I know keeping exact time is difficult, but with expiration on the order of seconds or minutes, is it unreasonable to expect to keep system clocks in sync enough? Does every system that receives time need to handle wibbly wobbly?
I also want to add to this question for a second. The reason we found this bug was because playing around with this system we got metrics sent from a (lab) device that was a little wonky—as lab devices often are. It sent us data with a timestamp from last year!
"This shouldn't happen! This device shouldn't even be operational!" was our first reaction, too. But it was, and it ran without any problems, except that the metrics didn't show up. Of course this needs to be fixed on the device, but I'm saying this to illustrate that there are weird systems in this world that we have to deal with in some way. Both approaches above are valid, but we have to acknowledge that they might fail in some cases.
@hellerve thank you, you have perfectly summarized the core of the issue.
After thinking about this for a while, I would like to leave things as they are. As you pointed out, both approaches are valid. However, we already use and handle the client-supplied timestamps (if --timestamps is enabled). That would not change, so we would then be treating the same sample differently in different contexts.
In the end, there is no perfect way to deal with clients whose clocks are significantly off; all we can hope for is to deal with them consistently.