Comments (19)
from victoriametrics.
That's a bit scary, I fixed a similar bug on trickster. Imo if the time is
wrong it should just error :)
Agreed and fixed it in the 667115a :) Proper error handling for invalid time parsing will be available in the next release of VictoriaMetrics.
from victoriametrics.
@rumanzo , thanks for the bug report.
In the mean time it would be great if you could provide tcpdump logs for requests generated by Promxy to VictoriaMetrics for the node_cpu_guest_seconds_total
query and the corresponding response from VictoriaMetrics.
from victoriametrics.
I tried to collect all possible useful info and attach archive
victoriametrics_debug.zip
from victoriametrics.
Thanks for the additional info. BTW, which VictoriaMetrics version do you use? The similar issue with Promxy has been fixed in v1.13.1
. The version may be determined by running ./victoria-metrics-prod --version
. cc'ing @ThomasADavis , the original reporter of the issue #20.
from victoriametrics.
VictoriaMetrics version victoria-metrics-20190531-150038-tags-v1.18.9-0-gf2cf5d8e
Promxy version v0.0.38
And I saw issue #20 and it is about another case
from victoriametrics.
From the pcap:
POST /api/v1/query HTTP/1.1
Host: akostin-prometheus-1.openstacklocal:8428
User-Agent: Go-http-client/1.1
Content-Length: 169
Content-Type: application/x-www-form-urlencoded
query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09%3A01%3A14%2B03%3A00HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Thu, 06 Jun 2019 09:24:05 GMT
Content-Length: 63
{"status":"success","data":{"resultType":"matrix","result":[]}}
which means that this host responded with no data to the following query:
{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]&time=2019-06-06T09:01:14+03:00
Although the timestamp is a bit odd (usually its epoch looking) the query looks fine. So for whatever reason the datastore (I'm assuming VictoriaMetrics from this) is not returning data to that query. @rumanzo can you run that same query directly against VictoriaMetrics and see if you get the same result? (I assume you will, but once there is a reproducible query promxy is out of the debug loop-- which should speed things up a bit ;) ).
from victoriametrics.
I made two requests - to prometheus and to VictoriaMetrics, and VictoriaMetrics returned values, while prometheus is not oO
$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:9090/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
* Trying 172.27.42.170:9090...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 9090 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:9090
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:28:56 GMT
< Content-Length: 117
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"error","errorType":"bad_data","error":"invalid parameter 'start': cannot parse \"\" to a valid timestamp"}%
$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
* Trying 172.27.42.170:8428...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 8428 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:8428
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:29:10 GMT
< Content-Length: 914
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]}]}}
from victoriametrics.
I have not seen anything wrong lately - but we use promxy v0.0.38, with VM 1.18.8, and Prometheus 2.10.0, with Grafana v6.2.0
from victoriametrics.
From the prometheus API docs:
Input timestamps may be provided either in RFC3339 format or as a Unix timestamp in seconds, with optional decimal places for sub-second precision. Output timestamps are always represented as Unix timestamps in seconds.
So it seems that VictoriaMetrics support more timestamp types than prom does :) Whats interesting is that when I send the timestamp as you have on my prom stack with promxy I do get data back. So interesting, but not the issue we are seeing here.
In your query here you are showing data back when you hit http://akostin-prometheus-3.openstacklocal
but the TCPdump shows no data from http://akostin-prometheus-1.openstacklocal
. Do your 3 hosts all have different data? Right now you have all 3 hosts configured in the same servergroup -- which means promxy assumes they all have the same data (more detail at https://github.com/jacksontj/promxy#what-is-a-servergroup). If these 3 nodes are actually separate you can either have separate servergroups for each or use the relabel_configs
to generate different labels per host.
from victoriametrics.
I get VM works, but Prometheus/Promxy does not, they both complain about the date/timestamp
from victoriametrics.
I have identical configs (prometheus, promxy, victoriametrics) on 3 identical servers. Promxy.yml config attached earlier.
The important thing is that when you change the port in promxy configuration to 9090 (prometheus) , everything starts working.
from victoriametrics.
Ah, then definitely sounds like an issue with VictoriaMetrics or the routing of metrics from prom to VictoriaMetrics. Promxy in this configuration will hit all 3 VM servers and take the first one that responds (assuming no "holes" in the data), so if (for example) the prometheus servers were only configured to write to 1 VM server promxy could return no data if one of the other 2 responded first. If the writes are being properly routed then it does sound like some issue VM side.
from victoriametrics.
So it seems that VictoriaMetrics support more timestamp types than prom does :)
VictoriaMetrics just silently replaces invalid time with the current time. This needs to be fixed, since it hides the real error - invalid time.
from victoriametrics.
@rumanzo , note that Promxy converts the initial request to /api/v1/query_range
to range request to /api/v1/query
. Could you issue the following request to all the VM nodes and make sure they return identical non-empty results:
http://akostin-prometheus-1.openstacklocal:8428/api/v1/query?query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09:01:14Z
Adjust the time
arg in the url if necessary.
Make the same with Prometheus nodes and compare results.
from victoriametrics.
First I build VictoriaMeters from master and update all nodes to newer version (-20190606-203122-heads-master-0-gf4e63cd)
I checked on my nodes with this query - now all replies empty {"status":"success","data":{"resultType":"matrix","result":[]}}
And I try another query with another metrics. I get query from grafana query inspector, curl all endpoints, prometheus and victoriametrics gave me normal result, promxy gave me nothing. I found query to victoriametrics from promxy with tcpdump, and try curl with that modified query mannualy:
curl -v 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query?query=%7Binstance%3D~%22akostin-prometheus-1.openstacklocal%3A9100%22%2Cjob%3D~%22akostin-prometheus-servers%22%2C__name__%3D%22node_memory_Cached_bytes%22%7D%5B621s%5D&time=2019-06-06T23%3A56%3A20%2B03%3A00'
And result with prometheus (:9090) is normal, victoriametrics gave me empty result.
And notice that not all queries broken (standard node_exporter)
from victoriametrics.
It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled {__name__ op "string"}
label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?
from victoriametrics.
It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled
{__name__ op "string"}
label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?
Hi. Yes, it works right! Thank you!
from victoriametrics.
Great! Closing the issue then.
from victoriametrics.
Related Issues (20)
- High Storage connection saturation HOT 10
- feat: vmagent: auto metrics for the size of scraped body
- No -cluster tag for v1.97.3 LTS release HOT 5
- "go back" button in browser doesn't work properly with VMUI HOT 1
- VictoriaLogs: how to get all fields name, how to get log rows count, how to know get storage usage and disk compress ratio? HOT 3
- vmauth doesn't help spreading loads for vminsert instances HOT 2
- vmagent: Potential drop of incorrect persistence queue when removing remotewrite URL HOT 3
- Add Read Usage Statistic on Cardinality Explorer page
- vmagateway: ingestion stops if qps limit reached HOT 1
- vmui: auto-complete doesn't work properly after `{` if there is no metric name in front of `{`
- vmagent metrics per (scrape) job
- Enhancement to search.maxUniqueTimeseries and indexdb population for long interval time series
- vmbackup httpAuth 401 HOT 4
- Slow inserts remain high, although the churn rate and the number of active series have decreased HOT 5
- vm-select: add http header log when query error or slow query occurs HOT 2
- rate() and irate() issues on constant data
- Proposal: Introducing end-to-end test for each vm component
- vmui: query trace fails to expand on mouse click
- Single docker compose file which has prebuild images too for easy self hosting HOT 1
- Question about vmagent 1.93.5 and memory usage with "last" streaming aggregation function
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from victoriametrics.