Hi! I try to deploy prometheus cluster with 3 standalone prometheus + victoriametrics

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

From the pcap: <div class="snippet-clipboard-content notranslate position-relative

From the <a href="https://prometheus.io/docs/prometheus/latest/querying/api/#format-ov

Drop-in replacement prometheus not work with promxy (Prometheus HA) about victoriametrics HOT 19 CLOSED

victoriametrics commented on May 14, 2024

Drop-in replacement prometheus not work with promxy (Prometheus HA)

from victoriametrics.

Comments (19)

jacksontj commented on May 14, 2024 1

That's a bit scary, I fixed a similar bug on trickster. Imo if the time is wrong it should just error :)

…

On Thu, Jun 6, 2019, 12:10 PM Aliaksandr Valialkin ***@***.***> wrote: So it seems that VictoriaMetrics support *more* timestamp types than prom does :) VictoriaMetrics just silently replaces invalid time with the current time <https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5cbd4cfca949a3c85354e2408e7ce1a7f9a683bc/app/vmselect/prometheus/prometheus.go#L479>. This needs to be fixed, since it hides the real error - invalid time. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#52?email_source=notifications&email_token=AAMLPHOSH2TOM7F3KSD4W33PZFOJZA5CNFSM4HTU3LM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXD3S4Y#issuecomment-499628403>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAMLPHPLRNCFFITCQZGGY33PZFOJZANCNFSM4HTU3LMQ> .

from victoriametrics.

valyala commented on May 14, 2024 1

That's a bit scary, I fixed a similar bug on trickster. Imo if the time is
wrong it should just error :)

Agreed and fixed it in the 667115a :) Proper error handling for invalid time parsing will be available in the next release of VictoriaMetrics.

from victoriametrics.

valyala commented on May 14, 2024

@rumanzo , thanks for the bug report.

In the mean time it would be great if you could provide tcpdump logs for requests generated by Promxy to VictoriaMetrics for the node_cpu_guest_seconds_total query and the corresponding response from VictoriaMetrics.

from victoriametrics.

rumanzo commented on May 14, 2024

I tried to collect all possible useful info and attach archive
victoriametrics_debug.zip

promxy.yml.txt

from victoriametrics.

valyala commented on May 14, 2024

Thanks for the additional info. BTW, which VictoriaMetrics version do you use? The similar issue with Promxy has been fixed in v1.13.1. The version may be determined by running ./victoria-metrics-prod --version. cc'ing @ThomasADavis , the original reporter of the issue #20.

from victoriametrics.

rumanzo commented on May 14, 2024

VictoriaMetrics version victoria-metrics-20190531-150038-tags-v1.18.9-0-gf2cf5d8e
Promxy version v0.0.38
And I saw issue #20 and it is about another case

from victoriametrics.

jacksontj commented on May 14, 2024

From the pcap:

POST /api/v1/query HTTP/1.1
Host: akostin-prometheus-1.openstacklocal:8428
User-Agent: Go-http-client/1.1
Content-Length: 169
Content-Type: application/x-www-form-urlencoded

query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09%3A01%3A14%2B03%3A00HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Thu, 06 Jun 2019 09:24:05 GMT
Content-Length: 63

{"status":"success","data":{"resultType":"matrix","result":[]}}

which means that this host responded with no data to the following query:

{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]&time=2019-06-06T09:01:14+03:00

Although the timestamp is a bit odd (usually its epoch looking) the query looks fine. So for whatever reason the datastore (I'm assuming VictoriaMetrics from this) is not returning data to that query. @rumanzo can you run that same query directly against VictoriaMetrics and see if you get the same result? (I assume you will, but once there is a reproducible query promxy is out of the debug loop-- which should speed things up a bit ;) ).

from victoriametrics.

rumanzo commented on May 14, 2024

I made two requests - to prometheus and to VictoriaMetrics, and VictoriaMetrics returned values, while prometheus is not oO

$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:9090/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \                                                                                                      
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
*   Trying 172.27.42.170:9090...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 9090 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:9090
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:28:56 GMT
< Content-Length: 117
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"error","errorType":"bad_data","error":"invalid parameter 'start': cannot parse \"\" to a valid timestamp"}%

$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
*   Trying 172.27.42.170:8428...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 8428 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:8428
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:29:10 GMT
< Content-Length: 914
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]}]}}

from victoriametrics.

ThomasADavis commented on May 14, 2024

I have not seen anything wrong lately - but we use promxy v0.0.38, with VM 1.18.8, and Prometheus 2.10.0, with Grafana v6.2.0

from victoriametrics.

jacksontj commented on May 14, 2024

From the prometheus API docs:

Input timestamps may be provided either in RFC3339 format or as a Unix timestamp in seconds, with optional decimal places for sub-second precision. Output timestamps are always represented as Unix timestamps in seconds.

So it seems that VictoriaMetrics support more timestamp types than prom does :) Whats interesting is that when I send the timestamp as you have on my prom stack with promxy I do get data back. So interesting, but not the issue we are seeing here.

In your query here you are showing data back when you hit http://akostin-prometheus-3.openstacklocal but the TCPdump shows no data from http://akostin-prometheus-1.openstacklocal. Do your 3 hosts all have different data? Right now you have all 3 hosts configured in the same servergroup -- which means promxy assumes they all have the same data (more detail at https://github.com/jacksontj/promxy#what-is-a-servergroup). If these 3 nodes are actually separate you can either have separate servergroups for each or use the relabel_configs to generate different labels per host.

from victoriametrics.

ThomasADavis commented on May 14, 2024

I get VM works, but Prometheus/Promxy does not, they both complain about the date/timestamp

from victoriametrics.

rumanzo commented on May 14, 2024

I have identical configs (prometheus, promxy, victoriametrics) on 3 identical servers. Promxy.yml config attached earlier.
The important thing is that when you change the port in promxy configuration to 9090 (prometheus) , everything starts working.

from victoriametrics.

jacksontj commented on May 14, 2024

Ah, then definitely sounds like an issue with VictoriaMetrics or the routing of metrics from prom to VictoriaMetrics. Promxy in this configuration will hit all 3 VM servers and take the first one that responds (assuming no "holes" in the data), so if (for example) the prometheus servers were only configured to write to 1 VM server promxy could return no data if one of the other 2 responded first. If the writes are being properly routed then it does sound like some issue VM side.

from victoriametrics.

valyala commented on May 14, 2024

So it seems that VictoriaMetrics support more timestamp types than prom does :)

VictoriaMetrics just silently replaces invalid time with the current time. This needs to be fixed, since it hides the real error - invalid time.

from victoriametrics.

valyala commented on May 14, 2024

@rumanzo , note that Promxy converts the initial request to /api/v1/query_range to range request to /api/v1/query. Could you issue the following request to all the VM nodes and make sure they return identical non-empty results:

http://akostin-prometheus-1.openstacklocal:8428/api/v1/query?query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09:01:14Z

Adjust the time arg in the url if necessary.

Make the same with Prometheus nodes and compare results.

from victoriametrics.

rumanzo commented on May 14, 2024

First I build VictoriaMeters from master and update all nodes to newer version (-20190606-203122-heads-master-0-gf4e63cd)

http://akostin-prometheus-1.openstacklocal:8428/api/v1/query?query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09:01:14Z

I checked on my nodes with this query - now all replies empty {"status":"success","data":{"resultType":"matrix","result":[]}}

And I try another query with another metrics. I get query from grafana query inspector, curl all endpoints, prometheus and victoriametrics gave me normal result, promxy gave me nothing. I found query to victoriametrics from promxy with tcpdump, and try curl with that modified query mannualy:

curl -v 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query?query=%7Binstance%3D~%22akostin-prometheus-1.openstacklocal%3A9100%22%2Cjob%3D~%22akostin-prometheus-servers%22%2C__name__%3D%22node_memory_Cached_bytes%22%7D%5B621s%5D&time=2019-06-06T23%3A56%3A20%2B03%3A00'

And result with prometheus (:9090) is normal, victoriametrics gave me empty result.

And notice that not all queries broken (standard node_exporter)

from victoriametrics.

valyala commented on May 14, 2024

It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled {__name__ op "string"} label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?

from victoriametrics.

rumanzo commented on May 14, 2024

It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled {__name__ op "string"} label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?

Hi. Yes, it works right! Thank you!

from victoriametrics.

valyala commented on May 14, 2024

Great! Closing the issue then.

from victoriametrics.

Drop-in replacement prometheus not work with promxy (Prometheus HA) about victoriametrics HOT 19 CLOSED

Comments (19)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent