Comments (19)

jacksontj avatar jacksontj commented on May 14, 2024 1

from victoriametrics.

valyala avatar valyala commented on May 14, 2024 1

That's a bit scary, I fixed a similar bug on trickster. Imo if the time is
wrong it should just error :)

Agreed, and fixed in 667115a :) Proper error handling for invalid time parsing will be available in the next release of VictoriaMetrics.

valyala avatar valyala commented on May 14, 2024

@rumanzo, thanks for the bug report.

In the meantime, it would be great if you could provide tcpdump logs of the requests Promxy sends to VictoriaMetrics for the node_cpu_guest_seconds_total query, along with the corresponding responses from VictoriaMetrics.

rumanzo avatar rumanzo commented on May 14, 2024

I tried to collect all the potentially useful info and attached an archive:
victoriametrics_debug.zip

promxy.yml.txt

valyala avatar valyala commented on May 14, 2024

Thanks for the additional info. BTW, which VictoriaMetrics version do you use? A similar issue with Promxy was fixed in v1.13.1. The version can be determined by running ./victoria-metrics-prod --version. cc'ing @ThomasADavis, the original reporter of issue #20.

rumanzo avatar rumanzo commented on May 14, 2024

VictoriaMetrics version victoria-metrics-20190531-150038-tags-v1.18.9-0-gf2cf5d8e
Promxy version v0.0.38
I saw issue #20, but it is about a different case.

jacksontj avatar jacksontj commented on May 14, 2024

From the pcap:

POST /api/v1/query HTTP/1.1
Host: akostin-prometheus-1.openstacklocal:8428
User-Agent: Go-http-client/1.1
Content-Length: 169
Content-Type: application/x-www-form-urlencoded

query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09%3A01%3A14%2B03%3A00

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Thu, 06 Jun 2019 09:24:05 GMT
Content-Length: 63

{"status":"success","data":{"resultType":"matrix","result":[]}}

which means that this host responded with no data to the following query:

{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]&time=2019-06-06T09:01:14+03:00

Although the timestamp is a bit odd (usually it's epoch-looking), the query looks fine. So for whatever reason the datastore (I'm assuming VictoriaMetrics from this) is not returning data for that query. @rumanzo, can you run that same query directly against VictoriaMetrics and see if you get the same result? (I assume you will, but once there is a reproducible query, promxy is out of the debug loop, which should speed things up a bit ;) ).
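
As an aside, the form body captured above can be decoded with Python's standard library to confirm what was actually sent. This is only a sketch: the variable names are illustrative, and the epoch conversion is included because the timestamp was sent in RFC3339 form rather than as Unix seconds.

```python
from datetime import datetime
from urllib.parse import parse_qs

# Form body copied verbatim from the captured POST request.
body = (
    "query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22"
    "%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D"
    "&time=2019-06-06T09%3A01%3A14%2B03%3A00"
)

# parse_qs percent-decodes each form field and returns a list per key.
params = parse_qs(body)
query = params["query"][0]
time_arg = params["time"][0]

# Convert the RFC3339 timestamp (with its +03:00 offset) to Unix seconds.
epoch = datetime.fromisoformat(time_arg).timestamp()

print(query)
print(time_arg)
print(epoch)
```

Decoding the body this way makes it easy to paste the exact same selector and time into a curl command against a single backend.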

rumanzo avatar rumanzo commented on May 14, 2024

I made two requests, one to Prometheus and one to VictoriaMetrics. VictoriaMetrics returned values, while Prometheus did not oO

$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:9090/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
*   Trying 172.27.42.170:9090...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 9090 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:9090
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:28:56 GMT
< Content-Length: 117
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"error","errorType":"bad_data","error":"invalid parameter 'start': cannot parse \"\" to a valid timestamp"}
$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
*   Trying 172.27.42.170:8428...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 8428 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:8428
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:29:10 GMT
< Content-Length: 914
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]}]}}

ThomasADavis avatar ThomasADavis commented on May 14, 2024

I have not seen anything wrong lately - but we use promxy v0.0.38, with VM 1.18.8, and Prometheus 2.10.0, with Grafana v6.2.0

jacksontj avatar jacksontj commented on May 14, 2024

From the prometheus API docs:

Input timestamps may be provided either in RFC3339 format or as a Unix timestamp in seconds, with optional decimal places for sub-second precision. Output timestamps are always represented as Unix timestamps in seconds.

So it seems that VictoriaMetrics supports more timestamp types than prom does :) What's interesting is that when I send the timestamp as you have it to my prom stack through promxy, I do get data back. So it's interesting, but not the issue we are seeing here.

In your query here you are showing data coming back when you hit http://akostin-prometheus-3.openstacklocal, but the tcpdump shows no data from http://akostin-prometheus-1.openstacklocal. Do your 3 hosts all have different data? Right now you have all 3 hosts configured in the same servergroup, which means promxy assumes they all have the same data (more detail at https://github.com/jacksontj/promxy#what-is-a-servergroup). If these 3 nodes are actually separate, you can either have a separate servergroup for each or use relabel_configs to generate different labels per host.

ThomasADavis avatar ThomasADavis commented on May 14, 2024

I get that VM works, but Prometheus/Promxy do not; they both complain about the date/timestamp.

rumanzo avatar rumanzo commented on May 14, 2024

I have identical configs (Prometheus, Promxy, VictoriaMetrics) on 3 identical servers. The promxy.yml config was attached earlier.
The important thing is that when you change the port in the Promxy configuration to 9090 (Prometheus), everything starts working.

jacksontj avatar jacksontj commented on May 14, 2024

Ah, then it definitely sounds like an issue with VictoriaMetrics or with the routing of metrics from prom to VictoriaMetrics. In this configuration, promxy hits all 3 VM servers and takes the first one that responds (assuming no "holes" in the data). So if, for example, the prometheus servers were only configured to write to 1 VM server, promxy could return no data if one of the other 2 responded first. If the writes are being routed properly, then it does sound like an issue on the VM side.

valyala avatar valyala commented on May 14, 2024

So it seems that VictoriaMetrics support more timestamp types than prom does :)

VictoriaMetrics just silently replaces invalid time with the current time. This needs to be fixed, since it hides the real error - invalid time.
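
To make the intended behavior concrete, here is a minimal sketch of strict time parsing that returns an error instead of falling back to the current time. It is not the VictoriaMetrics implementation (which is written in Go); `parse_time_arg` is a hypothetical helper, and it accepts only the two input forms the Prometheus API docs describe: RFC3339 or Unix seconds.

```python
import math
from datetime import datetime


def parse_time_arg(value: str) -> float:
    """Return Unix seconds for an RFC3339 string or a Unix timestamp.

    Raises ValueError for anything else instead of substituting "now".
    """
    if not value:
        raise ValueError('cannot parse "" to a valid timestamp')

    # First try a plain Unix timestamp, with optional decimal places.
    try:
        seconds = float(value)
    except ValueError:
        seconds = None
    if seconds is not None:
        if not math.isfinite(seconds):
            raise ValueError(f"cannot parse {value!r} to a valid timestamp")
        return seconds

    # Older Python versions do not accept a trailing "Z", so normalize it.
    text = value[:-1] + "+00:00" if value.endswith(("Z", "z")) else value
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError as exc:
        raise ValueError(f"cannot parse {value!r} to a valid timestamp") from exc

    # RFC3339 requires an explicit offset; reject naive values.
    if parsed.tzinfo is None:
        raise ValueError(f"cannot parse {value!r} to a valid timestamp")
    return parsed.timestamp()
```

With this approach, a request that omits or mangles the time argument fails loudly, which is what surfaced the missing `start` parameter in the Prometheus response earlier in the thread.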

valyala avatar valyala commented on May 14, 2024

@rumanzo, note that Promxy converts the initial /api/v1/query_range request into a request to /api/v1/query with a range selector. Could you issue the following request to all the VM nodes and make sure they return identical non-empty results:

http://akostin-prometheus-1.openstacklocal:8428/api/v1/query?query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09:01:14Z

Adjust the time arg in the url if necessary.

Do the same with the Prometheus nodes and compare the results.
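
The comparison described above could be scripted roughly as follows. This is a sketch under assumptions: `build_instant_query_url` and `identical_non_empty` are hypothetical helpers, fetching the URLs is left to whatever HTTP client is at hand, and the comparison expects every node to return its series in the same order.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, selector: str, window: str, time: str) -> str:
    """Build an /api/v1/query URL for a range selector evaluated at `time`."""
    query = f"{selector}[{window}]"
    return f"{base_url}/api/v1/query?" + urlencode({"query": query, "time": time})


def identical_non_empty(responses: list) -> bool:
    """Check that every decoded JSON response carries the same non-empty result."""
    results = [resp["data"]["result"] for resp in responses]
    return bool(results) and all(res and res == results[0] for res in results)


selector = '{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}'
url = build_instant_query_url(
    "http://akostin-prometheus-1.openstacklocal:8428",
    selector,
    "1234s",
    "2019-06-06T09:01:14Z",
)
print(url)
```

Calling `build_instant_query_url` once per node, with port 8428 for VictoriaMetrics and 9090 for Prometheus, and passing the decoded JSON bodies to `identical_non_empty` gives a quick yes/no answer for each group.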

rumanzo avatar rumanzo commented on May 14, 2024

First, I built VictoriaMetrics from master and updated all nodes to the newer version (-20190606-203122-heads-master-0-gf4e63cd).

http://akostin-prometheus-1.openstacklocal:8428/api/v1/query?query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09:01:14Z

I checked this query on my nodes, and now all replies are empty: {"status":"success","data":{"resultType":"matrix","result":[]}}

I then tried another query with a different metric. I took the query from the Grafana query inspector and ran it with curl against all endpoints: Prometheus and VictoriaMetrics gave me normal results, while Promxy gave me nothing. Using tcpdump, I found the query that Promxy sends to VictoriaMetrics and ran that modified query manually with curl:

curl -v 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query?query=%7Binstance%3D~%22akostin-prometheus-1.openstacklocal%3A9100%22%2Cjob%3D~%22akostin-prometheus-servers%22%2C__name__%3D%22node_memory_Cached_bytes%22%7D%5B621s%5D&time=2019-06-06T23%3A56%3A20%2B03%3A00'

The result from Prometheus (:9090) is normal, while VictoriaMetrics gave me an empty result.

Note that not all queries are broken (standard node_exporter).

valyala avatar valyala commented on May 14, 2024

It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled {__name__ op "string"} label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?
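
To illustrate what a `{__name__ op "string"}` filter is expected to do, here is a minimal sketch of Prometheus-style label matching. It is not VictoriaMetrics code: `series_matches` and the sample series are hypothetical, and Python's `re` stands in for the RE2 syntax Prometheus uses. It treats `__name__` like any other label, treats a missing label as an empty string, and anchors regex matches to the whole value.

```python
import re


def series_matches(labels: dict, filters: list) -> bool:
    """Return True if the series labels satisfy every (label, op, value) filter."""
    for label, op, value in filters:
        actual = labels.get(label, "")  # a missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator {op!r}")
        if not ok:
            return False
    return True


# Filters mirroring the selector Promxy sent in the thread.
filters = [
    ("instance", "=~", "akostin-prometheus-1.openstacklocal:9100"),
    ("job", "=~", "akostin-prometheus-servers"),
    ("__name__", "=", "node_memory_Cached_bytes"),
]

# Hypothetical series, shaped like the ones in the responses above.
sample = {
    "__name__": "node_memory_Cached_bytes",
    "instance": "akostin-prometheus-1.openstacklocal:9100",
    "job": "akostin-prometheus-servers",
}
print(series_matches(sample, filters))
```

Under these semantics, `node_memory_Cached_bytes{job=~"..."}` and `{__name__="node_memory_Cached_bytes",job=~"..."}` select the same series, which is why the two forms should return identical data.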

rumanzo avatar rumanzo commented on May 14, 2024

It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled {__name__ op "string"} label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?

Hi. Yes, it works right! Thank you!

valyala avatar valyala commented on May 14, 2024

Great! Closing the issue then.
