Comments (2)
Hi, to add a bit of context: when upgrading from Mimir 2.10 to Mimir 2.12, we started to see increased latency and a small error rate on the read path. At the same time, we noticed that the number of TCP connections to the query-scheduler went from stable to fluctuating up and down.
This seems to have been caused by this change, which removed the following extra args from the query-scheduler: https://github.com/grafana/mimir/pull/7269/files#diff-7fd5824797e825650064e35cfdea31cf25162114e24bc754f648de77cff4ff06L53
- "-server.grpc.keepalive.max-connection-age=2562047h" # 100000 days, effectively infinity
- "-server.grpc.keepalive.max-connection-age-grace=2562047h" # 100000 days, effectively infinity
These were previously added as part of https://github.com/grafana/mimir/pull/3262/files.
Looking at sample traces where the requests ended with HTTP status code 500, it seems that retries were exhausted before a new connection was established.
Another example is queries that take roughly 1s just to enqueue: they succeed, but latency increases even for light queries.
As mentioned on Slack in this thread https://grafana.slack.com/archives/C039863E8P7/p1715625953274669?thread_ts=1714333917.446309&cid=C039863E8P7, adding these args back to the query-scheduler seems to fix or at least minimize the issue.
- "-server.grpc.keepalive.max-connection-age=2562047h" # 100000 days, effectively infinity
- "-server.grpc.keepalive.max-connection-age-grace=2562047h" # 100000 days, effectively infinity
We made the same change today and so far it seems to work. I will post an update tomorrow or the day after on whether the issue has truly gone away for us too.
from mimir.
I can now confirm that, after having the change deployed in production for a few days, it fully fixed the issue for us.