Comments (3)
This really bit me recently as well. We'd had 27TB of data rack up in the block store, and then tried to start store-gateway nodes; they basically never started / reported ready as the index was so huge.
Also I found the helm chart uses the default pod management policy for the store-gateway, and will block spinning up any additional nodes until the previous one has reported ready. I think podManagementPolicy: Parallel will fix this, though I didn't try it as we decided just to destroy the bucket and start fresh. I expect this change would allow multiple nodes to spin up and then decide what tenants/tokens they are responsible for, rather than one starting up and thinking it needs to index everything before anything else is allowed to start.
I think it makes sense for the compactor to do this as it is changing the index anyway when it runs?
Is it possible to optimise this by compiling an index per day, or something? And then store those indices for lazy-loading by the store-gateways if a query is run for that period? Seems like then you can start a store-gateway node and it can start taking queries practically straight away?
from mimir.
I think you're bringing up another problem. When the store-gateway starts, it downloads from the bucket the index headers for blocks that shard to it. Figuring out which blocks shard to it is fast, but downloading the index headers from the bucket is slow. It's better to do this before starting up; otherwise, this latency would hit queries.
Also I found the helm chart uses the default pod management policy for the storegateway, and will block spinning up any additional nodes until the previous one has reported ready. I think podManagementPolicy: Parallel will fix this
This is the other problem. It's already configurable, but making it the default is a breaking change, so we've been saving this for helm chart 6.0 (#4560)
This issue (8166) is about the next step: sampling the index headers when a query comes in. The sampled version is called the "sparse index header" and is also persisted on disk today. Sampling requires reading (effectively) the full index header from disk with a lot of random reads, which is why it's slow. The sparse header is computed lazily. This issue suggests computing it in the compactor and having the store-gateway quickly download it, instead of sampling the index header whenever the sparse index header is not already on disk.
Is it possible to optimise this by compiling an index per day, or something?
Blocks are already split into 24h ranges; if you're using the split-and-merge compactor, then there can even be multiple blocks per 24h range.
some notes from the comments in the PR: it won't actually be that hard to let the compactor create sparse headers and upload them
I chatted with @pstibrany and he suggested doing this at the end of BucketCompactor.runCompactionJob so that we don't fail compactions if sparse headers can't be uploaded. It makes sense to still keep the ability to create sparse headers in the store-gateways so they are more autonomous and don't depend on the compactor for performance.
Worth noting that the compactors should upload these sparse headers for new blocks only, so as not to create a huge backlog upon deploying a new Mimir version. But store-gateways should still be able to construct sparse headers themselves if those aren't available in the bucket.