
Comments (3)

GroovyCarrot commented on July 22, 2024

This really bit me recently as well. We'd had 27 TB of data rack up in the block store, and when we then tried to start store-gateway nodes, they basically never started or reported ready because the index was so huge.

Also, I found the Helm chart uses the default pod management policy for the store-gateway and will block spinning up any additional nodes until the previous one has reported ready. I think podManagementPolicy: Parallel will fix this, though I didn't try it, as we decided to just destroy the bucket and start fresh. I expect this change would allow multiple nodes to spin up and then decide which tenants/tokens they are responsible for, rather than one starting up and thinking it needs to index everything before anything else is allowed to start.
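For reference, this is roughly what that setting looks like at the Kubernetes StatefulSet level, which is where the chart's default ultimately matters. This is a minimal hand-written manifest, not the actual mimir-distributed chart output, and the exact Helm values key for overriding it depends on the chart version:

```yaml
# Minimal illustration: the key field is spec.podManagementPolicy, which
# defaults to OrderedReady and serializes pod startup; Parallel lets all
# store-gateway pods start (and sync their blocks) at the same time.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: store-gateway
spec:
  serviceName: store-gateway
  replicas: 3
  podManagementPolicy: Parallel   # default: OrderedReady
  selector:
    matchLabels:
      app: store-gateway
  template:
    metadata:
      labels:
        app: store-gateway
    spec:
      containers:
        - name: store-gateway
          image: grafana/mimir:latest
          args: ["-target=store-gateway"]
```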

I think it makes sense for the compactor to do this as it is changing the index anyway when it runs?

Is it possible to optimise this by compiling an index per day, or something? And then store those indices for lazy loading by the store-gateways if a query is run for that period? It seems like you could then start a store-gateway node and it could start taking queries practically straight away?


dimitarvdimitrov commented on July 22, 2024

I think you're bringing up another problem. When the store-gateway starts, it downloads from the bucket the index headers for the blocks that shard to it. Figuring out which blocks shard to it is fast, but downloading the index headers from the bucket is slow. It's better to do this as part of startup; otherwise that latency would hit queries.
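A compressed sketch of that startup sequence, with hypothetical names standing in for Mimir's internals: readiness is gated on downloading every owned block's index header, and those downloads are the slow part.

```go
package main

// Illustrative only; waitUntilSynced and downloadIndexHeader are hypothetical
// stand-ins, not Mimir's actual API.

import (
	"context"
	"fmt"
)

// waitUntilSynced mirrors the behaviour described above: deciding which blocks
// shard to this store-gateway is cheap, but each index-header download is a
// bucket round trip, and readiness waits for all of them.
func waitUntilSynced(ctx context.Context, ownedBlocks []string) error {
	for _, b := range ownedBlocks {
		if err := downloadIndexHeader(ctx, b); err != nil {
			return fmt.Errorf("download index header for block %s: %w", b, err)
		}
	}
	return nil // only after this does the pod report ready
}

// downloadIndexHeader is a placeholder for the bucket download.
func downloadIndexHeader(ctx context.Context, blockID string) error { return nil }

func main() {
	_ = waitUntilSynced(context.Background(), []string{"block-01", "block-02"})
}
```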

> Also, I found the Helm chart uses the default pod management policy for the store-gateway and will block spinning up any additional nodes until the previous one has reported ready. I think podManagementPolicy: Parallel will fix this

This is the other problem. It's already configurable, but making it the default is a breaking change, so we've been saving this for Helm chart 6.0 (#4560).

This issue (8166) is about what happens after that: sampling the index headers when a query comes in. The sampled version is called the "sparse index header" and is also persisted on disk today. Sampling requires reading (effectively) the full index header from disk with a lot of random reads, which is why it's slow. The sparse header is computed lazily. This issue suggests computing it in the compactor and downloading it quickly in the store-gateway, instead of having to sample the index header when the sparse index header isn't already on disk.
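A rough sketch of the lazy path described above. All names (the file name, SparseHeader, sampleIndexHeader) are illustrative assumptions, not Mimir's actual code; the point is that the expensive branch is the sampling pass over the full index header, and it only runs when the sparse file is missing.

```go
package main

// Illustrative sketch of the lazy sparse-index-header path described above.

import (
	"encoding/json"
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

type SparseHeader struct {
	// Sampled subset of the index header's symbols and postings offset
	// table; small enough to load quickly.
	SampledOffsets []int64 `json:"sampled_offsets"`
}

// loadOrBuildSparseHeader returns the sparse header for a block, building it
// only if it isn't already persisted on disk. Building is the slow step: it
// reads (effectively) the whole index header with many random reads.
func loadOrBuildSparseHeader(blockDir string) (*SparseHeader, error) {
	path := filepath.Join(blockDir, "sparse-index-header.json")

	if data, err := os.ReadFile(path); err == nil {
		var sh SparseHeader
		if err := json.Unmarshal(data, &sh); err == nil {
			return &sh, nil // fast path: already computed on a previous query
		}
	} else if !errors.Is(err, fs.ErrNotExist) {
		return nil, err
	}

	// Slow path: sample the full index header, then persist the result so the
	// next query (or restart) doesn't pay the cost again.
	sh, err := sampleIndexHeader(filepath.Join(blockDir, "index-header"))
	if err != nil {
		return nil, err
	}
	data, err := json.Marshal(sh)
	if err != nil {
		return nil, err
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return nil, err
	}
	return sh, nil
}

// sampleIndexHeader stands in for the expensive sampling pass.
func sampleIndexHeader(indexHeaderPath string) (*SparseHeader, error) {
	return &SparseHeader{}, nil
}

func main() {
	_, _ = loadOrBuildSparseHeader("./block-01")
}
```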

> Is it possible to optimise this by compiling an index per day, or something?

Blocks are already split into 24h ranges; if you're using the split-and-merge compactor, there can even be multiple blocks per 24h range.


dimitarvdimitrov commented on July 22, 2024

Some notes from the comments in the PR: it won't actually be that hard to let the compactor create sparse headers and upload them.

I chatted with @pstibrany and he suggested doing this at the end of BucketCompactor.runCompactionJob, so that we don't fail compactions if sparse headers can't be uploaded. It makes sense to still keep the ability to create sparse headers in the store-gateways so they are more autonomous and don't depend on the compactor for performance.

Worth noting that the compactors should upload these sparse headers for new blocks only, so as not to create a huge backlog upon deploying a new Mimir version. But store-gateways should still be able to construct sparse headers themselves if those aren't available in the bucket.
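Putting both parts together, a hedged sketch of the proposed flow. All identifiers here (buildSparseHeader, uploadObject, downloadObject, the object key) are hypothetical stand-ins for whatever bucket client Mimir uses: the compactor uploads the sparse header best-effort at the end of a compaction job, and the store-gateway prefers the precomputed copy but can still fall back to building it locally.

```go
package main

// Hedged sketch of the proposal above; all identifiers are hypothetical.

import (
	"context"
	"log"
)

// Compactor side: run at the end of a successful compaction job. Any failure
// here is logged and swallowed, so the compaction itself never fails because
// a sparse header couldn't be built or uploaded.
func uploadSparseHeaderBestEffort(ctx context.Context, blockID string) {
	sh, err := buildSparseHeader(blockID)
	if err != nil {
		log.Printf("building sparse header for %s failed (non-fatal): %v", blockID, err)
		return
	}
	if err := uploadObject(ctx, blockID+"/sparse-index-header.json", sh); err != nil {
		log.Printf("uploading sparse header for %s failed (non-fatal): %v", blockID, err)
	}
}

// Store-gateway side: prefer the precomputed header in the bucket (only new
// blocks will have one), but stay autonomous by falling back to building it
// locally when it's missing.
func fetchOrBuildSparseHeader(ctx context.Context, blockID string) ([]byte, error) {
	if sh, err := downloadObject(ctx, blockID+"/sparse-index-header.json"); err == nil {
		return sh, nil
	}
	return buildSparseHeader(blockID)
}

// Placeholders for the sampling pass and the bucket client.
func buildSparseHeader(blockID string) ([]byte, error)                { return []byte("{}"), nil }
func uploadObject(ctx context.Context, key string, data []byte) error { return nil }
func downloadObject(ctx context.Context, key string) ([]byte, error)  { return []byte("{}"), nil }

func main() {
	ctx := context.Background()
	uploadSparseHeaderBestEffort(ctx, "block-01")
	_, _ = fetchOrBuildSparseHeader(ctx, "block-01")
}
```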

