Basically we don't want to ask for the same data twice. Since we know that:

Remove overlap when querying data that both Source and Store have. (thanos, CLOSED)

bwplotka commented on May 13, 2024

Remove overlap when querying data that both Source and Store have.

from thanos.

Comments (5)

bwplotka commented on May 13, 2024

The questions we want to answer:

Is it worth diving into this NOW?
Is there any simpler generic solution that will work for all cases? I cannot see any.
How will we match stores to sources (just an arbitrary ID? external labels?)


fabxc commented on May 13, 2024

I think adding per-source information to the store node can blow up rather quickly. At least for the gossip state this might get too big. The query nodes could of course periodically fetch that state via a regular API.

At this stage I'm not sure yet though whether that's already valuable. A generous time buffer is probably fine.

What we also don't quite know yet is whether it's actually better to fetch as much data as possible from store nodes or from source nodes.
Given that we currently use the rather inefficient query API on the sidecar side, store nodes might be faster, especially once caching is implemented.

What we could do, and what would be rather cheap in terms of messages being sent around, is this: after the source nodes have shipped a block, they can periodically ask the store nodes for the highest block timestamp they have for that source.
Then the source nodes gossip a max-synced-time timestamp. Query nodes can then reliably know the minimum timestamp they need to hit that source node with.
That would cater to the use case where we want to minimize data fetched from source nodes.

But it seems like it's still a bit early for these optimizations, since we don't know for sure yet what we want.


bwplotka commented on May 13, 2024

+1 but still: max-synced-time -> needs to be PER Store (:


vishnubraj commented on May 13, 2024


bwplotka commented on May 13, 2024

Thanks for bumping. This issue can now be resolved thanks to the --min-time (time slicing) flags for both Store Gateway and sidecar.

By default both sidecar and Store Gateway expose all the data they have.
However, you can:

Add --min-time to the sidecar with e.g. a 4h value. This ensures that only 4h of sidecar data is exposed. Useful when you want to use Store Gateway for older queries.
Add --max-time to the Store Gateway, set to something like the Prometheus retention minus a safety buffer, with no --min-time on the sidecar. This stops the Store Gateway from exposing anything "younger" than that time. Useful if you want Prometheus to be used rather than Store Gateway. This is not ideal, however, because one requirement still holds: local compaction has to be disabled when uploading blocks.

