Comments (7)
@BouchaaraAdil Have you checked your compactor metrics? Is the compactor in a halted state? You can tell from the thanos_compact_halted metric; see:
https://thanos.io/tip/operating/compactor-backlog.md/#make-sure-compactors-are-running
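As a rough illustration of that check, here is a minimal Python sketch that reads a gauge out of Prometheus text-format output, such as what a compactor exposes on its metrics endpoint. The function name `read_gauge` and the sample text are hypothetical and only for illustration; nothing is fetched from a real compactor.

```python
def read_gauge(metrics_text: str, name: str) -> list[float]:
    """Return every sample value for `name` in Prometheus text-format output."""
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: name{labels} value [timestamp]
            series, rest = line.rsplit("}", 1)
            metric_name = series.split("{", 1)[0]
        else:
            # Unlabelled sample: name value [timestamp]
            metric_name, _, rest = line.partition(" ")
        if metric_name != name:
            continue
        values.append(float(rest.split()[0]))
    return values


def is_halted(metrics_text: str) -> bool:
    """True if any thanos_compact_halted sample is non-zero."""
    return any(v != 0 for v in read_gauge(metrics_text, "thanos_compact_halted"))


# Hypothetical sample text; the HELP string is a placeholder, not the real one.
SAMPLE = """\
# HELP thanos_compact_halted placeholder help text
# TYPE thanos_compact_halted gauge
thanos_compact_halted 0
"""

if __name__ == "__main__":
    print("halted:", is_halted(SAMPLE))
```

A value of 0 means the compactor has not halted, so a slow compaction rate would need a different explanation.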
from thanos.
@yeya24 Yes, I already did; the value is 0.
The logs mostly look like this:
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=7m8.010456222s duration_ms=428010 cached=24894 returned=24894 partial=0
caller=fetcher.go:820 level=debug msg="removed replica label" label=prometheus_replica count=18750
caller=fetcher.go:407 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=60
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=7m8.234159055s duration_ms=428234 cached=24894 returned=24894 partial=0
caller=blocks_cleaner.go:58 level=info msg="cleaning of blocks marked for deletion done"
caller=blocks_cleaner.go:44 level=info msg="started cleaning of blocks marked for deletion"
caller=clean.go:61 level=info msg="cleaning of aborted partial uploads done"
caller=clean.go:34 level=info msg="started cleaning of aborted partial uploads"
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=8m35.588933685s duration_ms=515588 cached=24894 returned=24894 partial=0
caller=fetcher.go:820 level=debug msg="removed replica label" label=prometheus_replica count=18750
caller=fetcher.go:407 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=60
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=8m35.392333486s duration_ms=515392 cached=24894 returned=24894 partial=0
caller=blocks_cleaner.go:58 level=info msg="cleaning of blocks marked for deletion done"
caller=blocks_cleaner.go:44 level=info msg="started cleaning of blocks marked for deletion"
caller=clean.go:61 level=info msg="cleaning of aborted partial uploads done"
caller=clean.go:34 level=info msg="started cleaning of aborted partial uploads"
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=8m11.753199423s duration_ms=491753 cached=24894 returned=24894 partial=0
caller=fetcher.go:820 level=debug msg="removed replica label" label=prometheus_replica count=18750
caller=fetcher.go:407 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=60
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=8m11.674345163s duration_ms=491674 cached=24894 returned=24894 partial=0
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=7m35.466654203s duration_ms=455466 cached=24894 returned=24894 partial=0
caller=fetcher.go:820 level=debug msg="removed replica label" label=prometheus_replica count=18750
caller=fetcher.go:407 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=60
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=7m35.686672464s duration_ms=455686 cached=24894 returned=24894 partial=0
caller=blocks_cleaner.go:58 level=info msg="cleaning of blocks marked for deletion done"
caller=blocks_cleaner.go:44 level=info msg="started cleaning of blocks marked for deletion"
caller=clean.go:61 level=info msg="cleaning of aborted partial uploads done"
caller=clean.go:34 level=info msg="started cleaning of aborted partial uploads"
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=6m37.443088828s duration_ms=397443 cached=24894 returned=24894 partial=0
caller=fetcher.go:820 level=debug msg="removed replica label" label=prometheus_replica count=18750
caller=fetcher.go:407 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=60
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=6m36.845099529s duration_ms=396845 cached=24894 returned=24894 partial=0
caller=blocks_cleaner.go:58 level=info msg="cleaning of blocks marked for deletion done"
caller=blocks_cleaner.go:44 level=info msg="started cleaning of blocks marked for deletion"
caller=clean.go:61 level=info msg="cleaning of aborted partial uploads done"
caller=clean.go:34 level=info msg="started cleaning of aborted partial uploads"
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=6m39.587074548s duration_ms=399587 cached=24894 returned=24894 partial=0
caller=fetcher.go:820 level=debug msg="removed replica label" label=prometheus_replica count=18750
caller=fetcher.go:407 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=60
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=6m40.384243402s duration_ms=400384 cached=24894 returned=24894 partial=0
caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=6m54.588252009s duration_ms=414588 cached=24894 returned=24791 partial=0
caller=fetcher.go:820 level=debug msg="removed replica label" label=prometheus_replica count=18647
caller=fetcher.go:867 level=debug msg="block is too fresh for now" block=01HSJP6XN8DPDBJ9XHZTTWDWRD
caller=fetcher.go:867 level=debug msg="block is too fresh for now" block=01HSJP6XKT2BVBNEB3THRGR0WC
caller=fetcher.go:867 level=debug msg="block is too fresh for now" block=01HSJP6X2G14XJ04WJ53CZXRKC
caller=fetcher.go:867 level=debug msg="block is too fresh for now" block=01HSJP6WVZ01H29HPT1QGK07T5
The compaction rate is very low, and compaction starts only after I recreate the compactor pod. When I removed vertical compaction, the compaction rate was high, and the compactor even started compacting fresher blocks of 2 hours to 8,6 hours.
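To put numbers on how long each metadata sync takes in logs like the ones above, a minimal Python sketch could pull the duration_ms field out of the "successfully synchronized block metadata" lines. The helper name `sync_durations_seconds` is hypothetical, and the two sync lines in `LINES` are copied from the log excerpt in this comment.

```python
import re
from statistics import mean

SYNC_MSG = "successfully synchronized block metadata"
_DURATION_MS = re.compile(r"\bduration_ms=(\d+)\b")


def sync_durations_seconds(log_lines):
    """Collect metadata sync durations (in seconds) from logfmt-style lines."""
    durations = []
    for line in log_lines:
        # Only the fetcher's sync-completed lines carry the field we want.
        if SYNC_MSG not in line:
            continue
        match = _DURATION_MS.search(line)
        if match:
            durations.append(int(match.group(1)) / 1000.0)
    return durations


LINES = [
    'caller=fetcher.go:557 level=info component=block.BaseFetcher '
    'msg="successfully synchronized block metadata" duration=7m8.010456222s '
    'duration_ms=428010 cached=24894 returned=24894 partial=0',
    'caller=fetcher.go:820 level=debug msg="removed replica label" '
    'label=prometheus_replica count=18750',
    'caller=fetcher.go:557 level=info component=block.BaseFetcher '
    'msg="successfully synchronized block metadata" duration=8m35.588933685s '
    'duration_ms=515588 cached=24894 returned=24894 partial=0',
]

if __name__ == "__main__":
    durations = sync_durations_seconds(LINES)
    print(f"syncs: {len(durations)}, mean: {mean(durations):.1f}s, "
          f"max: {max(durations):.1f}s")
```

In the excerpt above, every sync of the 24894 block metas takes roughly 6.5 to 8.5 minutes.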
from thanos.
@GiedriusS Any idea about the "Error executing query: sum and count timestamps not aligned" behaviour?
from thanos.
@yeya24 Any updates?
from thanos.
@bwplotka Any idea?
from thanos.
I'm facing the same issue. I'm using Thanos (0.36.0), Prometheus (2.53.0), and a Ceph cluster for object storage. I have an HA Prometheus pair scraping the same set of targets, and the compaction rate on the Thanos compactor is very low. The blocks are uploaded successfully to object storage via the Thanos sidecar, but I have issues with the compaction and downsampling process.
Thanos compactor flags:
- args:
  - compact
  - --data-dir=/var/thanos/compact
  - --log.level=info
  - --http-address=0.0.0.0:10902
  - --objstore.config-file=/etc/objectstore/monitoring-user-s3-creds.yaml
  - --retention.resolution-raw=14d
  - --retention.resolution-5m=30d
  - --retention.resolution-1h=180d
  - --delete-delay=0
  - --wait
  - --wait-interval=15m
  - --web.disable
  - --enable-auto-gomemlimit
  - --deduplication.replica-label=prometheus_replica
  - --deduplication.func=penalty
  - --block-files-concurrency=8
  - --compact.blocks-fetch-concurrency=8
  - --compact.concurrency=8
  - --downsample.concurrency=8
Thanos compactor resources:
resources:
  limits:
    cpu: "8"
    memory: 24Gi
  requests:
    cpu: "8"
    memory: 24Gi
The Thanos compactor is not halting, but the thanos_compact_todo_downsample_blocks metric is always 0. The backlog keeps growing. Our 2h Prometheus blocks are about 1G each.
The thanos_compact_todo_compaction_blocks metric: [graph not included]
Vertical compaction is done, but the compaction level is stuck at level 2. There are no errors or warnings in the logs. The log level is info, but when I checked in debug mode I found nothing related to this issue either.
As for resource usage, I checked on the node with the top command, and the compactor does not use even half of its resources.
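For anyone who wants to track the two backlog metrics mentioned above without a dashboard, here is a minimal Python sketch that parses the JSON body of a Prometheus instant query (the /api/v1/query endpoint) into plain numbers. The function name `instant_vector_values` is hypothetical, and the sample body, including its label and its value of 120, is made up purely for illustration; it is not data from this cluster.

```python
import json


def instant_vector_values(response_body: str) -> dict[str, float]:
    """Map each series (keyed by its sorted labels) to its current value."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    values = {}
    for sample in data["result"]:
        labels = sample["metric"]
        key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        # Each sample is [unix_timestamp, "value-as-string"].
        _, raw_value = sample["value"]
        values[key] = float(raw_value)
    return values


# Made-up response body for illustration only.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "thanos_compact_todo_compaction_blocks",
                           "job": "example-compactor"},
                "value": [1700000000, "120"],
            }
        ],
    },
})

if __name__ == "__main__":
    for series, value in instant_vector_values(SAMPLE_BODY).items():
        print(series, value)
```

Sampling this periodically would show whether the compaction backlog is shrinking, flat, or growing between compactor iterations.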
from thanos.
@BouchaaraAdil Were you able to fix the issue?
from thanos.