Comments (10)
Do you have out-of-order blocks enabled in your setup? I don't think the support for out-of-order blocks is stable yet.
from thanos.
Do you have out-of-order blocks enabled in your setup? I don't think the support for out-of-order blocks is stable yet.
I did have it enabled earlier this week.
When this screenshot was taken it was already disabled, but it might have come from an env/context where the changes had not propagated yet (or at least I'm hoping so :)).
We do have alerts on high head series.
I will monitor those and update the issue if it reproduces. Hopefully it was just that :)
from thanos.
@hanem100k you will have to delete the out-of-order blocks manually if they made it to object storage. Otherwise, every time the compactor sees them it might have issues. I'm not sure whether the compactor can gracefully skip out-of-order blocks.
from thanos.
I only had 6 hours of retention on receivers, so old blocks were just deleted. Not sure if it will/would cause downsampling issues once in object storage.
Either way, the good news is that I haven't seen this reoccur in the past week across many environments.
Closing the issue, thanks for your time and looks!
from thanos.
Hi @douglascamata, I'm experiencing the same issue:
Thanos version: 0.35.1
Object Storage Provider: Azure Blob. For receivers, Azure Disk.
{"caller":"db.go:1014","component":"multi-tsdb","err":"add series: out-of-order series added with label set \"{__name__=\\\"aggregator_discovery_aggregation_count_total\\\", cluster=\\\"opsstack\\\", endpoint=\\\"https-metrics\\\", instance=\\\"10.0.240.10:10250\\\", job=\\\"kubelet\\\", metrics_path=\\\"/metrics\\\", namespace=\\\"opsstack\\\", node=\\\"aks-opsstack-40597296-vmss000001\\\", prometheus=\\\"opsstack/opsstack-prom-stack-prometheus\\\", prometheus_replica=\\\"prometheus-opsstack-prom-stack-prometheus-0\\\", service=\\\"opsstack-prom-stack-kubelet\\\"}\"","level":"error","msg":"compaction failed","tenant":"opsstack","ts":"2024-06-19T22:04:59.161392545Z"}
Each error has exactly the same number of labels.
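To check whether the errors really share one label set, the JSON log lines can be parsed and grouped. The following is a minimal sketch, not part of Thanos: the function names are made up for illustration, and `SAMPLE` is a shortened stand-in for the log line above with most labels dropped. It assumes each log line is a single JSON object with `err` and `tenant` fields, as in the line shown.

```python
import json
import re
from collections import Counter

# Matches name=\"value\" pairs as they appear inside the decoded "err" string.
LABEL_RE = re.compile(r'(\w+)=\\"(.*?)\\"')

# Shortened, illustrative version of the log line above (only two labels kept).
SAMPLE = r'{"caller":"db.go:1014","component":"multi-tsdb","err":"add series: out-of-order series added with label set \"{__name__=\\\"aggregator_discovery_aggregation_count_total\\\", job=\\\"kubelet\\\"}\"","level":"error","msg":"compaction failed","tenant":"opsstack","ts":"2024-06-19T22:04:59.161392545Z"}'


def parse_out_of_order_error(line):
    """Return (tenant, labels) for an out-of-order series error, else None."""
    try:
        record = json.loads(line)
    except ValueError:
        return None  # not a JSON log line
    err = record.get("err", "")
    if "out-of-order series" not in err:
        return None
    return record.get("tenant"), dict(LABEL_RE.findall(err))


def count_by_metric(lines):
    """Count out-of-order errors per (tenant, metric name)."""
    counts = Counter()
    for line in lines:
        parsed = parse_out_of_order_error(line)
        if parsed is not None:
            tenant, labels = parsed
            counts[(tenant, labels.get("__name__"))] += 1
    return counts


print(count_by_metric([SAMPLE]))
```

Feeding the receiver's log lines into `count_by_metric` would show whether a single series or many different ones trigger the compaction failure.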
In our case, receive is not getting OOM-killed because of high memory limits, but I see what looks like a memory leak since the error started occurring:
The error has appeared since a restart of the pod; the version (v0.35.1) has not changed.
In our case, we have out-of-order enabled.
I guess deleting something may help (but what?), but could it appear again at any time?
from thanos.
@jkroepke out-of-order support is not stable in Thanos; it's still experimental. There might be known and unknown rough edges and bugs. We do not recommend turning it on in production.
from thanos.
There might be known and unknown rough edges and bugs.
That's fine, but a bug report is still fine? Or do you intend to close all bugs because the feature is experimental?
from thanos.
That's fine, but a bug report is still fine? Or do you intend to close all bugs because the feature is experimental?
What are you implying with these questions? Did I say a bug report is not fine? Did I say this should be closed? Did I close it myself?
What I said is: if this feature causes you trouble, disable it. It's experimental, not stable, and potentially buggy. I didn't say anything else.
from thanos.
Did I say a bug report is not fine
I had at least that feeling. Like: "Thanks for the report, please disable that feature." It feels like a denial.
from thanos.
A denial would be me closing the issue, which I didn't do. The author closed it themselves. Me saying "thanks for the report, please disable that feature to avoid issues while it's experimental" is 100% fine. I'm a triager and contributor. Unfortunately, I don't know enough about out-of-order to contribute a fix.
So I'm doing some triage and "thanks for the report, please disable that feature to avoid issues while it's experimental" is all I can do as a triager.
from thanos.