Seems it's working now after the night! I'll take it! Thanks for the help!
from mimir.
Hello, it seems that the block validation step (done by Mimir after the block is fully uploaded, but before it is "accepted") has failed, and now the block can no longer be uploaded or validated. Such a block is treated as an incomplete upload and will eventually be deleted by Mimir. After that, you can try to upload it again.
Alternatively, you can try to delete the validation.json file from inside the uploaded block in Mimir's storage. After that, you can run mimirtool to upload the block again and run a new validation. (You can also disable the validation step by setting -compactor.block-upload-validation-enabled=false on the compactor, and then the upload will finish successfully without block validation. However, this allows you to upload any "garbage" as a TSDB block and may cause errors in Mimir later.)
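To locate the file to delete, here is a minimal sketch of how its object key could be built. It assumes the usual blocks-storage layout of <tenant>/<block ID>/<file>, which matches the tsdb-sync paths in the store-gateway logs further down. The helper name is hypothetical, and the tenant "anonymous" is taken from the user= field in the logs in this thread:

```python
import re

# Block IDs are ULIDs: 26 characters of Crockford base32 (no I, L, O, U).
ULID_RE = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")


def validation_marker_key(tenant: str, block_id: str) -> str:
    """Return the assumed object key of a block's validation.json file."""
    if not ULID_RE.match(block_id):
        raise ValueError(f"not a valid block ULID: {block_id!r}")
    return f"{tenant}/{block_id}/validation.json"


if __name__ == "__main__":
    # Block ID copied from the compactor error in this thread.
    print(validation_marker_key("anonymous", "01HGC2YXSZ86TQK4Z9G8A2DKZN"))
```

The resulting key can then be deleted with whatever tool you normally use for your bucket. Check that the object actually exists at that key before deleting anything.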
Ah good to know thanks! Though it appears I get this issue on every block I try to import 🤔
Is there any way to see how the validation failed? It's a block taken from a classic Prometheus instance, so I figured it should work without issues (though maybe the data is too far in the past?).
You should be able to see failure reasons in compactor logs. Based on mimirtool output, these blocks look quite big (40 GB?), so validation can take a while, and compactor must be running during the process without interruption.
Yeah I only get that:
ts=2024-05-21T12:36:57.248928356Z caller=block_upload.go:360 level=error component=compactor user=anonymous feature="block upload" block=01HGC2YXSZ86TQK4Z9G8A2DKZN operation="complete block upload" request_id=17d18125344c3e74 msg="error while validating block" err="failed to download block: context canceled"
Guess there is some timeout to tweak somewhere?
ts=2024-05-21T12:39:07.165217295Z caller=block_upload.go:105 level=info component=compactor user=anonymous feature="block upload" block=01HGXF4Z4HTZ22TSNTH9AZXKNN operation="start block upload" request_id=17d18157c78e4bb4 msg="started block upload"
ts=2024-05-21T12:46:25.950370338Z caller=block_upload.go:172 level=info component=compactor user=anonymous feature="block upload" block=01HGXF4Z4HTZ22TSNTH9AZXKNN operation="complete block upload" request_id=17d181bdef52b395 msg="validation process started"
ts=2024-05-21T12:47:44.330682597Z caller=block_upload.go:360 level=error component=compactor user=anonymous feature="block upload" block=01HGXF4Z4HTZ22TSNTH9AZXKNN operation="complete block upload" request_id=17d181bdef52b395 msg="error while validating block" err="failed to download block: context canceled"
(These logs, to be precise.) So it seems to fail after about 80s?
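The "about 80s" figure can be checked directly from the ts= fields. Below is a small sketch that measures the gap between the "validation process started" line and the "error while validating block" line. The function names are hypothetical, and the sample lines are abbreviated copies of the compactor logs above (only ts, block and msg are kept):

```python
import re
from datetime import datetime, timezone

TS_RE = re.compile(r"\bts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")
MSG_RE = re.compile(r'\bmsg="([^"]*)"')


def parse_ts(line: str) -> datetime:
    """Parse the ts= field, truncating nanoseconds to microseconds."""
    m = TS_RE.search(line)
    if m is None:
        raise ValueError("no ts= field found")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    micros = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def validation_duration(lines) -> float:
    """Seconds between validation start and the validation error."""
    started = failed = None
    for line in lines:
        m = MSG_RE.search(line)
        if m is None:
            continue
        if m.group(1) == "validation process started":
            started = parse_ts(line)
        elif m.group(1) == "error while validating block":
            failed = parse_ts(line)
    if started is None or failed is None:
        raise ValueError("start or error line not found")
    return (failed - started).total_seconds()


SAMPLE = [
    'ts=2024-05-21T12:46:25.950370338Z block=01HGXF4Z4HTZ22TSNTH9AZXKNN '
    'msg="validation process started"',
    'ts=2024-05-21T12:47:44.330682597Z block=01HGXF4Z4HTZ22TSNTH9AZXKNN '
    'msg="error while validating block"',
]

if __name__ == "__main__":
    print(f"{validation_duration(SAMPLE):.1f} s")  # prints "78.4 s"
```

A failure at nearly the same offset on every attempt would suggest a fixed deadline somewhere rather than a random interruption, though the logs alone do not say where that deadline would be set.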
I don't know of any timeout on the compactor side for validating or downloading blocks, but individual storage clients can have operation timeouts (e.g. Swift storage has a request timeout config option).
@pstibrany so you think the error comes from the storage provider? I'm using GCS, and there don't seem to be any configurable timeouts :/
My thinking was that the client side could have some timeout, but in fact I wouldn't expect that to be logged as context canceled. We use the GCS client internally and don't see timeouts when reading data from it.
I'm wondering... Is that the full compactor log, or are there possibly more messages that could be related to validation (the block upload feature)? Is there any other error logged by the compactor that would explain what's happening?
The 3 lines were next to each other in the compactor logs. So I guess there are no other messages related to this upload?
Could you try grepping on block ID, maybe something still shows up?
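As an alternative to a plain grep, the following sketch parses logfmt lines and keeps those that mention a given block. It is an illustration only: the function names are hypothetical, and it matches both block= (used in the compactor lines in this thread) and id= (used in the store-gateway lines). shlex is used as a simple stand-in for a real logfmt parser and raises ValueError on lines with unbalanced quotes.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Parse one logfmt line (key=value pairs, values optionally quoted)."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore bare words without '='
            fields[key] = value
    return fields


def lines_for_block(lines, block_id: str):
    """Yield parsed lines whose block= or id= field equals block_id."""
    for line in lines:
        fields = parse_logfmt(line)
        if block_id in (fields.get("block"), fields.get("id")):
            yield fields


if __name__ == "__main__":
    import sys

    # Usage: python filter_block.py <block ID> < compactor.log
    for fields in lines_for_block(sys.stdin, sys.argv[1]):
        print(fields.get("ts"), fields.get("level"), fields.get("msg"), fields.get("err", ""))
```

Matching on parsed fields rather than raw text avoids false hits where the ID only appears inside another value, such as a file path.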
I just tested with another block, live tailing the logs only gets me that:
ts=2024-05-22T09:04:52.177159745Z caller=block_upload.go:105 level=info component=compactor user=anonymous feature="block upload" block=01HHMMGVFCC3J5EY8VTDK1JAH3 operation="start block upload" request_id=17d1c43b4e8d4ad3 msg="started block upload"
ts=2024-05-22T09:06:56.208641698Z caller=block_upload.go:172 level=info component=compactor user=anonymous feature="block upload" block=01HHMMGVFCC3J5EY8VTDK1JAH3 operation="complete block upload" request_id=17d1c4582f908ec3 msg="validation process started"
ts=2024-05-22T09:08:14.223566493Z caller=block_upload.go:360 level=error component=compactor user=anonymous feature="block upload" block=01HHMMGVFCC3J5EY8VTDK1JAH3 operation="complete block upload" request_id=17d1c4582f908ec3 msg="error while validating block" err="failed to download block: context canceled"
To unblock you, you can disable validation (-compactor.block-upload-validation-enabled=false), then any blocks will be accepted without further checks. If you are uploading blocks from Prometheus, that should be fine.
Yes, it looks like it's uploaded correctly. It's not yet shown in Mimir though. I guess I need to wait some time? Can I refresh or reset some cache somewhere?
It can take about 30 minutes for a block to become queryable: the compactor needs to include the block in the bucket index, then queriers and store-gateways need to reload the bucket index, and store-gateways need to fetch parts of the block. If you query it before then, empty results may be cached in the results cache. You can flush the results cache by restarting memcached.
Okay, I might be seeing some empty cached results. However, I have deployed mimir without memcached enabled; is it still used somewhere?
Well I restarted my 3 store gateways, but still nothing. I'll wait a bit more, but it's been over 1h already 🤔
What time range does the uploaded block cover?
Can you see if store-gateways loaded your new block?
The uploaded block covers roughly January 2024 to a few weeks ago.
I can see in the store gateway:
ts=2024-05-22T14:07:01.823562954Z caller=bucket.go:407 level=info user=anonymous msg="loaded new block" elapsed=403.665477ms id=01HMQ525MXFV8SR05WT2BMXBT2
ts=2024-05-22T14:22:39.628135866Z caller=spanlogger.go:109 user=anonymous method=indexheader.NewStreamBinaryReader level=info msg="loading sparse index-header from full index-header" id=01HMQ525MXFV8SR05WT2BMXBT2
ts=2024-05-22T14:22:39.669792351Z caller=spanlogger.go:109 user=anonymous method=indexheader.NewStreamBinaryReader level=info msg="loaded sparse index-header from full index-header" id=01HMQ525MXFV8SR05WT2BMXBT2 elapsed=41.658729ms
ts=2024-05-22T14:22:39.669820198Z caller=spanlogger.go:109 user=anonymous method=indexheader.NewStreamBinaryReader level=info msg="writing sparse index-header to disk" id=01HMQ525MXFV8SR05WT2BMXBT2 path=/data/tsdb-sync/anonymous/01HMQ525MXFV8SR05WT2BMXBT2/sparse-index-header
ts=2024-05-22T14:22:39.68422412Z caller=spanlogger.go:109 user=anonymous method=indexheader.NewStreamBinaryReader level=info msg="wrote sparse index-header to disk" id=01HMQ525MXFV8SR05WT2BMXBT2 path=/data/tsdb-sync/anonymous/01HMQ525MXFV8SR05WT2BMXBT2/sparse-index-header elapsed=14.394569ms
That is what I get when I grep for a block I uploaded. Another one, for instance:
ts=2024-05-22T14:06:13.466412037Z caller=spanlogger.go:109 user=anonymous method=indexheader.NewStreamBinaryReader level=info msg="loading from sparse index-header from disk" id=01HK2ZTGZXM7543W0ZJE5DXDQ2 path=/data/tsdb-sync/anonymous/01HK2ZTGZXM7543W0ZJE5DXDQ2/sparse-index-header
ts=2024-05-22T14:06:13.468317073Z caller=spanlogger.go:109 user=anonymous method=indexheader.NewStreamBinaryReader level=info msg="loaded sparse index-header from disk" id=01HK2ZTGZXM7543W0ZJE5DXDQ2 path=/data/tsdb-sync/anonymous/01HK2ZTGZXM7543W0ZJE5DXDQ2/sparse-index-header elapsed=1.907567ms
ts=2024-05-22T14:06:13.49157358Z caller=bucket.go:407 level=info user=anonymous msg="loaded new block" elapsed=28.30006ms id=01HK2ZTGZXM7543W0ZJE5DXDQ2
👍 That shows that blocks are now in bucket-index, and store-gateways know about them. Queriers should too, since they load the same bucket-index.
Since you don't have any cache ("I have deployed mimir without memcached enabled"), I don't see any other reason why your queries would still not work. You can try something like count({__name__!=""}) to see how many series there are between January and now in your Mimir.
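For running that query outside of Grafana, here is a sketch that builds an instant-query URL for Mimir's Prometheus-compatible HTTP API. Several things here are assumptions: the base URL mimir.example:8080 is hypothetical, the /prometheus prefix is Mimir's default and may differ in your deployment, and the evaluation time of 1 March 2024 is just an arbitrary point inside the uploaded block's range. No tenant header is set, since the logs in this thread show user=anonymous.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str, at: datetime) -> str:
    """Build a Prometheus-style instant query URL evaluated at time `at`."""
    query = urlencode({"query": promql, "time": f"{at.timestamp():.3f}"})
    # "/prometheus" is assumed to be the configured HTTP prefix.
    return f"{base_url.rstrip('/')}/prometheus/api/v1/query?{query}"


if __name__ == "__main__":
    url = instant_query_url(
        "http://mimir.example:8080",  # hypothetical address
        'count({__name__!=""})',
        datetime(2024, 3, 1, tzinfo=timezone.utc),
    )
    print(url)
    # To actually send it (requires network access to your Mimir):
    #   import json, urllib.request
    #   with urllib.request.urlopen(url, timeout=60) as resp:
    #       print(json.load(resp)["data"]["result"])
```

An instant query at a single timestamp touches far less data than a range query over months, so it is a cheaper first check that the uploaded block is readable at all.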
Well, I'm getting a timeout with this query 😅 even over a 5-minute interval, if that interval is a few weeks back. In the querier logs:
ts=2024-05-22T15:22:08.666957598Z caller=scheduler_processor.go:240 level=error user=anonymous msg="error notifying scheduler about finished query after the scheduler stream previously failed and returned error" err=EOF addr=10.52.1.244:9095 previousErr="rpc error: code = Canceled desc = context canceled: frontend cancelled query"
Do you have any ideas?
Hum it seems I have a lot of consistency checks on these blocks
Consistency checks mean that the querier tried to query some block, but no store-gateway that was supposed to have it replied with data from that block. Perhaps the store-gateways were still loading new blocks?
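The idea behind such a check can be illustrated with a set difference. This is a deliberately simplified sketch, not Mimir's actual implementation: the function name, the gateway names and the data shapes are all hypothetical.

```python
def missing_blocks(expected: set, queried_by_gateway: dict) -> set:
    """Return blocks the querier expected but no store-gateway reported."""
    queried = set()
    for blocks in queried_by_gateway.values():
        queried.update(blocks)
    return expected - queried


if __name__ == "__main__":
    # Block IDs taken from the store-gateway logs in this thread.
    expected = {"01HMQ525MXFV8SR05WT2BMXBT2", "01HK2ZTGZXM7543W0ZJE5DXDQ2"}
    replies = {
        "store-gateway-0": {"01HK2ZTGZXM7543W0ZJE5DXDQ2"},
        "store-gateway-1": set(),  # e.g. still loading new blocks
    }
    print(missing_blocks(expected, replies))  # {'01HMQ525MXFV8SR05WT2BMXBT2'}
```

A non-empty result corresponds to the situation described above: a block is listed in the bucket index, but no store-gateway has served it yet.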
Unfortunately I'm not familiar with the "frontend cancelled query" error from your querier logs.