Comments (6)
After sending another SIGTERM to the same pod via kubectl delete pod, the store-gateway logged the following and stopped successfully:
Logs
ts=2024-02-20T18:34:47.510242666Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=1m39.684969953s id=01H94SF3H0G3BWPV58CHC90G2K
ts=2024-02-20T18:35:28.612765576Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=5m53.154192184s id=01HMJM2V7AGHFJ55VR2GK0DP2D
ts=2024-02-20T18:35:35.53787377Z caller=signals.go:62 level=info msg="=== received SIGINT/SIGTERM ===\n*** exiting"
ts=2024-02-20T18:35:35.538314105Z caller=basic_lifecycler.go:238 level=info msg="ring lifecycler is shutting down" ring=store-gateway
ts=2024-02-20T18:35:35.538904367Z caller=basic_lifecycler.go:272 level=info msg="keeping instance the ring" ring=store-gateway
ts=2024-02-20T18:35:58.04726427Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=2m8.428667103s id=01HE71K9KJ53DD19A8M7CJR0ZJ
ts=2024-02-20T18:35:59.15852666Z caller=bucket.go:405 level=error user=TENANT_ID msg="loading block failed" elapsed=30.545670847s id=01H62WS0AV9QBVH1Z112XX1B68 err="create index header reader: write index header: copy posting offsets: context canceled"
ts=2024-02-20T18:36:01.64761372Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=1m36.716602529s id=01HJ00W1PB4V2D16N9X9QCBCKA
ts=2024-02-20T18:36:06.87322684Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=1m19.362919762s id=01H9PRD4AJQTPH69D1TPH8J35Y
ts=2024-02-20T18:36:07.409658823Z caller=bucket_stores.go:186 level=warn msg="failed to synchronize TSDB blocks" err="context canceled"
ts=2024-02-20T18:36:07.409768038Z caller=mimir.go:870 level=error msg="module failed" module=store-gateway err="starting module store-gateway: context canceled"
ts=2024-02-20T18:36:07.40978213Z caller=memberlist_client.go:720 level=info msg="leaving memberlist cluster"
ts=2024-02-20T18:36:07.409811045Z caller=module_service.go:120 level=info msg="module stopped" module=runtime-config
ts=2024-02-20T18:36:17.436149359Z caller=memberlist_client.go:735 level=warn msg="broadcast messages left in queue" count=4 nodes=438
ts=2024-02-20T18:36:18.078030881Z caller=module_service.go:120 level=info msg="module stopped" module=memberlist-kv
ts=2024-02-20T18:36:18.078137597Z caller=module_service.go:120 level=info msg="module stopped" module=distributor-bi-push-wrapper
ts=2024-02-20T18:36:24.4117936Z caller=server_service.go:55 level=info msg="server stopped"
ts=2024-02-20T18:36:24.411843823Z caller=module_service.go:120 level=info msg="module stopped" module=server
ts=2024-02-20T18:36:24.411900492Z caller=module_service.go:120 level=info msg="module stopped" module=usage-stats
ts=2024-02-20T18:36:24.411910802Z caller=module_service.go:120 level=info msg="module stopped" module=sanity-check
ts=2024-02-20T18:36:24.411939777Z caller=module_service.go:120 level=info msg="module stopped" module=license-manager
ts=2024-02-20T18:36:24.413388804Z caller=module_service.go:120 level=info msg="module stopped" module=activity-tracker
ts=2024-02-20T18:36:24.413423322Z caller=mimir.go:859 level=info msg="Application stopped"
ts=2024-02-20T18:36:24.413536373Z caller=server_util.go:26 level=error original_caller=main.go:266 msg="error running Grafana Enterprise Metrics" err="failed services\ngithub.com/grafana/mimir/pkg/mimir.(*Mimir).Run\n\t/drone/src/vendor/github.com/grafana/mimir/pkg/mimir/mimir.go:913\ngithub.com/grafana/backend-enterprise/pkg/enterprise/mimir/init.(*MimirEnterprise).Run\n\t/drone/src/pkg/enterprise/mimir/init/mimir.go:196\nmain.main\n\t/drone/src/cmd/enterprise-metrics/main.go:263\nruntime.main\n\t$GOROOT/src/runtime/proc.go:267\nruntime.goexit\n\t$GOROOT/src/runtime/asm_amd64.s:1650"
The same store-gateway also deleted (unloaded) its blocks on shutdown.
Weird. Reading the posted logs, it seems like it shut down after the first SIGTERM but then started again after about 32s, and then received another SIGTERM about a minute and 1m20s later. I guess a race condition is possible here, but I can't quite see how.
I should have made this clearer. The logs posted in the original issue contain two SIGTERMs: the first, at 17:21:22.793, is from a clean shutdown; the second, at 17:23:28.702, is from the preemptive shutdown due to rescheduling.
The shutdown at 18:35:35.53787377Z in my comment above is a third one, which I triggered with kubectl delete pod.
On a second look, it seems like the store-gateway actually started again at 17:24:11.719.
I wonder if the non-ready state was because it was loading blocks rather than because it was stuck in a shutdown loop. That could have been caused by a wiped persistent volume, in which case the store-gateway behaved correctly.
As @narqo noted privately, the second shutdown in the original logs actually created a new pod with a new container ID. So it seems like the store-gateway shut down properly and started re-downloading blocks. This may be explained by a wiped PVC, but I can't easily see how this would be a bug in Mimir, so I'll close this issue.
Related Issues (20)
- [ingester] Ingester service state and lifecycler ring state not synchronized
- Compactor fails to upload indexes larger than 1G to swift object storage
- Scrape commit failed" err="write to WAL: log samples: write data/wal/XXXXXXXX: no space left on device
- Helm: Missing fields in Topology Spread Constraints
- Ruler Pods OOM/spike in memory observed with warning log closing ingester client stream failed
- store-gateway: add timeout to index-header loading
- Mimir returns HTTP status 422 in cases where 5xx makes more sense
- Docs: Update references to mmap in store-gateway architecture
- Query with aggregation return incorrect num of points
- [mimir-distributed] Add additionalRuleLabels to PrometheusRule alerts
- Request per Second Metric Does Not Sync with Total Request Count in Mimir Visualization
- mimirtool backfill: failed uploading block
- Multi-Tenancy Support for Mimir Ruler
- store-gateway: store sparse index headers in object store
- helm: Stateful components emptyDir inMemory option
- Make the 'for' period configurable for MimirInconsistentRuntimeConfig alert
- Receiving failed to enqueue request 500s
- otlp: Mimir's OTLP endpoint to return marshalled proto bytes as response body
- store-gateway: be able to restrict time range of blocks synced from the bucket
- Mimir rejects samples when exemplar is non-compliant