Comments (6)

dimitarvdimitrov commented on May 30, 2024

After I sent another SIGTERM to the same pod via kubectl delete pod, the store-gateway logged the following and stopped successfully:

Logs

ts=2024-02-20T18:34:47.510242666Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=1m39.684969953s id=01H94SF3H0G3BWPV58CHC90G2K
ts=2024-02-20T18:35:28.612765576Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=5m53.154192184s id=01HMJM2V7AGHFJ55VR2GK0DP2D
ts=2024-02-20T18:35:35.53787377Z caller=signals.go:62 level=info msg="=== received SIGINT/SIGTERM ===\n*** exiting"
ts=2024-02-20T18:35:35.538314105Z caller=basic_lifecycler.go:238 level=info msg="ring lifecycler is shutting down" ring=store-gateway
ts=2024-02-20T18:35:35.538904367Z caller=basic_lifecycler.go:272 level=info msg="keeping instance the ring" ring=store-gateway
ts=2024-02-20T18:35:58.04726427Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=2m8.428667103s id=01HE71K9KJ53DD19A8M7CJR0ZJ
ts=2024-02-20T18:35:59.15852666Z caller=bucket.go:405 level=error user=TENANT_ID msg="loading block failed" elapsed=30.545670847s id=01H62WS0AV9QBVH1Z112XX1B68 err="create index header reader: write index header: copy posting offsets: context canceled"
ts=2024-02-20T18:36:01.64761372Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=1m36.716602529s id=01HJ00W1PB4V2D16N9X9QCBCKA
ts=2024-02-20T18:36:06.87322684Z caller=bucket.go:407 level=info user=TENANT_ID msg="loaded new block" elapsed=1m19.362919762s id=01H9PRD4AJQTPH69D1TPH8J35Y
ts=2024-02-20T18:36:07.409658823Z caller=bucket_stores.go:186 level=warn msg="failed to synchronize TSDB blocks" err="context canceled"
ts=2024-02-20T18:36:07.409768038Z caller=mimir.go:870 level=error msg="module failed" module=store-gateway err="starting module store-gateway: context canceled"
ts=2024-02-20T18:36:07.40978213Z caller=memberlist_client.go:720 level=info msg="leaving memberlist cluster"
ts=2024-02-20T18:36:07.409811045Z caller=module_service.go:120 level=info msg="module stopped" module=runtime-config
ts=2024-02-20T18:36:17.436149359Z caller=memberlist_client.go:735 level=warn msg="broadcast messages left in queue" count=4 nodes=438
ts=2024-02-20T18:36:18.078030881Z caller=module_service.go:120 level=info msg="module stopped" module=memberlist-kv
ts=2024-02-20T18:36:18.078137597Z caller=module_service.go:120 level=info msg="module stopped" module=distributor-bi-push-wrapper
ts=2024-02-20T18:36:24.4117936Z caller=server_service.go:55 level=info msg="server stopped"
ts=2024-02-20T18:36:24.411843823Z caller=module_service.go:120 level=info msg="module stopped" module=server
ts=2024-02-20T18:36:24.411900492Z caller=module_service.go:120 level=info msg="module stopped" module=usage-stats
ts=2024-02-20T18:36:24.411910802Z caller=module_service.go:120 level=info msg="module stopped" module=sanity-check
ts=2024-02-20T18:36:24.411939777Z caller=module_service.go:120 level=info msg="module stopped" module=license-manager
ts=2024-02-20T18:36:24.413388804Z caller=module_service.go:120 level=info msg="module stopped" module=activity-tracker
ts=2024-02-20T18:36:24.413423322Z caller=mimir.go:859 level=info msg="Application stopped"
ts=2024-02-20T18:36:24.413536373Z caller=server_util.go:26 level=error original_caller=main.go:266 msg="error running Grafana Enterprise Metrics" err="failed services\ngithub.com/grafana/mimir/pkg/mimir.(*Mimir).Run\n\t/drone/src/vendor/github.com/grafana/mimir/pkg/mimir/mimir.go:913\ngithub.com/grafana/backend-enterprise/pkg/enterprise/mimir/init.(*MimirEnterprise).Run\n\t/drone/src/pkg/enterprise/mimir/init/mimir.go:196\nmain.main\n\t/drone/src/cmd/enterprise-metrics/main.go:263\nruntime.main\n\t$GOROOT/src/runtime/proc.go:267\nruntime.goexit\n\t$GOROOT/src/runtime/asm_amd64.s:1650"

dimitarvdimitrov commented on May 30, 2024

The same store-gateway also deleted/unloaded its blocks on shutdown.

[Image: Screenshot 2024-02-20 at 20 04 15]

56quarters commented on May 30, 2024

Weird. Reading the posted logs, it seems like it shut down after the first SIGTERM, started again after about 32s, and then received another SIGTERM about 1m20s later. I guess it's possible that there's a race condition here, but I can't quite see how.

dimitarvdimitrov commented on May 30, 2024

I should've made this clearer. The logs posted in the original issue contain two SIGTERMs. The first, at 17:21:22.793, is from a clean shutdown. The second, at 17:23:28.702, is from the preemptive shutdown due to rescheduling.

The shutdown at 18:35:35.53787377Z, from my comment above, is a third one that I triggered with kubectl delete pod.

dimitarvdimitrov commented on May 30, 2024

On a second look, it seems like the store-gateway actually started again at 17:24:11.719.

I wonder if the non-ready state was because it was loading blocks and not because it was stuck in a shutdown loop. This might have been caused by a wiped persistent volume, in which case the store-gateway behaved correctly.

dimitarvdimitrov commented on May 30, 2024

As @narqo noted privately, the second shutdown in the original logs actually created a new pod with a new container ID. So it seems like the store-gateway shut down properly and started redownloading blocks. This may be explained by a wiped PVC, but I don't see an easy way this could be a bug in Mimir, so I'll close this issue.
