Comments (9)
@seizethedave what I'm wondering upon reading the code is whether what we need to do is to modify Ingester.LabelValues
such that it only closes the querier (returned from db.Querier
) after LabelValuesResponse
has been marshaled. I believe tsdb.Block.Close
waits until all pending readers are done, and closing the block index reader decrements the pending reader count. Not sure how we can eventually customize the LabelValuesResponse
marshaling, to take care of the closing. WDYT?
from mimir.
per the backtrace:
- the byte slice
{0xc137890000, 0x2d6a64, 0x2d6a64}
is heap allocated and is ~2.83 MiB. - the sigsegv signal indicates
- address= 0x7976968b824e: yes, this is definitely a wild memory address in the ~120 TiB neighborhood.
- code=0x1: the address isn't in the mapped space for this process.
- is that giant address in the source or destination of the memmove call? seems more likely that it's the slice of source strings. Is there something fancy going on in there?
- Ah. The values are backed by the index file which is mmapped. The strings' byte slices are pointing to memory mapped regions. Note the comment in there:
// It is not safe to use the return value beyond the lifetime of the byte slice // passed into the Reader.
- There's nothing preventing the file from being unmapped while the gRPC/proto machinery marshals the string slice to the caller. This is probably the bug. There are other RPCs in Ingester that would suffer from this same thing.
LabelValuesCardinality
, etc.
from mimir.
Which Mimir version is this?
from mimir.
@dimitarvdimitrov this might be of interest to you. I think we discussed whether the use of yoloString
would be safe in index.Reader.LabelValues
?
from mimir.
Yeah, that is the bug. In one of the crashes, there's a goroutine running ingester's Compact() loop, which calls deleteBlocks
, which closes compacted blocks, which had just called munmap
before this process crashed.
2024-04-09 06:51:51.918 goroutine 3773562 [runnable]:
2024-04-09 06:51:51.918 syscall.Syscall6(0xc092f90ac0?, 0x11?, 0x470100?, 0x0?, 0x0?, 0x7975a8d5bbc8?, 0xa?)
2024-04-09 06:51:51.919 $GOROOT/src/syscall/syscall_linux.go:91 +0x30 fp=0xc0014dedd8 sp=0xc0014ded50 pc=0x4de230
2024-04-09 06:51:51.919 syscall.openat(0x797712069a38?, {0xc092f90ac0?, 0xc000098400?}, 0xc0953b4060?, 0x0)
2024-04-09 06:51:51.919 $GOROOT/src/syscall/zsyscall_linux_amd64.go:83 +0x8d fp=0xc0014dee50 sp=0xc0014dedd8 pc=0x4da9ad
2024-04-09 06:51:51.919 syscall.Open(...)
2024-04-09 06:51:51.919 $GOROOT/src/syscall/syscall_linux.go:272
2024-04-09 06:51:51.919 os.open({0xc092f90ac0?, 0x2f41bc0?}, 0xc0b8ce0001?, 0x14deef8?)
2024-04-09 06:51:51.919 $GOROOT/src/os/file_open_unix.go:15 +0x2b fp=0xc0014dee88 sp=0xc0014dee50 pc=0x507f0b
2024-04-09 06:51:51.919 os.openFileNolog({0xc092f90ac0, 0x11}, 0x0, 0x0)
2024-04-09 06:51:51.919 $GOROOT/src/os/file_unix.go:272 +0x8a fp=0xc0014deed0 sp=0xc0014dee88 pc=0x50938a
2024-04-09 06:51:51.919 os.OpenFile({0xc092f90ac0, 0x11}, 0x0, 0x5384ae0?)
2024-04-09 06:51:51.919 $GOROOT/src/os/file.go:334 +0x3e fp=0xc0014def08 sp=0xc0014deed0 pc=0x5074be
2024-04-09 06:51:51.919 os.Open(...)
2024-04-09 06:51:51.919 $GOROOT/src/os/file.go:314
2024-04-09 06:51:51.919 os.removeAll({0xc092f90ac0, 0x3d})
2024-04-09 06:51:51.919 $GOROOT/src/os/removeall_at.go:38 +0x19a fp=0xc0014defc0 sp=0xc0014def08 pc=0x50afda
2024-04-09 06:51:51.919 os.RemoveAll(...)
2024-04-09 06:51:51.919 $GOROOT/src/os/path.go:67
2024-04-09 06:51:51.919 github.com/prometheus/prometheus/tsdb.(*DB).deleteBlocks(0xc000933900, 0xc0014df448?)
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/prometheus/prometheus/tsdb/db.go:1752 +0x448 fp=0xc0014df188 sp=0xc0014defc0 pc=0x1acd828
2024-04-09 06:51:51.919 github.com/prometheus/prometheus/tsdb.(*DB).reloadBlocks(0xc000933900)
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/prometheus/prometheus/tsdb/db.go:1596 +0xaa5 fp=0xc0014dfb48 sp=0xc0014df188 pc=0x1acbea5
2024-04-09 06:51:51.919 github.com/prometheus/prometheus/tsdb.(*DB).compactOOOHead(0xc000933900, {0x3a6c568?, 0x540c3e0?})
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/prometheus/prometheus/tsdb/db.go:1321 +0xf3 fp=0xc0014dfc98 sp=0xc0014dfb48 pc=0x1ac9993
2024-04-09 06:51:51.919 github.com/prometheus/prometheus/tsdb.(*DB).Compact(0xc000933900, {0x3a6c568, 0x540c3e0})
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/prometheus/prometheus/tsdb/db.go:1277 +0x68d fp=0xc0014dfdf8 sp=0xc0014dfc98 pc=0x1ac904d
2024-04-09 06:51:51.919 github.com/grafana/mimir/pkg/ingester.(*userTSDB).Compact(...)
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/grafana/mimir/pkg/ingester/user_tsdb.go:181
2024-04-09 06:51:51.919 github.com/grafana/mimir/pkg/ingester.(*Ingester).compactBlocks.func1({0xc000e12a30?, 0xc001475001?}, {0xc00143117b, 0x6})
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/grafana/mimir/pkg/ingester/ingester.go:2919 +0x44a fp=0xc0014dfee8 sp=0xc0014dfdf8 pc=0x2211fea
2024-04-09 06:51:51.919 github.com/grafana/dskit/concurrency.ForEachUser.func1()
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/grafana/dskit/concurrency/runner.go:45 +0x132 fp=0xc0014dffe0 sp=0xc0014dfee8 pc=0x1856192
2024-04-09 06:51:51.919 runtime.goexit()
2024-04-09 06:51:51.919 $GOROOT/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0014dffe8 sp=0xc0014dffe0 pc=0x471f61
2024-04-09 06:51:51.919 created by github.com/grafana/dskit/concurrency.ForEachUser in goroutine 3814
2024-04-09 06:51:51.919 /drone/src/vendor/github.com/grafana/dskit/concurrency/runner.go:36 +0x126
general solutions:
- offer variants of the methods in tsdb's
index.Reader
that return copied strings. - add the notion of a handle that callers can obtain from the index. This would also probably require us to write a custom gRPC-go codec, because we'd need to release the handle after
Marshal
is called. And there'd be the complexity of making ingester+tsdb block have a "closing" state where munmap doesn't happen until all handles are closed. (And new RPCs don't see "closing" blocks.) - others?
from mimir.
Related Issues (20)
- Mismatch between ingester limiter and owned series when new ingesters are PENDING with no tokens HOT 1
- Flaky test: TestIngester_PreparePartitionDownscaleHandler HOT 3
- Failed pushing to ingester: context deadline exceeded error observed in 2.11.0 HOT 2
- Add an ability to specify dnsConfig for alertmanager via Helm chart
- mimir-querier tries to connect to unknown pod IP HOT 1
- Flaky test `Test_shardActiveSeriesMiddleware_mergeResponse_contextCancellation/useZeroAllocationDecoder`
- mimir compactor: index contains 205 postings with out of order labels" HOT 1
- Chart 5.3.0: Mimir push API doesn't allow X-Scope-OrgID with special characters HOT 4
- target_info metric HOT 5
- Flaky TestSchedulerProcessor_processQueriesOnSingleStream
- Docs: where to find/download HOT 1
- Querier: panics in QueryResultRequest marshaling HOT 6
- Rejecting valid exemplars with err-mimir-exemplar-labels-missing HOT 8
- mixin: networking overview renders useless panels for the gateway HOT 1
- helm: Keda prometheus queries are not mimir-specific HOT 3
- Helm: Common volume mount
- [helm] Add support for the remote mode of the ruler in the helm chart
- Helm: missing support for spread-minizing token generation HOT 4
- Improving the Helm chart to integrate external object storage with Mimir. HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mimir.