Comments (4)
Oh, now I think I understand. So what happens is:
1. DefaultMultiTenantManager.Start()
2. go Manager.Run()
3. run test
4. DefaultMultiTenantManager.Stop()
5. TestRuler_TenantFederationFlag finishes execution
6. Manager.Run() is actually scheduled
7. panic because Manager.Run() attempts to log something via testing.T
Do I have this right?
- Or, update github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Stop() to wait until its r.userManagers actually run.
This sounds a bit more intuitive to me because it disallows the stopped -> running transition. But I'm not sure whether there are cases where one of the r.userManagers is stopped before DefaultMultiTenantManager.Stop() is invoked and is still in the process of stopping at that point. Can we not start the Manager only if it hasn't been stopped yet?
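A minimal sketch of that "only start if not stopped yet" idea, under stated assumptions: guardedManager, newGuardedManager and onStart are hypothetical names, not types or functions from Mimir or Prometheus, and onStart merely stands in for the part of Run() that writes the "Starting rule manager..." log line.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedManager is a hypothetical wrapper; it is not part of Mimir or Prometheus.
type guardedManager struct {
	mtx     sync.Mutex
	stopped bool
	done    chan struct{}
	onStart func() // stands in for the start-up log line inside Run()
}

func newGuardedManager(onStart func()) *guardedManager {
	return &guardedManager{done: make(chan struct{}), onStart: onStart}
}

// Run is a no-op if Stop was already called. Otherwise it performs the
// start-up step under the lock and then blocks until Stop is called.
// It reports whether the manager actually started.
func (g *guardedManager) Run() bool {
	g.mtx.Lock()
	if g.stopped {
		g.mtx.Unlock()
		return false
	}
	g.onStart()
	g.mtx.Unlock()
	<-g.done
	return true
}

// Stop marks the manager as stopped and releases a blocked Run.
// Because it takes the same lock, it cannot return in the middle of onStart.
func (g *guardedManager) Stop() {
	g.mtx.Lock()
	defer g.mtx.Unlock()
	if g.stopped {
		return
	}
	g.stopped = true
	close(g.done)
}

func main() {
	started := 0
	g := newGuardedManager(func() { started++ })
	g.Stop()
	// Run after Stop does not start anything.
	fmt.Println(g.Run(), started)
}
```

With this shape, the start-up step either completes before Stop() returns or never happens, so the stopped -> running transition is ruled out. Whether the real managers can be wrapped this way is not something the thread confirms.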
from mimir.
If I follow it right, this happens when a goroutine running github.com/prometheus/prometheus/rules.(*Manager).Run() tries to log into the testing logger AFTER all parents of the logger's *testing.T have finished.
Update: I managed to reproduce this one (sort of) with an artificially slow logger, running the tests many times in a loop:
% go test -cpu 2 -v ./pkg/ruler/ -count 1000 -run 'TestRuler_TenantFederationFlag' -race
···
panic: Log in goroutine after TestRuler_TenantFederationFlag/tenant_federation_enabled_with_federated_and_regular_groups/SyncPartialRuleGroups() has completed: instance localhost level info component ruler insight true user tenant-1 msg Starting rule manager...
goroutine 491444 [running]:
testing.(*common).logDepth(0xc000addba0, {0xc0007e60e0, 0x66}, 0x3)
/Users/v/.local/opt/[email protected]/src/testing/testing.go:1028 +0x4f8
testing.(*common).log(...)
/Users/v/.local/opt/[email protected]/src/testing/testing.go:1010
testing.(*common).Log(0xc000addba0, {0xc0007658c0, 0xc, 0xc})
/Users/v/.local/opt/[email protected]/src/testing/testing.go:1051 +0x74
github.com/prometheus/prometheus/util/testutil.logger.Log({0x107f6bb00?}, {0xc0007658c0, 0xc, 0xc})
/Users/v/Documents/Code/grafana/mimir/vendor/github.com/prometheus/prometheus/util/testutil/logging.go:33 +0x48
github.com/go-kit/log.(*context).Log(0xc000d66af0, {0xc0000c4100, 0xa, 0xc000f78800?})
/Users/v/Documents/Code/grafana/mimir/vendor/github.com/go-kit/log/log.go:168 +0x444
github.com/go-kit/log/level.(*logger).Log(0xc000f44840, {0xc0000c4100, 0xa, 0x10})
/Users/v/Documents/Code/grafana/mimir/vendor/github.com/go-kit/log/level/level.go:71 +0x1b8
github.com/go-kit/log.(*context).Log(0xc00110e6e0, {0xc000f78800, 0x2, 0x2?})
/Users/v/Documents/Code/grafana/mimir/vendor/github.com/go-kit/log/log.go:168 +0x444
github.com/prometheus/prometheus/rules.(*Manager).Run(0xc000d674f0)
/Users/v/Documents/Code/grafana/mimir/vendor/github.com/prometheus/prometheus/rules/manager.go:177 +0x1d8
created by github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Start in goroutine 491376
/Users/v/Documents/Code/grafana/mimir/pkg/ruler/manager.go:159 +0x110
FAIL github.com/grafana/mimir/pkg/ruler 14.777s
I can think of two ways to fix that:
- Either update github.com/prometheus/prometheus/rules.(*Manager).Stop() to wait until its Run() exits.
- Or, update github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Stop() to wait until its r.userManagers actually run.
That is, there is a race between how mimir calls rules.(*Manager).Run() and rules.(*Manager).Stop(). In practice, though, I doubt this happens outside the tests.
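One way the "Stop waits for Run" idea could look on the caller's side is sketched below. This is only an illustration under assumptions: runner, fakeManager and multiTenant are hypothetical stand-ins, and fakeManager assumes that Run() logs once and then blocks until Stop() is called, which the thread does not spell out.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is the minimal behaviour assumed of a per-tenant rule manager.
type runner interface {
	Run()
	Stop()
}

// fakeManager is a hypothetical manager: Run logs, then blocks until Stop.
type fakeManager struct {
	done chan struct{}
	log  func(string)
}

func (m *fakeManager) Run() {
	m.log("Starting rule manager...")
	<-m.done
}

func (m *fakeManager) Stop() { close(m.done) }

// multiTenant is a hypothetical stand-in for the multi-tenant manager.
type multiTenant struct {
	wg       sync.WaitGroup
	managers []runner
}

// Start spawns Run in a goroutine and tracks it in the WaitGroup.
func (mt *multiTenant) Start(m runner) {
	mt.managers = append(mt.managers, m)
	mt.wg.Add(1)
	go func() {
		defer mt.wg.Done()
		m.Run()
	}()
}

// Stop stops every manager and returns only after each Run goroutine has
// exited, so no Run can log after Stop has returned.
func (mt *multiTenant) Stop() {
	for _, m := range mt.managers {
		m.Stop()
	}
	mt.wg.Wait()
}

func main() {
	var lines []string
	mt := &multiTenant{}
	mt.Start(&fakeManager{
		done: make(chan struct{}),
		log:  func(s string) { lines = append(lines, s) },
	})
	mt.Stop()
	// The log call has completed by the time Stop returns,
	// regardless of when the goroutine was scheduled.
	fmt.Println(len(lines))
}
```

Here Run() may still begin after Stop() was requested, but it can no longer outlive the Stop() call, which is the property the test logger needs.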
from mimir.
Isn't there
2. Or, update github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Stop() to wait until its r.userManagers actually run.
Assuming you meant "wait until its r.userManagers actually stop": isn't this already happening via the WaitGroup?
Lines 378 to 390 in bd60f69
from mimir.
Assuming you meant "wait until its r.userManagers actually stop"
Not really :) We spawn a goroutine when we call RulesManager.Run() (see code chunk 1). This creates a race between the start and stop of r.userManagers.
That is, if the code is equivalent to the simplified version below, nothing inside Stop guarantees that Run has already happened:
func Start() {
go userManagers.Run()
}
func Stop() {
userManagers.Stop() // 👈 can happen before Run() if the goroutine inside Start was delayed by the runtime
}
Start()
Stop()
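To make that interleaving reproducible rather than scheduler-dependent, here is a small runnable sketch. The gate channel, eventLog and startThenStop are hypothetical devices for illustration only: the gate holds the goroutine back on purpose, standing in for a runtime that schedules it late.

```go
package main

import (
	"fmt"
	"sync"
)

// eventLog records the order of calls; it is safe for concurrent use.
type eventLog struct {
	mu     sync.Mutex
	events []string
}

func (l *eventLog) add(e string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, e)
}

// startThenStop mirrors the simplified Start()/Stop() pair and returns
// the order in which "Run" and "Stop" were observed.
func startThenStop() []string {
	l := &eventLog{}
	gate := make(chan struct{}) // hypothetical: simulates a delayed goroutine
	ran := make(chan struct{})

	// Start(): spawn Run() without waiting for it to begin.
	go func() {
		defer close(ran)
		<-gate
		l.add("Run")
	}()

	// Stop(): nothing here checks whether Run() has happened yet.
	l.add("Stop")

	close(gate) // the delayed goroutine finally gets scheduled
	<-ran
	return l.events
}

func main() {
	fmt.Println(startThenStop())
}
```

The returned order is Stop first, then Run, which is the stopped -> running transition discussed above.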
from mimir.