Giter Club home page Giter Club logo

Comments (4)

dimitarvdimitrov avatar dimitarvdimitrov commented on May 29, 2024 1

oh, now I think I understand. So what happens is
2. go DefaultMultiTenantManager.Start()

  1. go Manager.Run()
  2. run test
  3. DefaultMultiTenantManager.Stop()
  4. TestRuler_TenantFederationFlag finishes execution
  5. Manager.Run() is actually scheduled
  6. panic because Manager.Run() attempts to log something via testing.T

Do I have this right?

  • Or, update github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Stop() to wait until its r.userManagers actually run.

This sounds a bit more intuitive to me because it disallows the stopped -> running transition. But I'm not sure if there are some cases where one of the r.userManagers can be stopped before DefaultMultiTenantManager.Stop() is invoked and that one r.userManagers is in the process of stopping. Can we not start the Manager only if it hasn't been stopped yet?

from mimir.

narqo avatar narqo commented on May 29, 2024

If I follow it right, this happens when a goroutine github.com/prometheus/prometheus/rules.(*Manager).Run() tries to log into the testing logger AFTER all parents of the logger's *testing.T have finished.

Update: I managed to reproduce this one (sort of), with an artificially slow logger, and running the tests many times, in a cycle:

% go test -cpu 2 -v ./pkg/ruler/ -count 1000 -run 'TestRuler_TenantFederationFlag' -race
···
panic: Log in goroutine after TestRuler_TenantFederationFlag/tenant_federation_enabled_with_federated_and_regular_groups/SyncPartialRuleGroups() has completed: instance localhost level info component ruler insight true user tenant-1 msg Starting rule manager...


goroutine 491444 [running]:
testing.(*common).logDepth(0xc000addba0, {0xc0007e60e0, 0x66}, 0x3)
	/Users/v/.local/opt/[email protected]/src/testing/testing.go:1028 +0x4f8
testing.(*common).log(...)
	/Users/v/.local/opt/[email protected]/src/testing/testing.go:1010
testing.(*common).Log(0xc000addba0, {0xc0007658c0, 0xc, 0xc})
	/Users/v/.local/opt/[email protected]/src/testing/testing.go:1051 +0x74
github.com/prometheus/prometheus/util/testutil.logger.Log({0x107f6bb00?}, {0xc0007658c0, 0xc, 0xc})
	/Users/v/Documents/Code/grafana/mimir/vendor/github.com/prometheus/prometheus/util/testutil/logging.go:33 +0x48
github.com/go-kit/log.(*context).Log(0xc000d66af0, {0xc0000c4100, 0xa, 0xc000f78800?})
	/Users/v/Documents/Code/grafana/mimir/vendor/github.com/go-kit/log/log.go:168 +0x444
github.com/go-kit/log/level.(*logger).Log(0xc000f44840, {0xc0000c4100, 0xa, 0x10})
	/Users/v/Documents/Code/grafana/mimir/vendor/github.com/go-kit/log/level/level.go:71 +0x1b8
github.com/go-kit/log.(*context).Log(0xc00110e6e0, {0xc000f78800, 0x2, 0x2?})
	/Users/v/Documents/Code/grafana/mimir/vendor/github.com/go-kit/log/log.go:168 +0x444
github.com/prometheus/prometheus/rules.(*Manager).Run(0xc000d674f0)
	/Users/v/Documents/Code/grafana/mimir/vendor/github.com/prometheus/prometheus/rules/manager.go:177 +0x1d8
created by github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Start in goroutine 491376
	/Users/v/Documents/Code/grafana/mimir/pkg/ruler/manager.go:159 +0x110
FAIL	github.com/grafana/mimir/pkg/ruler	14.777s

I can think of two ways to fix that:

  1. Either update github.com/prometheus/prometheus/rules.(*Manager).Stop() to wait until its Run() exits.
  2. Or, update github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Stop() to wait until its r.userManagers actually run.

That is, there is a race between how mimir calls rules.(*Manager).Run() and rules.(*Manager).Stop(). In practice, though, I doubt this to happen outside the tests.

from mimir.

dimitarvdimitrov avatar dimitarvdimitrov commented on May 29, 2024

Isn't there

2. Or, update github.com/grafana/mimir/pkg/ruler.(*DefaultMultiTenantManager).Stop() to wait until its r.userManagers actually run.

Assuming you meant "wait until its r.userManagers actually stop:" Isn't this already happening via the WaitGroup?

mimir/pkg/ruler/manager.go

Lines 378 to 390 in bd60f69

wg := sync.WaitGroup{}
r.userManagerMtx.Lock()
for userID, manager := range r.userManagers {
level.Debug(r.logger).Log("msg", "shutting down user manager", "user", userID)
wg.Add(1)
go func(manager RulesManager, user string) {
manager.Stop()
wg.Done()
level.Debug(r.logger).Log("msg", "user manager shut down", "user", user)
}(manager, userID)
delete(r.userManagers, userID)
}
wg.Wait()

from mimir.

narqo avatar narqo commented on May 29, 2024

Assuming you meant "wait until its r.userManagers actually stop

Not really :) We spawn a goroutine when we call RulesManager.Run() (see code chunk 1). This creates a race between r.userManagers's start and stop.

That is, if the code is equivalent to a simplified version below, nothing inside the Stop guaranties that Run has already happened:

func Start() {
    go userManagers.Run()
}

func Stop() {
    userManagers.Stop() // 👈 can happen before Run() if goroutine inside Start was delaied by the runtime
}

Start()
Stop()

Footnotes

  1. https://github.com/grafana/mimir/blob/bd60f69e4c172d1d8c10b5adf881b76f0a2bf0ff/pkg/ruler/manager.go#L158-L160

from mimir.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.