
sidekiq-worker-killer's Introduction

sidekiq-worker-killer


Sidekiq is probably the best background processing framework today. At the same time, memory leaks are very hard to tackle in Ruby, and we often find ourselves with growing memory consumption. Instead of spending herculean effort fixing leaks, why not kill your processes when they get too large?

Highly inspired by Gitlab Sidekiq MemoryKiller and Noxa Sidekiq killer.


NOTE

This gem needs to read the memory used by the Sidekiq process. For this we use GetProcessMem, but be aware that if you are running Sidekiq on Heroku (or in any container), the reported memory usage will not be accurate.


quick-refs: install | usage | available options | development

Install

Use Bundler

gem "sidekiq-worker-killer"

Usage

Add this to your Sidekiq configuration.

require 'sidekiq/worker_killer'

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add Sidekiq::WorkerKiller, max_rss: 480
  end
end

Available options

The following options can be overridden.

Option | Default | Description
--- | --- | ---
max_rss | 0 MB (disabled) | Maximum RSS, in megabytes, used by the Sidekiq process. Above this, shutdown is triggered.
grace_time | 900 seconds | When shutdown is triggered, the Sidekiq process stops accepting new jobs and waits at most 15 minutes for running jobs to finish. If Float::INFINITY is specified, it waits forever.
shutdown_wait | 30 seconds | When the grace time expires, jobs that are still running get 30 seconds to stop. After that, the kill signal is sent.
kill_signal | SIGKILL | Signal used to kill the Sidekiq process if it does not stop.
gc | true | Try to run garbage collection before the Sidekiq process stops when max_rss is exceeded.
skip_shutdown_if | proc {false} | Block executed after max_rss is exceeded but before shutdown is requested.
on_shutdown | nil | Block executed just before shutdown is requested. This can be used to send custom logs or metrics to external services.

skip_shutdown_if is expected to return anything other than false or nil to skip shutdown.

require 'sidekiq/worker_killer'

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add Sidekiq::WorkerKiller, max_rss: 480, skip_shutdown_if: ->(worker, job, queue) do
      worker.to_s == 'LongWorker'
    end
  end
end
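
The on_shutdown option from the table can be wired in the same way. The sketch below is only an illustration: it assumes the hook is called with the same (worker, job, queue) arguments as skip_shutdown_if, and the SHUTDOWN_REPORTER constant and the message format are hypothetical names that do not come from the gem.

```ruby
require "json"

# Hypothetical hook: builds a one-line JSON message describing the job that
# was running when the memory limit was exceeded.
# Assumption: on_shutdown is called with (worker, job, queue), like
# skip_shutdown_if.
SHUTDOWN_REPORTER = lambda do |worker, job, queue|
  message = JSON.generate(
    event: "sidekiq_worker_killer_shutdown",
    worker: worker.class.name,
    jid: job["jid"],
    queue: queue
  )
  warn message # replace with a call to your own logging or metrics service
  message
end

# Register the middleware only when Sidekiq and the gem are loaded.
if defined?(Sidekiq::WorkerKiller)
  Sidekiq.configure_server do |config|
    config.server_middleware do |chain|
      chain.add Sidekiq::WorkerKiller, max_rss: 480, on_shutdown: SHUTDOWN_REPORTER
    end
  end
end
```

Keeping the hook in a named lambda makes it possible to exercise it in a unit test without starting Sidekiq.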

Development

Pull Requests are very welcome!

There are tasks in the Makefile that may help you along the way:

make console # Loads the whole stack in an IRB session.
make test # Run tests.
make lint # Run rubocop linter.

Please make sure that you have tested your code carefully before opening a PR, and make sure as well that you have no style issues.

Authors

See the list of contributors who participated in this project.

License

Please see LICENSE

sidekiq-worker-killer's People

Contributors

ben-j69, bf4, buonomo, ccyrille, igel, leemour, maximeflips, msxavi, nirei, quiwin

sidekiq-worker-killer's Issues

Grace time is not honoured

Hey guys, how have you been?

I'm back to seek some advice on understanding this log:

MARK1->>> 2019-11-12T19:47:09.019Z 39340 TID-orxe0j958 BatchBdfWorker JID-e558c0315ac8e0b8c6fad3e8 INFO: start
...
2019-11-12T19:48:27.758Z 39340 TID-orxe11jb0 Searchkick::ReindexV2Job JID-0bc7ec9d4c385bd6e2da3875 WARN: current RSS 1900.5859375 of nep-worker01:39340 exceeds maximum RSS 1900
2019-11-12T19:48:27.760Z 39340 TID-orxe27fjk ElasticSearchIndexer JID-49c3a202ee7c706f6a1b03c2 WARN: current RSS 1900.5859375 of nep-worker01:39340 exceeds maximum RSS 1900
...
MARK2->>>2019-11-12T19:48:28.161Z 39340 TID-orxcridrw WARN: sending TSTP to nep-worker01:39340
2019-11-12T19:48:28.162Z 39340 TID-orxbz008k DEBUG: Got TSTP signal
2019-11-12T19:48:28.162Z 39340 TID-orxbz008k INFO: Received TSTP, no longer accepting new work
2019-11-12T19:48:28.162Z 39340 TID-orxbz008k INFO: Terminating quiet workers
2019-11-12T19:48:28.162Z 39340 TID-orxdypz70 INFO: Scheduler exiting...
2019-11-12T19:48:28.163Z 39340 TID-orxdldnpg Searchkick::ReindexV2Job JID-8aad56ebc9be538cd510cf24 WARN: current RSS 1900.58984375 of nep-worker01:39340 exceeds maximum RSS 1900
2019-11-12T19:48:28.168Z 39340 TID-orxe27fjk ElasticSearchIndexer JID-49c3a202ee7c706f6a1b03c2 INFO: done: 4.424 sec
2019-11-12T19:48:28.170Z 39340 TID-orxdws200 Searchkick::ReindexV2Job JID-b101d9f01a4ea359a1704c31 WARN: current RSS 1900.58984375 of nep-worker01:39340 exceeds maximum RSS 1900
2019-11-12T19:48:28.274Z 39340 TID-orxdldnpg Searchkick::ReindexV2Job JID-8aad56ebc9be538cd510cf24 INFO: done: 4.77 sec
2019-11-12T19:48:28.274Z 39340 TID-orxdws200 Searchkick::ReindexV2Job JID-b101d9f01a4ea359a1704c31 INFO: done: 4.13 sec
MARK3->>>2019-11-12T19:48:33.166Z 39340 TID-orxcridrw WARN: shutting down nep-worker01:39340 in 900 seconds
MARK4->>>2019-11-12T19:48:33.191Z 39340 TID-orxcridrw WARN: sending SIGTERM to nep-worker01:39340
2019-11-12T19:48:33.191Z 39340 TID-orxcridrw WARN: waiting 30 seconds before sending SIGKILL to nep-worker01:39340
2019-11-12T19:48:33.191Z 39340 TID-orxbz008k DEBUG: Got TERM signal
2019-11-12T19:48:33.191Z 39340 TID-orxbz008k INFO: Shutting down
2019-11-12T19:48:33.304Z 39340 TID-orxbz008k INFO: Pausing to allow workers to finish...
2019-11-12T19:48:41.105Z 39340 TID-orxbz008k WARN: Terminating 1 busy worker threads
2019-11-12T19:48:41.105Z 39340 TID-orxbz008k WARN: Work still in progress [#<struct Sidekiq::LimitFetch::UnitOfWork queue="queue:bdf_imports", job="{\"class\":\"BatchBdfWorker\",\"args\":[96709],\"retry\":true,\"queue\":\"bdf_imports\",\"jid\":\"e558c0315ac8e0b8c6fad3e8\",\"created_at\":1573588028.9643388,\"apartment\":\"atwork\",\"enqueued_at\":1573588028.964405}">]
2019-11-12T19:48:41.105Z 39340 TID-orxbz008k DEBUG: Re-queueing terminated jobs
2019-11-12T19:48:41.106Z 39340 TID-orxbz008k INFO: Pushed 1 jobs back to Redis
2019-11-12T19:48:41.106Z 39340 TID-orxbz008k INFO: Bye!
  • MARK1 - a worker starts at 19:47:09
  • MARK2 - sends TSTP signal for exceeding max RSS at 19:48:28 because of a different worker
  • MARK3 - after 5 seconds (sleep(5) introduced in #10) logs shutting down in grace time (900 seconds)
  • MARK4 - sends SIGTERM on the same second, without waiting for grace time.

Could it be that sleep(5) is not enough for this: https://github.com/klaxit/sidekiq-worker-killer/blob/master/lib/sidekiq/worker_killer.rb#L86 to return up-to-date info?
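
As a quick check of the timeline above, the gaps between the marked log lines can be computed from their timestamps. This is only a small arithmetic sketch; the elapsed_seconds helper is a hypothetical name, not part of the gem.

```ruby
require "time"

# Seconds between two ISO 8601 timestamps, rounded to milliseconds.
def elapsed_seconds(from, to)
  (Time.iso8601(to) - Time.iso8601(from)).round(3)
end

# Timestamps copied from the MARK2, MARK3 and MARK4 log lines.
tstp_sent       = "2019-11-12T19:48:28.161Z"
grace_announced = "2019-11-12T19:48:33.166Z"
sigterm_sent    = "2019-11-12T19:48:33.191Z"

puts "TSTP -> grace announcement: #{elapsed_seconds(tstp_sent, grace_announced)} s"
puts "grace announcement -> SIGTERM: #{elapsed_seconds(grace_announced, sigterm_sent)} s"
```

The first gap is the 5-second sleep; the second is about 25 ms, far short of the announced 900 seconds.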

How to stop sidekiq process and start again when using docker

I have Sidekiq running as a Docker container. When it reaches max_rss, Sidekiq is shut down and the Docker container then stops as well.

  • As far as I can see, the Docker container is stopped and the process does not wait for the job to finish (I guess so).
  • How can we start it again?
  • How can we wait for the job to finish before shutting Sidekiq down completely?

What is max_rss measured against?

  • Memory used by an individual sidekiq process
  • Memory used by all sidekiq processes
  • Memory used server-wide

(Please document the answer)
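
Going by the options table in the README, max_rss is described as the RSS "used by the Sidekiq process", which points to the first option: a single process. As an illustration of what a per-process measurement looks like (this is not necessarily how the gem reads memory), the sketch below extracts the RSS from the text of a Linux /proc/<pid>/status file. The rss_mb_from_status helper is a hypothetical name.

```ruby
# Hypothetical helper: extracts the resident set size (in megabytes) from the
# text of a Linux /proc/<pid>/status file. Returns nil if VmRSS is absent.
def rss_mb_from_status(status_text)
  match = status_text.match(/^VmRSS:\s+(\d+)\s+kB/)
  match && match[1].to_i / 1024.0
end

# Measures the current process only (Linux); other Sidekiq processes and the
# rest of the server are not included in this number.
status_path = "/proc/#{Process.pid}/status"
if File.readable?(status_path)
  puts "RSS of PID #{Process.pid}: #{rss_mb_from_status(File.read(status_path))} MB"
end
```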

Kill or restart?

Sorry if this is obvious, but does this kill the worker without starting it back up, or does it do a restart? Thanks.

Worker killer does not honor grace period

I've been using the newest version of this gem (6eb0dc3) to try out the new options introduced in 955cfea - especially infinite grace periods.

Right now, the grace period isn't honored at all and all currently running jobs are forcefully shut down immediately. The following log shows SIGTERM is sent right after the SIGTSTP (instead of waiting 20 seconds):

2019-05-22T11:28:53.240Z 3376 TID-gqukurqkw MyJob JID-69c524c868805f8ec3229f1a INFO: start
2019-05-22T11:31:20.067Z 3376 TID-gqukurqkw MyJob JID-69c524c868805f8ec3229f1a WARN: current RSS 806.83984375 of my.host:3376 exceeds maximum RSS 512
2019-05-22T11:31:20.077Z 3376 TID-gqukurqkw MyJob JID-69c524c868805f8ec3229f1a INFO: done: 146.838 sec
2019-05-22T11:31:20.078Z 3376 TID-gqukic4eg WARN: sending TSTP to my.host:3376
2019-05-22T11:31:20.078Z 3376 TID-gqukic4eg WARN: shutting down my.host:3376 in 20 seconds
2019-05-22T11:31:20.079Z 3376 TID-gqukey6oo INFO: Received TSTP, no longer accepting new work
2019-05-22T11:31:20.080Z 3376 TID-gqukic4eg WARN: sending SIGTERM to my.host:3376
2019-05-22T11:31:20.080Z 3376 TID-gqukic4eg WARN: waiting 30 seconds before sending SIGKILL to my.host:3376
2019-05-22T11:31:20.080Z 3376 TID-gqukurz8o INFO: Scheduler exiting...
2019-05-22T11:31:20.081Z 3376 TID-gqukey6oo INFO: Terminating quiet workers
2019-05-22T11:31:20.081Z 3376 TID-gqukurpww INFO: Scheduler exiting...
2019-05-22T11:31:20.081Z 3376 TID-gqukey6oo INFO: Shutting down
2019-05-22T11:31:20.582Z 3376 TID-gqukey6oo INFO: Pausing to allow workers to finish...
2019-05-22T11:31:22.586Z 3376 TID-gqukey6oo INFO: Bye!

I've tracked that bug down to this method:

def wait_job_finish_in_grace_time
  start = Time.now
  loop do
    break if grace_time_exceeded?(start)
    break if no_jobs_on_quiet_processes?
    sleep(1)
  end
end

no_jobs_on_quiet_processes? is always true.
This is because there are two implementations of that method, with the second overwriting the first:

def no_jobs_on_quiet_processes?
  Sidekiq::ProcessSet.new.each do |process|
    return false if !process["busy"].zero? && process["quiet"]
  end
  true
end

def no_jobs_on_quiet_processes?
  Sidekiq::ProcessSet.new.each do |process|
    return false if !process["busy"] == 0 && process["quiet"]
  end
  true
end

The second implementation will always return true because !process["busy"] == 0 is always false (missing parentheses).
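
The precedence issue can be reproduced in isolation. In this minimal sketch the two method names are hypothetical and the integer argument stands in for process["busy"]:

```ruby
# Unary ! binds tighter than ==, so this is parsed as (!busy) == 0.
# !busy is false for any Integer, and false == 0 is false.
def buggy_busy?(busy)
  !busy == 0
end

# With parentheses the comparison happens first, then the negation.
def fixed_busy?(busy)
  !(busy == 0)
end

puts "3 busy jobs -> buggy: #{buggy_busy?(3)}, fixed: #{fixed_busy?(3)}"
puts "0 busy jobs -> buggy: #{buggy_busy?(0)}, fixed: #{fixed_busy?(0)}"
```

The buggy form never reports a busy process, whatever the job count.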

The duplicate method was introduced in e06864f. Removing the second implementation should fix this issue.

By the way: The property process["quiet"] always returns a string (at least on Sidekiq 5.2.7 it returns either "true" or "false"), so the second condition is always true.
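
One way to handle both observations is sketched below. It is not the gem's implementation: this version takes the process entries as an argument so it can run without Sidekiq, and it assumes, as reported above, that "quiet" arrives as the string "true" or "false".

```ruby
# Hypothetical variant of the predicate: accepts any enumerable of hash-like
# process entries instead of calling Sidekiq::ProcessSet.new itself.
def no_jobs_on_quiet_processes?(processes)
  processes.none? do |process|
    busy = process["busy"].to_i > 0
    # Compare against "true" explicitly, because the string "false" is
    # truthy in Ruby.
    quiet = process["quiet"].to_s == "true"
    busy && quiet
  end
end

processes = [
  { "busy" => 2, "quiet" => "false" }, # working, still accepting jobs
  { "busy" => 1, "quiet" => "true" }   # quieted, but one job is still running
]
puts no_jobs_on_quiet_processes?(processes)
```

With Sidekiq loaded, passing Sidekiq::ProcessSet.new as the argument should be the equivalent call, assuming its entries support the same [] lookups used in the snippets above.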

Add `on_shutdown` hook

This is a proposal to add another hook that is simply called whenever a shutdown finally happens. This would give the client the chance to customize any actions taken before shutdown, like logging to custom services or incrementing a Redis counter to ban processes that consistently exceed their quota (these are exactly my real-life use cases, but there are probably more).

Currently, the library supports the skip_shutdown_if callback to prevent unwanted shutdowns. We plan on using skip_shutdown_if to implement these actions, but this is semantically troublesome and leads to a poor separation of concerns, hence the need for an independent on_shutdown callback.

We could also proceed to implement the current logging message through this facility as the default option, so it's possible for the user to completely override it if desired.

Option | Default | Description
--- | --- | ---
on_shutdown | proc {void} | Executes a block of code when the shutdown has been requested.

I'd be willing to write a PR myself if this is something that you consider fits with the gem's intent.
