Comments (72)
I think you should have an initializer named scheduler.rb with the following:
require 'sidekiq-scheduler'

Sidekiq.configure_server do |config|
  schedule_file = 'config/schedule.yml'
  config.on(:startup) do
    if ENV['SCHEDULER'] == 'yes'
      Sidekiq.schedule = YAML.load_file(schedule_file)
      Sidekiq::Scheduler.listened_queues_only = false
      Sidekiq::Scheduler.load_schedule!
    end
  end
end
You'll need to add your specific configuration for rails env, redis, and the like ...
Then, your Procfile should be something like:
scheduler: SCHEDULER=yes bundle exec sidekiq
low_worker: bundle exec sidekiq -q low -c $low_worker
slow_worker: bundle exec sidekiq -q slow -c $slow_worker
That would do the trick.
from sidekiq-scheduler.
@jclusso, @cabello The master branch now has a new approach that should solve the problem of duplicated cron and at jobs. With this approach, it is no longer necessary to have a separate process responsible only for job scheduling.
As it's not part of a release yet, you should modify your Gemfile to contain this:
gem 'sidekiq-scheduler', git: 'https://github.com/moove-it/sidekiq-scheduler.git'
And then run bundle update sidekiq-scheduler. Your resulting Gemfile.lock should have something like:
GIT
  remote: https://github.com/moove-it/sidekiq-scheduler.git
  revision: ee317fd427bf600ba2562f9ae3cc64b9880d0e68
  specs:
    sidekiq-scheduler (2.0.6)
    ...
As there isn't a generic solution, and we are making our best effort not to increase the number of Redis operations, we are still working on solutions for interval, in and every jobs.
from sidekiq-scheduler.
@polysaturate we're running it in parallel on several VMs on amazon, and @jclusso is running it on an autoscaling group on heroku, so I think it's safe to assume that any other implementation of a similar cloud stack will work without issues, assuming everything's talking to the same redis server/cluster (we've processed about 100k jobs on it, well only about 5k were scheduled, since we deployed to production, so it definitely works).
from sidekiq-scheduler.
I'm using sidekiq-scheduler and haven't noticed that issue. Maybe you can provide some additional information that would help. For example, you can provide the schedule that you're using, and/or a sample of how you're configuring it. Maybe some information about the architecture/environment where you are noticing this issue.
from sidekiq-scheduler.
The ones in red I can confirm run 2x the amount, but the others I haven't checked yet. In sidekiq 3 using sidekiq-cron all works well. In sidekiq 4 using sidekiq-cron, jobs run half as often, whereas this gem runs them twice as often.
from sidekiq-scheduler.
@andrewhavens how do you initialize your schedule out of curiosity?
from sidekiq-scheduler.
I have my recurring jobs listed in my sidekiq.yml file:
# ...standard sidekiq config here...
:schedule:
  daily_digests:
    description: Generate daily digest emails
    cron: "0 8 * * * America/Los_Angeles" # Top of the hour, 8am, every day
    class: DigestsJob
    args: daily
from sidekiq-scheduler.
@andrewhavens but which way are you loading it?
from sidekiq-scheduler.
@jclusso I experienced this issue on 2.0.0, but I do not see it on 2.0.4, fyi
from sidekiq-scheduler.
@erichummel I'll try upgrading to 2.0.4. Thanks!
from sidekiq-scheduler.
No luck with 2.0.4 @erichummel
from sidekiq-scheduler.
@jclusso My sidekiq.yml is loaded automatically. There is no additional code required.
from sidekiq-scheduler.
I have code in an initializer that loads the schedules for a given environment, I will post it here as soon as I'm back at my home machine ~1 hr. I definitely saw the double scheduling issue, and it was definitely fixed in 2.0.4 for me, hopefully it will help you out
from sidekiq-scheduler.
this is how we load our schedules:
Sidekiq.configure_server do |config|
  config.redis = redis_config
  config.on(:startup) do
    schedule_file = "#{Rails.root}/config/cron.#{Rails.env}.yml"
    # File.exist? replaces File.exists?, which is deprecated and removed in Ruby 3.2
    if File.exist?(schedule_file)
      Sidekiq.schedule = YAML.load_file(schedule_file)
      Sidekiq::Scheduler.load_schedule!
    end
  end
end
from sidekiq-scheduler.
^ that is a snippet from our config/initializers/sidekiq.rb file
from sidekiq-scheduler.
Our code is very similar... it's just modified slightly so we can specify different schedules based on the environment. The only notable difference is we have reload_schedule! instead of load_schedule!
host = ENV['redis_host']
port = ENV['redis_port']
db = ENV['redis_db_sidekiq']
namespace = ENV['redis_namespace']

if Rails.env.development? || Rails.env.test?
  redis_url = "redis://#{host}:#{port}/#{db}"
else
  password = ENV['redis_password']
  redis_url = "redis://:#{password}@#{host}:#{port}/#{db}"
end

Sidekiq.configure_server do |config|
  config.redis = { url: redis_url, namespace: namespace }
  schedule_file = 'config/schedule.yml'
  config.on(:startup) do
    Sidekiq.schedule = YAML.load_file(schedule_file)[Rails.env]
    Sidekiq::Scheduler.reload_schedule!
  end
end

Sidekiq.configure_client do |config|
  config.redis = { url: redis_url, namespace: namespace }
end
from sidekiq-scheduler.
Tried running with load_schedule! and even with the GitHub version of this gem, with no luck. This is our schedule.
default: &default
  name_of_worker:
    cron: '*/10 * * * *'
    class: 'ClassName'
    queue: low
  name_of_worker:
    cron: '* * * * *'
    class: 'ClassName'
    queue: slow
  name_of_worker:
    cron: '* * * * *'
    class: 'ClassName'
    queue: slow
  name_of_worker:
    cron: '* * * * *'
    class: 'ClassName'
    queue: slow
  name_of_worker:
    cron: '*/5 * * * *'
    class: 'ClassName'
    queue: low
  name_of_worker:
    cron: '0 * * * *'
    class: 'ClassName'
    queue: slow
  name_of_worker:
    cron: '0 0 * * *'
    class: 'ClassName'
    queue: slow
  name_of_worker:
    cron: '0 * * * *'
    class: 'ClassName'
    queue: slow
  name_of_worker:
    cron: '0 * * * *'
    class: 'ClassName'
    queue: low
development: &development
  << : *default
test: *development
staging: &staging
  << : *default
production: &production
  << : *default
  name_of_worker:
    cron: '*/5 * * * *'
    class: 'ClassName'
    queue: low
  name_of_worker:
    cron: '*/5 * * * *'
    class: 'ClassName'
    queue: low
from sidekiq-scheduler.
Hi @jclusso.
Are you still experiencing this problem?
Are you sure you don't have 2 or more sidekiq processes running?
It would be great if you could isolate the jobs that are being run twice, post the YAML config having only those jobs, and provide a sample scaffold of one of your worker classes.
from sidekiq-scheduler.
@snmgian yes, we are still experiencing this problem. No other code has changed between trying to upgrade from sidekiq 3 to 4 and we don't have these issues. We are experiencing half as many jobs with the sidekiq-cron gem and twice as many jobs with this gem. Something must have changed that is affecting these scheduling gems in an odd way....
from sidekiq-scheduler.
I see you posted this code snippet:
Sidekiq.configure_server do |config|
  config.redis = { url: redis_url, namespace: namespace }
  schedule_file = 'config/schedule.yml'
  config.on(:startup) do
    Sidekiq.schedule = YAML.load_file(schedule_file)[Rails.env]
    Sidekiq::Scheduler.reload_schedule!
  end
end

Sidekiq.configure_client do |config|
  config.redis = { url: redis_url, namespace: namespace }
end
Two questions:
- Is it in config/initializers/sidekiq.rb?
- Is it part of a Rails app?
from sidekiq-scheduler.
@snmgian yes and yes.
from sidekiq-scheduler.
I can't see anything strange in your setup, so I have a few extra questions for you :)
- Which command do you use to start sidekiq?
- How many sidekiq processes are you running?
- The .yml config you provided is a bit obfuscated; I would bet that is because of some confidentiality policy. Could you provide a more realistic one while keeping the required confidentiality? I see that all of your workers are named the same.
from sidekiq-scheduler.
- we use bundle exec sidekiq -q slow -c 20
- anywhere from 1-30+ depending on our autoscaler. we are on Heroku and scale up and down all day
- yes it is because we don't want to expose certain information that is confidential. rest assured that all of the classes are unique and we are not running multiple schedules for a single class or something.
from sidekiq-scheduler.
The thing is that each of your sidekiq instances will schedule the jobs. So, if you have 30 instances, each having a sidekiq.yml containing this schedule:
name_of_worker:
  cron: '* * * * *'
  class: 'ClassName'
  queue: slow
the job will be scheduled 30 times, once per instance.
You could have one sidekiq process responsible only for scheduling the jobs, and then a group of sidekiq processes that dequeue jobs and execute them.
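A minimal sketch of the effect described above, in plain Ruby with no Sidekiq involved. The shared_queue array and the tick method are hypothetical stand-ins for the Redis-backed queue and for one scheduler firing once; they are not part of sidekiq-scheduler.

```ruby
# Hypothetical stand-in for the queue that all processes share through Redis.
shared_queue = []

# The same schedule entry is loaded by every process.
schedule = {
  'name_of_worker' => { 'cron' => '* * * * *', 'class' => 'ClassName', 'queue' => 'slow' }
}

# Every process runs its own scheduler, so each one pushes the job on a cron tick.
def tick(schedule, queue)
  schedule.each_value { |job| queue << job['class'] }
end

process_count = 30
process_count.times { tick(schedule, shared_queue) }

puts shared_queue.size # 30 copies of the same job for a single tick
```

With a single scheduling process, process_count would effectively be 1 no matter how many workers consume the queue.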
from sidekiq-scheduler.
I'm sure heroku is handling this properly, and if what you were saying were the case we wouldn't have exactly 2x of the jobs. We'd have random amounts, from 2-30x+.
from sidekiq-scheduler.
Hi @jclusso,
Is it possible for you to include the Procfile you are using?
That could help us a lot on the resolution of this issue.
Thanks
from sidekiq-scheduler.
web: bundle exec puma -C ./config/puma.rb
default_worker: bundle exec sidekiq -c $default_worker
low_worker: bundle exec sidekiq -q low -c $low_worker
slow_worker: bundle exec sidekiq -q slow -c $slow_worker
post_worker: bundle exec sidekiq -q post -c $post_worker
fabric_worker: bundle exec sidekiq -q fabric -c $fabric_worker
elasticsearch_worker: bundle exec sidekiq -q elasticsearch -c $elasticsearch_worker
from sidekiq-scheduler.
Hi @jclusso,
It seems that the same config file (sidekiq.yml) is read by all your sidekiq processes.
Try pointing your Gemfile to GitHub. A code change that only schedules jobs for the queues that sidekiq is listening on was merged a few days ago. This feature is still experimental.
gem 'sidekiq-scheduler', git: 'https://github.com/moove-it/sidekiq-scheduler.git'
from sidekiq-scheduler.
@jclusso, if you point your Gemfile to github, add this to your sidekiq.yml:
:scheduler:
  :listened_queues_only: true
Setting the listened_queues_only flag to true tells the scheduler not to push jobs whose queues are not being fetched by the local sidekiq process.
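As a rough illustration of what that flag implies, the filter can be sketched in plain Ruby. The enqueue_here? helper, the job names, and the fallback to a 'default' queue are assumptions made for this sketch, not the gem's actual implementation.

```ruby
# Hypothetical helper: should this process enqueue the given scheduled job?
def enqueue_here?(job_config, listened_queues, listened_queues_only)
  # With the flag off, every process enqueues every job in the schedule.
  return true unless listened_queues_only

  # Assumption: jobs without an explicit queue go to 'default'.
  queue = job_config.fetch('queue', 'default')
  listened_queues.include?(queue)
end

schedule = {
  'report_job' => { 'class' => 'ReportJob', 'queue' => 'slow' },
  'ping_job'   => { 'class' => 'PingJob',   'queue' => 'low' }
}

# A process started with `sidekiq -q slow` only fetches from the slow queue.
listened = ['slow']
to_enqueue = schedule.select { |_name, cfg| enqueue_here?(cfg, listened, true) }.keys

puts to_enqueue.inspect # ["report_job"]
```

Note that two processes listening on the same queue would both pass this check, so the filter narrows duplication per queue rather than removing it.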
from sidekiq-scheduler.
Will give it a try shortly @snmgian
from sidekiq-scheduler.
@snmgian we don't even have a sidekiq.yml.... we also don't run sidekiq processes with a config file, so would I have to change all of them to use the config file instead?
from sidekiq-scheduler.
@jclusso I see you are configuring Sidekiq within an initializer:
config.on(:startup) do
  Sidekiq.schedule = YAML.load_file(schedule_file)[Rails.env]
  Sidekiq::Scheduler.reload_schedule!
end
Try adding Sidekiq::Scheduler.listened_queues_only = true before reloading the schedule. That will have the same effect as reading the config from sidekiq.yml.
from sidekiq-scheduler.
@snmgian, adding this line gave this error:
NoMethodError: undefined method `listened_queues_only=' for Sidekiq::Scheduler:Class
from sidekiq-scheduler.
Do you have the following entry in your Gemfile?
gem 'sidekiq-scheduler', git: 'https://github.com/moove-it/sidekiq-scheduler.git'
If it still throws NoMethodError, try executing bundle update sidekiq-scheduler.
from sidekiq-scheduler.
Yes and I've done that already.
from sidekiq-scheduler.
I just tested it in a fresh project and didn't hit the NoMethodError problem; this method was added recently.
Your Gemfile.lock should have the following (bundle update sidekiq-scheduler should have taken care of that):
GIT
  remote: https://github.com/moove-it/sidekiq-scheduler.git
  revision: fc87e6358141b98bde51c74afe119abf441a740a
  specs:
    sidekiq-scheduler (2.0.5)
      multi_json (~> 1)
      redis (~> 3)
      rufus-scheduler (~> 3.1.8)
      sidekiq (>= 3)
      tilt (>= 1.4.0)
Also, make sure that you don't have some other version of sidekiq-scheduler installed. Verify it with gem list, and remove it if it appears in the list.
from sidekiq-scheduler.
@snmgian I just noticed that you told @jclusso to add this to the sidekiq.yml file:
:scheduler:
  :listened_queues_only: true
...but isn't the config namespaced under schedule?
:schedule:
  :listened_queues_only: true
from sidekiq-scheduler.
This appears to possibly be working now that I got it updated @snmgian. I had to force it to use 2.0.5 because it wouldn't update just by running bundle update sidekiq-scheduler.
from sidekiq-scheduler.
@jclusso, it's nice to hear that! Please give it some more runs, a new release is planned for next week.
from sidekiq-scheduler.
@andrewhavens the namespace is a new one: scheduler. I didn't want to put the new flag under schedule, so as not to mix up different things like config flags and the actual cron schedule.
from sidekiq-scheduler.
@snmgian actually after testing further it appears jobs run far more often than 2x sometimes. Jobs run more depending on dynos now I think.
from sidekiq-scheduler.
If the number of times a job is scheduled depends on the number of dynos, then the problem could be solved by having only one sidekiq process responsible for scheduling jobs, and then several other sidekiq processes that take the jobs and perform them.
Those sidekiqs could be scaled up and down depending on your needs, but the scheduler will need to remain as only one instance.
If you need help setting up that scenario, please let me know.
from sidekiq-scheduler.
@snmgian yes, how would you go about setting up that scenario?
from sidekiq-scheduler.
@snmgian I see how this would work....
from sidekiq-scheduler.
I think that it will work if you don't scale up the scheduler process type. The other workers can be scaled up depending on your processing needs.
from sidekiq-scheduler.
@jclusso Did you try having the scheduler on a separate process?
from sidekiq-scheduler.
We haven't yet but will soon.
from sidekiq-scheduler.
I am also facing this problem. I have a heroku app using 3x sidekiq instances and every single instance ran my hourly job at the scheduled time while I was expecting only one job to be scheduled.
from sidekiq-scheduler.
To be honest @snmgian I really don't like that solution. It doesn't seem like the proper way this should work. It does seem like a potential work around though.
from sidekiq-scheduler.
By default, within each sidekiq instance there is a sidekiq-scheduler 'instance'; that's the reason why jobs are run more than once.
The solution is to have one sidekiq instance that only runs the scheduler responsible for enqueueing jobs, and then have as many sidekiq instances as needed, running without any schedule configuration, that execute the jobs enqueued by the scheduler instance.
I agree that this is not the behaviour that most people would expect, and having an extra sidekiq instance could be expensive in some scenarios, so I'll work on this.
It seems that checking whether the job has already been enqueued for a particular execution time will be the way to go. I have a working draft, but it needs some more testing.
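A minimal sketch of that idea, assuming each scheduler records a job name plus execution time before enqueueing. ExecutionLog is a hypothetical in-memory stand-in for shared Redis state; in a real deployment the check-and-record step would have to be a single atomic Redis operation, and this is not the draft mentioned above.

```ruby
require 'set'

# Hypothetical in-memory stand-in for state shared by all instances via Redis.
class ExecutionLog
  def initialize
    @seen = Set.new
    @lock = Mutex.new
  end

  # Returns true only for the first caller to claim this job/time pair.
  def claim?(job_name, scheduled_time)
    key = "#{job_name}:#{scheduled_time.to_i}"
    # Set#add? returns nil when the key was already present.
    @lock.synchronize { !@seen.add?(key).nil? }
  end
end

log = ExecutionLog.new
enqueued = []
tick = Time.utc(2016, 1, 1, 8, 0, 0)

# Three scheduler instances fire for the same cron tick.
3.times do |instance|
  enqueued << "instance-#{instance}" if log.claim?('daily_digests', tick)
end

puts enqueued.inspect # ["instance-0"]
```

Only the first instance to claim the tick enqueues the job; a later tick produces a different key and can be claimed again.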
from sidekiq-scheduler.
Maybe some inspiration for how it can be done could be taken from the sidekiq-cron gem. That gem doesn't have this problem; however, it has its own set of issues where it schedules half as often on Sidekiq 4, which is the original reason I started trying out this gem.
from sidekiq-scheduler.
Sure. Actually, I took a look at the source code of that gem when I was evaluating options to handle this issue.
from sidekiq-scheduler.
An idea: leverage redis to record the leader of the schedule, so when you have X instances, only one instance is the leader and has the responsibility of scheduling the jobs.
You would need the leader to perform a heartbeat and keep redis up to date; if there is no leader, then another sidekiq instance would take over.
... maybe I am overthinking this.
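The leader-with-heartbeat idea can be sketched as follows. LeaderLock is a hypothetical in-memory stand-in for a Redis key with an expiry; the class, the instance ids, and the 10 second lease are all assumptions for illustration, and a real version would need the acquire-or-renew step to be atomic in Redis.

```ruby
# Hypothetical in-memory stand-in for a Redis key that expires after `ttl` seconds.
class LeaderLock
  def initialize(ttl)
    @ttl = ttl
    @holder = nil
    @expires_at = 0
  end

  # Acquire the lease if it is free or expired, or renew it if we already hold it.
  # Returns true when `instance_id` is the leader after the call.
  def heartbeat(instance_id, now)
    if @holder.nil? || now >= @expires_at || @holder == instance_id
      @holder = instance_id
      @expires_at = now + @ttl
      true
    else
      false
    end
  end
end

lock = LeaderLock.new(10)
results = []
results << lock.heartbeat('a', 0)  # a becomes leader, lease valid until t=10
results << lock.heartbeat('b', 5)  # b is refused, a's lease is still valid
results << lock.heartbeat('a', 8)  # a renews its lease until t=18
results << lock.heartbeat('b', 20) # a stopped renewing, so b takes over

puts results.inspect # [true, false, true, true]
```

Only the instance whose heartbeat returns true would schedule jobs for that interval.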
from sidekiq-scheduler.
The other thing I noticed after installing sidekiq-scheduler is that the number of threads increased by an unknown factor.
from sidekiq-scheduler.
@cabello Those threads are started by rufus-scheduler (https://github.com/jmettraux/rufus-scheduler); it has a thread pool from which it selects a thread to execute a job. That job's only purpose is to push the worker into sidekiq, so the threads are not doing much work. You can open a new issue if you think it's appropriate.
from sidekiq-scheduler.
@snmgian, Awesome! We'll test this out in our staging environment shortly and let you know how it runs.
from sidekiq-scheduler.
@snmgian we tested this in our staging environment and it appears to be working fine with multiple dynos. I'll keep you posted how it handles in our production environment.
from sidekiq-scheduler.
We're running this in production with no issues so far.
from sidekiq-scheduler.
All is running fine in production now!
from sidekiq-scheduler.
Thanks guys for testing this.
@cabello did you have any chance to give it some testing?
from sidekiq-scheduler.
@snmgian I would assume the new code on master would work in a situation where you horizontally scale on Elastic Beanstalk or Cloud66 with sidekiq being loaded on each web server?
from sidekiq-scheduler.
Thanks for the support guys! I will be trying it again soon.
from sidekiq-scheduler.
@erichummel Thanks. It was more of a "Pretty sure I followed the whole issue thread and wanted to verify my assumption after reading the code". @snmgian Awesome work on the code!
from sidekiq-scheduler.
@polysaturate no problem. I've been watching it run for the past 30 hours or so and I'm confident it's working properly. Our biggest traffic day (probably 1000-5000x our normal daily traffic in the space of a few hours) of the year is coming up next Saturday, and I'm not nervous about sidekiq-scheduler; just everything else ;)
from sidekiq-scheduler.
Hi @polysaturate. It's worth mentioning what @erichummel said, all of your sidekiq-scheduler instances should be talking to the same Redis.
Some state that tracks scheduling timestamps is being stored in Redis, so to prevent duplicates you'll need to point all your instances to the same Redis. It will work if you have Redis Cluster as well because it's handled transparently by Redis.
In general, the code has spots in which its quality could be improved, so we'll go in that direction soon.
from sidekiq-scheduler.
@snmgian, I was wondering if this could be related to sidekiq-scheduler. We have one worker that gets a very large number (thousands per minute sometimes) scheduled on it. We noticed we started getting some memory issues on Heroku once we upgraded. When digging deeper it appears to be using more memory on the first instance. For example, the instances each have 1GB of memory and the first one is using 1GB, but the rest are only using around 360MB. Is this because the scheduler is running on that first instance?
After thinking about this, should we maybe be running a separate dyno which you mentioned above to handle the scheduling?
from sidekiq-scheduler.
@jclusso, The scheduler runs on every instance.
Are all of the instances you mentioned, the one with 1 GB of used memory and the rest using 360 MB, running the same process type? (a process type is what you specify in the Procfile)
Could you verify whether there's a substantial difference in the number of processed jobs across different instances?
sidekiq-scheduler only stores the Hash of schedules (the parsed YAML) in memory, and its internals do not handle much data. So, it doesn't seem possible that it is using hundreds of MB of memory. It also runs on all the instances, so if it were suffering from a memory leak, it would affect all of your instances.
from sidekiq-scheduler.
@snmgian yes, they are running the same processes. I'm not sure of a good way to verify the processed jobs across instances... any suggestions?
Also, after yesterday's comment I tried the schedule on its own dyno, which made no difference. I'm not sure if this issue is related to the scheduler or just Sidekiq 4. Just seems odd since Sidekiq 4 mentions all sorts of improvements in memory and CPU usage.
from sidekiq-scheduler.
Sidekiq writes a log message when a job starts and ends processing, so you could filter those log entries to get the number of processed jobs.
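One way to do that filtering is sketched below. The sample lines and the layout they follow (timestamp, PID, thread id, worker class, JID, level, message) are assumptions for illustration; a real log, for example one prefixed with dyno names on Heroku, would need the pattern adapted.

```ruby
# Sample lines in the assumed layout; these are made up for the sketch.
LOG = <<~LOG_LINES
  2016-03-01T10:00:00.000Z 4 TID-abc ClassName JID-111 INFO: start
  2016-03-01T10:00:01.200Z 4 TID-abc ClassName JID-111 INFO: done: 1.2 sec
  2016-03-01T10:00:01.500Z 7 TID-def ClassName JID-222 INFO: start
  2016-03-01T10:00:02.900Z 7 TID-def ClassName JID-222 INFO: done: 1.4 sec
  2016-03-01T10:00:03.000Z 4 TID-abc ClassName JID-333 INFO: start
  2016-03-01T10:00:04.100Z 4 TID-abc ClassName JID-333 INFO: done: 1.1 sec
LOG_LINES

# Matches only the "done" entries and captures the process id.
DONE_LINE = /\A\S+ (\d+) TID-\S+ \S+ JID-\S+ INFO: done:/

# Count finished jobs per process id.
def processed_per_pid(log_text)
  counts = Hash.new(0)
  log_text.each_line do |line|
    match = DONE_LINE.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

processed_per_pid(LOG).each { |pid, count| puts "pid #{pid}: #{count} jobs" }
```

Comparing these counts across instances would show whether one instance is processing far more jobs than the others.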
One cause of such a difference in memory usage could be the first instance processing many more jobs than the other instances.
I don't know what type of processing your workers are doing, nor how many jobs you have running at the same time, so I'll have to ask whether you think that a 360 MB memory usage is reasonable or not.
I'm just looking through some of the instances we have, it seems that differences in memory usage are more related to (all other things being equal) the number of processed jobs (Sidekiq side) in a given timeframe.
from sidekiq-scheduler.
I'll check into our logs and see if we can come up with a clear idea of that. I'm not sure why it wouldn't be pretty equal. That would make me believe there is a much deeper, lower-level issue in Sidekiq though. Most of our jobs on the queue we are discussing are making a single API request and updating some models in our database. The majority take about 1-2s and are fairly similar. The spread of when they run is also very equal all throughout the day. The nature of what we are doing is a constant set of work, so it's not like we have these big spikes or anything of the sort.
As for what is high memory usage 360MB is not high. We try and optimize our concurrency to hit just under 1GB since our dyno size is 1GB and we want to use as much memory as possible without exceeding. With the older version of Sidekiq and the other scheduling gem we used we were able to hit a concurrency of 45. With the updates I've decreased concurrency all the way to 5 and still saw the memory errors.
What's interesting to note is that the performance impact isn't drastic. We see maybe a 20-30% increase in speed to execute a job by lowering the concurrency from 45 to 5, but changing it from 45 to 30 or even like 45 to 25 barely has an impact.
from sidekiq-scheduler.
Hi @jclusso, do you have any updates on this?
I'm going to close this issue. You can open a new issue if you think it's appropriate :)
from sidekiq-scheduler.
Still haven't fully solved the issues on our end. We are digging through the code in our workers to see if we can track down what is causing this.
from sidekiq-scheduler.