Comments (14)
Ah ok. Django-RQ's `rqworker` management command closes all DB connections before running `worker.work()`; perhaps we'll need to do something similar here.
I can think of two options:
- RQ's `worker-pool` already accepts a `--worker-class` argument. Django RQ can pass in its own `Worker` class that closes all DB connections prior to working. https://github.com/rq/rq/blob/master/rq/worker_pool.py#L42
- Django RQ creates its own `WorkerPool` class that closes all DB connections before running the workers.
I can provide hooks to make it easier, but it'd be better to have a working proof of concept first so I know which hooks to provide. I was thinking we could do this via `Worker.before_run()`.
from django-rq.
To add to this: `worker-pool` works fine as long as the worker count is only 1.
E.g. this works:
python ./mysite/manage.py rqworker-pool default secondary --worker-class rq.worker.SimpleWorker --settings mysite.settings.rq_settings --num-workers 1
But this fails:
python ./mysite/manage.py rqworker-pool default secondary notifications --worker-class rq.worker.SimpleWorker --settings mysite.settings.rq_settings --num-workers 2
A relevant hint may lie in how `fork` interacts with psycopg2; see https://virtualandy.wordpress.com/2019/09/04/a-fix-for-operationalerror-psycopg2-operationalerror-ssl-error-decryption-failed-or-bad-record-mac/
from django-rq.
Another set of hints is found in this article:
https://medium.com/@philamersune/fixing-ssl-error-decryption-failed-or-bad-record-mac-d668e71a5409
"The SSL error: decryption failed or bad record mac occurs either when the certificate is invalid or the message hash value has been tampered; in our case it’s because of the latter. Django creates a single database connection when it tries to query for the first time. Any subsequent calls to the database will use this existing connection until it is expired or closed, in which it will automatically create a new one the next time you query. The PostgreSQL engine in Django uses psycopg to talk to the database; according to the document it is level 2 thread safe. Unfortunately, the timeout() method is using multiprocessing module and therefore tampers the SSL MAC. There are different ways to fix this. We can either (1) use basic threads instead of spawning a new process or (2) use a new database connection in the timeout() method. We can also (3) scrap the timeout() method altogether and handle the async task properly via Celery."
from django-rq.
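The failure mode discussed in the quoted article can be illustrated without Django or psycopg2. The following is only a minimal standard-library sketch: `FakeConnection` and `ForkAwareConnection` are hypothetical names invented for illustration, not part of Django, RQ, or psycopg2. It records which process opened a connection and replaces it when a different process (for example, a forked worker) asks for it, so a child never reuses the parent's socket or SSL state.

```python
import os
from typing import Callable, Optional


class FakeConnection:
    """Hypothetical stand-in for a DB connection; it only remembers its owner."""

    def __init__(self) -> None:
        self.owner_pid = os.getpid()  # process that opened the connection
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ForkAwareConnection:
    """Hands out a connection that was opened by the calling process."""

    def __init__(self, factory: Callable[[], FakeConnection] = FakeConnection) -> None:
        self._factory = factory
        self._conn: Optional[FakeConnection] = None

    def get(self) -> FakeConnection:
        conn = self._conn
        if conn is not None and conn.owner_pid != os.getpid():
            # Opened in another process (e.g. before a fork): close our copy
            # and forget it instead of reusing it.
            conn.close()
            conn = None
        if conn is None or conn.closed:
            conn = self._factory()
        self._conn = conn
        return conn
```

Within one process, repeated `get()` calls return the same object. After a fork, `os.getpid()` differs in the child, so the first `get()` there opens a fresh connection.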
I found a solution, but it's unclear how to integrate it into the set of libraries, since:
- It only affects Django/other ORMs (not pure RQ)
- Yet the relevant code to be changed seems to be in `rq` (which should remain agnostic to this)
The change would be to modify the start-up code for each new process, i.e. `worker_pool.run_worker`, which is the `target` of `Process`:
https://github.com/rq/rq/blob/3ad86083c33ec28b81a07f94dafdcf1cd56429ea/rq/worker_pool.py#L243
The change is as follows: insert these lines at the position linked above (i.e. at the start of the `run_worker` function):
from django.db import connections
# another complication arises if someone is using a DB alias that is not default... I guess this would need to be configurable
connections["default"].close()
from django-rq.
Perhaps one API design solution would be to provide a way for users to override the `WorkerPool` class?
Another could be to provide a hook, `on_worker_pool_fork`, that users can override.
from django-rq.
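To make the "overridable hook" idea concrete, here is a rough sketch using only the standard library. `MiniWorkerPool`, `get_worker_target`, and `close_db_connections` are hypothetical stand-ins, not RQ or django-rq APIs; in a Django project the placeholder would presumably call `django.db.connections.close_all()`. The pool exposes one small method that decides what each child process runs, and a subclass swaps in a wrapper that closes connections first.

```python
from multiprocessing import Process
from typing import Callable, List


def run_worker(name: str) -> str:
    """Stand-in for the function each worker process runs."""
    return f"{name} working"


class MiniWorkerPool:
    """Hypothetical, simplified pool; not RQ's WorkerPool."""

    def get_worker_target(self) -> Callable[[str], str]:
        # Hook: subclasses override this to change what runs in the child.
        return run_worker

    def build_process(self, name: str) -> Process:
        # The process is only constructed here; the caller decides when to start it.
        return Process(target=self.get_worker_target(), args=(name,), name=f"Worker {name}")


# Records calls in the current process only, for demonstration.
events: List[str] = []


def close_db_connections() -> None:
    """Placeholder for framework-specific cleanup (assumption, see lead-in)."""
    events.append("closed")


def run_worker_after_closing(name: str) -> str:
    # Module-level function so it stays picklable under the "spawn" start method.
    close_db_connections()
    return run_worker(name)


class DbSafeWorkerPool(MiniWorkerPool):
    def get_worker_target(self) -> Callable[[str], str]:
        return run_worker_after_closing
```

The design choice is that the override touches only the small hook, so the larger start-up method of the pool does not need to be copied into the subclass.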
I was thinking about this a bit more and wanted to share more information:
- I can confirm that the example I gave yesterday worked when I tried it on a Heroku server with many RQ tasks. When the DB connection is torn down in the new process (i.e. post fork), Django seems to create a new one.
- What I don't know is whether this has any impact "upstream" on the DB connections in the main process. This would only matter if `ASYNC=False`, so it might be treated as an edge case to figure out at a later stage.
As for specific code, I ran something like this. I'm sure there are better ways to reduce duplication and make better use of subclassing etc., but it's a POC so you understand the direction:
from multiprocessing import Process
from typing import Optional
from uuid import uuid4

from django.db import connections
from rq.worker_pool import WorkerData, WorkerPool, run_worker


class Psycopg2CompatibleWorkerPool(WorkerPool):
    def start_worker(
        self,
        count: Optional[int] = None,
        burst: bool = True,
        _sleep: float = 0,
        logging_level: str = "INFO",
    ):
        """
        Starts a worker and adds the data to worker_datas.
        * sleep: waits for X seconds before creating worker, for testing purposes
        """
        name = uuid4().hex
        # Same as WorkerPool.start_worker, except for the process target.
        process = Process(
            target=run_worker_with_new_db_connection,
            args=(name, self._queue_names, self._connection_class, self._pool_class, self._pool_kwargs),
            kwargs={
                '_sleep': _sleep,
                'burst': burst,
                'logging_level': logging_level,
                'worker_class': self.worker_class,
                'job_class': self.job_class,
                'serializer': self.serializer,
            },
            name=f'Worker {name} (WorkerPool {self.name})',
        )
        process.start()
        worker_data = WorkerData(name=name, pid=process.pid, process=process)  # type: ignore
        self.worker_dict[name] = worker_data
        self.log.debug('Spawned worker: %s with PID %d', name, process.pid)


def run_worker_with_new_db_connection(*args, **kwargs):
    # Close the connection inherited from the parent process;
    # Django opens a new one on the next query.
    alias = "default"
    connections[alias].close()
    run_worker(*args, **kwargs)
from django-rq.
@jackkinsella take a look at https://github.com/rq/rq/pull/2052/files . I think having this PR merged into RQ would provide a reasonable place for django-rq and other frameworks to hook into.
from django-rq.
@selwin Would the idea be that django_rq (or individual user code) could override `get_worker_process` and add their own implementation without needing to mess with the larger method?
from django-rq.
So what would be some pseudo code if someone using the library wanted to override that function? I'm wondering if we need to provide an easy hook - e.g. some way for people to modify their Django settings file to specify a location for the override function?
from django-rq.
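One way the "point a setting at an override function" idea could look is sketched below with the standard library only. The `WORKER_PROCESS_HOOK` key is a hypothetical setting name, not something django-rq defines, and `os.getpid` is just a placeholder for any importable callable. Django itself ships a comparable helper, `django.utils.module_loading.import_string`.

```python
import importlib
from typing import Any, Callable


def load_callable(dotted_path: str) -> Callable[..., Any]:
    """Resolve 'package.module.attr' to the callable it names."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"{dotted_path!r} is not a dotted path")
    module = importlib.import_module(module_path)
    obj = getattr(module, attr_name)
    if not callable(obj):
        raise TypeError(f"{dotted_path!r} does not name a callable")
    return obj


# Hypothetical settings value; 'WORKER_PROCESS_HOOK' is not a real django-rq setting.
SETTINGS = {"WORKER_PROCESS_HOOK": "os.getpid"}

# The library would resolve the hook once at start-up and call it in each worker.
hook = load_callable(SETTINGS["WORKER_PROCESS_HOOK"])
```

With this shape, users change behaviour by editing a string in their settings rather than subclassing, at the cost of errors surfacing only when the path is resolved.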
Well actually, I guess the Django RQ library would be the ideal place to override it. I believe all Django users will have this problem, for all DBs, because of how SSL connections behave after forking.
from django-rq.
@jackkinsella I released RQ 1.16.1 with `worker_pool.get_worker_process()`. It would be great if someone could create a PR implementing this in django-rq.
from django-rq.
Sure, I can take this on!
from django-rq.
Added feedback here instead https://github.com/rq/django-rq/pull/655/files
from django-rq.