Comments (16)
Just as an update: I'm currently looking into recreating the MiThermometerPoller object to enforce using a new locking object.
from bt-mqtt-gateway.
I just created a new release of btlewrap v0.0.4 (incl. pypi) and requested a new release from miflora, as I do not have the permissions to do so:
basnijholt/miflora#117
from bt-mqtt-gateway.
Hi
I think its ok to disable interruptingcow inside mithermometer, as it uses its own internal timeout. I think every worker should be able to cleanup and interrupt long operation on its own. Timeout in worker_manager is just last resort to keep gateway as whole stable and running. Thats why timeout there is much bigger than in workers. That was my initial idea for design.
That deadlock, can be also tackled by recreating object somehow ? Or maybe btlewrap has still some design flaw around that locks ? I'm not a Python expert, so its hard to me to see if there is some wrong cleanup handling etc.
Anyway to sum up I'm ok with removing interruptingcow timeout from mithermostat worker, but I will keep long timeout higher in the execution chain not to freeze whole gateway, in case of some weird condition from any worker.
from bt-mqtt-gateway.
Thanks for looking into this. I agree that we should not remove the timeout within workers_manager.py.
That deadlock, can be also tackled by recreating object somehow ?
Theoretically we could recreate the object. But I'm not sure about any side effects this might have on a lower level. That lock is probably there for a reason, perhaps only one client is allowed to access the Bluetooth device in a given moment.
One idea is to recreate the object in those instances where the 35 second time limit is reached. Before the recreation we could explicitly delete the old object, which causes the cleanup routines to be executed. I'll try that and monitor the behavior. I'll report back once I have more information.
Or maybe btlewrap has still some design flaw around that locks ? I'm not a Python expert, so its hard to me to see if there is some wrong cleanup handling etc.
btlewrap has the correct cleanup mechanisms implemented, they're just not reachable. So far, I was unable to debug the exact reason why the thread is unable to be interrupted.
from bt-mqtt-gateway.
Yes only one client can access bluetooth device on linux, at given moment. In case of this gateway, its handled by gateway itself. Everything goes on the same thread using internal queue, so workers doesn't need to take care of locking bluetooth device etc.
As for workers affected by this, from what I understand its only miflora and mithermometer, as they use btlewrap. I'm using mainly thermostat worker, which uses bluepy for bluetooth handling, and there are no locks there etc. From what I've read, interruptingcow uses SIGALARM to interrupt flow, maybe such signal needs to be catched inside btlewrap ( or also other signals ? ), and also do cleanup procedure then ?
from bt-mqtt-gateway.
So far I was unable to reproduce a deadlock within over two weeks now by just removing the timeout in mithermometer.py (and miflora.py for that matter, as it also relies on btlewrap). The 35 second timeout in workers_manager.py is still in place.
Those 35 second seems to be enough time on my setup, but I also only use a single device. I'll move over my 5 miflora sensors from home-assistant to bt-mqtt-gateway and do some more testing.
As for workers affected by this, from what I understand its only miflora and mithermometer, as they use btlewrap. I'm using mainly thermostat worker, which uses bluepy for bluetooth handling, and there are no locks there etc.
I can confirm that only those workers that use btlewrap seems affected by this. The thermostat worker for my five thermostats are running fine.
From what I've read, interruptingcow uses SIGALARM to interrupt flow, maybe such signal needs to be catched inside btlewrap ( or also other signals ? ), and also do cleanup procedure then ?
Might be. I'll look into btlewrap and check on signal handling. By the way: I also tested pnpnpn/timeout-decorator which allows for a different technique. Without any luck though.
from bt-mqtt-gateway.
Found the likely cause of those deadlocks: The underlying POSIX implementation of threads on musl libc doesn't support signal interrupts for lock.acquire(). I can indeed reproduce this behavior within my Docker container with Python 3.7.2 running on Alpine Linux.
Still, running into the deadlock is only a secondary issue that occurs after reaching the timeout and aborting the executing without proper cleanup (and releasing the lock). I'll do some more investigation.
from bt-mqtt-gateway.
Those 35 second seems to be enough time on my setup, but I also only use a single device. I'll move over my 5 miflora sensors from home-assistant to bt-mqtt-gateway and do some more testing.
Been running 3 MiFlora sensors for weeks now with no issues.
from bt-mqtt-gateway.
@bbbenji What is your Python version and Linux distribution (i.e. glibc vs. musl)?
from bt-mqtt-gateway.
Linux DietPi 4.14.66+ #1 SMP PREEMPT Thu Aug 23 05:59:33 UTC 2018 armv7l GNU/Linux
Python 2.7.13
GLIBC 2.24-11+deb9u3
from bt-mqtt-gateway.
I managed to find the actual culprit in btlewrap. When a connect operation to a Bluetooth device is interrupted by any other exception than an internal BluetoothBackendException, a previously acquired lock is not released. This is generally not a problem, because this is normally the only exception type that may occur. That is not the case though, if we use interruptingcow which produces a RuntimeError by default when the timeout hits.
The uninterruptible deadlock I mentioned is only a consequence of the skipped release of the lock object. As the lock was not released the following connect operation deadlocks at lock.acquire().
So how do we fix this? As the lock object is declared at class level, we cannot just recreate the object. We have three options:
- Manually release the lock in mithermometer.py when a exception occurs, like:
_BackendConnection._lock.release()
. This is obviously dirty, as we tinker with the internal state of an external library. - Tell interruptingcow to throw a BluetoothBackendException on timeout. This is also very much the wrong way, as we would depend on a type of a transitive dependency. Some workers might not even depend on btlewrap.
- Fix btlewrap to release the lock unconditionally on exceptions, regardless of the exception type. This is the cleanest approach and in my opinion the correct way to fix this issue. This however means that we have to propose a change upstream.
I forked btlewrap and implemented the needed changes. I also forced pip to use my fork of btlewrap. So far it works without any issues.
from bt-mqtt-gateway.
Thanks for great detective work on this. Maybe lets wait a week, to see if upstream changes gets merged ? If not lets use your fork until things get fixed upstream.
from bt-mqtt-gateway.
@ChristianKuehnel already merged my PR 😄. Including it here is kind off a hassle though. This project doesn't depend on btlewrap directly, but through miflora and mithermometer. I'm pessimistic about pip
s ability to override the version of a transitive dependency.
Currently I do something very ugly:
pip install --no-deps https://github.com/ChristianKuehnel/btlewrap/archive/master.zip mithermometer
pip install bluepy pyserial
pip install -r requirements.txt
So the right way would be creating a release of btlewrap, bumping that dependency in miflora and mithermometer and also create releases of those, and finally include these new releases within bt-mqtt-gateway. 😓
from bt-mqtt-gateway.
Maybe it will work, if we specify forced btlewrap version in workers REQUIEREMENTS ( https://github.com/zewelor/bt-mqtt-gateway/blob/master/workers/mithermometer.py#L5 ) ?
from bt-mqtt-gateway.
Thank you @ChristianKuehnel. I requested the same for mithermometer: hobbypunk90/mithermometer#2
from bt-mqtt-gateway.
@zewelor If we end up with different versions of dependencies, pip will complain:
mithermometer 0.1.2 has requirement btlewrap==0.0.3, but you'll have btlewrap 0.0.4 which is incompatible.
miflora 0.4 has requirement btlewrap==0.0.2, but you'll have btlewrap 0.0.4 which is incompatible.
The pip installation will succeed though ...
from bt-mqtt-gateway.
Related Issues (20)
- node_id = mac.replace(":", "-") = AttributeError: 'dict' object has no attribute 'replace' HOT 1
- Does it work with ESPHome?
- Problem with negative temperature / signed values HOT 1
- Gateway not starting as systemctl service HOT 2
- solved: Type error HOT 1
- hardware compatibility HOT 13
- Clarify why this needs to be run in a virtual environment HOT 1
- null values from LYWSD03MMC HOT 5
- blescanmulti worker not publishing to MQTT
- Systemd Installation like in README does not work, service.sh might have superfluous sudo
- Error message if bt-mqtt-gateway is started as home assistant addon HOT 8
- Support for MQTT climate hold and away modes has been removed in HA. HOT 2
- Missing MQTT messages HOT 2
- Anyway to setup on qnap?
- Docker configure.yaml HOT 1
- maybe use eq3btsmart to modernize eq3 thermostat handling? HOT 2
- 100% cpu load from bluepy-helper HOT 2
- Linak Desk Set commands
- Please, Delete this issue
- README venv installation instructions are incorrect
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bt-mqtt-gateway.