Giter Club home page Giter Club logo

Comments (16)

cybe avatar cybe commented on July 3, 2024 1

Just as an update: I'm currently looking into recreating the MiThermometerPoller object to enforce using a new locking object.

from bt-mqtt-gateway.

ChristianKuehnel avatar ChristianKuehnel commented on July 3, 2024 1

I just created a new release of btlewrap v0.0.4 (incl. pypi) and requested a new release from miflora, as I do not have the permissions to do so:
basnijholt/miflora#117

from bt-mqtt-gateway.

zewelor avatar zewelor commented on July 3, 2024

Hi

I think its ok to disable interruptingcow inside mithermometer, as it uses its own internal timeout. I think every worker should be able to cleanup and interrupt long operation on its own. Timeout in worker_manager is just last resort to keep gateway as whole stable and running. Thats why timeout there is much bigger than in workers. That was my initial idea for design.

That deadlock, can be also tackled by recreating object somehow ? Or maybe btlewrap has still some design flaw around that locks ? I'm not a Python expert, so its hard to me to see if there is some wrong cleanup handling etc.

Anyway to sum up I'm ok with removing interruptingcow timeout from mithermostat worker, but I will keep long timeout higher in the execution chain not to freeze whole gateway, in case of some weird condition from any worker.

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

Thanks for looking into this. I agree that we should not remove the timeout within workers_manager.py.

That deadlock, can be also tackled by recreating object somehow ?

Theoretically we could recreate the object. But I'm not sure about any side effects this might have on a lower level. That lock is probably there for a reason, perhaps only one client is allowed to access the Bluetooth device in a given moment.

One idea is to recreate the object in those instances where the 35 second time limit is reached. Before the recreation we could explicitly delete the old object, which causes the cleanup routines to be executed. I'll try that and monitor the behavior. I'll report back once I have more information.

Or maybe btlewrap has still some design flaw around that locks ? I'm not a Python expert, so its hard to me to see if there is some wrong cleanup handling etc.

btlewrap has the correct cleanup mechanisms implemented, they're just not reachable. So far, I was unable to debug the exact reason why the thread is unable to be interrupted.

from bt-mqtt-gateway.

zewelor avatar zewelor commented on July 3, 2024

Yes only one client can access bluetooth device on linux, at given moment. In case of this gateway, its handled by gateway itself. Everything goes on the same thread using internal queue, so workers doesn't need to take care of locking bluetooth device etc.

As for workers affected by this, from what I understand its only miflora and mithermometer, as they use btlewrap. I'm using mainly thermostat worker, which uses bluepy for bluetooth handling, and there are no locks there etc. From what I've read, interruptingcow uses SIGALARM to interrupt flow, maybe such signal needs to be catched inside btlewrap ( or also other signals ? ), and also do cleanup procedure then ?

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

So far I was unable to reproduce a deadlock within over two weeks now by just removing the timeout in mithermometer.py (and miflora.py for that matter, as it also relies on btlewrap). The 35 second timeout in workers_manager.py is still in place.

Those 35 second seems to be enough time on my setup, but I also only use a single device. I'll move over my 5 miflora sensors from home-assistant to bt-mqtt-gateway and do some more testing.

As for workers affected by this, from what I understand its only miflora and mithermometer, as they use btlewrap. I'm using mainly thermostat worker, which uses bluepy for bluetooth handling, and there are no locks there etc.

I can confirm that only those workers that use btlewrap seems affected by this. The thermostat worker for my five thermostats are running fine.

From what I've read, interruptingcow uses SIGALARM to interrupt flow, maybe such signal needs to be catched inside btlewrap ( or also other signals ? ), and also do cleanup procedure then ?

Might be. I'll look into btlewrap and check on signal handling. By the way: I also tested pnpnpn/timeout-decorator which allows for a different technique. Without any luck though.

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

Found the likely cause of those deadlocks: The underlying POSIX implementation of threads on musl libc doesn't support signal interrupts for lock.acquire(). I can indeed reproduce this behavior within my Docker container with Python 3.7.2 running on Alpine Linux.

Still, running into the deadlock is only a secondary issue that occurs after reaching the timeout and aborting the executing without proper cleanup (and releasing the lock). I'll do some more investigation.

from bt-mqtt-gateway.

bbbenji avatar bbbenji commented on July 3, 2024

Those 35 second seems to be enough time on my setup, but I also only use a single device. I'll move over my 5 miflora sensors from home-assistant to bt-mqtt-gateway and do some more testing.

Been running 3 MiFlora sensors for weeks now with no issues.

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

@bbbenji What is your Python version and Linux distribution (i.e. glibc vs. musl)?

from bt-mqtt-gateway.

bbbenji avatar bbbenji commented on July 3, 2024

Linux DietPi 4.14.66+ #1 SMP PREEMPT Thu Aug 23 05:59:33 UTC 2018 armv7l GNU/Linux
Python 2.7.13
GLIBC 2.24-11+deb9u3

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

I managed to find the actual culprit in btlewrap. When a connect operation to a Bluetooth device is interrupted by any other exception than an internal BluetoothBackendException, a previously acquired lock is not released. This is generally not a problem, because this is normally the only exception type that may occur. That is not the case though, if we use interruptingcow which produces a RuntimeError by default when the timeout hits.

The uninterruptible deadlock I mentioned is only a consequence of the skipped release of the lock object. As the lock was not released the following connect operation deadlocks at lock.acquire().

So how do we fix this? As the lock object is declared at class level, we cannot just recreate the object. We have three options:

  1. Manually release the lock in mithermometer.py when a exception occurs, like: _BackendConnection._lock.release(). This is obviously dirty, as we tinker with the internal state of an external library.
  2. Tell interruptingcow to throw a BluetoothBackendException on timeout. This is also very much the wrong way, as we would depend on a type of a transitive dependency. Some workers might not even depend on btlewrap.
  3. Fix btlewrap to release the lock unconditionally on exceptions, regardless of the exception type. This is the cleanest approach and in my opinion the correct way to fix this issue. This however means that we have to propose a change upstream.

I forked btlewrap and implemented the needed changes. I also forced pip to use my fork of btlewrap. So far it works without any issues.

from bt-mqtt-gateway.

zewelor avatar zewelor commented on July 3, 2024

Thanks for great detective work on this. Maybe lets wait a week, to see if upstream changes gets merged ? If not lets use your fork until things get fixed upstream.

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

@ChristianKuehnel already merged my PR 😄. Including it here is kind off a hassle though. This project doesn't depend on btlewrap directly, but through miflora and mithermometer. I'm pessimistic about pips ability to override the version of a transitive dependency.

Currently I do something very ugly:

pip install --no-deps https://github.com/ChristianKuehnel/btlewrap/archive/master.zip mithermometer
pip install bluepy pyserial
pip install -r requirements.txt

So the right way would be creating a release of btlewrap, bumping that dependency in miflora and mithermometer and also create releases of those, and finally include these new releases within bt-mqtt-gateway. 😓

from bt-mqtt-gateway.

zewelor avatar zewelor commented on July 3, 2024

Maybe it will work, if we specify forced btlewrap version in workers REQUIEREMENTS ( https://github.com/zewelor/bt-mqtt-gateway/blob/master/workers/mithermometer.py#L5 ) ?

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

Thank you @ChristianKuehnel. I requested the same for mithermometer: hobbypunk90/mithermometer#2

from bt-mqtt-gateway.

cybe avatar cybe commented on July 3, 2024

@zewelor If we end up with different versions of dependencies, pip will complain:

mithermometer 0.1.2 has requirement btlewrap==0.0.3, but you'll have btlewrap 0.0.4 which is incompatible.
miflora 0.4 has requirement btlewrap==0.0.2, but you'll have btlewrap 0.0.4 which is incompatible.

The pip installation will succeed though ...

from bt-mqtt-gateway.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.