Giter Club home page Giter Club logo

Comments (73)

technyon avatar technyon commented on May 29, 2024 1

Thanks, I'll prepare a new Release with core 2.0.5. Interesting there's still that much to fix about Wifi, considering it's one of the main features of the chip.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024 1

ok, thanks for testing. I'd like to release since it seems to improve WiFi issues, let's just make sure it doesn't make things worse.

Yes an automatic restart from HA is possible of course, it should work in the meantime. It would still be better to have the firmware do it, it can probably detect it a bit faster and more reliable, and makes the feature independent from the frontend used.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024 1

I think I didn't change it. I'll just use incremental numbers for this from now on to make it easier.

I've recompiled with an updated version string:

5.15_core-2.05_nimble-1.40#01

nuki_hub#01.zip

from nuki_hub.

technyon avatar technyon commented on May 29, 2024 1

I'm logging now and I don't any drops in bluetooth. I'm a using LAN module, so maybe this is some interaction between bluetooth and Wifi? They use the same antenna afterall.

from nuki_hub.

alexdelprete avatar alexdelprete commented on May 29, 2024 1

I added the uptime to my Mushroom template card (thanks @Mincka):

image

While I was editing, I noticed it rebooted (after 800+ minutes) because it started from 1. I will monitor restarts by watching the uptime sensor and send a notification.

image

from nuki_hub.

technyon avatar technyon commented on May 29, 2024 1

I think we fixed the problems mentioned here, I'll close for now.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Sorry I'm not really sure what to do about this, and the ticket is open for a long time. Is this kind of behavior still happening? A new Arduino Core for ESP32 was just released (v2.0.5), which includes fixes from Espressif. This may or may not help.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

I have not noticed it again, but I've since changed my restart settings:
8F41A279-61D6-404D-A0BD-255646BA2C22

What might help for diagnosing this and similar issues would be some way to extract logging information remotely (either ex-post for the last n lines or, more likely, by supporting some kind of remote console).

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

I've updated the core to 2.0.5, could you give the attached binary a try?

For fixes in 2.0.5 see here:
https://github.com/espressif/arduino-esp32/releases/tag/2.0.5

nuki_hub_core_2.0.5.zip

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

I can try the binary, but I'm not sure if I can really test anything (except wait and see if the behavior comes back).

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Since updating the firmware, my Mosquitto log has quite a lot of malformed packet errors (unfortunately, I don't have a log from before, but I never noticed them before, certainly not in such a large number).

2022-09-17 22:55:11: Client <nukihost> disconnected due to malformed packet.
2022-09-17 22:55:11: New connection from <ip>:52751 on port <port>.
2022-09-17 22:55:11: New client connected from <ip>:52751 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-17 23:00:38: Saving in-memory database to /data//mosquitto.db.
2022-09-17 23:30:39: Saving in-memory database to /data//mosquitto.db.
2022-09-18 00:00:40: Saving in-memory database to /data//mosquitto.db.
2022-09-18 00:14:08: Client <nukihost> disconnected due to malformed packet.
2022-09-18 00:14:08: New connection from <ip>:52752 on port <port>.
2022-09-18 00:14:08: New client connected from <ip>:52752 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 00:14:11: Client <nukihost> disconnected due to malformed packet.
2022-09-18 00:14:11: New connection from <ip>:52753 on port <port>.
2022-09-18 00:14:11: New client connected from <ip>:52753 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 00:19:38: Client <nukihost> disconnected due to malformed packet.
2022-09-18 00:19:38: New connection from <ip>:52754 on port <port>.
2022-09-18 00:19:38: New client connected from <ip>:52754 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 00:30:41: Saving in-memory database to /data//mosquitto.db.
2022-09-18 01:00:42: Saving in-memory database to /data//mosquitto.db.
2022-09-18 01:08:02: Bad client <nukihost> sending multiple CONNECT messages.
2022-09-18 01:08:02: Client <nukihost> disconnected due to protocol error.
2022-09-18 01:08:02: New connection from <ip>:52755 on port <port>.
2022-09-18 01:08:02: New client connected from <ip>:52755 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 01:30:43: Saving in-memory database to /data//mosquitto.db.
2022-09-18 01:56:59: Client <nukihost> disconnected due to protocol error.
2022-09-18 01:56:59: New connection from <ip>:52756 on port <port>.
2022-09-18 01:56:59: New client connected from <ip>:52756 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 01:58:02: Client <nukihost> disconnected due to malformed packet.
2022-09-18 01:58:02: New connection from <ip>:52757 on port <port>.
2022-09-18 01:58:02: New client connected from <ip>:52757 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 02:00:44: Saving in-memory database to /data//mosquitto.db.
2022-09-18 02:27:14: Client <nukihost> disconnected due to malformed packet.
2022-09-18 02:27:14: New connection from <ip>:52758 on port <port>.
2022-09-18 02:27:14: New client connected from <ip>:52758 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 02:30:45: Saving in-memory database to /data//mosquitto.db.
2022-09-18 03:00:46: Saving in-memory database to /data//mosquitto.db.
2022-09-18 03:11:20: Client <nukihost> disconnected due to protocol error.
2022-09-18 03:11:21: New connection from <ip>:52759 on port <port>.
2022-09-18 03:11:21: New client connected from <ip>:52759 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 03:24:20: Client <nukihost> disconnected due to malformed packet.
2022-09-18 03:24:21: New connection from <ip>:52760 on port <port>.
2022-09-18 03:24:21: New client connected from <ip>:52760 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 03:29:26: Client <nukihost> disconnected due to malformed packet.
2022-09-18 03:29:27: New connection from <ip>:52761 on port <port>.
2022-09-18 03:29:27: New client connected from <ip>:52761 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 03:30:47: Saving in-memory database to /data//mosquitto.db.
2022-09-18 03:56:29: Client <nukihost> disconnected due to malformed packet.
2022-09-18 03:56:29: New connection from <ip>:52762 on port <port>.
2022-09-18 03:56:29: New client connected from <ip>:52762 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 04:00:48: Saving in-memory database to /data//mosquitto.db.
2022-09-18 04:12:29: Client <nukihost> disconnected due to malformed packet.
2022-09-18 04:12:29: New connection from <ip>:52763 on port <port>.
2022-09-18 04:12:29: New client connected from <ip>:52763 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 04:30:49: Saving in-memory database to /data//mosquitto.db.
2022-09-18 04:45:32: Client <nukihost> disconnected due to protocol error.
2022-09-18 04:45:32: New connection from <ip>:52764 on port <port>.
2022-09-18 04:45:32: New client connected from <ip>:52764 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 05:00:50: Saving in-memory database to /data//mosquitto.db.
2022-09-18 05:30:51: Saving in-memory database to /data//mosquitto.db.
2022-09-18 06:00:52: Saving in-memory database to /data//mosquitto.db.
2022-09-18 06:29:08: Client <nukihost> disconnected due to malformed packet.
2022-09-18 06:29:09: New connection from <ip>:52765 on port <port>.
2022-09-18 06:29:09: New client connected from <ip>:52765 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 06:30:53: Saving in-memory database to /data//mosquitto.db.
2022-09-18 07:00:54: Saving in-memory database to /data//mosquitto.db.
2022-09-18 07:08:35: Client <nukihost> disconnected due to malformed packet.
2022-09-18 07:08:36: New connection from <ip>:52766 on port <port>.
2022-09-18 07:08:36: New client connected from <ip>:52766 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 07:30:55: Saving in-memory database to /data//mosquitto.db.
2022-09-18 08:00:56: Saving in-memory database to /data//mosquitto.db.
2022-09-18 08:27:41: Client <nukihost> disconnected due to malformed packet.
2022-09-18 08:27:42: New connection from <ip>:52767 on port <port>.
2022-09-18 08:27:42: New client connected from <ip>:52767 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 08:29:51: Client <nukihost> disconnected due to malformed packet.
2022-09-18 08:29:51: New connection from <ip>:52768 on port <port>.
2022-09-18 08:29:51: New client connected from <ip>:52768 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 08:30:57: Saving in-memory database to /data//mosquitto.db.
2022-09-18 08:39:51: Warning: Received PUBREC from <nukihost> for an unknown packet identifier 12356.
2022-09-18 08:39:51: Bad socket read/write on client <nukihost>: Invalid function arguments provided.
2022-09-18 08:39:51: New connection from <ip>:52769 on port <port>.
2022-09-18 08:39:51: New client connected from <ip>:52769 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 09:00:58: Saving in-memory database to /data//mosquitto.db.
2022-09-18 09:27:33: Client <nukihost> disconnected due to malformed packet.
2022-09-18 09:27:33: New connection from <ip>:52770 on port <port>.
2022-09-18 09:27:33: New client connected from <ip>:52770 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 09:30:59: Saving in-memory database to /data//mosquitto.db.
2022-09-18 09:44:42: Client <nukihost> disconnected due to malformed packet.

(No functional problem, yet. I've no set the restart timeout to -1 again, so let's see what's happening in the next few days.)

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Thanks. Just to be sure, could you get a log with the released firmware? If this does indeed happen with with the new core, I'd postpone updating.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Sure. Is it possible that there are underlying problems that always existed but only with the new core manifest as explicit errors?

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Just to make sure: Should this test release have different version number or just 5.15? (If it should show a different name, I've probably been on stock all along.)

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

It's still 5.15, but I added -core-v2.0.5.

Regarding the previous question, only Espressif knows.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

OK, so the errors were from stock 5.15. I'll try flashing the new binary again.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Huh, so I have hard time doing an OTA. Somehow, it appears the network stack is wonky (or my ESP32 has decided it is a good time to die). Ping veers into several seconds every few packets, and sometimes even times out even when in the same room as the router. I've skipped several version (roughl 5.9 to 5.14 IIRC, but I'm not completely sure which was the earliest). Before 5.15 OTA worked fine (after you cleared up the initial issues with credentials).

Anyway, how can I install the test version via esptool.py, can I just use the other .bin files from the normal release?

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Yes, the other files didn't change. Just replace nuki_hub.bin and that's it.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

P.S.: Do you also have network problems with the "official" 5.15 ?

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

P.S.: Do you also have network problems with the "official" 5.15 ?

Everything above was with the official release, before updating to the new core. Since the update, the Mosquitto log still looks the same (e.g. frequent protocol errors), but the integration seems to work fine. Bluetooth and WiFi RSSI seem to be much more consistent than before. I'll do a ping test and post the logs later.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Much better ping after flashing 5.15#core_2.0.5 (still dropping a packet here and there, but about 10 times less and ping is mostly und 100 ms instead of 2 to 5 seconds).

Here's the current Mosquitto log:

2022-09-19 03:53:01: New connection from <ip>:58404 on port <port>.
2022-09-19 03:53:01: New client connected from <ip>:58404 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 04:01:36: Saving in-memory database to /data//mosquitto.db.
2022-09-19 04:04:15: Client <nukihost> disconnected due to malformed packet.
2022-09-19 04:04:16: New connection from <ip>:58405 on port <port>.
2022-09-19 04:04:16: New client connected from <ip>:58405 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 04:26:07: Client <nukihost> disconnected due to malformed packet.
2022-09-19 04:26:07: New connection from <ip>:58406 on port <port>.
2022-09-19 04:26:07: New client connected from <ip>:58406 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 04:31:37: Saving in-memory database to /data//mosquitto.db.
2022-09-19 05:01:38: Saving in-memory database to /data//mosquitto.db.
2022-09-19 05:31:39: Saving in-memory database to /data//mosquitto.db.
2022-09-19 06:01:40: Saving in-memory database to /data//mosquitto.db.
2022-09-19 06:31:41: Saving in-memory database to /data//mosquitto.db.
2022-09-19 06:58:22: Client <nukihost> disconnected due to malformed packet.
2022-09-19 06:58:22: New connection from <ip>:58407 on port <port>.
2022-09-19 06:58:22: New client connected from <ip>:58407 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 06:59:13: Client <nukihost> disconnected due to protocol error.
2022-09-19 06:59:13: New connection from <ip>:58408 on port <port>.
2022-09-19 06:59:13: New client connected from <ip>:58408 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 07:01:42: Saving in-memory database to /data//mosquitto.db.
2022-09-19 07:04:31: Client <nukihost> disconnected due to malformed packet.
2022-09-19 07:04:31: New connection from <ip>:58409 on port <port>.
2022-09-19 07:04:31: New client connected from <ip>:58409 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 07:31:43: Saving in-memory database to /data//mosquitto.db.
2022-09-19 08:01:44: Saving in-memory database to /data//mosquitto.db.
2022-09-19 08:24:58: Invalid QoS in PUBLISH from <nukihost>, disconnecting.
2022-09-19 08:24:58: Client <nukihost> disconnected due to malformed packet.
2022-09-19 08:24:58: New connection from <ip>:58410 on port <port>.
2022-09-19 08:24:58: New client connected from <ip>:58410 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 08:31:45: Saving in-memory database to /data//mosquitto.db.
2022-09-19 09:01:46: Saving in-memory database to /data//mosquitto.db.
2022-09-19 09:26:43: Client <nukihost> disconnected due to malformed packet.
2022-09-19 09:26:43: New connection from <ip>:58411 on port <port>.
2022-09-19 09:26:43: New client connected from <ip>:58411 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 09:31:47: Saving in-memory database to /data//mosquitto.db.
2022-09-19 10:01:48: Saving in-memory database to /data//mosquitto.db.
2022-09-19 10:31:49: Saving in-memory database to /data//mosquitto.db.
2022-09-19 10:34:34: Client <nukihost> disconnected due to malformed packet.
2022-09-19 10:34:34: New connection from <ip>:58412 on port <port>.
2022-09-19 10:34:34: New client connected from <ip>:58412 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 10:37:34: Client <nukihost> disconnected due to malformed packet.
2022-09-19 10:37:34: New connection from <ip>:58413 on port <port>.
2022-09-19 10:37:34: New client connected from <ip>:58413 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:01:50: Saving in-memory database to /data//mosquitto.db.
2022-09-19 11:24:46: Client <nukihost> disconnected due to protocol error.
2022-09-19 11:24:46: New connection from <ip>:58414 on port <port>.
2022-09-19 11:24:46: New client connected from <ip>:58414 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:31:51: Saving in-memory database to /data//mosquitto.db.
2022-09-19 11:40:13: Client <nukihost> disconnected due to malformed packet.
2022-09-19 11:40:13: New connection from <ip>:58415 on port <port>.
2022-09-19 11:40:13: New client connected from <ip>:58415 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:56:01: Client <nukihost> disconnected due to malformed packet.
2022-09-19 11:56:01: New connection from <ip>:58416 on port <port>.
2022-09-19 11:56:01: New client connected from <ip>:58416 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:56:04: Client <nukihost> disconnected due to malformed packet.
2022-09-19 11:56:04: New connection from <ip>:58417 on port <port>.
2022-09-19 11:56:04: New client connected from <ip>:58417 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 12:01:52: Saving in-memory database to /data//mosquitto.db.
2022-09-19 12:31:53: Saving in-memory database to /data//mosquitto.db.
2022-09-19 13:01:54: Saving in-memory database to /data//mosquitto.db.
2022-09-19 13:31:55: Saving in-memory database to /data//mosquitto.db.
2022-09-19 13:34:19: Client <nukihost> disconnected due to malformed packet.
2022-09-19 13:34:19: New connection from <ip>:58418 on port <port>.
2022-09-19 13:34:19: New client connected from <ip>:58418 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 13:56:08: Client <nukihost> disconnected due to malformed packet.
2022-09-19 13:56:08: New connection from <ip>:58419 on port <port>.
2022-09-19 13:56:08: New client connected from <ip>:58419 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 14:01:56: Saving in-memory database to /data//mosquitto.db.
2022-09-19 14:06:39: Client <nukihost> disconnected due to malformed packet.
2022-09-19 14:06:39: New connection from <ip>:58420 on port <port>.
2022-09-19 14:06:39: New client connected from <ip>:58420 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 14:31:57: Saving in-memory database to /data//mosquitto.db.
2022-09-19 14:39:15: Client <nukihost> disconnected due to malformed packet.
2022-09-19 14:39:15: New connection from <ip>:58421 on port <port>.
2022-09-19 14:39:15: New client connected from <ip>:58421 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 15:01:58: Saving in-memory database to /data//mosquitto.db.
2022-09-19 15:03:01: Client <nukihost> disconnected due to malformed packet.
2022-09-19 15:03:01: New connection from <ip>:58422 on port <port>.
2022-09-19 15:03:01: New client connected from <ip>:58422 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 15:31:59: Saving in-memory database to /data//mosquitto.db.
2022-09-19 15:39:13: Client <nukihost> disconnected due to malformed packet.
2022-09-19 15:39:13: New connection from <ip>:58423 on port <port>.
2022-09-19 15:39:13: New client connected from <ip>:58423 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 16:02:00: Saving in-memory database to /data//mosquitto.db.
2022-09-19 16:32:01: Saving in-memory database to /data//mosquitto.db.
2022-09-19 17:02:02: Saving in-memory database to /data//mosquitto.db.
2022-09-19 17:02:34: Client <nukihost> disconnected due to malformed packet.
2022-09-19 17:02:34: New connection from <ip>:58424 on port <port>.
2022-09-19 17:02:34: New client connected from <ip>:58424 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 17:05:58: Client <nukihost> disconnected due to malformed packet.
2022-09-19 17:05:58: New connection from <ip>:58425 on port <port>.
2022-09-19 17:05:58: New client connected from <ip>:58425 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 17:22:34: Client <nukihost> disconnected due to malformed packet.

No real change here. I am assuming the default log level of the Mosquitto container changed during a recent update, so that might explain why those malformed packets weren't there when I checked the log with earlier versions.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

So this is the Bluetooth signal of the opener from yesterday till now. The huge drop in the middle is when I took the ESP32 elsewhere for flashing.

image

Not sure about the small drop this afternoon. It coincides when a member of the household arrived home, but I don't have an explanation for the change.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

I think I've found the reason for the unresponsive device and it has nothing to do with either WiFi connectivity or MQTT. This diagram shows the Bluetooth RSSI for both the Lock and the Opener:
image

The horizontal line between roughly 6:42 and 9:55 is the same for both devices - there were no new measurements coming in via MQTT, but in this diagram, adding WiFi RSSI you can see that device was always online.

image

Apparently, the BT stack got into a weird state and stopped reporting anything until some kind of reset fixed things. (Probably a reboot of the ESP32, it's hard to tell from the graph, but there is a one minute flatline on the WiFi just before BT reporting resumes.)

image

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Very strange. Is this the old or the new core?

As a workaround I could reset the ESP when no beacons are received and thus the bluetooth RSSI isn't updated, but I don't really like such workarounds.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

P.S.: Might be worth trying to update the ble lib:

https://github.com/h2zero/NimBLE-Arduino

There's a new release from end of june

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Very strange. Is this the old or the new core?

As a workaround I could reset the ESP when no beacons are received and thus the bluetooth RSSI isn't updated, but I don't really like such workarounds.

The graph is from the new core, but I'm by now 100 % certain that that's what happened before. I found a similar episode in the old logs from a few weeks ago and it happened again today.

Updating the BLE library certainly sounds like a good idea, but so does a failsafe trigger similar the network timers. I agree that finding the root cause would be better, but that may not be possible without some extensive long-time logging (which is difficult in operating conditions - though it might be possible if you could implement a wireless way to access the logs).

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Hi,

sorry I was rather busy the last few days. I'll look into it, let's first try updating the bluetooth library.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

The attached binary is compiled with

  • core 2.05
  • nimble 1.40

Could you give it another try? If this doesn't fix it, I'll add another restart option ... and yes remote logging would help indeed.

nuki_hub_nimble-1.40.zip

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

I've updated the ESP using the new binary (had some issues with it - seems the OTA is not as smooth at was before for me, feels a bit like before the issue with credentials was fixed. (There's a chance it correlates with the device switching to a "bad" access point and sticking to it between reboots, and it's not since updating to the original 5.15).

I will monitor the issue, but it will take at least three days to have some confidence it's not occurring anymore. I've added an automation to HA to trigger a restart via MQTT if the RSSI value for both Lock and Opener does not change for over 60 seconds. (The automation also logs an error and sends me a notification.)

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Unfortunately, I can already tell you that the new library does not help (and that my automation also has some kind of error because it did not trigger). Had a BT dropout during the night.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Since I don't really have an explanation, let's try the automatic reboot. The attached binary restarts when no beacons have been received for ~2 minutes. Can you check if it improves things? If yes, I'd make the timeout configurable.

nuki_hub_ble_reboot.zip

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Thanks. What's the version string supposed to be on that one (to make sure the flashing took)?

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Well, weirdly enough the drop out last night was NOT the same as the previous BT dropouts. Instead, there apparently where no MQTT changes transmitted to HA (i.e. the WiFi RSSI also dropped out). I don't see anything in the router logs to suggest that WiFi actually dropped out (there are two reconnections at the beginning and the end of the four hour drop out period, and a DHCP request in the middle, so the network must have been up). Unfortunately, the Mosquitto logs have already rolled over.

Anyway, I'll monitor this version during the next week, let's see if there are any more (non-WiFi/MQTT) drop-outs.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

I'm logging Bluetooth RSSI too now, so far no issues. I've nevertheless released Version 6.0 with the libs.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

I'm logging now and I don't any drops in bluetooth. I'm a using LAN module, so maybe this is some interaction between bluetooth and Wifi? They use the same antenna afterall.

I was too hasty. There was a dropout again yesterday for about 15 minutes (and my automation trigger is still not working 🥲).

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

So if it dropped for 15 minutes the automatic restart doesn't work either?

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Is the automatic restart also in 6.0? I switched to that version after it was released.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

ok I see. No only in the previous 5.15 for testing

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

ok I see. No only in the previous 5.15 for testing

So as an update, I've fixed my automation (a stray " in the trigger template) which now works very nicely. I've set the timeout to 1 minute, which is about twice the longest "natural" time between RSSI updates I've seen in my logs.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Just to confirm, you reset after not receiving any bluetooth RSSI updates?

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Just to confirm, you reset after not receiving any bluetooth RSSI updates?

Yes, I reset after not receiving BT RSSI updates for either the opener or the lock (though one would suffice) for longer than 60 seconds. This is my automation:

trigger:
  - platform: template
    value_template: >-
      {{ as_timestamp( now() ) - as_timestamp(
      states.sensor.nukilock_bluetooth_signal_strength.last_changed )  > 60 }}
  - platform: template
    value_template: >-
      {{ as_timestamp( now() ) - as_timestamp(
      states.sensor.nukiopener_bluetooth_signal_strength.last_changed ) > 60 }}
condition: []
action:
  - service: system_log.write
    data:
      logger: nukihub
      message: No BLE beacons detected for over a minute. Restarting.
  - service: mqtt.publish
    data:
      topic: nuki/maintenance/reset
      payload: "1"
      qos: "2"
  - service: notify.mobile_app_myphone
    data:
      message: NukiHub neu gestartet.
mode: single

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

I'll add it to the firmware then.

from nuki_hub.

adamlc avatar adamlc commented on May 29, 2024

I am also having the exact same issue described here, I'm going to try the above automation for now to see if it helps 👍

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Hi ... any updates on this? Have you tried the new firmware with updated libraries?

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

If that's directed at me, I am still waiting for the automatic restart feature to be added to the firmware (as of #28 (comment)). I am currently on version 6.3, but the automation is still needed to restart the device every few days at least (sometimes more often).

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Sorry I completely lost track of that. I'll look into it.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Pleases check release 6.4, you can set the timeout under NUKI configuration.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Thx, I've updated and set the timeout. Will monitor the ESPs behavior for the next week or so.

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

I use the presence topic in HA and I monitor the updates of the presence topic in HA like this:

mqtt:
  device_tracker:
    - name: "Beacon Julien"
      state_topic: "nuki_entree/presence/devices"
      value_template: "{{ 'home' if 'xx:xx:xx:xx:xx:xx' in value else 'not_home' }}"
alias: ESP32 Presence not updating
trigger:
  - platform: template
    value_template: >-
      {{ (now() - states.device_tracker.beacon_julien.last_changed).seconds >
      120}}
condition: []
action:
  - service: notify.mobile_app_telephone_julien
    data:
      message: ESP32 Presence is not updating
mode: single

Usually, there's at least an update every 5 seconds but sometimes, it just stops.

Despite having the new option "Restart if bluetooth beacons not received" set to 30 seconds, I still get notified every few hours that the presence topic was not updated for more than 120 seconds. However, the updates come back few seconds later. I'm going to increase my notification period to 180 seconds.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Pleases check release 6.4, you can set the timeout under NUKI configuration.

Some results from the field test (I've set a BT timeout of 40 seconds on the ESP and my HA automation timeout to 120 seconds to prevent "double firing"):

  • The HA automation hasn't been triggered except once (and that was when my Opener stopped responding to BT again).
  • When a device stops responding to the BT API (confirmed to happen with the official app as well), the ESP will basically enter into a boot loop.
  • To allow for reconfiguration, in such cases, the boot loop should be detected and further reboots be prevented.
  • It would be good if some diagnostic topic could be set (allowing me to send a notification that e.g. the Opener needs to be reset by removing its batteries) if the ESP is unable to read the device state several times in a row.

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

In my case, I have "Restart if bluetooth beacons not received" set to 30 seconds and notification + MQTT reset (thanks to your code @mundschenk-at) is there's no update of Presence sensor for 60 seconds.

I get notifications every few hours (between 2 and 6 I would say). The ESP seems more resilient when adding with MQTT reset triggered from HA. Before that, I needed to unplug and plug it again (was it stuck in what you call "boot loop"?).

Since I get the notification after 60 seconds without updates, it means the ESP was not rebooted after 30 seconds. I think the internal check of nuki_hub for BT issues may not working as expected.

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

Right now, I have two unresponsive devices. I published on the maintenance/reset topic but I can see remotely that the presence or RSSI does not update at all. Is it in "boot loop"? I don't know what's causing this.

@mundschenk-at : In your case, do you mean at some point, and for unknown reasons, the Nuki device will stop reporting over BT to the ESP, and this is this behavior that is causing the ESP to "crash"?

What's your recommended approach for now? Don't trust capability of nuki_hub to recover from connection dropouts and manage the restart using the maintenance topic?
So, in nuki_hub:

  1. Disable periodic restart
  2. Disable restart if BT drop in nuki_hub

Anyway, if some behavior of the Nuki device leads to nuki_hub ESP going in boot loop, it's already too late to recover. The best solution would be that nuki_hub is sufficiently resilient to recover from connection dropouts.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

@Mincka There are two issues I think, namely that sometimes the ESP BT stack gives up and can only be recovered by a reboot. The other appears to be a firmware bug in the latest Opener FW that makes the device lock up (coincidentally also needing a reboot, which can only be triggered by removing the batteries). The ESP BT stack issue can be mitigated using the new BT watchdog setting, but a boot loop prevention should be added to the Nuki_Hub firmware to make it more tolerant of missing devices.

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

I don't think this issue is only related to the Opener. I don't have the Opener but my ESPs become unresponsive after few hours and they cannot recover with Restart on disconnect + Restart timer + Restart when no beacon update.

Actually, I came back from home and I just confirmed that I cannot connect to the devices over Wi-Fi. I need to unplug them. Either there are in a boot loop or maybe they just cannot reconnect to the Wi-Fi. I will try to disable Wi-Fi for a few minutes to see if that's causing this. I will also try to monitor the serial output.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

I don't think this issue is only related to the Opener. I don't have the Opener but my ESPs become unresponsive after few hours and they cannot recover with Restart on disconnect + Restart timer + Restart when no beacon update.

You are right, the issue with the ESP BT stack is not related to the Opener. I didn't intend to imply that it is, sorry if I was not clear about this.

What I meant is that the mitigation via "restart if no BT packets are received in X second" can currently trigger a boot loop if a paired device does not respond.

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

I just fully reinstall one ESP and during the setup I had an issue with the ESP falling back to the Wi-Fi Manager because it did not manage to join the already configured Wi-Fi for some reason. I realized this could also be something silly like this in my case. The ESP would not be in a boot loop but maybe stuck on the Wi-Fi Manager page... Next time, I will check if the ESP Access Point SSID is up. I don't know if an already configured ESP will permanently retry to connect and reboot when the home Wi-Fi is already configured.

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

If WifiManager can't connect to the Wifi, it will reopen the ESP access point so you can reconfigure your Wifi. There's a timeout for the Wifi config portal, but it's several minutes, so there's plenty of time to fix the wifi

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

The boot loop should be fixed in the latest release.

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

I'm on 6.10 with serial attached. The ESP stopped updating presence at some point yesterday. It was unresponsive, status not updated, did not respond to commands.
image
image

I had a look at the logs on the console, only the "usual" error messages:
image

The only thing I did was closing the serial console and connect again in the browser. It seemed to trigger a reboot of the ESP since I got logs and few seconds after, Presence started updated again.

image

My options:
image
image

I would hope that either Network restart or BT restart would be triggered but it just stayed stuck.
It seems that the ESP crashed 27 minutes after its setup.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Since the WiFi RSSI stopped updating as well, this looks more like a network/MQTT issue than a bluetooth dropout. With a BT dropout, the WiFi connection is still there (see the second diagram in #28 (comment) for an example).

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

Indeed, you're right. I can even see in the HA log that the lacking BT updates automation was triggered at that time and the maintenance/reset topic was published on both locks but only my second lock was able to restart, not the affected one. My automation is currently basic and dumb, if lock 1 is not updated, it publishes maintenance/reset on both locks.
image

Ok, so this means I should not pollute your thread. I'll try to find another related issue or maybe create a new one.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Oh, you might get both, eventually. I've been able to prevent WiFi connection issues by using both the Network Timeout setting and an automatic reboot once per day (via Restart timer).

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

Sadly, it is still not enough in my case.
Not sure that I saw another topic describing my issue so I guess I'll create another issue. :(
I still lose the ESP despite the following options (aggressive Restart timer every 30 minutes):
image
image
image

This time, the ESP crashed 12 minutes after the boot. There is no suspicious message on the serial console other than things shown in #28 (comment)
I really don't know how to diagnose this. The ESP seems just crashed without warnings in logs in an unrecoverable state. It happens on my two ESPs (M5 Atom). No sign of Wi-Fi dropout or other things in HA logs at that time. I will try to test with another ESP model but @alexdelprete also have them and does not seem to have this issue.

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Anything on the routers logs?

from nuki_hub.

alexdelprete avatar alexdelprete commented on May 29, 2024

ESP model but @alexdelprete also have them and does not seem to have this issue.

No, I don't have this issue with the M5 Atom:

image
image

from nuki_hub.

technyon avatar technyon commented on May 29, 2024

Is your router configured to automatically change Wifi channel by any chance. I have a hunch that could trigger Wifi dropouts.

from nuki_hub.

alexdelprete avatar alexdelprete commented on May 29, 2024

Yes my APs (I have a mesh network) manage channels automatically...but I didn't experience any issues, and I have a lot of ESP devices around the house (40-45 more or less, can't remember exactly).

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

Anything on the routers logs?

Nothing special (using OpenWRT) and no other device has this kind of issue. I have around 50 devices on Wi-Fi and other ESP-based devices (Shelly first and second generation) and no issues with them.

Is your router configured to automatically change Wifi channel by any chance. I have a hunch that could trigger Wifi dropouts.

I use fixed channels per AP for 2.4 Ghz with minimum overlap of channels and also a dedicated channel for Zigbee that is far from Wi-Fi channels. No mesh, only 3 APs connected with cables.

If the issue was "only" related to network, the ESP should restart because of connection loss OR restart at some point with the Restart timer set to 30 minutes. But nothing happens for hours . I also checked if the ESP was in AP mode but no trace of its Wi-Fi SSID.

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024

I think I've found the culprit for my crashing ESPs. I was using the USB output of a Surface Pro for one and for the other, it was on a power strip with USB ports. I plugged both ESPs on Samsung mobile chargers (at least 1 A) and from now, no more crash. Still, the ESPs are often rebooting (Restart timer is disabled) but they always manage to reconnect. Since I don't see restart reason in the logs, I guess it's a Wi-Fi dropout of the ESP.

image

from nuki_hub.

alexdelprete avatar alexdelprete commented on May 29, 2024

How are you monitoring the uptime? Ping?

from nuki_hub.

Mincka avatar Mincka commented on May 29, 2024
mqtt:
  sensor:
    - name: "Nuki Entrée Log"
      state_topic: "nuki_entree/maintenance/log"
    - name: "Nuki Cuisine Log"
      state_topic: "nuki_cuisine/maintenance/log"
    - name: "Nuki Entrée Uptime"
      state_topic: "nuki_entree/maintenance/uptime"
      device_class: duration
      unit_of_measurement: min 
    - name: "Nuki Cuisine Uptime"
      state_topic: "nuki_cuisine/maintenance/uptime"
      device_class: duration
      unit_of_measurement: min 

That's why I want to include all topics in discovery. ;)

from nuki_hub.

alexdelprete avatar alexdelprete commented on May 29, 2024

I didn't even know we had those measurements. :)

Yes, we definitely need to improve autodiscovery. ;)

What do we need to do? If I can help let me know...

from nuki_hub.

mundschenk-at avatar mundschenk-at commented on May 29, 2024

Yeah, it seems unlikely we will ever definitely find the underlying cause and the new watchdog works very well.

from nuki_hub.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.