Hi Roger,
We are experiencing behavior problems on 1.4.7 with message subscription volumes on the order of 20,000 messages per minute of 60 byte messages.
Since we're seeing no issues in publishing messages to the server at these rates, and the server utilization and network volumes are low, it appears that the issue is related to the subscribing message code path.
The first connected client seems to work without errors, but subsequent clients on the same topic keep failing. But this may be just the effects of additional load.
The problem appears to occur with both Qos 0 and 1 with mosquitto_sub and Paho Java clients.
From the trace (below), the server is logging PUBLISH (and PINGRESP) messages to these stalled clients, but the clients aren't receiving the messages.
From a browse of the code, I see there are a number of steps and a packet queue between logging (using debug) the PUBLISH and when the real socket write occurs.
The _mosquitto_packet_queue and and _mosquitto_packet_write routines are daunting to follow given their mutex lock, linked list, callback, and state tests.
Roger, we'd appreciate your assistance to resolve this. We've spent a few days tracing the behavior and browsing the code.
Perhaps we should we try a version that predates the Web Socket support modifications?
Thanks,
Gary
There are no observed problems at lower volumes on the order of 1,000 messages/second.
If I start multiple clients on the same workstation to the server, each will ultimately fail at different times subscribing to the same topic. Watching the terminal windows side-by-side, I will see one window stall while others continue to scroll by.
The server is running at a low utilization (<10%) with mosquitto running in a Docker container at 1-3% average utilization. The server is hosted in an Amazon Linux image and is assigned a single core of a Intel CPU. All the workload on this Linux AMI is in containers running Java apps for publishing and subscribing. One would expect that my AMI is sharing CPU resources with many others.
After the client stalls, it sends a PINGREQ after some number of seconds, but this does not restart the stream. A minute later, the client disconnects (and then reconnects).
There are currently only two topics, with Paho clients publishing to each. One device and topic is about 300 messages/second. The other topic is fed from a message aggregator at 20,000 messages/second. There are normally only three or four subscribers doing logging and analytics. The problem is when a subscriber includes the high-volume topic (it uses # to combine the two topics, but also fails using just the higher-volume topic name).
Server is 1.4.7 using Dockerfile slightly modified from https://github.com/toke/docker-mosquitto (debian:jessie on top of ubuntu 14.04 AMI)
Java clients are Paho 1.0.2 with Java 8 on AWS Linux and Apple MB El Capitan.
mosquitto_sub version 1.4.5 running on libmosquitto 1.4.5 on Apple MB.
Client trace with q0 - Client stopped receiving PUBLISH messages, client sent PINGREQ, and a PINGRESP was logged by server.
But no PUBLISH stream was received. After 60 seconds there was a disconnect and reconnect
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (62 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,A,8h3OwjQKP@4GCJPPP121IoCol54cd0AwwwwwwWraTP0,2_66
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (46 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,Dh3OwjPflnfpMaF9HNAF9HNqF9H,2_63
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,A,133nbePP00PD>i8MDF7h0?w02><,0*0D Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes)) Source/AIS/AISHuB !AIVDM,1,1,,B,13aOBk00000BbIbMfR0Q31W20>
<,0_36
Client mosqsub/58985-GMG-MB-Pr sending PINGREQ
Client mosqsub/58985-GMG-MB-Pr sending CONNECT
Client mosqsub/58985-GMG-MB-Pr received CONNACK
Client mosqsub/58985-GMG-MB-Pr sending SUBSCRIBE (Mid: 2, Topic: Source/AIS/#, QoS: 0)
Client mosqsub/58985-GMG-MB-Pr received SUBACK Subscribed (mid: 2): 0
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,36:Bd7Q000T97NFkTDd<GH>2>
<,0_1A
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,16:=HgPP00d<:0AtiFRCOv<0>
<,0_64
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (71 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,85NoHR1Kf1HgQVqnP?uKWerkmrKi3J9ptKFg@1urjDiKqn0N@a58,0_57
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (46 bytes))
Server Log for this client:
cat server-58985.log | grep "mosqsub/58985" | grep -v "PUBLISH"
1452806532: New client connected from 100.35.218.11 as mosqsub/58985-GMG-MB-Pr (c1, k60).
1452806532: Sending CONNACK to mosqsub/58985-GMG-MB-Pr (0, 0)
1452806532: Received SUBSCRIBE from mosqsub/58985-GMG-MB-Pr
1452806532: Sending SUBACK to mosqsub/58985-GMG-MB-Pr
1452806592: Received PINGREQ from mosqsub/58985-GMG-MB-Pr
1452806592: Sending PINGRESP to mosqsub/58985-GMG-MB-Pr
1452806652: Socket error on client mosqsub/58985-GMG-MB-Pr, disconnecting.
1452806653: New client connected from 100.35.218.11 as mosqsub/58985-GMG-MB-Pr (c1, k60).
1452806653: Sending CONNACK to mosqsub/58985-GMG-MB-Pr (0, 0)
1452806653: Received SUBSCRIBE from mosqsub/58985-GMG-MB-Pr
1452806653: Sending SUBACK to mosqsub/58985-GMG-MB-Pr
Note on above: The PINGREQ was exactly 60 seconds after SUBACK, although the PUBLISH messages stopped a number of seconds earlier.
Note on above: The disconnect was similarly exactly 60 seconds after the PINGREQ/PINGRESP sequence. No PUBLISH messages displayed durng this interval.
Note on below: The server was logging PUBLISH messages in the same second -- before and after -- PINGREQ/PINGRESP, but nothing was displayed on mosquitto_sub terminal:
1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (113 bytes))
1452806592: Received PINGREQ from mosqsub/58985-GMG-MB-Pr
1452806592: Sending PINGRESP to mosqsub/58985-GMG-MB-Pr
1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/JC', ... (47 bytes))
1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (34 bytes))
Thus we conclude there is an issue after logging outbound messages that is causing them not to reach the session socket correctly.
Similar situation with q=1
Display of messages stops, subsequent PINGREQ, nothing displayed for 60 seconds, then disconnect
Trace of mosquitto_sub client that stalled and subsequently sent a PINGREQ
mosquitto_sub -h 54.172.21.249 -t "Source/AIS/AISHuB" -v -d -q 1
Client mosqsub/57156-GMG-MB-Pr received PUBLISH (d0, q1, r0, m16680, 'Source/AIS/AISHuB', ... (47 bytes))
Client mosqsub/57156-GMG-MB-Pr sending PUBACK (Mid: 16680)
Source/AIS/AISHuB !AIVDM,1,1,,A,15NDTLgP1Gre39hG@hhWo?wR2>`<,0_71
Client mosqsub/57156-GMG-MB-Pr received PUBLISH (d0, q1, r0, m16681, 'Source/AIS/AISHuB', ... (65 bytes))
Client mosqsub/57156-GMG-MB-Pr sending PUBACK (Mid: 16681)
Source/AIS/AISHuB !AIVDM,1,1,,A,E02E340W6@1WPab3bPa200000000:usB?9TV@00003v010,4_1F
Client mosqsub/57156-GMG-MB-Pr received PUBLISH (d0, q1, r0, m16682, 'Source/AIS/AISHuB', ... (46 bytes))
Client mosqsub/57156-GMG-MB-Pr sending PUBACK (Mid: 16682)
Source/AIS/AISHuB !AIVDM,1,1,,B,Dh3OwjPflnfpMaF9HNAF9HNqF9H,2*63
Client mosqsub/57156-GMG-MB-Pr sending PINGREQ
^C
---
Server Config:
Place your local configuration in /etc/mosquitto/conf.d/
pid_file /mqtt/data/mosquitto.pid
persistence true
persistence_location /mqtt/data/
max_inflight_messages 1000
max_queued_messages 40000
queue_qos0_messages true
message_size_limit 8192
persistent_client_expiration 2d
autosave_interval 1800
user ubuntu
log_dest file /mqtt/log/mosquitto.log
log_type debug
log_type error
log_type warning
log_type notice
connection_messages true
log_timestamp true
include_dir /mqtt/config/conf.d
Stats:
$SYS/broker/version mosquitto version 1.4.7
$SYS/broker/timestamp Tue, 22 Dec 2015 12:47:28 +0000
$SYS/broker/uptime 2959 seconds
$SYS/broker/clients/total 6
$SYS/broker/clients/inactive 0
$SYS/broker/clients/disconnected 0
$SYS/broker/clients/active 6
$SYS/broker/clients/connected 6
$SYS/broker/clients/expired 0
$SYS/broker/messages/stored 137
$SYS/broker/messages/received 2116842
$SYS/broker/messages/sent 2151495
$SYS/broker/subscriptions/count 4
$SYS/broker/retained messages/count 47
$SYS/broker/heap/current 42224
$SYS/broker/heap/maximum 3738960
$SYS/broker/publish/messages/dropped 0
$SYS/broker/publish/messages/received 1021941
$SYS/broker/publish/messages/sent 34633
$SYS/broker/publish/bytes/received 58142385
$SYS/broker/publish/bytes/sent 62518744
$SYS/broker/bytes/received 86347945
$SYS/broker/bytes/sent 90662285
$SYS/broker/load/messages/received/1min 42973.26
$SYS/broker/load/messages/received/5min 42390.26
$SYS/broker/load/messages/received/15min 40997.44
$SYS/broker/load/messages/sent/1min 43135.45
$SYS/broker/load/messages/sent/5min 42651.92
$SYS/broker/load/messages/sent/15min 41628.45
$SYS/broker/load/publish/received/1min 20687.61
$SYS/broker/load/publish/received/5min 20424.22
$SYS/broker/load/publish/received/15min 19793.77
$SYS/broker/load/publish/sent/1min 157.93
$SYS/broker/load/publish/sent/5min 261.30
$SYS/broker/load/publish/sent/15min 630.87
$SYS/broker/load/bytes/received/1min 1747734.36
$SYS/broker/load/bytes/received/5min 1728984.96
$SYS/broker/load/bytes/received/15min 1674080.57
$SYS/broker/load/bytes/sent/1min 1795965.65
$SYS/broker/load/bytes/sent/5min 1783664.37
$SYS/broker/load/bytes/sent/15min 1754350.49
$SYS/broker/load/sockets/1min 0.06
$SYS/broker/load/sockets/5min 0.20
$SYS/broker/load/sockets/15min 0.14
$SYS/broker/load/connections/1min 0.06
$SYS/broker/load/connections/5min 0.20
$SYS/broker/load/connections/15min 0.14
-- end --