Comments (9)
Thanks. We also found another issue that exactly matches your hypothesis. Our HTTP API implementation has a bug: if any exception is thrown, our server does not return a response and the client waits forever. We are fixing this issue.
from emqx.
Thanks for reporting this. It looks like this happens when our HTTP client times out while doing the request. Can you send your configuration for the HTTP authentication? In particular, it would be interesting to see the value of the request_timeout option. Is it possible that your HTTP server gets slower during certain time periods, and that this causes the timeouts?
from emqx.
I have seen such behaviour once before: if the HTTP server silently drops an HTTP request (e.g. due to a rate limit) without responding with an error code, and does not close the connection either, the HTTP client (at the HTTP layer) will wait indefinitely for an HTTP response or a socket close. This is just a guess, however; it would be nice if you, @thehellmaker, could look at the server side (logs, maybe) to verify it.
Nonetheless, we plan to do something at the application layer: reconnect if a timeout happens.
NOTE: The application layer has not done retries or reconnects so far because, in most cases, the cause is server overload, and a retry or reconnect is likely to increase the load even more. So the coming fix will have to be very careful when picking the defaults.
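One common way to reconnect after a timeout without piling more load onto an overloaded server is capped exponential backoff with jitter. The following is only a minimal Python sketch of that general idea, not the planned EMQX fix; the function names and the `base`, `cap` and `max_attempts` defaults are hypothetical.

```python
import random


def upper_bounds(base=0.5, cap=30.0, max_attempts=5):
    """Largest allowed wait (seconds) before each attempt: base * 2**n, capped."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts)]


def backoff_delays(base=0.5, cap=30.0, max_attempts=5, rng=None):
    """Return the wait (seconds) to apply before each reconnect attempt.

    Uses "full jitter": attempt n waits a random time between 0 and its
    upper bound, so clients that timed out together do not all reconnect
    at the same instant. The attempt count is bounded on purpose.
    """
    rng = rng or random.Random()
    return [rng.uniform(0.0, bound) for bound in upper_bounds(base, cap, max_attempts)]
```

With these illustrative defaults a caller gives up after five attempts; whatever values are chosen, both the cap and the attempt limit matter for keeping retries from amplifying an overload.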
from emqx.
@kjellwinblad Here is the authn config in emqx.conf. I presume request_timeout is at its default of 15, as I do not set it.
Also note that the load on the server is <20% CPU utilization at P99, so I do not think the server is slowing down. However, the server is at times unavailable during deployments, since they are not blue-green.
We have 2 different listeners: one uses the authn below and the other uses mTLS certificate verification. The username-based HTTP listener shown below is what our mobile applications connect to, and the mTLS-based listener is what our devices connect to. While the username/password HTTP auth suffers from this issue, the mTLS-based devices are absolutely fine and are able to connect. I can also confirm that both my application cluster and EMQX are running on the same machine, so network partitions or connectivity issues are not a possibility. I think what @zmstone describes might also be possible, as this issue builds up slowly: the number of timeout occurrences grows gradually until it happens to all requests. So it seems like the connections in the pool slowly get into an inconsistent state, for a reason I don't know yet.
authentication = [
  {
    mechanism = password_based
    backend = http
    enable = true
    method = post
    url = "***",
    body {
      clientid = "${clientid}"
      username = "${username}"
      password = "${password}"
      peerhost = "${peerhost}"
      cert_subject = "${cert_subject}"
      cert_common_name = "${cert_common_name}"
    }
    headers {
      "Content-Type" = "application/json"
      "X-Request-Source" = "EMQX"
    }
  }
]
The default pool size is 8, so if more than 8 requests arrive at the same time they should get pipelined. However, that can also make requests time out if some of them are regularly starved. Our mobile applications, which connect to this listener, retry indefinitely on this failure. Initially a connection request failed once in a while; after some time the first 2 attempts failed almost regularly before the client connected; then it took 5 reconnects before it connected; and finally all reconnects started failing. I have now changed the config to the one below, which has a larger pool_size and stricter timeouts, and am trying it out.
authentication = [
  {
    mechanism = password_based
    backend = http
    enable = true
    method = post
    url = "***",
    pool_size = 24
    enable_pipelining = 100
    connect_timeout = 10
    request_timeout = 5
    body {
      clientid = "${clientid}"
      username = "${username}"
      password = "${password}"
      peerhost = "${peerhost}"
      cert_subject = "${cert_subject}"
      cert_common_name = "${cert_common_name}"
    }
    headers {
      "Content-Type" = "application/json"
      "X-Request-Source" = "EMQX"
    }
  }
]
@zmstone EMQX is only giving these logs, and the rest of my application is running just fine, with other devices connecting to the mTLS listener.
I have 2 options to confirm your hypothesis:
- Stress test with a lot of simultaneous connect requests so that the server is overloaded, to simulate this scenario.
- Simulate it by having the HTTP auth API not respond to any authn requests.
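The second option can be tried locally before touching the real auth service. A minimal Python sketch, assuming a stand-in endpoint on localhost: the socket listens but never reads or answers, so the TCP handshake completes and the request is sent, yet no response ever comes back. The `/auth` path, the dummy body and the function names are hypothetical and are not part of the configuration above.

```python
import http.client
import socket


def start_silent_server():
    """Listen on an ephemeral localhost port but never accept or reply.

    The kernel still completes the TCP handshake for queued connections,
    so a client can connect and send a request that is never answered.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(8)
    return srv


def try_auth(port, timeout_s):
    """POST a dummy auth request; return 'timeout' if no reply arrives in time."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout_s)
    try:
        conn.request(
            "POST",
            "/auth",  # hypothetical path
            body='{"username": "demo"}',
            headers={"Content-Type": "application/json"},
        )
        return conn.getresponse().status
    except socket.timeout:
        # Without a client-side timeout this call would block indefinitely.
        return "timeout"
    finally:
        conn.close()
```

Pointing the authn `url` of a test EMQX node at such a silent endpoint should show whether connections get stuck waiting in the same way.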
from emqx.
@zmstone It looks like your hunch was right. This happens when the HTTP server is unable to respond. Our deployments are not blue-green right now, so the entire HTTP service is unavailable during a deployment, and during that time the HTTP server cannot respond. We have been able to verify that the more deployments we do, the worse this issue gets.
from emqx.
Thank you for the confirmation.
I wonder why the server doesn’t reply with an error code such as 503, or disconnect.
from emqx.
I am not entirely sure, but what I can confirm is that there are always MQTT connections and new requests coming in consistently, so it is possible that a request arrived right before the deployment started and the server was stopped after the handshake.
Can we introduce client-side timeout configuration for the HTTP pool, so that users can configure it as needed and, if the server does not return a response, the request times out and the connection is returned to the pool?
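To make the request concrete, here is a toy Python model of the difference between returning a timed-out connection to the pool unchanged and replacing it with a fresh one. It does no real I/O and does not describe how the EMQX HTTP pool actually works; `FakeConn`, `Pool`, `wedge` and `count_timeouts` are hypothetical names used only for illustration.

```python
import queue


class FakeConn:
    """Stand-in for a pooled HTTP connection (no real I/O)."""

    def __init__(self):
        self.wedged = False  # True: the peer will never answer on this connection
        self.closed = False

    def request(self):
        if self.wedged:
            raise TimeoutError("no response within the request timeout")
        return 200

    def close(self):
        self.closed = True


class Pool:
    def __init__(self, size, recycle_on_timeout):
        self.recycle_on_timeout = recycle_on_timeout
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConn())

    def wedge(self, n):
        """Simulate a deployment that leaves n connections waiting forever."""
        conns = [self._idle.get() for _ in range(self._idle.qsize())]
        for conn in conns[:n]:
            conn.wedged = True
        for conn in conns:
            self._idle.put(conn)

    def call(self):
        conn = self._idle.get()
        try:
            return conn.request()
        except TimeoutError:
            if self.recycle_on_timeout:
                conn.close()
                conn = FakeConn()  # a fresh connection replaces the wedged one
            return "timeout"
        finally:
            self._idle.put(conn)  # whichever connection is current goes back


def count_timeouts(pool, calls):
    """Run `calls` requests through the pool and count how many time out."""
    return sum(1 for _ in range(calls) if pool.call() == "timeout")
```

In this model a pool that keeps wedged connections fails the same share of requests forever, and each further `wedge` call raises that share, which resembles the gradual build-up reported above. A pool that recycles on timeout loses one request per wedged connection and then recovers.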
from emqx.
Yeah, sure. I will work on a patch. It will be in 5.7.1 or 5.8.0.
from emqx.
Thanks. We also found another issue that exactly matches your hypothesis. Our HTTP API implementation has a bug: if any exception is thrown, our server does not return a response and the client waits forever. We are fixing this issue.
This may be the only cause. Or fixing it should at least buy some time before we release the enhancement.
from emqx.