Comments (9)

thehellmaker commented on May 26, 2024

Thanks. We also found another issue that coincides exactly with your hypothesis. Our HTTP API implementation has a bug: if any exception is thrown, our server does not return a response and the client waits forever. We are fixing this issue.

from emqx.

kjellwinblad commented on May 26, 2024

Thanks for reporting this. It looks like the timeout happens when our HTTP client times out while doing the request. Can you send your configuration for the HTTP authentication? In particular, it would be interesting to see the value of the request_timeout option. Is it possible that your HTTP server gets slower during certain time periods, and that this causes the timeouts?

zmstone commented on May 26, 2024

I have seen such behaviour once before: if the HTTP server silently drops an HTTP request (e.g. due to a rate limit) without responding with an error code, and does not close the connection either, the HTTP client (at the HTTP layer) will wait indefinitely for an HTTP response or a socket close. This is just a guess, however; @thehellmaker, it would be nice if you could look at the server side (logs, maybe) to verify it.

Nonetheless, we plan to do something at the application layer: reconnect if a timeout happens.
NOTE: The application layer has not done retries or reconnects so far because in most cases the cause is server overload, and a retry or reconnect is likely to increase the load even more. So the coming fix will have to pick its defaults very carefully.
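
To illustrate the concern about retries adding load, here is a minimal Python sketch of one common way to space out reconnect attempts: capped exponential backoff with full jitter. It is not EMQX's implementation; the function name, parameter names, and default values are all hypothetical and chosen only for illustration.

```python
import random


def backoff_delays(base_s=0.5, cap_s=30.0, max_attempts=5, rng=random.random):
    """Yield one delay (in seconds) per retry attempt.

    The ceiling doubles on every attempt up to cap_s, and the actual delay is
    a random fraction of that ceiling ("full jitter"), so that many clients
    retrying at once do not hit the server in lockstep.
    All defaults here are hypothetical, not values used by EMQX.
    """
    for attempt in range(max_attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        yield ceiling * rng()


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print(f"retry {attempt}: wait up to {delay:.2f}s")
```

Bounding the number of attempts and capping the delay keeps a client from hammering an overloaded server indefinitely, which is the risk described in the note above.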

thehellmaker commented on May 26, 2024

@kjellwinblad Here is the authn config from emqx.conf. I presume request_timeout is the default of 15, as I do not set it.
Note that the load on the server is <20% CPU utilization at P99, so I do not think it is the server slowing down. However, the server is sometimes unavailable during deployments, which are not blue-green.

We have two different listeners: one uses the authn below and the other uses mTLS certificate verification. The username-based HTTP listener shown below is what our mobile applications connect to, and the mTLS-based listener is what our devices connect to. While the username/password HTTP auth suffers from this issue, the mTLS-based devices are absolutely fine and are able to connect. I can also confirm that both my application cluster and EMQX are running on the same machine, so network partitions or connectivity issues are not a possibility. I think what @zmstone describes might also be possible, as this issue builds up slowly: the count of timeout occurrences grows gradually until it happens to all requests. So it seems the connections in the pool slowly get into an inconsistent state, for a reason I don't know yet.

authentication = [
    {
        mechanism = password_based
        backend = http
        enable = true

        method = post
        url = "***",
        body {
            clientid = "${clientid}"
            username = "${username}"
            password = "${password}"
            peerhost = "${peerhost}"
            cert_subject = "${cert_subject}"
            cert_common_name = "${cert_common_name}"
        }
        headers {
            "Content-Type" = "application/json"
            "X-Request-Source" = "EMQX"
        }
    }
]

The default pool size is 8, so if more than 8 requests arrive at the same time they should get pipelined. However, that can also time out requests if some of them are starved regularly. Our mobile applications that connect to this listener retry indefinitely on this failure. Initially, connection requests failed once in a while; after some time the first 2 attempts failed almost regularly before a connect succeeded; then it increased to 5 reconnects before a successful connect; and finally all reconnects started failing. I have now changed the config to the one below, which has an increased pool_size parameter and stricter timeouts, and am trying it out.

authentication = [
    {
        mechanism = password_based
        backend = http
        enable = true

        method = post
        url = "***",
        pool_size = 24
        enable_pipelining = 100
        connect_timeout = 10
        request_timeout = 5
        body {
            clientid = "${clientid}"
            username = "${username}"
            password = "${password}"
            peerhost = "${peerhost}"
            cert_subject = "${cert_subject}"
            cert_common_name = "${cert_common_name}"
        }
        headers {
            "Content-Type" = "application/json"
            "X-Request-Source" = "EMQX"
        }
    }
]

@zmstone EMQX only gives these logs, and the rest of my application is running just fine, with other devices connecting to the mTLS listener.
I have 2 options to confirm your hypothesis:

  1. Stress test with a lot of simultaneous connect requests so that the server is overloaded, to simulate this scenario.
  2. Simulate it by having the HTTP auth API not respond to any authn requests.
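
Option 2 could be reproduced locally with a stub along these lines. This is only a sketch using the Python standard library: the stub accepts TCP connections and never replies, and a plain HTTP client stands in for the broker's auth request. The function names, the /auth path, and the empty JSON body are assumptions for illustration and do not come from the config above.

```python
import http.client
import socket
import threading


def start_silent_server():
    """Listen on a free local port, accept connections, and never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen()
    held = []  # keep accepted sockets referenced so they stay open

    def accept_forever():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return listener, held


def auth_request(port, timeout_s):
    """POST once; return the status code, or None if no reply arrives in time."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout_s)
    try:
        # Hypothetical path and body, standing in for a real authn request.
        conn.request("POST", "/auth", body=b"{}",
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    except socket.timeout:
        return None  # the server held the connection open but never answered
    finally:
        conn.close()


if __name__ == "__main__":
    listener, _held = start_silent_server()
    port = listener.getsockname()[1]
    print("result with 0.5s timeout:", auth_request(port, 0.5))
```

Without a client-side timeout the same request would block until the socket is closed, which matches the behaviour described in the hypothesis. Pointing the authn url at such a stub would be one way to check how the broker reacts.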

thehellmaker commented on May 26, 2024

@zmstone It looks like your hunch was right. This happens when the HTTP server is unable to respond. Our deployments are not blue-green right now, and the entire HTTP service is unavailable during a deployment, so the HTTP server cannot respond during that time. We have been able to verify that the more deployments we do, the progressively worse this issue gets.

zmstone commented on May 26, 2024

> @zmstone It looks like your hunch was right. This happens when the HTTP server is unable to respond. Our deployments are not blue-green right now, and the entire HTTP service is unavailable during a deployment, so the HTTP server cannot respond during that time. We have been able to verify that the more deployments we do, the progressively worse this issue gets.

Thank you for the confirmation.
I wonder why the server doesn't reply with an error code such as 503, or disconnect.

thehellmaker commented on May 26, 2024

I am not entirely sure, but what I can confirm is that there are always MQTT connections and new requests arriving consistently, so it is possible that a request came in right before the deployment started and the server was stopped after the handshake.

Can we introduce client-side timeout configurations for the HTTP pool, so that clients can configure them as needed, and if the server doesn't return a response the request is timed out and the connection is returned to the pool?
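
As a rough illustration of why such a timeout policy matters, here is a toy Python model of a round-robin connection pool in which some connections are hung (the peer never replies). It is not EMQX's pool or its HTTP client; the Conn and Pool classes and the recycle_on_timeout flag are hypothetical names used only to compare "keep reusing a hung connection" with "replace it after a timeout".

```python
from collections import deque


class Conn:
    """Stand-in for a pooled HTTP connection; hung means the peer never replies."""

    def __init__(self, hung=False):
        self.hung = hung


class Pool:
    """Toy round-robin pool (hypothetical, for illustration only)."""

    def __init__(self, conns, recycle_on_timeout):
        self.idle = deque(conns)
        self.recycle_on_timeout = recycle_on_timeout

    def request(self):
        """Return True on a reply, False when the request times out."""
        conn = self.idle.popleft()
        if not conn.hung:
            self.idle.append(conn)
            return True
        # Timed out: either reconnect (fresh connection) or keep the hung one.
        self.idle.append(Conn() if self.recycle_on_timeout else conn)
        return False


def failures(pool, n):
    """Count how many of n consecutive requests time out."""
    return sum(not pool.request() for _ in range(n))


if __name__ == "__main__":
    stuck = Pool([Conn(hung=i < 3) for i in range(8)], recycle_on_timeout=False)
    fresh = Pool([Conn(hung=i < 3) for i in range(8)], recycle_on_timeout=True)
    print("failures without recycling:", failures(stuck, 80))
    print("failures with recycling:   ", failures(fresh, 80))
```

In this model, a pool of 8 with 3 hung connections fails 30 of 80 requests when hung connections are reused, and only 3 of 80 when each one is replaced after its first timeout. If every deployment leaves a few more connections hung, the failure rate would climb step by step, which is consistent with the gradual build-up reported earlier in the thread.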

zmstone commented on May 26, 2024

Yeah, sure. I will work on a patch. It will be in 5.7.1 or 5.8.0.

zmstone commented on May 26, 2024

> Thanks. We also found another issue that coincides exactly with your hypothesis. Our HTTP API implementation has a bug: if any exception is thrown, our server does not return a response and the client waits forever. We are fixing this issue.

This may be the only cause. Or fixing it should at least buy some time before we release the enhancement.
