
Comments (5)

palkan commented on August 27, 2024

Also it is worth noting that Go routines value is also high anycable_go_goroutines_num=6983.

The number of goroutines grows linearly, O(N), with the number of connections. That means the connections are still "alive" from the anycable-go server's perspective (and thus the metrics look correct).

You turned off the traffic, but the ingress (load balancer) itself was still running, right? It could still keep connections open to the backend (anycable-go), thus making them look alive.

For how long have you been monitoring anycable-go after dropping services? Another hypothesis is that the connections got stuck in a half-closed state (due to a non-client-initiated disconnect), and they could stay in this state for dozens of minutes (depending on the OS TCP settings, see https://docs.anycable.io/anycable-go/os_tuning?id=tcp-keepalive).

from anycable-go.

mabrikan commented on August 27, 2024

Hi @palkan, and thank you for your response.

I just want to clarify what happened: we physically deleted the ALB to make sure no traffic was going to the affected instance. That is why we were surprised to see the number of clients not changing.

After deleting the ALB, we monitored the instance for 24 hours, and nothing changed.

Another hypothesis is that the connections got stuck in a half-closed state (due to a non-client-initiated disconnect), and they could stay in this state for dozens of minutes (depending on the OS TCP settings, see https://docs.anycable.io/anycable-go/os_tuning?id=tcp-keepalive).

Inspecting the OS, here are the variables:

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200

Correct me if I'm wrong, but according to this, I think that after roughly 2 hours and 11 minutes (7200 s of idle time plus 9 probes × 75 s), the number of clients should drop, which was not the case from our monitoring.


palkan commented on August 27, 2024

number of clients should drop, which was not the case from our monitoring.

Yeah, it should.

Just to confirm: do you see fresh data points in the monitoring system reflecting the huge numbers of goroutines/connections (and not some interpolation)?

The goroutines_num value is updated periodically and is highly unlikely to show stale system information. So something is going wrong at the connection-handling layer. Do you have any logs (from the time you shut down the load balancer)?


mabrikan commented on August 27, 2024

No, unfortunately we don't have the logs.

As a workaround, we implemented a monkey-patch: a keepAlive probe in the anycable-go pod.

The probe opens a WebSocket connection to localhost with a specific timeout.

If the timeout is reached (i.e., disconnect_queue and goroutines_num are most probably high), the pod is restarted.

Not a perfect solution, but it gets the job done 😄


palkan commented on August 27, 2024

Closing as stale

