Giter Club home page Giter Club logo

Comments (14)

mikegibney avatar mikegibney commented on September 22, 2024 1

And for any future visitors, this is roughly the ping logic we're using:

    var connected = false;

    connection = new client.Connection({
        // settings
        onConnect: function () {
            connected = true;
            setInterval(() => {
                if(connection && connected) {
                    connection.sendPing(
                        () => {} // empty callback to avoid "unknown corrlation ID"
                    );
                }
            }, 10 * 1000);
        // connection logic
        
        },
        onError: function (err) {
            connected = false;
            // whatever error-handling logic
            throw err;
            });
        },
        onClose: (err) => {
            connected = false;
        }
    });

from event-store-client.

x-cubed avatar x-cubed commented on September 22, 2024

My suspicion is that Event Store is closing the connection due to a timeout. If you need to maintain a long-running connection to Event Store, it's necessary to send data occasionally for the server to consider the connection to be active. The easiest way to do this is to call the ping method on a regular basis. See #24 for another issue describing this scenario.

from event-store-client.

mikegibney avatar mikegibney commented on September 22, 2024

Is there an idiomatic way of checking the current state of the connection, or would I need to track that myself in the onConnect/onError/onClose callbacks?

from event-store-client.

x-cubed avatar x-cubed commented on September 22, 2024

You would need to track that yourself from those events.

from event-store-client.

mikegibney avatar mikegibney commented on September 22, 2024

As near as we've been able to determine, this problem stems from the automated heartbeat messages that Event Store sends (adding the sendPing helped but the connection still eventually died). In event-store-client/lib/connection.js, method receiveMessage:

        // Handle the message
        if (command == Commands.HeartbeatRequest) {
            // Automatically respond to heartbeat requests
            instance.sendMessage(correlationId, Commands.HeartbeatResponse);

        }

For whatever reason, it seems that sometimes the connection can be closed between receiving a heartbeat request and sending the response. Is this something that we should be accounting for in our application?

from event-store-client.

x-cubed avatar x-cubed commented on September 22, 2024

Ok, if you're suspecting the heartbeat responses are a problem then we'll need some more information to diagnose it.

I would recommend looking at the Event Store logs to see what it has to say about the connection (that will tell you whether it has closed the connection, or it thinks everything is okay). The other thing that would help would be a packet capture from Wireshark, so that we can see if it's a particular sequence of events that is causing the problem, or if there's something wrong with the heartbeat response packets themselves.

from event-store-client.

mikegibney avatar mikegibney commented on September 22, 2024

Here's the relevant bit from /var/log/eventstore/:

[PID:09313:025 2018.03.27 15:28:05.991 INFO  TcpConnectionManager] Connection 'external-normal' [10.10.0.32:54834, {e4fc4784-93a0-4111-b8bb-d037ce0d8708}] closed: Success.
[PID:09313:031 2018.03.27 15:28:05.992 DEBUG PersistentSubscripti] Lost connection from 10.10.0.32:54834
[PID:09313:032 2018.03.27 15:28:05.998 INFO  TcpConnection       ] ES TcpConnection closed [15:28:05.998: N10.10.0.32:54836, L10.10.0.34:1113, {a1e065d1-662f-429a-8a12-80ef35b81915}]:Received bytes: 106, Sent bytes: 16141
[PID:09313:032 2018.03.27 15:28:05.999 INFO  TcpConnection       ] ES TcpConnection closed [15:28:05.999: N10.10.0.32:54836, L10.10.0.34:1113, {a1e065d1-662f-429a-8a12-80ef35b81915}]:Send calls: 1, callbacks: 1
[PID:09313:032 2018.03.27 15:28:06.000 INFO  TcpConnection       ] ES TcpConnection closed [15:28:06.000: N10.10.0.32:54836, L10.10.0.34:1113, {a1e065d1-662f-429a-8a12-80ef35b81915}]:Receive calls: 2, callbacks: 2
[PID:09313:032 2018.03.27 15:28:06.000 INFO  TcpConnection       ] ES TcpConnection closed [15:28:06.000: N10.10.0.32:54836, L10.10.0.34:1113, {a1e065d1-662f-429a-8a12-80ef35b81915}]:Close reason: [Success] Socket closed
[PID:09313:031 2018.03.27 15:28:06.001 INFO  TcpService          ] External TCP connection accepted: [Normal, 10.10.0.32:54838, L10.10.0.34:1113, {d2c0aaee-e55a-49b5-883d-e517f673618a}].
[PID:09313:032 2018.03.27 15:28:06.003 INFO  TcpConnectionManager] Connection 'external-normal' [10.10.0.32:54836, {a1e065d1-662f-429a-8a12-80ef35b81915}] closed: Success.
[PID:09313:021 2018.03.27 15:28:06.004 DEBUG PersistentSubscripti] Lost connection from 10.10.0.32:54836
[PID:09313:019 2018.03.27 15:28:09.891 TRACE TcpConnectionManager] Closing connection 'external-normal' [10.10.0.32:54504, L10.10.0.34:1113, {0b40a3c5-8a8e-448f-b9aa-932d2b7ac315}] cleanly. Reason: HEARTBEAT TIMEOUT at msgNum 7
[PID:09313:019 2018.03.27 15:28:09.892 INFO  TcpConnection       ] ES TcpConnection closed [15:28:09.892: N10.10.0.32:54504, L10.10.0.34:1113, {0b40a3c5-8a8e-448f-b9aa-932d2b7ac315}]:Received bytes: 382, Sent bytes: 759
[PID:09313:019 2018.03.27 15:28:09.893 INFO  TcpConnection       ] ES TcpConnection closed [15:28:09.893: N10.10.0.32:54504, L10.10.0.34:1113, {0b40a3c5-8a8e-448f-b9aa-932d2b7ac315}]:Send calls: 8, callbacks: 8
[PID:09313:019 2018.03.27 15:28:09.893 INFO  TcpConnection       ] ES TcpConnection closed [15:28:09.893: N10.10.0.32:54504, L10.10.0.34:1113, {0b40a3c5-8a8e-448f-b9aa-932d2b7ac315}]:Receive calls: 8, callbacks: 7
[PID:09313:019 2018.03.27 15:28:09.894 INFO  TcpConnection       ] ES TcpConnection closed [15:28:09.894: N10.10.0.32:54504, L10.10.0.34:1113, {0b40a3c5-8a8e-448f-b9aa-932d2b7ac315}]:Close reason: [Success] HEARTBEAT TIMEOUT at msgNum 7
[PID:09313:019 2018.03.27 15:28:09.895 INFO  TcpConnectionManager] Connection 'external-normal' [10.10.0.32:54504, {0b40a3c5-8a8e-448f-b9aa-932d2b7ac315}] closed: Success.
[PID:09313:017 2018.03.27 15:28:09.896 DEBUG PersistentSubscripti] Lost connection from 10.10.0.32:54504
[PID:09313:021 2018.03.27 15:28:10.242 INFO  TcpConnection       ] ES TcpConnection closed [15:28:10.242: N10.10.0.32:54838, L10.10.0.34:1113, {d2c0aaee-e55a-49b5-883d-e517f673618a}]:Received bytes: 106, Sent bytes: 16230
[PID:09313:021 2018.03.27 15:28:10.243 INFO  TcpConnection       ] ES TcpConnection closed [15:28:10.243: N10.10.0.32:54838, L10.10.0.34:1113, {d2c0aaee-e55a-49b5-883d-e517f673618a}]:Send calls: 2, callbacks: 2
[PID:09313:021 2018.03.27 15:28:10.244 INFO  TcpConnection       ] ES TcpConnection closed [15:28:10.244: N10.10.0.32:54838, L10.10.0.34:1113, {d2c0aaee-e55a-49b5-883d-e517f673618a}]:Receive calls: 2, callbacks: 2
[PID:09313:021 2018.03.27 15:28:10.244 INFO  TcpConnection       ] ES TcpConnection closed [15:28:10.244: N10.10.0.32:54838, L10.10.0.34:1113, {d2c0aaee-e55a-49b5-883d-e517f673618a}]:Close reason: [Success] Socket closed
[PID:09313:021 2018.03.27 15:28:10.245 INFO  TcpConnectionManager] Connection 'external-normal' [10.10.0.32:54838, {d2c0aaee-e55a-49b5-883d-e517f673618a}] closed: Success.
[PID:09313:026 2018.03.27 15:28:10.246 DEBUG PersistentSubscripti] Lost connection from 10.10.0.32:54838
[PID:09313:027 2018.03.27 15:36:46.456 TRACE InMemoryBus         ] SLOW BUS MSG [manager input bus]: GetStatistics - 54ms. Handler: ProjectionManager.
[PID:09313:027 2018.03.27 15:36:46.457 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [Projections Master]: GetStatistics - 54ms. Q: 0/0.
[PID:09313:027 2018.03.27 15:48:10.475 TRACE InMemoryBus         ] SLOW BUS MSG [manager input bus]: GetStatistics - 75ms. Handler: ProjectionManager.
[PID:09313:027 2018.03.27 15:48:10.476 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [Projections Master]: GetStatistics - 76ms. Q: 0/0.

The exact timestamp of the error from the container was 2018-03-27T15:28:10.225626957Z. 10.10.0.32 is the IP of the consuming service. Still working on getting a packet capture.

from event-store-client.

mikegibney avatar mikegibney commented on September 22, 2024

Sorry I lost the thread on this, had to deal with a couple other issues that popped up. What would you want from a packet capture? This is running on Ubuntu server, so it looks I'd be using tshark or tcpdump but if I'm being completely honest I'm in way over my head with this. I'm far from a TCP expert so I'm not even sure what I'm looking for or how to go about getting it.

from event-store-client.

x-cubed avatar x-cubed commented on September 22, 2024

So the Event Store logs are suggesting that we didn't reply to the Heartbeat request with a Heartbeat response. A packet capture should be able to tell us if that is the case, and if not, might identify what the confusion is.

All we need to capture is the TCP data on port 1113 when this issue is happening. I'm no expert on tcpdump (I usually use Wireshark myself), but it looks like tcpdump port 1113 -w capture.pcap should capture the data we need and save it to a file that you can then attach to this issue.

from event-store-client.

mikegibney avatar mikegibney commented on September 22, 2024

After a certain degree of bumbling and false starts, I have successfully obtained a packet capture....which is chock full of a bunch of proprietary data that I don't think I can share. :/ Is there an easy way to extract the heartbeat packets?

from event-store-client.

x-cubed avatar x-cubed commented on September 22, 2024

The heartbeat packets by themselves probably aren't that useful, it's more useful to see the full stream of packets, in case there's something wrong with the order in which they're being sent.

Three possible approaches from here:

  • Share the capture with me privately via email (my first name dot my last name at gmail dot com) so that I can investigate.
  • Create a copy of your application and simplify it down to the code that deals with Event Store, and make sure it still reproduces the issue. Get it to send dummy data rather than proprietary data. Share the code to that application, so that I can reproduce the issue.
  • Dig through the packet capture yourself in Wireshark to see if you can see any odd behaviour.

from event-store-client.

mikegibney avatar mikegibney commented on September 22, 2024

Okay, I managed to randomize/obfuscate the sensitive data with relative ease. Let me know if this is what you need.

capture.zip

from event-store-client.

x-cubed avatar x-cubed commented on September 22, 2024

Ok, there's definitely something wrong with the heartbeat responses. Here's a list of packets from your capture showing the server requesting a heartbeat, but the client carries on regardless without sending a heartbeat response:

heartbeatrequestbutnoresponse

from event-store-client.

x-cubed avatar x-cubed commented on September 22, 2024

So the interesting thing to note there is that it does respond to the heartbeat request in packet 81343, but then fails to respond later to the second request in 81567. That would suggest that the receive logic is not handling incoming messages after the ReadStreamEventsForwardCompleted is received.

You could try enabling the debug flag on the connection to get log output from the client, to see if it recognizes the second HeartbeatRequest.

from event-store-client.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.