
Comments (19)

pderop commented on June 6, 2024

Can you confirm that the exact exception is "Connection prematurely closed BEFORE response"?

Some suggestions:

For the destination server for which you sometimes get the premature close exception:

  1. Have you checked whether the server (or any proxies in between) uses some kind of idle timer? You said you have configured the max idle time: is the pool's max idle time lower than the server's (or proxies')? (A configuration sketch follows this list.)

  2. Have you checked whether the server (or any proxies) is configured with a maximum number of simultaneous connections? If so, consider reducing the connection pool size to a value lower than the maximum configured on the server (or proxies).

  3. If you cannot find out whether the server (or proxies) uses an idle timer, consider the following:

  • if the remote server is not using https, then use nc in order to guess if the server is using an idle timer:
nc <server ip> <server port>

Replace the server IP with your proxy IP and the server port with your proxy port if you are using a proxy.
If the nc connection is closed after some time, it is likely that the remote server (or proxy) is configured with that delay as its idle timer, so configure the connection pool's max idle time to a lower value.

  • if the remote server is using https, then you can use openssl to connect to the server and see whether, some time later, the connection is closed:
openssl s_client -connect <server ip>:<server port>

If you are using an HTTP proxy, use proxytunnel.
As with nc, if the openssl connection is closed after some period of inactivity, configure the pool's max idle time to a value lower than that delay.

  4. To detect who is sending the RST/FIN segments, consider running tcpdump with a filter that captures only FIN/RST; if you know the problem happens with a particular server, also filter on that server's address, something like:
tcpdump -n -i any   'tcp[tcpflags] & (tcp-fin|tcp-rst) != 0 and host <server ipaddr>'
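
For reference, here is a minimal sketch of point 1, assuming plain reactor-netty; the pool name and the 50s/30s durations are placeholders that should be replaced by values below whatever idle delay you measure with nc or openssl:

import java.time.Duration;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class IdleTimeExample {
    public static void main(String[] args) {
        // Assumption: the remote server (or proxy) closes idle connections after roughly 60s,
        // so the client pool discards idle connections earlier than that.
        ConnectionProvider provider = ConnectionProvider.builder("example-pool")
                .maxIdleTime(Duration.ofSeconds(50))       // must stay below the server/proxy idle timer
                .evictInBackground(Duration.ofSeconds(30)) // periodically evict idle connections in the background
                .build();

        HttpClient client = HttpClient.create(provider);
        // ... use the client as usual
    }
}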

asasas234 commented on June 6, 2024

@pderop Okay, thank you. Tomorrow I will contact the object storage service team and open a work order to ask why the connection is being closed.

pderop commented on June 6, 2024

I interpret the documentation like this:

If the connection pool size is very high, and at some point the pool needs to establish many connections simultaneously, then the server (or the proxy) may not be able to accept them in time; in that case some HTTP client requests may fail with, for example, a ConnectException: Operation timed out, or, if HTTPS is used, an SslHandshakeTimeoutException.
Now, if the server is configured with a limit on the maximum number of accepted connections, it (or the proxy) may also immediately close a just-accepted connection once that limit is reached, and in that case the client would get a "PrematureCloseException: Connection prematurely closed BEFORE response".
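
To make this concrete, here is a minimal sketch, with placeholder numbers, of capping the client pool below a server-side connection limit and bounding how long pending acquisitions may wait; none of these values come from the actual servers discussed in this issue:

import java.time.Duration;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class PoolSizeExample {
    public static void main(String[] args) {
        // Assumption: the server (or proxy) accepts at most ~100 connections from this client,
        // so the pool is capped below that and waiting acquisitions are bounded.
        ConnectionProvider provider = ConnectionProvider.builder("capped-pool")
                .maxConnections(50)                            // stay below the server-side limit
                .pendingAcquireMaxCount(500)                   // cap the number of requests waiting for a connection
                .pendingAcquireTimeout(Duration.ofSeconds(10)) // fail an acquisition that waits longer than 10s
                .build();

        HttpClient client = HttpClient.create(provider);
        // ... use the client as usual
    }
}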

Now, what are the versions of the components you are using (especially the reactor-netty version)?

Can you share the exact stack trace? Is it a PrematureCloseException, a ConnectException, or a PoolAcquirePendingLimitException?

thanks

asasas234 commented on June 6, 2024

reactor-netty:1.1.12
reactor-core: 3.5.11
Currently, I have set up a separate HttpClient for the problematic HTTP services, and I have configured its max idle time and LIFO (last in, first out) setting. However, for now I have no way to deal with any new occurrences of the exception.

Thank you for your explanation; I understand that we need to consider not only the max idle time but also the maximum number of connections the remote server can accept.
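
For readers following along, a rough sketch of the kind of dedicated client described here, assuming Spring WebFlux's WebClient on top of reactor-netty; the pool name, durations and base URL are invented for the example:

import java.time.Duration;

import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class DedicatedClientExample {
    public static void main(String[] args) {
        // A pool reserved for the problematic upstream only, so its tuning does not
        // affect the other HTTP services called by the application.
        ConnectionProvider provider = ConnectionProvider.builder("object-storage") // pool name is illustrative
                .maxIdleTime(Duration.ofSeconds(50)) // keep below the upstream's idle timeout
                .lifo()                              // hand out the most recently used connection first
                .build();

        HttpClient httpClient = HttpClient.create(provider)
                .responseTimeout(Duration.ofSeconds(30)); // illustrative response timeout

        WebClient webClient = WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .baseUrl("https://object-storage.example.com") // placeholder URL
                .build();
        // ... use webClient for the uploads
    }
}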

pderop commented on June 6, 2024

If the exception you got was really Connection prematurely closed BEFORE response, please take a close look at https://projectreactor.io/docs/netty/release/reference/index.html#faq.connection-closed

In particular, since you no longer seem to have the issue after setting a max idle time in the connection pool, please check #764 (comment):

You can configure a max idle time that is lower than the remote network appliances' lowest max idle time. Here are some common ones we've seen: [...]

In many cases, remote servers (or proxies) do have some kind of idle timer, so the key thing is to configure in your connection pool an idle time that is lower than the remote server's idle timer. Otherwise there may be a race condition: at some point the HttpClient may reuse a connection at the very moment the remote server closes it because it is idle, and in that case you may experience a PrematureCloseException. Setting an idle time in the HTTP client pool that is lower than the remote server's will help a lot to avoid that problem.

But please check the whole FAQ at https://projectreactor.io/docs/netty/release/reference/index.html#faq.connection-closed, because there may be other causes of PrematureCloseException.

let me know.
thanks.

asasas234 commented on June 6, 2024

@pderop I have now modified the maximum idle time, and this issue has been resolved.

I have carefully read the documentation's guidance on how to track down this issue. But I can't keep tcpdump or wiretap logging running 24/7: it costs too much server performance, and this issue only occurs a few times a day. Even if I enabled it when the problem happens, it would be of no use, because the issue would have already occurred. So the troubleshooting approach described in the documentation doesn't really work for me.

pderop commented on June 6, 2024

So, since the issue is resolved, I'm closing this one; we can reopen it if needed.
thanks.

asasas234 commented on June 6, 2024

@pderop This exception has occurred again, so it seems it has not been completely resolved. I have added monitoring to the HttpClient connection pool and found that reactor_netty_connection_provider_pending_connections is always 0. Does this mean the problem is not on the client side, but rather that the other end closed the connection? Because this problem occurs infrequently, I cannot keep tcpdump running all the time. Is there any other way to locate this issue?
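
For reference, a minimal sketch of how such pool metrics can be enabled, assuming Micrometer (io.micrometer:micrometer-core) is on the classpath; the pool name and URI tag are only examples:

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class PoolMetricsExample {
    public static void main(String[] args) {
        // Pool-level gauges (active, idle, pending connections) are published to Micrometer's
        // global registry under names such as reactor.netty.connection.provider.pending.connections.
        ConnectionProvider provider = ConnectionProvider.builder("object-storage") // pool name is illustrative
                .metrics(true)
                .build();

        HttpClient client = HttpClient.create(provider)
                // Per-request metrics; map every URI to a constant tag to avoid tag-cardinality blowups.
                .metrics(true, uri -> "uploads");
        // ... use the client as usual
    }
}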

asasas234 commented on June 6, 2024

@pderop Can you reopen this issue?

asasas234 commented on June 6, 2024

I asked about the maximum idle time on the other end: it is 60 seconds, and I have already set mine to 50 seconds. I also asked about the maximum number of connections on their side. Although I haven't set a maximum number of connections myself, my monitoring shows that the current number of connections is far below their limit.

asasas234 commented on June 6, 2024

@pderop I am not very familiar with tcpdump. Regarding the fourth item, may I use a domain name? I cannot determine the IP address.

asasas234 commented on June 6, 2024

@pderop I found the IP of the domain name with the ping command. Although I cannot be sure there aren't any other IPs, I can indeed see connections to this IP with netstat -anp. I have started tcpdump using the command you provided, with the corresponding IP substituted. If the anomaly occurs again, I will post the captured information here.

pderop commented on June 6, 2024

ok, now, I have a clue:

in your original issue description, it is stated:

it is like uploading audio files to object storage. And in terms of concurrency, its proportion is very high compared to the entire service.

So, is it about a large POST HTTP request (a large file upload)? In that case, you might be facing this same kind of issue (we have made a fix for it, but you already have that fix because you are using the recent reactor-netty 1.1.12).

So, the scenario is the following (I'm assuming that it might be your use case, but you will need to investigate):

1- the client starts to send a large POST HTTP request (with a large payload)
2- the server decides to stop reading the request body, sends an early final response (e.g. 400 Bad Request or 413 Content Too Large) and closes the connection without reading the POST data that is still in flight from the client
3- at this point there are two cases: if the client manages to read the response while it is finishing writing the request body, everything is fine; but if the client is still writing, at some point the server will send a TCP RST, and receiving an RST causes data that has not yet been read to be discarded, so even if the early response has already reached the client (for example it is sitting in the client's kernel buffers but has not yet been consumed by the client process), that early response may be dropped. In this case you may (or may not) get a PrematureCloseException.

So first, to verify this assumption, try to identify the size of the requests that trigger the premature close exception and check whether they are really large.
Then, on the server side, check whether there is a limit on the maximum request payload size. As an example, Tomcat has an interesting property, maxSwallowSize: if you know that Tomcat is used on the server and you have access to it, you could check how this property is configured. Here is its description:

The maximum number of request body bytes (excluding transfer encoding overhead) that will be swallowed by Tomcat for an aborted upload. An aborted upload is when Tomcat knows that the request body is going to be ignored but the client still sends it. If Tomcat does not swallow the body the client is unlikely to see the response. If not specified the default of 2097152 (2 megabytes) will be used. A value of less than zero indicates that no limit should be enforced.

As you can see from the description, when the server sends an early response and immediately closes the connection without consuming the remaining bytes, the client may not be able to consume the early response (because of the TCP RST).

So, all of this is just my assumption; you will need to investigate whether this use case matches your problem.

asasas234 commented on June 6, 2024

@pderop This won't be very big; it is the size of the TTS subtitles. From my exception log, the text corresponding to the audio does not seem very long.

The target service is a cloud object storage service. I had no problems uploading videos before, so the audio uploads should not be hitting the issue you mentioned.

asasas234 commented on June 6, 2024

@pderop Although my server has not encountered the connection-prematurely-closed error, I did capture the following in the tcpdump logs.

Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404358 IP 169.254.0.47.https > 10.0.0.12.50300: Flags [R], seq 1136788998, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404364 IP 169.254.0.47.https > 172.17.0.2.50300: Flags [R], seq 1136788998, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404366 IP 169.254.0.47.https > 172.17.0.2.50300: Flags [R], seq 1136788998, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404401 IP 169.254.0.47.https > 10.0.0.12.50300: Flags [R], seq 1136788998, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404403 IP 169.254.0.47.https > 172.17.0.2.50300: Flags [R], seq 1136788998, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404404 IP 169.254.0.47.https > 172.17.0.2.50300: Flags [R], seq 1136788998, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404465 IP 169.254.0.47.https > 10.0.0.12.50298: Flags [R], seq 1085723424, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404469 IP 169.254.0.47.https > 172.17.0.2.50298: Flags [R], seq 1085723424, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404471 IP 169.254.0.47.https > 172.17.0.2.50298: Flags [R], seq 1085723424, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404520 IP 169.254.0.47.https > 10.0.0.12.50298: Flags [R], seq 1085723424, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404522 IP 169.254.0.47.https > 172.17.0.2.50298: Flags [R], seq 1085723424, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:49:37.404523 IP 169.254.0.47.https > 172.17.0.2.50298: Flags [R], seq 1085723424, win 0, length 0
Thu Nov 30 23:54:33 CST 2023: 2023-11-30 23:51:42.757492 IP 172.17.0.2.53036 > 169.254.0.47.https: Flags [F.], seq 3061579063, ack 3436209965, win 373, length 0

Can this log indicate that the target service forcibly closed the connection due to some error?

pderop commented on June 6, 2024

The [R] flag means a TCP RST (https://amits-notes.readthedocs.io/en/latest/networking/tcpdump.html).

So here, the 169.254.0.47 machine has sent TCP RSTs to the 10.0.0.12 and 172.17.0.2 IP addresses,
and the machine at 172.17.0.2 has sent a FIN ([F.]) to 169.254.0.47.

pderop commented on June 6, 2024

Please indicate which address your service is running on, just to clarify.

pderop commented on June 6, 2024

Closing this one for the moment; we can reopen it if needed.

asasas234 commented on June 6, 2024

@pderop Today I ran into another connection-related exception: org.springframework.web.reactive.function.client.WebClientRequestException: recvAddress(..) failed: Connection reset by peer

Then, after talking to the object storage engineers, the conclusion I got was that during the minute when the exception occurred, one of their machines happened to go down.

However, since the machine went down and the connections were dropped automatically, they cannot guarantee that this was definitely the cause; but given how coincidental the timing is, for now they can only assume it was.

Apart from this, the rest of the observations seem to be fine.
