
Comments (9)

algesten commented on August 30, 2024

Wireshark of this.

Basically, ureq is still in "I'm sending the request body" mode when we get the RST. At that point we never check whether some response data may have been sent before the socket hang-up.

We bail out with `?` here: https://github.com/algesten/ureq/blob/main/src/unit.rs#L281 – which means we don't try to read any response data before handing back to the user.

Shouldn't be too hard to fix. Rather than bailing with `?`, retain the error, attempt to read the incoming response (I assume the socket is still good for that), then return `Ok(Response)` instead.

This could maybe be a breaking change for some users, but it's probably a correct change.
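
To make the idea concrete, here is a minimal sketch of that control flow in plain `std::io` terms (purely illustrative; the function name and signature are invented, and this is not the actual unit.rs code):

```rust
use std::io::{self, Read, Write};

// Illustrative only: keep the send error around, try to read whatever the
// server managed to send, and only surface the send error if nothing
// readable came back.
fn send_body_then_read(mut stream: impl Read + Write, body: &[u8]) -> io::Result<Vec<u8>> {
    // Try to push the request body; remember the result instead of bailing with `?`.
    let mut send_result = stream.write_all(body);
    if send_result.is_ok() {
        send_result = stream.flush();
    }

    // Regardless of whether the send succeeded, attempt to read a response.
    let mut response = Vec::new();
    let read_result = stream.read_to_end(&mut response);

    match (read_result, send_result) {
        // We got response bytes back: prefer them over any send error.
        (Ok(n), _) if n > 0 => Ok(response),
        // Nothing readable and the send failed: report the original send error.
        (_, Err(send_err)) => Err(send_err),
        // Send succeeded but the server closed without sending anything back.
        (_, Ok(())) => Err(io::Error::new(io::ErrorKind::UnexpectedEof, "no response")),
    }
}
```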


algesten commented on August 30, 2024

Oh, and thanks @johan-bjareholt, and welcome to ureq! :)


jsha commented on August 30, 2024


johan-bjareholt commented on August 30, 2024

> Oh, and thanks @johan-bjareholt, and welcome to ureq! :)

Thank you, and thanks for the detailed answers!

Hm, this seems a little bit more complicated than I expected.

So considering what @jsha said, I assume that the following might not work

> Shouldn't be too hard to fix. Rather than bailing with `?`, retain the error, attempt to read the incoming response (I assume the socket is still good for that), then return `Ok(Response)` instead.

So if the server responds with an RST, it is not guaranteed that the response can be read, since TCP closes that socket for reading? But if it shuts down in some manner other than an immediate RST, it might be readable?

> If you have control of the server, I have some ideas for different architectures that might achieve what you want, but I will avoid typing them out unless that's of use to you. :-D

It's kind of mixed. I don't own the server that we are connecting to but I am in close contact with those who do, so it might be an option. But we'd prefer to avoid changes if possible, as it would be breaking. But considering what you are saying, it sounds hard to solve without breaking the API.

Tomorrow I will try to get a network trace of the real use-case with the server I'm interacting with, to see how it behaves.

> Of course, for your use case that may not be possible. Most HTTP 1.1 clients (like ureq) don't try to read anything from the server until they're done sending the body. So if the server replied with an HTTP header and a FIN, the client would still be trying to push through the massive upload. If the server stopped reading bytes off the connection, the client would presumably hang once the congestion window filled up with no replies. The server would also be unhappy since it would have a bunch of half-open connections hanging around.
>
> So RST makes a lot of sense – that's how a TCP peer can say "I'm done! I don't care what you have to say, I don't want to think about this connection any more." It's an unclean shutdown. The server gets to recover its resources for the connection, and the client very quickly gets an error sending bytes on the connection so it stops sending more.

But in theory, wouldn't it be possible for the server to respond with the HTTP headers and a FIN and wait for a FIN from the client side, but maybe have a 5s timeout if it doesn't get a FIN and then send an RST? If the client does not support reading the response before it finishes sending the request, it would get an RST; otherwise, if it responds with a FIN within the timeout, it would be handled gracefully.
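
Roughly, the server-side behaviour I have in mind would be something like this (just a sketch over a raw TcpStream, using the socket2 crate's `set_linger` for the SO_LINGER=0 trick that forces an RST on close; a real server would of course do this inside its HTTP stack):

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{Shutdown, TcpStream};
use std::time::Duration;

// Sketch: reject the request early, signal it with a FIN, then give the client
// a grace period to finish or close before forcing an RST.
fn reject_early(mut conn: TcpStream) -> std::io::Result<()> {
    // Send the error response immediately, before the request body has arrived.
    conn.write_all(b"HTTP/1.1 400 Bad Request\r\nConnection: close\r\nContent-Length: 0\r\n\r\n")?;

    // Half-close the write side: the client sees a FIN but can keep sending.
    conn.shutdown(Shutdown::Write)?;

    // Drain whatever the client still sends, but wait at most ~5 seconds per read.
    conn.set_read_timeout(Some(Duration::from_secs(5)))?;
    let mut sink = [0u8; 8192];
    loop {
        match conn.read(&mut sink) {
            Ok(0) => return Ok(()), // client sent its FIN: clean shutdown
            Ok(_) => continue,      // still receiving body bytes; discard them
            // Read timed out (reported as WouldBlock on Unix, TimedOut on Windows).
            Err(e) if e.kind() == ErrorKind::WouldBlock || e.kind() == ErrorKind::TimedOut => break,
            Err(e) => return Err(e),
        }
    }

    // Timeout hit: SO_LINGER=0 makes the close send an RST so the client stops.
    let sock = socket2::Socket::from(conn);
    sock.set_linger(Some(Duration::from_secs(0)))?;
    Ok(()) // socket dropped here => RST
}
```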


algesten commented on August 30, 2024

> So considering what @jsha said, I assume that the following might not work

Correct. We can implement it, but given how prevalent the RST behaviour is (it sounds like that's the way most servers do it), ureq users would generally not be able to use it. If you find your server is doing FIN instead of RST, we could of course consider it, since I don't believe anything would break.


jsha commented on August 30, 2024

> So if the server responds with an RST, it is not guaranteed that the response can be read, since TCP closes that socket for reading? But if it shuts down in some manner other than an immediate RST, it might be readable?

That's right. Specifically, when the client's kernel / TCP stack receives the RST, it will discard any bytes in the receive buffer, whether the application has read those bytes or not.

> I don't own the server that we are connecting to but I am in close contact with those who do, so it might be an option. But we'd prefer to avoid changes if possible, as it would be breaking. But considering what you are saying, it sounds hard to solve without breaking the API.

I'm guessing the details of the server, what it's used for, etc., are private. But if they're at all public (or can be made so), knowing what the server software is and so on would be helpful.

Also, can you detail your client-side use case more? Is it something like "if I get a 400 Bad Request, I'll stop trying, but if I get a connection reset I'll retry"?

You mention an API. Are there other applications talking to this same API? How do they handle the RST case? Also, I think the Expect: 100-continue approach could be implemented on the server without affecting how other clients interact with it.
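
For reference, the Expect: 100-continue handshake looks roughly like this on the wire (illustrative values only):

```
client:  POST /upload HTTP/1.1
client:  Host: example.invalid
client:  Content-Length: 1073741824
client:  Expect: 100-continue
client:  (blank line, then the client pauses instead of sending the body)

server:  HTTP/1.1 100 Continue      <- "go ahead"; the client now streams the body
            ...or...
server:  HTTP/1.1 400 Bad Request   <- early rejection; the body is never sent
```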

> wouldn't it be possible for the server to respond with the HTTP headers and a FIN and wait for a FIN from the client side, but maybe have a 5s timeout if it doesn't get a FIN and then send an RST?

Yeah I think this should work in theory. Some important caveats. Instead of saying "wait for a FIN from the client" I would say "wait for the end of the request body." Because in most cases the client won't close the TCP connection after it finishes sending a request. It will keep the connection alive for future reuse.

The other important caveat: I think the implicit goals of this server feature are (a) to avoid wasting server bandwidth reading bytes from a client request that will never succeed and (b) to give the end-user a meaningful error as soon as possible. If the uploads are very large (like 30s of upload time, depending on the end-user bandwidth), the server will never read the full request body before the 5s timeout, and you'll wind up with the same RST problem you have today.

If the server folks are amenable, I'd take a look at how S3 does multipart uploads: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html


johan-bjareholt commented on August 30, 2024

Okay, so it seems that in these 400 Bad Request cases the real server sends an RST without any preceding FIN.

> I'm guessing the details of the server, what it's used for, etc., are private. But if they're at all public (or can be made so), knowing what the server software is and so on would be helpful.

Unfortunately what we are working on is not public yet.

> Also, can you detail your client-side use case more? Is it something like "if I get a 400 Bad Request, I'll stop trying, but if I get a connection reset I'll retry"?

Not really. The case is pretty much that we are uploading a large HTTP multipart, where the first part is metadata about the following larger part of data. The server returns a 400 Bad Request if the first part is not valid in some way. The reason for having to see the response is mostly to be able to see, on the client side, how many uploads were successful; for the ones that were not, we save stats about the reasons for the failures.
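
In code, the client-side flow is roughly the following (a hedged sketch assuming the ureq 2.x API; the URL, boundary and `Stats` type are made up, and the multipart body is assumed to be already serialized into `body`):

```rust
use std::io::Cursor;

/// Made-up stats sink, only here so the sketch is self-contained.
struct Stats;
impl Stats {
    fn record_success(&mut self, _status: u16) {}
    fn record_rejected(&mut self, _status: u16) {}
    fn record_transport_error(&mut self, _msg: String) {}
}

// Upload one pre-built multipart body (metadata part first, then the data part)
// and record how it went.
fn upload(body: Vec<u8>, stats: &mut Stats) {
    let result = ureq::post("https://example.invalid/upload") // placeholder URL
        .set("Content-Type", "multipart/form-data; boundary=XYZ")
        .send(Cursor::new(body)); // send() streams the body from any Read

    match result {
        Ok(resp) => stats.record_success(resp.status()),
        // Non-2xx responses that ureq managed to read come back as Error::Status...
        Err(ureq::Error::Status(code, _resp)) => stats.record_rejected(code),
        // ...but when the server RSTs mid-upload we only ever see a transport
        // error here, which is exactly the problem described in this issue.
        Err(other) => stats.record_transport_error(other.to_string()),
    }
}
```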

> You mention an API. Are there other applications talking to this same API? How do they handle the RST case? Also, I think the Expect: 100-continue approach could be implemented on the server without affecting how other clients interact with it.

It's actually the opposite: as the client, we provide an API that a few servers implement. The most commonly used server is the only one that currently validates the first multipart part and returns a 400 Bad Request. So this issue only concerns one of a few server implementations.

> Yeah I think this should work in theory. Some important caveats. Instead of saying "wait for a FIN from the client" I would say "wait for the end of the request body." Because in most cases the client won't close the TCP connection after it finishes sending a request. It will keep the connection alive for future reuse.

> The other important caveat: I think the implicit goals of this server feature are (a) to avoid wasting server bandwidth reading bytes from a client request that will never succeed and (b) to give the end-user a meaningful error as soon as possible. If the uploads are very large (like 30s of upload time, depending on the end-user bandwidth), the server will never read the full request body before the 5s timeout, and you'll wind up with the same RST problem you have today.

So I guess that the short-term solution for us would be to require that the server always reads the whole request.

But in the long term we will have to start using "Expect: 100-continue"; especially if the server returns a 401, it shouldn't have to read the whole request.
Regarding the 400, one option would be to not have a multipart at all and instead move the metadata into the HTTP headers. Together with "Expect: 100-continue", that should work, but it might be a lot of headers.
Another option would be to use HTTP/2 instead and have an individual stream for each part.
I'll give this some thought. However, I don't believe we should change anything in ureq any more; we should probably just fix the badly designed API 😄

Regardless, thanks a lot for the help! I've learnt a lot about both TCP and HTTP.


johan-bjareholt commented on August 30, 2024

Sidetrack, but how do I implement "Expect: 100-continue" in ureq?


jsha commented on August 30, 2024

> But in the long term we will have to start using "Expect: 100-continue"; especially if the server returns a 401, it shouldn't have to read the whole request.
> Regarding the 400, one option would be to not have a multipart at all and instead move the metadata into the HTTP headers.

That's great. I think that's the right approach.

> Sidetrack, but how do I implement "Expect: 100-continue" in ureq?

In `connect_inner` (somewhat misnamed; this does the request), you'll want to insert some logic that happens after sending headers (`send_prelude`) and before sending the body. Right about here:

```rust
body::send_body(body, unit.is_chunked, &mut stream)?;
```

The logic should probably be something like:

  • If the request headers contained Expect: 100-continue, try to read response headers from the connection
  • If that errors, error the whole request
  • If that succeeds but has a status code other than 100, error the request with the appropriate status code
  • Otherwise, continue with sending the body
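
As a very rough sketch of where that would sit (hypothetical names throughout: `has_header_value`, `read_response_head` and `Response::from_head_and_stream` are invented stand-ins for whatever ureq's internals actually provide):

```rust
// ...headers have already been sent via send_prelude at this point...

if unit.has_header_value("Expect", "100-continue") { // hypothetical check
    // (A real implementation would also want a short read timeout here, so a
    // server that ignores Expect doesn't stall the request forever.)
    match read_response_head(&mut stream) {
        // 100 Continue: fall through and send the body below.
        Ok(head) if head.status == 100 => {}
        // A final status (e.g. 400 or 401): hand that response back instead of
        // uploading the body; ureq's usual status handling applies from there.
        Ok(head) => return Ok(Response::from_head_and_stream(head, stream)),
        // Couldn't read the interim response: error the whole request.
        Err(e) => return Err(e.into()),
    }
}

// ...then continue exactly as before:
body::send_body(body, unit.is_chunked, &mut stream)?;
```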

