
Comments (9)

algesten commented on August 30, 2024

Wireshark of this.

Basically, ureq is still in "I'm sending the request body" mode when we get the RST. At that point we never check whether some response data may have been sent before the socket hang-up.

We bail out with `?` here: https://github.com/algesten/ureq/blob/main/src/unit.rs#L281 – which means we don't try to read any response data before handing back to the user.

Shouldn't be too hard to fix. Rather than bailing with `?`, retain the error, attempt to read the incoming response (I assume the socket is still good for that), then return `Ok(Response)` instead.

This could maybe be a breaking change for some users, but it's probably a correct change.
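
To make the idea concrete, here is a minimal sketch of that control flow in plain `std::io` terms (purely illustrative; the function name and signature are invented, and this is not the actual unit.rs code):

```rust
use std::io::{self, Read, Write};

// Illustrative only: keep the send error around, try to read whatever the
// server managed to send, and only surface the send error if nothing
// readable came back.
fn send_body_then_read(mut stream: impl Read + Write, body: &[u8]) -> io::Result<Vec<u8>> {
    // Try to push the request body; remember the result instead of bailing with `?`.
    let mut send_result = stream.write_all(body);
    if send_result.is_ok() {
        send_result = stream.flush();
    }

    // Regardless of whether the send succeeded, attempt to read a response.
    let mut response = Vec::new();
    let read_result = stream.read_to_end(&mut response);

    match (read_result, send_result) {
        // We got response bytes back: prefer them over any send error.
        (Ok(n), _) if n > 0 => Ok(response),
        // Nothing readable and the send failed: report the original send error.
        (_, Err(send_err)) => Err(send_err),
        // Send succeeded but the server closed without sending anything back.
        (_, Ok(())) => Err(io::Error::new(io::ErrorKind::UnexpectedEof, "no response")),
    }
}
```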


algesten commented on August 30, 2024

Oh, and thanks @johan-bjareholt, and welcome to ureq! :)


jsha commented on August 30, 2024


johan-bjareholt commented on August 30, 2024

> Oh, and thanks @johan-bjareholt, and welcome to ureq! :)

Thank you, and thanks for the detailed answers!

Hm, this seems a little bit more complicated than I expected.

So considering what @jsha said, I assume that the following might not work

> Shouldn't be too hard to fix. Rather than bailing with `?`, retain the error, attempt to read the incoming response (I assume the socket is still good for that), then return `Ok(Response)` instead.

So if the server responds with an RST, it is not guaranteed that the response can be read, since TCP closes that socket for reading? But if it shuts down in some manner other than an immediate RST, it might be readable?

> If you have control of the server, I have some ideas for different architectures that might achieve what you want, but I will avoid typing them out unless that's of use to you. :-D

It's kind of mixed. I don't own the server that we are connecting to but I am in close contact with those who do, so it might be an option. But we'd prefer to avoid changes if possible, as it would be breaking. But considering what you are saying, it sounds hard to solve without breaking the API.

Tomorrow I will try to get a network trace of the real use-case with the server I'm interacting with, to see how it behaves.

> Of course, for your use case that may not be possible. Most HTTP 1.1 clients (like ureq) don't try to read anything from the server until they're done sending the body. So if the server replied with an HTTP header and a FIN, the client would still be trying to push through the massive upload. If the server stopped reading bytes off the connection, the client would presumably hang once the congestion window filled up with no replies. The server would also be unhappy since it would have a bunch of half-open connections hanging around.
>
> So RST makes a lot of sense – that's how a TCP peer can say "I'm done! I don't care what you have to say, I don't want to think about this connection any more." It's an unclean shutdown. The server gets to recover its resources for the connection, and the client very quickly gets an error sending bytes on the connection so it stops sending more.

But in theory, wouldn't it be possible for the server to respond with the HTTP headers and a FIN and wait for a FIN from the client side, but maybe have a 5s timeout if it doesn't get a FIN and then send an RST? If the client does not support reading the response before it finishes sending the request, it would get an RST; otherwise, if it responds with a FIN within the timeout, it would be handled gracefully.
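
Roughly, the server-side behaviour I have in mind would be something like this (just a sketch over a raw TcpStream, using the socket2 crate's `set_linger` for the SO_LINGER=0 trick that forces an RST on close; a real server would of course do this inside its HTTP stack):

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{Shutdown, TcpStream};
use std::time::Duration;

// Sketch: reject the request early, signal it with a FIN, then give the client
// a grace period to finish or close before forcing an RST.
fn reject_early(mut conn: TcpStream) -> std::io::Result<()> {
    // Send the error response immediately, before the request body has arrived.
    conn.write_all(b"HTTP/1.1 400 Bad Request\r\nConnection: close\r\nContent-Length: 0\r\n\r\n")?;

    // Half-close the write side: the client sees a FIN but can keep sending.
    conn.shutdown(Shutdown::Write)?;

    // Drain whatever the client still sends, but wait at most ~5 seconds per read.
    conn.set_read_timeout(Some(Duration::from_secs(5)))?;
    let mut sink = [0u8; 8192];
    loop {
        match conn.read(&mut sink) {
            Ok(0) => return Ok(()), // client sent its FIN: clean shutdown
            Ok(_) => continue,      // still receiving body bytes; discard them
            // Read timed out (reported as WouldBlock on Unix, TimedOut on Windows).
            Err(e) if e.kind() == ErrorKind::WouldBlock || e.kind() == ErrorKind::TimedOut => break,
            Err(e) => return Err(e),
        }
    }

    // Timeout hit: SO_LINGER=0 makes the close send an RST so the client stops.
    let sock = socket2::Socket::from(conn);
    sock.set_linger(Some(Duration::from_secs(0)))?;
    Ok(()) // socket dropped here => RST
}
```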


algesten commented on August 30, 2024

> So considering what @jsha said, I assume that the following might not work

Correct. We can implement it, but given how prevalent the RST behaviour is (it sounds like that's the way most servers do it), ureq users would generally not be able to use it. If you find your server is doing FIN instead of RST, we could of course consider it, since I don't believe anything would break.


jsha commented on August 30, 2024

> So if the server responds with an RST, it is not guaranteed that the response can be read, since TCP closes that socket for reading? But if it shuts down in some manner other than an immediate RST, it might be readable?

That's right. Specifically, when the client's kernel / TCP stack receives the RST, it will discard any bytes in the receive buffer, whether the application has read those bytes or not.

> I don't own the server that we are connecting to but I am in close contact with those who do, so it might be an option. But we'd prefer to avoid changes if possible, as it would be breaking. But considering what you are saying, it sounds hard to solve without breaking the API.

I'm guessing the details of the server, what it's used for, etc., are private. But if they're at all public (or can be made so), knowing what the server software is and so on would be helpful.

Also, can you detail your client-side use case more? Is it something like "if I get a 400 Bad Request, I'll stop trying, but if I get a connection reset I'll retry"?

You mention an API. Are there other applications talking to this same API? How do they handle the RST case? Also, I think the Expect: 100-continue approach could be implemented on the server without affecting how other clients interact with it.
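
For reference, the Expect: 100-continue handshake looks roughly like this on the wire (illustrative values only):

```
client:  POST /upload HTTP/1.1
client:  Host: example.invalid
client:  Content-Length: 1073741824
client:  Expect: 100-continue
client:  (blank line, then the client pauses instead of sending the body)

server:  HTTP/1.1 100 Continue      <- "go ahead"; the client now streams the body
            ...or...
server:  HTTP/1.1 400 Bad Request   <- early rejection; the body is never sent
```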

> wouldn't it be possible for the server to respond with the HTTP headers and a FIN and wait for a FIN from the client side, but maybe have a 5s timeout if it doesn't get a FIN and then send an RST?

Yeah I think this should work in theory. Some important caveats. Instead of saying "wait for a FIN from the client" I would say "wait for the end of the request body." Because in most cases the client won't close the TCP connection after it finishes sending a request. It will keep the connection alive for future reuse.

The other important caveat: I think the implicit goals of this server feature are (a) to avoid wasting server bandwidth reading bytes from a client request that will never succeed and (b) to give the end-user a meaningful error as soon as possible. If the uploads are very large (like 30s of upload time, depending on the end-user bandwidth), the server will never read the full request body before the 5s timeout, and you'll wind up with the same RST problem you have today.

If the server folks are amenable, I'd take a look at how S3 does multipart uploads: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html


johan-bjareholt commented on August 30, 2024

Okay, so it seems that in these 400 Bad Request cases the real server sends an RST without any preceding FIN.

> I'm guessing the details of the server, what it's used for, etc., are private. But if they're at all public (or can be made so), knowing what the server software is and so on would be helpful.

Unfortunately what we are working on is not public yet.

> Also, can you detail your client-side use case more? Is it something like "if I get a 400 Bad Request, I'll stop trying, but if I get a connection reset I'll retry"?

Not really. The case is pretty much that we are uploading a large HTTP multipart, where the first part is metadata about the following larger part of data. The server returns a 400 Bad Request if the first part is not valid in some way. The reason for having to see the response is mostly to be able to see, on the client side, how many uploads were successful; for the ones that were not, we save stats about the reasons for the failures.
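
In code, the client-side flow is roughly the following (a hedged sketch assuming the ureq 2.x API; the URL, boundary and `Stats` type are made up, and the multipart body is assumed to be already serialized into `body`):

```rust
use std::io::Cursor;

/// Made-up stats sink, only here so the sketch is self-contained.
struct Stats;
impl Stats {
    fn record_success(&mut self, _status: u16) {}
    fn record_rejected(&mut self, _status: u16) {}
    fn record_transport_error(&mut self, _msg: String) {}
}

// Upload one pre-built multipart body (metadata part first, then the data part)
// and record how it went.
fn upload(body: Vec<u8>, stats: &mut Stats) {
    let result = ureq::post("https://example.invalid/upload") // placeholder URL
        .set("Content-Type", "multipart/form-data; boundary=XYZ")
        .send(Cursor::new(body)); // send() streams the body from any Read

    match result {
        Ok(resp) => stats.record_success(resp.status()),
        // Non-2xx responses that ureq managed to read come back as Error::Status...
        Err(ureq::Error::Status(code, _resp)) => stats.record_rejected(code),
        // ...but when the server RSTs mid-upload we only ever see a transport
        // error here, which is exactly the problem described in this issue.
        Err(other) => stats.record_transport_error(other.to_string()),
    }
}
```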

> You mention an API. Are there other applications talking to this same API? How do they handle the RST case? Also, I think the Expect: 100-continue approach could be implemented on the server without affecting how other clients interact with it.

It's actually the opposite: as the client, we provide an API that a few servers implement. The most commonly used server is the only one that currently validates the first multipart part and returns a 400 Bad Request. So this issue only concerns one of a few server implementations.

> Yeah I think this should work in theory. Some important caveats. Instead of saying "wait for a FIN from the client" I would say "wait for the end of the request body." Because in most cases the client won't close the TCP connection after it finishes sending a request. It will keep the connection alive for future reuse.

> The other important caveat: I think the implicit goals of this server feature are (a) to avoid wasting server bandwidth reading bytes from a client request that will never succeed and (b) to give the end-user a meaningful error as soon as possible. If the uploads are very large (like 30s of upload time, depending on the end-user bandwidth), the server will never read the full request body before the 5s timeout, and you'll wind up with the same RST problem you have today.

So I guess that the short-term solution for us would be to require that the server always reads the whole request.

But in the long term we will have to start using "Expect: 100-continue"; especially if the server returns a 401, it shouldn't have to read the whole request.
Regarding the 400, one option would be to not have a multipart at all and instead move the metadata into the HTTP headers. Together with "Expect: 100-continue", that should work, but it might be a lot of headers.
Another option would be to use HTTP/2 instead and have an individual stream for each part.
I'll give this some thought. However, I don't believe we should change anything in ureq any more; we should probably just fix the badly designed API 😄

Regardless, thanks a lot for the help! I've learnt a lot about both TCP and HTTP.


johan-bjareholt commented on August 30, 2024

Sidetrack, but how do I implement "Expect: 100-continue" in ureq?


jsha commented on August 30, 2024

> But in the long term we will have to start using "Expect: 100-continue"; especially if the server returns a 401, it shouldn't have to read the whole request.
> Regarding the 400, one option would be to not have a multipart at all and instead move the metadata into the HTTP headers.

That's great. I think that's the right approach.

> Sidetrack, but how do I implement "Expect: 100-continue" in ureq?

In `connect_inner` (somewhat misnamed; this does the request), you'll want to insert some logic that happens after sending headers (`send_prelude`) and before sending the body. Right about here:

```rust
body::send_body(body, unit.is_chunked, &mut stream)?;
```

The logic should probably be something like:

  • If the request headers contained Expect: 100-continue, try to read response headers from the connection
  • If that errors, error the whole request
  • If that succeeds but has a status code other than 100, error the request with the appropriate status code
  • Otherwise, continue with sending the body
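
As a very rough sketch of where that would sit (hypothetical names throughout: `has_header_value`, `read_response_head` and `Response::from_head_and_stream` are invented stand-ins for whatever ureq's internals actually provide):

```rust
// ...headers have already been sent via send_prelude at this point...

if unit.has_header_value("Expect", "100-continue") { // hypothetical check
    // (A real implementation would also want a short read timeout here, so a
    // server that ignores Expect doesn't stall the request forever.)
    match read_response_head(&mut stream) {
        // 100 Continue: fall through and send the body below.
        Ok(head) if head.status == 100 => {}
        // A final status (e.g. 400 or 401): hand that response back instead of
        // uploading the body; ureq's usual status handling applies from there.
        Ok(head) => return Ok(Response::from_head_and_stream(head, stream)),
        // Couldn't read the interim response: error the whole request.
        Err(e) => return Err(e.into()),
    }
}

// ...then continue exactly as before:
body::send_body(body, unit.is_chunked, &mut stream)?;
```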

