Comments (18)

andrewgodwin commented on July 19, 2024

We already looked at this in #51 - I am curious whether there is a use for the raw path other than those discussed there. In particular, while WSGI claims to give you the raw path, this is not always actually true.

from asgiref.

blueyed commented on July 19, 2024

Oh, sorry for missing #51. I've only skimmed this quickly - you said there:

> if we provided a "raw" one then people might rely on it only for it to break later on when a webserver in front did exactly the same thing (webservers: not good at adhering perfectly to the URI RFC)

So because others might break it it gets broken upfront? :)
It basically means that any proxy using ASGI is forced into the group of "webservers: not good at adhering perfectly to the URI RFC" then.

I can see that it is too late to change path now probably, but something like raw_path should be added.

I currently have to get it in via setting a custom header from nginx, updating the request.url from there then. This works for me, but should not be necessary really.
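
The workaround described above could look roughly like the following ASGI middleware. This is only a sketch under assumptions: the header name `x-raw-path` is hypothetical (the thread does not say which header was used), the proxy is assumed to set it with something like `proxy_set_header X-Raw-Path $request_uri;`, and storing the value under a `raw_path` scope key is just one possible choice.

```python
import asyncio

# Hypothetical header name; the thread does not say which header was used.
RAW_PATH_HEADER = b"x-raw-path"


class RawPathFromHeader:
    """ASGI middleware sketch: copy a proxy-supplied header into the scope."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI header names are lowercased byte strings.
            for name, value in scope.get("headers", []):
                if name == RAW_PATH_HEADER:
                    # Drop any query string; build a new scope dict so the
                    # caller's dict is not mutated.
                    scope = dict(scope, raw_path=value.split(b"?", 1)[0])
                    break
        await self.app(scope, receive, send)


def demo():
    """Run the middleware once against a hand-built scope."""
    seen = {}

    async def inner(scope, receive, send):
        seen.update(scope)

    scope = {
        "type": "http",
        "path": "/o/users/logo",
        "headers": [(b"x-raw-path", b"/o/users%2Flogo?alt=media")],
    }
    asyncio.run(RawPathFromHeader(inner)(scope, None, None))
    return seen
```

Such a header is only trustworthy if the proxy always overwrites it; otherwise a client could supply its own value.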

andrewgodwin commented on July 19, 2024

It's more that this falls into the bucket of "well, it's useful for 1% of people", and I'm not personally convinced the job of a spec is to get all those corner cases - e.g. ASGI also doesn't allow for certain kinds of responses that are valid in HTTP.

Is there an app you're developing that is not low-level HTTP but does need raw path? If I may ask, what specifically about the path do you need?

blueyed commented on July 19, 2024

I think it is more like "99% do not notice / are not affected".
The app I've noticed this with is a proxy after all.. :)

blueyed commented on July 19, 2024

And it is used e.g. with https://firebasestorage.googleapis.com/v0/b/noticeable-service.appspot.com/o/users%2F9s6zABHvXDeoEK3Pk0G3Z7gro513%2Fprojects%2FbYyIewUV308AvkMztxix%2Flogo?alt=media&token=9ad915fd-fd45-46fe-a82c-0d581d3113d8 - with ASGI the "users%2F" gets turned into "users/" then.
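
To illustrate why this matters for the URL above, here is a small sketch using only the standard library. It compares splitting the path into segments before and after percent-decoding; the variable names are illustrative, and the query string is left out.

```python
from urllib.parse import unquote

# Path component of the Firebase Storage URL above (query string omitted).
raw_path = (
    "/v0/b/noticeable-service.appspot.com/o/"
    "users%2F9s6zABHvXDeoEK3Pk0G3Z7gro513%2Fprojects%2FbYyIewUV308AvkMztxix%2Flogo"
)

# Splitting the raw path keeps the object name as one segment.
raw_segments = raw_path.split("/")

# Decoding first turns every %2F into a real separator,
# so the object name is spread over several segments.
decoded_segments = unquote(raw_path).split("/")

# Decoding a single raw segment afterwards still recovers the intended name.
object_name = unquote(raw_segments[-1])
```

With the raw path the object name stays a single segment that can be decoded afterwards; once the whole path has been decoded, the original segment boundaries can no longer be told apart from the slashes inside the name.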

davidism commented on July 19, 2024

From what I remember of the PyCon discussion that led to the original decision, basically we can't standardize this because we can't control what web servers return. So we have to standardize to the lowest common denominator, which is slash unquoting. ASGI servers can choose to send a raw URL as an extension, but there's no guarantee it will be any more accurate. Ultimately the dev would need to configure their server correctly and then choose the raw URL.

andrewgodwin commented on July 19, 2024

Right, that's pretty much what I was getting at, thanks @davidism. I'd be open to defining this as an optional extra, but only if we think a decent number of servers are going to be able to supply it correctly, and with the caveat being "make it correct or don't supply it at all".

blueyed commented on July 19, 2024

Wouldn't the lowest denominator be to not touch it at all, followed by turning it into unicode, and only then applying more processing (i.e. slash unquoting)?

(As an example, nginx will not touch the path with proxy_pass http://upstream:8080, but it will with proxy_pass http://upstream:8080/ (trailing slash), so basically only when the path is adjusted. "/" is a special case, but the same applies to /mounted/.)
Given that, I could understand servers unquoting an adjusted root_path, and maybe path as well, but the path should be available at the root level in its original form - this would also be useful for mounted apps.

andrewgodwin commented on July 19, 2024

No, the main problem is that some servers auto-unquote for you - thus the lowest common denominator is actually the unquoted path. We went through a lot of revisions of path in the ASGI spec to get to this - I tried to keep it "raw", but there just kept being server bugs around things coming in already-unquoted.

tomchristie commented on July 19, 2024

> but there just kept being server bugs around things coming in already-unquoted.

For context are we talking about "server bugs" at the Python layer there, or does that appear to include behavior of HTTP intermediaries in the wild? Eg. fronting proxies, or whatevs.

Presumably some client implementations may also end up sending unquoted values.

tomchristie commented on July 19, 2024

Answering my own question by reference to #51:

> The path is subject to re-interpretation by both reverse-proxying webservers and downstream proxies

blueyed commented on July 19, 2024

> No, the main problem is that some servers auto-unquote for you - thus the lowest common denominator is actually the unquoted path

I still do not get why this is an argument: since other servers might unquote it already (or later), it gets unquoted by ASGI servers already?

(This also assumes that it would get unquoted only once by misbehaving upstreams - it still would not reflect what two misbehaving upstreams would do / produce)

I am setting / using a custom header via nginx now myself, but without having control of that you would have to do ugly things like rendering "%252F" if you want to get back "%2F" etc. - assuming that only the ASGI server unquotes it, and only once.
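
The "%252F" trick mentioned above can be sketched with the standard library as follows. The helper name is hypothetical, and the sketch assumes exactly one layer between client and app performs a single percent-decoding pass.

```python
from urllib.parse import quote, unquote


def double_quote_segment(segment: str) -> str:
    """Percent-encode a path segment twice, so one decode pass leaves it encoded."""
    once = quote(segment, safe="")   # "/" becomes "%2F"
    return quote(once, safe="")      # "%" becomes "%25", giving "%252F"


sent = double_quote_segment("users/logo")

# What the app sees after exactly one decoding pass (the assumption above).
after_one_pass = unquote(sent)

# A second decoding layer would collapse the slash again.
after_two_passes = unquote(after_one_pass)
```

The client has to know in advance how many decoding passes will happen on the way, which is the fragility pointed out above.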

davidism commented on July 19, 2024

The only thing the servers are unquoting is %2F, so the client doesn't need to do further escaping to support multiple proxies. The problem is that we don't know what the servers are doing. If they're not unquoting, there's no problem, you'll get the full value. If they are unquoting, and most users are not aware of what different servers' default behaviors are, then depending on your routing, you'll get 404s or different endpoints.

So it's safer to have everyone design to the more restrictive behavior, because then their routes won't break unexpectedly moving from development to production. Again, if you control the whole deploy and can ensure that the URLs will be sent as you expect, you can use a middleware to swap to the raw value, assuming the ASGI server you use captures it.

blueyed commented on July 19, 2024

> The only thing the servers are unquoting is %2F

No, every percent-encode is decoded, including e.g. %20 for spaces - that's what urllib.parse.unquote is doing, and at least uvicorn and hypercorn use this.
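
A quick standard-library check of what `urllib.parse.unquote` does to a path, and of why the result cannot be mapped back to the original. The helper name is illustrative; how a given ASGI server calls `unquote` internally is not shown here.

```python
from urllib.parse import quote, unquote


def decode_then_reencode(raw_path: str) -> tuple[str, str]:
    """Return the decoded path and the result of naively re-encoding it."""
    decoded = unquote(raw_path)   # decodes every %XX escape, not just %2F
    reencoded = quote(decoded)    # "/" is treated as safe, so %2F is not restored
    return decoded, reencoded


decoded, reencoded = decode_then_reencode("/a%20b/c%2Fd/%C3%A9")
```

The space and the non-ASCII character survive the round trip as `%20` and `%C3%A9`, but the encoded slash comes back as a plain `/`, so the original path cannot be reconstructed from the decoded one.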

With regard to different routes during development / in production: I think it can be assumed that development typically means fewer servers being involved, and often only the ASGI server itself.
Unfortunately the unquoting happens there already now - therefore "surprises" start there already.

If this would not happen, the worst case would be that a client requests "foo%2Fbar", which a misbehaving server would turn into "foo/bar" already, and then an endpoint "foo/bar" would be called, although not requested. But that happens now already because of the unquoting through the ASGI server.
Without any unquoting a request for "foo%2Fbar" would not be routed to "foo/bar" (a 404), but that is OK, isn't it?

> if you control the whole deploy and can ensure that the URLs will be sent as you expect, you can use a middleware to swap to the raw value, assuming the ASGI server you use captures it.

The ASGI servers do not capture them, because it's not in the spec.. ;)

I can see that it might be too late or unwanted to change the behavior for "path", but there are certainly use cases for having a raw_path, so that middlewares could swap to the raw value etc.

andrewgodwin commented on July 19, 2024

I agree with having an optional raw_path key in the scope, which is a bytestring of what is received on the wire. Adding an optional extra field will also not need a version bump.

If someone wants to write up the spec change, I'm happy to review it; otherwise, I will draft one when I have the Django async stuff more in line.

simonw commented on July 19, 2024

I'd love to see an optional (for servers that can support it) raw_path bytes key added to the spec.

For Datasette I've resigned myself to using an alternative scheme, but the argument above about supporting proxies really resonates with me - ASGI feels like a great fit for writing simple proxies.
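
For a proxy-style app, consuming the optional key discussed above might look like the sketch below. It assumes `raw_path` is a bytes value that may be absent or `None`, as proposed in this thread; the function name is hypothetical, and the fallback of re-quoting `path` is only an approximation because an encoded slash has already been lost at that point.

```python
from urllib.parse import quote


def path_for_upstream(scope: dict) -> str:
    """Pick the path to forward upstream, preferring the raw bytes if present."""
    raw_path = scope.get("raw_path")
    if raw_path is not None:
        # latin-1 maps every byte to one character, so this never raises.
        return raw_path.decode("latin-1")
    # Fallback: re-encode the decoded path; %2F cannot be recovered here.
    return quote(scope["path"])


with_raw = path_for_upstream(
    {"type": "http", "path": "/o/users/logo", "raw_path": b"/o/users%2Flogo"}
)
without_raw = path_for_upstream({"type": "http", "path": "/o/users/logo"})
```

When the server supplies `raw_path`, the encoded slash is forwarded unchanged; without it, the proxy can only forward the already-decoded form.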

simonw commented on July 19, 2024

I had a go at adding this to the spec: https://github.com/django/asgiref/pull/92/files

simonw commented on July 19, 2024

I think we can close this now.
