We already looked at this in #51 - I am curious whether there is a use for the raw path other than those discussed there. Specifically, while WSGI claims to give you the raw path, this is not always actually true.
from asgiref.
Oh, sorry for missing #51. I've only skimmed this quickly - you said there:
if we provided a "raw" one then people might rely on it only for it to break later on when a webserver in front did exactly the same thing (webservers: not good at adhering perfectly to the URI RFC)
So because others might break it, it gets broken upfront? :)
It basically means that any proxy using ASGI is forced into the group of "webservers: not good at adhering perfectly to the URI RFC" then.
I can see that it is probably too late to change "path" now, but something like raw_path should be added.
I currently have to pass it in by setting a custom header in nginx and then updating request.url from there. This works for me, but it really should not be necessary.
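The nginx workaround described above can be sketched on the ASGI side as follows. This is a minimal sketch, assuming the proxy forwards the original, still-encoded request target in a header; the header name x-original-uri and the helper original_uri are hypothetical. In nginx, such a header could be filled with something like proxy_set_header X-Original-URI $request_uri. ASGI delivers headers as [name, value] byte-string pairs with lowercased names.

```python
from typing import Any, Dict, Optional

# Hypothetical header name; it must match whatever the fronting proxy sets.
ORIGINAL_URI_HEADER = b"x-original-uri"


def original_uri(scope: Dict[str, Any]) -> Optional[bytes]:
    """Return the still-encoded request target forwarded by the proxy, if any."""
    # ASGI header names are lowercased byte strings, so compare bytes directly.
    for name, value in scope.get("headers", []):
        if name == ORIGINAL_URI_HEADER:
            return value
    return None


scope = {
    "type": "http",
    "path": "/o/users/abc",  # already percent-decoded by the ASGI server
    "headers": [
        (b"host", b"example.com"),
        (b"x-original-uri", b"/o/users%2Fabc?alt=media"),
    ],
}
print(original_uri(scope))  # b'/o/users%2Fabc?alt=media'
```

Note that $request_uri includes the query string, and that a client could send this header itself unless the proxy always overwrites it.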
It's more that this falls into the bucket of "well, it's useful for 1% of people", and I'm not personally convinced the job of a spec is to cover all those corner cases - e.g. ASGI also doesn't allow for certain kinds of responses that are valid in HTTP.
Is there an app you're developing that is not low-level HTTP but does need the raw path? If I may ask, what specifically about the path do you need?
I think it is more like "99% do not notice / are not affected".
The app I've noticed this with is a proxy, after all. :)
It is used, for example, with https://firebasestorage.googleapis.com/v0/b/noticeable-service.appspot.com/o/users%2F9s6zABHvXDeoEK3Pk0G3Z7gro513%2Fprojects%2FbYyIewUV308AvkMztxix%2Flogo?alt=media&token=9ad915fd-fd45-46fe-a82c-0d581d3113d8 - with ASGI, "users%2F" gets turned into "users/".
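To illustrate the loss described above: once a path has been percent-decoded, an encoded slash and a literal slash are indistinguishable. This sketch uses Python's urllib.parse on a shortened, made-up path modeled on that URL; it shows why a proxy that only receives the decoded path cannot rebuild the original request.

```python
from urllib.parse import quote, unquote

# Two different request paths a client might send (shortened, made up).
encoded_slash = "/v0/b/bucket/o/users%2Fabc%2Flogo"
literal_slash = "/v0/b/bucket/o/users/abc/logo"

# Decoding maps both to the same string, so whoever only sees the
# decoded path cannot tell which one was requested.
print(unquote(encoded_slash) == unquote(literal_slash))  # True

# Re-encoding does not help: "/" is in quote()'s default safe set.
print(quote(unquote(encoded_slash)))  # /v0/b/bucket/o/users/abc/logo
```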
From what I remember of the PyCon discussion that led to the original decision, we basically can't standardize this because we can't control what web servers return. So we have to standardize on the lowest common denominator, which is slash unquoting. ASGI servers can choose to send a raw URL as an extension, but there's no guarantee it will be any more accurate. Ultimately the developer would need to configure their server correctly and then opt in to the raw URL.
Right, that's pretty much what I was getting at, thanks @davidism. I'd be open to defining this as an optional extra, but only if we think a decent number of servers are going to be able to supply it correctly, and with the caveat being "make it correct or don't supply it at all".
Wouldn't the lowest common denominator be to not touch it at all, followed by turning it into Unicode, and only then applying more processing (i.e. slash unquoting)?
(For example, nginx will not touch it with proxy_pass http://upstream:8080, but it will with proxy_pass http://upstream:8080/ (trailing slash), so basically only when the path is adjusted. "/" is a special case, but this also applies to /mounted/.)
Given that, I could understand if servers unquoted an adjusted root_path, and maybe "path" there as well, but the original form should still be available at the root level. This would also be useful for mounted apps.
No, the main problem is that some servers auto-unquote for you - thus the lowest common denominator is actually the unquoted path. We went through a lot of revisions of "path" in the ASGI spec to get to this - I tried to keep it "raw", but there just kept being server bugs around things coming in already-unquoted.
but there just kept being server bugs around things coming in already-unquoted.
For context, are we talking about "server bugs" at the Python layer there, or does that also include the behavior of HTTP intermediaries in the wild? E.g. fronting proxies, or whatevs.
Presumably some client implementations may also end up sending unquoted values.
Answering my own question by reference to #51
The path is subject to re-interpretation by both reverse-proxying webservers and downstream proxies
No, the main problem is that some servers auto-unquote for you - thus the lowest common denominator is actually the unquoted path
I still do not see why this is an argument: because other servers might unquote it already (or later), ASGI servers unquote it too?
(This also assumes that misbehaving upstreams would unquote it only once; it still would not reflect what two misbehaving upstreams would produce.)
I am setting up and using a custom server via nginx myself now, but without that kind of control you would have to do ugly things like rendering "%252F" if you want to get back "%2F", etc., and that assumes only the ASGI server unquotes it once.
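The double-encoding workaround works because each decoding step consumes one layer of escaping: "%25" decodes to a literal "%". A minimal sketch with urllib.parse.unquote on a made-up path segment:

```python
from urllib.parse import unquote

# To get "%2F" through one decoding step, the "%" itself is encoded as "%25".
double_encoded = "users%252Fabc"

once = unquote(double_encoded)  # what the app sees if exactly one hop decodes
twice = unquote(once)           # what it sees if two hops decode

print(once)   # users%2Fabc
print(twice)  # users/abc
```

This is also why the workaround is fragile: the client has to know in advance how many hops will decode the path.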
The only thing the servers are unquoting is %2F; the client doesn't need to do further escaping to support multiple proxies. The problem is that we don't know what the servers are doing. If they're not unquoting, there's no problem: you'll get the full value. If they are unquoting, and most users are not aware of different servers' default behaviors, then depending on your routing you'll get 404s or different endpoints.
So it's safer to have everyone design to the more restrictive behavior, because then their routes won't break unexpectedly moving from development to production. Again, if you control the whole deploy and can ensure that the URLs will be sent as you expect, you can use a middleware to swap to the raw value, assuming the ASGI server you use captures it.
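A middleware like the one mentioned above might look roughly like this. It is a sketch under two assumptions: the server puts the undecoded path bytes in scope["raw_path"] (which, as noted in this thread, the spec did not define at the time), and you control the whole deployment. RawPathMiddleware is a hypothetical name, and placing an encoded string in "path" deliberately departs from the spec's decoded "path", so only apps written for that convention should sit behind it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Scope = Dict[str, Any]
Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class RawPathMiddleware:
    """Hypothetical middleware: hand the inner app the undecoded path."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        raw = scope.get("raw_path")
        if scope["type"] in ("http", "websocket") and raw is not None:
            # Copy instead of mutating the server's scope dict.
            scope = dict(scope)
            # latin-1 maps every byte to one code point, so decoding cannot fail.
            scope["path"] = raw.decode("latin-1")
        # Without raw_path, the scope is passed through unchanged.
        await self.app(scope, receive, send)
```

Usage is a plain wrap, app = RawPathMiddleware(app); when a server does not supply raw_path, the inner app simply sees the usual decoded "path".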
The only thing the servers are unquoting is %2F
No, every percent-encoded sequence is decoded, including e.g. %20 for spaces - that's what urllib.parse.unquote does, and at least uvicorn and hypercorn use it.
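The difference being argued here can be shown with the standard library. This sketch contrasts urllib.parse.unquote, which the commenter reports uvicorn and hypercorn use, with a hypothetical decoder that only turns encoded slashes into "/"; the path is made up.

```python
from urllib.parse import unquote

raw = "/docs/my%20file%2Fv2"

# urllib.parse.unquote decodes every percent-escape, not only %2F.
full = unquote(raw)

# For contrast, a hypothetical decoder that only touches encoded slashes.
slash_only = raw.replace("%2F", "/").replace("%2f", "/")

print(full)        # /docs/my file/v2
print(slash_only)  # /docs/my%20file/v2
```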
With regard to different routes in development and production: I think it can be assumed that development typically means fewer servers being involved, often only the ASGI server itself.
Unfortunately, the unquoting already happens there, so the "surprises" already start in development.
If it did not happen, the worst case would be that a client requests "foo%2Fbar", a misbehaving server turns that into "foo/bar", and the endpoint "foo/bar" gets called although it was not requested. But that already happens now because of the unquoting in the ASGI server.
Without any unquoting, a request for "foo%2Fbar" would not be routed to "foo/bar" (a 404), but that is OK, isn't it?
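The routing argument above can be made concrete with a toy router. This is a sketch only: the route table, match, and the two segment helpers are hypothetical, not any framework's API. Splitting on literal slashes before decoding keeps "foo%2Fbar" as one segment, whereas splitting an already-decoded path sends the request to an endpoint the client did not ask for.

```python
from typing import List, Optional
from urllib.parse import unquote


def match(segments: List[str]) -> Optional[str]:
    # Hypothetical route table with two endpoints.
    if segments == ["files", "foo", "bar"]:
        return "literal /files/foo/bar endpoint"
    if len(segments) == 2 and segments[0] == "files":
        return f"file named {segments[1]!r}"
    return None  # no match: would be a 404


def segments_from_raw(raw_path: str) -> List[str]:
    # Split on literal "/" first, then decode each segment,
    # so an encoded "%2F" stays inside its own segment.
    return [unquote(s) for s in raw_path.strip("/").split("/")]


def segments_from_decoded(path: str) -> List[str]:
    # The path was already unquoted, so "%2F" is now a real separator.
    return path.strip("/").split("/")


raw = "/files/foo%2Fbar"
print(match(segments_from_raw(raw)))               # file named 'foo/bar'
print(match(segments_from_decoded(unquote(raw))))  # literal /files/foo/bar endpoint
```

Removing the literal route from match makes the decoded variant return None instead, which is the 404 case.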
if you control the whole deploy and can ensure that the URLs will be sent as you expect, you can use a middleware to swap to the raw value, assuming the ASGI server you use captures it.
The ASGI servers do not capture it, because it's not in the spec. ;)
I can see that it might be too late, or unwanted, to change the behavior of "path", but there are certainly use cases for having a raw_path, so that middlewares could swap to the raw value, etc.
I agree with having an optional raw_path key in the scope, which is a byte string of what is received on the wire. Adding an optional extra field will also not need a version bump.
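For a proxy, the point of a raw_path key is that the upstream request target can be rebuilt byte for byte. A minimal sketch, assuming raw_path holds the undecoded path without the query string and, since the key is optional, falling back to re-encoding the decoded path when a server omits it; upstream_target is a hypothetical helper name. query_string is the existing ASGI scope key holding the still-encoded query bytes.

```python
from typing import Any, Dict
from urllib.parse import quote


def upstream_target(scope: Dict[str, Any]) -> bytes:
    """Build the request target to forward upstream (hypothetical helper)."""
    raw = scope.get("raw_path")
    if raw is None:
        # Lossy fallback: any "%2F" the client sent is already "/" in "path".
        raw = quote(scope["path"], safe="/").encode("ascii")
    query = scope.get("query_string", b"")
    if query:
        return raw + b"?" + query
    return raw


print(upstream_target({"path": "/o/users/abc", "raw_path": b"/o/users%2Fabc", "query_string": b"alt=media"}))
# b'/o/users%2Fabc?alt=media'
print(upstream_target({"path": "/o/users/abc", "query_string": b""}))
# b'/o/users/abc'
```

The fallback is lossy by design: an original %2F has already become "/" in "path" and cannot be recovered.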
If someone wants to write up the spec change, I'm happy to review it; otherwise, I will draft one when I have the Django async stuff more in line.
I'd love to see an optional (for servers that can support it) raw_path bytes key added to the spec.
For Datasette I've resigned myself to using an alternative scheme, but the argument above about supporting proxies really resonates with me - ASGI feels like a great fit for writing simple proxies.
I had a go at adding this to the spec: https://github.com/django/asgiref/pull/92/files
I think we can close this now.