Comments (37)
Possibly a tangent, but if I were to pick the naming, I would use req.target
to mean "what the client provided literally" and req.uri
to mean "Resolved (full, absolute) URI"
And if I were modifying the Node.js API, I would deprecate req.url
or alias it to req.target
instead.
from web-server-frameworks.
It makes sense to me to offer both (obviously under different names) - ie, req.url
as the string, and req.getURL()
as the lazily-computed-but-memoized URL object.
from web-server-frameworks.
I think what'd need to make this work is: URL.from({ host, path, protocol })
. In this way we can skip the non-needed parts of the algorithm that are really expensive.
Essentially pushing forward the use of the native URL
object requires some investigation on how to make its creation more performant.
from web-server-frameworks.
I'm still generically -1 to the idea of using
new URL(`${protocol}://${host}${path}`) // simplified
in a hot path by default.
It creates unnecessary computation by generating a string that is then parsed. It also creates a few objects which are slow to create as well (UrlSearchParams).
A similar discussion can be made for the Headers
object of fetch and the relative standards.
None of this was defined with throughput in mind. If we want Node.js to shine, it needs to be faster, not slower.
from web-server-frameworks.
could we do something like req.url
returns the faster solution and then req.url()
would return the new URL()
version? This way users have to opt-in to the performance decrease if they want it?
Additionally, could we look into improving host and protocol parsing
? Or is has that already been attempted?
from web-server-frameworks.
https://github.com/nodejs/next-10/blob/master/VALUES_AND_PRIORITIZAION.md might also be worth considering, i.e. "Performance" vs "Web API compatibility".
from web-server-frameworks.
I hadn't clicked your link, but yeah this looks good. I will dig into what it is doing, thanks for the link!
from web-server-frameworks.
Overall, I think providing a way to retrieve req.url
as a URL would be a great addition to Node core because the URL object is the future, url.parse()
is a legacy of the past, before there was a standard.
Using standard JS URL
objects allows code to be shared between the browser, Node.js, etc.
One of the big foot-guns today is parsing url query string params. Consider this code:
const { parse } = require('url');
const handler = (req, res) => {
let { pathname = '/', query = {} } = parse(req.url || '', true);
const { user = '' } = query;
console.log(user.toLowerCase());
}
This works fine with https://example.com/api?user=foo
but throws when two parameters with the same name are detected https://example.com/api?user=foo&user=bar
because user
suddenly becomes an array instead of a string.
Consider this similar code using URL:
const handler = (req, res) => {
const url = new URL(req.url, 'https://example.com');
const user = url.searchParams.get('user') || '';
console.log(user.toLowerCase());
}
This bug doesn't happen with URL because it separates getAll()
vs get()
. Now we won't crash in production.
My workaround for the time being looks like this:
export function getURL(req) {
const proto = req.headers['x-forwarded-proto'] || 'https';
const host = req.headers['x-forwarded-host'] || req.headers.host || 'example.com';
return new URL(req.url || '/', `${proto}://${host}`);
}
from web-server-frameworks.
Modifying req.url to return a URL object instead of a string would be a breaking change and it doesn't seem worth the churn to the ecosystem.
I don't think this is an issue if we are discussing a whole new API
from web-server-frameworks.
And what is wrong with using:
Nothing wrong, just an extra step and technically it would need to be some combination of this and the following (I have not tested all of this so it might not be completely accurate, and there might be other edge cases I am missing because URL
does not parse relative urls):
const addr = server.address()
const proto = getProto(req) /* as I mentioned, would need the above linked handling for proto */
const url = new URL(req.url, req.headers.host || `${proto}//${addr.address}`);
I think the more this conversation goes on we need more clarity on what the use case are. I think that most cases will be that only the pathname is used and that might lean more toward just telling users to parse it themselves if necessary.
from web-server-frameworks.
I see. I've been bouncing around different issues and all seem to be dead ends.
from web-server-frameworks.
@mcollina I'm not sure that solves the problem of parsing req.url into some object. Would you be able to URL.from(req.url) or even URL.from(req)?
I think URL.from(req)
would be fantastic. We probably can't shipt that API, but possibly http.getURL(req)
, which would be ok. Also req.getURL()
might work.
from web-server-frameworks.
would you like to send a PR for this?
Looks like someone already did: watson/original-url#11
from web-server-frameworks.
I'm highly -1 on using new URL()
for req.url
. It's extremely slow and inefficient due to the host and protocol parsing.
from web-server-frameworks.
I usually use, https://www.npmjs.com/package/request-target. Maybe worth looking into?
from web-server-frameworks.
This is why I was strongly questioning adding that (nodejs/node#35323) in yesterdays meeting.
The perf is a big concern I was not aware of (do we have benchmarks on this, or details on why?), but is also a good reason to push back on the web api compat technical value issue.
from web-server-frameworks.
do we have benchmarks on this, or details on why and what parts are slow?
From https://www.npmjs.com/package/request-target
$ npm run benchmark
legacy url.parse() x 371,681 ops/sec ±0.88% (297996 samples)
whatwg new URL() x 58,766 ops/sec ±0.3% (118234 samples)
request-target x 552,748 ops/sec ±0.54% (344809 samples)
from web-server-frameworks.
Well I very quickly ended on this line, and I am fairly confident we will be able to find a better (and even better perf) way than a big regex.
But this leads me to my next question: do we really want to deviate from the existing URL apis?
Seems like this would add "yet another url" to node core, right? Maybe we could get fast parsing but also expose an api which looks like a URL
instance, but this gets us into a similar situation of "its a stream but not". Anyway, just thinking out loud on this.
from web-server-frameworks.
It's extremely slow and inefficient due to the host and protocol parsing.
Is the slow performance of new URL()
inherent to the way its spec'ed or just the Node.js implementation?
If its the latter, then this would be a good excuse to re-evaluate the implementation to squeeze out better performance.
from web-server-frameworks.
It makes sense to me to offer both
I think before we make any decisions on this, we need to know if we can help fix the perf issues of URL
. If we cannot do that, then I think we should consider alternate api's to expose both. If not, I think one api is best.
My workaround for the time being looks like this:
@styfle There are security concerns here, but if we "do it right" I think it is a reasonable approach (if you know those headers can only be set by trusted proxies, you should be fine). The added support for the meta headers would also change this a bit, but not in a bad way.
Also, we would default them to the values computed from server.address()
I think.
from web-server-frameworks.
It makes sense to me to offer both (obviously under different names) - ie,
req.url
as the string, andreq.getURL()
as the lazily-computed-but-memoized URL object.
Why both when there's URL::href
?
from web-server-frameworks.
Modifying req.url
to return a URL object instead of a string would be a breaking change and it doesn't seem worth the churn to the ecosystem.
from web-server-frameworks.
Maybe I misunderstood the intent of this discussion 🤔
I thought req
in the original post was referring to the IncomingMessage class, the first parameter to createServer listener.
If you modify the url
property on IncomingMessage, then it would be a breaking change, right?
from web-server-frameworks.
Maybe I misunderstood the intent of this discussion
This discussion is related to this: #55
Because the entire api will be new, breaking is fine.
I would use req.target to mean "what the client provided literally" and req.uri to mean "Resolved (full, absolute) URI"
While I see the goal of this naming, I am strongly against bikeshedding names (at least at this point). I also do not think this would be a meaningful change, nor one which would make it easier/better for any of our constituencies. Once we nail down the particulars of what we need to offer on this api, we can discuss again if name changes will be something to pursue, but lets keep on focus for now.
from web-server-frameworks.
To recap our discussion on the meeting today:
What would folks think of providing an api at server creation time to pass your url implementation. By default we would create a URL
instance, but if you passed a function to the createServer
call your function would be called instead. This would be similar to the existing api where you can pass IncomingMessage
and ServerResponse
. So you could pass URL
, for example. Thoughts?
from web-server-frameworks.
@mcollina "better" is better than "faster" too.
from web-server-frameworks.
Agree with @mcollina. Anything that affects performance should be opt-in. In this particular case, if I don't need to parse the URL in most of my requests, why would I pay the price on all of them?
from web-server-frameworks.
The plan which @ronag and I discussed in this gist involves a multi-layer api. As a compromise, if we kept the "core" api with the string req.url
, would it be acceptable to have the higher level api use the URL
constructor? Since the higher level api is intended to usable directly (as in not via a framework) it would be preferable to me so we don't introduce "yet another" url implementation.
from web-server-frameworks.
Can you elaborate on:
Since the higher level api is intended to usable directly
and
we don't introduce "yet another" url implementation
from web-server-frameworks.
Since the higher level api is intended to usable directly
I think we should probably agree on which APIs have which intended constituencies. @ronag outlined a bit of what the goals of the different undici
apis are, and the layering which he does enables really low level usage, but also has "direct end user api's" which are high level. The goal was to take a similar approach (see the table in the comment linked above). What we don't yet have is a clear statement on the target usage of the layers.
we don't introduce "yet another" url implementation
One of the proposals above was to utilize a specialty parser like request-target
for the url. My concern with this is we already have 2 implementations, this would add a third.
from web-server-frameworks.
I think we should probably agree on which APIs have which intended constituencies. @ronag outlined a bit of what the goals of the different undici apis are, and the layering which he does enables really low level usage, but also has "direct end user api's" which are high level. The goal was to take a similar approach (see the table in the comment linked above). What we don't yet have is a clear statement on the target usage of the layers.
And what is wrong with using:
const url = new URL(req.url);
Seems simple enough if a user wants to parse the URL and can also be provided by the frameworks if they must...
One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.
agree with you and I believe this gives an opportunity to let users (or frameworks) chose how they want to parse a URL. As you suggested, we can provide a function to createServer
that tells the server how to parse the URL. If not provided, just returns a string by default:
http.createServer({
allowHTTP1: true,
parseUrl(stringUrl) {
return new URL(stringUrl);
}
})
from web-server-frameworks.
One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.
Agreed, that will likely lead to more confusion.
I think the more this conversation goes on we need more clarity on what the use case are. I think that most cases will be that only the pathname is used and that might lean more toward just telling users to parse it themselves if necessary.
@wesleytodd In addition to pathname, I think query string is quite common too.
I think what'd need to make this work is:
URL.from({ host, path, protocol })
. In this way we can skip the non-needed parts of the algorithm that are really expensive.
@mcollina I'm not sure that solves the problem of parsing req.url
into some object. Would you be able to URL.from(req.url)
or even URL.from(req)
?
from web-server-frameworks.
thanks for your message @styfle ! But this group isn't really active anymore (last two years I would say)
from web-server-frameworks.
possibly
http.getURL(req)
, which would be ok. Alsoreq.getURL()
might work.
Either option sounds great!
We could use original-url
as a starting point (or even vendor it into node core)
from web-server-frameworks.
original-url
uses the unsafe require('url').parse
method. I don't think anybody should use that implementation as there are multiple security problems with it. The best approach would be to pass via new URL()
or analog.
from web-server-frameworks.
would you like to send a PR for this?
from web-server-frameworks.
How is url.parse
unsafe? If the incoming request conforms to the generic URI syntax (which obviously should be verified, as you always do for untrusted user input, right?), there's no reason there should be problems.
from web-server-frameworks.
Related Issues (20)
- Node.js Foundation Web Server Team Meeting 2020-10-15 HOT 6
- Lets get coding! HOT 4
- Node.js Foundation Web Server Team Meeting 2020-10-29 HOT 6
- Node.js Foundation Web Server Team Meeting 2020-11-12 HOT 8
- Node.js Foundation Web Server Team Meeting 2020-11-26 HOT 2
- Node.js Foundation Web Server Team Meeting 2020-12-10 HOT 8
- Node.js Foundation Web Server Team Meeting 2020-12-24 HOT 7
- issues with meeting calendar / schedule for WG meetings HOT 8
- Node.js Foundation Web Server Team Meeting 2021-01-07 HOT 2
- Node.js Foundation Web Server Team Meeting 2021-01-21 HOT 5
- Node.js Foundation Web Server Team Meeting 2021-02-04
- Node.js Foundation Web Server Team Meeting 2021-02-18
- Node.js Foundation Web Server Team Meeting 2021-03-04 HOT 5
- Node.js Foundation Web Server Team Meeting 2021-03-18
- Node.js Foundation Web Server Team Meeting 2021-04-01 HOT 1
- Node.js Foundation Web Server Team Meeting 2021-04-15
- Node.js Foundation Web Server Team Meeting 2021-04-29 HOT 2
- Revive the WG HOT 25
- Cookies HOT 19
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from web-server-frameworks.