Giter Club home page Giter Club logo

Comments (37)

awwright avatar awwright commented on June 2, 2024 5

Possibly a tangent, but if I were to pick the naming, I would use req.target to mean "what the client provided literally" and req.uri to mean "Resolved (full, absolute) URI"

And if I were modifying the Node.js API, I would deprecate req.url or alias it to req.target instead.

from web-server-frameworks.

ljharb avatar ljharb commented on June 2, 2024 4

It makes sense to me to offer both (obviously under different names) - ie, req.url as the string, and req.getURL() as the lazily-computed-but-memoized URL object.

from web-server-frameworks.

mcollina avatar mcollina commented on June 2, 2024 4

I think what'd need to make this work is: URL.from({ host, path, protocol }). In this way we can skip the non-needed parts of the algorithm that are really expensive.

Essentially pushing forward the use of the native URL object requires some investigation on how to make its creation more performant.

from web-server-frameworks.

mcollina avatar mcollina commented on June 2, 2024 3

I'm still generically -1 to the idea of using

new URL(`${protocol}://${host}${path}`) // simplified

in a hot path by default.
It creates unnecessary computation by generating a string that is then parsed. It also creates a few objects which are slow to create as well (UrlSearchParams).

A similar discussion can be made for the Headers object of fetch and the relative standards.

None of this was defined with throughput in mind. If we want Node.js to shine, it needs to be faster, not slower.

from web-server-frameworks.

Ethan-Arrowood avatar Ethan-Arrowood commented on June 2, 2024 2

could we do something like req.url returns the faster solution and then req.url() would return the new URL() version? This way users have to opt-in to the performance decrease if they want it?

Additionally, could we look into improving host and protocol parsing? Or is has that already been attempted?

from web-server-frameworks.

ronag avatar ronag commented on June 2, 2024 1

https://github.com/nodejs/next-10/blob/master/VALUES_AND_PRIORITIZAION.md might also be worth considering, i.e. "Performance" vs "Web API compatibility".

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024 1

I hadn't clicked your link, but yeah this looks good. I will dig into what it is doing, thanks for the link!

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024 1

Overall, I think providing a way to retrieve req.url as a URL would be a great addition to Node core because the URL object is the future, url.parse() is a legacy of the past, before there was a standard.

Using standard JS URL objects allows code to be shared between the browser, Node.js, etc.

One of the big foot-guns today is parsing url query string params. Consider this code:

const { parse } = require('url');
const handler = (req, res) => {
  let { pathname = '/', query = {} } = parse(req.url || '', true);
  const { user = '' } = query;
  console.log(user.toLowerCase());
}

This works fine with https://example.com/api?user=foo but throws when two parameters with the same name are detected https://example.com/api?user=foo&user=bar because user suddenly becomes an array instead of a string.

Consider this similar code using URL:

const handler = (req, res) => {
  const url = new URL(req.url, 'https://example.com');
  const user = url.searchParams.get('user') || '';
  console.log(user.toLowerCase());
}

This bug doesn't happen with URL because it separates getAll() vs get(). Now we won't crash in production.

My workaround for the time being looks like this:

export function getURL(req) {
  const proto = req.headers['x-forwarded-proto'] || 'https';
  const host = req.headers['x-forwarded-host'] || req.headers.host || 'example.com';
  return new URL(req.url || '/', `${proto}://${host}`);
}

from web-server-frameworks.

ghermeto avatar ghermeto commented on June 2, 2024 1

Modifying req.url to return a URL object instead of a string would be a breaking change and it doesn't seem worth the churn to the ecosystem.

I don't think this is an issue if we are discussing a whole new API

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024 1

And what is wrong with using:

Nothing wrong, just an extra step and technically it would need to be some combination of this and the following (I have not tested all of this so it might not be completely accurate, and there might be other edge cases I am missing because URL does not parse relative urls):

const addr = server.address()
const proto = getProto(req) /* as I mentioned, would need the above linked handling for proto */
const url = new URL(req.url, req.headers.host || `${proto}//${addr.address}`);

I think the more this conversation goes on we need more clarity on what the use case are. I think that most cases will be that only the pathname is used and that might lean more toward just telling users to parse it themselves if necessary.

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024 1

I see. I've been bouncing around different issues and all seem to be dead ends.

from web-server-frameworks.

mcollina avatar mcollina commented on June 2, 2024 1

@mcollina I'm not sure that solves the problem of parsing req.url into some object. Would you be able to URL.from(req.url) or even URL.from(req)?

I think URL.from(req) would be fantastic. We probably can't shipt that API, but possibly http.getURL(req), which would be ok. Also req.getURL() might work.

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024 1

would you like to send a PR for this?

Looks like someone already did: watson/original-url#11

from web-server-frameworks.

mcollina avatar mcollina commented on June 2, 2024

I'm highly -1 on using new URL() for req.url. It's extremely slow and inefficient due to the host and protocol parsing.

from web-server-frameworks.

ronag avatar ronag commented on June 2, 2024

I usually use, https://www.npmjs.com/package/request-target. Maybe worth looking into?

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024

This is why I was strongly questioning adding that (nodejs/node#35323) in yesterdays meeting.

The perf is a big concern I was not aware of (do we have benchmarks on this, or details on why?), but is also a good reason to push back on the web api compat technical value issue.

from web-server-frameworks.

ronag avatar ronag commented on June 2, 2024

do we have benchmarks on this, or details on why and what parts are slow?

From https://www.npmjs.com/package/request-target

$ npm run benchmark
legacy url.parse() x 371,681 ops/sec ±0.88% (297996 samples)
whatwg new URL() x 58,766 ops/sec ±0.3% (118234 samples)
request-target x 552,748 ops/sec ±0.54% (344809 samples)

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024

Well I very quickly ended on this line, and I am fairly confident we will be able to find a better (and even better perf) way than a big regex.

But this leads me to my next question: do we really want to deviate from the existing URL apis?

Seems like this would add "yet another url" to node core, right? Maybe we could get fast parsing but also expose an api which looks like a URL instance, but this gets us into a similar situation of "its a stream but not". Anyway, just thinking out loud on this.

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024

It's extremely slow and inefficient due to the host and protocol parsing.

Is the slow performance of new URL() inherent to the way its spec'ed or just the Node.js implementation?

If its the latter, then this would be a good excuse to re-evaluate the implementation to squeeze out better performance.

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024

It makes sense to me to offer both

I think before we make any decisions on this, we need to know if we can help fix the perf issues of URL. If we cannot do that, then I think we should consider alternate api's to expose both. If not, I think one api is best.

My workaround for the time being looks like this:

@styfle There are security concerns here, but if we "do it right" I think it is a reasonable approach (if you know those headers can only be set by trusted proxies, you should be fine). The added support for the meta headers would also change this a bit, but not in a bad way.

Also, we would default them to the values computed from server.address() I think.

from web-server-frameworks.

stevenvachon avatar stevenvachon commented on June 2, 2024

It makes sense to me to offer both (obviously under different names) - ie, req.url as the string, and req.getURL() as the lazily-computed-but-memoized URL object.

Why both when there's URL::href?

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024

Modifying req.url to return a URL object instead of a string would be a breaking change and it doesn't seem worth the churn to the ecosystem.

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024

Maybe I misunderstood the intent of this discussion 🤔

I thought req in the original post was referring to the IncomingMessage class, the first parameter to createServer listener.

If you modify the url property on IncomingMessage, then it would be a breaking change, right?

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024

Maybe I misunderstood the intent of this discussion

This discussion is related to this: #55

Because the entire api will be new, breaking is fine.

I would use req.target to mean "what the client provided literally" and req.uri to mean "Resolved (full, absolute) URI"

While I see the goal of this naming, I am strongly against bikeshedding names (at least at this point). I also do not think this would be a meaningful change, nor one which would make it easier/better for any of our constituencies. Once we nail down the particulars of what we need to offer on this api, we can discuss again if name changes will be something to pursue, but lets keep on focus for now.

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024

To recap our discussion on the meeting today:

What would folks think of providing an api at server creation time to pass your url implementation. By default we would create a URL instance, but if you passed a function to the createServer call your function would be called instead. This would be similar to the existing api where you can pass IncomingMessage and ServerResponse. So you could pass URL, for example. Thoughts?

from web-server-frameworks.

stevenvachon avatar stevenvachon commented on June 2, 2024

@mcollina "better" is better than "faster" too.

from web-server-frameworks.

ghermeto avatar ghermeto commented on June 2, 2024

Agree with @mcollina. Anything that affects performance should be opt-in. In this particular case, if I don't need to parse the URL in most of my requests, why would I pay the price on all of them?

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024

The plan which @ronag and I discussed in this gist involves a multi-layer api. As a compromise, if we kept the "core" api with the string req.url, would it be acceptable to have the higher level api use the URL constructor? Since the higher level api is intended to usable directly (as in not via a framework) it would be preferable to me so we don't introduce "yet another" url implementation.

from web-server-frameworks.

ghermeto avatar ghermeto commented on June 2, 2024

Can you elaborate on:

Since the higher level api is intended to usable directly

and

we don't introduce "yet another" url implementation

from web-server-frameworks.

wesleytodd avatar wesleytodd commented on June 2, 2024

Since the higher level api is intended to usable directly

I think we should probably agree on which APIs have which intended constituencies. @ronag outlined a bit of what the goals of the different undici apis are, and the layering which he does enables really low level usage, but also has "direct end user api's" which are high level. The goal was to take a similar approach (see the table in the comment linked above). What we don't yet have is a clear statement on the target usage of the layers.

we don't introduce "yet another" url implementation

One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.

from web-server-frameworks.

ghermeto avatar ghermeto commented on June 2, 2024

I think we should probably agree on which APIs have which intended constituencies. @ronag outlined a bit of what the goals of the different undici apis are, and the layering which he does enables really low level usage, but also has "direct end user api's" which are high level. The goal was to take a similar approach (see the table in the comment linked above). What we don't yet have is a clear statement on the target usage of the layers.

And what is wrong with using:

const url = new URL(req.url);

Seems simple enough if a user wants to parse the URL and can also be provided by the frameworks if they must...

One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.

agree with you and I believe this gives an opportunity to let users (or frameworks) chose how they want to parse a URL. As you suggested, we can provide a function to createServer that tells the server how to parse the URL. If not provided, just returns a string by default:

http.createServer({
  allowHTTP1: true,
  parseUrl(stringUrl) {
    return new URL(stringUrl);
  }
})

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024

One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.

Agreed, that will likely lead to more confusion.

I think the more this conversation goes on we need more clarity on what the use case are. I think that most cases will be that only the pathname is used and that might lean more toward just telling users to parse it themselves if necessary.

@wesleytodd In addition to pathname, I think query string is quite common too.

I think what'd need to make this work is: URL.from({ host, path, protocol }). In this way we can skip the non-needed parts of the algorithm that are really expensive.

@mcollina I'm not sure that solves the problem of parsing req.url into some object. Would you be able to URL.from(req.url) or even URL.from(req)?

from web-server-frameworks.

sheplu avatar sheplu commented on June 2, 2024

thanks for your message @styfle ! But this group isn't really active anymore (last two years I would say)

from web-server-frameworks.

styfle avatar styfle commented on June 2, 2024

possibly http.getURL(req), which would be ok. Also req.getURL() might work.

Either option sounds great!

We could use original-url as a starting point (or even vendor it into node core)

from web-server-frameworks.

mcollina avatar mcollina commented on June 2, 2024

original-url uses the unsafe require('url').parse method. I don't think anybody should use that implementation as there are multiple security problems with it. The best approach would be to pass via new URL() or analog.

from web-server-frameworks.

mcollina avatar mcollina commented on June 2, 2024

would you like to send a PR for this?

from web-server-frameworks.

awwright avatar awwright commented on June 2, 2024

How is url.parse unsafe? If the incoming request conforms to the generic URI syntax (which obviously should be verified, as you always do for untrusted user input, right?), there's no reason there should be problems.

from web-server-frameworks.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.