Comments (12)
Dropping consecutive slashes in paths is a normalization procedure.
from modern-uri.
I see. But it makes this lib unusable in contexts where data: URIs might exist. If that's intended, this can be closed.
The problem here is that the slashes end up being interpreted as delimiters in a path. The `uri` value doesn't look like a valid URI to me.
As far as I understand the RFCs, it is valid though. See the example in RFC 2397, which also contains consecutive slashes.
I'd really like to use the library, but since `data:` URIs are pretty ubiquitous on the modern web, the library is unusable for manipulating websites without support for them. Maybe @mrkkrp can confirm whether supporting RFC 2397 is out of scope for this project?
Would you like to open a PR to add support for the `data:` scheme?
Actually, I did a little more digging. I have come to appreciate that this library strives to follow the general URI handling rules laid out in RFC 3986. I think it would be wrong to handle scheme-specific cases here, be it `data:` or something else.
Problem
However, the normalisation that removes `//` (or, more technically, empty path segments), while OK for `http(s):` et al., is not warranted for generic URIs:
Section 1.2.3 states about hierarchical paths:
For some URI schemes, the visible hierarchy is limited to the scheme itself:
everything after the scheme component delimiter (":") is considered
opaque to URI processing. Other URI schemes make the hierarchy
explicit and visible to generic parsing algorithms.
Section 3.3 defines path segments as zero or more characters:
segment = *pchar
Section 6.1 states about normalisation (emphasis mine):
Because URIs exist to identify resources, presumably they should be
considered equivalent when they identify the same resource. However,
this definition of equivalence is not of much practical use, as there
is no way for an implementation to compare two resources unless it
has full knowledge or control of them. For this reason,
determination of equivalence or difference of URIs is based on string
comparison, perhaps augmented by reference to additional rules
provided by URI scheme definitions.
and the central guiding principle for normalisation:
Therefore, comparison methods are designed to minimize false negatives while strictly avoiding false positives.
Removing `//` in `data:` URIs, though, leads to false positives and is thus non-compliant.
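To make the false positive concrete, here is a minimal sketch using only `base`. It models a path as a plain `String` and is not modern-uri's actual implementation; `splitSegments`, `dropEmptySegments`, and the two sample payloads are hypothetical names and values chosen for illustration. It also ignores leading and trailing slashes, which is enough for rootless paths like the one in a `data:` URI.

```haskell
import Data.List (intercalate)

-- Split a path on '/', keeping empty segments (RFC 3986: segment = *pchar).
splitSegments :: String -> [String]
splitSegments s = case break (== '/') s of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitSegments rest

-- Simplified model of the normalisation discussed here: drop empty segments.
dropEmptySegments :: String -> String
dropEmptySegments = intercalate "/" . filter (not . null) . splitSegments

-- Two different opaque payloads (illustrative values, not real image data).
payloadA, payloadB :: String
payloadA = "text/plain;base64,AA//BB"
payloadB = "text/plain;base64,AA/BB"

-- True when the two inputs differ as strings but compare equal after
-- normalisation, i.e. the comparison yields a false positive.
isFalsePositive :: String -> String -> Bool
isFalsePositive a b = a /= b && dropEmptySegments a == dropEmptySegments b
```

Under this model, `isFalsePositive payloadA payloadB` evaluates to `True`: the two payloads are different strings, yet they become indistinguishable once the empty segment is dropped.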
Solutions
I see four options going forward:
- Allow empty path segments in the general case.
This conforms to the standard and should make the roundtrip from the original post possible. I see this as the correct(tm) way to solve the issue. The problem, though, is that it might break dependent code that assumes the normalization as currently performed. We could provide a `normalizePath :: URI -> URI` function to make the fix as easy as possible.
- Allow empty path segments in the general case and perform normalisation for `http(s):` and `ftp:`.
This would be a more compatible solution. The problem is that it's unprincipled: for these protocols we provide normalization on top of the RFC, and for other (hierarchical) protocols we don't?
- Allow empty path segments in the general case and perform normalisation for all known protocols that allow it.
This would be the most compatible solution. The problem, though, is that it requires a lot of research. Also, shouldn't we then perform other protocol-specific normalisation as well, to improve the `Eq` instance as much as possible? That would need even more research.
- Do nothing
This is the worst option IMO because it breaks the compliance of this otherwise pretty nice URI library.
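As a rough illustration of what option 2 could look like, here is a sketch that drops empty segments only for an allow-list of schemes. Everything in it is an assumption made for illustration: path segments are modelled as plain strings rather than the library's own types, and `normalizingSchemes`, `shouldNormalize`, and `normalizeSegmentsFor` are hypothetical names, not part of modern-uri.

```haskell
import Data.Char (toLower)

-- Schemes for which empty segments are dropped in this sketch (option 2).
normalizingSchemes :: [String]
normalizingSchemes = ["http", "https", "ftp"]

-- Scheme names are case-insensitive (RFC 3986, Section 3.1).
shouldNormalize :: String -> Bool
shouldNormalize scheme = map toLower scheme `elem` normalizingSchemes

-- Hypothetical helper: drop empty segments for allow-listed schemes only,
-- and leave the segments of every other scheme untouched.
normalizeSegmentsFor :: String -> [String] -> [String]
normalizeSegmentsFor scheme segments
  | shouldNormalize scheme = filter (not . null) segments
  | otherwise              = segments
```

Extending `normalizingSchemes` as users ask for more schemes would move this sketch towards option 3 without changing the default behaviour for unknown schemes.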
Offer
I'd be happy to take a stab at either solution 1 or 2 and provide a pull request. Solution 3 would require too much research for me to commit to doing it alone. Option 4 does not involve any work.
One more thing: IMHO the roundtrip property
import Data.Text (Text)
import Text.URI (mkURI, render)
-- Holds when parsing fails, or when rendering reproduces the input exactly.
prop_roundtrip :: Text -> Bool
prop_roundtrip uri = maybe True ((== uri) . render) (mkURI uri)
should really hold for all possible values of `uri`. That is, either parsing fails or the URI can be reproduced exactly as it was. This is incompatible with implicit normalisation, i.e. with all solutions but number 1.
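The roundtrip idea can also be explored without the library. The sketch below uses only `base` and toy stand-ins for the parser and renderer; `roundtrips`, `splitOnSlash`, `parseKeeping`, `parseDropping`, and `renderSegments` are hypothetical names for illustration and do not reflect how modern-uri parses or renders.

```haskell
import Data.List (intercalate)

-- Generic form of the property: a failed parse is acceptable, but a
-- successful parse must render back to exactly the input.
roundtrips :: (String -> Maybe a) -> (a -> String) -> String -> Bool
roundtrips parse renderBack input =
  maybe True ((== input) . renderBack) (parse input)

-- Split on '/', keeping empty segments.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitOnSlash rest

-- Toy parser that preserves empty segments.
parseKeeping :: String -> Maybe [String]
parseKeeping = Just . splitOnSlash

-- Toy parser that normalises implicitly by dropping empty segments.
parseDropping :: String -> Maybe [String]
parseDropping = Just . filter (not . null) . splitOnSlash

-- Toy renderer: join the segments with '/'.
renderSegments :: [String] -> String
renderSegments = intercalate "/"
```

In this model, `roundtrips parseKeeping renderSegments "AA//BB"` holds, while `roundtrips parseDropping renderSegments "AA//BB"` does not, because the implicit normalisation discards information the renderer cannot recover.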
I think the best way forward is to start with 2 and, as users request normalization for more schemes, gradually move towards 3. If I understand correctly, the slashes in your original example are just part of the binary data, not delimiters of a path?
Yes indeed, it's just base64-encoded binary, which is totally fine with the generic parsing algorithm as long as no normalisation happens by incorrectly assuming a hierarchy. If I understand RFC 3986 correctly, the whole part after the `data:` scheme is considered the path in this case, though not a hierarchical one.
Just to be clear: solutions (2) and (3) go beyond the RFC 3986 spec, insofar as it recommends only basic string comparison for equivalence checks in the generic case.
Additionally, having normalisation for only some protocols seems a little confusing. But if that's the way forward, I'll try to implement a PR for solution (2).
Yes, please go ahead with (2) 👍
Ok, I'm on it.