Comments (12)
I don't consider linkifying something that looks like a domain a bug: the bug here is that path/to/file.sh
shouldn't be considered to "look like a domain". Back into the URL regex to fix this.
from bleach.
One thing to consider is that top level xpi files might exist without any prefixed path, something like "somescript.sh" -- this would be linked by linkify, which is not desired for amo-validator output.
from bleach.
.sh
is a valid TLD, this behavior is correct. Turning it off would also stop linkifying example.com
in the text. If you don't want this behavior, why are you running linkify
at all?
from bleach.
Right, losing example.com would be fine. We just use it for URLs that are already prefixed with http(s). We want those links to go through the outgoing server so using bleach for this seemed like a good idea.
from bleach.
The default behavior can't change, because I don't think this is expected behavior. I'd nominate aggressive
or eager
for the name of the kwarg to linkify()
.
from bleach.
An aside, these false positives are why django's urlize filter only linkifies urls prefixed with a schema or 'www.', or domain-only urls ending in .com, .org and .net.
from bleach.
@SmileyChris: That difference is actually one of the reasons this method exists in the first place, which is why I'm not keen on changing it.
from bleach.
Fair enough then! :)
from bleach.
Could linkify be more configurable? Sometimes example.com should be linked, sometimes not. Now the only possiblity to configure the linkify is to copy-paste the whole linkify method. I think it'll be good to do one of the following:
- acccept callbacks for common tasks (e.g. post-process a node, find links, etc);
- make utility methods module-level functions so one can reuse them and build custom linkify;
- make linkify class with overridable methods + call for backward compatibility
My use case: linkify IDN domains (e.g. сайт.рф) + add target='_blank'
to links. I can also imagine somebody wanting to add css classes to external links, to stop linking localhost url or to use custom domain list (maybe even different domain lists for different parts of site).
Would you accept patch with such changes or this is all unreasonable or you want to implement something yourselves?
from bleach.
The cleanest of those options feels like the class, but I really struggle to see how any of them could be completely backwards compatible (except maybe callbacks for "common tasks" but I don't want a 40-kwarg method). I'm open to any of them as long as import bleach; bleach.linkify(text)
still works, none of the current defaults change, it's not a massive performance hit (it's slow enough as-is) and it's clean.
Any of these changes, I think, necessitates moving the linkify
code into a new module. Probably clean
, too, for consistency.
@kmike: I'm certainly open to a big rewrite but it's not my highest priority right now, especially not given how much bigger that scope is than this issue.
from bleach.
Another consequence of this eager linking is that linkify links the word "settings.py" on my crate.io page. Is there any way to tell it to suppress the link?
from bleach.
.py
is the ccTLD for Paraguay. It's legit.
That said, #56 would make linkify customizable enough that anyone could refuse to linkify whatever TLDs they want. The solution will be to write callbacks that test it, e.g.:
def no_python(text, attrs):
if attrs['href'].endswith('.py'):
return None
Or more broadly:
def only_with_protocol(text, attrs):
if not attrs['href'].startswith('http://'):
return None
The latter seems like a logical thing to include, maybe not as a default, but as an option.
Closing this in favor of #56.
from bleach.
Related Issues (20)
- Solo quiero decir que Bleach vuelve en octubre ❤️💪😎🍷💕
- bug: hardcoded dev dependency versions breaks mypy usage HOT 5
- fork html5lib-python or find alternative HOT 1
- bug: bleach.clean is not handling & symbol very well HOT 1
- Possible to only allow target="_blank" but no other values? HOT 3
- tox utility environments are constrainted to only run on Linux HOT 1
- bleach is deprecated; statement on project going forward (2023-01-23) HOT 11
- RFE: please provide update for latest `tinycss2` 1.2.1 HOT 2
- RFE: lease drop use `six` module HOT 1
- bug: linkify with entities inside anchor strings are incorrectly escaped HOT 1
- Open angle bracket '<' with few words after cleaned up if there's no closing bracket HOT 1
- bug: using OpenSUSE and Fedora packages which change the Bleach code, parse_shim tests fail with Python 3.10.12 HOT 6
- bug: Cleaner removes href valid tag "tg://user?id=124124124" HOT 1
- bug: drop support for Python 3.7 which is EOL
- feature: add support for Python 3.12
- Style attributes are getting stripped off HOT 13
- Open bracket '<' still cleaned up without closing bracket
- RFE: move away from deprecated `html5lib` HOT 1
- feature: please provide update for latest `tinycss2==1.3`
- bug: Open left angle bracket followed immediately by an alpha character causes next tag to be sanitized HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bleach.