xemlock / htmlpurifier-html5
HTML5 support for HTMLPurifier
Home Page: https://packagist.org/packages/xemlock/htmlpurifier-html5
License: MIT License
Because of the allowed elements inside figure? Do they not yet include HTML5 elements?
I hope you are doing well. :)
I'm working with HTML Purifier 4.10.0 and HTML5 Plugin version 0.1.8. Maybe I set something up wrong, but I'm getting a Class 'HTMLPurifier_AttrDef_HTML_Bool2' not found error on line 15 of \library\HTMLPurifier\HTML5Definition.php. The error is thrown during the purifying process. My calling code looks like this:
$htmlPurifierPath = 'resources/html-purifier/htmlpurifier-4.10.0/library/HTMLPurifier.auto.php';
$html5PluginRoot = 'resources/html-purifier/htmlpurifier-html5-0.1.8/library/HTMLPurifier';
$html5PluginConfig = "$html5PluginRoot/HTML5Config.php";
$html5PluginDefinition = "$html5PluginRoot/HTML5Definition.php";
if (!file_exists($htmlPurifierPath)) {
throw new Exception("HTML Purifier not found.", 500);
}
// Always initialize the flag so it is defined even when the plugin is missing.
$html5 = file_exists($html5PluginConfig);
require_once $htmlPurifierPath;
if ($html5) {
require_once $html5PluginConfig;
require_once $html5PluginDefinition;
}
$pdo = connMySql();
$config = "";
if ($html5) {
$config = HTMLPurifier_HTML5Config::createDefault();
} else {
$config = HTMLPurifier_Config::createDefault();
}
$purifier = new HTMLPurifier($config);
$html = $purifier->purify($_POST['html']);
The code specified as the source of the error looks like the following:
// use fixed implementation of Boolean attributes, instead of a buggy
// one provided with 4.6.0
$def->manager->attrTypes->set('Bool', new HTMLPurifier_AttrDef_HTML_Bool2());
I noticed in the comment preceding line 15 that you are replacing a buggy Boolean implementation from HTML Purifier version 4.6.0, which was released in 2013. What would happen if I commented this line out?
Thanks for your time. :)
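The calling code above requires only two of the plugin's files, so any other plugin class (such as HTMLPurifier_AttrDef_HTML_Bool2) has no way to be loaded. A minimal sketch of one possible fix follows, assuming the plugin uses the same underscore-to-directory file layout as HTML Purifier itself. The helper name html5PluginClassToPath and the library path are illustrative assumptions, not part of either library.

```php
<?php
// Hypothetical helper: map a PEAR-style class name to a file under $root,
// e.g. HTMLPurifier_AttrDef_HTML_Bool2 -> $root/HTMLPurifier/AttrDef/HTML/Bool2.php
function html5PluginClassToPath(string $class, string $root): ?string
{
    // Ignore anything that is not an HTMLPurifier_* class.
    if (strpos($class, 'HTMLPurifier_') !== 0) {
        return null;
    }
    return rtrim($root, '/\\') . '/' . str_replace('_', '/', $class) . '.php';
}

// Assumed location of the plugin's "library" directory; adjust to your setup.
$html5PluginLibrary = 'resources/html-purifier/htmlpurifier-html5-0.1.8/library';

// Load plugin classes on demand instead of requiring individual files.
spl_autoload_register(function (string $class) use ($html5PluginLibrary) {
    $path = html5PluginClassToPath($class, $html5PluginLibrary);
    if ($path !== null && is_file($path)) {
        require_once $path;
    }
});
```

With an autoloader like this registered after HTMLPurifier.auto.php is included, the two explicit require_once calls for the plugin should no longer be needed. Installing both packages through Composer and including vendor/autoload.php achieves the same thing.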
Installed via composer:
composer require ezyang/htmlpurifier
composer require xemlock/htmlpurifier-html5
I have this code:
Debug("START", 'comment_xss');
Debug($_POST['comment'], 'comment_xss');
$config = \HTMLPurifier_HTML5Config::create([
'HTML.AllowedElements' => ['p', 'figure', 'img', 'picture'],
'HTML.AllowedAttributes' => ['img.srcset', 'img.src', 'img.sizes'],
]);
$purifier = new \HTMLPurifier($config);
$comment = $purifier->purify($_POST['comment']);
Debug($comment, 'comment_xss');
Here are the logs (Debug):
2018-12-17 16:04:10 START;
2018-12-17 16:04:10 <figure><img src="https://bla_bla.jpg" data-image="5236469657"></figure><p>aa</p>;
2018-12-17 16:04:10 <p>aa</p>
As you can see, the entire <figure> is removed even though I've added it to the AllowedElements array.
What am I doing wrong? Can you please help?
Currently <fieldset> and <label> elements belong to the unsafe part of the HTML5_Forms module. When stripped of form and for attributes they are harmless. I think that hiding them behind the HTML.Trusted flag, just as other form elements (and scripts) are, is too drastic a measure.
All safe elements (<fieldset>, <label> and <progress>) should be extracted to a separate module (HTML5_SafeForms?). The module should be guarded by a config setting (%HTML.SafeForms), allowing it to be enabled in untrusted mode.
Also, users expect <fieldset> to be enabled by default:
The bump closes </p> tags before <a>, and that's not a valid change:
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'<p>I successfully installed the <a href="https://github.com/thephpleague/commonmark-ext-autolink">https://github.com/thephpleague/commonmark-ext-autolink</a> extension!</p>'
+'<p>I successfully installed the </p><a href="https://github.com/thephpleague/commonmark-ext-autolink">https://github.com/thephpleague/commonmark-ext-autolink</a><p> extension!</p>'
/home/travis/build/eventum/eventum/tests/MarkdownTest.php:51
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'<!--https://github.com/cebe/markdown/issues/157#issuecomment-385439965--><p>here is a <a href="http://github.com">linkref</a>.<br>and <a href="http://google.com">inline</a></p>'
+'<!--https://github.com/cebe/markdown/issues/157#issuecomment-385439965--><p>here is a </p><a href="http://github.com">linkref</a><p>.<br>and </p><a href="http://google.com">inline</a>'
I have a small problem. With the code below, only some of the settings take effect: AutoFormat.Linkify does not work for some reason, although it does work in the second variant further down.
$config = HTMLPurifier_HTML5Config::create();
// These work:
$config->set('Attr.EnableID', true);
$config->set('Attr.ID.HTML5', true);
$config->set('Attr.AllowedFrameTargets', array('_blank','_self','_target','_top'));
$config->set('HTML.TargetBlank', true);
// This does not work
$config->set('AutoFormat.Linkify', true);
But if I update it to this, then everything works.
$config = HTMLPurifier_HTML5Config::create([
'AutoFormat.Linkify' => true
]);
$config->set('Attr.EnableID', true);
$config->set('Attr.ID.HTML5', true);
$config->set('Attr.AllowedFrameTargets', array('_blank','_self','_target','_top'));
$config->set('HTML.TargetBlank', true);
Hi!
I've installed the latest version of HTML Purifier and your extension via Composer:
"ezyang/htmlpurifier": "^4.11",
"xemlock/htmlpurifier-html5": "^0.1.11"
The following code:
$text = '<fieldset><legend>Some title</legend><div><p>Some content</p></div></fieldset>';
$config = \HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.Allowed', 'fieldset');
$purifier = new \HTMLPurifier($config);
$purifier->purify($text);
throws an error:
Element 'fieldset' is not supported (for information on implementing this, see the support forums)
I've also tried another approach:
$text = '<fieldset><legend>Some title</legend><div><p>Some content</p></div></fieldset>';
$config = \HTMLPurifier_HTML5Config::create([
'HTML.Allowed' => 'fieldset'
]);
$purifier = new \HTMLPurifier($config);
$purifier->purify($text);
but got the same error.
I thought this extension added support for some HTML5 tags, including fieldset, to HTMLPurifier. Did I miss something?
Regards, Alex
Related to #37.
Need something similar to HTML.FlashAllowFullScreen.
Thanks for creating this lib!
https://www.w3schools.com/tags/tag_picture.asp
Current definitions of <blockquote> and <form>, inherited from HTMLPurifier, don't accept Sectioning content (introduced in 3b28dad):
https://github.com/ezyang/htmlpurifier/blob/b88fcd1/library/HTMLPurifier/HTMLModule/Text.php#L59
https://github.com/ezyang/htmlpurifier/blob/b88fcd1/library/HTMLPurifier/HTMLModule/Forms.php#L34
Currently the set of allowed <input> types doesn't include HTML5 values. Also, it would be useful to be able to narrow the set of allowed input types (as requested in ezyang/htmlpurifier#213).
Required for e.g. Twitter or Instagram embeds; see the related Stack Overflow question for reference.
Currently <p> is autoclosed only by address, blockquote, center, dir, div, dl, fieldset, ol, p and ul.
The full list is here: https://html.spec.whatwg.org/dev/grouping-content.html#the-p-element
I'm looking at switching to this from ezyang/htmlpurifier due to a growing need for HTML5 support.
Several years ago, lukusw tried to add HTML5 support to htmlpurifier for Drupal, but I think the idea dropped in priority and was never implemented. ezyang made some comments on lukusw's attempt, which is probably what slowed the whole thing down: https://www.drupal.org/project/htmlpurifier/issues/1321490#comment-9509073
I've been comparing lukusw's code and yours against ezyang's comments:
With this in mind, I'm hoping you can answer the below questions:
All of the HTML5 content needs to be gated, so it is only available when a user specifies an HTML5 doctype. You could try to put all of the HTML5 definitions in a new HTMLModule.
✔️ looks good
section/nav/aside/article are not Block content but Sectioning content. Flow should be redefined to include Sectioning (similar to how HTMLPurifier/HTMLModule/Text.php does Flow)
❌ Doesn't look to have changed?
header and footer need to exclude header/footer/main descendants; see the 'excludes' attribute; also an example in Text.php (pre)
❌ Doesn't look to have changed?
Ditto with address, use the same technique
❌ Doesn't look to have changed?
hgroup got removed from the HTML5 spec, so doesn't belong here.
✔️ seems fine to keep it
The figure specification doesn't look right; I think you need an asterisk after the Flow. A plain spec 'Flow' is special-cased. I suspect your specifications also exclude plain text.
❔ not sure if you've done this?
figcaption is not Inline, give it false instead.
✔️ seems fine
I'm a little worried about video tag, but the definition you've given is probably OK. I'm not sure if it should be allowed by default. Definitely autoplay should not be allowed. The contents has the same problem as figure.
✔️ allows autoplay, but otherwise seems ok
We should already have the inline elements; are the existing definitions buggy?
✔️ not sure that this is relevant... Existing definitions are gated to XHTML 1.1, so would need gated definition for html5 spec (http://htmlpurifier.org/phorum/read.php?3,8291,8514#msg-8514)
For ins/del datetime, ideally we would apply the HTML5 parse a date or time string and validate it, see http://www.w3.org/TR/html5/infrastructure.html#parse-a-date-or-time-string
✔️ seems fine
iframe allowfullscreen isn't an HTML5 attribute. And it shouldn't be allowed by default anyway, should be gated by Tricky at least.
❌ Not gated by tricky?
With the below code, the
<?php
require_once('vendor/autoload.php');
$html = '<p><strong><audio controls="controls"><source type="audio/mp3" src="myaudiofile.mp3" /></audio></strong></p>';
echo "In: " . $html . PHP_EOL;
$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);
echo "out: " . $purifier->purify($html);
Expected output:
<p><strong><audio controls><source type="audio/mp3" src="myaudiofile.mp3" /></audio></strong></p>
(Unless I'm reading it wrong, the spec says <strong> can contain "Phrasing content", which includes <audio>: http://w3c.github.io/html/single-page.html#phrasing-content-2)
Actual output:
<p><strong></strong></p><audio controls><strong></strong></audio><strong></strong>
I think it's because the <strong> tag in the base library is set to allow contents of type "Inline", whereas <audio> is defined as a block in this library.
Will follow up with a PR if I find a fix today
Normal HTMLPurifier lets you edit the config like:
$config = HTMLPurifier_Config::createDefault();
$html_purifier_cache_dir = sys_get_temp_dir() . '/HTMLPurifier/DefinitionCache';
if (!is_dir($html_purifier_cache_dir)) {
mkdir($html_purifier_cache_dir, 0770, TRUE);
}
$config->set('Cache.SerializerPath', $html_purifier_cache_dir);
Your change: https://github.com/xemlock/htmlpurifier-html5/blob/master/library/HTMLPurifier/HTML5Config.php#L32
Makes this throw an exception:
$config = HTMLPurifier_HTML5Config::createDefault();
$html_purifier_cache_dir = sys_get_temp_dir() . '/HTMLPurifier/DefinitionCache';
if (!is_dir($html_purifier_cache_dir)) {
mkdir($html_purifier_cache_dir, 0770, TRUE);
}
$config->set('Cache.SerializerPath', $html_purifier_cache_dir);
Cannot set directive after finalization invoked
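One possible way around this is to pass Cache.SerializerPath to create() up front instead of calling set() afterwards. This is only a sketch: it assumes that directives given to create() are applied before the config is finalized, as the array-based create() calls in other reports here suggest. The helper name ensureCacheDir is made up for illustration, and the class_exists() guard is there only so the snippet runs without the library installed.

```php
<?php
// Hypothetical helper: create the cache directory if needed and return its path.
function ensureCacheDir(string $dir): string
{
    if (!is_dir($dir) && !mkdir($dir, 0770, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create cache directory: $dir");
    }
    return $dir;
}

// Collect every directive before the config object exists.
$settings = [
    'Cache.SerializerPath' => ensureCacheDir(
        sys_get_temp_dir() . '/HTMLPurifier/DefinitionCache'
    ),
];

// Assumption: create() applies these values before finalization.
if (class_exists('HTMLPurifier_HTML5Config')) {
    $config = HTMLPurifier_HTML5Config::create($settings);
    $purifier = new HTMLPurifier($config);
}
```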
Hello, I want to pass an HTML form through the purify process. This should be possible in vanilla htmlpurifier since 4.13.0 via "HTML.Forms", but it doesn't seem to work in this HTML5 extension. Example:
<?php
require 'vendor/autoload.php';
$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.Trusted', FALSE);
$config->set('HTML.Forms', TRUE);
$purifier = new HTMLPurifier($config);
$dirty_html5 = '<form mnethod="post" action="#"><input></form>';
$clean_html5 = $purifier->purify($dirty_html5);
var_dump(htmlspecialchars($clean_html5));
----------------------
string(0) ""
from composer.lock:
"name": "ezyang/htmlpurifier",
"version": "v4.13.0",
"name": "xemlock/htmlpurifier-html5",
"version": "v0.1.11",
In the case of vanilla "ezyang/htmlpurifier:v4.13.0":
<?php
require 'vendor/autoload.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Trusted', FALSE);
$config->set('HTML.Forms', TRUE);
$purifier = new HTMLPurifier($config);
$dirty_html5 = '<form mnethod="post" action="#"><input></form>';
$clean_html5 = $purifier->purify($dirty_html5);
var_dump(htmlspecialchars($clean_html5));
--------------------------
string(61) "<form action="#"><input /></form>"
Not sure what's wrong there?
The a element may be wrapped around entire paragraphs, lists, tables, and so forth, even entire sections, so long as there is no interactive content within (e.g. buttons or other links).
https://html.spec.whatwg.org/dev/text-level-semantics.html#the-a-element
Example:
<a><table></table></a>
becomes
<a></a><table></table>
Related to #37.
Permitted content: Flow content, but with no nested <address> element, no heading content (<hgroup>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>), no sectioning content (<article>, <aside>, <section>, <nav>), and no <header> or <footer> element.
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/address
In HTML5, captions may contain any flow content excluding descendant table elements: https://html.spec.whatwg.org/multipage/tables.html#the-caption-element
Meaning:
<table>
<caption><h3>Monthly savings</h3></caption>
<tr>
<th>Month</th>
<th>Savings</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
</tr>
</table>
is perfectly valid HTML, but it results in the entire table being stripped out instead:
<h3>Monthly savings</h3>
Month
Savings
January
$100
Related issue: ezyang/htmlpurifier#131
Hi!
When I try this code:
$config = HTMLPurifier_HTML5Config::create($initial);
$definition = $config->getHTMLDefinition(true);
$definition->addElement("oembed", "Inline", "Inline", "", []);
The following exception is thrown:
Message: Cannot retrieve raw definition after it has already been setup (try moving this code block earlier in your initialization)
From: .../vendor/ezyang/htmlpurifier/library/HTMLPurifier/Config.php
Line: 540
thx
In HTML5 it is permitted to omit the <tbody> from a <table>: https://html.spec.whatwg.org/multipage/tables.html#the-tbody-element
The result of processing the below HTML is an empty string.
<table><thead><tr><th>foo</th></tr></thead></table>
Example valid HTML5 (https://validator.w3.org/nu/#textarea):
<!DOCTYPE html>
<html lang="en">
<head><title>foo</title><meta http-equiv="content-type" content="text/html;charset=utf-8"></head>
<body><table><thead><tr><th>foo</th></tr></thead></table></body>
</html>
Similar to <script>, it would be good to allow <link> if the href is whitelisted.
I can send a PR if you'll accept it.
Would you accept a PR which adds contenteditable attribute support?
https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/contenteditable
Would it be possible to create a cloned repo with a Facades for laravel users?
htmlpurifier has support for deprecated attributes and will convert them to their style equivalent:
<table>
<tr bgcolor="#edeeef">
<td width="3"></td>
<td bgcolor="#f9fafa" width="1"></td>
<td bgcolor="#edeeef" width="1"></td>
<td bgcolor="#dbdee0" width="1"></td>
</tr></table>
When using this lib, bgcolor seems to get nuked. I've tried adding "HTML.TidyLevel" => "heavy", but it doesn't seem to do anything. http://htmlpurifier.org/docs/enduser-tidy.html makes reference to the doctype, so I'm wondering whether the HTML5 doctype has something to do with it not working?
Hi.
Thanks for the great library which saves developer lifetime 😄
I'm using this config to override the Purifier configuration in a Symfony project with the exercise/htmlpurifier-bundle.
The bundle creates the parent configuration, then uses the HTMLPurifier_Config::inherit() method to create the child one. The method implementation is taken from the parent class, not HTML5Config, and looks like this:
/**
* Creates a new config object that inherits from a previous one.
* @param HTMLPurifier_Config $config Configuration object to inherit from.
* @return HTMLPurifier_Config object with $config as its parent.
*/
public static function inherit(HTMLPurifier_Config $config)
{
return new HTMLPurifier_Config($config->def, $config->plist);
}
As a result all the child configurations are HTMLPurifier_Config instances instead of HTMLPurifier_HTML5Config which causes errors as they don't support HTML5 tags.
I'm using a workaround inheriting the base class like:
class HTMLPurifier_AltHTML5Config extends \HTMLPurifier_HTML5Config
{
public static function inherit(HTMLPurifier_Config $config)
{
return new static($config->def, $config->plist);
}
}
But the 'inherit()' method should be overridden as well, I suppose.
Thanks again.
Best wishes.
Are iframes removed by default?
$htmlpurify_config = \HTMLPurifier_HTML5Config::createDefault();
$purifier = new \HTMLPurifier($htmlpurify_config);
content
<b>Inline <del>context No block allowed</del></b>
<video width="400" height="222" controls><source src="video.mp4" type="video/mp4"><source src="video.webm" type="video/webm"><source src="video.ogv" type="video/ogg">
Here is the fallback for the video: a download link, a message, etc.
</video>
<iframe width='560' height='315' src='//www.youtube.com/embed/RGLI7QBUitE?autoplay=1' frameborder='0' allowfullscreen></iframe>
Hi,
We've noticed that dir="auto" is no longer removed in v0.1.10.
Was that a deliberate decision? It's not mentioned in the release notes...
It's not a problem for us, but may be for others.
How can I retain the attribute? Thanks a lot.
Hi. HTMLPurifier cannot be configured with a list of allowed elements.
The code below produces this warning:
User Warning: Element 'picture' is not supported (for information on implementing this, see the support forums)
$config = \HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.AllowedElements', ['img', 'picture']);
$config->set('HTML.AllowedAttributes', ['img.srcset', 'img.src', 'img.sizes']);
$htmlPurifier = new \HTMLPurifier($config);
$htmlPurifier->purify($value);
PHP Deprecated: trim(): Passing null to parameter #1 ($string) of type string is
deprecated in
htmlpurifier-html5/library/HTMLPurifier/AttrTransform/HTML5/Input.php on line 242
....
PHP Deprecated: str_replace(): Passing null to parameter #2 ($replace) of type
array|string is deprecated in
htmlpurifier-html5/vendor/ezyang/htmlpurifier/library/HTMLPurifier/ElementDef.php
on line 179
https://github.com/xemlock/htmlpurifier-html5/runs/7854291682?check_suite_focus=true
Related issue: ezyang/htmlpurifier#311
The Datetime attribute type should be used in <ins>, <del> and <time> elements instead of the potentially XSS-prone Text type.
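As a rough illustration of what stricter validation could look like, here is a minimal standalone sketch. It is not the library's Datetime attribute type: the function name isSimpleHtml5Datetime is hypothetical, and it accepts only a subset of the date and time formats that the HTML specification allows (a calendar date, optionally followed by a time and a UTC offset).

```php
<?php
// Hypothetical helper, not part of HTML Purifier or htmlpurifier-html5.
// Accepts YYYY-MM-DD, optionally followed by "T" or a space, HH:MM[:SS[.fff]],
// and an optional "Z" or +HH:MM / -HH:MM offset. Offset ranges are not checked.
function isSimpleHtml5Datetime(string $value): bool
{
    $pattern = '/^(\d{4})-(\d{2})-(\d{2})'
        . '(?:[T ](\d{2}):(\d{2})(?::(\d{2})(?:\.\d{1,3})?)?'
        . '(?:Z|[+-]\d{2}:\d{2})?)?\z/';
    if (!preg_match($pattern, $value, $m)) {
        return false;
    }
    // Reject impossible calendar dates such as 2018-02-30.
    if (!checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
        return false;
    }
    // The time part is optional; check ranges only when it is present.
    if (isset($m[4]) && $m[4] !== '') {
        if ((int) $m[4] > 23 || (int) $m[5] > 59) {
            return false;
        }
        if (isset($m[6]) && $m[6] !== '' && (int) $m[6] > 59) {
            return false;
        }
    }
    return true;
}
```

Anything that does not match, including script-like payloads, is rejected outright rather than passed through as free text.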
Setting HTML.XHTML to true doesn't affect tags; however, the documentation says that
[...] in HTML5 it's used for enabling support for namespaced attributes and XML self-closing tags.
I tried to pass this option in a custom HTMLPurifier class:
namespace App\HtmlPurifier;
final class CustomPurifier extends \HTMLPurifier
{
public function __construct($var)
{
$config = \HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.XHTML', true);
parent::__construct($config);
}
}
Service:
services:
    # HTMLPurifier
    App\HtmlPurifier\CustomPurifier:
        tags:
            - name: exercise.html_purifier
              profile: default
    exercise_html_purifier.default: '@App\HtmlPurifier\CustomPurifier'
In the controller I get the right HTMLPurifier, which is fine. But the purify() method does not convert HTML tags to self-closing tags.
$html = "<img src='test.png'/><hr/><br/>";
$purifier->purify($html); // "<img src="test.png" alt="test.png"><hr><br>"
I was expecting that it will return <img src="test.png" alt="test.png"/><hr/><br/>
Maybe I did not understand what HTML.XHTML is supposed to do
Hi,
I am using a combination of wkhtmltopdf and htmlpurifier-html5 to generate PDFs.
The problem that I am facing at the moment is that the inline style border-radius gets removed after passing the purify() function.
Any thoughts on why it could be doing this?
Thanks.
Currently toggling modules is not granular enough: there is only one switch (HTML.Trusted) which enables all unsafe modules, and there is no way of enabling the Forms module without also enabling Scripting. You can do something like the following, but it's not convenient and seems like a dirty override:
$config = HTMLPurifier_HTML5Config::create([
'HTML.Trusted' => true,
'HTML.ForbiddenElements' => ['script', 'noscript'],
]);
Related to ezyang/htmlpurifier#213.