Comments (19)
That's reasonable for cached content, or pages that don't change - but I'm using it for email where my template remains the same, but it's different every time its rendered, and it's likely to be rendered hundreds of thousands of times, so processing the output is very inefficient.
from emogrifier.
Actually, I'm not quite sure whether this behavior is intended (or avoidable). We'd need to look into that to find this out.
As a workaround, you could use a different delimiter, e.g. # or ^.
from emogrifier.
Thanks for the advice, I'll use one of those
from emogrifier.
Given that this is very likely to crop up when using emogrifier with any templating system, it would be useful to be able to escape template markup before processing, then unescape it afterwards. This way you could support any templating system, regardless of delimiters. Common delimiters are {
/}
, [[
/]]
, *|
/ |*
, ~{
/ }~
, so you can see there is a fair amount of variation, and characters such as |
and *
are likely to play havoc with regexes unless handled very carefully. The hard question is how best to escape them, given that they can occur in any context as they do not have to be valid HTML - for example between </tr>
and <tr>
. An HTML-safe approach might be to wrap them in HTML comments, but that may cause problems for template tags that appear inside attribute values, or indeed as attributes.
from emogrifier.
@Synchro, would you be willing to provide a PR for this (including unit tests that cover the new behavior)?
from emogrifier.
Yes. we run into the same problems with Twig
.
Emogrifier converts <a href="{{ var | filter }}">
to <a href="%7B%7B%20var%20%7C%20filter%20%7D%7D">
I believe this is a problem in PHP's DOMDocument
rather than emogrifier itself, though. A fix would be highly appreciated.
from emogrifier.
@jubi Would you be willing to provide a pull request for this?
from emogrifier.
So Emogrifier should not fix attribute values that are not properly encoded as HTML, I think.
from emogrifier.
@oliverklee
One workaround could be something like this (not very clean):
public function saveLinks(){
$xmlDocument = $this->createXmlDocument();
$xpath = new \DOMXPath($xmlDocument);
$this->clearAllCaches();
$nodesWithHrefAttributes = $xpath->query('//*[@href]');
$this->links = array();
$id = 0;
if ($nodesWithHrefAttributes !== FALSE) {
foreach ($nodesWithHrefAttributes as $node) {
$this->links[] = $node->getAttribute('href');
$node->setAttribute('href', 'emogrify_stored_link_'.$id++);
}
}
}
public function restoreLinks($html){
$id = 0;
foreach ($this->links as $link) {
$html = str_replace('emogrify_stored_link_'.$id++, $link, $html);
}
return $html;
}
from emogrifier.
I think a more general way to do it would be to ignore everything within defined delimiters. One way of doing this is to search for everything that matches the delimiters, like \{\{.*\}\}
and replace it with an identifier as in your example. Then run the inliner, which should ignore the identifiers as being plain text, then do a search and replace to put everything back again. The only problem with this approach is if delimiters can appear inside delimited strings, like {{$var|default:"{{"}}
, and the proper way to deal with that is to hook into the template compiler itself.
from emogrifier.
@Synchro
I disagree. We are already using a DOM parser. Falling back to regex does not seem a good approach to me. Plus the problem only occurs in href
s anyway. Doing it your way would match every variable, even those, that would never be affected by the initial problem.
from emogrifier.
Using a regular expression approach to first find all href attribute values and then restore them afterwards would be fine with me.
from emogrifier.
It's not using a regex to parse HTML - it's using a regex to parse template syntax. Does it not also happen in other places like src
and CSS url()
items? I can foresee other situations where template syntax could confuse a DOM parser, for example if you had something like:
{{if $x->type == "heading"}}<th>{{else}}<td>{{/if}}
(ok, so that's a bit contrived!)
from emogrifier.
Actually, I think a better way would be to first render the template and then put it through Emogrifier. This would solve the problem of passing a template that might not be valid (X)HTML.
from emogrifier.
I think as long as we are using a DOM-based approach in Emogrifier, we won't be able to work with templates that are invalid XHTML.
However, for templates that are valid XHTML, the only remaining problem are href and src attributes that automatically get URL-encoded. So saving and restoring those should suffice.
I don't know whether url() items in CSS are affected. @Synchro or @jubi, would you be willing to write a short proof-of-concept unit test for this in a separate PR? If it passes, we can include it as a safety net against regressions.
from emogrifier.
My rough idea for a template syntax workaround is this:
function mogrify($source) {
//Change to match your template delimiters
$ldelim = preg_quote('[[');
$rdelim = preg_quote(']]');
$tags = [];
preg_match_all('/'.$ldelim.'.*?'.$rdelim.'/', $source, $tags);
$repl = [];
foreach ($tags[0] as $tag) {
$repl[$tag] = md5($tag);
}
//Replace tags with hashes
$source = str_replace(array_keys($repl), array_values($repl), $source);
//Do the CSS inlining
$emogrifier = new \Pelago\Emogrifier();
$emogrifier->setHtml($source);
$emogrifier->disableInvisibleNodeRemoval();
$source = $emogrifier->emogrify();
//Put tags back in
$source = str_replace(array_values($repl), array_keys($repl), $source);
return $source;
}
This, in theory at least, should be reasonably robust, and it's simple and fast. It hides all template markup from emogrifier then restores it afterwards. Most common template tag contexts should survive. It could get messy if you have template tags that generate style sheets, in which case you have a chicken and egg problem... This works with simple content examples that emogrifier otherwise makes a mess of, for example:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>s</title>
<style>
p {color:red;}
</style>
</head>
<body>
[[thing]]
<p>hello</p>
<a href="[[blah]]">asdas</a>
</body>
</html>
With the above code, this results in correct-looking output:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>s</title>
</head>
<body>
[[thing]]
<p style="color: red;">hello</p>
<a href="[[blah]]">asdas</a>
</body>
</html>
I will see how well this fares with more complex examples.
from emogrifier.
Hi @Synchro , thanks for the implementation proposal.
Actually, the PR #170 should fix this by not encoding the text nodes anymore.
from emogrifier.
This is still an issue
from emogrifier.
@monstrfolk Could you please open a separate issue for this including the steps to reproduce this, the actual result and the expected result? Thanks!
from emogrifier.
Related Issues (20)
- Re-establish support for PHP 7.2 HOT 3
- Add a branch without an upper PHP version limit HOT 4
- Psalm warning re closure passed to usort HOT 5
- Disable HTML formatting by default HOT 8
- Upgrade to phpunit-data-provider 3.0.0 HOT 1
- Psalm errors on Windows with PHIVE Phar HOT 5
- Return values from preg_replace* are not checked
- Inconsistency in dev tool versions selected for CI build HOT 3
- Breaking change in 7.1 HOT 10
- Add support for Symfony 7
- Drop support for unsupported Symfony versions HOT 2
- Switch from the abandoned `squizlabs/php_codesniffer` to `phpcsstandards/php_codesniffer` HOT 2
- Fix composer-normalize on CI HOT 3
- Add readme caveat re URL escaping
- Avoid generating a `composer.lock`
- Use the development PHP INI on CI
- Add a GitHub Action that comments on pull requests if tests are missing
- !important should be preserved in inline styles HOT 3
- Review braces_position coding style enforcement in PHP CS Fixer HOT 2
- Switch to the PER2 coding standard and reduce the number of custom rules
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from emogrifier.