cure53 / dompurify Goto Github PK

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:

Home Page: https://cure53.de/purify

License: Other

HTML 3.36% JavaScript 96.50% Shell 0.14%

xss sanitizer dom security javascript dompurify prevent-xss-attacks mathml html svg

dompurify's People

Contributors

Stargazers

Watchers

Forkers

robbert hyderali koto suto codylindley djfarrelly jobin-sun shaohua miyeah versal nemesisqp devd oztc filedescriptor tdr130 expdb2015 vsakv conradirwin gujiman gibson042 voku kwangkim jimmyhchan suckafish iarnaud anivs tdeekens neilj odst27 noscripter mail-apps luiseduardohdbackup styu 0xsobky var23rav f00bar10 mariangibala doc2k nebulou5 jampy wirehead myvyang happy-ferret protonmail hagenmo coca1ne su-action strugee hackvertor liangklfang iznyn kubaz clks001 firmfirm calmacroi cybernetics edg2s 75f2cc yoavm p10ns11y polywickstudio signedbytes sharavanaprasad dineshkotwani krodyx buu700 murraycem esalman-sfdc n4yk itcms tsdk hakanerp christopherchestnut petebacondarwin phpplay iwitsky jeffml1 olusinae mayfield fox-rck lastmjs ydaniv crypt0b0y shaodahong fb11 h0k5 wangmd richvred jindasong crypto-breaker fokion zelenkoff codetay hbcbh1999 vicular rohitnehra mozfreddyb talk2noob hantk8 brianwitte

dompurify's Issues

Minor clobbering issue of NamedNodeMap

Commit 11d778e and 6f3a06c changed the clobber check of elm.attributes with instanceof. However, this check enables attackers to clobber window.NamedNodeMap in FF22-34 and terminate DOMPurify. This can be done by changing an existing iframe in a page to a crafted page before loading DOMPurify to pollute window.

PoC
Open http://jsbin.com/sozogi/2/ in FF22-34. An error should be thrown.

I think we should revert the changes to the older one.

Bug bounty

Hi Team,

I thought you'd be interested to know we have now rolled out DOMPurify into production for all FastMail users as an extra layer of XSS protection (on top of our server-side filter and strict content security policy). As such, we would like to extend our bug bounty to cover security bugs in the DOMPurify library, even when they would not be exploitable on FastMail due to the other two protection mechanisms.

To qualify for the bounty, we would expect the reporter to responsibly disclose the issue to any of the library maintainers (we would be grateful for a head's up, but we monitor the security mailing list too) or to ourselves (in which case we would obviously notify you!). Other than that, the same rules and rewards as the standard FastMail bug bounty would be applicable (with rewards ranging from $100 to $5000 depending on severity and what browsers are affected), and of course any award would still be ultimately at our discretion.

If you're happy with this, please feel free to advertise it. I hope that this can encourage further scrutiny of the library, ultimately making it safer for everyone.

Neil.

License issue

We want to use lib at LinkedIn, but legal department have concern about MPL 2.0 License that is slightly copyleft, please consider to use one of other open licenses e.g.:

Apache 2.0 License
BSD "2-clause" or "Simplified" License
BSD "3-clause" License
MIT License

Thanks.

Provide an API to allow hooking into _sanitizeElements and _sanitizeAttributes

It appears that there's a lot of special cases for developers to cover, like removal of elements but having the text content survive (see #15). I don't want to cover those issues in DOMPurify as I want to keep the code-base small and functionality limited for security reasons.

I am wondering if it makes sense to provide an API for the two core methods _sanitizeElements and _sanitizeAttributes so people can hook their own code based on what these functions return to influence the final markup DOMPurify creates.

Thoughts very much welcome!

Graceful degradation in case of legacy browser?

We might want to think about what happens, when the browser running a purified website is not supporting what DOMPurify needs.

Should we just return the original string? Or are we in a position to modify something?

Should we protect against DOM clobbering by default?

Right now, DOMPurify protects itself against DOM Clobbering. But the resulting markup itself still has clobbering potential when being used by the sanitizing website.

Should be by default prevent that? My thought would be: If an element's name is at the same time a property in document, the element should be removed. What do you think? We would prevent XSS and DOM clobbering.

//cc @fhemberger @mathiasbynens

Test DOMPurify with a Browser grid

DOMPurify is designed to run in browsers. In ALL browsers. I would like to make some minor changes, but would like to be sure not to break anything. So I would like to have tests in a LOT of browsers in different versions and OSs. Wouldn't it be nice to have tests with a lot of browser/os combinations? There are some vendors, who offer that for free for open source projects like BrowserStack, SauceLabs or BrowserSwarm (I did not evaluate any of them for features or even fitness).
I think, this test support is more important, than my suggesting changes.
Are there any plans to do things like that? If not, I would like to do some research in this area.

Update npm release

We're switching to browserify for our build process and I saw that your current master branch exports a common.js module, which is great.

It seems though that the npm package at 0.4.2 does not seem to be up to date, so an npm publish would be nice. Thanks!

src/href blacklisting?

My friend was glancing at the code for purifier and he was worried by https://github.com/cure53/DOMPurify/blob/master/purify.js#L347

where it seems DOMPurify blacklists schemes instead of whitelisting http/https/mailto: URIs.

I am curious---Whats the reasoning behind this? I usually imagined all sanitizers use a whitelisting mechanism.

Bower version mismatch

Adding "dompurify": "~0.3" to bower.json leads to:

bower dompurify#~0.3 ENORESTARGET No tag found that was able to satisfy ~0.3

"dompurify": "v0.3" works

For semantic versioning it seems Bower requires git tags to be named x.x(.x)

Plugin API / Hooks for custom filters and sanitation

Several users have recently requested specific filtering and sanitation features that would require core changes or rethinking of the configuration API. This includes custom nesting rules, different href/src filtering and many other things.

I believe it would be best to create a hooking / plugin architecture instead of adding additional complexity to the core API. Now, the question is: how to best build that? How to do it so it makes sense, works well with the existing config flags and still provides enough flexibility for developers to add their own functionality?

My initial thought was: Hooks. We can implement hooks where the developer receives all DOMPurify config flags, the string and its current representation object, can work on all of them and pass them back after being done. We can put hooks into several places:

Before any filtering has been done
Right after the document has been created
Right after the clobbering checks happened
Right after the tag filtering has completed
Right after the attribute filtering has completed
Right before the serialization happens

This should allow us to do things like:

Specific handling of charset directives
Filtering to prevent deep nesting
Handling of CSS
Handling of HTTP leaks such as src, href, formaction etc.
Handling of special unwanted characters

Does that general strategy make sense or am I living in the 90s with that idea? :) Very much open for any form of input, please let me know.

Spammy CC of all who requested specific features or a in core development - sorry for that but your input is needed :) @fhemberger @devd @arbixy @eoftedal @joelabair

Src of <img> whose value is base64 encoded, is removed

The src of images with base64 encoded values starts as data:image and it matches the regex in this line, and this attribute is not added to the dom, causing an image without src to be appended. The comment says Safely handle custom data attributes, but not sure why it is done.

Sanitize MIME message

I'm trying to sanitze a MIME message:

MIME-Version: 1.0
Sender: [email protected]
Received: by 10.60.155.104 with HTTP; Mon, 5 May 2014 06:08:54 -0700 (PDT)
Date: Mon, 5 May 2014 15:08:54 +0200
Delivered-To: [email protected]
Message-ID: <CAMQ7_A76jtf9Q1bx=G_K9miTdy0C2LPo7xuECNmGpDzmQPcZ4A@mail.gmail.com>
Subject: Hello
From: =?UTF-8?Q?xxx?= <[email protected]>
To: =?UTF-8?Q?xxx?= <[email protected]>
Content-Type: multipart/alternative; boundary=089e0115e7e4537fcd04f8a6d552

--089e0115e7e4537fcd04f8a6d552
Content-Type: text/plain; charset=UTF-8

Test

--089e0115e7e4537fcd04f8a6d552
Content-Type: text/html; charset=UTF-8

<div dir="ltr">Test</div>

--089e0115e7e4537fcd04f8a6d552--

and get with default 0.3 the following output:

MIME-Version: 1.0
Sender: [email protected]
Received: by 10.60.155.104 with HTTP; Mon, 5 May 2014 06:08:54 -0700 (PDT)
Date: Mon, 5 May 2014 15:08:54 +0200
Delivered-To: [email protected]
Message-ID:

Is it intended that everything after a <[email protected]> is truncated? And if yes, is there a setting to prevent that?

Thanks.

jQuery Plugin

I am not even sure if this is a great idea but these few lines can make DOMPurify also work as a jquery extension; adding a safehtml function.

(function($){
    "use strict";
    $.fn.safehtml = function(str) {
        var clean = DOMPurify.sanitize(str, { 'RETURN_DOM': true }); 
        this.empty();
        this.append(clean.childNodes);
        return this;
    })(jquery);

Minor changes can make it work with require/amd definitions too.

One concern is that there are so many other vectors for XSS. While there is value to making safehtml available (and encourage people to use it instead of safeHTML), do we write "safe" versions of everything?

https://code.google.com/p/domxsswiki/wiki/jQuery

So, I guess this is mainly an issue to start off a discussion about this.

More powerful and simple Hook API

After having worked with the hooks for a while, I think we could make them more powerful.

For example, I don't want to repeat the whole process of determining whether an attribute is an attribute and the node is a node.

I was thus thinking about passing a meta object with every hook call, that passes along some relevant info about the element/attribute to inspect. Any thoughts?

Uncaught TypeError: Cannot read property 'nodeName' of undefined

In Google Chrome this test:

<img src=data:image/jpeg,ab798ewqxbaudbuoibeqbla>

generates this error:

Uncaught TypeError: Cannot read property 'nodeName' of undefined purify.js:340

So, every test with a src attribute and a data: value at beginning is not well "accepted" by Chrome.

Flag KEEP_CONTENT has no effect on io.js

See #29 (comment): insertAdjacentHTML is missing in jsdom.

Host DOMPurify on NPM

It would be great if DOMPurify was hosted on NPM.

This would make it easier to re-use DOMPurify in the Node.js context rather than in the browser and also make it easier to manage it as a dependency.

This is not an issue or bug, just a "nice to have" feature.

Warnings mode?

Often in big web applications, one might have a good cut point to always insert purify calls. But, you can't be sure it won't break stuff. It would really help with rollout if we could pass a flag to dompurify where it would go through the tree and just get a list of nodes it would have removed if it was in enforcement mode. This can help track down nasty corner cases (which might even be vulns) and then once the warnings go down to zero, people can turn on enforcement mode.

This sort of stuff really helps in large scale deployment. CSP has this going for it.

What do you think? The simpler idea of "call purify and compare return value to input" doesn't work because the html tree serializations might have minor differences.

Benchmark in major browsers, especially mobile browsers.

Would be interesting to see how fast it actually run in browser, imagine a google reader like app, it would be quite interesting to see if it's possible relying some of the xss filter work to client-side.

Nested tags sanitation

How about nested tags sanitation support?

Let's say that only first level <p> tags are allowed and for example <p>Hello <p>world</p></p> should be sanitized into <p>Hello world</p>

or only second level resulting in Hello <p>world</p>

Best Regards

Make testsuite work on IE9

So far the test-suite does not work at all on IE9, dying with an error stating that innerHTML of undefined cannot be set. QUnit issue? (@fhemberger)

Leave text content?

Can I remove an element node and not remove the text node inside of it?

Tests for config flags

What's the best way to be able to test the config flags DOMPurify offers?

Currently, we basically test the default config. Ideally, we have coverage over all existing config flags. Even more ideally with coverage for different flags that might interact and interfere.

Should use template instead of document.implementation.createHTMLDocument to avoid the web-component registry of the owner doc being reused

_initDocument uses document.implementation.createHTMLDocument() to create a new data document. Because of http://w3c.github.io/webcomponents/spec/custom/#creating-and-passing-registries this is potentially not as safe as it used to be. Specifically, the newly created document will use the same web-components registry as its execution context, which means that if a dirty HTML string is passed in, a nefarious string could potentially leverage any registered web-components to nefarious ends.

Probably the best course of action is to do something like:

var doc = document.createElement('template').content.ownerDocument;

instead of the current document.implementation.createHTMLDocument(). (Note that since it's its own document, I don't think the template node would need to be linked into the DOM, but if something seems wrong, that could be it. At least in Gecko, things like CSS parsing won't usefully happen until a DOM node is parented into the tree.)

Greater context: We got burnt by this in a non-security context when doing cloneNode, fix part of a bug at https://bugzilla.mozilla.org/show_bug.cgi?id=1116087#c12.

Cannot remove hook

Thanks for the useful lib. Unfortunately it is not possible to remove a hook because they are held as global state here:

https://github.com/cure53/DOMPurify/blob/master/purify.js#L16

E.g. if a want to sanitize one html email while removing 'src' attributes from image tags and then for the second html mail, I want to allow images:

// remove sources of image tags to prevent privacy leak during resource fetching
if (removeImages) {
    window.DOMPurify.addHook('afterSanitizeAttributes', function(node) {
        if ('src' in node) {
            node.removeAttribute('src');
        }
    });
}

// sanitize HTML content: https://github.com/cure53/DOMPurify
html = window.DOMPurify.sanitize(html);

If 'removeImages' is false the second time the hook is still called.

Should expose default values of ALLOWED_TAGS and ALLOWED_ATTR in DOMPurify object

Should expose default values of ALLOWED_TAGS and ALLOWED_ATTR in DOMPurify object so that adding new entries to the existing allowed values won't be difficult.

Registering hook leads to infinite loop in IE 10 / IE 11

Either I am doing something very obvious wrong or registering a hook (I tested afterSanitizeElements and beforeSanitizeElements) and then calling the sanitize-method leads to an infinite loop in IE 10 and IE 11, which hangs up the browser. If I remove the hook, everything works fine. My JS Code just does this:

DOMPurify.addHook('afterSanitizeElements', function(node) {
  if (node.nodeType && node.nodeType === document.TEXT_NODE) {
    node.textContent = 'foo';
  }

  return node;
});
var editor = document.querySelector('.editor');
document.querySelector('.button').addEventListener('click', function () {
  var dirty = '<div><p>This is a beatufiul text</p><p>This is too</p></div>';
  editor.innerHTML = DOMPurify.sanitize(dirty);
});

I would expect it to keep the DOM-structure intact and just replace the text-nodes with "foo".

I created a pen which is enough to cause the error in IE 10 and IE 11. Other browsers (I tested Edge, Chrome, Firefox and Safari) look fine to me: http://codepen.io/anon/pen/YywGXz

It should have function to remove '<' and '>' instead of return '<' and '>'

when user pass some malicious data it should have functionality/configuration to discard '<' and '>' character

Pre-test / Performance optimization

Would it become a problem if DOMPurify checks each string to sanitize for the occurrence of the character "<" first? Do you see a risk if a string not containing any "<" is returned as such without checks and modifications?

I think we might save a bit on performance but want to be sure there's no foot-gun potential.

Does this handle CSS?

Does DOMPurify sanitize CSS style sheets embeded in the markup?

Enhancement suggestion: Interpolation API

TLDR: Secure interpolation API in DOMPurify.
Usage:

interpolate`<a href='${href}' title='${title}'>${text}</a>`

Returns a safe escaping of the variables href, title and text according to their context.

Implementation:
Using ES6 template strings (and possibly a transpiler to make this work in non-ES6 environments), the interpolation function gets the context in which the variables are used. The function can then construct a safe clone of the template, e.g. <a href='1' title='2'>3</a>. When doing this, a hash table of identifiers and variable names is built (1: href, 2: title, 3: text).
This template may be safely inspected without any DOM clobbering concerns (Note: the thread model is that the template is safe and the variables are not).

Parsing & walking this template should be done in the typical DOMPurify style and pause whenever a placeholder value is seen (this can be achieved by looking at the hash table). The parser is stateful and DOM-aware, so it knows the current context in which the placeholder occurs and escapes accordingly. Contexts-aware escaping notes that combinations of attribute and tag need their own sanitizing function, e.g. URL escaping for href, disallowing javascript: etc.

When the template DOM has been completely parsed and each placeholder value has been replaced with an escaped input value, it's being transformed back into a string and returned.

<template> cannot appear alone.

For example, <template> is removed but <div><template> is alright. Not sure if it is intended behavior?

Tested in latest Chrome and Firefox with DOMPurify 0.4.3.

Block Network requests

Again, not sure if this is meaningful, but does it make sense for DOMPurify to support an option where all elements that could result in network requests are removed? Just removing src, href etc attributes might be enough?

Package description is too long for npm

Look here: https://www.npmjs.com/package/dompurify

This is what it looks like:

dompurify

DOMPurify is a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. It's written in JavaScript and works in all modern browsers (Safari, Opera (15+), Internet Explorer (10+), Firefox and Chrome - as well as almost anything else usin

DOMPurify is written by security people who have vast background in web attacks and XSS. Fear not. For more details please also read about our Security Goals & Threat Model

Breaking up that second line using a newline in the README should probably fix it.

Change CONTENT_TAGS to blacklist

I wanted to check this before submitting a pull request: at the moment, if you use KEEP_CONTENT, the tag has to match a whitelist for its contents to be kept. This is not necessary for security, since the contents is still sanitised just like any other content in the page, and is annoying if you are sanitising random HTML from sources which may use arbitrary non-standard tags to wrap bits of content (resulting in chunks being stripped by the sanitiser).

I suggest changing the CONTENT_TAGS whitelist to a FORBID_CONTENTS blacklist; the blacklist is not needed for security, but we do want to remove the contents of script tags and perhaps a few others by default, since we know they are not designed to have user-visible content.

Anyone have a problem with this/see a security flaw I've missed? If not, I'll submit a pull request.

REVIEW: Fix for bypass in Firefox thanks to a newly discovered mXSS

Hi @neilj and @fhemberger!

Today I ran into a mXSS issue on latest Gecko that causes a bypass under certain conditions. The problem is, that Firefox shows different behavior for innerHTML interaction than any other browser when doing that in an SVG context. I talked to @freddyb of Mozilla and we developed a fix.

The mXSS bug in Firefox causes a bypass when the sanitized HTML is later not being applied with innerHTML but with document.write() or alike. From a security standpoint, I find this to be close to critical. The cause for this issue is a parser behavior change in Gecko when dealing with HTML elements inside inline SVG documents. I will publish the attack vector after the fix has been reviewed (contact me offline for a PoC).

This fix might however be breaking so I'd love to hear your opinion. The tests are green and things look okay - but better to have that one triple-checked:

Changeset
https://github.com/cure53/DOMPurify/compare/DOMParser?expand=1

Test Results
https://travis-ci.org/cure53/DOMPurify/builds/80807894

Opinions are welcome!

Cheers,
.mario

sanitize() function seems to reverse the order of the attributes

It looks like the order of attributes is being reversed in the output from sanitize().
While from from the point of view of functionality it makes no difference, it is something that will trip over people that are writing unit tests using this library.

That said, thank you very much for this product!

Demo Output Confusing

Hey guys!

On the Demo page.. what exactly is the output supposed to show? When I use the write to DOM feature, I am flooded with a wall of large red text. Is this supposed to happen? There are no alerts, but there are obviously style tags being inserted.

Is this the expected behaviour? If so, I think a note about expected behaviour on that page might be appropriate, as I was expecting something in pure plaintext. Seeing a wall of giant red text made me think "that's probably not supposed to happen."

Firefox, Ubuntu.

Regression test Suite

We need a proper regression test suite to make sure minor changes don't destroy protection of known bad.

cc @fhemberger

Suggestion: Warn about MathML Elements

The maction, semantics elements can lead to script execution.
I'd prefer if the note for configuration mentions that they should be avoided

Option to return a DocumentFragment instead of html

The way I am currently using dompurify looks like this:

var html = this.databaseEntity.foo;
html = dompurify(this.document.defaultView).sanitize(html);
var fragment = parseHTMLSnippit(this.document.defaultView, html);
this.bar.appendChild(fragment);

Which means the html is being parsed twice. It would be useful if there was an option in dompurify to return a DocumentFragment. (adopted/imported into the document dompurify has been constructed with).

For example:

var html = this.databaseEntity.foo;
var fragment = dompurify(this.document.defaultView).sanitize(html, {output: 'fragment'});
this.bar.appendChild(fragment);

Use `tests/expect.json` on the demo

Ideally, the examples on the demo page would be loaded from the tests/expect.json file, so that we only have to maintain the list in one place.

tests/expect.json can also be used for unit testing / regression testing (#5).

Drops charset declarations

If a document in the <head>-tag contains:

<meta http-equiv="Content-type" content="text/html;charset=UTF-8">

the declaration will be dropped. This may lead to character set problems.

Test-suite acts non-deterministically on IE11

Currently, all tests pass on IE11 and below. However sometimes the clobbering check doesn't seem to work and two specific tests fail:

onsubmit, onfocus; DOM clobbering: children
onsubmit, onfocus; DOM clobbering: attributes

This is only happening on IE11, Windows 7 and Windows 8.1.
I cannot figure out, what this is, DOMPurify acts normally in the demo and in real-life scenarios. Maybe a QUnit race condition? (@fhemberger)

Could you provide code examples of what is being sanitized?

I'm looking at the demo that is linked in your README. I don't see what is being sanitized. E.g. the alert(1) values are still present in the output.

Could you write a section in the README which gives a few code examples (before/after)?

Make DOMPurify work in Node.js

Regarding issue #26 and #27, I originally held back Common JS style exports and publishing on npm on purpose, as DOMPurify doesn't run on a pure Node.js environment (it does client side with Browserify).

I'm still looking for a way to get it to work on Node.js as well. jsdom lacks DOM Level 2 Traversal methods like createNodeIterator at the moment, which DOMPurify uses internally.

What's currently missing:

document.implementation.createHTMLDocument
(polyfilled with return jsdom('<html><body></body></html>');)
`window.NodeFilter (extracted the properties needed for DOMPurify from the spec)
document.createNodeIterator(root, whatToShow, filter, entityReferenceExpansion)
NodeIterator.nextNode() implementation
Setter method for document.body.outerHTML

purify style elements?

I'd be awesome if you can purify style elements; or allow a subset of CSS syntax (e.g. no @import) like how reddit does it: https://github.com/reddit/reddit/blob/master/r2/r2/lib/cssfilter.py

I'm weighing options for sanitizing HTML/CSS input and I'm surprised how lightweight DOMPurify is, compared to the JS version of Google caja.

Issue rendering CSS with <style> being the first element

DOMPurify is integrated within the ModSecurity WAF XSS Evasion Demo here -
http://www.modsecurity.org/demo/demo-deny-noescape.html

Here is an example link that injects DOMPurify in the response -
http://www.modsecurity.org/demo/demo-deny-noescape.html?test=%3Cscript&enable_dompurify_defense=on&disable_browser_xss_defense=on

Notice that the CSS data does not render properly. Mario mentioned the following -

"Known issue is <style> being the first element, will go to doc.head instead doc.body. Which is even correct by spec, a config flag will fix that soon. Style attributes however are supposed to work."

rel=noreferrer?

Happy to submit a PR, but not sure if we want this.

Basically, does it make sense to default to rel=noreferrer on anchors (and I think img also supports it). Obviously, doesn't work on all platforms, but is a reasonable default for untrusted content. Stops referrer leaks and hijacking etc.