Giter Club home page Giter Club logo

svg-sanitizer's People

Contributors

abarkine avatar adamroyle avatar adamsilverstein avatar angrybrad avatar brianteeman avatar darylldoyle avatar gudmdharalds avatar hugopeek avatar igor-krein avatar joshuabaker avatar lolli42 avatar norkunas avatar ohader avatar seinopsys avatar snipe avatar xerc avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

svg-sanitizer's Issues

Q: Are you aware of things this library does not catch?

I'm investigating this library and associated plugins for my employer on behalf of some clients. I'm curious, are there any known SVG issues that this does not catch? I know that SVG's can never be made 100% safe, but I'm trying to ascertain the level of mitigation this library provides, and determine the language and certainty that can be provided to clients if it's integrated into our processes

Check/Remove unused id-attributes

INPUT

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 145 45">
  <metadata id="metadata"></metadata>
  <g id="whatever">
    <path id="path123"/>
  </g>
</svg>

OUTPUT

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 145 45">
  <metadata></metadata>
  <g>
    <path/>
  </g>
</svg>

suspicious node svg

If there is a doctype then this is reported as an error

For example take any svg from the svg 1.1 structural specification


<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="8cm" height="3cm"
     xmlns="http://www.w3.org/2000/svg" version="1.1">
  <desc>Local URI references within ancestor's 'defs' element.</desc>
  <defs>
    <linearGradient id="Gradient01">
      <stop offset="20%" stop-color="#39F" />
      <stop offset="90%" stop-color="#F3F" />
    </linearGradient>
  </defs>
  <rect x="1cm" y="1cm" width="6cm" height="1cm" 
        fill="url(#Gradient01)"  />

  <!-- Show outline of canvas using 'rect' element -->
  <rect x=".01cm" y=".01cm" width="7.98cm" height="2.98cm"
        fill="none" stroke="blue" stroke-width=".02cm" />

</svg>

This produces the error

                {
                    "message": "Suspicious node 'svg'",
                    "line": -1
                }

[ask] it's possible to use on php 5.6 ?

Hello i wanna ask my application use php 5.6
when i'm try to install it no show error and when use it not show error also.
there are possibility issue when use php 5.6? because i see support package "php": "^7.0 || ^8.0"

Thanks

libxml_disable_entity_loader is deprecated

I was testing out my project on PHP 8 and got a warning about libxml_disable_entity_loader being deprecated:

https://www.php.net/manual/en/function.libxml-disable-entity-loader.php says:

Warning This function has been DEPRECATED as of PHP 8.0.0. Relying on this function is highly discouraged.

I've only taken a cursory glance at it, but it seems like the solution would be to pass an empty function or some other no-op into https://www.php.net/manual/en/function.libxml-set-external-entity-loader.php instead.

Question: why not to use kses?

Any reason why in the context of WordPress this library do not use KSES? Is it just to keep it portable or are there any other reasons?

(very shallow inspection of the code reveals functionality with is very similar to what kses does)

Removing DOCTYPE breaks entities

Hi,

not sure if I am doing anything wrong here. The sanitizer removes the DOCTYPE which breaks entities being used, e.g. in this adobe export file. After sanitizing this the file and opening directly in a browser, it produce errors like "Entity 'ns_extend' not defined".

before

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 17.0.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)  -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" [
	<!ENTITY ns_extend "http://ns.adobe.com/Extensibility/1.0/">
	<!ENTITY ns_ai "http://ns.adobe.com/AdobeIllustrator/10.0/">
	<!ENTITY ns_graphs "http://ns.adobe.com/Graphs/1.0/">
	<!ENTITY ns_vars "http://ns.adobe.com/Variables/1.0/">
	<!ENTITY ns_imrep "http://ns.adobe.com/ImageReplacement/1.0/">
	<!ENTITY ns_sfw "http://ns.adobe.com/SaveForWeb/1.0/">
	<!ENTITY ns_custom "http://ns.adobe.com/GenericCustomNamespace/1.0/">
	<!ENTITY ns_adobe_xpath "http://ns.adobe.com/XPath/1.0/">
]>
<svg version="1.1" id="Layer_1" xmlns:x="&ns_extend;" xmlns:i="&ns_ai;" xmlns:graph="&ns_graphs;"
	 xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" width="32px" height="32px"
	 viewBox="0 0 32 32" enable-background="new 0 0 32 32" xml:space="preserve">
...
</svg>

after

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 17.0.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)  -->
<svg version="1.1" id="Layer_1" xmlns:x="&ns_extend;" xmlns:i="&ns_ai;" xmlns:graph="&ns_graphs;"
	 xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" width="32px" height="32px"
	 viewBox="0 0 32 32" enable-background="new 0 0 32 32" xml:space="preserve">
...
</svg>

CDATA section is removed

From version 15.0 CDATA nodes are removed.

Example document:

<svg>
<defs>
  <style type="text/css"><![CDATA[
        .fil0 {fill:#FF0000}
   ]]></style>
 </defs>
</svg>

Result from 15.0 (15.1, 15.2):

<svg>
    <defs>
      <style type="text/css"></style>
    </defs>
</svg>

Suspicious node '#cdata-section'

Result before 15.0 (14.1):

<svg>
  <defs>
    <style type="[text/css]()"><![CDATA[
        .fil0 {fill:#00994E}
   ]]></style>
  </defs>
</svg>

Can't find a way to add #cdata-section to safe nodes, as list is hardcoded

$safeNodes = [
    '#text',
];

Why are HTML and MathML elements allowed?

Since foreignObject is not allowed, the only places you can use HTML or MathML would be title or desc (at least in a text/html context). Since both of those are invisible, it doesn't seem compelling to allow usage of HTML or MathML elements at all?

Allowing HTML tag names makes it easier to find bypasses, especially when the output is used inline in text/html.

Empty hrefs filling up the log

There are alot of erroneous "Suspicious attribute 'href'" in the log ( $xmlIssues )
This makes it difficult to use the function getXmlIssues() the check if an svg string was correct.

This is caused by $element->getAttribute('href') returning an empty string for elements that doesn't have a href attribute, in combination with the function isHrefSafeValue saying that empty href aren't safe.

Adding an empty check to the function isHrefSafeValue solves this problem.

//Allow empty URI.
if (empty($value)){
return true;
}

Remove doctype node after node elements have been analyzed

This is really a strange scenario - however in the end it occasionally caused segmentation faults...

vendor/bin/phpunit --no-coverage --filter=/doctypeAndEntityAreRemoved/
PHPUnit 6.5.14 by Sebastian Bergmann and contributors.

[1]    63007 segmentation fault  vendor/bin/phpunit --no-coverage --filter=/doctypeAndEntityAreRemoved/

PR #53 adds corresponding entityTest.svg (used in test-case doctypeAndEntityAreRemoved) which defined a XML entity using <!DOCTYPE fortiguard [ <!ENTITY lab "cool, text as an image">]>.

It turned out that the sequence of removing doctype, followed by \DOMXPath on the document causes a segmentation fault (at least on PHP 7.2 and 7.4, using libxml2 2.9.10 (always) and 2.9.12 (occasionally)). This is the call stack:

Whitelist attribute xml:space not recognised

We have exported a SVG with Adobe and the sanitizer does not like that. It give the following errors:
There are sanitization issues with this SVG file:
Suspicious attribute 'space' in line 4
Suspicious attribute 'enable-background' in line 4

Generator is: Adobe Illustrator 19.1.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)

Issue #63 is for the enable-background.
But the space attribute is something weird.

Relevant code of the SVG:
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" id="Layer_1" version="1.1" x="0px" y="0px" width="600px" height="600px" viewBox="0 0 600 600" xml:space="preserve">

If you sanitize this the getXmlIssues() function wil return the error above:
Suspicious attribute 'space' in line 4

Somehow the code strips xml:
Did not found the problem/solution for this.

Escaped css selector names in SVGs break the parser

Version: 0.16.0

Given:

$oddSvg = <<<SVG
<svg>
    <defs>
        <style>
            .\37 15ca94c-fc50-4dc6-8356-e55b8cb855fa { fill: #1d526b; }
        </style>
    </defs>
</svg>
SVG;

(new enshrined\svgSanitize\Sanitizer())->sanitize($oddSvg); // => ""

Expected: There should be no changes.
Actual: It returns an empty string.

Context:
Apparently some SVG generators use UUID class names. According to the CSS spec class selectors can lead with escaped digits (TIL).


I'm not sure I'll have time to look into the solution; but, wanted to file this so others know. Ideally any tools starting its CSS selectors with numbers should be thrown to the wolves.

Allow attributes that Adobe Illustrator export makes

We have exported a SVG with Adobe and the sanitizer does not like that. It give the following errors:
There are sanitization issues with this SVG file:
Suspicious attribute 'space' in line 4
Suspicious attribute 'enable-background' in line 4

Generator is: Adobe Illustrator 19.1.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)

I wil make an PR for the enable-background attribute.
The space attribute is something different. Because it's in the list. It's looks like that XPath or something else is removing the xml: prefix when walking over the elements. Will make a separated issue for this.

Incorrect W3C namespace in XML is allowed

<svg xmlns="https://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <path d="M30,1h40l29,29v40l-29,29h-40l-29-29v-40z" stroke="#000" fill="none"/> 
  <path d="M31,3h38l28,28v38l-28,28h-38l-28-28v-38z" fill="#a23"/> 
  <text x="50" y="68" font-size="48" fill="#FFF" text-anchor="middle"><![CDATA[410]]></text>
</svg>

Copy the above into the sanitiser and you will see the correct SVG. However when loading on an actual webpage, having the W3C XML namespace pointing to https will cause the SVG to fail to load.

Add filterUnits to allowed attributes

We’ve run into an issue with SVGs exported from Figma that utilise the filterUnits attribute.

<filter ... filterUnits="userSpaceOnUse">

Are we able to include this in the attributes whitelist, or is there a reason for it’s exclusion?

SVG emptied on sanitize

So I have an SVG file created as "Export as" in Photoshop CC 2015. Running it through the sanitizer gives the following:

image

Is there anything I could do to catch this ?

Thanks a ton,
Andrass

Requesting more details on GHSA-fqx8-v33p-4qcc (CVE-2022-23638)

It seems tag 0.15.0 addressed a security vulnerability, see corresponding advisory GHSA-fqx8-v33p-4qcc (CVE-2022-23638)

Corresponding commit at 17e12ba contains a new test case tests/data/htmlTest.svg.

Invoked as svg.svg in browser, mime-type image/svg+xml

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="-1 -1 2 2">
    <!--><img src onerror=alert(1)><!-->
    <?x ><img src onerror=alert(1)><?x?>
    <p/><![CDATA[ ><img src onerror=alert(1)> ]]>
    <font face=""/><![CDATA[ ><img src onerror=alert(1)> ]]>
</svg> 

→ no problem since <img> is not a SVG element
-> not a vulnerability

Invoked as svg.html in browser, mime-type text/htm

<html>
<body>
<div>
    <svg xmlns="http://www.w3.org/2000/svg" viewBox="-1 -1 2 2">
        <!--><img src onerror=alert(1)><!-->
        <?x ><img src onerror=alert(1)><?x?>
        <p/><![CDATA[ ><img src onerror=alert(1)> ]]>
        <font face=""/><![CDATA[ ><img src onerror=alert(1)> ]]>
    </svg>
</div>
</body>
</html>

→ valid concern, since HTML is used in inline SVG
→ scripts are executed in browser
→ cross-site scripting vulnerability

Conclusion & Post-review

Request

  • @darylldoyle please report back, whether these assumptions are correct (it affects only SVG used inline in some HTML-context)
  • consider updating advisory details of GHSA-fqx8-v33p-4qcc - I can support with that task

Requesting more details on GHSA-xrqq-wqh4-5hg2 (CVE-2023-28426)

It seems that tag 0.16.0 did not actually fix a real security vulnerability, see corresponding advisory GHSA-xrqq-wqh4-5hg2 (CVE-2023-28426).

The provided new test SVG files in commit cce18bc still seem to be fine (encoded correctly) even when being processed with the previous version 0.15.4 of this library.

Invoked as svg.svg in browser, mime-type image/svg+xml

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" xml:space="preserve">
    <form action="javascript:alert('1')">
        <input type="submit"></input>
    </form>
    
        &lt;form action="javascript:alert('1')"&gt;
            &lt;input type="submit" onclick="javascript:alert('1')"/&gt;
        &lt;/form&gt;
    
</svg>

→ the <form> tag does not look nice, but is without any functionality inside an SVG context
→ the nested <form> tag in a CDATA section is correctly encoded (this is what the security advisory is referring to)
→ not a vulnerability

Invoked as svg.html in browser, mime-type text/htm

<html>
<body>
<div>
    <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" xml:space="preserve">
        <form action="javascript:alert('1')">
            <input type="submit"></input>
        </form>
        
            &lt;form action="javascript:alert('1')"&gt;
                &lt;input type="submit" onclick="javascript:alert('1')"/&gt;
            &lt;/form&gt;    
        
    </svg>
</div>
</body>
</html>

→ the <form> tag does not look nice, but is without any functionality inside an SVG context
→ the nested <form> tag in a CDATA section is correctly encoded (this is what the security advisory is referring to)
→ not a vulnerability

Conclusion & Post-review

  • disallowing non SVG tags from being processed (like the <form> tag) was handled in #74 and commit 355a65d - which was totally fine to be adjusted, but did not qualify as a security vulnerability
  • the remaining aspect of encoding CDATA sections was handled in a previous fixes already, see issue #71
  • introducing the package ezyang/htmlpurifier seems to be superfluous in this scenario

Disclaimer & Call for Action

"Cannot add self usage" error

Happens when trying to sanitize the attached image.

icon.svg.zip

The SVG validates as valid XML, but I'm not smart enough to know if it's poorly done SVG or a bug in SVG sanitization.

Looks like this commit is where the behavior changed as it worked fine before it: 504da82

Standalone scanning of files via CLI throwns Uncaught Error: Class with XPath

`php ../svg-sanitizer/src/svg-scanner.php "../myTestSVG.svg"

Fatal error: Uncaught Error: Class 'enshrined\svgSanitize\data\XPath' not found in C:\svg-sanitizer\src\Sanitizer.php:214
Stack trace:
#0 C:\svg-sanitizer\src\svg-scanner.php(126): enshrined\svgSanitize\Sanitizer->sanitize('<?xml version="...')
#1 {main}
thrown in C:\svg-sanitizer\src\Sanitizer.php on line 214`

I've used different versions from version 11 to 13.3 and they all produce this error, I have also tried reinstall the package via composer but still receive the same error.

XML header lines removed

If you have an SVG files that begins with <?xml version="1.0" encoding="utf-8"?>, then Sanitizer will remove that when saving the clean XML elements back out to the XML file.

It happens in this line:

$clean = $this->xmlDocument->saveXML($this->xmlDocument->documentElement, LIBXML_NOEMPTYTAG);

The reason is that all of the XML header info (verison, encoding, etc.) is saved on $this->xmlDocument. Changing that line to:

$clean = $this->xmlDocument->saveXML($this->xmlDocument, LIBXML_NOEMPTYTAG);

Makes it behave as you'd expect. Just making sure that wasn't a conscious design decision before I submit a pull request.

Should xlink:href always be removed?

Hello! Glad someone finally took on the SVG mess on WP :)

I ran in to this issue though, causing the SVG image not to display:

Before

<clipPath id="SVGID_2_">
 <use xlink:href="#SVGID_1_"  overflow="visible"/>
</clipPath>

After (upload)

<clipPath id="SVGID_2_">
</clipPath>

Similair with the image element, though just removing the xlink:href attribute:

Before

<image overflow="visible" width="66" height="77" xlink:href="data:image/jpeg;base64,/9j/4AA...jbf8ADaP/2Q==" transform="matrix(0.48 0 0 0.48 521.2959 384.499)">

After

<image overflow="visible" width="95" height="77" transform="matrix(0.48 0 0 0.48 638.2959 384.499)">

Which makes me wonder if xlink:href should be removed when containing just an #id or an base64 data-image.

Should SVGZs be handled by the library or the code consuming the library?

SVGZ's can be handled by running the content through gzdecode() before sanitisation and then running through gzencode() afterwards.

I'm fairly certain we can use 0 === mb_strpos($contents , "\x1f" . "\x8b" . "\x08") to check if it's a gzipped string or not

Simple but should this be handled by the library or not?

Change the license type to MIT or to Apache 2.0 or to the double license GPL v2 + MIT or GPL v2 + Apache 2

Hi @darylldoyle!

I hope everything is great.

We would like to use your library, however we are working on a non-open source software.
The current license (GPL v2) is blocking us from using the library.

Could you please consider changing the license to MIT or to Apache 2.0?
Or perhaps having a double license like GPL v2 + MIT or GPL v2 + Apache 2 would be fine for you?

In that case more software developers will be able to use the library.

Safelist to allow image elements in href attributes for SVGs

Since I don't think this is currently possible, it would be nice to be able to be able to use setAllowedAttrs() to detect a starting pattern inside a href attribute like data:image/*.

e.g. This gets false positive flagged:

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 735 70" width="735" height="70">
	<defs>
		<image width="735" height="70" id="img1" href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUrPcSqAAAAABJRU5ErkJggg=="/>
	</defs>
</svg>

PHP 5.2 compatibility

Currently wont work on PHP 5.2 due to namespacing. Not sure what the pros/cons of dropping namespacing to support 5.2 could be.

Using library as SVG validator

Hello @darylldoyle thank you for this great library, it works really well.

I would like to use this library as a validation when uploading a SVG. In general it works ok, but there are some "feature requests" which probably could improve the handling. The use case it, that we want to decide if we allow the upload of the SCG or not not to clean up the uploaded SVG.

So we tried to sanitize and then check if there are any xmlIssues or if sanitize returns false. So far so good, but we stumbled here over some challenges:

  • If there are PHP opening/closing tags, the sanitizer cleans them, but it's never reported somewhere
  • If there is a comment in the SVG (which is ok I guess) it's reported as xmlIssue here
  • There is no differentiation between the errors a comment raises the same issue level as a javascript
  • It's hard to determinate what error occured just from having the strings (e.g. to output then custom error messages)

It would be nice to enhance the doc block comments with the description of the parameters and return values.

Is there anything which is on the roadmap for future releases?

Thank you in advance.

Stripping out <animate> tag

Should <animate> tags be whitelisted? I'm not sure if you've chosen to strip them out specifically, but 'animate' is not in the allowed tags array. I've used the svg_allowed_tags filter provided by the WordPress plugin to add it for now.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.