gavioto / owaspantisamy

Automatically exported from code.google.com/p/owaspantisamy

HTML 28.00% Java 14.30% CSS 6.62% JavaScript 6.80% ASP 0.01% DIGITAL Command Language 31.32% PHP 0.01% Shell 4.19% Groff 0.53% C# 8.23%

owaspantisamy's Issues

System.IO.StreamReader Constructor for Policy (Patch Included)

To simplify deployment we store our policy files as embedded resources in
the DLL. However the current Policy does not support reading Policy files
either from a string or a stream. Attached is a patch to Policy.cs that
will construct a Policy object given a StreamReader. This required some
slight re-organisation of the Policy file to move XML parsing into its own
method (Initialise) which is called by the constructors.

Given this patch it is possible to read the policy from the assembly with:
System.Reflection.Assembly asm = typeof(MyClass).Assembly;
System.IO.Stream stream = asm.GetManifestResourceStream("MyClass.Policy");
Policy policy = Policy.getInstance(new System.IO.StreamReader(stream));

The patch provided is against the current head (R72)

Original issue reported on code.google.com by [email protected] on 18 Jan 2009 at 10:33

Attachments:

Policy.patch

Support other Input Sources than Files

Currently only physical files are accepted as input sources. In the context of
Content Management Systems it might be advisable to be able to load policy
definitions from other sources, for example directly from the content
repository or the database.

I am providing a patch that extends the signature of Policy to support generic
SAX InputSources and generic InputStreams with a SystemID. Also, in order to
support more than one configuration, I am caching the policy per SystemID and
no longer globally.

Original issue reported on code.google.com by [email protected] on 3 Jul 2008 at 7:07

Attachments:

antisamy-input-sources.patch

[deleted issue]

[deleted issue]

testCssAttacks() testing for the wrong attribute

This method in the test suite has a bug:

The z-index test is checking for position ... just a copy/paste bug but may 
be worth fixing.

Original issue reported on code.google.com by [email protected] on 20 Apr 2009 at 8:54

Mismatch between ErrorMessageUtil constant and included property bundles

There seems to be a mismatch between ErrorMessageUtil and the included
property bundles. The bundles contain the key
"error.attribute.invalid.filtered" while ErrorMessageUtil has the constant:

public static final String ERROR_ATTRIBUTE_CAUSE_FILTER =
"error.attribute.invalid.filter";

Results in exception:

java.util.MissingResourceException: Can't find resource for bundle
java.util.PropertyResourceBundle, key error.attribute.invalid.filtere
    at java.util.ResourceBundle.getObject(ResourceBundle.java:386)
    at java.util.ResourceBundle.getString(ResourceBundle.java:346)
    at org.owasp.validator.html.util.ErrorMessageUtil.localize(ErrorMessageUtil.java:95)
    at org.owasp.validator.html.util.ErrorMessageUtil.getMessage(ErrorMessageUtil.java:79)
    at org.owasp.validator.html.scan.AntiSamyDOMScanner.addError(AntiSamyDOMScanner.java:601)
    at org.owasp.validator.html.scan.AntiSamyDOMScanner.recursiveValidateTag(AntiSamyDOMScanner.java:492)
    at org.owasp.validator.html.scan.AntiSamyDOMScanner.scan(AntiSamyDOMScanner.java:134)
    at org.owasp.validator.html.AntiSamy.scan(AntiSamy.java:99)
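
A mismatch of this kind could be caught by checking each message-key constant against the bundle before release. The following is only a sketch and not AntiSamy code: BundleKeyCheck, DemoBundle and missingKeys are hypothetical names, and the bundle is built in memory so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

final class BundleKeyCheck {
    /** In-memory stand-in for a properties bundle (hypothetical message text). */
    static final class DemoBundle extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"error.attribute.invalid.filtered", "The {0} attribute was filtered."}
            };
        }
    }

    /** Returns the keys that the bundle does not define, in the order given. */
    static List<String> missingKeys(ResourceBundle bundle, String... keys) {
        List<String> missing = new ArrayList<>();
        for (String key : keys) {
            if (!bundle.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

A unit test could pass every constant from the message utility class to missingKeys and fail if the returned list is not empty.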

Original issue reported on code.google.com by [email protected] on 31 Oct 2008 at 8:46

Small error in the antisamy-slashdot-1.2.xml file



In the antisamy-slashdot-1.2.xml file, <strong> is listed as a tag that is
allowed, however no rule is defined to validate it.

Original issue reported on code.google.com by [email protected] on 15 Dec 2008 at 2:39

Regex color error in antisamy-slashdot-1.1.1.xml

In the policy file antisamy-slashdot-1.1.1.xml, the line beginning:

<regexp name="colorNameOrCode" value="(#[0-9a-zA-Z]{6} ....

Seems like it should be:

<regexp name="colorNameOrCode" value="(#[0-9a-fA-F]{6} ....

Presuming that this regex is for detecting hex colors. This error also
occurs in other policy files which specify colorNameOrCode.
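
The difference between the two character classes can be checked with java.util.regex. This is a minimal sketch that uses only the hex part of the expression quoted above; HexColorRegexDemo and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class HexColorRegexDemo {
    // Character class as quoted from the policy file: accepts any ASCII letter.
    static final Pattern LOOSE = Pattern.compile("#[0-9a-zA-Z]{6}");
    // Character class proposed in the report: hexadecimal digits only.
    static final Pattern STRICT = Pattern.compile("#[0-9a-fA-F]{6}");

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }
}
```

With these patterns a value such as "#zzzzzz" passes the loose check but is rejected by the strict one.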

Original issue reported on code.google.com by [email protected] on 8 May 2008 at 12:04

In the policy file, giving tag or attribute names any caps causes them to get filtered

I presume this is a bug since I don't see it stated anywhere that
lower-case names are required.

What steps will reproduce the problem?
1. Add the following rule:

        <tag name="b" action="validate">
            <attribute name="id">
                <literal-list>
                    <literal value="feh"/>
                </literal-list>
            </attribute>
            <attribute name="fehOnly">
                <literal-list>
                    <literal value="feh"/>
                </literal-list>
            </attribute>
        </tag>

2. Try it on the following text:

<b id="feh" fehOnly="feh" badAtt="poison">woot</b>

3. Get the following result:

***** DIRTY HTML *****
<b id="feh" fehOnly="feh" badAtt="poison">woot</b>
***** CLEAN HTML *****
<b id="feh">woot</b>

***** ERRORS *****
The <b>badatt</b> attribute of the <b>b</b> tag has been removed for
security reasons. This removal should not affect the display of the HTML
submitted.
The <b>fehonly</b> attribute of the <b>b</b> tag has been removed for
security reasons. This removal should not affect the display of the HTML
submitted.




What is the expected output? What do you see instead?

Change the attribute name ("fehOnly" --> "fehonly"):

<attribute name="fehonly">

And you get the expected output:

***** DIRTY HTML *****
<b id="feh" fehOnly="feh" badAtt="poison">woot</b>
***** CLEAN HTML *****
<b fehonly="feh" id="feh">woot</b>

***** ERRORS *****
The <b>badatt</b> attribute of the <b>b</b> tag has been removed for
security reasons. This removal should not affect the display of the HTML
submitted.



What version of the product are you using? On what operating system?

v1.1.1, MacOS Tiger


Please provide any additional information below.

This happens with tag names, too.
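
The error output above reports the attribute as "fehonly", which suggests the names are lower-cased before the policy lookup. Assuming that is the cause, one possible fix is to normalize names when the policy is loaded as well. The sketch below is hypothetical (CaseInsensitiveRules is not an AntiSamy class) and only shows the normalization idea.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CaseInsensitiveRules {
    private final Map<String, String> allowedValueByAttribute = new HashMap<>();

    /** Registers a rule; the name is lower-cased so "fehOnly" and "fehonly" share one key. */
    void allow(String attributeName, String allowedValue) {
        allowedValueByAttribute.put(attributeName.toLowerCase(Locale.ROOT), allowedValue);
    }

    /** Looks the attribute up with the same normalization, whatever case the caller uses. */
    boolean isAllowed(String attributeName, String value) {
        String allowed = allowedValueByAttribute.get(attributeName.toLowerCase(Locale.ROOT));
        return allowed != null && allowed.equals(value);
    }
}
```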

Original issue reported on code.google.com by [email protected] on 10 May 2008 at 1:27

No thread safe way to create a Policy

With AntiSamy, there is no thread-safe way to create a Policy. The
Policy.getInstance(...) methods all read/write the private _instance
variable without synchronization. If two threads are creating policies
(even with the same file/filename) there is potential for conflict; see
http://www.ibm.com/developerworks/java/library/j-dcl.html, in particular
the section on out-of-order writes.

I think the _instance variable in Policy can be removed altogether.
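
If caching is still wanted after removing the shared field, a concurrent map keyed by policy name is one option. This is a sketch under assumptions, not the project's code: PolicyCache is a hypothetical class, and the loader function stands in for whatever actually parses a policy file.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class PolicyCache<P> {
    private final ConcurrentHashMap<String, P> byName = new ConcurrentHashMap<>();
    private final Function<String, P> loader;

    /** The loader parses the policy for a given name; it is supplied by the caller. */
    PolicyCache(Function<String, P> loader) {
        this.loader = loader;
    }

    /** ConcurrentHashMap.computeIfAbsent applies the loader at most once per key. */
    P get(String name) {
        return byName.computeIfAbsent(name, loader);
    }
}
```

Callers in different threads that ask for the same name get the same fully constructed object, without double-checked locking.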

Original issue reported on code.google.com by [email protected] on 22 Jun 2009 at 8:50

nbsp translated into " " (blank) in AntiSamy

What steps will reproduce the problem?

Run AntiSamy on the command line with &nbsp; in the input. The results
contain blank spaces instead of preserving the HTML entity.

What is the expected output? What do you see instead?
Input: <p>&nbsp;&nbsp;Does this work</p>
Expected: <p>&nbsp;&nbsp;Does this work</p>
Actual Output: <p>  Does this work</p> 

What version of the product are you using? On what operating system?
1.1 JDK 1.6 (and 1.5), Windows

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 4 Apr 2008 at 5:18

Attachments:

test.txt

Simplify AntiSamy.java, reduce code duplication

Currently AntiSamy.java contains a number of convenience methods for sanitizing
HTML that are duplicating code instead of wrapping the most generic
implementation (that should do the real work) with default values for
parameters.

My patch changes these methods into wrapper methods, making it easier to change
or extend the Scanner API.

Original issue reported on code.google.com by [email protected] on 7 Jul 2008 at 12:00

Attachments:

antisamy-method-simplification.patch

Phishing: negative margins

margin-top:-50px is possible with myspace policy. In some cases this can be
used just like position offsets.

Original issue reported on code.google.com by [email protected] on 17 Aug 2008 at 4:06

Maven packaging

It would be very nice to have a Maven-packaged release of AntiSamy. I can 
help if you want.

Original issue reported on code.google.com by [email protected] on 28 May 2009 at 1:27

Malformed HTML causes parsing exception

What steps will reproduce the problem?
Attempt to sanitize tags like:
<a - href="/">link</a>
<a . href="/">link</a>

Closing tags parsed Ok

What is the expected output? What do you see instead?
Expected:
<a href="/">link</a>
<a href="/">link</a>

Exception:
INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.

What version of the product are you using? On what operating system?
v1.1.1, Mac OS

Please provide any additional information below.
This is probably a nekoHTML bug

Original issue reported on code.google.com by [email protected] on 4 Jun 2008 at 6:12

Blocking: #21

Non empty elements are converted to a minimized form in xhtml.

Attempt to sanitize a string like the following with the directive
"useXHTML" set to true:
<strong></strong><a>hello world</a>

AntiSamy outputs the following:
<strong /><a>hello world</a>

If you render that in an html page (in ie/ff using xhtml doctype) the link
will be in bold. It shouldn't be bold.

It turns out not to be an issue of AntiSamy per se, the problem is when
using the "XHTMLSerializer" of xerces 2.9.1.

If the tag is empty, it is using the minimized version of it
(HTMLSerializer.java line 411).

The XHTML spec mentions that only tags that are allowed to be empty (e.g.
br, img) are allowed to be minimized:
http://www.w3.org/TR/xhtml1/#guidelines

Original issue reported on code.google.com by [email protected] on 11 Feb 2009 at 6:38

Slashdot policy does not filter javascript from link anchor's href attribute

What steps will reproduce the problem?
1. Create a Policy instance from the antisamy-slashdot-1.1.1.xml:
    Policy policy = Policy.getInstance("antisamy-slashdot-1.1.1.xml");

2. Scan the String "<a href=\"javascript:alert('xss')\">link</a>":
    CleanResults cr = (new AntiSamy()).scan(html, policy);

3. The result of cr.getCleanHTML() will contain the original HTML, i.e.,
there will still be JavaScript commands in the anchor's href attribute.


What is the expected output? What do you see instead?
I was expecting to see "link", instead I see the input string, "<a
href=\"javascript:alert('xss')\">link</a>"


What version of the product are you using? On what operating system?
I am using version 1.1.1 on Windows XP with JDK 1.5.0_12.


Please provide any additional information below.
I found a possible solution by expanding

    <regexp name="onsiteURL"
value="([\p{L}\p{N}\\/\.\?=&amp;;\#-~]+|\#(\w)+)"/>

in section <common-regexps> to exclude the colon:

    <regexp name="onsiteURL"
value="([\p{L}\p{N}\\/\.\?=&amp;;\#-~&amp;&amp;[^:]]+|\#(\w)+)"/>

But I am by no means a security expert, so this possible solution needs to
be carefully looked into!
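
The effect of the "&&[^:]" intersection can be seen in isolation. The sketch below uses only the #-~ range from the expression rather than the full onsiteURL pattern, and the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class ColonExclusionDemo {
    // The range #-~ spans most printable ASCII characters, the colon among them.
    static final Pattern WITH_COLON = Pattern.compile("[#-~]+");
    // Intersecting the class with [^:] removes the colon from the accepted set.
    static final Pattern WITHOUT_COLON = Pattern.compile("[#-~&&[^:]]+");

    static boolean acceptedBefore(String s) {
        return WITH_COLON.matcher(s).matches();
    }

    static boolean acceptedAfter(String s) {
        return WITHOUT_COLON.matcher(s).matches();
    }
}
```

Here "javascript:alert" is accepted by the first pattern and rejected by the second, while a relative path such as "/index.html" still passes both.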

Original issue reported on code.google.com by [email protected] on 25 Apr 2008 at 7:52

Attachments:

TestAntiSamy.java

css filter wrong with font-family when supplying a list of fonts

What steps will reproduce the problem?
1. allow font-family in policy, list some fonts (courier, serif)
2. allow span to have style attributes
3. test with string: "<span style="font-family: serif, courier;">preformated</span>"

What is the expected output? What do you see instead?
input should stay the same.
output now is: "<span style="">preformated</span>"
Filter error-msg is: non valid arguments for font-family
Filter is working if only 1 font is supplied.

What version of the product are you using? On what operating system?
os X, Java 1.6, latest jar antiSamy (1.3)

Please provide any additional information below.
my font-family-config:
        <property name="font-family"
            description="This property specifies a prioritized list of font family names and/or 
generic family names.">
            <category-list>
                <category value="visual" />
            </category-list>
            <!-- allowing only generic font families -->
            <literal-list>
                <literal value="serif" />
                <literal value="courier" />
                <literal value="courier new" />
                <literal value="arial" />
                <literal value="lucida console" />
                <literal value="sans-serif" />
                <literal value="cursive" />
                <literal value="verdana" />
                <literal value="fantasy" />
                <literal value="monospace" />
            </literal-list>
        </property>
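
For a list-valued property like this, each comma-separated entry would need to be checked against the literal list on its own. The following is a rough sketch of that idea only; FontFamilyListCheck is a hypothetical helper, not the CSS scanner's code, and it does not handle quoted family names.

```java
import java.util.Locale;
import java.util.Set;

final class FontFamilyListCheck {
    /** True if every comma-separated family in the value is in the allowed set. */
    static boolean allAllowed(String value, Set<String> allowed) {
        for (String family : value.split(",")) {
            String name = family.trim().toLowerCase(Locale.ROOT);
            // An empty entry (for example from "serif,,courier") is treated as invalid.
            if (name.isEmpty() || !allowed.contains(name)) {
                return false;
            }
        }
        return true;
    }
}
```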

Original issue reported on code.google.com by [email protected] on 3 Aug 2009 at 8:01

onsiteUrl should accept percentage sign.

If I give to Antisamy a string like:

<a href="/foo/bar.do?%24id=1">hello</a>

it will return:
<a>hello</a>

because the onsiteURL regex doesn't include the percentage sign.

I noticed that the offsiteURL already takes the percentage, is there a
reason why we shouldn't accept it for onsiteURL or was it just a typo?

Thanks!

Original issue reported on code.google.com by [email protected] on 20 Jun 2008 at 2:43

Conditional comments

What steps will reproduce the problem?

http://maulnets.ya.ru/replies.xml?item_no=1513&ncrnd=6690
I tried to process this page with the MySpace policy and found that it allows IE
conditional comments like:
<!--[if lt IE 7]><link rel=stylesheet href="http://img-css.friends.yandex.net/css/_ya.ru-ie6.css?build=10443"><![endif]-->

What is the expected output? What do you see instead?
I expect AntiSamy to filter that stuff

What version of the product are you using? On what operating system?
1.1.1 mac OS

Please provide any additional information below.
The funny thing is that it does filter comments when only comments are submitted.

Original issue reported on code.google.com by [email protected] on 4 May 2008 at 4:46

Provide Programmatic Access to Policy File

Policy object currently parses policy files.

This change will add the ability to serialize (output) policy files and
make changes to the policy programmatically.

Among other things, this will allow:
- JUnit test cases to be created against a policy file
- Automated creation of antisamy.xml derivative files (ex. myspace,
slashdot, etc).
- Programmatic API for policy modification (e.g., to use with a policy
editor GUI)

Original issue reported on code.google.com by [email protected] on 17 Mar 2009 at 3:59

SAX-based scanner implementation for reduced memory usage & increased performance

As mentioned here:
https://lists.owasp.org/pipermail/owasp-antisamy/2008-June/000048.html
a SAX-based scanner implementation can lead to performance advantages, because
the document is not held in memory (only a stack holding the operations).

Usage is illustrated in the test case, which is also used for performance
tests. Running SAX and DOM against the AntiSamy Google Code homepage with the
default policy reports that SAX is 50% faster than DOM.

Original issue reported on code.google.com by [email protected] on 7 Jul 2008 at 11:25

Attachments:

antisamy-sax-filter.patch

out of memory: junit : antisamy-myspace-1.2 , antisamy-ebay-1.2

What steps will reproduce the problem?
1. see junit test below

What is the expected output? What do you see instead?
expected:cleaned html
actual: out of memory

What version of the product are you using? On what operating system?
        <dependency>
            <groupId>antisamy</groupId>
            <artifactId>antisamy-bin</artifactId>
            <version>1.2</version>
        </dependency>       
        <dependency>
            <groupId>batik</groupId>
            <artifactId>batik-css</artifactId>
            <version>1.6-1</version>
        </dependency>
         <dependency>
            <groupId>batik</groupId>
            <artifactId>batik-util</artifactId>
            <version>1.6-1</version>
        </dependency> 
        <dependency>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis-ext</artifactId>
            <version>1.3.04</version>
        </dependency>       
        <dependency>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
            <version>2.0.2</version>
        </dependency>     
        <dependency>
            <groupId>xerces</groupId>
            <artifactId>xercesImpl</artifactId>
            <version>2.9.1</version>
        </dependency> 
        <dependency>
            <groupId>net.sourceforge.nekohtml</groupId>
            <artifactId>nekohtml</artifactId>
            <version>1.9.11</version>
        </dependency>

Please provide any additional information below.
[code]

import junit.framework.TestCase;

import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;

public class CleanerTest extends TestCase {
    String dirty =
        "<a onblur=\"try {parent.deselectBloggerImageGracefully();} catch(e) {}\" "
        + "href=\"http://www.charityadvantage.com/ChildrensmuseumEaston/images/BookswithBill.jpg\">"
        + "<img style=\"FLOAT: right; MARGIN: 0px 0px 10px 10px; WIDTH: 150px; CURSOR: hand; HEIGHT: 100px\" "
        + "alt=\"\" src=\"http://www.charityadvantage.com/ChildrensmuseumEaston/images/BookswithBill.jpg\" "
        + "border=\"0\" /></a><br />Poor Bill, couldn't make it to the Museum's "
        + "<span class=\"blsp-spelling-corrected\" id=\"SPELLING_ERROR_0\">story time</span> "
        + "today, he was so busy shoveling! Well, we sure missed you Bill! So since "
        + "you were busy moving snow we read books about snow. We found a clue in one "
        + "book which revealed a snowplow at the end of the story - we wish it had "
        + "driven to your driveway Bill. We also read a story which shared fourteen "
        + "<em>Names For Snow. </em>We'll catch up with you next week....wonder which "
        + "hat Bill will wear?<br />Jane";

    /**
     * produces an out of memory error
     * @throws Exception
     */
    public void testMySpaceCleaner() throws Exception {
        AntiSamy as = new AntiSamy();
        Policy policy = Policy.getInstance(
            this.getClass().getClassLoader().getResourceAsStream("antisamy-myspace-1.2.xml"));
        CleanResults cr = as.scan(dirty, policy);
        assertNotNull( cr.getCleanHTML() ) ;
    }
    /**
     * produces an out of memory error
     * @throws Exception
     */
    public void testEBayCleaner() throws Exception {
        AntiSamy as = new AntiSamy();
        Policy policy = Policy.getInstance(
            this.getClass().getClassLoader().getResourceAsStream("antisamy-ebay-1.2.xml"));
        CleanResults cr = as.scan(dirty, policy);
        assertNotNull( cr.getCleanHTML() ) ;
    }

    /**
     * works
     * @throws Exception
     */
    public void testSlashdotCleaner() throws Exception {
        AntiSamy as = new AntiSamy();
        Policy policy = Policy.getInstance(
            this.getClass().getClassLoader().getResourceAsStream("antisamy-slashdot-1.2.xml"));
        CleanResults cr = as.scan(dirty, policy);
        assertNotNull( cr.getCleanHTML() ) ;
    }


}
[/code]

Original issue reported on code.google.com by [email protected] on 23 Feb 2009 at 7:43

CSS Scanner gives extra format to inline styles.

If I give something like this to AntiSamy:

<p style="border: 3.0px;">hello</p>

It returns this:

<p style="    border: 3.0px;

">hello</p>"

It inserts a tab before "border" and appends a line break after the
value. It shouldn't.

This is done in org\owasp\validator\css\CssHandler.java in the "property"
method in lines 471 and 482. If it's an inline value, it shouldn't append
those characters.

I attached the proposed fix.

Original issue reported on code.google.com by [email protected] on 31 Jul 2008 at 11:30

Attachments:

CssHandler.patch

Improvement: find links option

With a "find links" option in policy files, AntiSamy could wrap plain-text
links in anchors.

This is hard to implement outside AntiSamy because a standalone function
can't know whether a link appears as a tag attribute or not, whereas AntiSamy
works at the object level and can easily do this.

Original issue reported on code.google.com by [email protected] on 17 Aug 2008 at 3:17

CSS font-family Property Parsing Error

What steps will reproduce the problem?
1. Scan an inline stylesheet "font-family: Geneva, Arial, Sans-serif"

What is the expected output? What do you see instead?
This should be valid by the regular expression. Instead, validation fails.

This appears to be caused by parsing of the LexicalUnit for the font family
name.

Originally reported by Chase Seibert.

Original issue reported on code.google.com by [email protected] on 27 Oct 2008 at 5:41

antisamy-slashdot-1.2.xml handles invalid tags oddly

Using version 1.2 and antisamy-slashdot-1.2.xml

Example test string: "<b>bold<b><i>italic</i><title>title</title>"
Note the wrong second <b> tag (on purpose).
What I would expect would be "<b>bold<b><i>italic</i>title</b></b>".
However, what is returned is:

"<b>bold<b>\n" +
"    <i>italic</i></b></b>\n" +
"<b>\n" +
"  <b>title</b></b>"

Note the newlines, blanks and the duplication of the <b> tag before title.

Another example using nested tags:
Example test string: "<title>title</title><b>bold</b><i>italic</i><b><i>boldanditalic</b>"
Note the missing </i> tag after boldanditalic (on purpose).
What I would expect would be "title<b>bold</b><i>italic</i><b><i>boldanditalic</i></b>".
However, what is returned is:
title<b>bold</b>
<i>italic</i>
<b>
  <i>boldanditalic</i></b>
<i />
Note again the newlines, blanks and the duplication of the (empty) <i /> 
tag at the end.

Although the result isn't "wrong", the irregular blanks and newlines make 
it harder to test, as do duplication of invalid tags.

Original issue reported on code.google.com by [email protected] on 29 Jul 2008 at 2:49

Move strings to a resource file

The source to AntiSamy hardcodes all errors, warnings, etc. in the source.
 It would be nice if the source had these strings in a resources file so
that one could swap out the resources file to change the messages.

Original issue reported on code.google.com by sorakiu on 4 Aug 2009 at 7:00

Ability to encode unknown tags without balancing them.

Antisamy has the "onUnknownTag" directive where you can set to encode
whatever is not recognized. However Cyberneko will balance the tag before
it's given to Antisamy.

Example, cleaning the string:
"<b>hello</b> <world> !"
Will return:
<b>hello</b> &lt;world&gt; !&lt;/world&gt;

However I'd like the ability to recognize that "<world>" is not an html tag
and just encode it instead of removing it and without having it balanced.

I'd like to get:
"<b>hello</b> &lt;world&gt; !";

A use case would be a text input where the user enters plain text but the
output is rendered into HTML. We would like to display literally whatever
the user input (for instance, "<world>") and let the browser pick up markup
that is considered safe (for instance, "<b>").

This is easy to accomplish. Before sending the html string to the
DOMFragmentParser, we can encode the open bracket for any unrecognized tag.
An unrecognized tag would be a tag that was not explicitly defined in the
policy.
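
The pre-processing step described above might look roughly like this. It is a sketch and not the attached patch: UnknownTagEncoder is a hypothetical class, the set of known tag names is assumed to come from the policy, and the simple regex would also fire inside comments or attribute values.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnknownTagEncoder {
    // "<", an optional "/", then a tag name.
    private static final Pattern TAG_START = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)");

    /** Replaces "<" with "&lt;" for every tag whose name is not in knownTags. */
    static String encodeUnknown(String html, Set<String> knownTags) {
        Matcher m = TAG_START.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String replacement = knownTags.contains(name)
                    ? m.group()                           // known tag: leave untouched
                    : "&lt;" + m.group(1) + m.group(2);   // unknown tag: encode the bracket
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the opening bracket is already encoded, the HTML parser no longer sees "<world>" as an element and has nothing to balance.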

I attach a patch file with the suggested changes.

Original issue reported on code.google.com by [email protected] on 26 Nov 2008 at 6:24

Attachments:

EncodeNonHtmlTags.patch

Non-latin encoding problem

What steps will reproduce the problem?
1. cfscript code executed with railo:
this.sanitizer = createObject("java", "org.owasp.validator.html.AntiSamy");
var results = this.sanitizer.scan(input, policyFile);
return results.getCleanHTML();


What is the expected output?
<p>параграф</p>

What do you see instead?
<p>????????</p>


What version of the product are you using? On what operating system?
Railo 2.0.0.034 + Resin-3.1.2 + mac os  10.5.2


Please provide any additional information below.
I guess it's not a Railo- or CF-specific problem; the input is just treated as
some kind of ISO encoding when it is supposed to be Unicode.

Original issue reported on code.google.com by [email protected] on 10 Mar 2008 at 12:00

Error parsing <style> contents containing CDATA

From Gang Zheng:

I tried the following input string with AntiSamy and encountered an exception:

Input String: <style type="text/css"><![CDATA[P {  margin-bottom: 0.08in; } ]]></style>

org.apache.batik.css.parser.ParseException: character
       at org.apache.batik.css.parser.Scanner.nextToken(Scanner.java:381)
       at org.apache.batik.css.parser.Scanner.next(Scanner.java:222)
       at org.apache.batik.css.parser.Parser.parseStyleSheet(Parser.java:185)
       at org.owasp.validator.css.CssScanner.scanStyleSheet(CssScanner.java:124)
       at org.owasp.validator.html.scan.AntiSamyDOMScanner.recursiveValidateTag(AntiSamyDOMScanner.java:318)
       at org.owasp.validator.html.scan.AntiSamyDOMScanner.scan(AntiSamyDOMScanner.java:135)
       at org.owasp.validator.html.AntiSamy.scan(AntiSamy.java:99)

I traced the code, and it seems that AntiSamy passes the text of
"<![CDATA[P {  margin-bottom: 0.08in; } ]]>" to the CSS scanner, and
the CSS scanner does not like <![CDATA[...]]> as the surrounding of
the real style sheet contents.

If I remove the CDATA from the input and change the style sheet
contents to "<style type="text/css">P {  margin-bottom: 0.08in;
}</style>", everything works fine.

So my question is, how can I make the AntiSamy/CSS Scanner correctly
parse the CDATA contents? After all, the CDATA section in the original
input is perfectly legal style sheet contents.
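
One possible workaround is to strip an enclosing CDATA wrapper before the text reaches the CSS parser. The helper below is a sketch under that assumption; CdataUnwrapper is a made-up name and where such a call would go inside AntiSamy is not shown.

```java
final class CdataUnwrapper {
    private static final String OPEN = "<![CDATA[";
    private static final String CLOSE = "]]>";

    /** Returns the text inside a single enclosing CDATA section, or the input unchanged. */
    static String unwrap(String styleText) {
        String trimmed = styleText.trim();
        if (trimmed.length() >= OPEN.length() + CLOSE.length()
                && trimmed.startsWith(OPEN)
                && trimmed.endsWith(CLOSE)) {
            return trimmed.substring(OPEN.length(), trimmed.length() - CLOSE.length());
        }
        return styleText;
    }
}
```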

Original issue reported on code.google.com by [email protected] on 24 Nov 2008 at 9:47

Combine color regular expressions

For clarity, recommend changing color regular expressions (currently
colorCodeOrName and cssColor) to separate, distinct regular expressions
that can be reused by all HTML tags/CSS properties.

Proposed Breakdown:
colorName (color names defined by HTML/CSS Spec):
(aqua|black|blue|fuchsia|gray|grey|green|lime|maroon|navy|olive|purple|red|silver|teal|white|yellow)

colorCode (color by hex - HTML + CSS): #[[0-9][0-9a-fA-F]?]{3} --- need to
check the format of this regular expression but the idea is that #000
should be valid for black

rgbCode (CSS only):
rgb\(([1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]),([1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]),([1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\)
--- basically 0-255 for each of R/G/B
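
The proposed breakdown can be tried out with java.util.regex. This is a sketch only: the colorCode pattern here is one guess at the intended format (three or six hex digits), and ColorPatterns and isColor are names invented for the example.

```java
import java.util.regex.Pattern;

final class ColorPatterns {
    static final Pattern COLOR_NAME = Pattern.compile(
        "aqua|black|blue|fuchsia|gray|grey|green|lime|maroon|navy|olive"
        + "|purple|red|silver|teal|white|yellow");
    // Assumed format: '#' followed by exactly three or exactly six hex digits.
    static final Pattern COLOR_CODE = Pattern.compile("#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})");
    // One component in 0..255, following the proposal above.
    private static final String BYTE = "(?:1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])";
    static final Pattern RGB_CODE = Pattern.compile(
        "rgb\\(" + BYTE + "," + BYTE + "," + BYTE + "\\)");

    /** True if the whole string is a color name, a hex code, or an rgb() code. */
    static boolean isColor(String s) {
        return COLOR_NAME.matcher(s).matches()
            || COLOR_CODE.matcher(s).matches()
            || RGB_CODE.matcher(s).matches();
    }
}
```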

Original issue reported on code.google.com by [email protected] on 14 Mar 2009 at 8:11

Invalid key in i18n

Very minor issue...

AntiSamy 1.2

In ErrorMessagingUtils there is:

public static final String ERROR_ATTRIBUTE_CAUSE_FILTER =
"error.attribute.invalid.filter";

But in AntiSamy_en_US.properties it reads:
error.attribute.invalid.filtered=The <b>{0}</b> tag contained ...

Notice that in the Java file it reads "filter" while in the properties file
is "filtered".

Original issue reported on code.google.com by [email protected] on 19 Jun 2008 at 4:47

Bug: Error

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 21 Jul 2008 at 12:06

Allow use of InputStream for Policy instead of File

In a web application, you can't necessarily depend on knowing the complete path
to the policy file. I've attached a patch that will allow you to use an
InputStream instead of a File or String filename. It required modification to
the private constructor, but should otherwise be 100% compatible. Any
IOExceptions are wrapped in a PolicyException.

Example use:
Policy pol =
Policy.getInstance(getClass().getResourceAsStream("/resources/antisamy.xml"));
...

Original issue reported on code.google.com by [email protected] on 5 Apr 2008 at 8:51

Attachments:

policyis.patch

Preserve comments

I would like to have AntiSamy preserve HTML comments in the original 
source, because the template language I am using (MiniTemplator) uses 
comments for its directives.

Is this something that would be easy, or is it a monumental task?

I suppose as a worst case I could escape the comments temporarily beforehand
and unescape them after running AntiSamy.

Original issue reported on code.google.com by [email protected] on 17 Apr 2009 at 4:46

incorrect CSS error handling

ParseExceptions thrown by the batik css parser are not being caught,
instead they cause the cleaner to fail.  The attached patch fixes the
issue, though the error message may need to be updated (and added to all
the localized message bundles)

Original issue reported on code.google.com by [email protected] on 28 Jul 2009 at 10:10

Attachments:

patch

Color code is canonicalized to rgb code

Feature request from Raphael Moita regarding CSS color code:

Does someone know why AntiSamy changes values Hex to RGB like this below
and how can I avoid that?

<font style="BACKGROUND-COLOR: #ffff00"> to <font style="BACKGROUND-COLOR:
rgb(255,255,0)">

Proposed implementation by using a policy directive that allows users to
specify canonicalization format for CSS colors (hex, rgb, etc).
Subsequently, lexicalValueToString method needs to be changed to honor the
new directive.
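
A hex canonicalization for such a directive could be as small as the following. This is a sketch, not the lexicalValueToString implementation: CssColorFormat and toHex are hypothetical, and values that do not parse as rgb(r,g,b) are returned unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssColorFormat {
    private static final Pattern RGB = Pattern.compile(
        "rgb\\(\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*\\)");

    /** Converts "rgb(r,g,b)" to "#rrggbb"; any other input is returned as-is. */
    static String toHex(String cssColor) {
        Matcher m = RGB.matcher(cssColor.trim());
        if (!m.matches()) {
            return cssColor;
        }
        int r = Integer.parseInt(m.group(1));
        int g = Integer.parseInt(m.group(2));
        int b = Integer.parseInt(m.group(3));
        if (r > 255 || g > 255 || b > 255) {
            return cssColor; // out-of-range component: leave the value alone
        }
        return String.format("#%02x%02x%02x", r, g, b);
    }
}
```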

Original issue reported on code.google.com by [email protected] on 18 Apr 2009 at 6:29

New policy

A TinyMCE policy file, so that AntiSamy can be used for server-side
validation together with the simple version of TinyMCE.
(http://wiki.moxiecode.com/examples/tinymce/installation_example_00.php)

It should allow minor markup and styling. Practical use is to validate
comments and internal site messaging.

I have attached an example file that I've created from the Slashdot policy.

(I will work on it some more because I would like it to be compatible with
other editors too)

Original issue reported on code.google.com by [email protected] on 17 Aug 2008 at 3:09

Attachments:

default.xml

Enhancement: cssElementSelector

Current cssElementSelector pattern:
[a-zA-Z0-9-_]+|\*

should be enhanced with a space marker:
cssElementSelector pattern:
[a-zA-Z0-9-_\p{Zs}]+|\*

in order to process multiple class assignments like:
class="class1 class2"

Original issue reported on code.google.com by [email protected] on 8 Oct 2008 at 5:57

No error messages for removed IFRAME tags. Childless IFRAME, SCRIPT, LINK tags always removed despite policy configuration.

I was looking for ways to throw an exception on a validation failure (i.e.
remove or truncate). I was hoping that all validation failures would be
logged to errorMessages, but this does not seem to be the case.
Specifically, the anchor tag below throws an exception but the iframe tag
does not.

Exception from line 5
<a href="javascript:alert('hacked!');">click me</a>

No Exception
<iframe src='http://www.google.com'></iframe>

1.    AntiSamy antiSamy = new AntiSamy();
2.    CleanResults cleanResults = antiSamy.scan(canonical, SLASHDOT_POLICY);
3.
4.    if(!cleanResults.getErrorMessages().isEmpty()) {
5.        throw getException(name);
6.    }

After further testing, it seems that iframe, link, and script tags without
child nodes are removed without an error message.  Specifically, these tags
are not in the allowedEmptyTags array, so they are nuked.  This also means
that these tags will always be removed despite the policy setting.

No Error Message
<iframe src='http://www.hacker.com/fishing.html'></iframe>
<link rel="stylesheet" type="text/css"
href="http://www.hacker.com/takeover.css"/>
<script src='http://www.hacker.com/takeover.js'></script>

Error Message
<iframe src='http://www.hacker.com/fishing.html'>&nbsp;</iframe>
<script src='http://www.hacker.com/takeover.js'>&nbsp;</script>

Original issue reported on code.google.com by [email protected] on 13 May 2009 at 6:07

onsiteURL regex lets javascript code go through

The href attribute of the anchor tag can take javascript code to execute
when clicking on the link.

Scanning something like:
"<a href=\"javascript:alert('xss!');\">xss</a>"

The "onsiteURL" will mark it as invalid because it does not accept the
parentheses.

However, scanning something like:
"<a href=\"javascript:alert%28%27xss%27%29\">xss</a>"

Will go through and will get executed when clicking on the link.

According to the URL spec, page 15
(http://www.w3.org/Addressing/URL/url-spec.txt), the colon is a reserved
character. I suggest not to include this character in the set of accepted
characters in the onsiteURL regex. Currently this character is accepted
through the range specified in \#-~.

I suggest breaking down this range in two in order to exclude the colon.
I'm not sure if then the other reserved characters should be excluded too
or how should they be handled (like the question mark character). 
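
The idea of splitting the range can be sketched as follows. This is a simplified stand-in, not the real onsiteURL expression: it isolates only the '#' to '~' range, and the class and constant names are hypothetical. In ASCII the colon (0x3A) sits between '9' (0x39) and ';' (0x3B), so the two ranges '#'-'9' and ';'-'~' cover everything the original range did except the colon:

```java
import java.util.regex.Pattern;

class ColonRangeDemo {
    // Simplified stand-in for a character class using the range '#'..'~'.
    static final Pattern WIDE = Pattern.compile("[#-~]+");
    // Same range split in two; ':' falls in the gap between '9' and ';'.
    static final Pattern NO_COLON = Pattern.compile("[#-9;-~]+");

    // True only if the whole value matches the pattern.
    static boolean accepts(Pattern pattern, String value) {
        return pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        String encoded = "javascript:alert%28%27xss%27%29";
        System.out.println("wide range:  " + accepts(WIDE, encoded));
        System.out.println("split range: " + accepts(NO_COLON, encoded));
        System.out.println("relative:    " + accepts(NO_COLON, "/page?id=1"));
    }
}
```

Relative URLs with a query string still pass the split range, since '?' and '=' lie above ';'.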

This issue was originally posted in the mailing list:
https://lists.owasp.org/pipermail/owasp-antisamy/2008-May/000041.html

Original issue reported on code.google.com by [email protected] on 2 Jun 2008 at 2:34

Doesn't accept <style media="print, projection, screen">

Currently the provided antisamy examples will only accept a single media 
type in the media attribute of a style tag; however, HTML will accept 
several types separated by commas.  Ideally this would also be supported by 
antisamy easily.  I suspect I can resolve this in my own file using a regex 
of some sort.
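
A sketch of the kind of regex that could do this, checked with java.util.regex. The pattern is my own guess at a reasonable whitelist (alphabetic media types separated by commas), not something taken from the shipped policy files, and the names are hypothetical:

```java
import java.util.regex.Pattern;

class MediaListPatternDemo {
    // Hypothetical pattern: one or more alphabetic media types,
    // separated by commas with optional whitespace around each comma.
    static final Pattern MEDIA_LIST =
            Pattern.compile("[a-zA-Z]+(\\s*,\\s*[a-zA-Z]+)*");

    // True only if the whole attribute value matches the pattern.
    static boolean accepts(String media) {
        return MEDIA_LIST.matcher(media).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("print, projection, screen"));
        System.out.println(accepts("screen"));
        System.out.println(accepts("screen,"));
    }
}
```

In a policy file the same expression would go into the regexp value for the media attribute, without the Java string escaping.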

Original issue reported on code.google.com by [email protected] on 17 Apr 2009 at 4:27

onsiteURL (and possibly other *URL regexes) breaks on hyphens, underscores

What steps will reproduce the problem?
1. Start an instance of AntiSamy with any policy file.
2. Filter the following tags:

<img src="/whoa_dude" />
<img src="/smiley-frown" />


What is the expected output? What do you see instead?

Expect both tags back and no errors to show. Instead it fails on the value
with the hyphen. Get back the HTML:

<img src="/whoa_dude">

And an error:
The <b>img</b> tag contained an attribute that we couldn't process. The
<b>src</b> attribute had a value of <u>&#47;smiley&#45;frown</u>. This
value could not be accepted for security reasons. We have chosen to remove
the <b>img</b> tag and its contents in order to process this input. 


If we then change "onsiteURL" to the following regex w/ properly escaped
hyphen:

value="([\p{L}\p{N}\\/\.\?=\#&amp;;\-~]+|\#(\w)+)"


Then it trips on the underscore. Resulting HTML:

<img src="/smiley-frown">

Error:
The <b>img</b> tag contained an attribute that we couldn't process. The
<b>src</b> attribute had a value of <u>&#47;whoa&#95;dude</u>. This value
could not be accepted for security reasons. We have chosen to remove the
<b>img</b> tag and its contents in order to process this input. 


So then we add the underscore to the regex to finally get:

value="([\p{L}\p{N}\\/\.\?=\#&amp;;\-_~]+|\#(\w)+)"

And this yields proper results for our narrow little test. Whether other
stuff gets broken or not, I've not checked (pertinent, given how fixing the
hyphen broke the underscore).
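
The two regexes above can be checked outside AntiSamy with plain java.util.regex. In this sketch the XML entity &amp;amp; is written as a bare & because the pattern lives in a Java string rather than in the policy file; the class and constant names are made up:

```java
import java.util.regex.Pattern;

class OnsiteUrlHyphenDemo {
    // Regex from this report with the hyphen escaped but no underscore.
    static final Pattern HYPHEN_ONLY = Pattern.compile(
            "([\\p{L}\\p{N}\\\\/\\.\\?=\\#&;\\-~]+|\\#(\\w)+)");
    // Final regex from this report: escaped hyphen plus underscore.
    static final Pattern HYPHEN_AND_UNDERSCORE = Pattern.compile(
            "([\\p{L}\\p{N}\\\\/\\.\\?=\\#&;\\-_~]+|\\#(\\w)+)");

    // True only if the whole value matches the pattern.
    static boolean accepts(Pattern pattern, String value) {
        return pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String src : new String[] {"/whoa_dude", "/smiley-frown"}) {
            System.out.println(src
                    + " hyphen-only=" + accepts(HYPHEN_ONLY, src)
                    + " hyphen+underscore=" + accepts(HYPHEN_AND_UNDERSCORE, src));
        }
    }
}
```

The underscore fails the first pattern because it is neither a letter (\p{L}) nor a number (\p{N}).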





What version of the product are you using? On what operating system?

AntiSamy 1.1.1


Please provide any additional information below.

Note that the <img> tag comes out without closing itself for some reason.
This is a separate bug.

I've not tried this with offsiteURL, cssOnsiteUri, or cssOffsiteUri

Original issue reported on code.google.com by [email protected] on 23 May 2008 at 10:26

Scanner appends break lines

Steps to reproduce:

Call the "scan" method of antisamy using the following text:
"javascript <b>java</b> python" 
and any policy that accepts the "b" tag.

Example:
CleanResults cs = new AntiSamy().scan(text_, policy);
text_ = cs.getCleanHTML();

The "getCleanHTML" method then returns the following string:
"javascript <b>java</b>\n python\n".

It inserts two line breaks, one after the end of the "b" tag and another one
at the end of the string.

I expect the string to be identical to the input: since the "b" tag is an
allowed one, no line breaks should have been introduced.

This is caused in the "scan" method of AntiSamyDOMScanner. When creating
the OutputFormat, it sets the indenting to true and a value of 2. Why is
the indenting needed? Turning the indenting to false and removing the
"setIndent(2)" gives the right result. AntiSamy shouldn't attempt to indent
or pretty print the input.
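
To illustrate what serializing without indentation looks like, here is a sketch that uses the JDK's own javax.xml.transform serializer as a stand-in. It is not the OutputFormat class that AntiSamyDOMScanner uses, the wrapping p element is only there to give the parser a root, and the class and method names are hypothetical:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NoIndentSerializeDemo {
    // Parse the markup into a DOM and write it back with indenting disabled.
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<p>javascript <b>java</b> python</p>"));
    }
}
```

With indenting off, the serializer has no reason to add whitespace around the inline b element, which is the behaviour this report asks for.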

Original issue reported on code.google.com by [email protected] on 30 May 2008 at 1:45

"face" font attribute should accept hyphen in default policies.

In the example policy "antisamy-myspace-1.2.xml" the face attribute doesn't
accept the hyphen:

<attribute name="face">
  <regexp-list>
    <regexp value="[\w;, ]+"/>
  </regexp-list>
</attribute>

So something like:

<font face="sans-serif">hello</font>

Won't be accepted. The hyphen should probably be added, for example:

<attribute name="face">
  <regexp-list>
    <regexp value="[\w;, \-]+"/>
  </regexp-list>
</attribute>
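
A quick check of both expressions with java.util.regex (the class and constant names are made up for this sketch):

```java
import java.util.regex.Pattern;

class FontFacePatternDemo {
    // Regex currently in the example policy for the face attribute.
    static final Pattern CURRENT = Pattern.compile("[\\w;, ]+");
    // Suggested regex with an escaped hyphen added to the class.
    static final Pattern WITH_HYPHEN = Pattern.compile("[\\w;, \\-]+");

    // True only if the whole value matches the pattern.
    static boolean accepts(Pattern pattern, String face) {
        return pattern.matcher(face).matches();
    }

    public static void main(String[] args) {
        System.out.println("current:     " + accepts(CURRENT, "sans-serif"));
        System.out.println("with hyphen: " + accepts(WITH_HYPHEN, "sans-serif"));
    }
}
```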

Original issue reported on code.google.com by [email protected] on 4 Sep 2008 at 4:44

VM Out of Memory error

What steps will reproduce the problem?
1. Try to clean this: <SCRIPT =">" SRC=""></SCRIPT>

This causes an infinite recursion loop and runs the VM out of memory.

Original issue reported on code.google.com by [email protected] on 15 Dec 2008 at 9:32

Internationalization Issue

What steps will reproduce the problem?
1. Using the sample war (AntiSamyDemoWebApp.war)
2. Enter å to be validated
3. Select any of the policies (except NO POLICY) 

What is the expected output? What do you see instead?
Expect: å
Get: Ã¥
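
The two-character result above is the pattern one gets when UTF-8 bytes are decoded as ISO-8859-1. That this is what happens inside the demo web app is an assumption on my part; the sketch below only reproduces the symptom with the JDK, using hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode as UTF-8, then (incorrectly) decode the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e5"; // the letter a with ring above
        String garbled = misdecode(original);
        // Print code points so the result does not depend on console encoding.
        garbled.chars().forEach(c -> System.out.printf("U+%04X%n", c));
        System.out.println(repair(garbled).equals(original));
    }
}
```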

What version of the product are you using? On what operating system?
1.3 AntiSamyDemoWebApp.war

Please provide any additional information below.

If I make the following changes to the index.jsp page:
Add  contentType="text/html;charset=UTF-8" to the page declaration.
Resulting in <%@ page language="java" contentType="text/html;charset=UTF-8"%>

Add the code request.setCharacterEncoding("UTF-8"); at the beginning of the
first code block

Change the form to POST rather than GET.

Then I get the results I expect. However, if I then put in invalid HTML (<a
hre="dfhads">å</a>) and use the Chinese language, the Chinese characters do
not come through correctly.

I get: æ
�è®°aå��å�«ä¸�ä¸ªæ²¡æ³�å¤�ç��ç��å±�æ�§ã��
�å±�æ�§hreå·²è¢«è¿�æ»¤ï¼�ä½�æ 
�è®°ä¿�å�ä¸�å��ã��

I would expect: 
标记a包含一个没法处理的属性。属性hre已被过滤，但标记保存不变。

Original issue reported on code.google.com by [email protected] on 13 Jul 2009 at 6:37

Seemingly-equivalent attribute tags yield two different behaviors

What steps will reproduce the problem?
1. On the following HTML:

<b id="feh" fehOnly="feh" intOnly="123d" badAtt="poison">woot</b>


2. Use the following filter:

        <tag name="b" action="validate">
            <attribute name="id">
            </attribute>
            <attribute name="fehonly">
                <literal-list>
                    <literal value="feh"/>
                </literal-list>
            </attribute>
            <attribute name="intonly">
                <regexp-list>
                    <regexp name="integer"/>
                </regexp-list>
            </attribute>
        </tag>


3. And you will get:

***** DIRTY HTML *****
<b id="feh" fehOnly="feh" intOnly="123d" badAtt="poison">woot</b>
***** CLEAN HTML *****
<b fehonly="feh">woot</b>

***** ERRORS *****
The <b>badatt</b> attribute of the <b>b</b> tag has been removed for
security reasons. This removal should not affect the display of the HTML
submitted.
The <b>b</b> tag contained an attribute that we couldn't process. The
<b>id</b> attribute had a value of <u>feh</u>. This value could not be
accepted for security reasons. We have chosen to remove the <b>id</b>
attribute from the tag and leave everything else in place so that we could
process this input.
The <b>b</b> tag contained an attribute that we couldn't process. The
<b>intonly</b> attribute had a value of <u>123d</u>. This value could not
be accepted for security reasons. We have chosen to remove the
<b>intonly</b> attribute from the tag and leave everything else in place so
that we could process this input.



What is the expected output? What do you see instead?


When the following (a):

            <attribute name="id">
            </attribute>

Is changed to (b):

            <attribute name="id"></attribute>

Or even (c):

            <attribute name="id"/>

The output is instead:

***** DIRTY HTML *****
<b id="feh" fehOnly="feh" intOnly="123d" badAtt="poison">woot</b>
***** CLEAN HTML *****
<b fehonly="feh" id="feh">woot</b>

***** ERRORS *****
The <b>badatt</b> attribute of the <b>b</b> tag has been removed for
security reasons. This removal should not affect the display of the HTML
submitted.
The <b>b</b> tag contained an attribute that we couldn't process. The
<b>intonly</b> attribute had a value of <u>123d</u>. This value could not
be accepted for security reasons. We have chosen to remove the
<b>intonly</b> attribute from the tag and leave everything else in place so
that we could process this input.

Notice that the *id* attribute is preserved.


What version of the product are you using? On what operating system?
AntiSamy v.1.1.1, MacOS Tiger



Please provide any additional information below.

I suspect this was meant to be a feature and not a bug. But it seems like
(a), (b), and (c) should all be equivalent. Instead, a newline in the node
changes the filtering from "accept all" to "accept nothing". If nothing
else, seems like this should be documented. 
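
One possible explanation, offered as a guess rather than a reading of the AntiSamy source: an XML parser keeps the newline in form (a) as a whitespace-only text node, so the element has a child in (a) but none in (b) or (c). The sketch below shows that difference with the JDK's DOM parser; the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PolicyWhitespaceDemo {
    // Parse the snippet and count the child nodes of its root element.
    static int childCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse snippet", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("(a) " + childCount("<attribute name=\"id\">\n</attribute>"));
        System.out.println("(b) " + childCount("<attribute name=\"id\"></attribute>"));
        System.out.println("(c) " + childCount("<attribute name=\"id\"/>"));
    }
}
```

Code that branches on whether an attribute element has any child nodes, rather than any child elements, would treat (a) differently from (b) and (c).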

I'd understand if you chose to do nothing about this, since the fix is
really changing the validation so that unspecified values for attributes in
the policy files default to giving error, which is a total pain (if
philosophically correct, given that this is a whitelist, after all). 

This bug is really about how treating a tag in ways that one could easily
think of as equivalent instead yields two very different behaviors, and
that this isn't documented anywhere.

All that aside, great job on AntiSamy!

Original issue reported on code.google.com by [email protected] on 12 May 2008 at 7:51

Upload antisamy to maven repository

Please consider uploading AntiSamy to a Maven repository. That makes it
much easier to include it in a Maven-managed project.

A guide on how to upload artifacts to the central repository is here:
http://maven.apache.org/guides/mini/guide-central-repository-upload.html

Please let me know if I can help, e.g. by creating the POM file.

Original issue reported on code.google.com by [email protected] on 31 Jul 2008 at 9:35

Bug:ErrorMessageUtil

Hi,

There is a small bug in ErrorMessageUtil.
It cannot find the "error.attribute.invalid.filter" property.
Change the key in line 37 to error.attribute.invalid.filtered.

java.util.MissingResourceException: Can't find resource for bundle
java.util.PropertyResourceBundle, key error.attribute.invalid.filter
    at java.util.ResourceBundle.getObject(ResourceBundle.java:342)
    at java.util.ResourceBundle.getString(ResourceBundle.java:302)
    at org.owasp.validator.html.util.ErrorMessageUtil.localize(Unknown Source)
    at org.owasp.validator.html.util.ErrorMessageUtil.getMessage(Unknown Source)
    at org.owasp.validator.html.scan.AntiSamyDOMScanner.addError(Unknown Source)
    at org.owasp.validator.html.scan.AntiSamyDOMScanner.recursiveValidateTag(Unknown Source)
    at org.owasp.validator.html.scan.AntiSamyDOMScanner.scan(Unknown Source)
    at org.owasp.validator.html.AntiSamy.scan(Unknown Source)
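
The failure mode in the trace can be reproduced with the JDK alone. In this sketch the bundle is built in memory and its message text is a placeholder; only the two key names come from this report, and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class MissingKeyDemo {
    // In-memory bundle that defines only the "filtered" key (placeholder text).
    static ResourceBundle bundle() {
        try {
            return new PropertyResourceBundle(new StringReader(
                    "error.attribute.invalid.filtered=placeholder message\n"));
        } catch (IOException e) {
            throw new IllegalStateException("could not load bundle", e);
        }
    }

    // True if the key resolves, false if lookup throws MissingResourceException.
    static boolean hasKey(String key) {
        try {
            bundle().getString(key);
            return true;
        } catch (MissingResourceException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasKey("error.attribute.invalid.filtered"));
        System.out.println(hasKey("error.attribute.invalid.filter"));
    }
}
```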


Thank you!

Original issue reported on code.google.com by [email protected] on 21 Jul 2008 at 12:22
