Dryopteris

Dryopteris erythrosora is the Japanese Shield Fern. It also can be used to sanitize HTML to help prevent XSS attacks.

Usage

Let's say you run a web site, and you allow people to post HTML snippets.

Let's also say some script-kiddie from Norland posts this to your site, in an effort to swipe some credit cards:

<SCRIPT SRC=http://ha.ckers.org/xss.js></SCRIPT>

Oooh, that could be bad. Here's how to fix it:

safe_html_snippet = Dryopteris.sanitize(dangerous_html_snippet)

Yeah, it's that easy.

In this example, safe_html_snippet will have all of its broken markup fixed by libxml2, and it will also be completely sanitized of harmful tags and attributes. That's twice as clean!

Sanitization Usage

You're still here? OK, let me tell you a little something about the two different sanitizing methods that Dryopteris offers.

Fragments

The first method is for HTML fragments, which are small snippets of markup such as those used in forum posts, emails and homework assignments.

Usage is the same as above:

safe_html_snippet = Dryopteris.sanitize(dangerous_html_snippet)

Generally speaking, unless you expect to have <html> and <body> tags in your HTML, this is the sanitizing method to use.

The only real limitation on this method is that the snippet must be a string object. (Support for IO objects was sacrificed at the altar of fixer-uppery-ness. If you need to sanitize data that's coming from an IO object, either socket or file, check out the next section on Documents.)

Documents

Sometimes you need to sanitize an entire HTML document. (Well, maybe not you, but other people, certainly.)

safe_html_document = Dryopteris.sanitize_document(dangerous_html_document)

The returned string will contain exactly one (1) well-formed HTML document, with all broken HTML fixed and all harmful tags and attributes removed.

Coolness: dangerous_html_document can be a string OR an IO object (a file, or a socket, or ...), which makes it particularly easy to sanitize large numbers of docs.
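
Here's a minimal sketch of that batch use case, assuming you have a list of file paths on disk. The helper name sanitize_files and its sanitizer parameter are made up for this example and are not part of Dryopteris; the only library call is Dryopteris.sanitize_document, used as described above.

```ruby
# In a real application you would `require "dryopteris"` before calling this.

# Hypothetical helper: sanitizes each file in +paths+ and returns a Hash of
# path => sanitized HTML string.
#
# +sanitizer+ is any callable that accepts a String or an IO object. The
# default is only looked up when no callable is passed in, so the helper can
# be exercised with a stand-in as well.
def sanitize_files(paths, sanitizer = Dryopteris.method(:sanitize_document))
  paths.each_with_object({}) do |path, results|
    # Hand the open File (an IO object) straight to the sanitizer.
    File.open(path) { |io| results[path] = sanitizer.call(io) }
  end
end
```

With the gem loaded, sanitize_files(Dir.glob("uploads/*.html")) would return the cleaned documents keyed by path (the uploads directory is just a placeholder).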

Whitewashing Usage

Whitewashing Fragments

Other times, you may want to remove all styling, attributes and invalid HTML tags. I like to call this "whitewashing", since it's putting a new layer of paint on top of the HTML input to make it look nice.

One use case for this feature is to clean up HTML that was cut-and-pasted from Microsoft(tm) Word into a WYSIWYG editor/textarea. Microsoft's editor is famous for injecting all kinds of cruft into its HTML output. Who needs that? Certainly not me.

whitewashed_html = Dryopteris.whitewash(ugly_microsoft_html_snippet)

Please note that whitewashing implicitly also sanitizes your HTML, as it uses the same HTML tag whitelist as sanitize(). Its implementation is:

  1. unless the tag is on the whitelist, remove it from the document
  2. if the tag has an XML namespace on it, remove it from the document
  3. remove all attributes from the node
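
As a rough illustration of those three steps, here's a toy model in plain Ruby. It is not Dryopteris's actual code, which works on Nokogiri nodes: the Node struct, the whitewash_node helper and the short ALLOWED_TAGS list are all stand-ins invented for this sketch, and it assumes that removing a tag also removes everything inside it.

```ruby
# Hypothetical, tiny whitelist for illustration only.
ALLOWED_TAGS = %w[p b i em strong a ul ol li].freeze

# Toy node: children are either Node objects or plain Strings (text).
Node = Struct.new(:name, :namespace, :attributes, :children)

# Returns a whitewashed copy of +node+, or nil if the node is removed.
def whitewash_node(node)
  return node if node.is_a?(String)                   # text is kept as-is
  return nil unless ALLOWED_TAGS.include?(node.name)  # step 1: not on the whitelist
  return nil if node.namespace                        # step 2: has an XML namespace
  kept = node.children.map { |child| whitewash_node(child) }.compact
  Node.new(node.name, nil, {}, kept)                  # step 3: drop all attributes
end

tree = Node.new("p", nil, { "style" => "color: red" }, [
  "Hello ",
  Node.new("b", nil, { "class" => "MsoNormal" }, ["world"]),
  Node.new("p", "o", {}, ["Office cruft"]),  # namespaced, like <o:p>
  Node.new("script", nil, {}, ["alert(1)"])  # not on the whitelist
])

clean = whitewash_node(tree)
# clean keeps the outer "p" and the "b", both with empty attribute hashes;
# the namespaced and non-whitelisted children are gone.
```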

Whitewashing Documents

Also note the existence of whitewash_document, which is analogous to sanitize_document.

Standing on the Shoulders of Giants

Dryopteris uses Nokogiri and libxml2, so it's fast.

Dryopteris also takes its tag and tag attribute whitelists and its CSS sanitizer directly from HTML5.

Authors

Quotes About Dryopteris

"dryopteris shields you from xss attacks using nokogiri and NY attitude"

"I just wanted to say thank you for your dryopteris plugin. It is by far the best sanitization I've found."

dryopteris's People

Contributors

flavorjones, brynary, tenderlove, jbarnette, queso, pauldix

dryopteris's Issues

Won't whitewash elements like "</![endif]-->" and "</!--[if>"

I have a fragment of text that was spit out by Word.

The last line contains these ugly elements:

Unfortunately, dryopteris does not remove the elements that look like this:

The pattern seems to be an open bracket and a forward slash.

"</!--[if>" and "</![endif]-->"

Thanks,

Lake
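
One possible workaround, sketched here as an assumption rather than a fix in Dryopteris itself, is to strip those leftovers from the string yourself, either before or after calling Dryopteris.whitewash. The constant and helper names below are hypothetical, and the pattern only targets the two shapes quoted in the report: an open bracket, a forward slash, an exclamation mark, an optional "--", and then a square bracket.

```ruby
# Matches stray closers such as "</![endif]-->" and "</!--[if>".
# Ordinary closing tags like "</p>" are untouched because they lack the "!".
CONDITIONAL_RESIDUE = %r{</!(?:--)?\[[^>]*>}

# Hypothetical helper: returns a copy of +html+ without the matched leftovers.
def strip_conditional_residue(html)
  html.gsub(CONDITIONAL_RESIDUE, "")
end
```

Other Word leftovers would need their own patterns, so treat this as a starting point.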
