shaman.dom's Introduction

Shaman.Dom

An optimized version of HtmlAgilityPack with many bug fixes and improvements.

Features

  • Minimizes heap allocations and avoids duplicate strings in the heap
  • HtmlAttribute, HtmlAttributeCollection, HtmlNodeCollection are now structs
  • Proper encoding/decoding of attributes
  • Fixed allocation on each call of HtmlNode.get_TagName()
  • Fixed imprecise parsing of <form> and <table>
  • Fixed imprecise parsing of malformed entities
  • Fixed imprecise parsing of extra closing tags
  • HtmlNode.HasClass(), HtmlNode.ClassList
  • Faster Entitize(), DeEntitize()
  • Faster Descendants()/DescendantsAndSelf()
  • Support for Unicode surrogates
  • Removed legacy code
  • Support for lazily switching document encoding when <meta charset> is found
  • Support for .NET Standard
  • HtmlDocument.Tag for storing arbitrary data
  • HtmlDocument.PageUrl, HtmlDocument.BaseUrl

shaman.dom's People

Contributors

antiufo

shaman.dom's Issues

Closing span tag written to wrong location

Using version 1.0.1.16 of Shaman.Dom, serializing the attached HTML file "test.html" produces incorrect output. To reproduce, create an HtmlDocument instance from this file, then write the HtmlDocument to a new file using the WriteTo() method. A </span> closing tag is added on line 116, which places the two <div> tags on lines 114 and 115 before the </span> when they should come after it. View the output file "test.out.html" to see this.

TestHtml.zip

Attribute values are automatically un-escaped

If an attribute in an HTML document contains escaped characters, those characters are un-escaped when the HtmlDocument class builds the DOM. As a result, there is no guarantee that the DOM reproduces the original HTML when it is written out using WriteTo().

For example, a tag such as the following becomes invalid HTML once it is parsed into an HtmlNode and then written back out:

<meta name="ajs-editor.loader.resources" content="&lt;link type=&quot;text/css&quot; rel=&quot;stylesheet&quot; href=&quot;/s/d41d8cd98f00b204e9800998ecf8427e/en_GB-1988229788/4733/f235dd088df5682b0560ab6fc66ed22c9124c0be.3/7b3a13c140c13e9ca5f4681905edf481/_/download/contextbatch/css/editor,macro-browser,fullpage-editor,-atl.general/batch.css&quot; media=&quot;all&quot;&gt;
&lt;link type=&quot;text/css&quot; rel=&quot;stylesheet&quot; href=&quot;/s/d41d8cd98f00b204e9800998ecf8427e/en_GB-1988229788/4733/f235dd088df5682b0560ab6fc66ed22c9124c0be.3/7b3a13c140c13e9ca5f4681905edf481/_/download/contextbatch/css/editor,macro-browser,fullpage-editor,-atl.general/batch.css?media=%28max-width%3A+1450px%29&quot; media=&quot;(max-width: 1450px)&quot;&gt;
&lt;!--[if lt IE 9]&gt;
&lt;link type=&quot;text/css&quot; rel=&quot;stylesheet&quot; href=&quot;/s/d41d8cd98f00b204e9800998ecf8427e/en_GB-1988229788/4733/f235dd088df5682b0560ab6fc66ed22c9124c0be.3/7b3a13c140c13e9ca5f4681905edf481/_/download/contextbatch/css/editor,macro-browser,fullpage-editor,-atl.general/batch.css?conditionalComment=lt+IE+9&quot; media=&quot;all&quot;&gt;
&lt;![endif]--&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;/s/d41d8cd98f00b204e9800998ecf8427e/en_GB-1988229788/4733/f235dd088df5682b0560ab6fc66ed22c9124c0be.3/7b3a13c140c13e9ca5f4681905edf481/_/download/contextbatch/js/editor,macro-browser,fullpage-editor,-atl.general/batch.js&quot; &gt;&lt;/script&gt;
">
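The failure mode can be illustrated independently of Shaman.Dom. The following is a minimal, language-neutral sketch in Python that uses only the standard `html` module; it does not call or reproduce the library's code. The shortened attribute value and the helper names `write_attr_naive` and `write_attr_escaped` are hypothetical, chosen only for illustration. It shows why a value that was un-escaped on parse has to be re-escaped on write.

```python
import html

# Hypothetical, shortened attribute value as it appears in the source HTML.
raw_attr = "&lt;link type=&quot;text/css&quot; rel=&quot;stylesheet&quot;&gt;"

# What a parser that un-escapes attribute values would keep in the DOM.
dom_value = html.unescape(raw_attr)


def write_attr_naive(name: str, value: str) -> str:
    """Write the stored value verbatim (assumed buggy behaviour)."""
    return f'{name}="{value}"'


def write_attr_escaped(name: str, value: str) -> str:
    """Re-escape &, <, > and quotes before writing the attribute."""
    return f'{name}="{html.escape(value, quote=True)}"'


naive = write_attr_naive("content", dom_value)
safe = write_attr_escaped("content", dom_value)

print(naive)  # inner quotes terminate the attribute early
print(safe)   # attribute value stays inside one pair of quotes
```

In this sketch the naive writer emits unescaped double quotes inside a double-quoted attribute, so the first inner quote ends the value. Re-escaping on write keeps the markup valid, although it may still not be byte-identical to the input in general (for example, a numeric reference such as `&#34;` would come back as `&quot;`).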
