Giter Club home page Giter Club logo

tagsoup-megaparsec's Introduction

tagsoup-megaparsec

Hackage Build Status

A Tag token parser and Tag specific parsing combinators, inspired by parsec-tagsoup and tagsoup-parsec. This library helps you build a megaparsec parser using TagSoup's Tag as tokens.

Usage

DOM parser

We can build a DOM parser using TagSoup's Tag as a token type in Megaparsec. Let's start the example with importing all the required modules.

import Data.Text ( Text )
import qualified Data.Text as T
import Data.HashMap.Strict ( HashMap )
import qualified Data.HashMap.Strict as HMS
import Text.HTML.TagSoup
import Text.Megaparsec
import Text.Megaparsec.ShowToken
import Text.Megaparsec.TagSoup

Here's the data types used to represent our DOM. Node is either ElementNode or TextNode. TextNode data constructor takes a Text and ElementNode data constructor takes an Element whose fields consist of elementName, elementAttrs and elementChildren.

type AttrName   = Text
type AttrValue  = Text

data Element = Element
  { elementName :: !Text
  , elementAttrs :: !(HashMap AttrName AttrValue)
  , elementChildren :: [Node]
  } deriving (Eq, Show)

data Node =
    ElementNode Element
  | TextNode Text
  deriving (Eq, Show)

Our Parser is defined as a type synonym for TagParser Text. TagParser takes a type argument representing the string type and we chose Text here. We can pass any of StringLike types such as String and ByteString.

type Parser = TagParser Text

There is nothing new in defining a parser except that our token is Tag Text instead of Char. We can use any Megaparsec combinators we want as usual. Our node parser is either element or text so we used the choice combinator (<|>).

node :: Parser Node
node = ElementNode <$> element
   <|> TextNode <$> text

tagsoup-megaparsec library provides some Tag specific combinators.

  • tagText: parse a chunk of text.
  • anyTagOpen/anyTagClose: parse any opening and closing tag.

text and element parsers are built using these combinators.

NOTE: We don't need to worry about the text blocks containing only whitespace characters because all the parsers provided by tagsoup-megaparsec are lexeme parsers.

text :: Parser Text
text = fromTagText <$> tagText

element :: Parser Element
element = do
  t@(TagOpen tagName attrs) <- anyTagOpen
  children <- many node
  closeTag@(TagClose tagName') <- anyTagClose
  if tagName == tagName'
     then return $ Element tagName (HMS.fromList attrs) children
     else fail $ "unexpected close tag" ++ showToken closeTag

Now it's time to define our driver. parseDOM takes a Text and returns either ParseError or [Node]. We used many combinator to represent that there are zero or more occurences of node. We used TagSoup's parseTags to create tokens and passed it to Megaparsec's parse function.

parseDOM :: Text -> Either ParseError [Node]
parseDOM html = parse (many node) "" tags
  where tags = parseTags html

tagsoup-megaparsec's People

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.