Regular Expression Construction

Complex regular expressions are hard to construct and even harder to
read. The Re library allows users to construct complex regular
expressions from simpler expressions. For example, consider the
following regular expression that will parse dates:

   /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/

Using the Re library, that regular expression can be built
incrementally from smaller, easier to understand expressions.
Perhaps something like this:

  require 're'

  include Re

  delim                = re.any("- /.")
  century_prefix       = re("19") | re("20")
  under_ten            = re("0") + re.any("1-9")
  ten_to_twelve        = re("1") + re.any("012")
  ten_and_under_thirty = re.any("12") + re.any("0-9")
  thirties             = re("3") + re.any("01")

  year = (century_prefix + re.digit.repeat(2)).capture(:year)
  month = (under_ten | ten_to_twelve).capture(:month)
  day = (under_ten | ten_and_under_thirty | thirties).capture(:day)

  date = (year + delim + month + delim + day).all

Although it is more code, the individual pieces are smaller and
easier to independently verify. As an additional bonus, the capture
groups can be retrieved by name:

  result = date.match("2009-01-23")
  result[:year]      # => "2009"
  result[:month]     # => "01"
  result[:day]       # => "23"
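
For comparison, the same decomposition can be sketched with Ruby's
built-in Regexp interpolation and named groups. This is plain Ruby
rather than the Re API; the variable names only mirror the ones used
above, and DATE_REGEXP is a name chosen for this illustration.

```ruby
# Plain-Ruby sketch: small Regexp pieces combined by interpolation.
# Nothing here comes from the Re library.
delim   = /[\- \/.]/
century = /19|20/
year    = /(?<year>#{century}[0-9]{2})/
month   = /(?<month>0[1-9]|1[012])/
day     = /(?<day>0[1-9]|[12][0-9]|3[01])/

# Interpolating a Regexp wraps it in a non-capturing group, so the
# alternations inside each piece stay contained.
DATE_REGEXP = /\A#{year}#{delim}#{month}#{delim}#{day}\z/

m = DATE_REGEXP.match("2009-01-23")
puts m[:year], m[:month], m[:day]
```

The pieces can be checked one at a time in the same way as the Re
version, although the interpolated form is harder to read than the
method chain.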

Version

This document describes Re version 0.0.6.

Usage

  include Re

  number = re.any("0-9").all
  if number =~ string
    puts "Matches!"
  else
    puts "No Match"
  end

Examples

Simple Examples

  re("a")                -- matches "a"
  re("a") + re("b")      -- matches "ab"
  re("a") | re("b")      -- matches "a" or "b"
  re("a").many           -- matches "", "a", "aaaaaa"
  re("a").one_or_more    -- matches "a", "aaaaaa", but not ""
  re("a").optional       -- matches "" or "a"
  re("a").all            -- matches "a", but not "xab"

See Re::Rexp for a complete list of expressions.
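
As a rough orientation, each expression above can be paired with a
native Regexp literal that matches the same strings. The pairing below
is inferred from the descriptions in the table, not taken from the
output of the Re library, and EQUIVALENTS is a hypothetical name.

```ruby
# Native Regexp counterparts inferred from the descriptions above.
# They are an assumption, not what Re actually generates.
EQUIVALENTS = {
  're("a")'             => /a/,
  're("a") + re("b")'   => /ab/,
  're("a") | re("b")'   => /a|b/,
  're("a").many'        => /a*/,
  're("a").one_or_more' => /a+/,
  're("a").optional'    => /a?/,
  're("a").all'         => /\Aa\z/
}.freeze

EQUIVALENTS.each do |re_expr, native|
  puts format("%-22s ~ %s", re_expr, native.inspect)
end
```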

Using re without an argument allows access to a number of common
regular expression constants. For example:

  re.space / re.spaces  -- matches " ", "\n" or "\t"
  re.digit / re.digits  -- matches a digit / sequence of digits

re without arguments can also be used to construct character
classes:

  re.any                -- Matches any character
  re.any("abc")         -- Matches "a", "b", or "c"
  re.any("0-9")         -- Matches the digits 0 through 9
  re.any("A-Z", "a-z", "0-9", "_")
                        -- Matches alphanumeric or an underscore

See Re::ConstructionMethods for a complete list of common constants
and character class functions.
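
In native terms, the arguments to re.any read like the contents of a
bracket expression, with several arguments merged into one class. The
sketch below assumes that reading; the constant names are hypothetical
and the literals are hand-written rather than generated by Re.

```ruby
# Assumed native counterparts of the re.any calls above.
ANY_OF_ABC = /[abc]/          # re.any("abc")
ANY_DIGIT  = /[0-9]/          # re.any("0-9")
WORD_CHAR  = /[A-Za-z0-9_]/   # re.any("A-Z", "a-z", "0-9", "_")

# Keep only the characters that belong to the combined class.
puts "snake_case 42!".scan(WORD_CHAR).join
```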

See Re.re, Re::Rexp, and Re::ConstructionMethods for details.

regexml Example

Regexml is an XML based language to express regular expressions.
Here is their example for matching URLs.

    <regexml xmlns="http://schemas.regexml.org/expressions">
        <expression id="url">
            <start/>
            <match equals="[A-Za-z]" max="*" capture="true"/> <!-- scheme (e.g., http) -->
            <match equals=":"/>
            <match equals="//" min="0"/> <!-- mailto: and news: URLs do not require forward slashes -->
            <match equals="[0-9.\-A-Za-z@]" max="*" capture="true"/> <!-- domain (e.g., www.regexml.org) -->
            <group min="0">
                <match equals=":"/>
                <match equals="\d" max="5" capture="true"/> <!-- port number -->
            </group>
            <group min="0" capture="true"> <!-- resource (e.g., /sample/resource) -->
                <match equals="/"/>
                <match except="[?#]" max="*"/>
            </group>
            <group min="0">
                <match equals="?"/>
                <match except="#" min="0" max="*" capture="true"/> <!-- query string -->
            </group>
            <group min="0">
                <match equals="#"/>
                <match equals="." min="0" max="*" capture="true"/> <!-- anchor tag -->
            </group>
            <end/>
        </expression>
    </regexml>

Here is the Re expression to match URLs:

    URL_PATTERN =
      re.any("A-Z", "a-z").one_or_more.capture(:scheme) +
      re(":") +
      re("//").optional +
      re.any("0-9", "A-Z", "a-z", "-@.").one_or_more.capture(:host) +
      (re(":") + re.digit.repeat(1,5).capture(:port)).optional +
      (re("/") + re.none("?#").many).capture(:path).optional +
      (re("?") + re.none("#").many.capture(:query)).optional +
      (re("#") + re.any.many.capture(:anchor)).optional

    URL_RE = URL_PATTERN.all
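
To make the structure of that pattern concrete, here is a
hand-translated native Regexp with the same capture names, applied to
a sample URL. It is an approximation written for this example (the
name NATIVE_URL_RE and the sample URL are not from the Re
distribution), and it assumes re.any behaves like "." for the anchor
part.

```ruby
# Hand-written native counterpart of URL_PATTERN.all; an approximation,
# not the Regexp that Re actually generates.
NATIVE_URL_RE = %r{
  \A
  (?<scheme>[A-Za-z]+)
  :
  (?://)?
  (?<host>[0-9A-Za-z\-@.]+)
  (?: : (?<port>\d{1,5}) )?
  (?<path> / [^?\#]* )?
  (?: \? (?<query>[^\#]*) )?
  (?: \# (?<anchor>.*) )?
  \z
}x

m = NATIVE_URL_RE.match("http://www.example.org:8080/sample/resource?x=1#top")
%i[scheme host port path query anchor].each do |name|
  puts "#{name}: #{m[name].inspect}"
end
```

Optional parts that are absent from the input, such as the port in a
mailto: URL, come back as nil.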

Performance

We should say a word or two about performance.

First of all, building regular expressions using Re is slow. If you
use Re to build regular expressions, you are encouraged to build the
regular expression once and reuse it as needed. This means you
won’t do a lot of inline expressions using Re, but rather assign the
generated Re regular expression to a constant. For example:

  PHONE_RE = re.digit.repeat(3).capture(:area) +
               re("-") +
               re.digit.repeat(3).capture(:exchange) +
               re("-") +
               re.digit.repeat(4).capture(:subscriber)

Alternatively, you can arrange for the regular expression to be
constructed only when actually needed. Something like:

  def phone_re
    @phone_re ||= re.digit.repeat(3).capture(:area) +
                    re("-") +
                    re.digit.repeat(3).capture(:exchange) +
                    re("-") +
                    re.digit.repeat(4).capture(:subscriber)
  end

That method constructs the phone number regular expression once and
returns a cached value thereafter. Just make sure you put the
method in an object that is instantiated once (e.g. a class method).
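
A minimal sketch of the class-method variant follows. It uses a native
Regexp in place of the Re expression so that it runs without the
library; PhoneNumbers, build_phone_re and build_count are hypothetical
names, and the counter exists only to show that construction happens
once.

```ruby
# Class-level memoization: the expression is built on first use and
# cached in a class-level instance variable afterwards.
class PhoneNumbers
  class << self
    attr_reader :build_count

    def phone_re
      @phone_re ||= build_phone_re
    end

    private

    # Stand-in for the (slow) Re construction step.
    def build_phone_re
      @build_count = (@build_count || 0) + 1
      /(?<area>\d{3})-(?<exchange>\d{3})-(?<subscriber>\d{4})/
    end
  end
end

PhoneNumbers.phone_re
PhoneNumbers.phone_re
puts PhoneNumbers.build_count   # number of times the builder ran
```

Because the cache lives on the class itself, every caller shares the
same expression regardless of how many instances exist.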

When used in matching, Re regular expressions perform fairly well
compared to native regular expressions. The overhead is a small
number of extra method calls and the creation of a Re::Result object
to return the match results.

If regular expression performance is at a premium in your application,
then you can still use Re to construct the regular expression and
extract the raw Ruby Regexp object to be used for the actual
matching. You lose the ability to use named capture groups easily,
but you get raw Ruby regular expression matching performance.

For example, if you wanted to use the raw regular expression from
PHONE_RE defined above, you could extract the regular expression
like this:

  PHONE_REGEXP = PHONE_RE.regexp

And then use it directly:

  if PHONE_REGEXP =~ string
    # blah blah blah
  end

The above match runs at full Ruby matching speed. If you still
want named capture groups, you can do something like this:

  match_data = PHONE_REGEXP.match(string)
  area_code = match_data[PHONE_RE.name_map[:area]]
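
The lookup above suggests that name_map translates a capture name into
the index of the corresponding group. The sketch below imitates that
idea with a native Regexp and a hand-written hash; ASSUMED_NAME_MAP,
RAW_PHONE_REGEXP and phone_parts are hypothetical stand-ins, and the
exact return value of name_map is an assumption.

```ruby
# Numbered groups plus a name-to-index hash, imitating the assumed
# behavior of PHONE_RE.name_map.
RAW_PHONE_REGEXP = /(\d{3})-(\d{3})-(\d{4})/
ASSUMED_NAME_MAP = { area: 1, exchange: 2, subscriber: 3 }.freeze

# Returns a hash of named parts, or nil when the string does not match.
def phone_parts(string)
  match_data = RAW_PHONE_REGEXP.match(string)
  return nil unless match_data

  ASSUMED_NAME_MAP.transform_values { |index| match_data[index] }
end

p phone_parts("555-867-5309")
```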

License and Copyright

Copyright 2009 by Jim Weirich.
All rights reserved.

Re is provided under the MIT open source license (see MIT-LICENSE).

Links:

Documentation :: http://re-lib.rubyforge.org
Source :: http://github.com/jimweirich/re
GemCutter :: http://gemcutter.org/gems/re
Download :: http://rubyforge.org/frs/?group_id=9329
Bug Tracker :: http://www.pivotaltracker.com/projects/47758
Continuous Integration :: http://travis-ci.org/#!/jimweirich/re
Author :: Jim Weirich
