Giter Club home page Giter Club logo

regexgen's Introduction

regexgen

Generates regular expressions that match a set of strings.

Installation

regexgen can be installed using npm:

npm install regexgen

Example

The simplest use is to simply pass an array of strings to regexgen:

const regexgen = require('regexgen');

regexgen(['foobar', 'foobaz', 'foozap', 'fooza']); // => /foo(?:zap?|ba[rz])/

You can also use the Trie class directly:

const {Trie} = require('regexgen');

let t = new Trie;
t.add('foobar');
t.add('foobaz');

t.toRegExp(); // => /fooba[rz]/

CLI

regexgen also has a simple CLI to generate regexes using inputs from the command line.

$ regexgen
Usage: regexgen [-gimuy] string1 string2 string3...

The optional first parameter is the flags to add to the regex (e.g. -i for a case insensitive match).

ES2015 and Unicode

By default regexgen will output a standard JavaScript regular expression, with Unicode codepoints converted into UCS-2 surrogate pairs.

If desired, you can request an ES2015-compatible Unicode regular expression by supplying the -u flag, which results in those codepoints being retained.

$ regexgen ๐Ÿ‘ฉ ๐Ÿ‘ฉโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿปโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿผโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿฝโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿพโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿฟโ€๐Ÿ’ป
/\uD83D\uDC69(?:(?:\uD83C[\uDFFB-\uDFFF])?\u200D\uD83D\uDCBB)?/

$ regexgen -u ๐Ÿ‘ฉ ๐Ÿ‘ฉโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿปโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿผโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿฝโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿพโ€๐Ÿ’ป ๐Ÿ‘ฉ๐Ÿฟโ€๐Ÿ’ป
/\u{1F469}(?:[\u{1F3FB}-\u{1F3FF}]?\u200D\u{1F4BB})?/u

Such regular expressions are compatible with current versions of Node, as well as the latest browsers, and may be more transferrable to other languages.

How does it work?

  1. Generate a Trie containing all of the input strings. This is a tree structure where each edge represents a single character. This removes redundancies at the start of the strings, but common branches further down are not merged.

  2. A trie can be seen as a tree-shaped deterministic finite automaton (DFA), so DFA algorithms can be applied. In this case, we apply Hopcroft's DFA minimization algorithm to merge the nondistinguishable states.

  3. Convert the resulting minimized DFA to a regular expression. This is done using Brzozowski's algebraic method, which is quite elegant. It expresses the DFA as a system of equations which can be solved for a resulting regex. Along the way, some additional optimizations are made, such as hoisting common substrings out of an alternation, and using character class ranges. This produces an an Abstract Syntax Tree (AST) for the regex, which is then converted to a string and compiled to a JavaScript RegExp object.

License

MIT

regexgen's People

Contributors

devongovett avatar gilmoreorless avatar mathiasbynens avatar ticky avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.