proposal-regexp-features's People

Contributors

mathiasbynens, rbuckton

proposal-regexp-features's Issues

`\R` should not match in between CR and LF

I have a question regarding the equivalent regex.

From what I understand, the goal of the UTS#18 Line Boundaries section is to say that CRLF should be treated as if it were a single character. However, I believe that neither the current equivalent regex in this proposal (`(?>\r\n?|[\x0A-\x0C\x85\u{2028}\u{2029}])`) nor the one in UTS#18 (`(?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}])`) fulfills that goal.

The problem I see is that both still match in between CR and LF (see tc39/proposal-regexp-v-flag#42). Consider `/^\r\R$/u.test("\r\n")`. According to the current proposal and UTS#18, this returns `true`, because `\R` can match the LF on its own after the CR has already been consumed. I think this is inconsistent with the behavior of the line boundary assertions (`^` and `$`) in UTS#18, where the position in between CR and LF is explicitly accounted for.
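The behavior described above can be reproduced in today's ECMAScript with a stand-in for `\R`. This is only a sketch: ECMAScript has no atomic groups, so the proposal's `(?>...)` is replaced here by a plain non-capturing group, and the names `CURRENT_R` and `pattern` are made up for illustration. For this particular input the missing atomicity should not change the result.

```javascript
// Stand-in for \R: the proposal's current equivalent regex, with the atomic
// group (?>...) swapped for a non-capturing group (an assumption of this sketch).
const CURRENT_R = String.raw`(?:\r\n?|[\x0A-\x0C\x85\u{2028}\u{2029}])`;

// Emulates /^\r\R$/u by splicing the stand-in into the pattern source.
const pattern = new RegExp(String.raw`^\r${CURRENT_R}$`, "u");

// The literal \r consumes the CR, then the stand-in matches the LF by itself,
// i.e. it matches at the position in between CR and LF.
const matchesBetweenCrAndLf = pattern.test("\r\n");
console.log(matchesBetweenCrAndLf); // true
```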

Shouldn't the equivalent regexes for this proposal and UTS#18 be `(?>\r\n?|(?<!\r)\n|[\x0B-\x0C\x85\u{2028}\u{2029}])` and `(?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}](?<!\u{D A}))` respectively?
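A rough way to check the suggested regex for this proposal is to emulate it with a non-capturing group in place of the atomic group (ECMAScript does not support `(?>...)`, so that substitution is an assumption of this sketch, as are the variable names). It relies on lookbehind support, which is available in ES2018 and later engines.

```javascript
// Stand-in for the suggested \R equivalent: a lone LF only counts as a line
// break when it is not preceded by CR.
const PROPOSED_R = String.raw`(?:\r\n?|(?<!\r)\n|[\x0B-\x0C\x85\u{2028}\u{2029}])`;

// Emulates /^\r\R$/u: the LF after a CR is no longer matched on its own.
const betweenCrLf = new RegExp(String.raw`^\r${PROPOSED_R}$`, "u").test("\r\n");

// Emulates /^\R$/u: CRLF as a unit and a lone LF are both still matched.
const wholeCrLf = new RegExp(`^${PROPOSED_R}$`, "u").test("\r\n");
const loneLf = new RegExp(`^${PROPOSED_R}$`, "u").test("\n");

console.log(betweenCrLf, wholeCrLf, loneLf); // false true true
```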

Originally posted by @RunDevelopment in #1 (comment)

Feature request: `\h` to match all "horizontal whitespace" characters

In several RegExp engines, `\h` is a shorthand for "horizontal whitespace". In the majority of cases you can match these characters with only the space and tab characters (`[\t ]`), but that misses edge cases involving less commonly used, non-newline whitespace characters like en space (U+2002), em space (U+2003), and thin space (U+2009).

Essentially, `\h` gives you a subset of `\s` that omits the newline characters. Having access to this escape in ECMAScript would help me write regular expressions with greater confidence that they will match strings even if they contain rarely used whitespace characters:

[Screenshot: Screen Shot 2022-04-02 at 4 10 18 PM]
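Until something like `\h` exists, a workaround along these lines is possible with Unicode property escapes. This is an approximation, assuming "horizontal whitespace" means tab plus the Unicode `Space_Separator` (`Zs`) category; engines that support `\h` may define a slightly different set, and the `collapseHorizontalWhitespace` helper is hypothetical.

```javascript
// Approximation of \h: tab plus every character in General_Category=Zs.
// Line terminators (\n, \r, U+2028, U+2029) are not in Zs, so they are excluded.
const HORIZONTAL_WS = /[\t\p{Zs}]/u;

// Hypothetical helper: collapse runs of horizontal whitespace to one space
// while leaving newlines untouched.
function collapseHorizontalWhitespace(text) {
  return text.replace(/[\t\p{Zs}]+/gu, " ");
}

const sample = "a\u2002\u2003b\t\u2009c\nd"; // en space, em space, tab, thin space
const collapsed = collapseHorizontalWhitespace(sample);
console.log(JSON.stringify(collapsed)); // "a b c\nd"
```

By comparison, `[\t ]` alone would leave the en, em, and thin spaces in `sample` untouched.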

`\h` language support

Not exhaustive, but this table is based on the engines available in regex101:

| Language/Engine | Supported? | Behavior | Example |
| --- | --- | --- | --- |
| PCRE | ✅ | Success | Link |
| PCRE 2 | ✅ | Success | Link |
| Java | ✅ | Success | Link |
| Golang | 🚫 | Invalid token error | Link |
| .NET | 🚫 | Invalid token error | Link |
| Python | 🚫 | Matches literal `h` character | Link |
| ECMAScript | 🚫 | Matches literal `h` character | Link |
