
Comments (4)

tonton-pixel commented on May 25, 2024

I wrote a similar module myself and, after extensive testing, I think I have found why the reported match is incorrect.

IMHO, there is a slight flaw in the code used to generate the emoji regular expression:

module.exports = () => {
	// https://mathiasbynens.be/notes/es-unicode-property-escapes#emoji
	return /<% emojiSequence %>|\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
};

From left to right:

  • U+261D U+FE0F is not matched by any emoji sequence parsed from emoji-sequences.txt or emoji-zwj-sequences.txt (the only sequences involving U+261D are the five skin tone variations U+261D U+1F3FB to U+261D U+1F3FF).
  • U+261D U+FE0F is not matched by \p{Emoji_Modifier_Base}\p{Emoji_Modifier} either, for the same reason: \p{Emoji_Modifier}, as defined in emoji-data.txt, can only be one of U+1F3FB to U+1F3FF.
  • Since \p{Emoji_Modifier} is optional, U+261D U+FE0F is then tested against \p{Emoji_Modifier_Base} alone, and a match is indeed found, but only for the first code point, U+261D. Because a regular expression engine is eager, it stops at the first alternative that matches, which prevents the rest of the expression from being tested, namely the last part, \p{Emoji}\uFE0F, which is the one that would have produced the right match. (\p{Emoji_Presentation} would not have been a proper candidate either, since it represents characters which have an emoji presentation by default, and it doesn't include U+261D.)
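The behaviour described in the three points above can be reproduced in a few lines. This is a minimal sketch that assumes the injected `<% emojiSequence %>` part contributes no match for U+261D U+FE0F (as the first point states), so it is simply left out; the names `flawed`, `input` and `matches` are illustrative only.

```javascript
// Assumption: the <% emojiSequence %> alternatives are omitted here, since
// none of them can match U+261D U+FE0F.
const flawed = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

const input = '\u261D\uFE0F'; // index pointing up + variation selector-16

// The first alternative succeeds on U+261D alone (the optional modifier
// matches nothing), so \p{Emoji}\uFE0F is never tried at that position.
const matches = input.match(flawed);

console.log(matches.length);    // 1
console.log(matches[0].length); // 1: only U+261D, the trailing U+FE0F is left out
```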

So, I think \p{Emoji_Modifier} should not be optional in \p{Emoji_Modifier_Base}\p{Emoji_Modifier}?.

Actually, that whole alternative could be dropped entirely, since it is already covered by the injected <% emojiSequence %>, which contains all the sequences of type Emoji_Modifier_Sequence, and those are strictly equivalent.

So it should instead be:

module.exports = () => {
	return /<% emojiSequence %>|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
};
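To see what the proposed expression does with the problematic input, here is a rough sketch. It assumes that `<% emojiSequence %>` can be approximated, for this test only, by `\p{Emoji_Modifier_Base}\p{Emoji_Modifier}` with a required modifier (the real template injects the full list of sequences); the name `proposed` is hypothetical.

```javascript
// Assumption: \p{Emoji_Modifier_Base}\p{Emoji_Modifier} stands in for the
// modifier sequences that <% emojiSequence %> would inject.
const proposed = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

// U+261D U+FE0F now falls through to \p{Emoji}\uFE0F and is matched whole.
const withSelector = '\u261D\uFE0F'.match(proposed);

// U+261D U+1F3FB (light skin tone) is still matched as a modifier sequence.
const withSkinTone = '\u261D\u{1F3FB}'.match(proposed);

console.log(withSelector[0] === '\u261D\uFE0F');    // true
console.log(withSkinTone[0] === '\u261D\u{1F3FB}'); // true
```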

from emoji-regex.

mathiasbynens commented on May 25, 2024

@tonton-pixel You’re absolutely right! The clearest way to express this is by using a not-yet(?)-standard RegExp feature, as described here: https://github.com/tc39/proposal-regexp-unicode-sequence-properties#matching-all-emoji-including-emoji-sequences

In other words, emojiSequence expands to what’s described here: https://github.com/tc39/proposal-regexp-unicode-sequence-properties#matching-emoji-sequences

const reEmojiSequence = /\p{Emoji_Flag_Sequence}|\p{Emoji_Tag_Sequence}|\p{Emoji_ZWJ_Sequence}|\p{Emoji_Keycap_Sequence}|\p{Emoji_Modifier_Sequence}/u;


mathiasbynens commented on May 25, 2024

Actually, the Emoji_Modifier_Base comment is wrong. <% emojiSequence %> includes \p{Emoji_Modifier_Sequence}, but an \p{Emoji_Modifier_Base} symbol that is NOT followed by a \p{Emoji_Modifier} symbol doesn’t form a sequence, yet it’s still an emoji. Here are some examples:

☝⛹✊✋✌✍🎅🏃🏄🏊🏋👂👃👆👇👈👉👊👋👌👍👎👏👐👦👧👨👩👮👰👱👲👳👴👵👶👷👸👼💁💂💃💅💆💇💪🕵🕺🖐🖕🖖🙅🙆🙇🙋🙌🙍🙎🙏🚣🚴🚵🚶🛀🤘🤙🤚🤛🤜🤝🤞🤦🤰🤳🤴🤵🤶🤷🤸🤹🤼🤽🤾

So we need \p{Emoji_Modifier_Sequence} (as included in emojiSequence) but in addition, we need \p{Emoji_Modifier_Base}.
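A short sketch of why the bare \p{Emoji_Modifier_Base} alternative is still needed, and of one way it might coexist with \p{Emoji}\uFE0F. The ordering in `reordered` is only an illustration of the idea and is not taken from this thread or from the library; as before, `\p{Emoji_Modifier_Base}\p{Emoji_Modifier}` is an assumed stand-in for the modifier sequences in `<% emojiSequence %>`, and all variable names are hypothetical.

```javascript
// Without a bare \p{Emoji_Modifier_Base} alternative, a lone U+261D is
// missed: it has no modifier, no U+FE0F, and is not Emoji_Presentation.
const withoutBase = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
const missed = '\u261D'.match(withoutBase);

// Illustrative ordering: try the longer \p{Emoji}\uFE0F form before
// falling back to a bare modifier base.
const reordered = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}|\p{Emoji}\uFE0F|\p{Emoji_Modifier_Base}|\p{Emoji_Presentation}/gu;
const bare = '\u261D'.match(reordered);
const selector = '\u261D\uFE0F'.match(reordered);
const skinTone = '\u261D\u{1F3FB}'.match(reordered);

console.log(missed);                            // null
console.log(bare[0] === '\u261D');              // true
console.log(selector[0] === '\u261D\uFE0F');    // true
console.log(skinTone[0] === '\u261D\u{1F3FB}'); // true
```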


mathiasbynens commented on May 25, 2024

Note that U+261D has both \p{Emoji_Modifier_Base} and \p{Emoji}.
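This can be checked directly with Unicode property escapes, assuming an engine that supports them under the `u` flag. The variable names below are illustrative.

```javascript
const pointingUp = '\u261D';

const isModifierBase = /^\p{Emoji_Modifier_Base}$/u.test(pointingUp);
const isEmoji = /^\p{Emoji}$/u.test(pointingUp);
// For comparison: U+261D does not have emoji presentation by default.
const isEmojiPresentation = /^\p{Emoji_Presentation}$/u.test(pointingUp);

console.log(isModifierBase, isEmoji, isEmojiPresentation); // true true false
```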

