Comments (4)
I actually wrote a similar module myself and, after extensive testing, I think I found why the reported match is incorrect.
IMHO, there is a slight flaw in the template used to generate the emoji regular expression:
module.exports = () => {
// https://mathiasbynens.be/notes/es-unicode-property-escapes#emoji
return /<% emojiSequence %>|\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
};
From left to right:
- U+261D U+FE0F is not matched by any emoji sequence parsed from emoji-sequences.txt or emoji-zwj-sequences.txt (the only sequences involving U+261D are the five skin tone variations U+261D U+1F3FB to U+261D U+1F3FF).
- U+261D U+FE0F is not matched by \p{Emoji_Modifier_Base}\p{Emoji_Modifier} for the same reason: \p{Emoji_Modifier}, as defined in emoji-data.txt, can only be one of U+1F3FB to U+1F3FF.
- Since \p{Emoji_Modifier} is optional, U+261D U+FE0F is then tested against \p{Emoji_Modifier_Base} alone, and a match is indeed found, but only for the first code point U+261D. Because a regular expression engine tries alternatives from left to right and accepts the first one that matches, the remaining alternatives are never tried at that position. That includes the last one, \p{Emoji}\uFE0F, which is the one that would have produced the right match. (\p{Emoji_Presentation} would not have been a proper candidate either, since it covers characters that have emoji presentation by default and does not include U+261D.)
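The left-to-right behaviour described above can be reproduced with a reduced pattern. This is only a sketch: it leaves out the generated <% emojiSequence %> alternative (which, per the first bullet, does not match U+261D U+FE0F anyway), and the names templatePattern, stricterPattern and toCodePoints are made up for illustration.

```javascript
// Reduced version of the template pattern, without the generated
// <% emojiSequence %> alternative (assumption: it does not match this input).
const templatePattern =
  /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

// Same pattern, but with the modifier made mandatory.
const stricterPattern =
  /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

const input = '\u261D\uFE0F'; // U+261D U+FE0F

// Format a matched string as a space-separated list of code points.
const toCodePoints = (s) =>
  [...s]
    .map((c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'))
    .join(' ');

const templateMatches = (input.match(templatePattern) || []).map(toCodePoints);
const stricterMatches = (input.match(stricterPattern) || []).map(toCodePoints);

console.log(templateMatches); // [ 'U+261D' ]
console.log(stricterMatches); // [ 'U+261D U+FE0F' ]
```

With the optional modifier, the first alternative consumes U+261D on its own and U+FE0F is left unmatched; with the mandatory modifier, the first two alternatives fail and \p{Emoji}\uFE0F gets its turn.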
So, I think \p{Emoji_Modifier} should not be optional in \p{Emoji_Modifier_Base}\p{Emoji_Modifier}? (that is, the trailing ? should go).
Actually, the whole alternative could be dropped entirely, since it is already covered by the injected <% emojiSequence %>, which contains all the sequences of type Emoji_Modifier_Sequence, and these are strictly equivalent.
So, it should instead be:
module.exports = () => {
return /<% emojiSequence %>|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
};
from emoji-regex.
@tonton-pixel You’re absolutely right! The clearest way to express this is by using a not-yet(?)-standard RegExp feature, as described here: https://github.com/tc39/proposal-regexp-unicode-sequence-properties#matching-all-emoji-including-emoji-sequences
In other words, emojiSequence expands to what’s described here: https://github.com/tc39/proposal-regexp-unicode-sequence-properties#matching-emoji-sequences
const reEmojiSequence = /\p{Emoji_Flag_Sequence}|\p{Emoji_Tag_Sequence}|\p{Emoji_ZWJ_Sequence}|\p{Emoji_Keycap_Sequence}|\p{Emoji_Modifier_Sequence}/u;
Actually, the Emoji_Modifier_Base comment is wrong. <% emojiSequence %> includes \p{Emoji_Modifier_Sequence}, but an \p{Emoji_Modifier_Base} symbol that is NOT followed by a \p{Emoji_Modifier} symbol doesn’t form a sequence, but it’s still an emoji. Here are some examples:
☝⛹✊✋✌✍🎅🏃🏄🏊🏋👂👃👆👇👈👉👊👋👌👍👎👏👐👦👧👨👩👮👰👱👲👳👴👵👶👷👸👼💁💂💃💅💆💇💪🕵🕺🖐🖕🖖🙅🙆🙇🙋🙌🙍🙎🙏🚣🚴🚵🚶🛀🤘🤙🤚🤛🤜🤝🤞🤦🤰🤳🤴🤵🤶🤷🤸🤹🤼🤽🤾
So we need \p{Emoji_Modifier_Sequence} (as included in emojiSequence) but in addition, we need \p{Emoji_Modifier_Base}.
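A rough way to see why the bare \p{Emoji_Modifier_Base} alternative is still needed is sketched here. It assumes \p{Emoji_Modifier_Base}\p{Emoji_Modifier} as a stand-in for \p{Emoji_Modifier_Sequence} (which is not a standard property escape), and the names sequenceOnly and sequenceOrBase are hypothetical.

```javascript
// Stand-in for \p{Emoji_Modifier_Sequence}: a modifier base followed by a
// modifier (assumption; the real pattern uses the generated emojiSequence).
const sequenceOnly =
  /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/u;

// Same alternatives plus a bare modifier base.
const sequenceOrBase =
  /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}|\p{Emoji_Modifier_Base}|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/u;

const loneBase = '\u261D';              // U+261D on its own
const withSkinTone = '\u261D\u{1F3FB}'; // U+261D U+1F3FB

const results = {
  sequenceOnlyLone: sequenceOnly.test(loneBase),      // false
  sequenceOnlyToned: sequenceOnly.test(withSkinTone), // true
  sequenceOrBaseLone: sequenceOrBase.test(loneBase),  // true
};
console.log(results);
```

This only checks whether a lone modifier base matches at all; it says nothing about where that alternative should sit in the alternation.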
Note that U+261D has both \p{Emoji_Modifier_Base} and \p{Emoji}.
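The properties of U+261D can be checked directly in any engine that supports Unicode property escapes. A small sketch (the variable names are illustrative only):

```javascript
const codePoint = '\u261D'; // U+261D

// Binary emoji properties mentioned in this thread.
const properties = ['Emoji', 'Emoji_Modifier_Base', 'Emoji_Presentation', 'Emoji_Modifier'];

// Build one anchored regex per property and test the single code point.
const report = Object.fromEntries(
  properties.map((name) => [name, new RegExp(`^\\p{${name}}$`, 'u').test(codePoint)])
);

console.log(report);
// Emoji and Emoji_Modifier_Base are true; Emoji_Presentation and Emoji_Modifier are false.
```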