Update: I have a proposal for your review now @muan! 🙌
Preview the full proposal of changes here: Proposed-all.html
This follows what I proposed in the last comment: that emojilib's keywords be sourced from all associated words in Emojilib/Unicode and the platforms we can scrape from. The ones I could easily access were "fluemoji" (Fluent UI / Windows), "gemoji" (GitHub), and "twemoji" (Twitter).
Using 🛫 as an example, here's what that would look like:
[Table: Current, Proposed, and Proposed Changes keywords for 🛫; legend: ➕ Added, ➖ Removed, ✔️ Unchanged]
Full comparison and proposal tables on: https://github.com/JoshuaKGoldberg/repros/tree/emojilib-platforms-keywords-comparison.
Unless directed otherwise, I'll send a big PR updating the keywords in this repo... soon. Hopefully later this month.
Note that the following emojis have significantly fewer keywords in the proposed changes:
- 🐦 went from 6 keywords to 1: bird
- 🛃 went from 4 keywords to 1: customs
- 🏜️ went from 4 keywords to 1: desert
- 🐬 went from 9 keywords to 2: dolphin, flipper
- 🐘 went from 6 keywords to 1: elephant
- 🦍 went from 4 keywords to 1: gorilla
- ⛰️ went from 4 keywords to 1: mountain
- 🐙 went from 7 keywords to 1: octopus
- ❇️ went from 6 keywords to 2: *, sparkle
None of the platforms in emoji-platform-data have more than 1-2 keywords for these emojis. Adding a richer platform would fill those missing keywords back in. For example, searching the native macOS emoji picker for "sea" includes 🐙 in the results. I added emoji-platform-data issues labeled platform support.
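One way to catch regressions like these before sending the PR is to diff keyword counts between the current and proposed data. This is a minimal sketch, assuming both datasets are emojilib@3-style maps (emoji character to keyword array). findKeywordDrops and its minDrop threshold of 3 are hypothetical choices for illustration, and the sample keyword lists are made up, not the real data.

```javascript
// Hypothetical review helper, not part of emojilib.
// Both inputs are emojilib@3-style maps: emoji character -> keyword array.
// Reports emojis whose keyword count fell by at least `minDrop`.
function findKeywordDrops(current, proposed, minDrop = 3) {
  const drops = [];
  for (const [emoji, currentKeywords] of Object.entries(current)) {
    const proposedKeywords = proposed[emoji] ?? [];
    const lost = currentKeywords.length - proposedKeywords.length;
    if (lost >= minDrop) {
      drops.push({
        emoji,
        from: currentKeywords.length,
        to: proposedKeywords.length,
        kept: proposedKeywords,
      });
    }
  }
  // Largest losses first, so the worst regressions get reviewed first.
  return drops.sort((a, b) => (b.from - b.to) - (a.from - a.to));
}

// Illustrative data only; these are not the real keyword lists.
const current = {
  "🐙": ["octopus", "animal", "creature", "ocean", "sea", "nature", "beach"],
  "🐘": ["elephant", "animal", "nature", "nose", "trunk", "mammal"],
  "🛫": ["airplane", "departure", "airport", "flight"],
};
const proposed = {
  "🐙": ["octopus"],
  "🐘": ["elephant"],
  "🛫": ["airplane_departure", "airplane", "departure", "airport", "taking"],
};

console.log(findKeywordDrops(current, proposed));
```

Emojis that gained keywords (like 🛫 here) are never reported, since only losses count toward the threshold.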
from emojilib.
+1, having docs on this would be great. I'm working on omnidan/node-emoji#132 to bring node-emoji to emojilib@3. The test cases in that draft PR are showing a lot of places where emojilib@3 removed conveniences the library relied on. For example, "heart" shows up in a few emojis, but not ❤️ itself; see emojilib/dist/emoji-en-US.json, lines 986 to 991 in e8e9a84.
I wrote a quick script to find discrepancies:
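// npm i emojilib-2@npm:emojilib@2 emojilib-3@npm:emojilib@3
// Save as an ES module (e.g. a .mjs file) so top-level await works.

// emojilib@2 is CommonJS; its default export is { lib, ... }, where lib maps
// each name to details such as { char, keywords, ... }.
const {
  default: { lib: emojisV2 },
} = await import("emojilib-2");

// emojilib@3's entry point is JSON mapping each character to its keywords.
// Older Node.js versions only accept `assert` here instead of `with`.
const { default: emojisV3 } = await import("emojilib-3", {
  with: { type: "json" },
});

// v2 names that are no longer among the v3 keywords for the same character.
const missing = [];
// The subset whose v3 primary keyword doesn't match a few known renaming
// patterns (flag_*, two_*, smiling_face_with_*, *_face).
const missingIgnoringAliases = [];

for (const [nameV2, detailsV2] of Object.entries(emojisV2)) {
  const detailsV3 = emojisV3[detailsV2.char];
  if (detailsV3?.includes(nameV2)) {
    continue;
  }

  const complaint = { nameV2, detailsV2, detailsV3 };
  missing.push(complaint);

  const primaryAlias = detailsV3?.[0];
  if (
    primaryAlias &&
    !/^(?:flag|two|smiling_face_with)_|_face$/.test(primaryAlias)
  ) {
    missingIgnoringAliases.push(complaint);
  }
}

console.table({
  "Missing in general": missing.length,
  "Missing ignoring a few quick aliases": missingIgnoringAliases.length,
});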
┌──────────────────────────────────────┬────────┐
│ (index) │ Values │
├──────────────────────────────────────┼────────┤
│ Missing in general │ 678 │
│ Missing ignoring a few quick aliases │ 456 │
└──────────────────────────────────────┴────────┘
@muan is there a description anywhere of how #178's lists were generated? Or, if not, could you speak to how you generated it?
I believe I had some hacked-together local scripts, so I don't recall the exact differences. But here's what might have happened:
Previously this project was built exclusively for GitHub shortcodes at our internal hackathon, and with v3 I decided to move away from that. So the primary key became each emoji's official Unicode name, which would explain why tada was replaced with party popper and poop was replaced by pile of poo.
IIRC, an emoji's official name also sometimes changes between versions (gun -> water gun), which is why I made the character itself the key.
I feel like I would/should have done the work to compare and keep the GitHub shortcodes, but I guess I did not.
So to add them all back, a name/alias comparison between GitHub's set and the Unicode set could potentially do the trick.
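A minimal sketch of that name/alias comparison, assuming GitHub's set is loaded as an array of entries shaped like those in gemoji's db/emoji.json ({ emoji, aliases, ... }) and the Unicode-based set is emojilib@3's character-to-keywords map. findMissingShortcodes is a hypothetical helper, and the sample data is illustrative only.

```javascript
// Sketch only. `gemoji`: array of { emoji, aliases } entries (assumed shape).
// `emojilib`: emojilib@3-style map of emoji character -> keyword array.

// Datasets can differ in whether they include U+FE0F (variation selector-16)
// in the character, so compare with it stripped.
const stripVariationSelector = (emoji) => emoji.replace(/\uFE0F/g, "");

function findMissingShortcodes(gemoji, emojilib) {
  const keywordsByEmoji = new Map(
    Object.entries(emojilib).map(([emoji, keywords]) => [
      stripVariationSelector(emoji),
      new Set(keywords),
    ]),
  );

  const missing = {};
  for (const { emoji, aliases } of gemoji) {
    const keywords =
      keywordsByEmoji.get(stripVariationSelector(emoji)) ?? new Set();
    const absent = aliases.filter((alias) => !keywords.has(alias));
    if (absent.length > 0) {
      missing[emoji] = absent;
    }
  }
  return missing;
}

// Small illustrative inputs, not the real datasets.
const gemojiSample = [
  { emoji: "🎉", aliases: ["tada"] },
  { emoji: "💩", aliases: ["hankey", "poop", "shit"] },
  { emoji: "\u26F0\uFE0F", aliases: ["mountain"] },
];
const emojilibSample = {
  "🎉": ["party_popper", "party"],
  "💩": ["pile_of_poo", "poop"],
  "\u26F0": ["mountain"],
};

console.log(findMissingShortcodes(gemojiSample, emojilibSample));
```

The result maps each emoji to the GitHub shortcodes emojilib lacks, which could then be appended to its keyword list. Stripping U+FE0F is an assumption about how the two datasets encode characters and is worth checking against the real data.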
OK! Sorry for taking so long on this - I wanted to really think through the problem space. As in: what's a "keyword"?
Using the 🛫 emoji as an example, I think there are really 2-3 use cases for emoji keywords:
- 🆔 Identity: Where keywords can be used as either...
  - 🌕 Full Identity: Terms that are a complete alias or title for the emoji (e.g. airplane_departure)
  - 🌗 Partial Identity: Terms that can be a part of the complete identity of the emoji, but aren't standalone (e.g. airplane, departure)
- 🔗 Relation: Terms that would relate to the emoji in searching, but aren't part of its identity (e.g. airport, taking)
Ideally, I'd propose that emojilib separate at least 🆔 identity from 🔗 relation keywords. Some users will want only identity, e.g. node-emoji's :shortcode: replacement. Some users will want the relation ones as well, e.g. general text searches.
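To make the split concrete, here is a hypothetical entry shape (emojilib@3 today stores a flat keyword array, so identity, full, partial, and relation are invented names) along with two sketched consumers: a :shortcode: lookup that only trusts full identity keywords, and a text search that uses every kind.

```javascript
// Hypothetical entry shape; emojilib@3 actually stores a flat keyword array.
// Keywords below are the 🛫 examples from this comment.
const keywordEntries = {
  "🛫": {
    identity: {
      full: ["airplane_departure"],
      partial: ["airplane", "departure"],
    },
    relation: ["airport", "taking"],
  },
};

// :shortcode: replacement (as in node-emoji) should only accept full identity
// keywords, so partial words like "airplane" don't resolve to an emoji.
function emojiForShortcode(entries, shortcode) {
  for (const [emoji, { identity }] of Object.entries(entries)) {
    if (identity.full.includes(shortcode)) {
      return emoji;
    }
  }
  return undefined;
}

// General text search can match against every kind of keyword.
function searchEmojis(entries, term) {
  const needle = term.toLowerCase();
  return Object.entries(entries)
    .filter(([, { identity, relation }]) =>
      [...identity.full, ...identity.partial, ...relation].some((keyword) =>
        keyword.includes(needle),
      ),
    )
    .map(([emoji]) => emoji);
}

console.log(emojiForShortcode(keywordEntries, "airplane_departure"));
console.log(searchEmojis(keywordEntries, "airport"));
```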
+1 to @muan's suggestion in #194 (comment) of a comparison. I'd say a programmatic approach would be the easiest and least controversial one for emojilib. My proposal would be something like:
- 🆔 Identity keywords should be sourced from the Unicode standard, Emojipedia also-known-as and title, and platform shortcodes
- 🔗 Relation keywords should be sourced from the search terms defined for emoji in individual platforms
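As a rough sketch of how both lists could be derived programmatically: normalize each full name or shortcode to an emojilib-style snake_case keyword, split multi-word keywords into partial keywords, and treat platform search terms as 🔗 relation keywords unless they already appear as identity keywords. The normalization rule and buildKeywords are assumptions for illustration, not an agreed format.

```javascript
// Hypothetical normalization: lowercase, collapse runs of anything that
// isn't an ASCII letter or digit into "_", and trim leading/trailing "_".
function toKeyword(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// names: full identity names (Unicode name, Emojipedia title/aliases, shortcodes).
// searchTerms: platform search terms, used as relation keywords.
function buildKeywords({ names, searchTerms }) {
  const full = new Set(names.map(toKeyword).filter(Boolean));

  // Each word of a multi-word full keyword becomes a partial keyword.
  const partial = new Set();
  for (const keyword of full) {
    for (const word of keyword.split("_")) {
      if (!full.has(word)) {
        partial.add(word);
      }
    }
  }

  // Search terms that already appear as identity keywords are not repeated.
  const relation = new Set(
    searchTerms
      .map(toKeyword)
      .filter((term) => term && !full.has(term) && !partial.has(term)),
  );

  return {
    identity: { full: [...full], partial: [...partial] },
    relation: [...relation],
  };
}

console.log(
  buildKeywords({
    names: ["Airplane Departure", ":airplane_departure:"],
    searchTerms: ["airport", "airplane", "taking"],
  }),
);
```

This normalization only handles ASCII names and would drop symbol-only keywords such as "*", so a real script would need a more careful rule.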
As for setting up that programmatic approach... we can get halfway there. I made a standalone emojipedia package to scrape and store the Emojipedia data for each emoji. That data includes 🆔 identity shortcodes across Discord, Emojipedia (based on the Unicode standard), GitHub, and Slack.
Looking at the data that's in emojipedia and/or emojilib@3 today for the 🛫 emoji, we can see that a lot of 🆔 identity keywords appear in only one of the two datasets:
[Table: 🛫 keywords In Both 🌕, Only in Emojilib 🌗, and Only in Emojipedia 🌓, each split into Full Keywords and Partial Keywords]
Full comparison on: https://github.com/JoshuaKGoldberg/repros/tree/emojilib-emojipedia-keywords-comparison.
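The comparison boils down to set partitioning. A minimal sketch for one emoji's keywords, assuming both datasets' keywords are already normalized to the same snake_case form; compareKeywords and the sample lists are hypothetical. Running it once on full keywords and once on partial keywords would give the six columns of the comparison table.

```javascript
// Partition one emoji's keywords from two datasets. Assumes both lists are
// already normalized to the same form (e.g. lowercase snake_case).
function compareKeywords(emojilibKeywords, emojipediaKeywords) {
  const inEmojilib = new Set(emojilibKeywords);
  const inEmojipedia = new Set(emojipediaKeywords);
  return {
    inBoth: [...inEmojilib].filter((k) => inEmojipedia.has(k)),
    onlyInEmojilib: [...inEmojilib].filter((k) => !inEmojipedia.has(k)),
    onlyInEmojipedia: [...inEmojipedia].filter((k) => !inEmojilib.has(k)),
  };
}

// Made-up sample lists for 🛫, not the actual data from either source.
const comparison = compareKeywords(
  ["airplane_departure", "airport", "departure"],
  ["airplane_departure", "flight_departure"],
);
console.log(comparison);
```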
My next task will be trying to similarly source the 🔗 relation keywords programmatically. That way we can make a script that populates emojilib data automatically. As far as I can find, 🔗 relation keywords aren't stored on Emojipedia, so I plan on trying to find exports of individual platforms' emoji libraries such as https://github.com/github/gemoji.
Hey, sorry for the lack of response; I was largely away last year.
TBH I have not thought about this at length, but I agree with what you've written here. If pull requests were sent for these keywords, I'd accept them all.
I suggest that a section be added to CONTRIBUTING.md or README.md that gives guidance to future contributors about questions like these.
I agree. I'd be happy to accept a PR for this if anyone's willing to send one.
@thdoan sounds like you've done as much as I have. I wrote an emoji picker in Python that you're welcome to check out. The search works surprisingly well!
Makes sense! I sent #226 as a draft for reference; it only augments keywords rather than removing any.
@jacobwhall I'm planning to fork this and start an emoji autocomplete project also. Have you settled on a fast way to search through the aliases? I was thinking about doing something like a filter, but not sure if there are faster options out there.
UPDATE: I did some performance tests, and I think for best performance I'm going to flatten the arrays into strings -- finding partial text matches in strings is faster than doing the same operation on arrays.
https://jsbench.me/zql58n0oew/1
When doing a partial match on every keystroke, every bit of performance counts ^^.
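A minimal sketch of the flattening idea, assuming an emojilib@3-style map as input: each emoji's keywords are joined into one lowercase string once, up front, so each keystroke only runs String.prototype.includes per emoji. buildSearchIndex and searchIndex are hypothetical names, and joining with a newline is an assumption that keeps a match from spanning two keywords. Whether this beats searching the arrays depends on the engine and data, so the jsbench link remains the actual evidence.

```javascript
// Precompute one lowercase string per emoji. Keywords are joined with "\n"
// (assumed never to appear in a keyword or a typed query) so a match
// can't straddle two keywords.
function buildSearchIndex(emojilib) {
  return Object.entries(emojilib).map(([emoji, keywords]) => ({
    emoji,
    haystack: keywords.join("\n").toLowerCase(),
  }));
}

// Called on every keystroke: one substring scan per emoji, stopping early
// once `limit` results are found.
function searchIndex(index, query, limit = 10) {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return [];
  }
  const results = [];
  for (const { emoji, haystack } of index) {
    if (haystack.includes(needle)) {
      results.push(emoji);
      if (results.length >= limit) {
        break;
      }
    }
  }
  return results;
}

// Tiny illustrative map in emojilib@3's shape (emoji -> keywords).
const index = buildSearchIndex({
  "🛫": ["airplane_departure", "airport"],
  "🐙": ["octopus", "sea"],
  "🌊": ["water_wave", "sea"],
});
console.log(searchIndex(index, "sea"));
```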
@jacobwhall cool, I'm experimenting with an emoji autocomplete by leveraging the browser's native datalist functionality. However, I've decided to start my emojis map from scratch based on https://emojipedia.org/ (all tedious manual work since they closed their API). We'll see how it goes.
Thank you for your work on this @JoshuaKGoldberg
Note that the following emojis have significantly fewer keywords in the proposed changes
I suggest that we integrate individual keyword contributions into this new workflow. I think it's worth retaining the keywords from this project for the example emojis you provided. Contributions to this project could continue to add common-sense keywords that may have been overlooked by Unicode/Emojipedia/etc.
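One way to fold manual contributions into the scripted workflow, sketched under the assumption that generated and hand-curated keywords live in two separate emoji-to-keywords maps (mergeKeywords and the sample data are hypothetical). Generated keywords come first and curated ones are appended only if missing, so the script could be rerun without losing contributions.

```javascript
// Hypothetical merge: `generated` is what the scripted workflow produces and
// `curated` holds hand-contributed keywords; both map emoji -> keyword array.
function mergeKeywords(generated, curated) {
  const merged = {};
  const allEmojis = new Set([
    ...Object.keys(generated),
    ...Object.keys(curated),
  ]);
  for (const emoji of allEmojis) {
    // Generated keywords keep their order; curated extras are appended once.
    merged[emoji] = [
      ...new Set([...(generated[emoji] ?? []), ...(curated[emoji] ?? [])]),
    ];
  }
  return merged;
}

// Illustrative data only.
const merged = mergeKeywords(
  { "🐙": ["octopus"], "🛫": ["airplane_departure", "airport"] },
  { "🐙": ["octopus", "ocean", "sea"] },
);
console.log(merged);
```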
Is there any indication of when #226 will be moved out of draft or merged? I'm interested in seeing this resolved upstream for omnidan/node-emoji#132.
Any progress on this? I like the idea of having a strict workflow here instead of random keyword proposals.