richardanaya / emoji-rs
License: Apache License 2.0
I'm trying to figure out the easiest way to search and find all skin tones for a particular emoji, like clapping hands. Any tips?
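One way to approach this without any crate support is to rely on the five Unicode skin tone modifiers (U+1F3FB through U+1F3FF). The sketch below is not part of this crate's API. `has_skin_tone`, `base_key`, and `skin_tone_variants` are hypothetical helpers, and the list of emoji strings is assumed to come from wherever you already get it.

```rust
/// The five skin tone modifiers, U+1F3FB through U+1F3FF.
const SKIN_TONE_MODIFIERS: [char; 5] = [
    '\u{1F3FB}', // light
    '\u{1F3FC}', // medium-light
    '\u{1F3FD}', // medium
    '\u{1F3FE}', // medium-dark
    '\u{1F3FF}', // dark
];

/// U+FE0F, which a base emoji may carry but its toned variants drop.
const VARIATION_SELECTOR_16: char = '\u{FE0F}';

/// Returns true if the emoji string contains any skin tone modifier.
pub fn has_skin_tone(emoji: &str) -> bool {
    emoji.chars().any(|c| SKIN_TONE_MODIFIERS.contains(&c))
}

/// Hypothetical helper: strips modifiers and U+FE0F so that a base emoji
/// and its toned variants compare equal.
fn base_key(emoji: &str) -> String {
    emoji
        .chars()
        .filter(|c| !SKIN_TONE_MODIFIERS.contains(c) && *c != VARIATION_SELECTOR_16)
        .collect()
}

/// Hypothetical helper: keeps the entries of `all` that are toned
/// variants of `base`.
pub fn skin_tone_variants<'a>(base: &str, all: &[&'a str]) -> Vec<&'a str> {
    let key = base_key(base);
    all.iter()
        .copied()
        .filter(|e| has_skin_tone(e) && base_key(e) == key)
        .collect()
}
```

Calling `skin_tone_variants("\u{1F44F}", &all)` would return only the clapping hands entries that carry a modifier. Multi-person sequences with more than one modifier would also match their base here, which may or may not be what you want.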
As I first brought up in this reddit comment, I think this crate would really benefit from additional data being stored with each emoji.
Here I'll present a draft of what kind of data may be included, how to scrape that data, and methods of generation. If this idea is accepted, it can be workshopped before implementation.
Currently, this repo pulls from the emoji-test page. That covers the basics, but it leaves out a lot of useful information. I propose gathering data from the Unicode CLDR instead. Not only is it how major projects typically build emoji libraries, but it also has much more data.
Some data that can be scraped with this method is as follows:
Name: face with tears of joy
Group: Smileys & People
Subgroup: face-positive
Keywords: face, face with tears of joy, joy, laugh, tear
Qualification: fully-qualified
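To make the shape of that data concrete, here is a rough sketch of how one record might look in Rust. The type and field names (`EmojiRecord`, `Qualification`, `has_keyword`) are placeholders I made up for illustration, not anything this crate defines, and the status variants are assumed from the values that recent emoji-test.txt files use.

```rust
/// Status column of emoji-test.txt (variants assumed from recent files).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Qualification {
    FullyQualified,
    MinimallyQualified,
    Unqualified,
    Component,
}

/// Hypothetical record combining emoji-test.txt data with CLDR keywords.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EmojiRecord {
    pub glyph: &'static str,
    pub name: &'static str,
    pub group: &'static str,
    pub subgroup: &'static str,
    pub keywords: &'static [&'static str],
    pub qualification: Qualification,
}

impl EmojiRecord {
    /// Case-insensitive keyword lookup (ASCII only, for simplicity).
    pub fn has_keyword(&self, keyword: &str) -> bool {
        self.keywords.iter().any(|k| k.eq_ignore_ascii_case(keyword))
    }
}

/// The example entry from the list above.
pub const FACE_WITH_TEARS_OF_JOY: EmojiRecord = EmojiRecord {
    glyph: "\u{1F602}",
    name: "face with tears of joy",
    group: "Smileys & People",
    subgroup: "face-positive",
    keywords: &["face", "face with tears of joy", "joy", "laugh", "tear"],
    qualification: Qualification::FullyQualified,
};
```

Keeping every field `'static` means generated tables can live in read-only data with no allocation at startup.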
Gathering data can be done in a few steps:
Gathering the emoji-specific codepoints and initial data is easy. It's found in cldr/tools/java/org/unicode/cldr/util/data/emoji/emoji-test.txt via the CLDR link above. I believe this is the same data this project currently pulls from, but this should be double-checked and a link to latest should be found.
Parsing basic info is done directly from the above file. This gives codepoint, string representation, qualification, and canonical name.
Cross-referencing is done by examining files within common/annotations/*.xml.
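As a rough illustration of the parsing step, the sketch below handles one data line of emoji-test.txt, assuming the usual `codepoints ; status # glyph name` layout, plus the `# group:` header lines. `ParsedLine`, `parse_line`, and `parse_group` are hypothetical names. Newer files put a version token such as `E0.6` before the name, so the sketch skips one if it is present.

```rust
/// One data line of emoji-test.txt (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ParsedLine {
    pub codepoints: Vec<u32>,
    pub glyph: String,
    pub status: String,
    pub name: String,
}

/// True for tokens shaped like "E0.6" or "E13.1".
fn is_version_token(token: &str) -> bool {
    token.strip_prefix('E').map_or(false, |v| {
        !v.is_empty() && v.chars().all(|c| c.is_ascii_digit() || c == '.')
    })
}

/// Parses a data line; returns None for blank lines, comments, and
/// anything that does not match the assumed layout.
pub fn parse_line(line: &str) -> Option<ParsedLine> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The first '#' starts the comment holding the glyph and name.
    let (fields, comment) = line.split_once('#')?;
    let (hex, status) = fields.split_once(';')?;

    let codepoints = hex
        .split_whitespace()
        .map(|h| u32::from_str_radix(h, 16).ok())
        .collect::<Option<Vec<u32>>>()?;
    let glyph = codepoints
        .iter()
        .map(|&cp| char::from_u32(cp))
        .collect::<Option<String>>()?;

    // Drop the glyph token, then an optional version token.
    let (_, mut rest) = comment.trim().split_once(char::is_whitespace)?;
    rest = rest.trim_start();
    if let Some((token, tail)) = rest.split_once(' ') {
        if is_version_token(token) {
            rest = tail.trim_start();
        }
    }

    Some(ParsedLine {
        codepoints,
        glyph,
        status: status.trim().to_string(),
        name: rest.to_string(),
    })
}

/// Extracts the group name from a "# group: ..." header line.
pub fn parse_group(line: &str) -> Option<&str> {
    line.trim().strip_prefix("# group:").map(str::trim)
}
```

The glyph is rebuilt from the codepoints rather than read from the comment, so the two cannot disagree. Subgroup headers would be handled the same way as `parse_group`.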
Interpreting the scraped data and dumping it into a Rust crate should be done with great care. There is a lot of data here, and I think a lot of room for improvement over the current method. I propose the following method:
Use build.rs to download files and generate Rust code. This removes the dependency on JavaScript that this crate currently has, and would allow for a very small footprint, since all generation is done at build time.
Use build.rs to dump pre-formatted Rust code into OUT_DIR to be included directly in lib.rs.
Add helpers to lib.rs to help in searching and filtering emoji.
The good news is that CLDR gives annotations in a large number of languages. The bad news is that this project should eventually account for that. I propose we stick with English for now and work that out later.
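To show what the build.rs generation step proposed above could look like, here is a minimal sketch using only the standard library. `Row`, `render_table`, `write_table`, and the file name `emoji_table.rs` are all made up for illustration. A real build script would fill the rows from the scraped data and would probably use quote rather than string formatting.

```rust
use std::fmt::Write as _;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical input row for code generation.
pub struct Row<'a> {
    pub glyph: &'a str,
    pub name: &'a str,
    pub keywords: &'a [&'a str],
}

/// Renders Rust source text that declares a static emoji table.
pub fn render_table(rows: &[Row<'_>]) -> String {
    let mut src = String::from("pub static EMOJI: &[(&str, &str, &[&str])] = &[\n");
    for row in rows {
        // {:?} quotes and escapes each string, so the output stays a
        // valid Rust literal.
        writeln!(
            src,
            "    ({:?}, {:?}, &{:?}),",
            row.glyph, row.name, row.keywords
        )
        .unwrap();
    }
    src.push_str("];\n");
    src
}

/// Writes the generated source into `out_dir`.
///
/// In build.rs, `main` would call this with the directory Cargo passes
/// in the OUT_DIR environment variable:
///     let out_dir = std::env::var("OUT_DIR").unwrap();
///     write_table(Path::new(&out_dir), &rows).unwrap();
///
/// lib.rs would then pull the file in with:
///     include!(concat!(env!("OUT_DIR"), "/emoji_table.rs"));
pub fn write_table(out_dir: &Path, rows: &[Row<'_>]) -> io::Result<()> {
    fs::write(out_dir.join("emoji_table.rs"), render_table(rows))
}
```

Splitting rendering from writing keeps the generator testable without touching the file system.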
Here is a rough idea of what I was thinking for localization:
Each language gets a feature named after its annotations/*.xml file. For example, English localization would be enabled via the en feature.
en should be a default feature.
I recommend some of the following crates while working on this project:
phf to create perfect compile-time hash tables
xml-rs to parse the annotations
quote to generate the Rust library code
proc_use for separating large modules into different files (these files will get seriously huge if not separated)
More will be needed obviously, but I've had positive experiences with the ones above.
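For the per-language feature idea above, the build script could decide which annotation files to read from the `CARGO_FEATURE_<NAME>` environment variables that Cargo sets for enabled features. This is only a sketch: `CANDIDATE_LOCALES`, `feature_env_var`, and `annotation_files` are hypothetical, and the locale list is an arbitrary example.

```rust
/// Locales the crate might expose as Cargo features (example list only).
const CANDIDATE_LOCALES: [&str; 3] = ["en", "de", "ja"];

/// Cargo passes each enabled feature to build scripts as
/// CARGO_FEATURE_<NAME>, uppercased and with '-' turned into '_'.
fn feature_env_var(locale: &str) -> String {
    format!("CARGO_FEATURE_{}", locale.to_uppercase().replace('-', "_"))
}

/// Returns the annotation files for every enabled locale feature.
/// `is_set` reports whether an environment variable exists; build.rs
/// would pass `|key| std::env::var_os(key).is_some()`.
pub fn annotation_files(is_set: impl Fn(&str) -> bool) -> Vec<String> {
    CANDIDATE_LOCALES
        .iter()
        .filter(|locale| is_set(&feature_env_var(locale)))
        .map(|locale| format!("common/annotations/{}.xml", locale))
        .collect()
}
```

Making en a default feature would then be a matter of listing it under `default` in the `[features]` table of Cargo.toml.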