
emoji-rs's People

Contributors

richardanaya, shizcow


emoji-rs's Issues

update crate?

Hey, I tried using this crate expecting the functionality of the #1 merge to be present and it's not. I don't see it on docs.rs either. Did you need to push up the new version of the crate?

Adding Extra Data

The problem

As I first brought up in this reddit comment, I think this crate would really benefit from additional data being stored with each emoji.

Here I'll present a draft of what kind of data may be included, how to scrape that data, and methods of generation. If this idea is accepted, it can be work-shopped before implementation.

Source Files

Currently, this repo pulls from the emoji-test page. While that gives basic information, it leaves out much that would be useful. I propose gathering data from the Unicode CLDR instead. Not only is it what major projects typically build emoji libraries from, but it also has much more data.

Some data that can be scraped with this method is as follows:

  • Codepoint characters
    • Ex: 😂
  • Canonical name
    • Ex: face with tears of joy
  • Category name
    • Ex: Smileys & People
  • Subcategory name
    • Ex: face-positive
  • Keywords
    • Ex: face, face with tears of joy, joy, laugh, tear
  • Qualification
    • Ex: fully-qualified

Scraping Method

Gathering data can be done in a few steps:

  • Categorizing emoji-specific codepoints
  • Parsing basic info
  • Cross-referencing keywords

Gathering the emoji-specific codepoints and initial data is easy. It's found in cldr/tools/java/org/unicode/cldr/util/data/emoji/emoji-test.txt via the CLDR link above. I believe this is the same data this project currently pulls from, but this should be double-checked, and a link to the latest version should be found.

Parsing basic info is done directly from the above file. This gives codepoint, string representation, qualification, and canonical name.
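
As a rough illustration of the parsing step, here is a minimal sketch using only the standard library. It assumes each data row looks like `1F602 ; fully-qualified # 😂 E0.6 face with tears of joy`, where the `E0.6` version token only appears in newer releases of the file. The `EmojiRow` struct and the `parse_row` and `is_version_token` functions are hypothetical names made up for this example, not part of the crate, and tracking the `# group:` / `# subgroup:` comment lines is left out.

```rust
/// One data row of emoji-test.txt (illustrative type, not the crate's API).
#[derive(Debug, PartialEq)]
pub struct EmojiRow {
    pub codepoints: Vec<u32>,
    pub glyph: String,
    pub qualification: String,
    pub name: String,
}

/// Heuristic: treats tokens such as "E0.6" or "E13.1" as a version marker.
fn is_version_token(word: &str) -> bool {
    match word.strip_prefix('E') {
        Some(rest) => {
            !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit() || c == '.')
        }
        None => false,
    }
}

/// Parses one row; returns `None` for blank lines, comments, and malformed rows.
pub fn parse_row(line: &str) -> Option<EmojiRow> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // "<hex codepoints> ; <qualification> # <glyph> [E<version>] <name>"
    let (data, comment) = line.split_once('#')?;
    let (codes, qualification) = data.split_once(';')?;

    let codepoints = codes
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok())
        .collect::<Option<Vec<u32>>>()?;
    if codepoints.is_empty() {
        return None;
    }

    // Rebuild the glyph from the codepoints instead of trusting the comment.
    let glyph = codepoints
        .iter()
        .map(|&cp| char::from_u32(cp))
        .collect::<Option<String>>()?;

    // Skip the glyph, then the optional version token; the rest is the name.
    let mut words = comment.split_whitespace().skip(1).peekable();
    let has_version = words.peek().map_or(false, |w| is_version_token(w));
    if has_version {
        words.next();
    }
    let name = words.collect::<Vec<_>>().join(" ");

    Some(EmojiRow {
        codepoints,
        glyph,
        qualification: qualification.trim().to_string(),
        name,
    })
}
```

Calling `parse_row` on every line of the file and keeping the `Some` results would give the codepoint, string representation, qualification, and canonical name described above.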

Cross-referencing keywords is done by examining the files in common/annotations/*.xml.

Packaging

Interpreting the scraped data and dumping it into a Rust crate should be done with great care. There is a lot of data here, and I think there is a lot of room for improvement over the current method. I propose the following method:

  • Use build.rs to download files and generate Rust code. This removes the dependency on JavaScript that this crate currently has, and would allow for a very small footprint -- all generation is done at build time.
  • After scraping data, use build.rs to dump pre-formatted Rust code into OUT_DIR to be included directly in lib.rs.
  • In addition to having each codepoint chronicled, include a final metadata marker -- a compile-time hashmap in lib.rs to help in searching and filtering emoji.
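
The generation step could be sketched roughly as below. This is only an illustration under stated assumptions: `render_source` and `write_generated` are hypothetical helpers, the file name `emoji_data.rs` is made up, the download step is omitted, and plain string formatting stands in for a code-generation crate.

```rust
use std::fmt::Write as _;
use std::fs;
use std::io;
use std::path::Path;

/// Renders (constant name, glyph, canonical name) triples as Rust source text.
pub fn render_source(rows: &[(&str, &str, &str)]) -> String {
    let mut out = String::from("// Generated by build.rs; do not edit.\n");
    for (ident, glyph, name) in rows {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "/// {}", name).unwrap();
        // `{:?}` emits a quoted, escaped string literal that is valid Rust.
        writeln!(out, "pub const {}: &str = {:?};", ident, glyph).unwrap();
    }
    out
}

/// Writes the rendered source into `out_dir` (hypothetical file name).
pub fn write_generated(out_dir: &Path, rows: &[(&str, &str, &str)]) -> io::Result<()> {
    fs::write(out_dir.join("emoji_data.rs"), render_source(rows))
}

// In build.rs, `main` would pass the directory Cargo provides:
//     let out_dir = std::env::var("OUT_DIR").unwrap();
//     write_generated(Path::new(&out_dir), &rows).unwrap();
//
// lib.rs would then pull the generated file in with:
//     include!(concat!(env!("OUT_DIR"), "/emoji_data.rs"));
```

Keeping the rendering separate from the file write makes the generated text easy to inspect or test without running a full build.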

Localization

Good news is that CLDR gives annotations in a large number of languages. Bad news is this project should eventually account for that. I propose we stick with English for now and work that out later.

However, here is a rough idea of what I was thinking:

  • Use crate features for each localization. This will require semi-manual updating when new CLDR localizations come out, but I think it's worth it.
  • Each feature is the name of the annotations/*.xml file. For example, English localization would be enabled via the en feature.
  • en should be a default feature.
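
A minimal sketch of how feature-gated localizations might look on the Rust side, assuming Cargo.toml declares an `en` feature (listed under `default`) and, purely for illustration, an `fr` feature. The constant and function names are hypothetical; the keyword list is the one from the example above.

```rust
/// English keywords for U+1F602, compiled in only when the `en` feature is on.
#[cfg(feature = "en")]
pub const JOY_KEYWORDS_EN: &[&str] =
    &["face", "face with tears of joy", "joy", "laugh", "tear"];

/// Lists the localization features this build was compiled with.
pub fn enabled_locales() -> Vec<&'static str> {
    let mut locales = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for each feature.
    if cfg!(feature = "en") {
        locales.push("en");
    }
    if cfg!(feature = "fr") {
        locales.push("fr");
    }
    locales
}
```

With this layout, data for a disabled locale is never compiled, so users only pay for the languages they ask for.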

Dependencies

I recommend some of the following crates while working on this project:

  • phf to create perfect compile-time hashtables
  • xml-rs to parse the annotations
  • quote to generate the Rust library code
  • proc_use for separating large modules into different files (these files will get seriously huge if not separated)

More will be needed obviously, but I've had positive experiences with the ones above.

searching

I'm trying to figure out the easiest way to search for all skin tones of a particular emoji, like clapping hands. Any tips?
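
One possible approach, sketched here without relying on this crate's API: the skin-tone modifiers are the five codepoints U+1F3FB through U+1F3FF, so a variant can be recognized by stripping those and comparing what remains to the base emoji. The function names are hypothetical, and `all` stands in for whatever list of emoji strings is available. Bases that carry a variation selector (U+FE0F) may need extra handling.

```rust
/// Skin-tone (Fitzpatrick) modifiers occupy U+1F3FB through U+1F3FF.
pub fn is_skin_tone_modifier(c: char) -> bool {
    ('\u{1F3FB}'..='\u{1F3FF}').contains(&c)
}

/// Removes every skin-tone modifier, leaving the base sequence.
pub fn strip_skin_tones(emoji: &str) -> String {
    emoji.chars().filter(|c| !is_skin_tone_modifier(*c)).collect()
}

/// Returns the entries of `all` that are skin-tone variants of `base`.
pub fn skin_tone_variants<'a>(base: &str, all: &[&'a str]) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|e| e.chars().any(is_skin_tone_modifier) && strip_skin_tones(e) == base)
        .collect()
}
```

For clapping hands, `base` would be "\u{1F44F}", and the result would hold every entry made of that codepoint plus a modifier.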
