charabia's Introduction

Charabia

Library used by Meilisearch to tokenize queries and documents

Role

The tokenizer’s role is to take a sentence or phrase and split it into smaller units of language, called tokens. It finds and retrieves all the words in a string based on the language’s particularities.

Details

Charabia provides a simple API to segment, normalize, or tokenize (segment + normalize) a text of a specific language by detecting its Script/Language and choosing the specialized pipeline for it.

Supported languages

Charabia is multilingual, featuring optimized support for:

| Script / Language | Specialized segmentation | Specialized normalization | Segmentation performance level | Tokenization performance level |
| --- | --- | --- | --- | --- |
| Latin | ✅ CamelCase segmentation | compatibility decomposition + lowercase + nonspacing-marks removal + Ð vs Đ spoofing normalization | 🟩 ~23 MiB/sec | 🟨 ~9 MiB/sec |
| Greek | | compatibility decomposition + lowercase + final sigma normalization | 🟩 ~27 MiB/sec | 🟨 ~8 MiB/sec |
| Cyrillic - Georgian | | compatibility decomposition + lowercase | 🟩 ~27 MiB/sec | 🟨 ~9 MiB/sec |
| Chinese CMN 🇨🇳 | jieba | compatibility decomposition + kvariant conversion | 🟨 ~10 MiB/sec | 🟧 ~5 MiB/sec |
| Hebrew 🇮🇱 | | compatibility decomposition + nonspacing-marks removal | 🟩 ~33 MiB/sec | 🟨 ~11 MiB/sec |
| Arabic | ال segmentation | compatibility decomposition + nonspacing-marks removal + [Tatweel, Alef, Yeh, and Taa Marbuta normalization] | 🟩 ~36 MiB/sec | 🟨 ~11 MiB/sec |
| Japanese 🇯🇵 | lindera IPA-dict | compatibility decomposition | 🟧 ~3 MiB/sec | 🟧 ~3 MiB/sec |
| Korean 🇰🇷 | lindera KO-dict | compatibility decomposition | 🟥 ~2 MiB/sec | 🟥 ~2 MiB/sec |
| Thai 🇹🇭 | dictionary based | compatibility decomposition + nonspacing-marks removal | 🟩 ~22 MiB/sec | 🟨 ~11 MiB/sec |
| Khmer 🇰🇭 | ✅ dictionary based | compatibility decomposition | 🟧 ~7 MiB/sec | 🟧 ~5 MiB/sec |

We aim to provide global language support, and your feedback helps us move closer to that goal. If you notice inconsistencies in your search results or the way your documents are processed, please open an issue on our GitHub repository.

If you have a particular need that charabia does not support, please share it in the product repository by creating a dedicated discussion.

About Performance level

Performance levels are based on the throughput (MiB/sec) of the tokenizer, computed on a Scaleway Elastic Metal server EM-A410X-SSD (CPU: Intel Xeon E5 1650, RAM: 64 GB) using jemalloc:

  • 0️⃣⬛️: 0 -> 1 MiB/sec
  • 1️⃣🟥: 1 -> 3 MiB/sec
  • 2️⃣🟧: 3 -> 8 MiB/sec
  • 3️⃣🟨: 8 -> 20 MiB/sec
  • 4️⃣🟩: 20 -> 50 MiB/sec
  • 5️⃣🟪: 50 MiB/sec or more

Examples

Tokenization

use charabia::Tokenize;

let orig = "Thé quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!";

// tokenize the text.
let mut tokens = orig.tokenize();

let token = tokens.next().unwrap();
// the lemma of the token is normalized: `Thé` became `the`.
assert_eq!(token.lemma(), "the");
// token is classified as a word
assert!(token.is_word());

let token = tokens.next().unwrap();
assert_eq!(token.lemma(), " ");
// token is classified as a separator
assert!(token.is_separator());
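
For reusable configuration, the same pipeline can also be driven through a TokenizerBuilder. The following is a minimal sketch assuming the builder's default settings (stop words and other options can be configured on the builder before building):

use charabia::TokenizerBuilder;

// Sketch: build a reusable Tokenizer instead of calling `.tokenize()` on the
// &str directly.
let mut builder = TokenizerBuilder::new();
let tokenizer = builder.build();

let mut tokens = tokenizer.tokenize("Thé quick fox");
assert_eq!(tokens.next().unwrap().lemma(), "the");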

Segmentation

use charabia::Segment;

let orig = "The quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!";

// segment the text.
let mut segments = orig.segment_str();

assert_eq!(segments.next(), Some("The"));
assert_eq!(segments.next(), Some(" "));
assert_eq!(segments.next(), Some("quick"));

charabia's Issues

Handle words containing non-separating dots and commas in Latin tokenization

Summary

Handle S.O.S as one word (S.O.S) instead of three (S, O, S) or numbers like 3.5 as one word (3.5) instead of two (3, 5).

Explanation

The current tokenizer considers any . or , as a hard separator, meaning that the two separated words are not considered to be part of the same context.
But there are exceptions for some words, like numbers, that are separated by . or , yet should be considered as one and only one word.

We should modify the current Latin tokenizer to handle this case.
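
An illustrative rule (not charabia's implementation) could treat a dot as non-separating when it is surrounded by alphanumeric characters:

// Naive sketch: a '.' surrounded by alphanumeric characters is part of the word,
// so "3.5" and "S.O.S" stay whole, while a trailing '.' remains a separator.
fn is_separating_dot(prev: Option<char>, next: Option<char>) -> bool {
    !(matches!(prev, Some(p) if p.is_alphanumeric()) && matches!(next, Some(n) if n.is_alphanumeric()))
}

assert!(!is_separating_dot(Some('3'), Some('5'))); // "3.5" -> one word
assert!(is_separating_dot(Some('d'), Some(' ')));  // "end. Next" -> separator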

Handle non-breakable spaces

The tokenizer must handle non-breakable spaces.

For example, it should handle the following examples this way:

  • 3 456 678 where the non-breakable space is not considered as a separator
  • Альфа where ь is not considered as a space, so a separator

Related to meilisearch/meilisearch#1335 cf @shekhirin comment

Implement a Japanese specialized Normalizer

Today, there is no specialized normalizer for the Japanese Language.

drawback

Meilisearch is unable to find the hiragana version of a word with a katakana query; for instance, ダメ is also spelled 駄目 or だめ.

Technical approach

Create a new Japanese normalizer that unifies hiragana and katakana equivalences.

Interesting libraries

  • wana_kana seems promising to convert everything into Hiragana
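
As a minimal illustration of the kind of unification this normalizer could perform, the offset trick below maps the main Katakana block onto Hiragana using the fixed distance between the two Unicode blocks. This is a naive sketch, not a complete solution and not necessarily what wana_kana does:

// Naive sketch: map Katakana (U+30A1..=U+30F6) onto Hiragana (U+3041..=U+3096)
// by subtracting the fixed 0x60 offset; everything else is left untouched.
// A real normalizer would also handle edge cases like ヴ or the prolonged sound mark.
fn katakana_to_hiragana(c: char) -> char {
    match c {
        'ァ'..='ヶ' => char::from_u32(c as u32 - 0x60).unwrap_or(c),
        _ => c,
    }
}

let unified: String = "ダメ".chars().map(katakana_to_hiragana).collect();
assert_eq!(unified, "だめ");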

Files expected to be modified

Misc

related to product#532

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Latin script: Segmenter should split camelCased words

Today, Meilisearch splits snake_case, SCREAMING_CASE, and kebab-case properly but doesn't split PascalCase or camelCase.

drawback

Meilisearch doesn't completely support code documentation.

enhancement

Make Latin Segmenter split camelCased/PascalCase words:

  • "camelCase" -> ["camel", "Case"]
  • "PascalCase" -> ["Pascal", "Case"]
  • "IJsland" -> ["IJsland"] (Language trap)
  • "CASE" -> ["CASE"] (another trap)

Files expected to be modified

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Handle multi languages in the same attribute

The Tokenizer currently uses the whatlang library, which detects the language of the attribute probabilistically.

The Tokenizer must be able to detect several languages in the same attribute.

Also, maybe it would be a better idea to let the user decide the language?

Arabic script: Implement specialized Segmenter

Currently, the Arabic Script is segmented on whitespaces and punctuation.

Drawback

Following the dedicated discussion on Arabic Language support and the linked issues, agglutinated words are not segmented; for example, in this comment:

the agglutinated word الشجرة => The Tree is a combination of الـ and شجرة
الـ is equivalent to The and it's always connected (not space separated) to the next word.

Enhancement

We should find a specialized segmenter for the Arabic Script, or else, a dictionary to implement our own segmenter inspired by the Thaï Segmenter.


Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Classify tokens after Segmentation instead of Normalization

Classifying tokens after Segmentation instead of after Normalization in the tokenization pipeline would enhance the precision of the stop_words classification.
Today, stop words need to be normalized to be properly classified; however, the normalization is more or less lossy and can classify unexpected stop words.
For instance, in French, maïs (corn in 🇬🇧) is normalized as mais (but in 🇬🇧), so maïs will be classified as a stop word if the stop word list contains mais.
This would not happen if the classifier were called before the normalizer.

Technical approach

Invert the normalization step and the classification step in the tokenization process
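
A self-contained illustration of why the ordering matters (this is not the pipeline code itself):

use std::collections::HashSet;

// Classifying on the raw segment avoids matching "maïs" against the stop word
// "mais"; the lossy normalization happens afterwards.
let stop_words: HashSet<&str> = ["mais"].into_iter().collect();
let segment = "maïs";
let is_stop_word = stop_words.contains(segment); // false: classified before normalization
let lemma: String = segment.chars().map(|c| if c == 'ï' { 'i' } else { c }).collect();
assert!(!is_stop_word);
assert_eq!(lemma, "mais");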

Add Actual Tokenizer state

Export the actual Meilisearch Tokenizer into this repository to start with a compatible version of it.
The goal is to iterate on this identical state and test it on Meilisearch incrementally instead of delivering a final version.

Refactor normalizers

Today, creating a normalizer is much harder than creating a segmenter, mainly because of the char map, a field required to manage highlights.

Technical Approach

Refactor the Normalizer trait to implement a normalize_str and a normalize_char method that take a Cow<str> as a parameter and return a Cow<str>. All the char map creation should be done in a function calling these two methods.
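
A sketch of the proposed shape (a loose interpretation of this issue, not the final API):

use std::borrow::Cow;

// The char map needed for highlighting would be built by a shared helper that
// calls these methods, so individual normalizers no longer have to build it.
trait Normalizer {
    // Normalize a whole lemma, borrowing it when nothing changes.
    fn normalize_str<'o>(&self, lemma: Cow<'o, str>) -> Cow<'o, str>;
    // Normalize a single character; `None` means the character is removed.
    fn normalize_char(&self, c: char) -> Option<char>;
}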

Make Latin Segmenter split on `'`

In French, some determiners and adverbs are fused with words that begin with a vowel using the ' character:

  • l'aventure
  • d'avantage
  • qu'il
  • ...

By default, the Latin segmenter doesn't split them.

Implement Pinyin normalizer

Today Meilisearch normalizes Chinese characters by converting traditional characters into simplified ones.

drawback

This normalization process doesn't seem to enhance the recall of Meilisearch.

enhancement

Following the official discussion about Chinese support in Meilisearch, it is more relevant to normalize Chinese characters by transliterating them into a Phonological version.
In order to have accurate phonology for Mandarin, we should normalize Chinese characters into Pinyin using the pinyin crate.
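
For illustration, assuming the pinyin crate's ToPinyin trait, the transliteration itself could look like the sketch below (not the final normalizer):

use pinyin::ToPinyin;

// Sketch: print the plain (toneless) Pinyin of each Han character;
// characters without a Pinyin reading are skipped here.
for p in "中文".to_pinyin() {
    if let Some(p) = p {
        print!("{} ", p.plain());
    }
}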

Files expected to be modified

Misc

related to product#503

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Tokenizer for Ja/Ko

Hello~
I'm currently testing the tokenizer with Japanese/Korean, but it seems it is not working correctly.

Is there some working plan for this?

Thanks.

Upgrade Whatlang dependency

Whatlang introduced new Languages and Scripts in the newer version.
We should upgrade our dependency to the latest version.

Decompose Japanese compound words

Summary

The morphological dictionary that Lindera includes by default is IPADIC.
IPADIC includes many compound words. For example, 関西国際空港 (Kansai International Airport).
However, if you index in the default mode, the word 関西国際空港 (Kansai International Airport) will be indexed as the single term 関西国際空港, and you will not be able to search for the keyword 空港 (Airport).
So, Lindera has a function to decompose such compound words.
This is a feature similar to Kuromoji's search mode.

`num_graphemes_from_bytes` does not work when used for a prefix of a raw Token

The Issue

The output of num_graphemes_from_bytes is wrong when:

  • num_bytes is smaller than the length of the string
  • the token does not have the char_map initialized - possibly since the Token was created outside of Tokenizer or because the unicode segmenter was not run.

It should return num_bytes back since each character is assumed to occupy one byte. Instead, it returns the length of the underlying string.

Context

This bug was introduced by me in #59 😆

See also: meilisearch/milli#426 (comment)

Publish tokenizer to crates.io

We should automate this push in a CI (triggered on each release for example)

  • Publish manually the first version
  • Add meili-bot as an Owner
  • Automate using CI

⚠️ Should be done by a core-engine team member

Enhance Chinese normalizer by unifying `Z`, `Simplified`, and `Semantic` variants

Following the official discussion about Chinese support in Meilisearch, it is relevant to normalize Chinese characters by unifying Z, Simplified, and Semantic variants before transliterating them into Pinyin.

To learn more about each variant, you can read the dedicated report on unicode.org.

There are several dictionaries listing variations that we can use; I suggest using the kvariants dictionary made by hfhchan (see the related documentation in the same repo).

Technical approach

Import and rework the dictionary into a key-value binding of each variant; then, in the Chinese normalizer, convert the provided character before transliterating it into Pinyin.
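
A sketch of the key-value binding described above; the two entries are illustrative variant pairs chosen for the example, not taken from the kvariants file:

use once_cell::sync::Lazy;
use std::collections::HashMap;

// Sketch: map each known variant to its unified form, then look characters up
// before the Pinyin transliteration step.
static KVARIANTS: Lazy<HashMap<char, char>> =
    Lazy::new(|| HashMap::from([('裡', '裏'), ('爲', '為')]));

fn unify_variant(c: char) -> char {
    *KVARIANTS.get(&c).unwrap_or(&c)
}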

Files expected to be modified

Misc

related to meilisearch/product#503

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

readme Hebrew segmentation link points to jieba

As the title says: in the README, clicking on "unicode-segmentation" in the Hebrew row takes the user to the jieba repo.

I assume the correct link would be the same as Latin's "unicode-segmentation."

Implement an efficient `Nonspacing Mark` Normalizer

In the Information Retrieval (IR) context, removing Nonspacing Marks like diacritics is a good way to increase recall without losing much precision, like in Latin, Arabic, or Hebrew.

Technical Approach

Implement a new Normalizer, named NonspacingMarkNormalizer, that removes the nonspacing marks from a provided token (find a naive implementation with the exhaustive list in the Misc section).
Because there are a lot of sparse character ranges to match, it would be inefficient to build a big if-forest to check whether a character is a nonspacing mark.
Therefore, I suggest trying several alternatives to the naive implementation in a small local project; a sketch of the range lookup follows the crate list below.

Interesting Rust Crates

  • hyperfine: a small command-line tool to benchmark several binaries
  • roaring-rs: a bitmap data structure that has an efficient contains method
  • once_cell: a good library to create lazy statics, already used in the repository
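
To make the idea concrete, here is a naive sketch of the range lookup; the two ranges below (combining diacritical marks and some Hebrew points) are examples only, not the exhaustive list mentioned above:

// Naive sketch: binary-search a sorted list of nonspacing-mark codepoint ranges
// instead of a long chain of `if`s.
const NONSPACING_MARK_RANGES: &[(u32, u32)] = &[(0x0300, 0x036F), (0x0591, 0x05BD)];

fn is_nonspacing_mark(c: char) -> bool {
    let c = c as u32;
    NONSPACING_MARK_RANGES
        .binary_search_by(|&(start, end)| {
            if c < start {
                std::cmp::Ordering::Greater
            } else if c > end {
                std::cmp::Ordering::Less
            } else {
                std::cmp::Ordering::Equal
            }
        })
        .is_ok()
}

fn remove_nonspacing_marks(token: &str) -> String {
    token.chars().filter(|c| !is_nonspacing_mark(*c)).collect()
}

assert_eq!(remove_nonspacing_marks("e\u{0301}"), "e");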

Misc

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Korean support

Hello. I’m going to submit a PR for Korean support, please review.

Implement Jyutping normalizer

Today Meilisearch normalizes Chinese characters by converting traditional characters into simplified ones.

drawback

This normalization process doesn't seem to enhance the recall of Meilisearch.

enhancement

Following the official discussion about Chinese support in Meilisearch, it is more relevant to normalize Chinese characters by transliterating them into a Phonological version.
In order to have accurate phonology for Cantonese, we should normalize Chinese characters into Jyutping using the kCantonese dictionary of the Unihan database.
We should find an efficient way to normalize characters, so the dictionary may need to be reformatted.

Files expected to be modified

Misc

related to product#503
original source of the dictionary: unihan.zip in https://unicode.org/Public/UNIDATA/

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Explain the name of the repo in the README

Following @CaroFG's idea, we could explain the name of the repo in the README since some people find it "offensive".

Here are some explanations Many gave on Twitter:

we chose the name of this repository in the same mood as discord or meili, giving the name of the problem we want to solve.
Personally, I don’t feel like it’s an offensive word, but more a funny pun with “char”.
Moreover, other tokenizers don’t always have an understandable name, for instance lindera maintained by @minoru_osuka or even jieba.
I hope my explanation was clear enough and I hope the name will not discourage you from using or even contributing to the project! 😊

Requirement or advice on Chinese word segmentation

Describe the requirement
I expect the Chinese input text to be split into all possible words.
For example:
[screenshot]

The behavior of the current version
[screenshot]

Optimization advice
I notice that you use the default jieba configuration, and this causes some highlighting or search errors in Chinese word segmentation. So, could you use the cut_all method from the jieba library for Chinese word segmentation?
[screenshot]

Additional text or screenshots
[screenshot]

I await your reply, thanks @ManyTheFish

Reimplement Japanese Segmenter

Reimplement Japanese segmenter using Lindera.

TODO list

  • Read CONTRIBUTING.md about Segmenter implementation
  • Lindera loads dictionaries at initialization
    • Ensure that Lindera is not initialized at each tokenization
    • Add a feature flag for Japanese
    • Use a custom config to initialize Lindera (better segmentation for search usage)

TokenizerConfig { mode: Mode::Decompose(Penalty::default()), ..TokenizerConfig::default() }

  • test segmenter

関西国際空港限定トートバッグ すもももももももものうち should give ["関西", "国際", "空港", "限定", "トートバッグ", " ", "すもも", "も", "もも", "も", "もも", "の", "うち"]

  • Add benchmarks
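
A rough sketch of how the segmenter could wire the config above into Lindera; the import paths and the with_config constructor are assumptions based on the Lindera API of that time, not the final charabia code:

use lindera::tokenizer::{Tokenizer, TokenizerConfig};
use lindera_core::viterbi::{Mode, Penalty};

// Sketch only: build Lindera once (not at each tokenization) with the
// decompose mode quoted above, then segment the test sentence.
let config = TokenizerConfig {
    mode: Mode::Decompose(Penalty::default()),
    ..TokenizerConfig::default()
};
let tokenizer = Tokenizer::with_config(config).expect("failed to load the dictionary");
let tokens = tokenizer.tokenize("関西国際空港限定トートバッグ").expect("segmentation failed");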

Disable HMM feature of Jieba

Today, we are using the Hidden Markov Model algorithm (HMM) provided by the cut method of Jieba to segment unknown Chinese words in the Chinese segmenter.

drawback

Following the subdiscussion in the official discussion about Chinese support in Meilisearch, it seems that the HMM feature of Jieba is not relevant in the context of a search engine. This feature creates longer words and inconsistencies in the segmentation, which reduces the recall of Meilisearch without significantly raising the precision.

enhancement

Deactivate the HMM feature in Chinese segmentation.
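
For illustration, with the jieba-rs crate the HMM behaviour is toggled by the second argument of cut (a sketch, not charabia's segmenter code):

use jieba_rs::Jieba;

// Sketch: the second argument of `cut` enables HMM for unknown words;
// passing `false` disables it.
let jieba = Jieba::new();
let with_hmm = jieba.cut("他来到了网易杭研大厦", true);
let without_hmm = jieba.cut("他来到了网易杭研大厦", false);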

Files expected to be modified

Misc

related to product#503

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Add an allowlist to the tokenizer builder

Today, Charabia automatically detects the Language of the provided text and chooses the best tokenization pipeline accordingly.

drawback

Sometimes the detection is not accurate, mainly when the provided text is short, and the user can't manually specify the Languages contained in the provided text.

enhancement

Add a new setting to the TokenizerBuilder forcing the detection to choose from a subset of Languages, and, when there is no choice left, skip the detection and pick the specialized pipeline directly.
Whatlang, the library used to detect the Language, provides a way to set a subset of Languages that can be detected with the Detector::with_allowlist method.

Technical approach:

  1. add an optional allowlist parameter to the method detect of the Detect trait in detection/mod.rs
  2. add a segment_with_allowlist and a segment_str_with_allowlist with an additional allowlist parameter to the Segment trait in segmenter/mod.rs
  3. add an allowlist method to the TokenizerBuilder struct in tokenizer.rs

The allowlist should be a hashmap of Script -> [Languages]
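
A minimal sketch of the whatlang side; the charabia builder and segmenter methods are exactly what this issue proposes to add, so only the detector is shown:

use whatlang::{Detector, Lang};

// Sketch: restrict detection to a subset of Languages.
let detector = Detector::with_allowlist(vec![Lang::Eng, Lang::Fra]);
let lang = detector.detect_lang("Le rapide renard brun");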

Files expected to be modified

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Move the FST based Segmenter in a standalone file

For the Thaï segmenter, we tried a finite-state-transducer (FST) based segmenter.
This segmenter has really good performance and the dictionaries encoded as FSTs are smaller than raw txt/csv/tsv dictionaries.
For now, the segmenter is in the Thaï segmenter file (segmenter/thai.rs), and, in order to reuse it for other Languages, it would be better to move this segmenter to its own file.
A new struct FstSegmenter may be created wrapping all the iterative segmentation logic.
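
A sketch of what the standalone FstSegmenter could look like; the struct name comes from this issue, while the greedy longest-match logic below is a naive illustration, not necessarily the Thaï segmenter's exact algorithm:

use fst::Set;

// Sketch: wrap an FST dictionary and repeatedly take the longest prefix of the
// remaining text that is present in it, falling back to a single character.
struct FstSegmenter {
    words: Set<Vec<u8>>,
}

impl FstSegmenter {
    fn segment<'o>(&self, mut text: &'o str) -> Vec<&'o str> {
        let mut segments = Vec::new();
        while !text.is_empty() {
            // default to the first character in case no dictionary word matches
            let mut end = text.chars().next().map(|c| c.len_utf8()).unwrap_or(0);
            let boundaries = text.char_indices().skip(1).map(|(i, _)| i).chain(std::iter::once(text.len()));
            for i in boundaries {
                if self.words.contains(&text[..i]) {
                    end = i;
                }
            }
            segments.push(&text[..end]);
            text = &text[end..];
        }
        segments
    }
}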

File expected to be modified

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Implement a Compatibility Decomposition Normalizer

Meilisearch is unable to find Canonical and Compatibility equivalences; for instance, ガギグゲゴ can't be found with the query ガギグゲゴ (the two strings differ only in their Unicode decomposition).

Technical approach

Implement a new Normalizer CompatibilityDecompositionNormalizer using the method nfkd of the unicode-normalization crate.
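
For illustration, the normalization itself boils down to the nfkd iterator of the unicode-normalization crate (a minimal sketch, not the Normalizer implementation):

use unicode_normalization::UnicodeNormalization;

// Sketch: NFKD decomposes compatibility characters, e.g. composed or half-width
// kana become a base character followed by combining marks.
let decomposed: String = "ガギグゲゴ".nfkd().collect();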

Files expected to be modified

Misc

related to product#532

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Compile/Install Charabia on OpenBSD

I am on OpenBSD running on a Raspberry Pi 4. I am unable to install Meilisearch because cargo is not able to find charabia, so I decided to compile from source.
I downloaded the source from GitHub and ran "cargo run" in the charabia source code. I get the error:

error: failed to parse manifest at `/home/kabira/LibOpenSource/charabia/Cargo.toml`

Caused by:
  namespaced features with the `dep:` prefix are only allowed on the nightly channel and requires the `-Z namespaced-features` flag on the command-line

Any workaround suggestions would be great.

Tokenizer refactoring strategy

Implementation Branch: tokenizer-v1.0.0
Draft PR: #77

Summary

As a fast search engine, Meilisearch needs a tokenizer that is a pragmatic balance between processing time and relevancy.
The current implementation of the tokenizer lacks clarity and contains ugly hotfixes, making contributions, optimizations, and maintenance difficult.

How to find a pragmatic balance between processing time and relevancy?

First of all, we are not linguists and we don't speak or understand most of the Languages that we would want to support; this means that we can't write a tokenizer from scratch and prove whether it is relevant or not.
That's why the current implementation, and the future ones, rely on segmentation libraries like jieba, unicode-segmentation, or lindera to segment texts into words; these libraries are recommended and contributed by external contributors.
But this has some limits, and the main one is processing time: some libraries, even if they have good relevancy, don't suit our needs because the processing time is too long (👋 Jieba).

Relevancy

Because we can't measure relevancy by ourselves, we want to continue to rely on the community and external libraries.
In this perspective, we need to make the inclusion of an external library by an external contributor as easy as possible:

Code shape

  • Refactor Pipeline by removing preprocessors and making normalizers global #76
  • Refactor Analyzer in order to make new Tokenizer registration straightforward #76
  • Simplify the return value of Tokenizer (returning a Script and a &str instead of a Token) #76
  • Wrap normalizers in an iterator allowing them to yield several items from one (["l'aventure"] -> ["l", "'", "aventure"]) #76
  • Add a search mode in Segmenter returning all the word derivations (tokenizers' search modes do ngrams internally)
  • Enhance clarity by renaming some structures, functions, and files (Segmenter instead of Tokenizer, chinese_cmn.rs instead of jieba.rs) #76
  • Create a test macro allowing contributors to easily test their tokenizer and improve the trust we have in tests by ensuring that all tokenizers are equally tested

Documentation and contribution processes

  • Add documenting comments in main structures (Token, Tokenizer trait..) #76
  • Add a template of a tokenizer as a dummy example of how to add a new tokenizer #76
  • Add a template of a normalizer as a dummy example of how to add a new normalizer
  • Add a CONTRIBUTING.md explaining how to test, bench, and implement tokenizers
  • Enhance README.md
  • Create an issue triage process differentiating each tokenizer scope (detector, segmenter, normalizer, classifier) #88

Minimal requirement to have no regressions

  • Use unicode-segmentation instead of legacy tokenizer for Latin tokenization #76
  • Reimplement Chinese Segmenter (using Jieba)
  • Reimplement Japanese Segmenter (using Lindera) #89
  • Reimplement Deunicode Normalizer only on Script::Latin
  • Reimplement traditional Chinese translation preprocessor into a Normalizer only on Language::Cmn
  • Reimplement control Character remover Normalizer

Processing time

Because tokenization has an impact on Meilisearch performance, we have to measure the processing time of every new implementation and define limits that must not be exceeded for a contribution to be merged. Sometimes, we should consider implementing things ourselves instead of relying on an external library that could significantly impact Meilisearch performance.

  • Refactor benchmarks to ease benchmark creation by any contributor
  • Define hard limits, like throughput thresholds, to objectively accept or refuse a contribution
  • Add workflows that run benchmarks on the main branch #91

Publish the Meilisearch tokenizer as a crate

In order to increase visibility and external contributions, we may publish this library as a crate.

  • #51
  • Add user documentation
  • #35

crates link: https://crates.io/crates/charabia

NLP

For now, we don't plan to use NLP to tokenize in Meilisearch.
