embeddings-benchmark / mteb
MTEB: Massive Text Embedding Benchmark
Home Page: https://arxiv.org/abs/2210.07316
License: Apache License 2.0
Hello,
Thanks for this extensive work.
I have a question about the sequence lengths of the various models used in this benchmark. Different models support different sequence lengths: text-embedding-ada handles up to 8191 tokens, while instructor-xl was trained with a maximum length of only 512 tokens. Is this taken into account during evaluation?
Please forgive me if I'm being ignorant.
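For sentence-transformers models, the limit in question is exposed as the max_seq_length attribute; a minimal sketch (the model name is just an example, not one of the models discussed above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
print(model.max_seq_length)  # inputs longer than this are truncated by the model
model.max_seq_length = 128   # can be lowered if you want to enforce a shorter limit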
Hi,
thank you for providing the benchmark and easy-to-use codebase!
When evaluating the sickr-sts task, I get a KeyError: 'validation'. The reason is that 'validation' is included in the "eval_splits" of the SickrSTS task description, while mteb/sts15-sts only provides a test set. Should 'validation' be removed from the task description?
The BEIR tasks are currently all marked as S2S, but some of them are P2P or S2P / P2S. Retrieval is the only task where we have S2P / P2S. Does that make sense?
Options I see:
Any thoughts? cc @NouamaneTazi
I'd propose to add the commit hash of the revision to tasks:
from mteb import MTEB
from mteb.abstasks.AbsTaskReranking import AbsTaskReranking
from sentence_transformers import SentenceTransformer


class MindSmallReranking(AbsTaskReranking):
    @property
    def description(self):
        return {
            "name": "MindSmallReranking",
            "hf_hub_name": "mteb/mind_small",
            "description": "Microsoft News Dataset: A Large-Scale English Dataset for News Recommendation Research",
            "reference": "https://www.microsoft.com/en-us/research/uploads/prod/2019/03/nl4se18LinkSO.pdf",
            "type": "Reranking",
            "category": "s2s",
            "eval_splits": ["validation"],
            "eval_langs": ["en"],
            "main_score": "map",
            "revision": "75937953179...",
        }


model = SentenceTransformer("average_word_embeddings_komninos")
evaluation = MTEB(tasks=[MindSmallReranking()])
evaluation.run(model)
This is then fed into load_dataset via revision= and added to the results json file.
This partly addresses #21
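A minimal sketch of what the load step could look like with a pinned revision. It reuses the MindSmallReranking example above; note the revision hash there is truncated, so this is illustrative only:

import datasets

task = MindSmallReranking()
desc = task.description
# Pin the Hub dataset to the commit recorded in the task description so results
# stay reproducible even if the dataset is updated later.
dataset = datasets.load_dataset(desc["hf_hub_name"], revision=desc["revision"])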
^ It should be explained in the logs why it seems to be repeating the same thing
Task: AmazonReviewsClassification, split: test, language: en. Running...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 40 training sentences...
Batches: 100%|██████████| 2/2 [00:03<00:00, 1.60s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 5000 test sentences...
Batches: 100%|██████████| 157/157 [03:04<00:00, 1.18s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Fitting logistic regression classifier...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Evaluating...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 40 training sentences...
Batches: 100%|██████████| 2/2 [00:02<00:00, 1.39s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 5000 test sentences...
Batches: 100%|██████████| 157/157 [03:04<00:00, 1.18s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Fitting logistic regression classifier...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Evaluating...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 40 training sentences...
Batches: 100%|██████████| 2/2 [00:03<00:00, 1.68s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 5000 test sentences...
Batches: 100%|██████████| 157/157 [03:04<00:00, 1.18s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Fitting logistic regression classifier...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Evaluating...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 40 training sentences...
Batches: 100%|██████████| 2/2 [00:04<00:00, 2.48s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 5000 test sentences...
Batches: 100%|██████████| 157/157 [03:04<00:00, 1.18s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Fitting logistic regression classifier...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Evaluating...
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 40 training sentences...
Batches: 100%|██████████| 2/2 [00:03<00:00, 1.71s/it]
INFO:mteb.evaluation.evaluators.ClassificationEvaluator:Encoding 5000 test sentences...
MindSmallReranking has no test split. For all other datasets we use the test split afaik, so should we use the validation split for that one?
Hello, this work is wonderful. However, I have one question: how do you ensure that the comparisons are fair? Data leakage may occur if some models use the train/test data for pretraining or fine-tuning, particularly for newly submitted models.
As far as I can tell, the sentence-transformers dependency is not necessary for this code to run; it is only used as a shorthand for model loading on the command line. Because installing sentence-transformers also installs torch, sentencepiece, tokenizers and transformers itself, this is quite a big dependency to package. Maybe the installation of sentence-transformers could be split off into an optional dependency?
I.e., pip install mteb[sentencetransformers] would install mteb packaged with sentence-transformers. When running functionality that requires sentence-transformers, the user could be prompted to install it.
Hi
I'm trying to run the below code on Colab
from mteb import MTEB
from sentence_transformers import SentenceTransformer
from mteb.tasks import QuoraRetrieval
model_name = "average_word_embeddings_komninos"
model = SentenceTransformer(model_name)
evaluation = MTEB(tasks=["QuoraRetrieval"]) # Only select clustering and retrieval tasks
results = evaluation.run(model, output_folder=f"results/{model_name}")
I get the following error
Retrieval
- QuoraRetrieval, beir, s2s
ERROR:mteb.evaluation.MTEB:Error while evaluating QuoraRetrieval: File /root/.cache/huggingface/datasets/BeIR/quora/qrels/validation.tsv not present! Please provide accurate file.
ERROR:mteb.evaluation.MTEB:Please check all the error logs at: error_logs.txt
When I check the qrels folder, I only find dev and test tsvs. This issue occurs for other tasks as well, such as MSMARCO.
Any idea what I'm doing wrong?
Packages:
!pip install -q git+https://github.com/UKPLab/sentence-transformers.git
!pip install -q git+https://github.com/embeddings-benchmark/mteb.git
!pip install -q git+https://github.com/NouamaneTazi/beir.git@fix_drpes_ids
!pip install -q evaluate
Doing
import time
from mteb import MTEB
from sentence_transformers import SentenceTransformer
class SentenceTransformerX(SentenceTransformer):
    pass
model_name = "sentence-transformers/average_word_embeddings_komninos"
model = SentenceTransformerX(model_name)
evaluation = MTEB(tasks=["SciFact"])
a = time.time()
results = evaluation.run(model, output_folder=f"results/{model_name}", overwrite_results=True)
b = time.time()
hangs at
p = ctx.Process(
    target=SentenceTransformer._encode_multi_process_worker,
    args=(process_id, device_name, self.model, input_queue, output_queue),
    daemon=True,
)
I think you're the expert here - any ideas? @NouamaneTazi
This only affects the latest BEIR, i.e. I think it has something to do with DPRES. Using the below is fine
!pip install -q git+https://github.com/UKPLab/sentence-transformers.git
!pip install -q git+https://github.com/embeddings-benchmark/mteb.git
!pip install beir==1.0.0
Since BEIR doesn't provide an HF dataset for the CQA corpus, we uploaded this one, I think: https://huggingface.co/datasets/mteb/cqadupstack-retrieval/tree/main/data
However, it currently cannot be loaded, possibly because its format differs from the other BEIR datasets (json files instead of jsonl).
Thus, CQADupstack tasks only work with beir <= 1.0.0, using the old data loading, as of right now.
Was the Instructor model tuned per task, or was the default setting used?
We currently skip evaluation when running a new split and a result file of the same name already exists.
It would be better to run the new split and append its results to the existing result file.
In fact, I'm not sure if this is a bug. Below is what I thought the problem was.
Before evaluating for MTOPIntentClassification, mteb will download a module in cache. In my case the module is located at /data2/.cache/huggingface/modules/datasets_modules/datasets/mteb--mtop_intent/7353fdf5b13e9bfd297fbf98bf66e7e0ee626def6321bd9293bbc6ee1d5fae7b
and there is a script called mtop_intent.py:
import json
import datasets
_DESCRIPTION = "MTOP: Multilingual Task-Oriented Semantic Parsing"
_LANGUAGES = ["en", "de", "es", "fr", "hi", "th"]
URL = "" # https://huggingface.co/datasets/mteb/mtop/resolve/main/"
The URL is empty, so the module assumes the files are located in the current working directory, which causes an error.
I changed the URL to
URL = "https://huggingface.co/datasets/mteb/mtop_intent/resolve/main/"
and everything works fine.
Hello,
I've been trying to evaluate several custom models on MTEB and ran into some errors:
2022-11-08 11:12:04.822852 >>> ClimateFEVER
Traceback (most recent call last):
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/mteb/evaluation/MTEB.py", line 235, in run
results = task.evaluate(model, split, **kwargs)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/mteb/abstasks/AbsTaskRetrieval.py", line 93, in evaluate
results = retriever.retrieve(corpus, queries)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/beir/retrieval/evaluation.py", line 23, in retrieve
return self.retriever.search(corpus, queries, self.top_k, self.score_function, **kwargs)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/beir/retrieval/search/dense/exact_search_multi_gpu.py", line 150, in search
cos_scores_top_k_values, cos_scores_top_k_idx, chunk_ids = metric.compute()
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/evaluate/module.py", line 433, in compute
self._finalize()
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/evaluate/module.py", line 390, in _finalize
self.data = Dataset(**reader.read_files([{"filename": f} for f in file_paths]))
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/datasets/arrow_reader.py", line 236, in read_files
pa_table = self._read_files(files, in_memory=in_memory)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/datasets/arrow_reader.py", line 171, in _read_files
pa_table: Table = self._get_table_from_filename(f_dict, in_memory=in_memory)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/datasets/arrow_reader.py", line 306, in _get_table_from_filename
table = ArrowReader.read_table(filename, in_memory=in_memory)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/datasets/arrow_reader.py", line 325, in read_table
return table_cls.from_file(filename)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/datasets/table.py", line 1036, in from_file
table = _memory_mapped_arrow_table_from_file(filename)
File "/home/qmin/anaconda3/envs/sgbert/lib/python3.7/site-packages/datasets/table.py", line 51, in _memory_mapped_arrow_table_from_file
pa_table = opened_stream.read_all()
File "pyarrow/ipc.pxi", line 691, in pyarrow.lib.RecordBatchReader.read_all
File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Expected to be able to read 12300328 bytes for message body, got 12300316
(other retrieval tasks have the same issue)
Is there any workaround?
It's inconvenient to have two datasets whose names differ only in case: SciDocs and SCIDOCS. E.g. macOS is case-insensitive by default.
Let's rename SciDocs? I'd propose SciDocsS2S or SciDocsRR instead, but maybe someone has a better idea.
Also do I understand correctly that SciDocs is the same SciDocs as in useb? I.e. it includes all the tasks of Cite, Co-Cite, Co-Read, Co-View, see their paper for details (+ recomm).
cc @loicmagne as I think you added it?
I understand that standard word embeddings (like average_word_embeddings_glove.6B.300d) are downloaded from hugging face, but is there code to evaluate new embeddings? I have a .txt file with vectors trained with the GloVe model that I would like to evaluate.
I see in the documentation that we can write our own encoder model that can be evaluated. But is there a way to only input a .txt file of the word embeddings for evaluation?
If there is no code to support a .txt file input, then for the encoder, are the input sentences already tokenized?
cc @NouamaneTazi ?
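There is no built-in .txt loader as far as I know, but a small wrapper along these lines should be enough; this is only a sketch assuming the standard GloVe text format (one token per line followed by its vector), and the whitespace tokenization is a simplification (MTEB passes raw strings to encode(), so tokenization is up to the model):

import numpy as np

class TxtWordEmbeddingModel:
    """Averages pre-trained word vectors loaded from a GloVe-style .txt file."""

    def __init__(self, path):
        self.vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                self.vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        self.dim = len(next(iter(self.vectors.values())))

    def encode(self, sentences, batch_size=32, **kwargs):
        # Lowercase + whitespace split; unknown tokens are skipped.
        embeddings = []
        for sentence in sentences:
            vecs = [self.vectors[t] for t in sentence.lower().split() if t in self.vectors]
            embeddings.append(np.mean(vecs, axis=0) if vecs else np.zeros(self.dim))
        return np.stack(embeddings)

# model = TxtWordEmbeddingModel("glove.6B.300d.txt")
# MTEB(tasks=["Banking77Classification"]).run(model)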
The MindSmallReranking dataset contains 2,362,514 queries, 107,968 positive docs, and 2,550,123 negative docs.
Currently, RerankingEvaluator.compute_metrics_batched() just gathers all texts together and encodes them, which requires a lot of memory / GPU memory. (I got a CUDA OOM on a 32GB V100.)
I made minor modifications to the code to implement chunked computation, reducing memory usage.
If this change is acceptable, I would be glad to make a PR.
Thanks.
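A minimal sketch of the kind of chunked encoding meant here; the helper and chunk size are illustrative, not the actual change proposed:

import numpy as np

def encode_in_chunks(model, texts, chunk_size=50_000, batch_size=128):
    """Encode a long list of texts chunk by chunk to keep peak memory bounded."""
    chunks = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # Each chunk is encoded independently; only its embeddings are kept around.
        chunks.append(np.asarray(model.encode(chunk, batch_size=batch_size)))
    return np.concatenate(chunks, axis=0)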
It's referenced in https://arxiv.org/pdf/2212.09741.pdf
STS22 scores should not be negative https://competitions.codalab.org/competitions/33835#results
evaluation = MTEB(tasks=["AmazonCounterfactualClassification"], task_langs=["zh"])
evaluation.run(model)
Task: AmazonCounterfactualClassification, split: validation, language: en. Running...
^CTraceback (most recent call last):
File "", line 1, in
The following code should give the same results
import logging
from mteb import MTEB
from sentence_transformers import SentenceTransformer
logging.basicConfig(level=logging.INFO)
model_name = "average_word_embeddings_komninos"
model = SentenceTransformer(model_name)
evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(model, output_folder=None)
It would be nice to write a test for that as well in the tests folder.
I'm getting the results below for:
from mteb import MTEB
from mteb.abstasks.AbsTaskClustering import AbsTaskClustering
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("average_word_embeddings_komninos")
evaluation = MTEB(tasks=["BUCC"])
evaluation.run(model)
{
  "dataset_version": null,
  "mteb_version": "0.0.2",
  "test": {
    "de-en": {
      "accuracy": 0.0017745302713987473,
      "f1": 0.0017745302713987473,
      "precision": 0.0017745302713987473,
      "recall": 0.0017745302713987473
    },
    "evaluation_time": 456.59,
    "fr-en": {
      "accuracy": 0.0,
      "f1": 0.0,
      "precision": 0.0,
      "recall": 0.0
    },
    "ru-en": {
      "accuracy": 6.927606511950121e-05,
      "f1": 6.927606511950121e-05,
      "precision": 6.927606511950121e-05,
      "recall": 6.927606511950121e-05
    },
    "zh-en": {
      "accuracy": 0.0,
      "f1": 0.0,
      "precision": 0.0,
      "recall": 0.0
    }
  }
}
Seems too low - I think there's a bug
It looks like the multi-gpu support for the BEIR benchmarks is still disabled as of https://github.com/embeddings-benchmark/mteb/releases/tag/1.0.1.
What is the current status of it? Is it actively developed in BEIR repo?
Btw, we successfully ran the multi-GPU utils in the BEIR repository with downgraded dependencies, but would like to switch over to MTEB to have a broader benchmark collection.
Sometimes we would like to override already computed evaluations. It would be cool to have an override_results flag that would handle that.
For BUCC fr-en we currently search the English texts given French texts.
a) Most use cases are probably the inverse
b) We should generally support both ways / automatically run both (e.g. like it's done in https://arxiv.org/pdf/2007.01852.pdf)
What do you think @NouamaneTazi ?
It looks like the TwitterSemEval2015 test data combines the train, dev, and test data from the original task. Was this intentional? My assumption is that some of the models would have been trained on that data.
An important factor in choosing embeddings is the speed of embedding.
I suggest adding a "tab" in the evaluation called "Speed", represented in sentences/sec for example (can also be tokens/sec).
This is a very useful feature of the SBERT site for example:
https://www.sbert.net/docs/pretrained-models/msmarco-v3.html
and efficiency as a parameter is already mentioned in your paper.
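For reference, a rough sketch of how sentences/sec could be measured for a model; the model name and sentence count are arbitrary placeholders:

import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
sentences = ["A quick benchmark sentence."] * 10_000

start = time.time()
model.encode(sentences, batch_size=256, show_progress_bar=False)
elapsed = time.time() - start
print(f"{len(sentences) / elapsed:.1f} sentences/sec")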
It'd be nice to also have information about the hardware used in the results file in addition to the evaluation time if this is easy to get!
If they're the same, one should be removed; if not, it should be fixed.
I think we should have some form of versioning.
E.g. for each task have an additional field in the json results file called "version" or "revision". We can set it to 0 for all tasks for now or to e.g. the commit string of the dataset on the Hub.
Currently we use the following for classification:
As the test set embeddings will be the same across repetitions, we can compute them once and just feed them to the LR classifier. This will make the 10-times repeated evaluation much faster.
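A minimal sketch of the proposed optimization; the function and names are illustrative, not the actual MTEB evaluator code (which is elided above):

import numpy as np
from sklearn.linear_model import LogisticRegression

def repeated_classification(model, train_samples, x_test_texts, y_test):
    """Repeated train-sample evaluation with the test set encoded only once.

    `train_samples` is a list of (texts, labels) pairs, one per repetition.
    """
    x_test = np.asarray(model.encode(x_test_texts))  # encode the test set a single time
    scores = []
    for texts, labels in train_samples:
        x_train = np.asarray(model.encode(texts))  # only the small train sample changes
        clf = LogisticRegression(max_iter=100)
        clf.fit(x_train, labels)
        scores.append(clf.score(x_test, y_test))
    return float(np.mean(scores))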
It's a great dataset for bitext mining! Any help welcome 🤗
Edit: Done via #218
Hi all,
thank you for sharing this awesome repo!
I am having experiments on classification tasks.
I am wondering if the inference results (e.g., the predicted class for each test sentence) and the evaluation results (e.g., whether the predicted class for each test sentence is correct) are available via some command?
Best regards,
Jihyuk
Is the validation set used for RedditClustering and StackOverflowDupQuestions? Related to #83 and #84.
2022-12-16 05:26:27.662809 >>> RedditClustering
Traceback (most recent call last):
File "/home/aiops/shenxd/Dependency/anaconda3/envs/HuggingFace/lib/python3.7/site-packages/mteb/evaluation/MTEB.py", line 235, in run
results = task.evaluate(model, split, **kwargs)
File "/home/aiops/shenxd/Dependency/anaconda3/envs/HuggingFace/lib/python3.7/site-packages/mteb/abstasks/AbsTaskClustering.py", line 17, in evaluate
for cluster_set in tqdm.tqdm(self.dataset[split], desc="Clustering"):
File "/home/aiops/shenxd/Dependency/anaconda3/envs/HuggingFace/lib/python3.7/site-packages/datasets/dataset_dict.py", line 57, in getitem
return super().getitem(k)
KeyError: 'validation'
2022-12-16 05:26:39.846393 >>> StackOverflowDupQuestions
Traceback (most recent call last):
File "/home/aiops/shenxd/Dependency/anaconda3/envs/HuggingFace/lib/python3.7/site-packages/mteb/evaluation/MTEB.py", line 235, in run
results = task.evaluate(model, split, **kwargs)
File "/home/aiops/shenxd/Dependency/anaconda3/envs/HuggingFace/lib/python3.7/site-packages/mteb/abstasks/AbsTaskReranking.py", line 21, in evaluate
data_split = self.dataset[split]
File "/home/aiops/shenxd/Dependency/anaconda3/envs/HuggingFace/lib/python3.7/site-packages/datasets/dataset_dict.py", line 57, in getitem
return super().getitem(k)
KeyError: 'validation'
Currently, when a task name is wrong, nothing happens upon evaluation.run.
I think it'd be nice to raise a warning that a task wasn't found.
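A sketch of the proposed behaviour; the helper and the way task names are enumerated are stand-ins, not MTEB's internal API:

import logging

logger = logging.getLogger(__name__)

def warn_about_unknown_tasks(requested, available_task_names):
    """Warn for every requested task name that does not match a known task."""
    unknown = set(requested) - set(available_task_names)
    for name in sorted(unknown):
        logger.warning("Task '%s' not found and will be skipped.", name)

# warn_about_unknown_tasks(["Banking77Classification", "TypoTaskName"], ["Banking77Classification"])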
We could add the prompt retrieval benchmark: https://arxiv.org/abs/2209.01975
Hello @Muennighoff, I encountered the following issue when loading the ArxivClusteringP2P dataset:
repro
from mteb import MTEB


def test_loading_data():
    eval = MTEB(tasks=["ArxivClusteringP2P"])
    eval.load_tasks_data()
    return


if __name__ == "__main__":
    test_loading_data()
output:
Generating test split: 23 examples [00:04, 6.38 examples/s]Failed to read file '/root/.cache/huggingface/datasets/downloads/extracted/2368c5e45f666e09c88b163b1db73ad115ce53e3954755e8936da145b036ae4b' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Missing a closing quotation mark in string. in row 0
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json/json.py", line 153, in _generate_tables
dataset = json.load(f)
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 25447588)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/datasets/builder.py", line 1817, in _prepare_split_single
for _, table in generator:
File "/usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json/json.py", line 156, in _generate_tables
raise e
File "/usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json/json.py", line 132, in _generate_tables
pa_table = paj.read_json(
File "pyarrow/_json.pyx", line 259, in pyarrow._json.read_json
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Missing a closing quotation mark in string. in row 0
I would like to add the Universal Sentence Encoder family of models to the automated evaluation.
It is relatively simple to evaluate them (thanks for making it straightforward), but it is not clear how to create a pull request to add the model to the automated evaluation on the website. Please advise.
# !pip install tensorflow_text
import tensorflow_hub as hub
from tensorflow_text import SentencepieceTokenizer  # import registers the SentencePiece op
import tensorflow as tf

embedder = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")


class USE:
    def encode(self, sentences, batch_size=32, **kwargs):
        embeddings = []
        for i in range(0, len(sentences), batch_size):
            batch_sentences = sentences[i:i + batch_size]
            batch_embeddings = embedder(batch_sentences)
            embeddings.extend(batch_embeddings)
        return embeddings


model = USE()
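For reference, running it against the benchmark would then follow the usual pattern from the README (the task below is just an example, and model is the USE() instance defined above):

from mteb import MTEB

evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(model, output_folder="results/use-multilingual-large")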
Hi all,
I would like to compare varying configurations for AbsTaskClassification.
For example, I am wondering about evaluation results with method=kNN.
But I am not sure how I can change those parameters in Python scripts.
Could you help me with this?
Specifically, I would like to compare method=kNN (as described in the paper; 3.2 Tasks and evaluation - Classification) and method=logReg (which is the default value for the method param in the code).
Best regards,
Jihyuk
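If the kwargs given to evaluation.run() are forwarded down to the classification evaluator (my reading of the code, not something I have verified), switching methods could look roughly like this:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("average_word_embeddings_komninos")
evaluation = MTEB(tasks=["Banking77Classification"])
# "method" is the parameter mentioned above; "kNN" and "logReg" are its two values.
evaluation.run(model, method="kNN", output_folder=None)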
When running clustering tasks, I keep seeing this warning:
FutureWarning: The default value of `n_init` will change from 3 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
To keep the behavior of MTEB stable across versions of sklearn, you should set n_init to 3 explicitly. If someone happens to run this with an sklearn version >= 1.4, they would start getting different results. If you want, I can make a PR.
I'm on scikit-learn 1.2.2
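A minimal sketch of the proposed fix, assuming the clustering evaluator uses sklearn's MiniBatchKMeans; the cluster count below is a placeholder:

from sklearn.cluster import MiniBatchKMeans

num_clusters = 10  # example value; in MTEB this comes from the task's label set

# Pinning n_init keeps behaviour identical before and after sklearn 1.4,
# where the default changes from 3 to "auto".
clustering_model = MiniBatchKMeans(n_clusters=num_clusters, n_init=3)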
It seems like the leaderboard on Hugging Face is down (https://huggingface.co/spaces/mteb/leaderboard); it just says "Preparing Space" until it times out.
Some other people on Huggingface having the same issue:
https://huggingface.co/spaces/mteb/leaderboard/discussions/7
Is there a static version of the leaderboard, or another way of accessing the data?
For MIND we only compute MAP & MRR, while their leaderboard's main scores are AUC & NDCG & MRR.
We should also compute AUC & NDCG.
Would the maintainers be interested in the addition of a code retrieval task (CodeSearchNet, which uses text queries to retrieve code documents), either as a new code retrieval type or added into the existing retrieval category?
Can look into this once I have some bandwidth
Hi! Thanks for this easy-to-use repo!
However, I'm getting this error when running the example script https://github.com/embeddings-benchmark/mtebscripts/blob/main/run_array_simcse.py on retrieval benchmarks like QuoraRetrieval.
How do I evaluate on retrieval tasks when using my own model with a wrapper?
It'd be great if we could figure out using multiple gpus on tasks other than BEIR.
E.g. RedditClusteringP2P takes >20h for a 5.8B model with embeddings of 4096 dimensions.
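One possible workaround while this isn't built in: wrap the model so that encode() goes through sentence-transformers' multi-process pool (one worker per visible GPU). This is a sketch on the caller's side, not something MTEB does for you:

from sentence_transformers import SentenceTransformer

class MultiGPUEncoder:
    """Encodes with one worker per visible GPU via sentence-transformers' pool API."""

    def __init__(self, model_name):
        self.model = SentenceTransformer(model_name)
        self.pool = self.model.start_multi_process_pool()  # one process per CUDA device

    def encode(self, sentences, batch_size=64, **kwargs):
        return self.model.encode_multi_process(sentences, self.pool, batch_size=batch_size)

# model = MultiGPUEncoder("sentence-transformers/all-MiniLM-L6-v2")
# MTEB(tasks=["RedditClusteringP2P"]).run(model)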