
Word & Sentence Embedding Evaluation

In this project, we provide an easy-to-use toolkit for evaluating both word and sentence embeddings.

For more details, see our ACL 2022 paper: Just Rank: Rethinking Evaluation with Word and Sentence Similarities.

(Slides) (Poster) (Video)

Update

Outline

  • Evaluation Tasks
  • Environment Setup
  • Models and Quick Start
  • Benchmarking - Word: leaderboard
  • Benchmarking - Sentence: leaderboard
  • References
  • Acknowledge

Evaluation Tasks

The following are the supported evaluation tasks:

Environment Setup

Tested with the following dependencies:

  • python==3.8.12
  • pytorch==1.11.0
  • transformers==4.11.3
  • scikit-learn==0.23.2

See the following script for details on how to set up the environment:

bash environment.sh
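To confirm that an existing environment matches the tested versions listed above, a short check can be run before the evaluation scripts. This is a minimal sketch and not part of the toolkit: `check_pins` and `PINNED` are hypothetical names, it assumes the packages were installed with pip or conda so that `importlib.metadata` can see them, and it looks up PyTorch under its distribution name `torch`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Tested versions from the list above (hypothetical helper, not part of the toolkit).
# PyTorch's installed distribution name is "torch".
PINNED = {"torch": "1.11.0", "transformers": "4.11.3", "scikit-learn": "0.23.2"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed or None)} for each mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu113" on PyTorch wheels.
        base = installed.split("+", 1)[0] if installed else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    found = ".".join(str(p) for p in sys.version_info[:3])
    if found != "3.8.12":
        print(f"python: tested with 3.8.12, found {found}")
    for name, (want, got) in check_pins(PINNED).items():
        print(f"{name}: tested with {want}, found {got or 'not installed'}")
```

Newer versions may still work; a mismatch only means the combination differs from the one tested.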

Models and Quick Start

We support the following word and sentence embedding models for quick evaluation and benchmarking.

  • Word Embedding Models

    • Any word embedding file that follows this format.
    • Optional post-processing with one integrated method.
  • Word-level EvalRank and Similarity

    • To test your own model, simply change the word embedding path.
    bash word_evaluate.sh
    
    # To evaluate your own word embedding model, set the path in word_evaluate.sh:
    WORD_EMB_PATH='PATH/TO/WORD/EMBEDDING'
    
  • Sentence Embedding Models

    • Bag-of-words (averaging word embeddings)
    • Bag-of-words with post-processing
    • InferSent
    • BERT
    • BERT-Whitening
    • BERT-Flow
    • Sentence-BERT
    • SimCSE
  • Sentence-level EvalRank and Similarity

    • You can also easily test your own sentence embedding model using our provided template.
    bash sentence_evaluate.sh
    
    # To evaluate your own sentence embedding model, modify the following two files:
    # 1. In sentence_evaluate.sh, set
    SENT_EMB_MODEL='customize'
    # 2. Implement your model in ./src/models/sent_emb/_customize.py
    

    For better classification performance, change the following SentEval settings in src/s_evaluation.py:

    params_senteval = {'task_path': './data/', 'usepytorch': True, 'kfold': 5}
    params_senteval['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
                                    'tenacity': 3, 'epoch_size': 2}
    

    to

    params_senteval = {'task_path': './data/', 'usepytorch': True, 'kfold': 10}
    params_senteval['classifier'] = {'nhid': 50, 'optim': 'adam', 'batch_size': 64,
                                    'tenacity': 5, 'epoch_size': 4}
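The word embedding models and the bag-of-words sentence baseline listed above can be illustrated with a small NumPy sketch. It assumes GloVe-style text files (one word followed by its space-separated values per line, with an optional word2vec/fastText header line) and whitespace tokenization; `load_word_embeddings` and `bow_sentence_embedding` are hypothetical helpers, and the toolkit's own file handling and tokenizer may differ.

```python
import io

import numpy as np


def load_word_embeddings(lines, dim=None):
    """Parse text embeddings with one `word v1 ... vd` entry per line (hypothetical helper).

    A word2vec/fastText-style header ("<vocab_size> <dim>") on the first line is
    skipped. If `dim` is given, everything before the last `dim` fields is taken
    as the word, which tolerates the few tokens that contain spaces.
    """
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        if not line.strip() or (i == 0 and len(parts) == 2):
            continue  # blank line or header such as "2000000 300"
        n = dim if dim is not None else len(parts) - 1
        word = " ".join(parts[:-n])
        vectors[word] = np.asarray(parts[-n:], dtype=np.float32)
    return vectors


def bow_sentence_embedding(sentence, vectors, dim):
    """Bag-of-words baseline: average the vectors of in-vocabulary tokens."""
    found = [vectors[t] for t in sentence.split() if t in vectors]
    if not found:
        return np.zeros(dim, dtype=np.float32)  # no known tokens
    return np.mean(found, axis=0)


# Usage on a tiny in-memory file that starts with a header line.
toy = io.StringIO("3 2\ncat 1.0 0.0\ndog 0.0 1.0\nsat 1.0 1.0\n")
vecs = load_word_embeddings(toy)
# Averages "cat" and "sat"; "the" is out of vocabulary and ignored.
sent = bow_sentence_embedding("the cat sat", vecs, dim=2)
```

For a real file, pass an open file handle (for example `open("glove.840B.300d.txt", encoding="utf-8")`) instead of the in-memory example.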
    

For the complete set of model results, refer to the bash scripts and log files in scripts/. Simply run the corresponding script to reproduce the results.

Benchmarking - Word

| Word Embedding (cos) | EvalRank (MRR) | Hits@1 | Hits@3 |
|---|---|---|---|
| toy_emb.txt | 3.18 | 1.18 | 3.54 |
| glove.840B.300d.txt | 13.15 | 4.66 | 15.72 |
| GoogleNews-vectors-negative300.txt | 12.88 | 4.57 | 14.35 |
| crawl-300d-2M.vec | 17.22 | 5.77 | 19.99 |
| dict2vec-300d.vec | 12.71 | 4.04 | 13.04 |

  • More benchmarking results can be found on these pages: word_evalrank, word_similarity.
  • More benchmarking results can also be found in the scripts in scripts/ and their corresponding logs.
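The EvalRank (MRR) and Hits columns in the table above summarize where the true partner of each positive pair lands when candidates are ranked by cosine similarity. The sketch below assumes that, for each positive pair, the partner is ranked against a pool of background embeddings by cosine similarity to the query; the candidate pool, tie handling, and whether both directions of a pair are scored are assumptions here and may differ from the toolkit. The helper names are hypothetical, and the tables appear to report the resulting values scaled by 100.

```python
import numpy as np


def rank_of_positive(query, positive, candidates):
    """1-based rank of `positive` among background `candidates` by cosine similarity to `query`.

    `candidates` is an (n, d) matrix of background embeddings that excludes
    both `query` and `positive` (an assumption of this sketch).
    """
    q = query / np.linalg.norm(query)
    pos_score = q @ (positive / np.linalg.norm(positive))
    cand = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    cand_scores = cand @ q
    # Rank = 1 + number of candidates scoring strictly higher than the partner.
    return 1 + int(np.sum(cand_scores > pos_score))


def mrr_and_hits(ranks, ks=(1, 3)):
    """Mean reciprocal rank and Hits@k (fraction of ranks <= k)."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    return mrr, hits


# Example: three positive pairs whose partners were ranked 1st, 2nd and 4th.
# MRR = (1 + 1/2 + 1/4) / 3; Hits@1 = 1/3; Hits@3 = 2/3.
mrr, hits = mrr_and_hits([1, 2, 4])
```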

Benchmarking - Sentence

| Sentence Embedding (cos) | EvalRank (MRR) | Hits@1 | Hits@3 |
|---|---|---|---|
| toy_emb.txt | 41.15 | 28.79 | 49.65 |
| glove.840B.300d.txt | 61.00 | 44.94 | 74.66 |
| InferSentv1 | 60.72 | 41.92 | 77.21 |
| InferSentv2 | 63.89 | 45.59 | 80.47 |
| BERT(first-last-avg) | 68.01 | 51.70 | 81.91 |
| BERT-whitening | 66.58 | 46.54 | 84.22 |
| Sentence-BERT | 64.12 | 47.07 | 79.05 |
| SimCSE | 69.50 | 52.34 | 84.43 |
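BERT-whitening, one of the sentence models listed above, post-processes sentence embeddings so that they have zero mean and identity covariance. The following is a minimal NumPy sketch of that transform, not the toolkit's implementation: it assumes the (n, d) sentence embeddings are already computed (for example, pooled BERT outputs) and that n > d so the covariance is full rank; `fit_whitening` and `apply_whitening` are hypothetical names.

```python
import numpy as np


def fit_whitening(embeddings, n_components=None):
    """Estimate mean `mu` and whitening matrix `w` from (n, d) embeddings."""
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov(embeddings - mu, rowvar=False)
    # For a symmetric PSD covariance, SVD gives cov = U diag(s) U^T,
    # with singular values s sorted in descending order.
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s))
    if n_components is not None:
        w = w[:, :n_components]  # keep only the top directions
    return mu, w


def apply_whitening(embeddings, mu, w):
    """Center the embeddings and map them into the whitened space."""
    return (embeddings - mu) @ w
```

Because `w` is built from the covariance itself, the whitened embeddings have identity covariance; setting `n_components` additionally reduces the dimensionality.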

References

If you find our package useful, please cite our paper.

@inproceedings{wang-etal-2022-just,
    title = "Just Rank: Rethinking Evaluation with Word and Sentence Similarities",
    author = "Wang, Bin  and
      Kuo, C.-C.  and
      Li, Haizhou",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.419",
    pages = "6060--6077"
}
@article{evalrank_2022,
  title={Just Rank: Rethinking Evaluation with Word and Sentence Similarities},
  author={Wang, Bin and Kuo, C.-C. Jay and Li, Haizhou},
  journal={arXiv preprint arXiv:2203.02679},
  year={2022}
}

Acknowledge

  • We borrow a portion of the sentence embedding evaluation from SentEval. Please consider citing their work if you find that part useful.

Contact Info: [email protected].
