evalrank-embedding-evaluation's Introduction

Word & Sentence Embedding Evaluation

In this project, we provide an easy-to-use toolkit for both word and sentence embedding evaluations.

For more details: ACL 2022: Just Rank: Rethinking Evaluation with Word and Sentence Similarities

(Slides) (Poster) (Video)

Outline

| Section | Description |
|---|---|
| Evaluation Tasks | Evaluation Tasks |
| Environment Setup | Environments |
| Models and Quick Start | Models and Quick Start |
| Benchmarking - Word | Leaderboard |
| Benchmarking - Sentence | Leaderboard |
| References | References |
| Acknowledge | Acknowledge |

Evaluation Tasks

The following are the supported evaluation tasks:

Environment Setup

Tested with the following dependencies:

  • python==3.8.12
  • pytorch==1.11.0
  • transformers==4.11.3
  • scikit-learn==0.23.2

See the following script for details on how to set up the environment.

bash environment.sh
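
After setting up the environment, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the toolkit: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes PyTorch was installed under the pip distribution name `torch`.

```python
from importlib import metadata

# Versions listed in this README. "torch" is the pip name for PyTorch
# (an assumption about how the package was installed).
EXPECTED = {
    "torch": "1.11.0",
    "transformers": "4.11.3",
    "scikit-learn": "0.23.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Drop local build tags such as "+cu113" before comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script prints one line per package that is missing or has a different version, and prints nothing when everything matches.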

Models and Quick Start

We support a list of word and sentence embedding models for quick evaluation and benchmarking.

  • Word Embedding Models

    • Any word embedding file that follows this format.
    • One post-processing method is integrated.
  • Word-level EvalRank and Similarity

    • To test on your own model, simply change the word embedding path.
    bash word_evaluate.sh
    
    # To evaluate your own word embedding model, edit word_evaluate.sh and set:
    WORD_EMB_PATH='PATH/TO/WORD/EMBEDDING'
    
  • Sentence Embedding Models

    • Bag-of-word (averaging word embedding)
    • Bag-of-word with post-processing
    • InferSent
    • BERT
    • BERT-Whitening
    • BERT-Flow
    • Sentence-BERT
    • SimCSE
  • Sentence-level EvalRank and Similarity

    • You can also easily test your own sentence embedding model using our provided template.
    bash sentence_evaluate.sh
    
    # To evaluate your own sentence embedding model, modify the following two files:
    #   1. sentence_evaluate.sh: set the model name
    SENT_EMB_MODEL='customize'
    #   2. ./src/models/sent_emb/_customize.py: fill in the provided template
    

    For better classification performance, edit the following part (in file src/s_evaluation.py):

    params_senteval = {'task_path': './data/', 'usepytorch': True, 'kfold': 5}
    params_senteval['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
                                    'tenacity': 3, 'epoch_size': 2}
    

    to

    params_senteval.update({'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10})
    params_senteval['classifier'] = {'nhid': 50, 'optim': 'adam', 'batch_size': 64,
                                    'tenacity': 5, 'epoch_size': 4}
    

For the complete set of model results, refer to the bash scripts and log files in scripts/. Run the corresponding script to reproduce the results.
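
The bag-of-word sentence model listed above builds a sentence embedding by averaging word embeddings. A minimal sketch of that idea follows. It is not the toolkit's code: the function names, the whitespace tokenizer, the zero-vector fallback, and the toy 2-d vectors are all assumptions made for illustration.

```python
import numpy as np


def average_embedding(sentence, word_vectors, dim):
    """Average the vectors of the in-vocabulary tokens of a sentence."""
    tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return np.zeros(dim)  # no known token: fall back to the zero vector
    return np.mean(vectors, axis=0)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


# Toy 2-d vectors, made up for illustration only.
toy_vectors = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.8, 0.6]),
    "car": np.array([0.0, 1.0]),
}

a = average_embedding("cat dog", toy_vectors, dim=2)
b = average_embedding("dog", toy_vectors, dim=2)
print(cosine(a, b))
```

In practice `word_vectors` would be loaded from one of the word embedding files, and the resulting sentence vectors compared with cosine similarity as in the benchmark tables.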

Benchmarking - Word

| Word Embedding (cos) | EvalRank (MRR) | Hits1 | Hits3 |
|---|---|---|---|
| toy_emb.txt | 3.18 | 1.18 | 3.54 |
| glove.840B.300d.txt | 13.15 | 4.66 | 15.72 |
| GoogleNews-vectors-negative300.txt | 12.88 | 4.57 | 14.35 |
| crawl-300d-2M.vec | 17.22 | 5.77 | 19.99 |
| dict2vec-300d.vec | 12.71 | 4.04 | 13.04 |

  • More benchmarking results can be found on these pages: word_evalrank, word_similarity.
  • More benchmarking results can also be found in scripts and their corresponding logs.
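
The MRR, Hits1, and Hits3 columns are ranking metrics. As a rough illustration of how such numbers can be computed, the sketch below ranks a positive item against a set of background items by cosine similarity, then aggregates the ranks. It is a simplified sketch under stated assumptions, not the toolkit's implementation: the function names are hypothetical, the way the background set is built is not specified here, and scaling to percentages is an assumption about how the tables report values.

```python
import numpy as np


def rank_of_positive(query, positive, background):
    """1-based rank of `positive` among `background` by cosine similarity to `query`."""
    candidates = np.vstack([positive[None, :], background])
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = candidates @ query
    # Rank = 1 + number of background items scoring strictly higher than the positive.
    return 1 + int(np.sum(scores[1:] > scores[0]))


def mrr_and_hits(ranks, ks=(1, 3)):
    """Mean reciprocal rank and Hits@k, both scaled to percentages."""
    ranks = np.asarray(ranks, dtype=float)
    result = {"MRR": 100.0 * float(np.mean(1.0 / ranks))}
    for k in ks:
        result[f"Hits{k}"] = 100.0 * float(np.mean(ranks <= k))
    return result


# Toy example with made-up 2-d vectors.
query = np.array([1.0, 0.0])
positive = np.array([0.8, 0.6])
background = np.array([[1.0, 0.0], [0.0, 1.0]])
print(rank_of_positive(query, positive, background))
print(mrr_and_hits([1, 2, 4]))
```

A higher MRR means positive pairs tend to be ranked near the top, while Hits1 and Hits3 report how often the positive item lands in the top 1 or top 3.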

Benchmarking - Sentence

| Sentence Embedding (cos) | EvalRank (MRR) | Hits1 | Hits3 |
|---|---|---|---|
| toy_emb.txt | 41.15 | 28.79 | 49.65 |
| glove.840B.300d.txt | 61.00 | 44.94 | 74.66 |
| InferSentv1 | 60.72 | 41.92 | 77.21 |
| InferSentv2 | 63.89 | 45.59 | 80.47 |
| BERT(first-last-avg) | 68.01 | 51.70 | 81.91 |
| BERT-whitening | 66.58 | 46.54 | 84.22 |
| Sentence-BERT | 64.12 | 47.07 | 79.05 |
| SimCSE | 69.50 | 52.34 | 84.43 |

References

If you find our package useful, please cite our paper.

@inproceedings{wang-etal-2022-just,
    title = "Just Rank: Rethinking Evaluation with Word and Sentence Similarities",
    author = "Wang, Bin  and
      Kuo, C.-C.  and
      Li, Haizhou",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.419",
    pages = "6060--6077"
}
@article{evalrank_2022,
  title={Just Rank: Rethinking Evaluation with Word and Sentence Similarities},
  author={Wang, Bin and Kuo, C.-C. Jay and Li, Haizhou},
  journal={arXiv preprint arXiv:2203.02679},
  year={2022}
}

Acknowledge

  • We borrow a portion of the sentence embedding evaluation from SentEval. Please consider citing their work if you find that part useful.

Contact Info: [email protected].
