
Comments (2)

Tiiiger commented on July 18, 2024

Hi @g32M7fT6b8Y,

We believe this is mostly an issue of usage, not a weakness in the method itself. We have indeed found that BERTScore computed with deep contextual embedding models can have a small numerical range (also pointed out in #20). However, this does not mean that BERTScore cannot distinguish bad candidates (bad responses in your case) from good ones. If we rank the candidates, the good candidates still score higher than the bad ones. On this point, we also refer you to the correlation studies in our paper.



We also don’t want to simply ignore this “numerical range” problem, because it makes the scores harder to read. After several rounds of consideration, here’s what we propose:


We take a large monolingual corpus and randomly pair its sentences as candidate-reference pairs. Because the candidate and reference in each pair are unrelated, the average BERTScore over these pairs should serve as a lower bound. We propose to use this lower bound to rescale BERTScore: we subtract the lower bound from a BERTScore and divide the difference by (1 - lower bound), so that the lower bound maps to 0 and a perfect score of 1 stays at 1.
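To make the procedure concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the code planned for this repo: the names `estimate_baseline`, `rescale`, `score_fn`, and `num_pairs` are hypothetical, and `score_fn` stands in for whatever function returns a BERTScore for one candidate-reference pair.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def estimate_baseline(
    sentences: Sequence[str],
    score_fn: Callable[[str, str], float],  # hypothetical: scores one (candidate, reference) pair
    num_pairs: int = 1000,
    seed: int = 0,
) -> float:
    """Average score over randomly paired, hence unrelated, sentences."""
    rng = random.Random(seed)
    pool = list(sentences)
    scores = []
    for _ in range(num_pairs):
        # Draw two distinct sentences and treat them as candidate and reference.
        candidate, reference = rng.sample(pool, 2)
        scores.append(score_fn(candidate, reference))
    return mean(scores)


def rescale(score: float, baseline: float) -> float:
    """Map `baseline` to 0 while keeping a perfect score of 1 at 1."""
    return (score - baseline) / (1.0 - baseline)
```

For example, with a hypothetical baseline of 0.80, `rescale(0.90, 0.80)` is roughly 0.5, and any score below the baseline becomes negative.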

For some numbers:
On the WMT17 news crawl English corpus, the lower bound for BERTScore computed with RoBERTa-Large is 0.83. With this rescaling, the average BERTScore on the WMT18 De-En translation evaluation dataset drops from 0.9311 to 0.5758. For a concrete example, let's look at the case mentioned in #20. Before rescaling, the score distribution looks like this:
[image: BERTScore distribution before rescaling]

After rescaling, this distribution looks like this:
[image: BERTScore distribution after rescaling]



Note that this modification would only change the range of BERTScore and won’t affect BERTScore’s correlation with human judgment. Currently, we are adding software support in this repo. Stay tuned and we’ll push this change into the new version soon. 
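The reason the correlation is unaffected is that the rescaling is an affine map with a positive slope, 1 / (1 - lower bound), which preserves both Pearson correlation and rank order. The check below is a small sketch of that argument; the scores and human ratings are made up for illustration, and `pearson` and `rescale` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rescale(score, baseline):
    """Subtract the baseline and divide by (1 - baseline)."""
    return (score - baseline) / (1.0 - baseline)


# Made-up values, for illustration only.
raw = [0.86, 0.91, 0.88, 0.95, 0.84]   # raw scores
human = [2.0, 4.0, 3.0, 5.0, 1.0]      # human ratings
baseline = 0.83

rescaled = [rescale(s, baseline) for s in raw]

r_raw = pearson(raw, human)
r_rescaled = pearson(rescaled, human)

# Indices of the candidates sorted from worst to best score.
order_raw = sorted(range(len(raw)), key=raw.__getitem__)
order_rescaled = sorted(range(len(rescaled)), key=rescaled.__getitem__)

print(abs(r_raw - r_rescaled) < 1e-9)   # correlations agree up to rounding
print(order_raw == order_rescaled)      # ranking is unchanged
```

Only the numeric range changes; any comparison that depends on ranking or on linear correlation gives the same result before and after.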



I am closing this issue but feel free to continue the thread here.


gmftbyGMFTBY commented on July 18, 2024

Thank you for your response.
I think it may be an appropriate way to alleviate this issue.
Cannot wait to try the new version of BERTScore.

