🐛 Bug: Using comet-compare, I noticed that the exact same outpu…

small score difference for identical outputs (comet, CLOSED)

unbabel commented on May 8, 2024

Comments (4)

ricardorei commented on May 8, 2024

Hey Nicola, thanks for reporting this. Bootstrap resampling is not affected by very small segment-level differences: to count wins, losses, and ties, we average scores across many segments, and we repeat this many times over different random splits.

If two systems have exactly the same translations, I can assure you that you will get x_win = 0.5 and y_win = 0.5, or at least something very close to that, which is not statistically significant.
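To make the win/loss/tie counting above concrete, here is a minimal paired bootstrap sketch. It assumes a plain strict comparison of subsample means; the function name `paired_bootstrap` and the parameters `num_splits`, `sample_ratio` and `seed` are hypothetical and not taken from comet-compare's actual implementation.

```python
import random


def paired_bootstrap(x_scores, y_scores, num_splits=300, sample_ratio=0.5, seed=12):
    """Hypothetical helper: return (x_win_ratio, y_win_ratio, tie_ratio).

    Each split resamples segment indices with replacement; both systems are
    scored on the SAME indices (paired), and their subsample means are compared.
    """
    assert len(x_scores) == len(y_scores)
    rng = random.Random(seed)
    n = len(x_scores)
    k = max(1, int(n * sample_ratio))  # segments per subsample
    wins = [0, 0, 0]  # [x wins, y wins, ties]
    for _ in range(num_splits):
        idx = [rng.randrange(n) for _ in range(k)]
        x_mean = sum(x_scores[i] for i in idx) / k
        y_mean = sum(y_scores[i] for i in idx) / k
        if x_mean > y_mean:
            wins[0] += 1
        elif y_mean > x_mean:
            wins[1] += 1
        else:
            wins[2] += 1
    return tuple(w / num_splits for w in wins)


scores = [0.8, 0.7, 0.9, 0.6]
print(paired_bootstrap(scores, list(scores)))  # → (0.0, 0.0, 1.0)
```

In this sketch, bit-identical score lists produce only ties, because every subsample mean is computed from the same floats in the same order. It is the tiny batch-size-dependent score differences that split those decisions into wins and losses.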

By the way, the root of this problem is the layer-wise normalization we apply, which can be affected by the batch size. It's not desirable, but in practice it's negligible, as you said.
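A toy illustration of how a normalization step can make a segment's output depend on batch composition. This assumes the mean and variance are pooled over all (non-padded) values in the batch, as in ELMo-style scalar mixes with layer normalization enabled; it is not COMET's actual code. The helper `batch_layer_norm` is hypothetical and the values are exaggerated.

```python
import math


def batch_layer_norm(batch, eps=1e-12):
    """Hypothetical helper: normalize every sequence with ONE mean/variance
    pooled over all values in the batch, so outputs depend on batchmates."""
    values = [v for seq in batch for v in seq]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in seq] for seq in batch]


sentence = [1.0, 2.0, 3.0]
alone = batch_layer_norm([sentence])[0]                 # batch_size = 1
together = batch_layer_norm([sentence, [10.0, 20.0]])[0]  # batch_size = 2
print(alone)
print(together)  # same sentence, different normalized values
```

With real hidden states the shift is far smaller than in this toy case, which is consistent with the differences being negligible in practice.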

nicolabertoldi commented on May 8, 2024

Why not apply a smoother decision for the tie count? Something like:

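        # win_count = [x wins, y wins, ties]; epsilon is a small tolerance
        if subsample_x_scr > subsample_y_scr + epsilon:
            win_count[0] += 1
        elif subsample_y_scr > subsample_x_scr + epsilon:
            win_count[1] += 1
        else:
            # |x - y| <= epsilon: count as a tie
            win_count[2] += 1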

with a reasonably small value for epsilon.
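The effect of the epsilon tolerance can be checked in isolation. The sketch below wraps the suggested comparison in hypothetical helpers (`decide`, `count_decisions`); the subsample means and `epsilon=1e-4` are made-up values for illustration, and a sensible epsilon would depend on the score scale.

```python
def decide(x_mean, y_mean, epsilon=0.0):
    """Return 0 if X wins, 1 if Y wins, 2 for a tie (|x - y| <= epsilon)."""
    if x_mean > y_mean + epsilon:
        return 0
    if y_mean > x_mean + epsilon:
        return 1
    return 2


def count_decisions(pairs, epsilon=0.0):
    counts = [0, 0, 0]  # [x wins, y wins, ties]
    for x_mean, y_mean in pairs:
        counts[decide(x_mean, y_mean, epsilon)] += 1
    return counts


# Hypothetical subsample means for identical outputs scored in different batches
pairs = [(0.81230, 0.81231), (0.79002, 0.79001), (0.80500, 0.80500)]
print(count_decisions(pairs))                # → [1, 1, 1] (strict comparison)
print(count_decisions(pairs, epsilon=1e-4))  # → [0, 0, 3] (tolerant comparison)
```

With `epsilon=0.0` the helper reduces to the strict comparison, so noise-level differences count as wins and losses. A small positive epsilon absorbs them as ties.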

ricardorei commented on May 8, 2024

That's a reasonable idea for the system-level score. I'll fix it along with the other issues you reported.

ricardorei commented on May 8, 2024

@nicolabertoldi thanks for the issues! I believe everything is working properly now. Please let me know if it isn't.
