Ebisu? about srs-benchmark HOT 7 CLOSED

open-spaced-repetition commented on September 26, 2024

Ebisu?

from srs-benchmark.

Comments (7)

Expertium commented on September 26, 2024

fasiha/ebisu.js#23
I suggested that, but it seems that LMSherlock and @fasiha just kinda weren't very interested.

fasiha commented on September 26, 2024

Ebisu author here 👋 fasiha/ebisu.js#23 has the discussion and links to the results. Ebisu v3 release candidate didn’t do well! For separate reasons, I’ve been working on alternatives to that version and have something I like more (see fasiha/ebisu#66) and I’m happy to support rerunning the benchmarks on that version.

I am ashamed to admit it but I haven’t made time to properly understand how the benchmarks here work, and I haven’t made time to figure out how to run FSRS etc. on the benchmarks I personally use to compare Ebisu versions. Part of the reason is, Ebisu and its benchmarks handle not just binary quizzes but also binomial and noisy-binary and passive quizzes; and it’s not been obvious how to adapt those quiz styles to various other probabilistic SRS systems to ensure we’re doing apples-to-apples comparisons.

(Background. I use a focal loss-ified log likelihood for all these various quiz types—see the link above and references therein—because the standard log loss/binary cross entropy (https://github.com/open-spaced-repetition/srs-benchmark/?tab=readme-ov-file#metrics) was ranking “bad” Ebisu models higher than “good” ones, i.e., ones I preferred and thought were more accurate.)
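To make the contrast between the two metrics concrete, here is a minimal sketch comparing plain log loss with the standard binary focal loss. The exact "focal loss-ified" form used in Ebisu's own benchmarks is not spelled out in this thread, so the `focal_loss` function and the `gamma = 2.0` default below are assumptions for illustration only.

```python
import math

EPS = 1e-12  # guard against log(0)


def log_loss(p: float, y: int) -> float:
    """Binary cross entropy for one review: y is 1 (recalled) or 0 (forgot)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, EPS))


def focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Standard binary focal loss: log loss scaled by (1 - p_t) ** gamma.

    gamma is an assumed illustrative value; gamma = 0 recovers plain log loss.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, EPS))


# Confident, correct predictions are down-weighted far more than uncertain ones.
for p, y in [(0.9, 1), (0.6, 1), (0.9, 0)]:
    print(p, y, round(log_loss(p, y), 4), round(focal_loss(p, y), 4))
```

Because the scaling factor shrinks the contribution of easy, well-predicted reviews, the two metrics can rank the same pair of models differently, which is the kind of disagreement described above.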

I know I should just wrap up working on Ebisu v3 and release it so folks can do benchmarks without being confused what version to run 😓 sorry! I’m hoping to release v3 this… year 🤞

Expertium commented on September 26, 2024

> Part of the reason is, Ebisu and its benchmarks handle not just binary quizzes but also binomial and noisy-binary and passive quizzes; and it’s not been obvious how to adapt those quiz styles to various other probabilistic SRS systems to ensure we’re doing apples-to-apples comparisons.

We can benchmark any algorithm as long as it:

- Uses interval lengths and grades, no other info (like text)
- Outputs a number between 0 and 1 that can be interpreted as a probability
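As a rough sketch of what those two requirements amount to, the snippet below defines a hypothetical predictor signature and scores it with mean log loss. The names `Predictor`, `exponential_predictor` and `mean_log_loss` are made up for illustration and are not the benchmark's actual API; treating any grade above Again as a successful recall is also an assumption here.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical signature: given the past (interval_days, grade) pairs and the
# time elapsed before the next review, return P(recall) in [0, 1].
History = Sequence[Tuple[float, int]]
Predictor = Callable[[History, float], float]


def exponential_predictor(history: History, elapsed_days: float,
                          half_life_days: float = 7.0) -> float:
    """Toy model: fixed half-life exponential forgetting; ignores the history."""
    return 2.0 ** (-elapsed_days / half_life_days)


def mean_log_loss(predict: Predictor, reviews, eps: float = 1e-12) -> float:
    """Average binary cross entropy over (history, elapsed_days, recalled) rows."""
    total = 0.0
    for history, elapsed_days, recalled in reviews:
        # Clamp so that a prediction of exactly 0 or 1 cannot produce log(0).
        p = min(max(predict(history, elapsed_days), eps), 1.0 - eps)
        total -= math.log(p if recalled else 1.0 - p)
    return total / len(reviews)


# Two made-up reviews; grades follow Anki's 1 (Again) to 4 (Easy) buttons.
reviews = [
    ([], 7.0, True),
    ([(7.0, 3)], 7.0, False),
]
print(mean_log_loss(exponential_predictor, reviews))
```

Any model that can be wrapped in a function of this shape, whatever its internals, can in principle be scored the same way.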

Also, Anki has 4 grades (answer buttons), so previously I suggested using different values of q0 for each grade. Idk if that suggestion makes a lot of sense, I only have surface-level knowledge of Ebisu and Bayesian stuff is pretty arcane for me.
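One way to picture the per-grade q0 idea is the sketch below. It assumes a noisy-binary likelihood in which q1 is the probability of an observed pass when the item was truly remembered and q0 is the probability of an observed pass when it was truly forgotten. For simplicity it places a Beta prior directly on the recall probability at quiz time and integrates numerically, which is a simplification and not Ebisu's actual update. The `GRADE_PARAMS` values are invented for illustration and are not tuned.

```python
# Hypothetical per-grade settings: (observed_pass, q1, q0).
# Illustrative values only; not taken from Ebisu or from the benchmark.
GRADE_PARAMS = {
    1: (False, 0.95, 0.05),  # Again: observed failure
    2: (True, 0.95, 0.30),   # Hard: a pass that may well be a lucky one
    3: (True, 0.95, 0.10),   # Good
    4: (True, 0.95, 0.01),   # Easy: a pass that is very unlikely to be luck
}


def posterior_mean_recall(alpha: float, beta: float, grade: int,
                          n: int = 2000) -> float:
    """Posterior mean of recall probability p after one graded review.

    Prior: Beta(alpha, beta) on p. Likelihood of an observed pass:
    q1 * p + q0 * (1 - p). Integrated with a midpoint rule on (0, 1).
    """
    observed_pass, q1, q0 = GRADE_PARAMS[grade]
    num = 0.0
    den = 0.0
    for i in range(n):
        p = (i + 0.5) / n
        prior = p ** (alpha - 1.0) * (1.0 - p) ** (beta - 1.0)
        pass_lik = q1 * p + q0 * (1.0 - p)
        weight = prior * (pass_lik if observed_pass else 1.0 - pass_lik)
        num += p * weight
        den += weight
    return num / den


for grade in (1, 2, 3, 4):
    print(grade, round(posterior_mean_recall(2.0, 2.0, grade), 3))
```

With these made-up numbers, a lower q0 makes a pass more informative, so Easy moves the estimate up more than Good, and Good more than Hard, while Again moves it down.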

andymatuschak commented on September 26, 2024

Oh, perfect! Thanks for sharing that thread. I like Ebisu’s approach in principle, still curious if its empirical deficits can be overcome. I like that its theory more directly handles issues like, say, the fact that if we’re targeting 90% retrievability, we should expect to miss 10% of items, even if their underlying stabilities are identical. Most algorithms handle that with an ad-hoc solution (eg FSRS’s low-pass back to default stability), and maybe that’s fine, but Bayesian stats seem like a better approach in principle (tho evidently perhaps not in practice!)

I’ll leave this issue open since Ebisu is still not in the official benchmark results for this repo but feel free to close if you like, since I got what I wanted! :)

Expertium commented on September 26, 2024

> if we’re targeting 90% retrievability, we should expect to miss 10% of items, even if their underlying stabilities are identical. Most algorithms handle that with an ad-hoc solution (eg FSRS’s low-pass back to default stability)

I'm not sure what you're trying to say. That just because a user pressed Again, it doesn't necessarily mean that the value of stability should be decreased? Our findings suggest that stability can drop very significantly in case of a memory lapse. Very crudely, post-lapse stability (PLS) as a function of previous S is like this:

Of course, the actual formula is more nuanced, this is without the retrievability and difficulty dimensions and also without the constant. I just simplified it as much as I could to focus purely on the relationship between S and PLS. Once LMSherlock is less busy, we will perform an interesting analysis to try to find weaknesses in our formulas for S and PLS. But I don't expect to find any flaws with PLS=f(S).
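The plot referenced above did not survive in this copy of the thread. As a hedged sketch of the relationship being described, the snippet below follows the post-lapse stability formula as published for FSRS v4, S_f = w11 · D^(-w12) · ((S + 1)^w13 - 1) · e^(w14 · (1 - R)), and then strips it down to the S-dependent factor only. All weight values are placeholders chosen for illustration, not fitted FSRS parameters, and the function names are hypothetical.

```python
import math


def post_lapse_stability(s: float, d: float = 5.0, r: float = 0.9,
                         w11: float = 2.0, w12: float = 0.2,
                         w13: float = 0.3, w14: float = 2.0) -> float:
    """FSRS-v4-style post-lapse stability; every default weight is a placeholder."""
    return w11 * d ** (-w12) * ((s + 1.0) ** w13 - 1.0) * math.exp(w14 * (1.0 - r))


def simplified_pls(s: float, w13: float = 0.3) -> float:
    """Only the S-dependent factor, dropping the constant, D and R terms."""
    return (s + 1.0) ** w13 - 1.0


# PLS grows with previous stability, but far more slowly than S itself.
for s in (1.0, 10.0, 100.0, 1000.0):
    print(s, round(simplified_pls(s), 3), round(post_lapse_stability(s), 3))
```

With an exponent below 1, a lapse on an item with very high stability leaves it with a stability that is only a small fraction of its previous value, which is the "drop very significantly" behaviour mentioned above.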

L-M-Sherlock commented on September 26, 2024

The PR is here: #11. It doesn't perform well, so I haven't merged it. The dataset has also been updated since then, so the PR is outdated. If you're interested in the result, I can rerun the benchmark when I'm available, but first I need to check whether my implementation is correct, which requires help from @fasiha.

Expertium commented on September 26, 2024

@fasiha we're working on FSRS-5, and I will make another Reddit post about benchmarking, so if you are still interested, you can come back to implementing Ebisu in the benchmark.
