Comments (7)

Expertium commented on September 26, 2024

fasiha/ebisu.js#23
I suggested that, but it seems that LMSherlock and @fasiha just kinda weren't very interested.

from srs-benchmark.

fasiha commented on September 26, 2024

Ebisu author here 👋 fasiha/ebisu.js#23 has the discussion and links to the results. Ebisu v3 release candidate didn’t do well! For separate reasons, I’ve been working on alternatives to that version and have something I like more (see fasiha/ebisu#66) and I’m happy to support rerunning the benchmarks on that version.

I am ashamed to admit it but I haven’t made time to properly understand how the benchmarks here work, and I haven’t made time to figure out how to run FSRS etc. on the benchmarks I personally use to compare Ebisu versions. Part of the reason is, Ebisu and its benchmarks handle not just binary quizzes but also binomial and noisy-binary and passive quizzes; and it’s not been obvious how to adapt those quiz styles to various other probabilistic SRS systems to ensure we’re doing apples-to-apples comparisons.

(Background. I use a focal loss-ified log likelihood for all these various quiz types [see the link above and references therein] because the standard log loss/binary cross entropy (https://github.com/open-spaced-repetition/srs-benchmark/?tab=readme-ov-file#metrics) was ranking “bad” Ebisu models higher than “good” ones, i.e., ones I preferred and thought were more accurate.)
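For readers comparing the two metrics mentioned above, here is a minimal sketch of standard binary log loss next to the textbook focal loss. It assumes binary pass/fail outcomes and a focusing parameter `gamma=2.0`; both function names and the default `gamma` are illustrative choices, and Ebisu's own focal-loss variant for binomial, noisy-binary, and passive quizzes may well differ.

```python
import math


def log_loss(p: float, y: int, eps: float = 1e-15) -> float:
    """Binary cross entropy for one prediction p of outcome y (1 = recalled)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal_loss(p: float, y: int, gamma: float = 2.0, eps: float = 1e-15) -> float:
    """Textbook focal loss: log loss scaled by (1 - p_t) ** gamma.

    p_t is the probability the model assigned to what actually happened.
    gamma = 0 recovers plain log loss. The default gamma is an assumption
    made for illustration, not a value taken from Ebisu.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    for p in (0.6, 0.9, 0.99):
        print(p, log_loss(p, 1), focal_loss(p, 1))
```

The scaling factor shrinks the contribution of predictions that were already confident and correct, so the two metrics can rank the same set of models differently.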

I know I should just wrap up working on Ebisu v3 and release it so folks can do benchmarks without being confused what version to run 😓 sorry! I’m hoping to release v3 this… year 🤞

Expertium commented on September 26, 2024

Part of the reason is, Ebisu and its benchmarks handle not just binary quizzes but also binomial and noisy-binary and passive quizzes; and it’s not been obvious how to adapt those quiz styles to various other probabilistic SRS systems to ensure we’re doing apples-to-apples comparisons.

We can benchmark any algorithm as long as it:

  1. Uses interval lengths and grades, no other info (like text)
  2. Outputs a number between 0 and 1 that can be interpreted as a probability
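The two requirements above can be sketched as a small interface. This is not the actual srs-benchmark API: `Review`, `Predictor`, and `FixedHalfLifePredictor` are hypothetical names, and the fixed half-life baseline is only a stand-in that ignores the review history entirely.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Review:
    delta_t: float  # days elapsed since the previous review of this card
    grade: int      # Anki answer button: 1=Again, 2=Hard, 3=Good, 4=Easy


class Predictor(Protocol):
    """Hypothetical interface: intervals and grades in, a probability out."""

    def predict(self, history: Sequence[Review], delta_t: float) -> float:
        ...


class FixedHalfLifePredictor:
    """Toy baseline: exponential forgetting with one fixed half-life.

    It ignores `history`, so it only illustrates the shape of the
    interface, not a model worth benchmarking.
    """

    def __init__(self, half_life_days: float = 7.0) -> None:
        if half_life_days <= 0:
            raise ValueError("half_life_days must be positive")
        self.half_life_days = half_life_days

    def predict(self, history: Sequence[Review], delta_t: float) -> float:
        # Recall probability halves every `half_life_days`; always in (0, 1].
        return 2.0 ** (-max(delta_t, 0.0) / self.half_life_days)


if __name__ == "__main__":
    model: Predictor = FixedHalfLifePredictor(half_life_days=7.0)
    past = [Review(delta_t=1.0, grade=3), Review(delta_t=3.0, grade=1)]
    print(model.predict(past, delta_t=7.0))
```

Any algorithm that can be wrapped this way, Ebisu included, could in principle be scored with the same probability-based metrics.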

Also, Anki has 4 grades (answer buttons), so previously I suggested using different values of q0 for each grade. Idk if that suggestion makes a lot of sense, I only have surface-level knowledge of Ebisu and Bayesian stuff is pretty arcane for me.

andymatuschak commented on September 26, 2024

Oh, perfect! Thanks for sharing that thread. I like Ebisu’s approach in principle, still curious if its empirical deficits can be overcome. I like that its theory more directly handles issues like, say, the fact that if we’re targeting 90% retrievability, we should expect to miss 10% of items, even if their underlying stabilities are identical. Most algorithms handle that with an ad-hoc solution (eg FSRS’s low-pass back to default stability), and maybe that’s fine, but Bayesian stats seem like a better approach in principle (tho evidently perhaps not in practice!)

I’ll leave this issue open since Ebisu is still not in the official benchmark results for this repo but feel free to close if you like, since I got what I wanted! :)

Expertium commented on September 26, 2024

if we’re targeting 90% retrievability, we should expect to miss 10% of items, even if their underlying stabilities are identical. Most algorithms handle that with an ad-hoc solution (eg FSRS’s low-pass back to default stability)

I'm not sure what you're trying to say. That just because a user pressed Again, it doesn't necessarily mean that the value of stability should be decreased? Our findings suggest that stability can drop very significantly in case of a memory lapse. Very crudely, post-lapse stability (PLS) as a function of previous S is like this:
[image: plot of post-lapse stability (PLS) as a function of previous stability S]

Of course, the actual formula is more nuanced, this is without the retrievability and difficulty dimensions and also without the constant. I just simplified it as much as I could to focus purely on the relationship between S and PLS. Once LMSherlock is less busy, we will perform an interesting analysis to try to find weaknesses in our formulas for S and PLS. But I don't expect to find any flaws with PLS=f(S).

L-M-Sherlock commented on September 26, 2024

The PR is here: #11. It doesn't perform well, so I haven't merged it. The dataset has also been updated since then, so the PR is outdated. If you're interested in the results, I can rerun the benchmark when I'm available. But first I need to check whether my implementation is correct, which requires help from @fasiha.

Expertium commented on September 26, 2024

@fasiha we're working on FSRS-5, and I will make another Reddit post about benchmarking, so if you are still interested, you can come back to implementing Ebisu in the benchmark.
