Comments (17)

giacomoran commented on June 23, 2024

The paper is correct, independently of whether lowess is used for $f$.

Using the notation from the paper, we don't know $\phi$. We can only observe an empirical distribution $\hat{\Phi}_n$ from the predicted probabilities.
There is a result that says that

$\mathbb{E}_{\hat{\Phi}_n}[f(X)] = \frac{1}{n}\displaystyle\sum_{i=1}^n f(x_i)$

where $X \sim P$ and $x_1, \dots, x_n$ observations from $P$.

In our case, the lhs is $\displaystyle\int_0^1 f(x) d\hat{\Phi}_n$ which approximates $\displaystyle\int_0^1 f(x) \phi(x) dx$; the rhs is np.mean(np.abs(observation - p)).

See for example https://math.stackexchange.com/q/1267634
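As a quick sanity check of this identity, here is a small numpy sketch; the Beta distribution and $f(x) = x^2$ are purely illustrative choices, not anything from the actual optimizer:

```python
import numpy as np

# Illustrative example: P = Beta(2, 5), f(x) = x**2.
# For Beta(a, b), E[X^2] has a closed form: a*(a+1) / ((a+b)*(a+b+1)).
a, b = 2, 5
true_value = a * (a + 1) / ((a + b) * (a + b + 1))  # = 6/56

rng = np.random.default_rng(0)
x = rng.beta(a, b, size=200_000)  # observations x_1, ..., x_n from P

# Expectation under the empirical distribution is the plain sample mean of
# f(x_i): no density weighting is needed, because the x_i are already drawn
# from P, so dense regions contribute more terms automatically.
empirical = np.mean(x ** 2)

print(true_value, empirical)  # the two agree to a few decimal places
```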

from fsrs-when-to-separate-presets.

L-M-Sherlock commented on June 23, 2024

What's ICI?

You can see this issue:

Expertium commented on June 23, 2024

next_day_starts_at = 4
It's 5. It's not super important, but still.
Can you explain what this code is doing? I'm having a hard time understanding it.
Also, seems like you've added some new metrics:
[image: screenshot of the new metrics]
I'm assuming E50 is the median error (based on bins) and E90 is the 90th percentile (also based on bins). What's ICI?

Expertium commented on June 23, 2024

Ok, but what about the code itself? Does it just run the optimizer on every single deck?

Expertium commented on June 23, 2024

ici = np.mean(np.abs(observation - p))
I don't think that's what the paper suggests. In the paper, the values are weighted by the empirical density function of
the predicted probabilities.
[image: equation from the paper]

Expertium commented on June 23, 2024

So basically, right now we are using the number of reviews in each bin as weights. For ICI, we should use probability density. I think I could do that with FFTKDE, I'll try to tinker with it later and maybe I'll submit a PR.

L-M-Sherlock commented on June 23, 2024

I don't think that's what the paper suggests.

Did you check the appendix of the paper?

[image: excerpt from the paper's appendix]

So basically, right now we are using the number of reviews in each bin as weights.

ICI doesn't require any bins.

L-M-Sherlock commented on June 23, 2024

Ok, but what about the code itself? Does it just run the optimizer on every single deck?

It keeps only the decks containing >= 1000 reviews, generates deck-level parameters for each one, and predicts them one by one. We can evaluate the average error after joining them. Then FSRS is optimized on the joined dataset and evaluated with the collection-level parameters.
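A rough sketch of that pipeline; `optimize` and `predict_errors` below are hypothetical stubs standing in for the real optimizer, and the deck sizes are made up:

```python
import numpy as np

# Hypothetical stand-ins for the real optimizer; the actual code differs.
def optimize(reviews):
    """Fit FSRS parameters on a set of reviews (stub)."""
    return {"n": len(reviews)}  # placeholder "parameters"

def predict_errors(params, reviews):
    """Per-review |observed - predicted| errors (stub)."""
    return np.zeros(len(reviews))  # placeholder errors

decks = {"A::1": list(range(1000)), "A::2": list(range(2000)), "B": list(range(500))}

# 1. Keep only decks with >= 1000 reviews and fit deck-level parameters.
big_decks = {name: r for name, r in decks.items() if len(r) >= 1000}
deck_params = {name: optimize(r) for name, r in big_decks.items()}

# 2. Predict each deck with its own parameters, then join the errors.
deck_errors = np.concatenate(
    [predict_errors(deck_params[name], r) for name, r in big_decks.items()]
)

# 3. Fit collection-level parameters on the joined reviews and compare.
joined = [rev for r in big_decks.values() for rev in r]
collection_errors = predict_errors(optimize(joined), joined)

print(deck_errors.mean(), collection_errors.mean())
```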

Expertium commented on June 23, 2024

Did you check the appendix of the paper?

That's weird, in the paper they clearly say that it should be weighted.

ICI doesn't require any bins.

I meant RMSE, sorry, my wording wasn't clear. I was trying to say "RMSE uses n reviews in each bin as weights, but since ICI is continuous, it should use a continuous counterpart - probability density".

L-M-Sherlock commented on June 23, 2024

The probability density is already implicit in the array of p values.

Expertium commented on June 23, 2024

p is predicted probability, observation is smoothed using lowess. What I'm saying is that, if I interpreted the paper correctly, then instead of this:
ici = np.mean(np.abs(observation - p))
it should be this:
ici = np.average(np.abs(observation - p), weights=pdf(p))
where pdf(p) is an empirical probability density function. Remember, not all values of p are equally likely to occur. This is why bins are used for RMSE.
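For illustration, a sketch of what that weighted variant could look like, using scipy's `gaussian_kde` as an off-the-shelf stand-in for FFTKDE; the p and observation arrays here are made-up toy data, not real predictions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
p = rng.beta(5, 2, size=2000)  # toy predicted probabilities
# Toy "smoothed observed" values: the predictions plus a little noise.
observation = np.clip(p + rng.normal(0.0, 0.02, p.size), 0.0, 1.0)

pdf = gaussian_kde(p)  # empirical density of the predicted probabilities

ici_unweighted = np.mean(np.abs(observation - p))
ici_weighted = np.average(np.abs(observation - p), weights=pdf(p))

print(ici_unweighted, ici_weighted)
```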

L-M-Sherlock commented on June 23, 2024

observation is smoothed using lowess

Here, lowess has already applied the pdf to observation, because lowess is locally weighted scatterplot smoothing.

[image]
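This can be checked end to end with a minimal ICI computation. The Nadaraya-Watson kernel smoother below is only a self-contained stand-in for lowess, and the well-calibrated toy data is made up; the point is that the plain mean already weights dense regions of p more heavily, because each review contributes one term:

```python
import numpy as np

def kernel_smooth(p, y, bandwidth=0.05):
    """Nadaraya-Watson smoother: a simple stand-in for lowess."""
    # Gaussian kernel weights between every pair of predictions.
    w = np.exp(-0.5 * ((p[:, None] - p[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(0)
p = rng.beta(5, 2, size=1000)                # toy predicted probabilities
y = (rng.random(p.size) < p).astype(float)   # outcomes, calibrated by construction

observation = kernel_smooth(p, y)            # smoothed observed frequencies

# Plain mean, as in the code under discussion: regions with many predictions
# automatically contribute more terms, so no extra density weights are needed.
ici = np.mean(np.abs(observation - p))
print(ici)  # small, since the toy data is well calibrated
```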

Expertium commented on June 23, 2024

Ok, my bad then.

Expertium commented on June 23, 2024

@L-M-Sherlock there is something I want you to investigate. Try selecting different thresholds, like 1000 reviews, 2000 reviews, 4000 reviews, etc., and seeing how well FSRS performs if all subdecks with <threshold reviews inherit the parent's parameters. The goal is to see whether there is such a thing as an optimal threshold. If the threshold is too low, it may not be a good idea to run FSRS on all decks, since a lot of them will have very few reviews, and we know that RMSE decreases as n(reviews) increases. But if the threshold is too high, we might end up grouping together decks with very different material. So there probably exists an optimal threshold.

L-M-Sherlock commented on June 23, 2024

Try selecting different thresholds, like 1000 reviews, 2000 reviews, 4000 reviews, etc., and seeing how well FSRS performs if all subdecks with <threshold reviews inherit the parent's parameters.

Assuming the threshold is 1000, the decks and their sizes are

deck   size
A::1   1000
A::2   2000
A::3   500

How should we separate them? Which parameters should A::3 use?

Expertium commented on June 23, 2024

If A::3 has a parent deck, it should use the parameters of the parent deck. If not, then use the global parameters, which can be obtained by running the optimizer on the entire collection.
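A sketch of how that inheritance rule could look; `pick_parameters` and the example sizes are hypothetical, with deck names following Anki's `::` hierarchy:

```python
def pick_parameters(deck, sizes, params, global_params, threshold=1000):
    """Walk up the :: hierarchy until an ancestor meets the threshold."""
    parts = deck.split("::")
    # Try the deck itself, then each ancestor: A::1::x -> A::1 -> A.
    for i in range(len(parts), 0, -1):
        candidate = "::".join(parts[:i])
        if sizes.get(candidate, 0) >= threshold and candidate in params:
            return params[candidate]
    # No ancestor is big enough: fall back to the collection-level parameters.
    return global_params

sizes = {"A::1": 1000, "A::2": 2000, "A::3": 500, "A": 3500}
params = {"A::1": "p1", "A::2": "p2", "A": "pA"}

print(pick_parameters("A::3", sizes, params, "pGlobal"))  # A::3 too small -> A's parameters
print(pick_parameters("B", sizes, params, "pGlobal"))     # no big ancestor -> global parameters
```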

L-M-Sherlock commented on June 23, 2024

OK. I guess the best way here is to optimize FSRS at every level of decks and save all the parameters in a table for the following tests.
