open-spaced-repetition / fsrs-optimizer
FSRS Optimizer Package
Home Page: https://pypi.org/project/FSRS-Optimizer/
License: BSD 3-Clause "New" or "Revised" License
In the last two versions of fsrs-optimizer (5.0.2 and 5.0.1; 5.0.0 worked fine, if I remember correctly), optimizer.create_time_series fails to generate time_series when the user only uses the 1 (Again) and 3 (Good) buttons for reviews, with the following traceback:
How to reproduce:
fsrs4anki_Test.apkg
inside unzipped folder).
fsrs4anki_optimizer.ipynb (with %pip install -q fsrs_optimizer==5.0.2 or %pip install -q fsrs_optimizer==5.0.1 and timezone = 'Turkey' if relevant)
I'm not familiar with agg, but one possible workaround (and my preference) would be to use plt.savefig, so that I can look at more than one plot at a time and keep them for future reference. Would that be problematic? I can just change that one line in a forked/local copy if that's not something you'd want included in the code for some reason. Or maybe add a terminal flag that defaults to the current behavior, with -w to write plot files to a local directory or something?
Originally posted by @topherbuckley in open-spaced-repetition/fsrs4anki#382 (comment)
A lot of users have reported that parameters and RMSE can change significantly even after doing just a few dozen more reviews. I think a good way to investigate this is to do the following:
This should be fairly straightforward to implement. The problem is what comes next. If it's true that even a few more (or fewer) reviews can vastly change RMSE and parameters, the question is "How do we make it more robust?". I don't know a good answer. One way that I have proposed is to run the optimizer from different starting points. In other words, instead of running it from just one set of the default parameters, we use 3-5 different sets of default parameters. This will ensure that we explore the parameter space more thoroughly.
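The multi-start idea can be illustrated with a minimal sketch. Everything here is hypothetical: the toy loss only stands in for the real FSRS training loss, and plain gradient descent stands in for the actual optimizer. It only shows why several starting points can help when the loss has more than one local minimum.

```python
def loss(x: float) -> float:
    # Toy non-convex loss with two local minima (NOT the FSRS loss).
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x: float) -> float:
    # Analytical derivative of the toy loss above.
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    # Plain gradient descent from a single starting point.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def multi_start(starts: list[float]) -> float:
    # Run the optimizer from every starting point and keep the best result.
    candidates = [descend(x0) for x0 in starts]
    return min(candidates, key=loss)


single = descend(0.5)                  # gets stuck in the worse local minimum
best = multi_start([-2.0, 0.5, 2.0])   # one of the starts finds the better one
```

With a single start at 0.5 the descent settles near x = 1, while the multi-start run returns the minimum near x = -1, which has a lower loss. The cost is one full optimization per starting point.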
With Anki 23.10.1, the optimized w[3] for my collection is 60.9883.
With Anki 23.12, the optimized w[3] for my collection is just 21.8133 (which is the same as w[2]).
With the Python optimizer (v4.20.2), the optimized w[3] is 21.2803. So, the problem is common to both Python and Rust optimizers.
Note: All the three values above are obtained using exactly the same collection.
Forgetting curve for Easy first rating:
I suggest the following condition: lapses≥10 AND reviews≥40.
This should filter out the worst leeches. This should be applied before optimization. Btw, I don't really know how the current filter works, perhaps there is already something similar.
@user1823 what do you think?
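A minimal sketch of the proposed condition, assuming each card is summarized by its lapse count and total review count. The CardStats class and its field names are hypothetical and not part of the optimizer.

```python
from dataclasses import dataclass


@dataclass
class CardStats:
    # Hypothetical per-card summary; not an fsrs-optimizer type.
    card_id: int
    lapses: int
    reviews: int


def is_leech(card: CardStats, min_lapses: int = 10, min_reviews: int = 40) -> bool:
    # Proposed condition: lapses >= 10 AND reviews >= 40.
    return card.lapses >= min_lapses and card.reviews >= min_reviews


cards = [
    CardStats(card_id=1, lapses=2, reviews=15),
    CardStats(card_id=2, lapses=12, reviews=55),  # matches both thresholds
    CardStats(card_id=3, lapses=11, reviews=30),  # many lapses, too few reviews
]

# Apply the filter before optimization.
kept = [c for c in cards if not is_leech(c)]
```

Only card 2 is dropped here, because both thresholds must be met.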
Currently, the simulator uses the same value of recall_cost, which, I assume, is the amount of time (in seconds) it takes the user to answer a card. Speaking from personal experience, when I press Hard, it takes me a lot more time to think than when I press Easy. So I believe that 3 different values of recall_cost should be used to make the simulator more precise.
While using the median (+filter) solved the problem with outliers, there is still a problem: what if the user doesn't have a lot of reviews? The fewer reviews he has, the less accurate the estimation of median review time will be.
Here's my idea:
"smooth" time = (n_reviews / (50 + n_reviews)) * t_user + (1 - n_reviews / (50 + n_reviews)) * t_default
Here n_reviews is the number of times the user pressed this particular button, t_user is the median time per button estimated from the user's data, and t_default is the default median time, estimated from 20k collections. 50 is a constant that determines how quickly the "smooth" value transitions from the default value to the actual value estimated from the user's data.
Example: suppose the user pressed this button 5 times, his estimated median time is 10 seconds, and the default is 7 seconds. Then the "smooth" time will be (5 / (50 + 5)) * 10 + (1 - (5 / (50 + 5))) * 7 = 7.27. Now, let's say the median stayed the same while the number of reviews increased to 500: (500 / (50 + 500)) * 10 + (1 - (500 / (50 + 500))) * 7 = 9.73.
We need to store 5 n_reviews values: one for each of the 4 buttons + 1 for new cards. We also need to store 5 default median values.
The key idea is that when the number of times the user presses a button is very small, the estimate is not very accurate, and we should rely more on the default time, which we know is accurate. So we use a "smooth" estimate that is equal to the default time when n_reviews is 0 and approaches the real time of this user as n_reviews increases. This way, when a user has very few reviews, the simulation will rely mostly on the default values, and when the user has a lot of reviews, the simulation will rely mostly on his own review time values. As a result, users with a small number of reviews will notice that the output of "Compute minimum recommended retention" varies less now. A weighted average of two medians is a bit unorthodox, but I don't see any issues with it.
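The worked example can be checked with a few lines of Python. This is only a sketch: the function name smooth_time and the parameter name k (for the constant 50) are made up here.

```python
def smooth_time(n_reviews: int, t_user: float, t_default: float, k: float = 50) -> float:
    # Weight of the user's own median grows from 0 towards 1 as n_reviews grows.
    weight = n_reviews / (k + n_reviews)
    return weight * t_user + (1 - weight) * t_default


few = smooth_time(n_reviews=5, t_user=10, t_default=7)     # about 7.27
many = smooth_time(n_reviews=500, t_user=10, t_default=7)  # about 9.73
none = smooth_time(n_reviews=0, t_user=10, t_default=7)    # exactly the default
```

A larger k keeps the estimate close to the default for longer; a smaller k trusts the user's data sooner.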
Currently, we filter out reviews where time=0 and time>=20 minutes. However, if the user set their "Maximum answer seconds" to 60 (default), none of this will help. So I have an idea:
max(t)
Here's the key idea: we don't know what value the user chose as their "Maximum answer seconds". We don't have access to that setting. But we can guess what it was based on the max. value of all t. For example, if the maximum is 60 seconds, it's reasonable to assume that that's the "Maximum answer seconds". Then we can remove all reviews that are equal to that.
So if a user has times like this:
7, 8, 9, 10, 12, 15, 20, 60, 60, 60.
After the filter is applied, they will become this:
7, 8, 9, 10, 12, 15, 20
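As a rough sketch, the filter described above could look like this. The function name is hypothetical, and times are assumed to be in seconds.

```python
def drop_capped_times(times: list[float]) -> list[float]:
    # Guess the "Maximum answer seconds" setting as the largest observed time,
    # then remove every review whose time equals that guessed cap.
    if not times:
        return []
    cap = max(times)
    return [t for t in times if t != cap]


times = [7, 8, 9, 10, 12, 15, 20, 60, 60, 60]
filtered = drop_capped_times(times)  # [7, 8, 9, 10, 12, 15, 20]
```

One consequence of this rule is that the single largest time is removed even when it occurs only once and is not actually a cap.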
@user1823 I want to know your opinion as well
Summary: The issue is that the optimizer is filtering out a part of my data just because I have a large amount of data with delta_t = 1 (which is a result of my past Anki settings).
Detailed description:
The current optimizer is filtering out my data with first rating = 3 and delta_t = 4 (and above). But, the initial stability for Good for my collection is more than 10 days (exact value depends on which version of the optimizer I use). This means that the optimizer is not using my newer data for training (unless the count with high delta_t eventually becomes so large that it fits into the threshold).
You might say that there is not much data for delta_t > 10 in the above image. This is because of two reasons:
As I said before, the reason that a large number of my cards (3944) have the first interval = 1d (with first rating = 3) is that 1d was my second learning step when I was using the default Anki algorithm. But, this should not be a factor to decide how well the optimizer works for me.
Originally posted by @user1823 in open-spaced-repetition/fsrs4anki#348 (comment)
@L-M-Sherlock I think in the current version of the optimizer it's possible that a value for "Good" will be larger than for "Easy" if "Good" has more datapoints.
params, _ = curve_fit(power_forgetting_curve, delta_t, recall, sigma=1/np.sqrt(count), bounds=((0.1), (30 if total_count < 1000 else 365)))
You should probably add some kind of extra cap to ensure that S0 for "Good" cannot be greater than S0 for "Easy" even if total_count is greater than 1000 for "Good" and less than 1000 for "Easy".
Originally posted by @Expertium in open-spaced-repetition/fsrs4anki#348 (comment)
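One possible form of such a cap, sketched as a post-processing step over the fitted values. The dictionary layout (rating number to S0) and the function name are assumptions, and extending the cap from Good/Easy to every adjacent pair of ratings is an extra assumption beyond what the comment asks for.

```python
def cap_initial_stabilities(s0: dict[int, float]) -> dict[int, float]:
    # s0 maps rating (1=Again, 2=Hard, 3=Good, 4=Easy) to fitted initial stability.
    capped = dict(s0)
    # Walk from Good down to Again so each value is capped by the (already
    # capped) value of the next higher rating.
    for rating in (3, 2, 1):
        if rating in capped and rating + 1 in capped:
            capped[rating] = min(capped[rating], capped[rating + 1])
    return capped


fitted = {1: 0.5, 2: 1.2, 3: 40.0, 4: 25.0}  # Good ended up above Easy
capped = cap_initial_stabilities(fitted)
```

An alternative would be to pass the S0 of the next higher rating as the upper bound of the fit itself, which requires fitting Easy first.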
Followed the steps in the tutorial but while training the FSRS parameters I got this message repeatedly.
Question
Dear all,
I would like to understand why my deck is displaying such absurd values. The optimizer is providing parameters that deviate significantly from the original values. When I optimize the deck, the "w" values increase to an unreasonable extent, causing the application of these values to result in absurdly long intervals between card reviews. I don't understand the underlying problem.
The parameters I used were as follows:
Next Day Starts at (Step 2): 12
Timezone (Step 3.1): America/Bahia
Filter out suspended cards: checked
Advanced Settings (Step 3.2): 0.9
Revlog Start Date: Optimize review logs after this date: 2006-10-05
Example original "w" values: If I keep the original "w" as follows:
[0.4, 0.6, 2.4, 5.8, 4.93, 0.94, 0.86, 0.01, 1.49, 0.14, 0.94, 2.18, 0.05, 0.34, 1.26, 0.29, 2.61]
A new card will have a good interval of 2 days and an easy interval of 7 days.
Example optimized "w" values: If I use the optimized "w" for my deck as follows:
[1.46, 4.9, 8.82, 10.08, 4.8994, 0.8604, 0.8373, 0.007, 1.5252, 0.1, 0.9738, 2.2595, 0.0204, 0.3773, 1.5048, 0.3335, 2.3037]
A new card will have a good interval of 10 days and an easy interval of 12 days.
A good interval of 10 days for a new card seems clearly absurd...
I would like to know if others are experiencing such significant deviations from the original parameters or if it's just affecting my deck. This might help identify the issue.
This problem has been occurring for me since version 3, but it seems to have worsened recently.
I believe it would be helpful to find a way to maintain these parameters within an acceptable standard deviation for a human. In my opinion, it's unrealistic for a human to fully memorize content from today to 10 days in the future after seeing it only once. Perhaps, if the optimizer could set a maximum good interval of 3 or 4 days within a standard deviation, it would be more reasonable.
These discrepancies persist in older reviews, resulting in significantly increased intervals. Additionally, there is a substantial difference between the "good" and "easy" buttons. For example, one card had a good interval of 29 days while the easy interval was 2.6 months, representing a 250% difference.
I have included my complete deck, including media, for your evaluation. Thank you in advance for any effort made to help solve this case.
Is your feature request related to a problem? Please describe.
I checked my parameters again and noticed that w[9] is 0.1 for some easy decks.
Describe the solution you'd like
I suggest changing the clamping to w[9] = w[9].clamp(0.025, 0.8)
This is a pretty minor thing so I won't make a new issue for it, but @L-M-Sherlock, I suggest changing the clampings for w[10] and w[8] (the parameters in new_s = state[:,0] * (1 + torch.exp(self.w[8]) * (11 - new_d) * torch.pow(state[:,0], -self.w[9]) * (torch.exp((1 - r) * self.w[10]) - 1) * hard_penalty * easy_bonus)) from
w[10] = w[10].clamp(0.01, 1.5)
w[8] = w[8].clamp(0, 2)
to
w[10] = w[10].clamp(0.01, 2.5)
w[8] = w[8].clamp(0, 3)
I noticed that they both maxed out for one of my decks.
Originally posted by @Expertium in open-spaced-repetition/fsrs4anki#351 (comment)
Hi, my CSV can be optimized normally with the FSRS-4 optimizer, but the FSRS-5 optimizer reports an error: "error Columns not found: 1, 2, 3, 4"
The CSV file used for training is as follows:
revlog2024-0501.csv
I copied scipy.optimize.brent, removed some stuff and hard-coded the 0.75-0.95 boundary into it. You can read about the Brent method on Wikipedia, and you can also read the scipy documentation. It says "The minimizer x will not necessarily satisfy xa <= x <= xb.", but again, I hard-coded my implementation to always return values between 0.75 and 0.95.
If time(loss(x)) >> time(math), in other words, if the amount of time required to calculate the value of the loss function is much greater than the amount of time required to do the math, then Brent is faster due to the fact that it calls the loss function less often.
If time(loss(x)) << time(math), in other words, if the value of the loss function can be calculated extremely quickly, then Brent is slower due to doing all of the extra math.
In the example in the .py file below, Brent called the loss function 7 times vs 16 times for the naive loop.
This was originally intended for best fit D, but it works just as well with any function that is continuous within some interval [a, b] and has a minimum. I hardcoded a and b to be 0.75 and 0.95.
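To make the trade-off concrete, here is a small bounded minimizer that counts loss evaluations. Note that this is golden-section search, a simpler relative of Brent's method (Brent adds parabolic interpolation on top of it), so it is not the implementation described above and its call counts will differ. The function names and the toy loss are made up for illustration.

```python
import math


def bounded_minimize(f, a: float = 0.75, b: float = 0.95, tol: float = 1e-3):
    # Golden-section search: every evaluated point stays inside [a, b].
    inv_phi = (math.sqrt(5) - 1) / 2
    calls = 0

    def counted(x: float) -> float:
        nonlocal calls
        calls += 1
        return f(x)

    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = counted(c), counted(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = counted(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = counted(d)
    return (a + b) / 2, calls


def toy_loss(x: float) -> float:
    # Hypothetical smooth loss with its minimum at 0.87.
    return (x - 0.87) ** 2


x_min, n_calls = bounded_minimize(toy_loss)
```

Each iteration costs exactly one new loss evaluation, which is why this family of methods pays off when the loss is expensive relative to the bookkeeping.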
Describe the bug
Running the optimizer crashes in part 4, "Evaluating the model". It seems similar to bug #123.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
optimizer can evaluate the model
Environment
Additional context
"w": [0.1, 0.537, 6.6827, 27.1943, 4.9824, 1.2089, 0.8091, 0.0024, 1.5319, 0.1056, 0.9654, 2.0932, 0.1437, 0.2684, 0.5902, 0.0092, 3.3688],
I could send you the deck if that helps you in any way.
Describe the bug
There is a typo in the usage section of the 2.1c Command Line wiki: the module name is fsrs_optimizer, not fsrs-optimizer.
Additional context
Also, by the way, the filter_out_flags = [] option is missing. Is that because the module works with an older version of the optimizer?
Thanks for the hard work :)
I tried to feed the data from obsidian-spaced-repetition-recall, ob-revlog.csv, into the optimizer in Colab, but I could not find the right method for this file.
Thanks to @L-M-Sherlock and @Newdea for your help. Based on the answer below, the solution is as follows:
# bash
python -m pip install fsrs-optimizer
python -m fsrs_optimizer /path/to/ob_revlog.csv -y
NB: If the number of valid records is too small (<100), the program will crash.
Describe the bug
When the optimizer is being run locally it fails to find the second deck.
To Reproduce
Debian 12.1
$ sudo apt install python3-full
$ python3 -m venv ./anki_env
$ cd ./anki_env/
$ ./bin/pip3 install fsrs-optimizer --upgrade
Export multiple decks into ~/anki_env, each in .apkg format.
$ ./bin/python3 -m fsrs_optimizer ./*.apkg -o ./result.txt
The optimizer runs correctly for the first file and fails on the second one. Here's the end of the output:
RMSE: 0.0137
MAE: 0.0107
[0.38874161 0.58629084]
Loss of SM-2: 0.3448
R-squared: -20.6841
RMSE: 0.1205
MAE: 0.0822
[0.8973963 0.01543027]
Universal Metric of FSRS: 0.0124
Universal Metric of SM2: 0.0714
The defaults will switch to whatever you entered last.
Timezone list: https://gist.github.com/heyalexej/8bf688fd67d7199be4a1682b3eec7568
input used timezone : Europe/Moscow
input used next day start hour (default: 4):
input the date at which before reviews will be ignored (default: 2006-10-05):
input filter out suspended cards? (y/n) (default: n): y
Save graphs? (y/n) (default: y): n
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ilya/anki_env/lib/python3.11/site-packages/fsrs_optimizer/__main__.py", line 177, in <module>
process(filename)
File "/home/ilya/anki_env/lib/python3.11/site-packages/fsrs_optimizer/__main__.py", line 74, in process
optimizer.anki_extract(
File "/home/ilya/anki_env/lib/python3.11/site-packages/fsrs_optimizer/fsrs_optimizer.py", line 322, in anki_extract
with zipfile.ZipFile(f'{filename}', 'r') as zip_ref:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/zipfile.py", line 1281, in __init__
self.fp = io.open(file, filemode)
^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '.././all__other__English.apkg'
Expected behavior
The optimizer runs correctly for all deck files.
Environment
All packages installed by pip3:
When I run python -m fsrs_optimizer ~/Desktop/foo.apkg, the program first asks for some basic config, and then throws an error:
[Errno 2] No such file or directory: '..//home/foo/Desktop/foo.apkg'
Failed to process /home/foo/Desktop/foo.apkg
After I copied the file into the same folder and ran python -m fsrs_optimizer ./foo.apkg, everything worked normally.
If this is a feature, it would make more sense to raise the error at the beginning.
Otherwise, it must be a trivial path-handling problem. If you want, I can create a PR for it.
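The error message suggests that "../" is prepended to whatever path the user passed, which breaks absolute paths. That is a guess from the message, not a reading of the optimizer's source. A minimal sketch of one way to avoid this class of problem is to resolve the path once, up front, and fail early if the file is missing. The helper name is hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_input_path(filename: str) -> Path:
    # Expand "~" and make the path absolute before any working-directory change,
    # so later code never needs to prepend "../".
    path = Path(filename).expanduser().resolve()
    if not path.is_file():
        # Fail at the beginning, before asking the user for any settings.
        raise FileNotFoundError(f"No such file: {path}")
    return path


# Why naive prefixing fails for absolute paths:
broken = "../" + "/home/foo/Desktop/foo.apkg"  # "..//home/foo/Desktop/foo.apkg"

# Demonstration with a temporary file standing in for a real deck.
with tempfile.NamedTemporaryFile(suffix=".apkg") as tmp:
    resolved = resolve_input_path(tmp.name)
```

Because the resolved path is absolute, it stays valid regardless of the directory the program later switches to.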
Currently we are using curve_fit to extrapolate values of S0, which often leads to values that are too small or too big. Here is what I propose:
Again = Hard^2/Good
Hard = sqrt(Again * Good)
Good = sqrt(Hard * Easy)
Easy = Good^2/Hard
Basically, we assume that Hard and Good can be calculated as geometric averages of their neighboring values, and from these formulas we can derive formulas for Again and Easy as well. Note that this should be used only for extrapolation, when the number of reviews for a particular grade is 0.
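The four relations above can be sketched as a small fill-in routine. The dictionary keys and the function name are hypothetical, and the routine only fills ratings that are absent, in line with the note that this is meant for extrapolation only.

```python
import math


def fill_missing_s0(s0: dict[str, float]) -> dict[str, float]:
    # Each missing rating is derived from two known ones using the
    # geometric-mean relations from the proposal.
    rules = {
        "again": (("hard", "good"), lambda h, g: h * h / g),
        "hard": (("again", "good"), lambda a, g: math.sqrt(a * g)),
        "good": (("hard", "easy"), lambda h, e: math.sqrt(h * e)),
        "easy": (("good", "hard"), lambda g, h: g * g / h),
    }
    filled = dict(s0)
    changed = True
    while changed:  # repeat so a newly filled value can unlock another rule
        changed = False
        for rating, (needs, formula) in rules.items():
            if rating not in filled and all(n in filled for n in needs):
                filled[rating] = formula(*(filled[n] for n in needs))
                changed = True
    return filled


# Only Again and Good were observed; Hard and Easy are extrapolated.
result = fill_missing_s0({"again": 1.0, "good": 4.0})
```

Here Hard becomes sqrt(1 * 4) = 2 and Easy becomes 4^2 / 2 = 8, so the four values form a geometric sequence.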
Since stability can vary by orders of magnitude, MAPE is more appropriate.
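A tiny illustration of why a relative metric behaves differently from an absolute one when values span orders of magnitude. The numbers are made up, and both helpers are plain textbook definitions rather than optimizer code.

```python
def mae(actual: list[float], predicted: list[float]) -> float:
    # Mean absolute error, in the same units as the data (days here).
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: list[float], predicted: list[float]) -> float:
    # Mean absolute percentage error; assumes no actual value is zero.
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


actual_stability = [1.0, 100.0]      # days
predicted_stability = [2.0, 101.0]   # both predictions are off by one day

abs_err = mae(actual_stability, predicted_stability)
pct_err = mape(actual_stability, predicted_stability)
```

Both predictions are off by one day, so the MAE treats them the same, while the MAPE reflects that being off by one day at S = 1 (100%) is far worse than at S = 100 (1%).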
recall_costs = recall_card_revlog.groupby(by="review_rating")[
"review_duration"
].mean()
I suggest replacing the mean with the median here, as the median is not sensitive to outliers. Relevant problem: https://forums.ankiweb.net/t/clarify-what-optimal-retention-means/42803/50?u=expertium
This could help mitigate such problems when, for example, the user went away to make dinner and, as a result, the review time ended up being orders of magnitude greater than usual, skewing the mean. And don't forget to modify how the learn cost is calculated as well, the median should be used for all time costs.
Additionally, to make the estimate even more robust (granted, the median is already robust), we can remove all times >20 minutes before calculating the median, since obviously nobody spends that much time per card.
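A small sketch of the difference, using only the standard library rather than the pandas groupby from the snippet above. Durations are assumed to be in seconds here, and the data are invented.

```python
from statistics import mean, median

MAX_SECONDS = 20 * 60  # discard anything longer than 20 minutes


def robust_cost(durations: list[float]) -> float:
    # Drop implausibly long reviews, then take the median of the rest.
    kept = [d for d in durations if d <= MAX_SECONDS]
    return median(kept)


# Five normal reviews plus one where the user walked away for an hour.
durations = [6, 7, 8, 9, 10, 3600]

mean_cost = mean(durations)           # dominated by the one-hour outlier
robust = robust_cost(durations)       # median of [6, 7, 8, 9, 10]
```

The mean is pulled above 600 seconds by a single outlier, while the filtered median stays at 8 seconds.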
I really like the new post-lapse stability analysis feature, and I think we can improve the post-lapse stability (PLS) formula if we analyze how post-lapse stability is related to D and R as well.
In other words, the two new graphs should have D and predicted R (at the time of the review) on the x axis and PLS on the y axis, with both true PLS (dots) and predicted PLS (curve), of course. This will help us determine how exactly PLS depends on D and R and whether the current formulas provide a good fit. In the future, a similar approach can be used for the main formula as well, just with three different graphs for each grade. For now, the idea is to analyze how PLS depends on the three components of memory.
Describe the bug
When running the optimizer, some files cause the error depicted in the screenshot below.
Settings within the optimizer
Environment
apkg-files which caused this error:
https://drive.google.com/drive/folders/1R5ihs830VfCjFfeKZKZ76tl45EtLh3yy?usp=sharing
I've been experimenting a bit with the optimizer and noticed that I get the same weights regardless of whether I include the review_duration column in my revlog file.
My question: does review_duration have any effect on the resulting optimal weights? Or does it only affect requestRetention?
I'm asking this in part because I'm building some software with FSRS and am trying to figure out whether I should be including review duration in my revlogs.
Thanks!
drop-in replacement.
smooth_and_fill isn't strict enough. We need Again < Hard < Good < Easy, not just Again <= Hard <= Good <= Easy. This is so that the user is less likely to see the same interval if learning steps are removed.
@L-M-Sherlock, I recommend using s0 = 1.5 for Hard. Currently, 0.6 is used, which is too small.
Originally posted by @user1823 in #16 (comment)
When I was running the benchmark on 66 collections, I also wrote down S0. Here are the average values, weighted by ln(reviews):
S0(Again)=0.6
S0(Hard)=1.4
S0(Good)=3.3
S0(Easy)=10.1
I suggest running a statistical significance test to determine whether these values are better than the ones currently used.
Originally posted by @Expertium in #16 (comment)
In my opinion, we should just replace the S0 for Hard because the currently used value for Hard doesn't make much sense.
Also, the result of such a change would not be statistically significant because it would only affect the values in those collections that have a very low number of reviews with Hard as the first rating. So, we don't need to run a statistical significance test here.
Originally posted by @user1823 in #16 (comment)
I want Sherlock to replace all 4 values though. There is a pretty big difference between the currently used values (all four of them) and the ones I obtained from the benchmark. We need to find out which ones provide a better fit to users' repetition histories.
Again: Current: 0.4, New: 0.6
Hard: Current: 0.6, New: 1.4
Good: Current: 2.4, New: 3.3
Easy: Current: 5.8, New: 10.1
The values obtained from benchmarking are roughly 40-130% greater.
Originally posted by @Expertium in #16 (comment)
That's the section of the optimizer that I am least familiar with. I would like to understand it better. I was looking at the code, and the only part that I clearly understand is this:
exp_times = (
p_recalls * recall_times + (1.0 - p_recalls) * forget_times
)
Expected time equals the probability of a pass (Hard/Good/Easy) multiplied by the amount of time the user spends on such reviews, plus the probability of a failure (Again) multiplied by the amount of time the user spends on such reviews.
However, the rest of the code is magic to me, and I don't understand what difficulty has to do with anything there. I would greatly appreciate a detailed explanation of what each piece of code does, ideally both a general, abstract overview (i.e. an explanation of the key principles) and a more in-depth explanation of each chunk of code.
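For reference, the one part described above boils down to a weighted average of two review times. A scalar sketch, with made-up numbers and a hypothetical function name (the real code operates on arrays):

```python
def expected_review_time(p_recall: float, recall_time: float, forget_time: float) -> float:
    # Probability-weighted average of the time spent on a pass and on a failure.
    return p_recall * recall_time + (1.0 - p_recall) * forget_time


# 90% chance of recall, 8 s for a successful review, 20 s for a failed one.
exp_time = expected_review_time(p_recall=0.9, recall_time=8.0, forget_time=20.0)
```

With these numbers the result is about 9.2 seconds: 7.2 from the likely pass plus 2.0 from the unlikely failure.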