
Comments (21)

ElementaryZ commented on June 24, 2024

The model described in the paper is not exactly the same as ACT-R; it is based on the 2005 version. The latest version of ACT-R, from http://act-r.psy.cmu.edu/software/, seems to be from 2023.

The equations are recursive, and the paper also adds an interference scalar $h$, which is left out of the equations given here because they focus on the spacing effect rather than on interference. $h$ depends on whether the intervals took place during review or not, so it is not completely constant.

The following equations should be more complete.

$$ \begin{align} p_r(m) &= \frac{1}{1+e^{\frac{\tau-m}{s}}} \\ m_n(t_{1\dots n}) &= \ln\left[\sum_{i=1}^n (h\times t_i)^{-d_i}\right] \\ d_i(m_{i-1})&=ce^{m_{i-1}}+a \end{align} $$

Example:
(EDIT: note that every previous presentation must be recalculated with its own decay, so simply reusing the previous output of $m_i$ is not sufficient; the equations below have been updated to reflect that.)

$$ \begin{align} m_1 &= \ln\left[(h\times t_1)^{-a}\right] \\ m_2 &= \ln\left[(h\times t_1)^{-a}+(h\times t_2)^{-(ce^{m_1}+a)}\right] \\ m_3 &= \ln\left[(h\times t_1)^{-a}+(h\times t_2)^{-(ce^{m_1}+a)}+(h\times t_3)^{-(ce^{m_2}+a)}\right] \\ &\vdots \end{align} $$

where $t_0=0$ and $m_0=-\infty$.

$\tau$ - threshold parameter
$s$ - measure of noise
$p_r$ - probability of recall
$t_i$ - time elapsed since the $i$-th practice ($t_i = \text{spacing}_n-\text{spacing}_i$)
$c$ - decay scale
$a$ - decay intercept
$d_i$ - decay rate
$h$ - interference scalar
$m_i$ - activation
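
As a sanity check, here is the first step of that recursion in plain Python, using the fitted decay intercept quoted later in this thread (with $h = 1$, so it drops out):

import math

a = 0.176786766570677   # decay intercept (value quoted later in this thread)
t1 = 126                # seconds elapsed since the sole presentation

# with only one presentation, m_0 = -inf, so d_1 = a and the sum has one term
m1 = math.log(t1 ** (-a))
print(m1)  # ≈ -0.85499, matching the outputs further down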

The code for the model is available in Excel format on their website, http://act-r.psy.cmu.edu/?post_type=publications&p=14206, under Downloads in the Model and Sequence Files: http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2013/09/model-and-seq.zip

ElementaryZ commented on June 24, 2024

My code, for example:

import math

sp = [0, 126, 252, 4844, 5877]  # spacing: presentation times in seconds

a = 0.176786766570677           # decay intercept
c = 0.216967308403809           # decay scale
s = 0.254893976981164           # noise
tau = -0.704205679427144        # threshold

m = [-999999]  # m[i] = activation at time sp[i]; -999999 stands in for m_0 = -inf
t = [0]

def act():
    for _ in range(len(sp) - 1):
        prev = len(m)
        # presentations 2..prev keep the decay frozen at the activation they
        # saw: d_{i+1} = c * e^{m_i} + a, applied to the time elapsed from
        # that presentation until sp[prev]
        sumact = 0
        for i in range(1, prev):
            mi = math.exp(m[i])
            ti = sp[prev] - sp[i]
            sumact += ti ** (-(c * mi + a))

        # the first presentation decays with d_1 = a, since m_0 = -inf
        t1 = sp[prev]
        m.append(math.log(sumact + t1 ** (-a)))
        t.append(t1)

def p_recall(m):
    # probability of recall given activation m
    return 1 / (1 + math.exp((tau - m) / s))

act()
print("m: ", m)
print("t: ", t)

p = map(p_recall, m[1:])
print("p: ", list(p))

Results:

m:  [-999999, -0.8549906405542196, -0.4332005949791639, -0.9299686617111677, -0.6174756722171177]
t:  [0, 126, 252, 4844, 5877]
p:  [0.3562771058516888, 0.7433029478836352, 0.2919952459321521, 0.584253471491992]

L-M-Sherlock commented on June 24, 2024

Could you rewrite the ACT-R model as state-transition equations of D, S, and R?

By the way, I'm benchmarking FSRS with a short-term schedule. It reduces RMSE(bins) by 3.2% compared with FSRS-4.5. I'm wondering whether it's worth releasing.

Expertium commented on June 24, 2024

> Could you rewrite the ACT-R model as state-transition equations of D, S, and R?

Well, it's not exactly the same as DSR, but if you're asking me to code it, I'll do my best.

Expertium commented on June 24, 2024

It seems that their approach is quite different from ours. The first review (delta_t=0) is called "study" and subsequent reviews are called "tests", so their first "test" is our second review; we'll have to discard the first review. The way they calculate R is also different, and this is a bit difficult to explain. We use r = power_forgetting_curve(X[:, 0], state[:, 0]) and calculate new_s later, but what they do is more like r = power_forgetting_curve(X[:, 0], new_s). Basically, we use S[n] to calculate R[n+1], while they use m[n] (activation) to calculate R[n].
Because of these differences, I can't implement this model myself.
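
To make the ordering difference concrete, here is a schematic sketch; both helper bodies are placeholders (not the real benchmark code), only the order of operations matters:

def power_forgetting_curve(t, s):
    return (1 + t / (9 * s)) ** -1  # stand-in forgetting curve

def update_stability(s, r, rating):
    return s * (1 + 0.1 * rating * (1 - r))  # placeholder, NOT the real FSRS update

s_prev, delta_t, rating = 10.0, 5.0, 3

# FSRS-style: R for review n comes from the state left by review n-1;
# the new state is computed afterwards
r = power_forgetting_curve(delta_t, s_prev)
s_new = update_stability(s_prev, r, rating)

# ACT-R-style: the new activation m[n] is computed first, from the full
# history, and R for review n is read directly off that same m[n]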

Expertium commented on June 24, 2024

I just realized that there is another problem: the way they calculate delta_t. We calculate delta_t as the difference between the most recent review and the previous review, but they calculate it as the total time since the first review.
Suppose that our delta_t's look like this:
1 day
2 days
5 days
15 days

Their delta_t's would look like this:
1 day
3 days
8 days
23 days
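
In code, the conversion is just a cumulative sum:

from itertools import accumulate

ours = [1, 2, 5, 15]             # days between consecutive reviews
theirs = list(accumulate(ours))  # days since the first review
print(theirs)                    # [1, 3, 8, 23]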

And that's not everything. Even though their notation reads like a straightforward sum [screenshot of the equation], it's not what it looks like: calculating that sum is complicated, and their notation is misleading. Read the appendix in the linked paper.

Expertium commented on June 24, 2024

Btw, there is source code for software that uses ACT-R, but it's in freaking Lisp: http://act-r.psy.cmu.edu/actr7.x/actr7.x.zip. This code is probably more ancient than both of us.

Expertium commented on June 24, 2024

http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2013/09/model-and-seq.zip

I can't download it.
EDIT: I downloaded it from the website.

L-M-Sherlock commented on June 24, 2024

> The following equations should be more complete.

Could you calculate $p_r(m)$ from the following review history?

r_history = [0, 0, 1, 1, 0, 1]
t_history = [0, 4, 4, 15, 10, 1]
delta_t = 1

Expertium commented on June 24, 2024

The appendix (first link at the very top of this issue) has an example.
[screenshot of the worked example from the appendix]

ElementaryZ commented on June 24, 2024

> The following equations should be more complete.

> Could you calculate $p_r(m)$ from the following review history?
>
> r_history = [0, 0, 1, 1, 0, 1]
> t_history = [0, 4, 4, 15, 10, 1]
> delta_t = 1

I have Python code that replicates the example in the appendix; I'll create a gist of it and see if I can calculate p from the given history. I just need to understand what r_history, t_history, and delta_t represent.

Expertium commented on June 24, 2024

I don't know either; the actual dataset uses a different format.
Here's an example: 3000.csv
card_id is self-explanatory; review_th is some sort of order thingy that tells you which card was reviewed before which card (I think?); delta_t is the time elapsed between the last review and the new review; and rating is like this: Again=1, Hard=2, Good=3, Easy=4.
And delta_t can be -1 for some reason, I don't know why. Only Sherlock knows how to use this stuff.
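
Based on that description, here is a hedged sketch of deriving a per-card cumulative spacing column (the column names follow the description above; the treatment of delta_t = -1 is a guess):

import pandas as pd

df = pd.read_csv("3000.csv")
df = df.sort_values(["card_id", "review_th"])
# guess: clamp the mysterious delta_t = -1 rows to 0 before accumulating
df["delta_t"] = df["delta_t"].clip(lower=0)
df["spacing"] = df.groupby("card_id")["delta_t"].cumsum()  # days since first review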

ElementaryZ commented on June 24, 2024

> delta_t is the time elapsed between the last review and the new review and rating is like this: Again=1, Hard=2, Good=3, Easy=4.

Is delta_t measured in days? This model requires time in seconds from the initial review.

Expertium commented on June 24, 2024

Yes, in days. Btw, none of the models in the benchmark use same-day reviews, so we won't need h (the interference scalar). It would be unfair if this were the only model that used same-day reviews.

L-M-Sherlock commented on June 24, 2024

My code:

import numpy as np

h = 1
a = 0.177
c = 0.217
tau = -0.704
s = 0.255

def next_m(t, m):
    # folds the whole history into a single scalar: exp(m) stands in for the
    # sum of all previous terms, so earlier presentations are not re-decayed
    if t == 0:
        return m
    return np.log(np.exp(m) + np.power(h * t, -c * np.exp(m) - a))

def p_recall(m):
    return 1 / (1 + np.exp((tau - m) / s))

m = -np.inf

for t in (0, 126, 252, 4844, 5877):
    m = next_m(t, m)
    print(m, p_recall(m))

Results:

-inf 0.0
-0.8560218975304116 0.35522173003731444
-0.42991484236460425 0.7455169723229881
-0.33159217939281044 0.8115973363882435
-0.2568675632858423 0.8523887453193757

The first three lines are consistent with the appendix. I don't know why the time changes so drastically at the 3rd activation.

Edit: I get it. t is not the time between adjacent activations; it's the time elapsed from that activation to now.
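
Under that reading, a minimal corrected sketch recomputes every term with its elapsed time while keeping each presentation's decay frozen (same logic as the earlier Python example):

import numpy as np

h, a, c = 1, 0.177, 0.217

def activations(sp):
    # sp[i] is the time of presentation i+1; returns m evaluated at each later
    # presentation, freezing each decay at the activation seen at that moment
    d, m = [a], []  # d[0] = a because m_0 = -inf
    for n in range(1, len(sp)):
        terms = ((h * (sp[n] - sp[j])) ** (-d[j]) for j in range(n))
        m_n = np.log(sum(terms))
        m.append(m_n)
        d.append(c * np.exp(m_n) + a)
    return m

print(activations([0, 126, 252, 4844, 5877]))
# ≈ [-0.855, -0.434, -0.932, -0.619], matching the earlier results up to parameter rounding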

Expertium commented on June 24, 2024

It's because they transform time. Read about h. Never mind, you used the transformed time.
[screenshot from the paper]

> It's the time elapsed from the activation to now.

And that too. I've mentioned that before.

L-M-Sherlock commented on June 24, 2024

The problem is that the code has two nested loops and uses lists to store intermediate outputs. It's hard to implement in torch.

ElementaryZ commented on June 24, 2024

> The problem is that the code has two nested loops and uses lists to store intermediate outputs. It's hard to implement in torch.

I'll see if I can port it to torch. I'm not sure gradient descent would handle this properly; I could also try SciPy, which has more general optimization methods, and use the same method as in the paper.
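
For the SciPy route, a rough sketch of what fitting could look like: minimize log-loss over recall outcomes with Nelder-Mead (toy data; not necessarily the paper's exact procedure):

import numpy as np
from scipy.optimize import minimize

sp = np.array([0, 126, 252, 4844, 5877], dtype=float)  # toy spacing (seconds)
y = np.array([0, 1, 0, 1], dtype=float)                # toy recall outcomes for reviews 2..5

def p_recall_seq(params, sp):
    a, c, s, tau = params
    d, probs = [a], []
    for n in range(1, len(sp)):
        m = np.log(sum((sp[n] - sp[j]) ** (-d[j]) for j in range(n)))
        probs.append(1 / (1 + np.exp((tau - m) / s)))
        d.append(c * np.exp(m) + a)
    return np.array(probs)

def log_loss(params):
    p = np.clip(p_recall_seq(params, sp), 1e-6, 1 - 1e-6)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(log_loss, x0=[0.18, 0.22, 0.25, -0.7], method="Nelder-Mead")
print(res.x)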

L-M-Sherlock commented on June 24, 2024

Never mind. I have implemented a basic version:

import torch

sp = torch.tensor([0., 126., 252., 4844., 5877.])  # spacing: presentation times in seconds

a = 0.176786766570677          # decay intercept
c = 0.216967308403809          # decay scale
s = 0.254893976981164          # noise
tau = -0.704205679427144       # threshold

m = torch.zeros_like(sp)
m[0] = -torch.inf
t = torch.zeros_like(sp)

def act():
    for i in range(1, len(sp)):
        # presentations 2..i keep the decay frozen at the activation they saw
        sumact = 0
        for j in range(1, i):
            mi = torch.exp(m[j])
            ti = sp[i] - sp[j]
            sumact = sumact + ti**(-(c*mi + a))

        # the first presentation decays with d_1 = a (m_0 = -inf)
        t1 = sp[i]
        m[i] = torch.log(sumact + t1**(-a))
        t[i] = t1

def activation(m):
    return 1/(1+torch.exp((tau-m)/s))

act()
print("m: ", m)
print("t: ", t)

p = activation(m[1:])
print("p: ", p)

The next step is to move it into https://github.com/open-spaced-repetition/fsrs-benchmark/blob/main/other.py and make some necessary modifications.

L-M-Sherlock commented on June 24, 2024
        sumact = 0
        for j in range(1, i):
            mi = torch.exp(m[j])
            ti = (sp[i] - sp[j])
            sumact = sumact + ti**(-(c*mi + a))

To avoid the inner loop:

        sumact = torch.sum((sp[i] - sp[1:i])**(-(c*torch.exp(m[1:i]) + a)))

L-M-Sherlock commented on June 24, 2024

Model:

import torch
import torch.nn as nn
from torch import Tensor


class ACT_R(nn.Module):
    a = 0.176786766570677  # decay intercept
    c = 0.216967308403809  # decay scale
    s = 0.254893976981164  # noise
    tau = -0.704205679427144  # threshold
    init_w = [a, c, s, tau]

    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(self.init_w))

    def forward(self, sp: Tensor):
        """
        :param sp: shape[seq_len, batch_size, 1], cumulative time since the first review
        """
        m = torch.zeros_like(sp, dtype=torch.float)
        m[0] = -torch.inf
        for i in range(1, len(sp)):
            act = torch.log(
                torch.sum(
                    (sp[i] - sp[0:i]) ** (-(self.w[1] * torch.exp(m[0:i]) + self.w[0])),
                    dim=0,
                )
            )
            m[i] = act
        return self.activation(m)

    def activation(self, m):
        return 1 / (1 + torch.exp((self.w[3] - m) / self.w[2]))


model = ACT_R()

sp = torch.tensor(
    [[[0], [0]], [[126], [252]], [[252], [512]], [[4844], [9581]], [[5877], [18853]]]
)  # spacing
p = model(sp)
print("p: ", p)

Results:

p:  tensor([[[0.0000],
         [0.0000]],

        [[0.3563],
         [0.2550]],

        [[0.7433],
         [0.6352]],

        [[0.2920],
         [0.2174]],

        [[0.5843],
         [0.3131]]], grad_fn=<MulBackward0>)

The next step is to extract the spacing feature from the dataset.
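
A hedged sketch of that extraction: turn per-review intervals into cumulative time since the first review, shaped [seq_len, batch_size, 1] as the model expects (toy data, not the benchmark's actual pipeline):

import torch

delta_t = torch.tensor([[0., 0.], [1., 2.], [2., 3.], [5., 8.]])  # toy per-review intervals (days)
sp = delta_t.cumsum(dim=0).unsqueeze(-1)  # [seq_len, batch_size, 1]: time since first review
print(sp.shape)  # torch.Size([4, 2, 1])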
