
Comments (21)

ElementaryZ commented on June 24, 2024

The model described in the paper is not exactly the same as ACT-R; it is based on the 2005 version. The latest version of ACT-R, from http://act-r.psy.cmu.edu/software/, seems to be from 2023.

The equations are recursive, and the paper also adds an interference scalar $h$, which is left out of the equations given here because they focus on the spacing effect rather than on interference. $h$ depends on whether the intervals took place during review or not, so it is not completely constant.

The following equations should be more complete.

$$ \begin{align} p_r(m) &= \frac{1}{1+e^{\frac{\tau-m}{s}}} \\ m_n(t_{1\dots n}) &= \ln\left[\sum_{i=1}^n (h\times t_i)^{-d_i}\right] \\ d_i(m_{i-1})&=ce^{m_{i-1}}+a \end{align} $$

Example:
(EDIT: note that every previous presentation must be recalculated with its own decay, so simply reusing the previous output of $m_i$ is not sufficient; the equations below have been updated to reflect that.)

$$ \begin{align} m_1 &= \ln\left[(h\times t_1)^{-a}\right] \\ m_2 &= \ln\left[(h\times t_1)^{-a}+(h\times t_2)^{-(ce^{m_1}+a)}\right] \\ m_3 &= \ln\left[(h\times t_1)^{-a}+(h\times t_2)^{-(ce^{m_1}+a)}+(h\times t_3)^{-(ce^{m_2}+a)}\right] \\ &\vdots \end{align} $$

where $t_0=0$ and $m_0=-\infty$.

$\tau$ - threshold parameter
$s$ - measure of noise
$p_r$ - probability of recall
$t_i$ - time elapsed since the $i$-th practice ($t_i = \text{spacing}_n-\text{spacing}_i$)
$c$ - decay scale
$a$ - decay intercept
$d_i$ - decay rate
$h$ - interference scalar
$m_i$ - activation
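
As a sanity check, here is the first step of that recursion in plain Python, using the fitted decay intercept quoted later in this thread (with $h = 1$, so it drops out):

import math

a = 0.176786766570677   # decay intercept (value quoted later in this thread)
t1 = 126                # seconds elapsed since the sole presentation

# with only one presentation, m_0 = -inf, so d_1 = a and the sum has one term
m1 = math.log(t1 ** (-a))
print(m1)  # ≈ -0.85499, matching the outputs further down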

The code for the model is available in Excel format on their website, http://act-r.psy.cmu.edu/?post_type=publications&p=14206, under Downloads in the Model and Sequence Files: http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2013/09/model-and-seq.zip

ElementaryZ commented on June 24, 2024

My code, for example:

import math

sp = [0, 126, 252, 4844, 5877]  # spacing: presentation times in seconds

a = 0.176786766570677           # decay intercept
c = 0.216967308403809           # decay scale
s = 0.254893976981164           # noise
tau = -0.704205679427144        # threshold

m = [-999999]  # m[i] = activation at time sp[i]; -999999 stands in for m_0 = -inf
t = [0]

def act():
    for _ in range(len(sp) - 1):
        prev = len(m)
        # presentations 2..prev keep the decay frozen at the activation they
        # saw: d_{i+1} = c * e^{m_i} + a, applied to the time elapsed from
        # that presentation until sp[prev]
        sumact = 0
        for i in range(1, prev):
            mi = math.exp(m[i])
            ti = sp[prev] - sp[i]
            sumact += ti ** (-(c * mi + a))

        # the first presentation decays with d_1 = a, since m_0 = -inf
        t1 = sp[prev]
        m.append(math.log(sumact + t1 ** (-a)))
        t.append(t1)

def p_recall(m):
    # probability of recall given activation m
    return 1 / (1 + math.exp((tau - m) / s))

act()
print("m: ", m)
print("t: ", t)

p = map(p_recall, m[1:])
print("p: ", list(p))

Results:

m:  [-999999, -0.8549906405542196, -0.4332005949791639, -0.9299686617111677, -0.6174756722171177]
t:  [0, 126, 252, 4844, 5877]
p:  [0.3562771058516888, 0.7433029478836352, 0.2919952459321521, 0.584253471491992]

L-M-Sherlock commented on June 24, 2024

Could you rewrite the ACT-R model as state-transition equations of D, S, and R?

By the way, I'm benchmarking FSRS with a short-term schedule. It reduces RMSE(bins) by 3.2% compared with FSRS-4.5. I'm wondering whether it's worth releasing.

Expertium commented on June 24, 2024

> Could you rewrite the ACT-R model as state-transition equations of D, S, and R?

Well, it's not exactly the same as DSR, but if you're asking me to code it, I'll do my best.

Expertium commented on June 24, 2024

It seems that their approach is quite different from ours. The first review (delta_t=0) is called "study" and subsequent reviews are called "tests", so their first "test" is our second review; we'll have to discard the first review. The way they calculate R is also different, and this is a bit difficult to explain. We use r = power_forgetting_curve(X[:, 0], state[:, 0]) and calculate new_s later, but what they do is more like r = power_forgetting_curve(X[:, 0], new_s). Basically, we use S[n] to calculate R[n+1], while they use m[n] (activation) to calculate R[n].
Because of these differences, I can't implement this model myself.
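
To make the ordering difference concrete, here is a schematic sketch; both helper bodies are placeholders (not the real benchmark code), only the order of operations matters:

def power_forgetting_curve(t, s):
    return (1 + t / (9 * s)) ** -1  # stand-in forgetting curve

def update_stability(s, r, rating):
    return s * (1 + 0.1 * rating * (1 - r))  # placeholder, NOT the real FSRS update

s_prev, delta_t, rating = 10.0, 5.0, 3

# FSRS-style: R for review n comes from the state left by review n-1;
# the new state is computed afterwards
r = power_forgetting_curve(delta_t, s_prev)
s_new = update_stability(s_prev, r, rating)

# ACT-R-style: the new activation m[n] is computed first, from the full
# history, and R for review n is read directly off that same m[n]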

Expertium commented on June 24, 2024

I just realized that there is another problem: the way they calculate delta_t. We calculate delta_t as the difference between the most recent review and the previous review, but they calculate it as the total time since the first review.
Suppose that our delta_t's look like this:
1 day
2 days
5 days
15 days

Their delta_t's would look like this:
1 day
3 days
8 days
23 days
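
In code, the conversion is just a cumulative sum:

from itertools import accumulate

ours = [1, 2, 5, 15]             # days between consecutive reviews
theirs = list(accumulate(ours))  # days since the first review
print(theirs)                    # [1, 3, 8, 23]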

And that's not everything. Even though their notation reads like a straightforward sum [screenshot of the equation], it's not what it looks like: calculating that sum is complicated, and their notation is misleading. Read the appendix in the linked paper.

Expertium commented on June 24, 2024

Btw, there is source code for software that uses ACT-R, but it's in freaking Lisp: http://act-r.psy.cmu.edu/actr7.x/actr7.x.zip. This code is probably more ancient than both of us.

Expertium commented on June 24, 2024

http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2013/09/model-and-seq.zip

I can't download it.
EDIT: I downloaded it from the website.

L-M-Sherlock commented on June 24, 2024

> The following equations should be more complete.

Could you calculate $p_r(m)$ from the following review history?

r_history = [0, 0, 1, 1, 0, 1]
t_history = [0, 4, 4, 15, 10, 1]
delta_t = 1

Expertium commented on June 24, 2024

The appendix (first link at the very top of this issue) has an example.
[screenshot of the worked example from the appendix]

ElementaryZ commented on June 24, 2024

> The following equations should be more complete.

> Could you calculate $p_r(m)$ from the following review history?
>
> r_history = [0, 0, 1, 1, 0, 1]
> t_history = [0, 4, 4, 15, 10, 1]
> delta_t = 1

I have Python code that replicates the example in the appendix; I'll create a gist of it and see if I can calculate p from the given history. I just need to understand what r_history, t_history, and delta_t represent.

Expertium commented on June 24, 2024

I don't know either; the actual dataset uses a different format.
Here's an example: 3000.csv
card_id is self-explanatory; review_th is some sort of order thingy that tells you which card was reviewed before which card (I think?); delta_t is the time elapsed between the last review and the new review; and rating is like this: Again=1, Hard=2, Good=3, Easy=4.
And delta_t can be -1 for some reason, I don't know why. Only Sherlock knows how to use this stuff.
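
Based on that description, here is a hedged sketch of deriving a per-card cumulative spacing column (the column names follow the description above; the treatment of delta_t = -1 is a guess):

import pandas as pd

df = pd.read_csv("3000.csv")
df = df.sort_values(["card_id", "review_th"])
# guess: clamp the mysterious delta_t = -1 rows to 0 before accumulating
df["delta_t"] = df["delta_t"].clip(lower=0)
df["spacing"] = df.groupby("card_id")["delta_t"].cumsum()  # days since first review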

ElementaryZ commented on June 24, 2024

> delta_t is the time elapsed between the last review and the new review and rating is like this: Again=1, Hard=2, Good=3, Easy=4.

Is delta_t measured in days? This model requires time in seconds from the initial review.

Expertium commented on June 24, 2024

Yes, in days. Btw, none of the models in the benchmark use same-day reviews, so we won't need h (the interference scalar). It would be unfair if this were the only model that used same-day reviews.

L-M-Sherlock commented on June 24, 2024

My code:

import numpy as np

h = 1
a = 0.177
c = 0.217
tau = -0.704
s = 0.255

def next_m(t, m):
    # folds the whole history into a single scalar: exp(m) stands in for the
    # sum of all previous terms, so earlier presentations are not re-decayed
    if t == 0:
        return m
    return np.log(np.exp(m) + np.power(h * t, -c * np.exp(m) - a))

def p_recall(m):
    return 1 / (1 + np.exp((tau - m) / s))

m = -np.inf

for t in (0, 126, 252, 4844, 5877):
    m = next_m(t, m)
    print(m, p_recall(m))

Results:

-inf 0.0
-0.8560218975304116 0.35522173003731444
-0.42991484236460425 0.7455169723229881
-0.33159217939281044 0.8115973363882435
-0.2568675632858423 0.8523887453193757

The first three lines are consistent with the appendix. I don't know why the time changes so drastically at the 3rd activation.

Edit: I get it. t is not the time between adjacent activations; it's the time elapsed from that activation to now.
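
Under that reading, a minimal corrected sketch recomputes every term with its elapsed time while keeping each presentation's decay frozen (same logic as the earlier Python example):

import numpy as np

h, a, c = 1, 0.177, 0.217

def activations(sp):
    # sp[i] is the time of presentation i+1; returns m evaluated at each later
    # presentation, freezing each decay at the activation seen at that moment
    d, m = [a], []  # d[0] = a because m_0 = -inf
    for n in range(1, len(sp)):
        terms = ((h * (sp[n] - sp[j])) ** (-d[j]) for j in range(n))
        m_n = np.log(sum(terms))
        m.append(m_n)
        d.append(c * np.exp(m_n) + a)
    return m

print(activations([0, 126, 252, 4844, 5877]))
# ≈ [-0.855, -0.434, -0.932, -0.619], matching the earlier results up to parameter rounding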

Expertium commented on June 24, 2024

It's because they transform time. Read about h. Never mind, you used the transformed time.
[screenshot from the paper]

> It's the time elapsed from the activation to now.

And that too. I've mentioned that before.

L-M-Sherlock commented on June 24, 2024

The problem is that the code has two nested loops and uses lists to store intermediate outputs. It's hard to implement in torch.

ElementaryZ commented on June 24, 2024

> The problem is that the code has two nested loops and uses lists to store intermediate outputs. It's hard to implement in torch.

I'll see if I can port it to torch. I'm not sure gradient descent would handle this properly; I could also try SciPy, which has more general optimization methods, and use the same method as in the paper.
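
For the SciPy route, a rough sketch of what fitting could look like: minimize log-loss over recall outcomes with Nelder-Mead (toy data; not necessarily the paper's exact procedure):

import numpy as np
from scipy.optimize import minimize

sp = np.array([0, 126, 252, 4844, 5877], dtype=float)  # toy spacing (seconds)
y = np.array([0, 1, 0, 1], dtype=float)                # toy recall outcomes for reviews 2..5

def p_recall_seq(params, sp):
    a, c, s, tau = params
    d, probs = [a], []
    for n in range(1, len(sp)):
        m = np.log(sum((sp[n] - sp[j]) ** (-d[j]) for j in range(n)))
        probs.append(1 / (1 + np.exp((tau - m) / s)))
        d.append(c * np.exp(m) + a)
    return np.array(probs)

def log_loss(params):
    p = np.clip(p_recall_seq(params, sp), 1e-6, 1 - 1e-6)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(log_loss, x0=[0.18, 0.22, 0.25, -0.7], method="Nelder-Mead")
print(res.x)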

L-M-Sherlock commented on June 24, 2024

Never mind. I have implemented a basic version:

import torch

sp = torch.tensor([0., 126., 252., 4844., 5877.])  # spacing: presentation times in seconds

a = 0.176786766570677          # decay intercept
c = 0.216967308403809          # decay scale
s = 0.254893976981164          # noise
tau = -0.704205679427144       # threshold

m = torch.zeros_like(sp)
m[0] = -torch.inf
t = torch.zeros_like(sp)

def act():
    for i in range(1, len(sp)):
        # presentations 2..i keep the decay frozen at the activation they saw
        sumact = 0
        for j in range(1, i):
            mi = torch.exp(m[j])
            ti = sp[i] - sp[j]
            sumact = sumact + ti**(-(c*mi + a))

        # the first presentation decays with d_1 = a (m_0 = -inf)
        t1 = sp[i]
        m[i] = torch.log(sumact + t1**(-a))
        t[i] = t1

def activation(m):
    return 1/(1+torch.exp((tau-m)/s))

act()
print("m: ", m)
print("t: ", t)

p = activation(m[1:])
print("p: ", p)

The next step is to move it into https://github.com/open-spaced-repetition/fsrs-benchmark/blob/main/other.py and make some necessary modifications.

L-M-Sherlock commented on June 24, 2024
        sumact = 0
        for j in range(1, i):
            mi = torch.exp(m[j])
            ti = (sp[i] - sp[j])
            sumact = sumact + ti**(-(c*mi + a))

To avoid the inner loop:

        sumact = torch.sum((sp[i] - sp[1:i])**(-(c*torch.exp(m[1:i]) + a)))

L-M-Sherlock commented on June 24, 2024

Model:

import torch
import torch.nn as nn
from torch import Tensor


class ACT_R(nn.Module):
    a = 0.176786766570677  # decay intercept
    c = 0.216967308403809  # decay scale
    s = 0.254893976981164  # noise
    tau = -0.704205679427144  # threshold
    init_w = [a, c, s, tau]

    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(self.init_w))

    def forward(self, sp: Tensor):
        """
        :param sp: shape[seq_len, batch_size, 1], cumulative time since the first review
        """
        m = torch.zeros_like(sp, dtype=torch.float)
        m[0] = -torch.inf
        for i in range(1, len(sp)):
            act = torch.log(
                torch.sum(
                    (sp[i] - sp[0:i]) ** (-(self.w[1] * torch.exp(m[0:i]) + self.w[0])),
                    dim=0,
                )
            )
            m[i] = act
        return self.activation(m)

    def activation(self, m):
        return 1 / (1 + torch.exp((self.w[3] - m) / self.w[2]))


model = ACT_R()

sp = torch.tensor(
    [[[0], [0]], [[126], [252]], [[252], [512]], [[4844], [9581]], [[5877], [18853]]]
)  # spacing
p = model(sp)
print("p: ", p)

Results:

p:  tensor([[[0.0000],
         [0.0000]],

        [[0.3563],
         [0.2550]],

        [[0.7433],
         [0.6352]],

        [[0.2920],
         [0.2174]],

        [[0.5843],
         [0.3131]]], grad_fn=<MulBackward0>)

The next step is to extract the spacing feature from the dataset.
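
A hedged sketch of that extraction: turn per-review intervals into cumulative time since the first review, shaped [seq_len, batch_size, 1] as the model expects (toy data, not the benchmark's actual pipeline):

import torch

delta_t = torch.tensor([[0., 0.], [1., 2.], [2., 3.], [5., 8.]])  # toy per-review intervals (days)
sp = delta_t.cumsum(dim=0).unsqueeze(-1)  # [seq_len, batch_size, 1]: time since first review
print(sp.shape)  # torch.Size([4, 2, 1])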
