infer-actively / pymdp
A Python implementation of active inference for Markov Decision Processes
License: MIT License
.github/workflows/python-package.yml:
- name: Lint with flake8
  run: |
    # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
    flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
While it is true that the code gets linted, this happens in the context of a GitHub docker image, and the results are lost after the tests run.
If you want auto-linting to happen before commits, there are a variety of ways to do it, but I suggest looking into pre-commit. If you set the expectation that code is linted before commit, this step can be updated so that it fails if linting changes the files at all, because that implies the files weren't linted prior to committing.
Nothing is perfect and there are ways around any such solution, but you can see how we're using pre-commit to make sure that static artifacts are built correctly before commits here: https://github.com/VersesTech/c4-verses-ai/blob/main/.pre-commit-config.yaml
Hey there,
First of all, I'm grateful for your work and your effort to make active inference more accessible for everyone. I'm quite new to this approach, so I started playing around with the provided notebooks.
I was wondering why the (controllable) transition matrix B (for the Agent class inside pymdp.agent) accepts direct state interactions only, i.e. it has to be of the form s^i_{t+1} = B(s^i_t, a) and not of a form with internal state interactions like s^i_{t+1} = B(s^i_t, s^j_t, a), where s^i, s^j refer to different hidden state factors. Am I wrong here? IMHO it shouldn't be a problem, since you can always define a joint world state \tilde{s}(s^i, s^j), but that makes the model more complicated. A rough sketch of that workaround is below.
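For illustration (all shapes and distributions here are made up, not from pymdp): one can build a joint transition tensor over \tilde{s} = (s^i, s^j) from factor-wise conditionals P(s^i' | s^i, s^j, a) and P(s^j' | s^j), so it fits the B[s_next, s_prev, action] convention:

import numpy as np

num_si, num_sj, num_actions = 3, 2, 4

# hypothetical conditional dynamics we actually want to express
T_i = np.random.dirichlet(np.ones(num_si), size=(num_si, num_sj, num_actions))  # P(s_i' | s_i, s_j, a)
T_j = np.random.dirichlet(np.ones(num_sj), size=num_sj)                          # P(s_j' | s_j)

joint = (num_si, num_sj)
B_joint = np.zeros((num_si * num_sj, num_si * num_sj, num_actions))
for si in range(num_si):
    for sj in range(num_sj):
        prev = np.ravel_multi_index((si, sj), joint)
        for a in range(num_actions):
            for si2 in range(num_si):
                for sj2 in range(num_sj):
                    nxt = np.ravel_multi_index((si2, sj2), joint)
                    B_joint[nxt, prev, a] = T_i[si, sj, a, si2] * T_j[sj, sj2]

# every column of B_joint[:, :, a] is a probability distribution
assert np.allclose(B_joint.sum(axis=0), 1.0)

The cost, as noted, is a (num_si * num_sj)^2 transition table per action, which is why native support for inter-factor dependencies would be nicer.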
Best,
Martin
Right now, the action precision (alpha), in the case of stochastic action sampling, defaults to alpha = 16.0 inside the control module's sample_action(..., alpha=16.0) function. Change this so that an instance of agent.Agent itself stores the action precision and passes it along in its self.sample_action() method. A rough sketch of the change is below.
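A minimal sketch of what this could look like (the class and attribute names here are illustrative, and the exact sample_action signature and module path are assumed rather than taken from the source):

from pymdp import control

class PrecisionAwareAgent:
    """Illustrative stand-in for Agent: stores alpha on the instance."""

    def __init__(self, q_pi, policies, num_controls, alpha=16.0):
        self.q_pi = q_pi                # posterior over policies
        self.policies = policies
        self.num_controls = num_controls
        self.alpha = alpha              # action precision lives on the agent

    def sample_action(self):
        # forward the stored precision instead of relying on the module default
        return control.sample_action(
            self.q_pi, self.policies, self.num_controls,
            action_selection="stochastic", alpha=self.alpha,
        )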
Originally posted by @helenegu in #4 (comment)
I think it would be helpful to have the badges in the README link to something other than the picture of the badge.
E.g. the PyPI badge could link to https://pypi.org/project/inferactively-pymdp/
Further, I would suggest a badge linking to your Read the Docs site, and, if you have CI set up for your pytest tests, a badge for that would be very helpful too.
Hi!
I think it would be useful to provide full API documentation for your package on your RTD website.
For example, I think the agent class and the envs subpackage are quite important, but I can't find API docs for them on the site.
I'd find it easier to navigate if they were listed in the modules section of the site; a separate full API documentation section would work too, I think.
Thanks!
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
----> 1 A = utils.convert_A_stub_to_ndarray(A_stub, model_labels)

~\pymdp\core\utils.py in convert_A_stub_to_ndarray(A_stub, model_labels)
    490     for g, modality_name in enumerate(model_labels['observations'].keys()):
    491         A[g] = A_stub.loc[modality_name].to_numpy().reshape(num_obs[g], *num_states)
--> 492         assert (A[g].sum(axis=0) == 1.0).all(), 'A matrix not normalized! Check your initialization....\n'
    493
    494     return A

AssertionError: A matrix not normalized! Check your initialization....
or unrelated?
Thanks
Originally posted by @osaaso3 in #26 (comment)
In line 149 of agent.py, this assertion statement is used:

assert all([n_c == max_action for (n_c, max_action) in zip(self.num_controls, list(np.max(all_policies, axis=0)+1))]), "Maximum number of actions is not consistent with `num_controls`"

But sometimes certain actions will not be allowed (they will be pruned / absent in policies), and thus, even when taking the maximum across all policies, you won't see the maximum action taken within a given control factor.
Therefore, @tverbele suggests changing this to:

assert all([n_c >= max_action for (n_c, max_action) in zip(self.num_controls, list(np.max(all_policies, axis=0)+1))]), "Maximum number of actions is not consistent with `num_controls`"
When going through mmp with @conorheins yesterday, I noticed this section:
# past message
if t == 0:
    lnB_past = spm_log(prior[f])
else:
    past_msg = B[f][:, :, int(policy[t - 1, f])].dot(qs_seq[t - 1][f])
    lnB_past = spm_log(past_msg)

# future message
if t >= future_cutoff:
    lnB_future = qs_T[f]
else:
    future_msg = trans_B[f][:, :, int(policy[t, f])].dot(qs_seq[t + 1][f])
    lnB_future = spm_log(future_msg)
I am not sure whether the past_msg and future_msg lines are consistent in their use of t-1, t, and t+1. I think this is worth checking against 1) the original SPM, to see what they do, and 2) a decision on what the correct behaviour should be.
Add better error/warning messages for when you try to initialise an Agent() instance with the MMP inference algorithm (inference_algo="MMP"), but that choice doesn't make sense given inconsistent other choices (policy_depth, etc.).
Hi, thank you for this package I'm really enjoying learning about active inference.
I deeply appreciate the contributors to this package.
However, I have a question when I tried to implement the explore-exploit task in this article (Smith et al., A step-by-step tutorial on active inference and its application to empirical data, https://doi.org/10.1016/j.jmp.2021.102632) which is already implemented in MATLAB and "pymdp".
I tried to run a loop for active inference with a deep policy (two time-steps) according to the "complete recipe for active inference" written in the "pymdp" tutorial notebook, but I found that the "sample_action" method of the "Agent" class only samples the action from the first timestep of each policy (each policy has shape (2,2): the first dim is the number of timesteps and the second dim is the number of factors), using the "control.sample_policy" function as below (lines 674-675, control.py):

for factor_i in range(num_factors):
    selected_policy[factor_i] = policies[policy_idx][0, factor_i]
My setting of the agent class was:
timepoints = [0,1,2]
agent = Agent(
    A = A_gm,
    B = B,
    C = C,
    D = D_gm,
    E = E,
    pA = pA,
    pD = pD,
    policies = policies,
    policy_len = policies[0].shape[0],
    inference_horizon = len(timepoints),
    inference_algo="MMP",
    sampling_mode="full",
    modalities_to_learn=[1],
    use_BMA = True,
    policy_sep_prior = False,
)
In my view, to sample the action at the other timesteps of each policy, line 675 would be better changed to:
selected_policy[factor_i] = policies[policy_idx][timestep, factor_i]
If I have misunderstood the package, please let me know how to correct my usage.
Thank you!
In line 95 of agent.py
assert utils.is_normalized(self.B), "A matrix is not normalized (i.e. A.sum(axis = 0) must all equal 1.0"
I believe this is an error, as it should be the B matrix not the A matrix.
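Presumably the corrected line would read:

assert utils.is_normalized(self.B), "B matrix is not normalized (i.e. B.sum(axis = 0) must all equal 1.0)"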
Originally posted by @mahault in #4 (comment)
Thanks @conorheins and @Arun-Niranjan,
things are getting better.
The A-matrix worked with the A-matrix you provided, but the B-matrix is still not fixed, I think. A sample pandas matrix might help.
An issue remains with T-maze-learning (the T-maze demo works fine); I paste it here, but I can open an issue if needed.
Best
Originally posted by @osaaso3 in #26 (comment)
Dear researchers,
Unfortunately, this wonderful Python toolbox for active inference cannot be opened in Colab since Thursday evening, 23.3.23. What can be done?
I hope this can be fixed soon! Thank you in advance.
Best
Susanne Valavanis
I see that there are a lot of print statements built into the library (e.g. print(f'Factorized version: lnA at time {t}: {lnA}') in pymdp/algos/mmp.py). It would be preferable to use a canonical logging library. Here is a short article making the case for using logging rather than raw print statements: https://www.loggly.com/blog/4-reasons-a-python-logging-library-is-much-better-than-putting-print-statements-everywhere/
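For illustration, a minimal sketch of the logging pattern (the function and logger names here are hypothetical):

import logging

logger = logging.getLogger(__name__)

def report_lnA(t, lnA):
    # silenced by default; enabled app-wide via logging.basicConfig(level=logging.DEBUG)
    logger.debug("Factorized version: lnA at time %s: %s", t, lnA)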
This PR should do the trick: #140
Add an option to control.sample_action(), or perhaps create an entirely new function in control.py, that allows you to directly sample a policy from q_pi and then take the action entailed by that policy along each control factor, at the first timestep. A rough sketch is below.
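A rough sketch of what such a function might look like (the helper name and signature are hypothetical, not the existing control.py API):

import numpy as np

def sample_policy_then_action(q_pi, policies, num_factors):
    # sample a policy index directly from the posterior over policies
    idx = np.random.choice(len(q_pi), p=q_pi)
    # take the action that policy entails at the first timestep, per control factor
    return np.array([policies[idx][0, f] for f in range(num_factors)])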
Thanks to @helenegu for requesting this
Dear sir,
In the official document "Active Inference Demo: Epistemological Chains", starting with the 0-th modality, the Location observation modality A[0]:

A[0] = np.tile(np.expand_dims(np.eye(num_grid_points), (-2, -1)), (1, 1, num_states[1], num_states[2]))

I ran it but got the error "A matrix is not normalized (i.e. A.sum(axis = 0) must all be equal to 1.0)"; it looks as though A is initialized but not normalized.
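One quick way to locate the offending columns when this assertion fires (a sketch reusing the demo's A and plain numpy):

import numpy as np

col_sums = A[0].sum(axis=0)                    # should be 1.0 for every state combination
bad = np.argwhere(~np.isclose(col_sums, 1.0))  # indices of unnormalized columns
print(bad)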
Add functions to the learning module (and update the corresponding update_A() and update_B() methods of the Agent class) to allow learning to incorporate an entire past sequence of hidden state posteriors, rather than just a single hidden state posterior.
I thought it'd be good to get a discussion going on how we represent distributions in pymdp.
Currently we have the Categorical and Dirichlet classes, which I believe we are trying to move away from, as they're tricky to use and mean that all of the algorithms which call them have to constantly check whether their inputs are one of these classes or not.
It seems for now we're moving towards using numpy arrays of numpy arrays to shift data around. IIUC, this is necessary because we often have varying substructures depending on the number of factors involved (e.g. we have multi-dimensional tensors representing an agent's priors, policy choices, etc.).
I previously considered whether we should write our own dataclass for representing what we need, but I think it's likely to end up the same way as the Dirichlet/Categorical classes. It would be a step forward, but still vulnerable to the same problems of being fragile to refactor and improve upon.
I suggest we use the third party Awkward Array to solve this problem for us. They seem to have a stable API, and we should be able to use any existing numpy methods with those arrays as long as they are not jagged (which we have to sort out anyway in the current implementation).
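For a flavor of the API, a small sketch (shapes hypothetical) of per-factor posteriors of different sizes living in one jagged container:

import awkward as ak
import numpy as np

# two hidden state factors with different cardinalities (3 and 5 states)
qs = ak.Array([
    [0.2, 0.3, 0.5],
    [0.1, 0.1, 0.2, 0.3, 0.3],
])

# numpy ufuncs dispatch through the jagged structure
log_qs = np.log(qs)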
I think using this library would provide a number of benefits.
There is always a risk of introducing third party dependencies, but I am of the opinion that if it's good enough for particle physics (tracks and decay events) at the scale CERN is dealing with, it is almost certainly good enough for us.
@conorheins @alec-tschantz what do you think? I could have a go at introducing it in a PR if you think this idea is worth investigating further
(tutorial video here is nice: https://www.youtube.com/watch?v=WlnUF3LRBj4)
@OzanCatalVerses pointed out that norm_dist within the pymdp.utils module strangely treats the case of 3-tensors differently than other numbers of dimensions; see the code below.
def norm_dist(dist):
    """ Normalizes a Categorical probability distribution (or set of them) assuming sufficient statistics are stored in leading dimension"""
    if dist.ndim == 3:
        new_dist = np.zeros_like(dist)
        for c in range(dist.shape[2]):
            new_dist[:, :, c] = np.divide(dist[:, :, c], dist[:, :, c].sum(axis=0))
        return new_dist
    else:
        return np.divide(dist, dist.sum(axis=0))
Will correct this to the following (NumPy broadcasting already handles the 3-tensor case identically):
def norm_dist(dist):
    """ Normalizes a Categorical probability distribution (or set of them) assuming sufficient statistics are stored in leading dimension"""
    return np.divide(dist, dist.sum(axis=0))
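A quick sanity check that broadcasting makes the simplified version match the old 3-tensor branch (arbitrary example shape):

import numpy as np

dist = np.random.rand(4, 4, 3)
normed = np.divide(dist, dist.sum(axis=0))   # the leading-axis sums broadcast back over axis 0
assert np.allclose(normed.sum(axis=0), 1.0)  # every column is now a distribution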
Attribute error in agent.py at line 124 (Line 124 in 4a81f1b) when num_controls is passed in as an argument to the agent.Agent() constructor. We need an else statement after line 118 (Line 118 in 4a81f1b) so that num_controls gets assigned to a property of self if provided as an argument.
Originally posted by @conorheins in #4 (reply in thread)
Hi~
I want to know whether this package can fit a model to behavioral data collected in my experiment?
Or are there future plans to implement this functionality?
Thanks~
There is no log in the cross-entropy on line 368.
Hello,
Thanks so much with your work on bringing Active Inference to Python.
While going through your epistemic chaining demo, it appears there is a problem with the agent's navigation. When I moved the cue1 location from (2,0) to (0,1), the agent takes two moves downward and then tries to move "left" into the wall. The agent never recovers from this and doesn't seem to know to try a different direction.
I assume this is a problem with the B matrix, but I'm not smart enough to figure out whether this is some challenge in the agent class or in the rules set up during the demo itself for (["UP", "DOWN", "LEFT", "RIGHT", "STAY"]).
Any help/advice would be greatly appreciated! Please see the output log from the agent's movements below. The only change I make to your demo is in the my_env section, where I change cue1_loc to (0,1); you'll see that once it completes the second action it tries to go LEFT... then STAY, then tries LEFT a few more times:
Action at time 0: DOWN
Grid location at time 0: (1, 0)
Reward at time 0: Null
Action at time 1: DOWN
Grid location at time 1: (2, 0)
Reward at time 1: Null
Action at time 2: LEFT
Grid location at time 2: (2, 0)
Reward at time 2: Null
Action at time 3: STAY
Grid location at time 3: (2, 0)
Reward at time 3: Null
Action at time 4: STAY
Grid location at time 4: (2, 0)
Reward at time 4: Null
Action at time 5: STAY
Grid location at time 5: (2, 0)
Reward at time 5: Null
Action at time 6: LEFT
Grid location at time 6: (2, 0)
Reward at time 6: Null
Action at time 7: LEFT
Grid location at time 7: (2, 0)
Reward at time 7: Null
Action at time 8: LEFT
Grid location at time 8: (2, 0)
Reward at time 8: Null
Action at time 9: LEFT
Grid location at time 9: (2, 0)
Reward at time 9: Null
Hi guys, I have a question about calculating the free energy. When calculating the einsum, why do we only sum down to the first dimension:
arg_list = [X, list(range(X.ndim))] + list(chain(*([x[xdim_i],[dims[xdim_i]]] for xdim_i in range(len(x))))) + [[0]]
and then use:
spm_dot(likelihood, qs)[0]
to fetch only the first level of s0 (supposing the hidden state s is factorized as s0, s1, ...)?
In my opinion, we should sum over the whole tensor by removing the final + [[0]], so that Y would just be a scalar.
Am I totally wrong? If I am, please correct me, thanks a lot!
Allow users to specify an optional list of hidden state factor indices when passing in the A and B arrays, specifying which hidden state factors correspond to the lagging dimensions of these tensors. The purpose of this feature is to speed up computation and limit memory consumption. It leverages the (often-satisfied) assumption in generative models that not all observation modalities depend on all hidden state factors. This means we don't have to include, in the likelihood tensors, the many lagging dimensions that correspond to hidden state factors the observation modality in question doesn't depend on. The same goes for the B arrays, which are already accordingly sparse due to the baked-in mean-field approximation of the posterior (statistical independence of hidden state factors in the prior).

This amounts to two new optional arguments (one for A and one for B), structured as follows:
- A_factor_list, where A_factor_list[m] contains a list of the hidden state factor indices that observation modality m is conditionally dependent on.
- B_factor_list, where B_factor_list[f] is the list of hidden state factor indices that are needed to predict the dynamics of factor f. For now we will assume control factors are still fully conditionally independent, such that a B[f] sub-array only depends on the control state factor with index f, even though it may depend on other hidden state factors.

Completing the above will also require building in the option of having additional dependencies in each factor-specific B array on the other hidden state factors. A sketch of the proposed arguments is below.
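For concreteness, a sketch of how these arguments might look for a hypothetical model with 2 observation modalities and 3 hidden state factors:

A_factor_list = [
    [0],     # modality 0 depends only on hidden state factor 0
    [0, 2],  # modality 1 depends on factors 0 and 2
]
B_factor_list = [
    [0],     # factor 0's dynamics depend only on itself
    [1, 2],  # factor 1's dynamics also condition on factor 2
    [2],     # factor 2's dynamics depend only on itself
]
# A[1] would then carry lagging dims (num_states[0], num_states[2]) only,
# rather than the dimensions of all three factors.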
In line 180 of agent.py, the error message should be changed to:
"D vector is not normalized (i.e. D[f].sum() must all equal 1.0)"
Originally posted by @helenegu in #73 (comment)
Bug found by @SamGijsen:
Running "stochastic" action selection throws an error when using the current version. I believe it's because actions are sampled even if num_controls=1, in which case utils.sample() attempts to squeeze an array of size 1. Reverting to deterministic selection if
num_controls==1
has fixed this for me.
(control.sample_action() line 565if action_selection == 'deterministic' or num_controls[factor_i]==1:
) Happy to submit a PR if that's useful, otherwise I believe it's just this line of code.
Originally posted by @SamGijsen in #82 (comment)
I really enjoyed this tutorial, overall. It shows the power of Agent. I think it's a little hard to read in some places, and I wanted to flag some of these issues:
One thing that's not explained here is how the agent's inference mechanism differs from those of other, better-known agents from the RL literature. Having read the first few chapters of Richard Sutton's book eons ago, I wondered whether the inference is equivalent to a finite-horizon dynamic programming solution, or similar in spirit to an agent that uses a UCB heuristic or Thompson sampling. A few references about this in the tutorial would be great.
Hello!
I couldn't find an implementation of habit learning (updating of the prior over policies, E) or of precision (gamma) updating through a beta prior.
Am I missing it? And if not, is it planned for future releases?
Thanks!
The following new arXiv reference may be worth noting in the README and/or docs:
https://arxiv.org/abs/2201.03904
pymdp: A Python library for active inference in discrete state spaces
When action_selection == "deterministic" in control.sample_action(), but all the action probabilities are equal, we should sample rather than deterministically choose the first action (which is the default behavior of np.argmax). A possible fix is sketched below.
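A possible fix, sketched here (the helper name is hypothetical):

import numpy as np

def argmax_random_tiebreak(p):
    best = np.flatnonzero(p == p.max())  # all indices attaining the maximum
    return np.random.choice(best)        # break ties uniformly at random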
Creating an object array from a list of numpy arrays seems to break if the component arrays of the list have a matching leading axis. For example
arrs = [np.zeros((3, 6)), np.zeros((3, 4, 5))]
obj_arr = np.array(arrs, dtype=object)
will throw the following error:
ValueError: could not broadcast input array from shape (3,6) into shape (3,)
whereas
arrs = [np.zeros((4, 6)), np.zeros((3, 4, 5))]
obj_arr = np.array(arrs, dtype=object)
works just fine.
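A common workaround (sketched here) is to pre-allocate the object array and fill it element by element, so numpy never tries to broadcast the components:

import numpy as np

arrs = [np.zeros((3, 6)), np.zeros((3, 4, 5))]

obj_arr = np.empty(len(arrs), dtype=object)  # 1-D container of python objects
for i, a in enumerate(arrs):
    obj_arr[i] = a                           # no broadcasting is attempted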
Hi all,
First off, many thanks for your work, it looks very promising!
I was comparing some simple agents between the current implementation and the MATLAB one, and noticed that in terms of reversal learning the pymdp version appears to adapt considerably more slowly. I've played around with a variety of setups and hyperparameters, but the difference is quite significant.
Example setup: slot machine task without hints
2 actions: 'button' 0 and 'button' 1
A 'button 0 is better' context and a 'button 1 is better' context.
40 trials, with the hidden context switching after 20 trials. Here I plot the state posterior (black) of 100 agents starting with flat context priors, compared to the true but unknown state/context (red). Below I'll include the pymdp code. I'm assuming I'm using the package wrong, and would love to know my misunderstanding.
import pymdp
from pymdp import utils
from pymdp.agent import Agent
import numpy as np
import matplotlib.pyplot as plt

num_obs = [3, 3]     # 3 Rewards, 3 choice observations
num_states = [3, 2]  # 3 choice states, 2 hidden states
num_factors = len(num_states)

# Press one of two buttons
num_controls = [2, 1]

A_shapes = [[o_dim] + num_states for o_dim in num_obs]

# initialize the A array to all 0's
A = utils.obj_array_zeros(A_shapes)

# reward probabilities
r1 = 0.9
r2 = 0.9

# Reward observations
# STATE 1         Start  a0    a1
A[0][0, :, 0] = [0, 1 - r1, r2    ]  # Positive reward
A[0][1, :, 0] = [0, r1    , 1 - r2]  # Negative reward
A[0][2, :, 0] = [1, 0     , 0     ]  # Neutral (start state)
# STATE 2         Start  a0    a1
A[0][0, :, 1] = [0, r1    , 1 - r2]  # Positive
A[0][1, :, 1] = [0, 1 - r1, r2    ]  # Negative
A[0][2, :, 1] = [1, 0     , 0     ]  # Neutral (start state)

# No uncertainty about choice observations
A[1][:, :, 0] = np.eye(num_obs[1])
A[1][:, :, 1] = np.eye(num_obs[1])

B_shapes = [[s_dim, s_dim, num_controls[f]] for f, s_dim in enumerate(num_states)]
B = utils.obj_array_zeros(B_shapes)

for i in range(2):
    B[0][0, :, i] = np.ones(3)
B[0][:, 0, 0] = [0, 1, 0]  # action 0: Start -> a0
B[0][:, 0, 1] = [0, 0, 1]  # action 1: Start -> a1
B[1][:, :, 0] = np.eye(num_states[1])

C = utils.obj_array_zeros(num_obs)
C[0] = np.array([1, -1, 0])  # Prefer rewards

D = utils.obj_array_uniform(num_states)
D[0] = np.array([1, 0, 0])  # Start in the 'start'-state

# ENVIRONMENT
class my_env():
    def __init__(self, A):
        self.A = A

    def step(self, action, state):
        obs = utils.sample(self.A[0][:, action[0].astype(int) + 1, state])
        return [obs, action[0].astype(int) + 1]

# SIMULATIONS
T = 40
alpha = 16
gamma = 16
AS = "deterministic"

for run in range(100):
    D2 = D.copy()
    model = Agent(A=A, B=B, C=C, D=D, policy_len=1, action_selection=AS)
    switches = [20, 40, 50]
    state = 0
    states = []
    pstate = []
    pact = []
    e = my_env(A)
    model.infer_states([2, 0])  # 2 = neutral obs, 0 = start state
    for t in range(T):
        # if t > 0:
        #     D2[1] = model.qs[1]
        #     model = Agent(A=A, B=B, C=C, D=D2, policy_len=1, action_selection=AS)
        if t in switches:
            state = 1 - state
        states.append(state)
        # Start position for the trial (I believe you don't use this in the
        # tutorial, but it doesn't seem to matter much)
        model.infer_states([2, 0])  # 2 = neutral reward, 0 = start state observation
        q_pi, neg_efe = model.infer_policies()
        action = model.sample_action()
        obs = e.step(action, state=state)
        model.infer_states(obs)
        # Save belief and output
        pstate.append(model.qs[1][0])
        pact.append(q_pi[0])
    plt.plot([1 - s for s in pstate], label="p(s)", linewidth=3, alpha=0.1, color='k')

plt.plot([s * 1.1 - 0.05 for s in states], label="s", color="r", linewidth=3)
plt.xlabel("trial")
plt.ylim([-0.05, 1.05])
plt.title("Python")
There are two instances that read THe, which should be changed to The.
A heading is mislabeled:
Should read: