plot_policies.py: Plotting policies as a 2D heat map.
policy_eval_recurrent.py: Evaluation of the policies produced by the SAL and AL algorithms.
policy_evaluate.py: Plotting the similarity curve between the target policy and the produced policies.
policy_mixer.py: Mixing historic policies into new policies for AL or SAL.
quad_env.py: Quadrotor environment for simulations.
rewardconstruct.py: Reward function construction.
rl_3D.py: RL simulation with Q-learning.
rl_3d_agent.py: Q-learning agent.
rl_policy_dir.py: Simulation that uses policies directly rather than state-action value functions.
state_action_value.py: State-action value function.
How to use
Run RL (Q-learning) experiments: Open rl_3D.py and scroll down to the last part. Under the if __name__ == "__main__": block, change the parameters, then run.
Run RL (Q-learning) with policy files: Policy files contain numbers in [0, 1], each giving the probability of selecting "action 1". Open rl_policy_dir.py and scroll down.
Choose the directory that contains the policy files, then run.
Run AL and SAL experiments: Open experiments.py or experiments_sal.py, enter directories and other parameters, then run.
Plot AL and SAL results: Open plot_AL_results.py, plot_SAL_AL.py, plot_SAL_results.py or plot_timetraces.py, select directories, then run.
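For the policy-file format mentioned above (numbers in [0, 1] read as the probability of selecting "action 1"), the following is a minimal sketch of how such values could be turned into sampled actions. It is not taken from rl_policy_dir.py: the function names sample_action and sample_actions, the in-memory list standing in for a loaded policy, and the label 0 for the other action are all assumptions made for illustration.

```python
import random


def sample_action(p_action_1, rng):
    """Return 1 with probability p_action_1, otherwise 0.

    Labelling the other action as 0 is an assumption of this sketch.
    """
    if not 0.0 <= p_action_1 <= 1.0:
        raise ValueError("policy entries must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so p = 0.0 never yields 1
    # and p = 1.0 always yields 1.
    return 1 if rng.random() < p_action_1 else 0


def sample_actions(policy, rng):
    """Sample one action per policy entry (one entry per state)."""
    return [sample_action(p, rng) for p in policy]


# Hypothetical policy values; real ones would be read from a policy file.
policy = [0.0, 0.25, 0.5, 1.0]
rng = random.Random(0)  # fixed seed so repeated runs give the same draws
actions = sample_actions(policy, rng)
```

Entries of exactly 0 or 1 make the choice deterministic, so a policy file containing only those two values acts as a deterministic policy under this reading.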
Notice
The default number of repeats for AL and SAL is 10. A run with this setting usually takes more than 4 hours to finish.