A Deep Q-Network (DQN) agent for the LunarLander-v2 environment in OpenAI Gym.
This code implements the DQN algorithm with experience replay and a target network.
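As a rough illustration of the two ingredients named above, here is a minimal sketch in plain Python. It is not the code in dqn.py: the `ReplayBuffer` class, the `td_target` helper, the buffer capacity, and the discount factor `gamma=0.99` are all assumptions chosen for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions (hypothetical helper, not from dqn.py)."""

    def __init__(self, capacity=10000, seed=None):
        # deque with maxlen drops the oldest transition once capacity is reached
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Bootstrap target for one transition.

    next_q_values are assumed to come from the target network, i.e. a
    periodically synchronized copy of the online network.
    """
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_values)
```

In a training loop, each environment step would be pushed into the buffer, and a sampled mini-batch would be regressed toward `td_target` values computed with the target network's outputs.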
- TensorFlow v2
pip install gym
pip install box2d-py
pip install matplotlib
python dqn.py
During the training phase the agent uses an epsilon-greedy policy.
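For readers unfamiliar with epsilon-greedy exploration, the idea can be sketched as follows. The function names, the multiplicative decay rate of 0.995, and the floor of 0.01 are illustrative assumptions; the schedule actually used in dqn.py may differ.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.01):
    """Shrink epsilon multiplicatively, never going below epsilon_min (assumed schedule)."""
    return max(epsilon_min, epsilon * decay)


# LunarLander-v2 has four discrete actions, so q_values has four entries.
epsilon = 1.0
action = select_action([0.1, 0.9, 0.3, 0.2], epsilon)
epsilon = decay_epsilon(epsilon)
```

With epsilon at 1.0 every action is random; as epsilon decays, the agent increasingly picks the action with the highest predicted Q-value.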
The network weights are saved every 25 episodes.
The chart below shows the moving averages of reward and time steps per episode, as well as the loss values and the decay of epsilon over the episodes.
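A moving average like the one plotted can be computed from per-episode values with a short helper. This is only a sketch: the function name and the default window of 100 episodes are assumptions, not values taken from the repository.

```python
def moving_average(values, window=100):
    """Trailing mean over the last `window` values (shorter at the start)."""
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # drop the value that just left the window
            running_sum -= values[i - window]
        averages.append(running_sum / min(i + 1, window))
    return averages
```

The result has the same length as the input, so it can be passed directly to matplotlib alongside the raw per-episode series.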
Once checkpoints and log files have been created, you can test the trained agent by changing lines 116 and 117 in tester.py.
The current tester code loads the weights of an already trained network.
python tester.py