Giter Club home page Giter Club logo

reinforcement-learning-an-introduction's Introduction

Reinforcement Learning: An Introduction

Build Status

Python code for Sutton & Barto's book Reinforcement Learning: An Introduction (2nd Edition)

If you have any confusion about the code or want to report a bug, please open an issue instead of emailing me directly.

Contents

Chapter 1

  1. Tic-Tac-Toe

Chapter 2

  1. Figure 2.1: An exemplary bandit problem from the 10-armed testbed
  2. Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed
  3. Figure 2.3: Optimistic initial action-value estimates
  4. Figure 2.4: Average performance of UCB action selection on the 10-armed testbed
  5. Figure 2.5: Average performance of the gradient bandit algorithm
  6. Figure 2.6: A parameter study of the various bandit algorithms

Chapter 3

  1. Figure 3.2: Grid example with random policy
  2. Figure 3.5: Optimal solutions to the gridworld example

Chapter 4

  1. Figure 4.1: Convergence of iterative policy evaluation on a small gridworld
  2. Figure 4.2: Jack’s car rental problem
  3. Figure 4.3: The solution to the gambler’s problem

Chapter 5

  1. Figure 5.1: Approximate state-value functions for the blackjack policy
  2. Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES
  3. Figure 5.3: Weighted importance sampling
  4. Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates

Chapter 6

  1. Example 6.2: Random walk
  2. Figure 6.2: Batch updating
  3. Figure 6.3: Sarsa applied to windy grid world
  4. Figure 6.4: The cliff-walking task
  5. Figure 6.6: Interim and asymptotic performance of TD control methods
  6. Figure 6.7: Comparison of Q-learning and Double Q-learning

Chapter 7

  1. Figure 7.2: Performance of n-step TD methods on 19-state random walk

Chapter 8

  1. Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps
  2. Figure 8.4: Average performance of Dyna agents on a blocking task
  3. Figure 8.5: Average performance of Dyna agents on a shortcut task
  4. Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task
  5. Figure 8.7: Comparison of efficiency of expected and sample updates
  6. Figure 8.8: Relative efficiency of different update distributions

Chapter 9

  1. Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task
  2. Figure 9.2: Semi-gradient n-steps TD algorithm on the 1000-state random walk task
  3. Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task
  4. Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy
  5. Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task

Chapter 10

  1. Figure 10.1: The cost-to-go function for Mountain Car task in one run
  2. Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task
  3. Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task
  4. Figure 10.4: Effect of the alpha and n on early performance of n-step semi-gradient Sarsa
  5. Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task

Chapter 11

  1. Figure 11.2: Baird's Counterexample
  2. Figure 11.6: The behavior of the TDC algorithm on Baird’s counterexample
  3. Figure 11.7: The behavior of the ETD algorithm in expectation on Baird’s counterexample

Chapter 12

  1. Figure 12.3: Off-line λ-return algorithm on 19-state random walk
  2. Figure 12.6: TD(λ) algorithm on 19-state random walk
  3. Figure 12.8: True online TD(λ) algorithm on 19-state random walk
  4. Figure 12.10: Sarsa(λ) with replacing traces on Mountain Car
  5. Figure 12.11: Summary comparison of Sarsa(λ) algorithms on Mountain Car

Chapter 13

  1. Example 13.1: Short corridor with switched actions
  2. Figure 13.1: REINFORCE on the short-corridor grid world
  3. Figure 13.2: REINFORCE with baseline on the short-corridor grid-world

Environment

Usage

All files are self-contained

python any_file_you_want.py

Contribution

If you want to contribute some missing examples or fix some bugs, feel free to open an issue or make a pull request.

Following are missing figures/examples:

  • Figure 12.14: The effect of λ

reinforcement-learning-an-introduction's People

Contributors

aoboturov avatar cbrom avatar datahaki avatar findmyway avatar gwding avatar huanghua1668 avatar justinnie avatar kentan avatar loopinvariant4 avatar ndvanforeest avatar sergii-bond avatar shangtongzhang avatar tegg89 avatar vfdev-5 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.