SARSA: Windy Gridworld

This is my attempt at implementing on-policy Sarsa to solve Windy Gridworld, a task described on page 130 of Reinforcement Learning: An Introduction by Sutton & Barto. The Sarsa algorithm is described on the same page.
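As a rough illustration of the Sarsa update described in the book, here is a minimal sketch. It is not the code from this repository: the names `QTable`, `kNumActions`, and `sarsaUpdate` are hypothetical, and the four-action table layout is an assumption. The book's example uses a step size of 0.5 with no discounting, but the values used here may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumActions = 4;  // assumed: up, down, left, right
using QTable = std::vector<std::array<double, kNumActions>>;  // q[state][action]

// One Sarsa update:
//   Q(S,A) <- Q(S,A) + alpha * (R + gamma * Q(S',A') - Q(S,A))
// Pass terminal = true when S' is the goal so its value counts as 0.
void sarsaUpdate(QTable& q, std::size_t s, std::size_t a, double reward,
                 std::size_t sNext, std::size_t aNext, bool terminal,
                 double alpha, double gamma) {
    assert(s < q.size() && sNext < q.size());
    assert(a < kNumActions && aNext < kNumActions);
    const double nextValue = terminal ? 0.0 : q[sNext][aNext];
    const double tdError = reward + gamma * nextValue - q[s][a];
    q[s][a] += alpha * tdError;
}
```

The update is on-policy because `aNext` is the action the agent will actually take in the next state, not the best available one.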

Both policies start out stochastic, but over time both will become deterministic. The two policies should therefore be ε-greedy, which ensures exploration. ε-greedy policies are described on page 100.
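A minimal sketch of ε-greedy action selection, assuming the action values for the current state are available as a plain vector. The function name `epsilonGreedy` and its signature are hypothetical and not taken from this repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Returns an action index: uniformly random with probability epsilon,
// otherwise the action with the highest estimated value (first one on ties).
std::size_t epsilonGreedy(const std::vector<double>& qValues, double epsilon,
                          std::mt19937& rng) {
    assert(!qValues.empty());
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<std::size_t> pick(0, qValues.size() - 1);
        return pick(rng);
    }
    const auto best = std::max_element(qValues.begin(), qValues.end());
    return static_cast<std::size_t>(std::distance(qValues.begin(), best));
}
```

With epsilon set to 0 the choice is purely greedy, which is the deterministic behaviour the policies converge to.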

To run the algorithm, type make AI in a Linux terminal. It will then show you the Episodes, Steps, Last total steps, and the total timesteps. Pay close attention to "Last total steps", as that tells you how many timesteps the agent (a.k.a. the computer) took to reach the goal state in the last episode. If that number is around 15-20 after ~8000 total timesteps, the agent has been trained successfully on the environment. Hit b to stop the program.

You can also run the algorithm to watch the computer play. Type make AISim in your Linux terminal. This will show you the Last total reward, the total timesteps, the episode, the steps, the explore toggle, and the game itself. If the simulation is too fast or too slow, change the delay variable (in milliseconds) in AISim.cpp. Hit b to end the program (you may need to press it multiple times). The last total reward is the last total timesteps negated, since each action results in a reward of -1.
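To make the link between reward and timesteps concrete, here is a sketch of the environment dynamics. The grid size, wind strengths, start, and goal are taken from the Windy Gridworld example in Sutton & Barto and are assumed to match this repository. The names `State`, `Action`, `step`, and `totalReward` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Layout assumed from the book's example; the repository's constants may differ.
constexpr int kRows = 7, kCols = 10;
constexpr std::array<int, kCols> kWind = {0, 0, 0, 1, 1, 1, 2, 2, 1, 0};

struct State {
    int row, col;  // row 0 is the top of the grid
    bool operator==(const State& o) const { return row == o.row && col == o.col; }
};
constexpr State kStart{3, 0}, kGoal{3, 7};

enum class Action { Up, Down, Left, Right };

// Applies the action plus the upward wind of the column the agent is leaving,
// then clamps the result to the grid.
State step(State s, Action a) {
    int row = s.row - kWind[s.col];
    int col = s.col;
    switch (a) {
        case Action::Up:    row -= 1; break;
        case Action::Down:  row += 1; break;
        case Action::Left:  col -= 1; break;
        case Action::Right: col += 1; break;
    }
    return {std::max(0, std::min(row, kRows - 1)),
            std::max(0, std::min(col, kCols - 1))};
}

// Plays a fixed action sequence from the start. Every step costs -1, and the
// episode stops early once the goal is reached. Returns the total reward.
int totalReward(const std::vector<Action>& actions, State& finalState) {
    State s = kStart;
    int reward = 0;
    for (Action a : actions) {
        if (s == kGoal) break;
        s = step(s, a);
        reward -= 1;
    }
    finalState = s;
    return reward;
}
```

Under these assumptions, moving right nine times, down four times, and left twice reaches the goal in 15 steps for a total reward of -15.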

The explore toggle decides whether the agent tries out new actions. It is usually switched off (0 = off, 1 = on in the simulation) once the agent has been trained for a decent number of timesteps. If the agent does not explore at the beginning of training, it will perform fairly poorly in the environment. The agent should be trained after ~8000 total timesteps.
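One way to picture the toggle is as a switch in front of the exploration step, as in the sketch below. The `Policy` struct, its fields, and the default exploration rate of 0.1 are assumptions for illustration only, not the repository's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical policy wrapper; `explore` mirrors the 0/1 toggle in the simulation.
struct Policy {
    bool explore = true;   // 1 = on, 0 = off
    double epsilon = 0.1;  // assumed exploration rate

    // With explore on, picks a random action with probability epsilon.
    // With explore off, always returns the highest-valued action.
    std::size_t act(const std::vector<double>& qValues, std::mt19937& rng) const {
        assert(!qValues.empty());
        if (explore && std::bernoulli_distribution(epsilon)(rng)) {
            std::uniform_int_distribution<std::size_t> pick(0, qValues.size() - 1);
            return pick(rng);
        }
        const auto best = std::max_element(qValues.begin(), qValues.end());
        return static_cast<std::size_t>(std::distance(qValues.begin(), best));
    }
};
```

Switching `explore` off after training makes the agent replay what it has learned without any random moves.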

If you have any questions, you are welcome to email me at [email protected]
