Evolutionary Strategies for Reinforcement Learning: CartPole

Running

python3 es-cartpole.py

The above command will run the program.

Algorithm Explanation

This program combines Lambda-Mu Evolutionary Strategies, commonly written (mu, lambda)-ES, with a neural network that has no hidden layers (in effect, a linear combination of the inputs) to choose an action at each time step.

The evolution process is as follows:

Generate a population of lambda random agents
Evaluate each agent by running it in the environment
Select the mu best-scoring agents (mu must divide lambda)
These mu agents ("parents") each produce lambda / mu "children"
  To produce a child, the "parent" is blurred with random noise
The children become the new population, and the cycle repeats from the evaluation step

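Note that this is not Lambda+Mu ES, which is elitist: here the parents are discarded after reproducing, and only their children form the next generation.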
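
The loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in es-cartpole.py: the function name `evolve`, the parameter `sigma`, the Gaussian form of the noise, and the Gaussian initial population are all assumptions, since the description only says the parent is "blurred with random noise". `toy_fitness` is a hypothetical stand-in for running an agent in the environment.

```python
import random


def evolve(fitness, n_params, lam=20, mu=5, sigma=0.1, generations=50, seed=0):
    """Non-elitist loop: keep the mu best agents, each spawns lam // mu noisy children."""
    if lam % mu != 0:
        raise ValueError("mu must divide lambda")
    rng = random.Random(seed)

    # Generate a population of lam random agents (each agent is a list of floats).
    # Assumption: initial parameters are drawn from a standard normal distribution.
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(lam)]
    best_per_generation = []

    for _ in range(generations):
        # Evaluate every agent and sort from best to worst score.
        scored = sorted(
            ((fitness(agent), agent) for agent in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        best_per_generation.append(scored[0][0])

        # Select the mu best-scoring agents as parents.
        parents = [agent for _, agent in scored[:mu]]

        # Each parent produces lam // mu children by adding noise to its parameters.
        # Assumption: the noise is Gaussian with standard deviation sigma.
        # The parents themselves are not carried over.
        population = [
            [w + rng.gauss(0.0, sigma) for w in parent]
            for parent in parents
            for _ in range(lam // mu)
        ]

    return population, best_per_generation


def toy_fitness(agent):
    # Hypothetical objective: higher is better, maximal when every parameter is 1.0.
    return -sum((w - 1.0) ** 2 for w in agent)


final_population, history = evolve(toy_fitness, n_params=5, lam=20, mu=5, generations=40)
print("best score in first generation:", history[0])
print("best score in last generation:", history[-1])
```

Because the parents are discarded, the best score is not guaranteed to rise every generation, which is the practical difference from the elitist Lambda+Mu variant.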

An agent makes decisions using the following policy:

We suppose the observation has N real values ([o_1, ... , o_N]).
The agent has N real valued weights ([w_1, ... , w_N]).
The agent also has a real valued bias (b).
An action decision is made by linearly combining the weights/bias and the observation,
  then squishing via sigmoid to (0,1), then stretching this range out to the number
  of possible actions (A), then mapping this continuous space to the discrete space of actions
  via flooring.

Stated mathematically,

action = floor(A * σ(w_1*o_1 + w_2*o_2 + ... + w_N*o_N + b))

This mapping from weights and bias to an output can be viewed as a neural network with no hidden layers, connecting the inputs directly to a single output
(which is then mapped to the discrete action space).
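
As a rough sketch, the policy formula can be written as a small Python function. The names `sigmoid` and `choose_action` are hypothetical and not taken from es-cartpole.py. The final clamp is an added safeguard: mathematically the sigmoid never reaches 1, but in floating point it can round to exactly 1.0 for large inputs, which would otherwise yield the invalid action A.

```python
import math


def sigmoid(x):
    # Logistic function, written so that math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def choose_action(weights, bias, observation, num_actions):
    # Linear combination of the observation with the weights, plus the bias.
    z = sum(w * o for w, o in zip(weights, observation)) + bias
    # Squash to (0, 1), stretch to (0, num_actions), then floor to a discrete action.
    action = math.floor(num_actions * sigmoid(z))
    # Safeguard (an assumption, not stated in the formula): keep the result in range
    # when the sigmoid rounds to exactly 1.0.
    return min(action, num_actions - 1)


# Example with 4 observation values and 2 actions, as in CartPole.
print(choose_action([1.0, 1.0, 1.0, 1.0], 0.0, [0.1, 0.1, 0.1, 0.1], 2))
print(choose_action([1.0, 0.0, 0.0, 0.0], 0.0, [-1.0, 0.0, 0.0, 0.0], 2))
```

With A = 2 the policy reduces to a sign test: the action is 1 when the linear combination is non-negative and 0 otherwise.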

Output Explanation

Every generation, the score of each individual in the population is printed, along with the score of the best-performing individual and the average score across the population.

At the end of the program, a graph of the best score in each generation is displayed.

CartPole Environment

This program can play "CartPole-v0" and "CartPole-v1" from OpenAI Gym. The code is written generically and may be adapted to play other games in future projects. In CartPole, the player must push a cart either left or right at each timestep to keep a pole balanced upright, like balancing a pencil on your palm. A state observation has 4 variables (cart position, cart velocity, pole angle, and pole angular velocity), and the score is the number of timesteps the agent lasts before the pole tips past a certain angle. The action space has only 2 elements: a left push or a right push. There is a maximum score, because the default CartPole environment cuts off the simulation after a set number of timesteps (200 for CartPole-v0, 500 for CartPole-v1).
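
Scoring an agent amounts to running one episode and summing the reward, which in CartPole is 1 per timestep survived. The sketch below is an assumption about how that could look, not the code in es-cartpole.py. `run_episode` is a hypothetical helper that accepts either step signature found across Gym versions (4 return values in older releases, 5 in newer Gym and Gymnasium). `PushRightStub` is a made-up stand-in so the example runs without Gym installed; it does not simulate CartPole physics.

```python
def run_episode(env, policy, max_steps=10_000):
    """Run one episode and return the total reward collected."""
    result = env.reset()
    # Newer Gym/Gymnasium return (observation, info); older Gym returns the observation alone.
    # Assumption: the observation itself is not a tuple, which holds for CartPole.
    observation = result[0] if isinstance(result, tuple) else result

    total_reward = 0.0
    for _ in range(max_steps):
        step = env.step(policy(observation))
        if len(step) == 5:
            # Newer API: separate flags for failure and for the time limit.
            observation, reward, terminated, truncated, _ = step
            done = terminated or truncated
        else:
            # Older API: a single done flag.
            observation, reward, done, _ = step
        total_reward += reward
        if done:
            break
    return total_reward


class PushRightStub:
    """Hypothetical stand-in environment: fails on action 0, hits a time limit otherwise."""

    def __init__(self, limit=200):
        self.limit = limit
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0], {}

    def step(self, action):
        self.t += 1
        terminated = action == 0          # the "pole falls" as soon as action 0 is taken
        truncated = self.t >= self.limit  # time limit, like CartPole's step cutoff
        return [0.0, 0.0, 0.0, 0.0], 1.0, terminated, truncated, {}


print(run_episode(PushRightStub(limit=200), lambda obs: 1))
print(run_episode(PushRightStub(limit=200), lambda obs: 0))
```

With Gym installed, the stub would be replaced by an environment created with `gym.make("CartPole-v1")`, and the policy by an agent's action function.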

Contributors

mparigi