
RLLib

(C++ Template Library to Predict, Control, Learn Behaviors, and Represent Learnable Knowledge using On/Off Policy Reinforcement Learning)

RLLib is a lightweight C++ template library that implements incremental, standard, and gradient temporal-difference learning algorithms in Reinforcement Learning. It is an optimized library for robotic applications and embedded devices that operates under fast duty cycles (e.g., < 30 ms). RLLib has been tested and evaluated on RoboCup 3D soccer simulation agents, physical NAO V4 humanoid robots, and Tiva C series launchpad microcontrollers to predict, control, learn behaviors, and represent learnable knowledge. The implementation of the RLLib library is inspired by the RLPark API, which is a library of temporal-difference learning algorithms written in Java.

Features

  • Off-policy prediction algorithms:
    • GTD(λ)
    • GTD(λ)True
    • GQ(λ)
  • Off-policy control algorithms:
    • Q(λ)
    • Greedy-GQ(λ)
    • Softmax-GQ(λ)
    • Off-PAC (can also be used in the on-policy setting)
  • On-policy algorithms:
    • TD(λ)
    • TD(λ)AlphaBound
    • TD(λ)True
    • Sarsa(λ)
    • Sarsa(λ)AlphaBound
    • Sarsa(λ)True
    • Sarsa(λ)Expected
    • Actor-Critic (continuous and discrete actions, discounted and average reward settings, and so on)
  • Supervised learning algorithms:
    • Adaline
    • IDBD
    • K1
    • SemiLinearIDBD
    • Autostep
  • Policies: Random, RandomX%Bias, Greedy, Epsilon-greedy, Boltzmann, Normal, and Softmax.
  • Dot product: An efficient implementation of the dot product for tile coding based feature representations (with culling traces).
  • Benchmarking environments: Mountain Car, Mountain Car 3D, Swinging Pendulum, Continuous Grid World, Bicycle, Cart Pole, Acrobot, Non-Markov Pole Balancing, and Helicopter.
  • Optimization: Optimized for very fast duty cycles (e.g., with culling traces; RLLib has been tested on the RoboCup 3D simulator agent and on the NAO V4 cognition thread).
  • Usage: The algorithm usage closely follows RLPark, so the learning curve is short.
  • Examples: A plethora of examples demonstrate on-policy and off-policy control experiments.
  • Visualization: We provide a Qt4 based application to visualize benchmark problems.


New: OpenAI Gym Binding

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. We have developed a bridge between Gym and RLLib, so that all the functionality provided by Gym can be used while the agents (on/off-policy) are written in RLLib. The openai_gym directory contains our bridge as well as RLLib agents that learn and control the classic control environments.

Extension

  • Extension for Tiva C Series EK-TM4C123GXL LaunchPad, and Tiva C Series TM4C129 Connected LaunchPad microcontrollers.

  • Tiva C series launchpad microcontrollers: https://github.com/samindaa/csc688

Demo

  • Off-PAC on ContinuousGridworld
  • AverageRewardActorCritic on SwingPendulum (continuous actions)

Usage

RLLib is a C++ template library. The header files are located in the include directory. You can simply add this directory to your project's include path (e.g., -I./include) to access the algorithms.

To access the control algorithms:

#include "ControlAlgorithm.h"

To access the prediction algorithms:

#include "PredictorAlgorithm.h"

To access the supervised learning algorithms:

#include "SupervisedAlgorithm.h"

RLLib uses the namespace:

using namespace RLLib;

Testing

RLLib provides a flexible testing framework. Follow these steps to quickly write a test case.

  • To access the testing framework: #include "HeaderTest.h"

#include "HeaderTest.h"

RLLIB_TEST(YourTest)

class YourTestTest: public YourTestBase
{
  public:
    YourTestTest() {}

    virtual ~YourTestTest() {}
    void run();

  private:
    void testYourMethod();
};

void YourTestTest::testYourMethod() {/** Your test code */}

void YourTestTest::run() { testYourMethod(); }
  • Add YourTest to the test/test.cfg file.
  • You can use @YourTest to execute only YourTest. For example, to execute only the MountainCar test cases, use @MountainCarTest.

Test Configuration

We use CMake >= 2.8.7 to build and run the test suite:

  • mkdir build
  • cd build; cmake ..
  • make -j

Visualization

RLLib provides a Qt-based visualization tool for Reinforcement Learning problems and algorithms, named RLLibViz. Currently, RLLibViz visualizes the following problems and algorithms:

  • On-policy:

    • SwingPendulum problem with continuous actions, using the AverageRewardActorCritic algorithm.
  • Off-policy:

    • ContinuousGridworld and MountainCar problems with discrete actions, using the Off-PAC algorithm.
  • In order to run the visualization tool, you need to have QT4.8 installed on your system.

  • In order to install RLLibViz:

    • Change directory to visualization/RLLibViz
    • qmake RLLibViz.pro
    • make -j
    • ./RLLibViz

Documentation

Operating Systems

  • Ubuntu >= 11.04
  • Windows (Visual Studio 2013)
  • Mac OS X

TODO

  • Variable action per state.
  • Non-linear algorithms.
  • Deep learning algorithms.

Publications

Contact

Saminda Abeyruwan, PhD ([email protected], [email protected])

rllib's People

Contributors

samindaa


rllib's Issues

Compiling error

I met a compile error on Ubuntu 12.10:

$make
g++ -I. -I./src -Wall -Werror -O3 simulation/Main.cpp -o Main
g++ -I. -I./src -Wall -Werror -O3 test/VectorTest.cpp -o VectorTest
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testActiveIndices()’:
test/VectorTest.cpp:85:41: error: taking address of temporary array
test/VectorTest.cpp:88:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testSparseVectorSet()’:
test/VectorTest.cpp:114:36: error: taking address of temporary array
test/VectorTest.cpp:118:36: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testSetEntry()’:
test/VectorTest.cpp:143:41: error: taking address of temporary array
test/VectorTest.cpp:149:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testPlus()’:
test/VectorTest.cpp:175:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testMinus()’:
test/VectorTest.cpp:185:44: error: taking address of temporary array
test/VectorTest.cpp:188:43: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testMapTimes()’:
test/VectorTest.cpp:200:43: error: taking address of temporary array
test/VectorTest.cpp:203:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testMaxNorm()’:
test/VectorTest.cpp:212:43: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testEbeMultiply()’:
test/VectorTest.cpp:272:25: error: taking address of temporary array
test/VectorTest.cpp:275:26: error: taking address of temporary array
test/VectorTest.cpp:279:27: error: taking address of temporary array
make: *** [VectorTest] Error 1

$ g++ --version
g++ (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

License?

What License is this under? May I use it for a personal project?

ActorLambda::initialize calls Actor::initialize with an extra parameter

I am trying to wrap this library using SWIG. However, the compiler complains that Base, which is of type Actor (in file ControlAlgorithm.h) only provides a method initialize() with no parameters, while method ActorLambda::initialize calls it passing an argument.
These are lines 736 and 748--752 of file ControlAlgorithm.h

      typedef Actor<T> Base;
      void initialize(const Vector<T>* x)
      {
        Base::initialize(x);
        e->clear();
      }

and this the output of my compiler:

In file included from RLLib_/Function.h:25:0,
                 from PyRLLib_wrap.cxx:3163:
RLLib_/Vector.h: In instantiation of ‘RLLib::Vectors_<T>::Vectors_(const RLLib::Vectors_<T>&) [with T = double]’:
PyRLLib_wrap.cxx:8792:111:   required from here
RLLib_/Vector.h:1173:63: error: conversion from ‘RLLib::Vectors_<double>::const_iterator {aka __gnu_cxx::__normal_iterator<RLLib::Vector_<double>* const*, std::vector<RLLib::Vector_<double>*, std::allocator<RLLib::Vector_<double>*> > >}’ to non-scalar type ‘RLLib::Vectors_<double>::iterator {aka __gnu_cxx::__normal_iterator<RLLib::Vector_<double>**, std::vector<RLLib::Vector_<double>*, std::allocator<RLLib::Vector_<double>*> > >}’ requested
         for (typename Vectors_<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
                                                               ^
In file included from PyRLLib_wrap.cxx:3172:0:
RLLib_/ControlAlgorithm.h: In instantiation of ‘void RLLib::ActorLambda_<T>::initialize(const RLLib::Vector_<T>*) [with T = double]’:
PyRLLib_wrap.cxx:28151:60:   required from here
RLLib_/ControlAlgorithm.h:750:27: error: no matching function for call to ‘RLLib::ActorLambda_<double>::initialize(const RLLib::Vector_<double>*&)’
         Base::initialize(x);
                           ^
RLLib_/ControlAlgorithm.h:750:27: note: candidate is:
RLLib_/ControlAlgorithm.h:691:12: note: void RLLib::Actor_<T>::initialize() [with T = double]
       void initialize()
            ^
RLLib_/ControlAlgorithm.h:691:12: note:   candidate expects 0 arguments, 1 provided

Broken multi dimensional continuous actions

Maybe I just don't get it; could I get some help?

RLLib::RLProblem<T>

has: Base::discreteActions && Base::continuousActions

almost all models uses Base::discreteActions,

there is Helicopter.h that uses Base::continuousActions but is incomplete.


I've been trying to model a problem with two continuous actions,


I've tried this:
Base::continuousActions->push_back(0, 0.0); Base::continuousActions->push_back(1, 0.0);
This would create something like [[0], [0]],
but this crashes the distribution in ASSERT((phi->dimension() == 1) && (actions->dimension() == 1));

Based on Helicopter.h, I've also tried:
Base::continuousActions->push_back(0, 0.0); Base::continuousActions->push_back(0, 0.0);
This would create something like [[0, 0]].

Here the code runs, but the distribution ignores the second dimension and always uses the first: actions->getEntry(defaultAction);


Before I attempt to modify the distributions, it would be a blessing if someone knows how I might approach modeling a two-dimensional continuous action space with RLLib::RLProblem<T>.

Some constructors and operator= methods use iterators instead of const_iterators

The Vectors copy constructor expects a const Vectors&, but the iterator in the loop is declared Vectors::iterator, and my compiler complains. However, if I change it to Vectors::const_iterator, the complaint vanishes.
These are lines 1171--1175 of Vector.h:

      Vectors(const Vectors<T>& that)
      {
        for (typename Vectors<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
          vectors.push_back(*iter);
      }

I believe this also affects Vectors::operator=, though I am not sure because SWIG (which I am using to build a Python interface to this library) does not expose any operator= method to Python (because Python does not have any equivalent semantic). These are lines 1177--1186 defining such method:

      Vectors_<T>& operator=(const Vectors_<T>& that)
      {
        if (this != that)
        {
          vectors.clear();
          for (typename Vectors_<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
            vectors.push_back(*iter);
        }
        return *this;
      }

Similarly, the same problem affects the Ranges copy constructor and possibly Ranges::operator=. Following are lines 328--343 of Mathema.h:

      Ranges_(const Ranges_<T>& that)
      {
        for (typename Ranges_<T>::const_iterator iter = that.begin(); iter != that.end(); ++iter)
          ranges.push_back(*iter);
      }

      Ranges_<T>& operator=(const Ranges_<T>& that)
      {
        if (this != that)
        {
          ranges.clear();
          for (typename Ranges_<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
            ranges.push_back(*iter);
        }
        return *this;
      }

It is possible that other operator= functions are affected by this problem, but my compiler is not complaining because SWIG is not exposing any operator= for the same reason as above.

'M_PI' was not declared

OS: Windows 7
Compiler: Mingw32

d:\RLLib-master>mingw32-make
[  2%] Building CXX object CMakeFiles/RLLib.dir/test/AcrobotTest.cpp.obj
D:\RLLib-master\test\AcrobotTest.cpp:1:0: warning: -fPIC ignored for target (all
 code is position independent)
 /*
 ^
In file included from D:/RLLib-master/include/RL.h:31:0,
                 from D:\RLLib-master\test\Test.h:43,
                 from D:\RLLib-master\test\AcrobotTest.h:11,
                 from D:\RLLib-master\test\AcrobotTest.cpp:8:
D:/RLLib-master/include/Mathema.h: In member function 'T RLLib::Random<T>::gauss
ianProbability(const T&, const T&, const T&) const':
D:/RLLib-master/include/Mathema.h:212:68: error: 'M_PI' was not declared in this
 scope
         return exp(-0.5f * pow((x - m) / s, 2)) / (s * sqrt(2.0f * M_PI));
                                                                    ^

DQN

Can RLLib include a demo of DQN?
If we want to separate training and running, how should we do it?
(training in the cloud and returning the parameters to a local environment that controls the running system)

How to test a policy on unseen test samples

Hello!

I'm new to Reinforcement Learning and have studied the RLLib User Guide as well as the examples included in RLLib.
All the learning examples end like this:

Simulator* sim = new Simulator(agent, problem, 5000, 100, 10);
sim->setTestEpisodesAfterEachRun(true);
sim->run();
sim->computeValueFunction();

I have some experience with supervised learning. In most cases we construct a model, train it on training samples, and then evaluate it on separate test samples; this estimates how well the model behaves on unseen data.

Could someone advise how RLLib-based code should be structured to train a policy on training data and then feed in test samples one by one to evaluate the policy on unseen data?

Thanks
