
RLLib

(C++ Template Library to Predict, Control, Learn Behaviors, and Represent Learnable Knowledge using On/Off Policy Reinforcement Learning)

RLLib is a lightweight C++ template library that implements incremental, standard, and gradient temporal-difference learning algorithms in Reinforcement Learning. It is an optimized library for robotic applications and embedded devices that operates under fast duty cycles (e.g., < 30 ms). RLLib has been tested and evaluated on RoboCup 3D soccer simulation agents, physical NAO V4 humanoid robots, and Tiva C series launchpad microcontrollers to predict, control, learn behaviors, and represent learnable knowledge. The implementation of the RLLib library is inspired by the RLPark API, which is a library of temporal-difference learning algorithms written in Java.

Features

  • Off-policy prediction algorithms:
    • GTD(λ)
    • GTD(λ)True
    • GQ(λ)
  • Off-policy control algorithms:
    • Q(λ)
    • Greedy-GQ(λ)
    • Softmax-GQ(λ)
    • Off-PAC (can also be used in the on-policy setting)
  • On-policy algorithms:
    • TD(λ)
    • TD(λ)AlphaBound
    • TD(λ)True
    • Sarsa(λ)
    • Sarsa(λ)AlphaBound
    • Sarsa(λ)True
    • Sarsa(λ)Expected
    • Actor-Critic (continuous and discrete actions, discounted and average reward settings, and so on)
  • Supervised learning algorithms:
    • Adaline
    • IDBD
    • K1
    • SemiLinearIDBD
    • Autostep
  • Policies: Random, RandomX%Bias, Greedy, Epsilon-greedy, Boltzmann, Normal, and Softmax.
  • Dot product: An efficient implementation of the dot product for tile coding based feature representations (with culling traces).
  • Benchmarking environments: Mountain Car, Mountain Car 3D, Swinging Pendulum, Continuous Grid World, Bicycle, Cart Pole, Acrobot, Non-Markov Pole Balancing, and Helicopter.
  • Optimization: Optimized for very fast duty cycles (e.g., with culling traces; RLLib has been tested on the RoboCup 3D simulator agent and on the NAO V4 cognition thread).
  • Usage: The algorithm usage closely follows RLPark, so the learning curve is short.
  • Examples: A plethora of examples demonstrate on-policy and off-policy control experiments.
  • Visualization: We provide a Qt4 based application to visualize benchmark problems.


New: OpenAI Gym Binding

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. We have developed a bridge between Gym and RLLib, so that all the functionality provided by Gym can be used while the agents (on/off-policy) are written in RLLib. The openai_gym directory contains our bridge as well as RLLib agents that learn and control the classic control environments.

Extension

  • Extension for Tiva C Series EK-TM4C123GXL LaunchPad, and Tiva C Series TM4C129 Connected LaunchPad microcontrollers.

  • Tiva C series launchpad microcontrollers: https://github.com/samindaa/csc688

Demo

  • Off-PAC on ContinuousGridworld
  • AverageRewardActorCritic on SwingPendulum (continuous actions)

Usage

RLLib is a C++ template library. The header files are located in the include directory. You can simply add this directory to your project's include path (e.g., -I./include) to access the algorithms.

To access the control algorithms:

#include "ControlAlgorithm.h"

To access the prediction algorithms:

#include "PredictorAlgorithm.h"

To access the supervised learning algorithms:

#include "SupervisedAlgorithm.h"

RLLib uses the namespace:

using namespace RLLib;

Testing

RLLib provides a flexible testing framework. Follow these steps to quickly write a test case.

  • To access the testing framework: #include "HeaderTest.h"

#include "HeaderTest.h"

RLLIB_TEST(YourTest)

class YourTestTest: public YourTestBase
{
  public:
    YourTestTest() {}

    virtual ~YourTestTest() {}
    void run();

  private:
    void testYourMethod();
};

void YourTestTest::testYourMethod() {/** Your test code */}

void YourTestTest::run() { testYourMethod(); }
  • Add YourTest to the test/test.cfg file.
  • You can use @YourTest to execute only YourTest. For example, to execute only the MountainCar test cases, use @MountainCarTest.

Test Configuration

We use CMake >= 2.8.7 to build and run the test suite:

  • mkdir build
  • cd build; cmake ..
  • make -j

Visualization

RLLib provides a Qt-based visualization tool for Reinforcement Learning problems and algorithms, named RLLibViz. Currently, RLLibViz visualizes the following problems and algorithms:

  • On-policy:

    • SwingPendulum problem with continuous actions, using the AverageRewardActorCritic algorithm.
  • Off-policy:

    • ContinuousGridworld and MountainCar problems with discrete actions, using the Off-PAC algorithm.
  • In order to run the visualization tool, you need to have QT4.8 installed on your system.

  • In order to install RLLibViz:

    • Change directory to visualization/RLLibViz
    • qmake RLLibViz.pro
    • make -j
    • ./RLLibViz

Documentation

Operating Systems

  • Ubuntu >= 11.04
  • Windows (Visual Studio 2013)
  • Mac OS X

TODO

  • Variable action per state.
  • Non-linear algorithms.
  • Deep learning algorithms.

Publications

Contact

Saminda Abeyruwan, PhD ([email protected], [email protected])

rllib's People

Contributors

samindaa


rllib's Issues

Compiling error

I met a compile error on Ubuntu 12.10:

$make
g++ -I. -I./src -Wall -Werror -O3 simulation/Main.cpp -o Main
g++ -I. -I./src -Wall -Werror -O3 test/VectorTest.cpp -o VectorTest
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testActiveIndices()’:
test/VectorTest.cpp:85:41: error: taking address of temporary array
test/VectorTest.cpp:88:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testSparseVectorSet()’:
test/VectorTest.cpp:114:36: error: taking address of temporary array
test/VectorTest.cpp:118:36: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testSetEntry()’:
test/VectorTest.cpp:143:41: error: taking address of temporary array
test/VectorTest.cpp:149:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testPlus()’:
test/VectorTest.cpp:175:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testMinus()’:
test/VectorTest.cpp:185:44: error: taking address of temporary array
test/VectorTest.cpp:188:43: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testMapTimes()’:
test/VectorTest.cpp:200:43: error: taking address of temporary array
test/VectorTest.cpp:203:41: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testMaxNorm()’:
test/VectorTest.cpp:212:43: error: taking address of temporary array
test/VectorTest.cpp: In member function ‘void SparseVectorTest::testEbeMultiply()’:
test/VectorTest.cpp:272:25: error: taking address of temporary array
test/VectorTest.cpp:275:26: error: taking address of temporary array
test/VectorTest.cpp:279:27: error: taking address of temporary array
make: *** [VectorTest] Error 1

$ g++ --version
g++ (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

License?

What License is this under? May I use it for a personal project?

ActorLambda::initialize calls Actor::initialize with an extra parameter

I am trying to wrap this library using SWIG. However, the compiler complains that Base, which is of type Actor (in file ControlAlgorithm.h) only provides a method initialize() with no parameters, while method ActorLambda::initialize calls it passing an argument.
These are lines 736 and 748--752 of file ControlAlgorithm.h

      typedef Actor<T> Base;
      void initialize(const Vector<T>* x)
      {
        Base::initialize(x);
        e->clear();
      }

and this the output of my compiler:

In file included from RLLib_/Function.h:25:0,
                 from PyRLLib_wrap.cxx:3163:
RLLib_/Vector.h: In instantiation of ‘RLLib::Vectors_<T>::Vectors_(const RLLib::Vectors_<T>&) [with T = double]’:
PyRLLib_wrap.cxx:8792:111:   required from here
RLLib_/Vector.h:1173:63: error: conversion from ‘RLLib::Vectors_<double>::const_iterator {aka __gnu_cxx::__normal_iterator<RLLib::Vector_<double>* const*, std::vector<RLLib::Vector_<double>*, std::allocator<RLLib::Vector_<double>*> > >}’ to non-scalar type ‘RLLib::Vectors_<double>::iterator {aka __gnu_cxx::__normal_iterator<RLLib::Vector_<double>**, std::vector<RLLib::Vector_<double>*, std::allocator<RLLib::Vector_<double>*> > >}’ requested
         for (typename Vectors_<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
                                                               ^
In file included from PyRLLib_wrap.cxx:3172:0:
RLLib_/ControlAlgorithm.h: In instantiation of ‘void RLLib::ActorLambda_<T>::initialize(const RLLib::Vector_<T>*) [with T = double]’:
PyRLLib_wrap.cxx:28151:60:   required from here
RLLib_/ControlAlgorithm.h:750:27: error: no matching function for call to ‘RLLib::ActorLambda_<double>::initialize(const RLLib::Vector_<double>*&)’
         Base::initialize(x);
                           ^
RLLib_/ControlAlgorithm.h:750:27: note: candidate is:
RLLib_/ControlAlgorithm.h:691:12: note: void RLLib::Actor_<T>::initialize() [with T = double]
       void initialize()
            ^
RLLib_/ControlAlgorithm.h:691:12: note:   candidate expects 0 arguments, 1 provided

Broken multi dimensional continuous actions

Maybe I just don't get it; could I get some help?

RLLib::RLProblem<T>

has: Base::discreteActions && Base::continuousActions

almost all models uses Base::discreteActions,

there is Helicopter.h that uses Base::continuousActions but is incomplete.


I've been trying to model a problem with two continuous actions,


I've tried this:
Base::continuousActions->push_back(0, 0.0); Base::continuousActions->push_back(1, 0.0);
This would create something like [[0], [0]],
but this crashes the distribution in ASSERT((phi->dimension() == 1) && (actions->dimension() == 1));

Based on Helicopter.h, I've also tried:
Base::continuousActions->push_back(0, 0.0); Base::continuousActions->push_back(0, 0.0);
This would create something like [[0, 0]].

Here the code runs, but the distribution ignores the second dimension and always uses the first: actions->getEntry(defaultAction);


Before I attempt to modify the distributions, it would be a blessing if someone knows how I might approach modeling a two-dimensional continuous action space with RLLib::RLProblem<T>.

Some constructors and operator= methods use iterators instead of const_iterators

The Vectors copy constructor expects a const Vectors&, but the iterator in the loop is declared Vectors::iterator, and my compiler complains. However, if I change it to Vectors::const_iterator, the complaint vanishes.
These are lines 1171--1175 of Vector.h:

      Vectors(const Vectors<T>& that)
      {
        for (typename Vectors<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
          vectors.push_back(*iter);
      }

I believe this also affects Vectors::operator=, though I am not sure because SWIG (which I am using to build a Python interface to this library) does not expose any operator= method to Python (because Python does not have any equivalent semantic). These are lines 1177--1186 defining such method:

      Vectors_<T>& operator=(const Vectors_<T>& that)
      {
        if (this != that)
        {
          vectors.clear();
          for (typename Vectors_<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
            vectors.push_back(*iter);
        }
        return *this;
      }

Similarly, the same problem affects the Ranges copy constructor and possibly Ranges::operator=. Following are lines 328--343 of Mathema.h:

      Ranges_(const Ranges_<T>& that)
      {
        for (typename Ranges_<T>::const_iterator iter = that.begin(); iter != that.end(); ++iter)
          ranges.push_back(*iter);
      }

      Ranges_<T>& operator=(const Ranges_<T>& that)
      {
        if (this != that)
        {
          ranges.clear();
          for (typename Ranges_<T>::iterator iter = that.begin(); iter != that.end(); ++iter)
            ranges.push_back(*iter);
        }
        return *this;
      }

It is possible that other operator= functions are affected by this problem, but my compiler is not complaining because SWIG is not exposing any operator= for the same reason as above.

'M_PI' was not declared

OS: Windows 7
Compiler: Mingw32

d:\RLLib-master>mingw32-make
[  2%] Building CXX object CMakeFiles/RLLib.dir/test/AcrobotTest.cpp.obj
D:\RLLib-master\test\AcrobotTest.cpp:1:0: warning: -fPIC ignored for target (all
 code is position independent)
 /*
 ^
In file included from D:/RLLib-master/include/RL.h:31:0,
                 from D:\RLLib-master\test\Test.h:43,
                 from D:\RLLib-master\test\AcrobotTest.h:11,
                 from D:\RLLib-master\test\AcrobotTest.cpp:8:
D:/RLLib-master/include/Mathema.h: In member function 'T RLLib::Random<T>::gauss
ianProbability(const T&, const T&, const T&) const':
D:/RLLib-master/include/Mathema.h:212:68: error: 'M_PI' was not declared in this
 scope
         return exp(-0.5f * pow((x - m) / s, 2)) / (s * sqrt(2.0f * M_PI));
                                                                    ^

DQN

Can RLLib include a demo of DQN?
If we want to separate training and running, how should we do it?
(training in the cloud and returning the parameters to a local environment that controls the running system)

How to test a policy on unseen test samples

Hello!

I'm new to Reinforcement Learning and have studied the RLLib User Guide as well as the examples included in RLLib.
All the learning examples end like this:

Simulator* sim = new Simulator(agent, problem, 5000, 100, 10);
sim->setTestEpisodesAfterEachRun(true);
sim->run();
sim->computeValueFunction();

I have some experience with supervised learning. In most cases we construct a model, train it on training samples, and then evaluate it on separate test samples; this estimates how well the model behaves on unseen data.

Could someone advise how RLLib-based code should be structured to train a policy on training data and then feed in test samples one by one to evaluate the policy on unseen data?

Thanks
