Robust Policy Search Algorithms

This repo contains variants of policy search algorithms that are robust to model parameters. The current implementations focus on episodic tasks, with an emphasis on batch policy optimization using various forms of policy gradients.

This project builds on top of OpenAI gym and rllab. You need to set up those first before proceeding. The code structure is as follows:

Things that have been implemented for you:

  • Environments: Any environment that follows the structure of OpenAI gym can be used. MuJoCo tasks require mujoco-py, which ships with rllab. We have developed some environments that draw model parameters in the first step of a stochastic MDP. See the examples and read the paper for more information on how to use them.
  • Policy: For continuous control tasks, use GaussianMLPPolicy, a neural network policy with Gaussian exploration noise. For discrete action spaces, use CategoricalMLPPolicy. Both implementations come from rllab.
  • Baselines: LinearFeatureBaseline from rllab is tested and works correctly with this project.
  • Samplers: We have implemented serial and parallel samplers, both for a fixed environment and for distributions of environments.
  • Algorithms: Currently, TRPO and REINFORCE are available. We implemented them by modifying the basic structure provided by rllab.
  • Evaluations: Functions for efficiently evaluating the performance of a policy on a given MDP or distribution of MDPs.
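
The idea of drawing model parameters in the first step of a stochastic MDP, and of evaluating a policy over the resulting distribution of MDPs, can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from this repo: `RandomizedPointMass`, `evaluate`, and `pd_policy` are illustrative names, and the class only mimics the reset/step interface of an OpenAI gym environment so that it runs without gym or MuJoCo installed.

```python
import random


class RandomizedPointMass:
    """Toy 1-D point mass whose mass is redrawn at every reset.

    Hypothetical illustration only; it mimics the reset/step interface of an
    OpenAI gym environment without importing gym.
    """

    def __init__(self, mass_range=(0.5, 2.0), horizon=50, seed=None):
        self.mass_range = mass_range
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.mass = None
        self.reset()

    def reset(self):
        # First step of the stochastic MDP: draw the model parameter.
        self.mass = self.rng.uniform(*self.mass_range)
        self.pos, self.vel, self.t = 1.0, 0.0, 0
        return (self.pos, self.vel)

    def step(self, action):
        # Semi-implicit Euler step; the sampled mass scales the acceleration.
        dt = 0.05
        self.vel += dt * action / self.mass
        self.pos += dt * self.vel
        self.t += 1
        reward = -(self.pos ** 2) - 0.01 * action ** 2
        done = self.t >= self.horizon
        return (self.pos, self.vel), reward, done, {"mass": self.mass}


def evaluate(policy, env, num_episodes=20):
    """Average undiscounted return of `policy` over freshly sampled models."""
    returns = []
    for _ in range(num_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)


def pd_policy(obs):
    # Simple proportional-derivative controller used as a stand-in policy.
    pos, vel = obs
    return -2.0 * pos - 1.0 * vel


env = RandomizedPointMass(seed=0)
print(evaluate(pd_policy, env))
```

Because the mass is resampled inside `reset`, every episode in `evaluate` sees a different model, so the printed number is an estimate of performance over the whole parameter range rather than on a single fixed MDP.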

Things you need to do:

  • First, copy the gym_env.py file from the base code folder to /path/to/rllab/rllab/envs/, replacing the existing file.
  • Add the robustRL package to your Python path: set PYTHONPATH="/path/to/robustRL:$PYTHONPATH" in your shell, or add that line to your ~/.bashrc file.
  • Ideally, you shouldn't have to touch any file other than job_data.txt and MDP_funcs.py.
  • Inside MDP_funcs.py, write a function that generates the environment of your choice. Make sure it is compatible with OpenAI gym and that you have registered the environment with the gym modules. Also remember to call this function from within the generate_environment function in MDP_funcs.py.
  • Note: I recommend that you don't use a GPU for this unless you are training convolutional layers. Modify the .theanorc file in your home directory to remove the GPU device set by default. Also uncomment the theano.sandbox.cuda.unuse() command in algos.py if you get a CUDA error.
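
As a rough sketch of the pattern described for MDP_funcs.py (one builder function per environment, plus a call to it from generate_environment), something like the following could work. This is an assumption about the shape of the code, not a copy of it: the real generate_environment in this repo may take different arguments, and `make_my_env`, `_ENV_BUILDERS`, and `"MyEnv-v0"` are hypothetical names. A plain dict stands in for the environment object so the sketch runs without gym installed.

```python
# Hypothetical sketch of the dispatch pattern; the real MDP_funcs.py in this
# repo may use different names and arguments.


def make_my_env(**kwargs):
    # In the real project this function would construct and return a
    # gym-compatible environment that has been registered with gym.
    # A plain dict stands in here so the sketch is self-contained.
    return {"env_id": "MyEnv-v0", "params": kwargs}


# Map a short name (as you might write it in a job file) to its builder.
_ENV_BUILDERS = {
    "my_env": make_my_env,
}


def generate_environment(env_name, **kwargs):
    """Look up the builder for `env_name` and call it."""
    try:
        builder = _ENV_BUILDERS[env_name]
    except KeyError:
        raise ValueError("Unknown environment: %r" % (env_name,))
    return builder(**kwargs)


env = generate_environment("my_env", mass_range=(0.5, 2.0))
print(env["env_id"])
```

Keeping the lookup in one place means that adding a new environment only requires writing its builder and adding one entry to the mapping.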

Have a look at the example code to get an idea of how the different pieces of functionality can be integrated for training. If you want to use just the training function without the wrappers, you can call it from a simple for loop.

Parallel sampling

Theano doesn't behave well with multiprocessing modules. If you run python job_script.py, you should see a number of worker processes starting up and then finishing. Running it in the background with nohup sometimes affects the writing to nohup.out, so you may not see all processes starting up together; instead, start and end messages will alternate. If this happens, use the htop command in a terminal to check whether multiple processes are being spawned.
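
For readers unfamiliar with the general pattern, here is a minimal sketch of splitting trajectory collection across worker processes with Python's standard multiprocessing module. It is not the sampler shipped in this repo: `rollout_worker` and `sample_paths` are hypothetical names, and each "trajectory" is just a list of random rewards standing in for a real policy rollout. Giving each worker its own seed keeps the workers from producing identical samples.

```python
import multiprocessing as mp
import random


def rollout_worker(args):
    """Collect `num_paths` toy trajectories using a worker-specific seed."""
    seed, num_paths, horizon = args
    rng = random.Random(seed)
    paths = []
    for _ in range(num_paths):
        # Stand-in for a policy rollout: a list of random rewards.
        paths.append([rng.gauss(0.0, 1.0) for _ in range(horizon)])
    return paths


def sample_paths(total_paths, horizon, num_workers=1, base_seed=0):
    """Split `total_paths` across workers and merge the results."""
    # Distribute the paths as evenly as possible.
    per_worker = [total_paths // num_workers] * num_workers
    for i in range(total_paths % num_workers):
        per_worker[i] += 1
    jobs = [(base_seed + i, n, horizon) for i, n in enumerate(per_worker)]

    if num_workers == 1:
        # Serial fallback: no processes are spawned.
        results = [rollout_worker(jobs[0])]
    else:
        with mp.Pool(processes=num_workers) as pool:
            results = pool.map(rollout_worker, jobs)

    # Flatten the per-worker lists into one list of paths.
    return [path for worker_paths in results for path in worker_paths]


serial_paths = sample_paths(total_paths=6, horizon=5, num_workers=1)
print(len(serial_paths))
```

To use several processes, call for example `sample_paths(total_paths=100, horizon=50, num_workers=4)` from inside an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that start workers by spawning a fresh interpreter.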

To be included:

  • DDPG algorithm of Lillicrap et al.
  • Source domain adaptation using a Bayesian approach
  • Bayesian state estimators (e.g., particle filters) for POMDP tasks
