RC-NFQ: Regularized Convolutional Neural Fitted Q Iteration

A batch algorithm for deep reinforcement learning. Incorporates dropout regularization and convolutional neural networks with a separate target Q network.

This algorithm extends the following techniques:

  • Riedmiller, Martin. "Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method." Machine Learning: ECML 2005. Springer Berlin Heidelberg, 2005. 317-328.

  • Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.

  • Lin, Long-Ji. "Self-improving reactive agents based on reinforcement learning, planning and teaching." Machine learning 8.3-4 (1992): 293-321.

Project Status: This project is a work in progress.

Overview

Creating an instance of the RC-NFQ algorithm

The NFQ class creates an instance of the RC-NFQ algorithm for a particular agent and environment.

Parameters

  • state_dim - The state dimensionality. An integer if convolutional = False, a 2D tuple otherwise.
  • nb_actions - The number of possible actions
  • terminal_states - The integer indices of the terminal states
  • convolutional - Boolean. When True, uses convolutional neural networks and dropout regularization. Otherwise, uses a simple MLP.
  • mlp_layers - A list giving the number of neurons in each hidden layer. Used only when convolutional = False. Default = [20, 20].
  • discount_factor - The discount factor for Q-learning.
  • separate_target_network - Boolean. If True, a separate Q-network computes the targets for the Q-learning updates, and this target network is updated with the parameters of the main Q-network every target_network_update_freq iterations.
  • target_network_update_freq - The frequency at which to update the target network.
  • lr - The learning rate for the RMSprop optimizer.
  • max_iters - The maximum number of iterations that will be performed. Used to allocate memory for NumPy arrays. Default = 20000.
  • max_q_predicted - The maximum number of Q-values that will be predicted. Used to allocate memory for NumPy arrays. Default = 100000.
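
The parameter list above can be summarized in a small sketch. `NFQSettings` is a hypothetical stand-in, not the repository's NFQ class: it only mirrors the documented parameter names and the three documented defaults, and checks the documented rule for state_dim. The concrete values in the example (image size, discount factor, update frequency, learning rate) are illustrative, and the synchronization schedule in `target_sync_due` is an assumption about how "every target_network_update_freq iterations" might be interpreted.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple, Union


@dataclass
class NFQSettings:
    """Hypothetical container mirroring the documented NFQ parameters."""
    state_dim: Union[int, Tuple[int, int]]
    nb_actions: int
    terminal_states: Sequence[int]
    convolutional: bool
    discount_factor: float
    separate_target_network: bool
    target_network_update_freq: int
    lr: float
    # Only these three defaults are stated in the parameter list.
    mlp_layers: List[int] = field(default_factory=lambda: [20, 20])
    max_iters: int = 20000
    max_q_predicted: int = 100000

    def __post_init__(self) -> None:
        # state_dim is an integer for the MLP and a 2D tuple for the CNN.
        if self.convolutional:
            if not (isinstance(self.state_dim, tuple) and len(self.state_dim) == 2):
                raise ValueError("convolutional=True needs a 2D tuple state_dim")
        elif not isinstance(self.state_dim, int):
            raise ValueError("convolutional=False needs an integer state_dim")

    def target_sync_due(self, iteration: int) -> bool:
        # Assumed schedule: sync on every positive multiple of the frequency.
        return bool(
            self.separate_target_network
            and iteration > 0
            and iteration % self.target_network_update_freq == 0
        )


# Illustrative values only; none of them come from the project.
settings = NFQSettings(
    state_dim=(64, 64), nb_actions=3, terminal_states=[],
    convolutional=True, discount_factor=0.99,
    separate_target_network=True, target_network_update_freq=10, lr=0.001,
)
```

With the real class, the same keyword names would presumably be passed to the NFQ constructor directly.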

Fitting the Q network

The NFQ class has a fit_vectorized method, which is used to run an iteration of the RC-NFQ algorithm and update the Q function. The implementation is vectorized for improved performance.
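
As a rough illustration of what one Q-function update works toward, the sketch below computes standard Q-learning targets of the form r + discount_factor * max over a' of Q_target(s_prime, a'). It is not the project's code: the function name `q_learning_targets`, the loop-based (non-vectorized) form, and the choice not to bootstrap when s_prime is a terminal state are all assumptions made for clarity.

```python
from typing import Callable, Collection, List, Sequence


def q_learning_targets(
    rewards: Sequence[float],
    next_states: Sequence[int],
    q_target: Callable[[int], Sequence[float]],
    terminal_states: Collection[int],
    discount_factor: float,
) -> List[float]:
    """Return one regression target per experience tuple."""
    targets = []
    for r, s_prime in zip(rewards, next_states):
        if s_prime in terminal_states:
            # Assumption: no future value is added after a terminal state.
            targets.append(r)
        else:
            # q_target(s_prime) yields one Q-value per action; take the best.
            targets.append(r + discount_factor * max(q_target(s_prime)))
    return targets


# Toy target network: a lookup table from state index to per-action Q-values.
q_table = {0: [1.0, 2.0], 1: [0.0, 0.0]}
example_targets = q_learning_targets(
    rewards=[0.5, 1.0],
    next_states=[0, 1],
    q_target=lambda s: q_table[s],
    terminal_states={1},
    discount_factor=0.5,
)
```

When separate_target_network is True, `q_target` would stand for the separate target network; otherwise it would be the main Q-network itself.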

The function requires a set of interactions with the environment. They consist of experience tuples of the form (s, a, r, s_prime), stored in 4 parallel arrays.
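
A minimal sketch of the parallel-array layout, assuming experiences are first collected as a list of (s, a, r, s_prime) tuples. The helper name `to_parallel_arrays` and the integer state encoding are illustrative only; the real states may be vectors or images.

```python
from typing import Any, List, Sequence, Tuple

Experience = Tuple[Any, int, float, Any]  # (s, a, r, s_prime)


def to_parallel_arrays(
    experiences: Sequence[Experience],
) -> Tuple[List[Any], List[int], List[float], List[Any]]:
    """Split experience tuples into four lists that share the same index."""
    if not experiences:
        return [], [], [], []
    # zip(*...) transposes the list of tuples into one column per field.
    D_s, D_a, D_r, D_s_prime = (list(column) for column in zip(*experiences))
    return D_s, D_a, D_r, D_s_prime


experiences = [(0, 1, 0.0, 1), (1, 0, 0.0, 2), (2, 1, 1.0, 3)]
D_s, D_a, D_r, D_s_prime = to_parallel_arrays(experiences)
```

Entry i of each list belongs to the same experience tuple, so the four lists must always have equal length.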

Parameters

  • D_s - A list of states s for each experience tuple
  • D_a - A list of actions a for each experience tuple
  • D_r - A list of rewards r for each experience tuple
  • D_s_prime - A list of states s_prime for each experience tuple
  • num_iters - The number of epochs to run per batch. Default = 1.
  • shuffle - Whether to shuffle the data before training. Default = False.
  • nb_samples - If specified, nb_samples experience tuples are drawn without replacement from the eligible samples. Otherwise, all eligible samples are used.
  • sliding_window - If specified, only the last nb_samples samples are eligible for use. Otherwise, all samples are eligible.
  • full_batch_sgd - Boolean. Determines whether RMSprop will use full-batch or mini-batch updating. Default = False.
  • validation - Boolean. If True, a validation set will be used consisting of the last 10% of the experience tuples, and the validation loss will be monitored. Default = True.
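
How nb_samples, sliding_window, and validation might interact can be sketched as index selection over the experience arrays. This is one interpretation of the descriptions above, not the project's implementation: the function name `select_indices` is hypothetical, and the order of operations (split off the validation tail first, then apply the window or the sampling) is an assumption.

```python
import random
from typing import List, Optional, Tuple


def select_indices(
    n: int,
    nb_samples: Optional[int] = None,
    sliding_window: bool = False,
    validation: bool = True,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (training indices, validation indices) for n experience tuples."""
    rng = rng or random.Random()
    train = list(range(n))
    val: List[int] = []
    if validation:
        # The last 10% of the experience tuples form the validation set.
        split = n - n // 10
        train, val = train[:split], train[split:]
    if nb_samples is not None:
        k = min(nb_samples, len(train))
        if sliding_window:
            # Only the most recent k samples are eligible, so all are used.
            train = train[len(train) - k:]
        else:
            # Draw k eligible samples without replacement.
            train = rng.sample(train, k)
    return train, val


window_train, window_val = select_indices(100, nb_samples=20, sliding_window=True)
random_train, _ = select_indices(100, nb_samples=20, rng=random.Random(0))
```

The returned indices can then be used to pick matching entries from D_s, D_a, D_r, and D_s_prime.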

Setting up an experiment

An experiment consists of an Experiment definition and an Environment definition. Both are configured in the api_vision.py web server.

The webserver exposes a REST resource used for communicating with the robot. An implementation of a client for a customized LEGO Mindstorms EV3 robot is provided in client_vision.py.

The robot sends streaming video to the server. An implementation of the sender for a customized LEGO Mindstorms EV3 robot is provided in rapid_streaming_zmq.py. The server receives the stream using receive_video_zmq.py, and the stream can be monitored using show_video_zmq.py.

Citation

@misc{rcnfq,
  author = {Harrigan, Cosmo},
  title = {RC-NFQ: Regularized Convolutional Neural Fitted Q Iteration},
  year = {2016},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cosmoharrigan/rc-nfq}}
}
