memrl's Introduction

Memory-Assisted Reinforcement Learning

By Annik Carson

Basic

Project with Kyle Nealy, Fall 2020: writing a workflow to run experiments with Unity/OpenMaze environments.

To do:

  • write tests for environments (make sure they all behave the same)
  • bring gridworld environment up to OpenAI gym standards
  • representation learning as function of agent object
  • arbitration between MF and EC action selection as function of agent object
  • saving results of experiments / logging parameters

File Structure

basic
│   README.md
│
├───Agents
│   │    Define class which is basic operator in Environment
│   │    Agent basic functionality is to take in observation from Environment and produce action
│   │    Standard agent contains:
│   │       - Model-Free Control Network (basic functional unit to produce action from state information)
│   │       - Episodic Control Module (optional) - produce action from state information
│   │       - Representation Learning Module - produce state representation from Env observation
│   │       - Transition Cache - basic storage unit for keeping track of encountered states
│   ├─── Networks
│   │       - Network objects to be used by Agent class for model-free control
│   ├─── EpisodicMemory
│   │       - Episodic Memory object to be used by Agent class for episodic control
│   ├─── RepresentationLearning [** Unfinished **]
│   │       - Network to learn state representations from observations provided by Environment
│   └─── TransitionCache
│           - Structural object to store state experiences
│
├───Envs
│   ├─── Gridworld (python as openai-like environment)
│   └─── Unity Environments (to be used with openai gym wrapper)
│       ├─── Windows
│       └─── Linux
│
├───Experiment
│   │   Define class for standard experiment run
│   │   Stores data from runs
│   │   Save elements to appropriate output with unique run ID
│
├───Tests
│   │   Run tests to make sure each element works as expected
│
├───Utils
│   │   Basic functions to be used across packages
│   │   Plotting functions
│
├───Data
│   │   For data storage -- currently unused
│
└───Analysis
    │   Functions for analyzing collected data -- currently unused


 

Experiments

The top level to interface with. Collects and logs data from trial runs, including total reward and loss. Currently, data is saved under a unique ID shared by all results of a run (e.g. learned network weights, the dictionary containing the results of a trial). A CSV file stores the unique ID along with all parameters used in the experiment.

Arguments: agent object, environment object

Returns: Data collected over experiment run
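The logging scheme described above can be sketched roughly as follows. This is a minimal illustration, not the repository's Experiment class: the helper name `log_run`, the file names, and the use of JSON for the results dictionary are all assumptions. It also assumes every run uses the same parameter names, so that one CSV header fits all rows.

```python
import csv
import json
import os
import uuid


def log_run(params, results, out_dir):
    """Save one experiment run under a fresh unique ID (hypothetical helper).

    params  -- dict of experiment parameters, written as one CSV row
    results -- dict of collected data, e.g. {"total_reward": [...], "loss": [...]}
    """
    os.makedirs(out_dir, exist_ok=True)
    run_id = uuid.uuid4().hex

    # Per-run results file, named after the shared ID.
    with open(os.path.join(out_dir, f"{run_id}_data.json"), "w") as f:
        json.dump(results, f)

    # Append the ID and parameters to the shared log; write the header only once.
    log_path = os.path.join(out_dir, "experiment_log.csv")
    write_header = not os.path.exists(log_path)
    with open(log_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["run_id"] + sorted(params))
        if write_header:
            writer.writeheader()
        writer.writerow({"run_id": run_id, **params})
    return run_id
```

Any other artifact of the run (for example saved network weights) would reuse the returned `run_id` in its file name, so that everything belonging to one run can be found from its CSV row.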

Environment

Work in progress: making all environments behave as OpenAI gym environments. Currently in use are Unity environments with the OpenAI gym wrapper and a gridworld environment written in Python in the style of OpenAI gym environments.

All environments must have functions to reset at start of trial and to take a step at each event.

The central function of Environment is step(). The step() function takes action information and produces next state, reward, information about task completion ('done'), and additional information for debugging if necessary.
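To make the reset()/step() contract concrete, here is a toy environment that follows it. The class `LineWorld`, its layout, and its reward values are invented for illustration and are not the repository's gridworld; only the shape of the interface (reset returns a state, step returns next state, reward, done, info) follows the description above.

```python
class LineWorld:
    """Toy 1-D environment with a gym-style reset()/step() interface.

    Illustrative only: the name, layout, and reward values are assumptions.
    """

    def __init__(self, length=5):
        self.length = length
        self.goal = length - 1  # rightmost cell ends the trial
        self.state = 0

    def reset(self):
        """Start a new trial and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (0 = left, 1 = right).

        Returns (next_state, reward, done, info).
        """
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the line.
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.goal
        reward = 1.0 if done else 0.0
        info = {}  # room for debugging information
        return self.state, reward, done, info


# One trial with a fixed "always move right" policy.
env = LineWorld(length=3)
state = env.reset()
done = False
total_reward = 0.0
while not done:
    state, reward, done, info = env.step(1)
    total_reward += reward
```

An agent only ever needs these two calls, which is what lets the same experiment loop run against either the gridworld or a wrapped Unity environment.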

Agent

The learner in the environment. Model-free control via a network object (see Agents/Networks/) takes state information and produces a policy and a value estimate. State information can be either raw observations from the environment or representations learned by the representationlearning module of the agent.

Episodic control via an episodicmemory object (see Agents/EpisodicMemory/) takes state information and queries a dictionary of saved states and experienced returns. State information should be the same as that passed to the model-free controller (i.e. either raw observations from the environment or learned representations). This module is optional.
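The idea of querying a dictionary of saved states and experienced returns can be sketched as below. This is a hypothetical stand-in, not the object in Agents/EpisodicMemory/: the class and method names, the nearest-neighbour lookup by squared Euclidean distance, and the rule of keeping the best return seen per action are all assumptions made for the example.

```python
class EpisodicMemory:
    """Dictionary-backed store of states and the returns experienced from them.

    Hypothetical sketch: the lookup and update rules are assumptions.
    """

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.memory = {}  # state tuple -> list of best returns, one slot per action

    def add(self, state, action, ret):
        """Record a return for (state, action), keeping the best one seen so far."""
        key = tuple(state)
        returns = self.memory.setdefault(key, [None] * self.n_actions)
        if returns[action] is None or ret > returns[action]:
            returns[action] = ret

    def query(self, state):
        """Return the per-action returns of the closest saved state, or None."""
        if not self.memory:
            return None

        def sq_dist(key):
            return sum((a - b) ** 2 for a, b in zip(key, state))

        nearest = min(self.memory, key=sq_dist)
        return self.memory[nearest]

    def best_action(self, state):
        """Pick the action with the highest remembered return, if any."""
        returns = self.query(state)
        if returns is None:
            return None
        known = [(r, a) for a, r in enumerate(returns) if r is not None]
        return max(known)[1] if known else None
```

Because the key is just a tuple of numbers, the same memory works whether the agent passes raw observations or learned representations, as long as it passes the same kind to both controllers.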

To Do:

  • Representation learning module -- a separate network trained to learn a latent representation of state observations in a goal-independent manner (i.e. prior to training the agent's model-free network on a reward task). Learned representations (rather than environment observations) will be used by both model-free and episodic modules for producing behaviour.

  • Arbitration module to control action selection between model-free control and episodic control

Networks

Define neural networks to learn policy and value functions from state information. Neural network styles (CNN, fully connected) are defined as separate files in Agents/Networks/.
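As a rough picture of what "policy and value from state information" means, the sketch below uses a single linear layer with two heads and no external dependencies. It is a simplified stand-in: the networks in Agents/Networks/ are described as CNN or fully connected, and the class name `LinearActorCritic`, its initialisation, and its sizes are assumptions for illustration only.

```python
import math
import random


class LinearActorCritic:
    """Single-layer stand-in for a network with policy and value heads."""

    def __init__(self, n_inputs, n_actions, seed=0):
        rng = random.Random(seed)
        # One weight row per action for the policy head.
        self.policy_w = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                         for _ in range(n_actions)]
        # One weight vector for the scalar value head.
        self.value_w = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]

    def forward(self, state):
        """Map a state vector to (action probabilities, value estimate)."""
        logits = [sum(w * s for w, s in zip(row, state)) for row in self.policy_w]
        # Softmax with the max subtracted for numerical stability.
        peak = max(logits)
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        value = sum(w * s for w, s in zip(self.value_w, state))
        return policy, value


net = LinearActorCritic(n_inputs=4, n_actions=3)
policy, value = net.forward([1.0, 0.0, 0.5, -0.5])
action = random.choices(range(3), weights=policy)[0]  # sample from the policy
```

The input here is a flat vector; whether that vector is a raw observation or a learned representation is decided by the agent, not by the network.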

To Do:

  • Write script to generate network based on size/shape/type of environment observation or learned representation (depending on what will be used by agent)

Episodic Control

Pre-October 2020: experiments using the gridworld environment and basic episodic control.

Environments

Gridworld or OpenAI gym environments, which create the tasks to be solved by the RL network.

RL Network

The standard RL architecture we develop is an actor-critic network. Q-learning and other methods can also be used.

Memory

Episodic caching system used to assist the RL network.

Sensory

Networks used to create efficient representations of incoming state information. Can be used to supplement the RL network. These may be modified autoencoders, etc.

Notebooks

Jupyter notebooks used for running code.

Data

Storage of data from runs for later analysis.

Example Code
