
ReplayTables

Benchmarks

Getting started

Installation:

pip install ReplayTables-andnp

Basic usage:

import numpy as np
from typing import NamedTuple
from ReplayTables.ReplayBuffer import ReplayBuffer

class Data(NamedTuple):
    x: np.ndarray
    a: np.ndarray
    r: np.ndarray

buffer = ReplayBuffer(
    max_size=100_000,
    structure=Data,
    rng=np.random.default_rng(0),
)

# x, a, r are the observation, action, and reward from a single environment step
buffer.add(Data(x, a, r))

batch = buffer.sample(32)
print(batch.x.shape) # -> (32, d)
print(batch.a.shape) # -> (32, )
print(batch.r.shape) # -> (32, )

Prioritized Replay

An implementation of prioritized experience replay from

Schaul, Tom, et al. "Prioritized experience replay." ICLR (2016).

The defaults for this implementation strictly adhere to the defaults from the original work, though several configuration options are available.

import numpy as np
from typing import NamedTuple
from ReplayTables.PER import PERConfig, PrioritizedReplay

class Data(NamedTuple):
    a: float
    b: float

# all configurables are optional.
config = PERConfig(
    # can also use "mean" mode to place new samples in the middle of the distribution
    # or "given" mode, which requires giving the priority when the sample is added
    new_priority_mode='max',
    # the sampling distribution is a mixture between uniform sampling and the priority
    # distribution. This specifies the weight given to the uniform sampler.
    # Setting to 1 reverts this back to an inefficient form of standard uniform replay.
    uniform_probability=1e-3,
    # this implementation assumes priorities are positive. Priorities can be scaled by
    # raising them to some power. Default is `priority**(1/2)`
    priority_exponent=0.5,
    # if `new_priority_mode` is 'max', then the buffer tracks the highest seen priority.
    # this can cause accidental saturation if outlier priorities are observed. This provides
    # an exponential decay of the max in order to prevent permanent saturation.
    max_decay=1,
)

# if no config is given, defaults to original PER parameters
buffer = PrioritizedReplay(
    max_size=100_000,
    structure=Data,
    rng=np.random.default_rng(0),
    config=config,
)

buffer.add(Data(a=1, b=2))

# if `new_priority_mode` is 'given':
buffer.add(Data(a=1, b=2), priority=1.3)

batch = buffer.sample(32)
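
To make the configuration above concrete, here is a small illustrative calculation (not library code) of how a priority exponent and a uniform mixing weight combine into sampling probabilities, mirroring the descriptions of `priority_exponent` and `uniform_probability` above:

import numpy as np

# illustrative only: combine a priority distribution with a uniform component
def mixture_probs(priorities, priority_exponent=0.5, uniform_probability=1e-3):
    p = np.asarray(priorities, dtype=np.float64) ** priority_exponent
    prioritized = p / p.sum()
    uniform = np.full_like(prioritized, 1.0 / len(prioritized))
    return (1 - uniform_probability) * prioritized + uniform_probability * uniform

print(mixture_probs([1.0, 2.0, 4.0]))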


Issues

Make samples delete-able

There are instances where I wish to be able to delete a sample from the buffer. Because the sampler depends on unique identifiers being contiguous integers, this is a little tricky. The basic algorithm should look something like:

def delete(self, idx: int):
  # swap-and-pop: move the most recently added sample into the deleted slot
  # so that the unique identifiers stay contiguous
  last = self.storage[self.i]
  del self.storage[self.i]
  if idx != self.i:
    self.storage[idx] = last
  self.i -= 1

[feat] Add combined experience replay

Paper here: https://arxiv.org/pdf/1712.01275. The basic idea is to ensure that the latest sample is always included in the mini-batch.

This could simply be added as one top-level replay type; however, it would be nice to think about composition of replay ideas. This type of component composes nicely with the vast majority of other components.
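
A minimal sketch of what a composable version could look like, assuming only the `add`/`sample` API shown in the README and that sampled batches are NamedTuples of arrays stacked along the first axis. The wrapper name and structure are illustrative, not part of the library:

import numpy as np

class CombinedReplayWrapper:
    # hypothetical wrapper: remembers the most recent transition and appends it
    # to every sampled mini-batch, as described in the paper above
    def __init__(self, buffer):
        self._buffer = buffer
        self._latest = None

    def add(self, sample):
        self._latest = sample
        self._buffer.add(sample)

    def sample(self, n: int):
        # assumes at least one sample has been added
        batch = self._buffer.sample(n - 1)
        cols = (
            np.concatenate([col, np.expand_dims(np.asarray(latest), 0)])
            for col, latest in zip(batch, self._latest)
        )
        return type(batch)(*cols)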

Allow partial specification of rows

When adding a row to a Table, allow adding only a subset of columns. The important (and extremely common) use-case is bootstrapping methods which will specify:

row created at t1:  t1 -> X, A, R    t2 -> X', gamma
row created at t2:  t2 -> X, A, R    t3 -> X', gamma

where at time t2 we will complete the data for the row created at t1 and also add a new (incomplete) row for t2 which will be completed at time t3.

Unfortunately, this will require maintaining individual indices for every column which will harm our performance moderately and readability greatly.
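
As a rough illustration of the per-row bookkeeping this would require, here is a self-contained sketch; `PartialTable`, `add_partial`, and `complete` are hypothetical names, not an API of this library:

from typing import Any, Dict

class PartialTable:
    # hypothetical sketch: rows are stored as dicts keyed by a row id so that
    # missing columns can be filled in later
    def __init__(self):
        self._rows: Dict[int, Dict[str, Any]] = {}
        self._next_id = 0

    def add_partial(self, **cols) -> int:
        rid = self._next_id
        self._next_id += 1
        self._rows[rid] = dict(cols)
        return rid

    def complete(self, rid: int, **cols) -> None:
        self._rows[rid].update(cols)

# the bootstrapping pattern above
table = PartialTable()
row1 = table.add_partial(x=0.0, a=1, r=0.5)    # created at t1
table.complete(row1, xp=0.1, gamma=0.99)       # completed at t2
row2 = table.add_partial(x=0.1, a=0, r=1.0)    # created at t2, completed at t3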

Internal organization

We should separate replay buffer components into their own parts, making it easier to build alternative replay mechanisms. Obviously, we should be aware of performance degradation as we build layers of abstraction; however, this could also allow for new optimizations to offset the cost. For instance, the storage mechanism can be smarter about when to compress/decompress and when to do copies (like in the "table" approach) vs. references (like in the dictionary approach) based on properties of the requested storage size. This also makes it easier to avoid duplicating state information. We can use this to combine the lag-buffer with the replay-buffer into a single coherent mechanism, as long as a lag-buffer without replay remains a possible configuration.

I can think of three core components currently, plus a possible fourth:

  • A storage mechanism which handles storage, ingress, and egress (i.e. ejecting oldest samples)
  • A sampler
  • A filtration mechanism that allows rejecting samples before being passed to the storage. Possibly this could replace the storage's ingress entirely
  • Possibly, we should break egress into its own component. This is slightly trickier because we want to make strong guarantees to the storage mechanism about the maximum size.
class ReplayBuffer:
  def __init__(self, ...):
    self.memory: StorageType
    self.sampler: IndexSampler
    # maybe self.ingress is a better name. This can be an observer that other buffers can also subscribe to
    self.filter: Filtration

This is likely a breaking change. Ideally, we take advantage of the refactor to eliminate some superfluous interfaces (view vs. buffer) in favor of giving users direct access to each individual component. That is, I'd like to be able to do the following:

buffer1 = ReplayBuffer()
buffer2 = ReplayBuffer(memory=buffer1.memory)

or something similar, in order to have two buffers referencing the same underlying memory. We need lifecycle hooks so that all components are informed of new samples (e.g. so the sampler can be updated). Possibly this can be accomplished with an observer pattern.
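
A rough sketch of what those lifecycle hooks could look like with an observer pattern; the names here (`ReplayObserver`, `on_add`, `Storage`) are illustrative only:

from typing import Any, Dict, List, Protocol

class ReplayObserver(Protocol):
    # hypothetical lifecycle hook: called whenever a new sample lands in storage
    def on_add(self, idx: int, sample: Any) -> None: ...

class Storage:
    def __init__(self):
        self._data: Dict[int, Any] = {}
        self._observers: List[ReplayObserver] = []
        self._next_idx = 0

    def subscribe(self, obs: ReplayObserver) -> None:
        self._observers.append(obs)

    def add(self, sample: Any) -> int:
        idx = self._next_idx
        self._next_idx += 1
        self._data[idx] = sample
        # notify every subscriber, e.g. the samplers of all buffers sharing this memory
        for obs in self._observers:
            obs.on_add(idx, sample)
        return idx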

Performance testing

Should add some basic performance testing to the library. Now that I'm starting to think through code optimizations, it's important to ensure we are actually tracking performance rigorously. We need to think through how to do this in a consistent way, i.e. do GitHub Actions guarantee consistent hardware? Maybe we could use external executors like my home server, but then what happens when I upgrade it (and how do we avoid abuse)?
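
As a starting point, even a local micro-benchmark would help catch regressions. The following is a rough sketch using `timeit` and the public `add`/`sample` API from the README; the shapes and iteration counts are arbitrary:

import timeit
from typing import NamedTuple
import numpy as np
from ReplayTables.ReplayBuffer import ReplayBuffer

class Data(NamedTuple):
    x: np.ndarray
    a: np.ndarray
    r: np.ndarray

def bench():
    buffer = ReplayBuffer(max_size=100_000, structure=Data, rng=np.random.default_rng(0))
    sample = Data(np.zeros(8), np.int64(0), np.float64(0.0))
    # time adds first so the buffer is populated before sampling
    add_time = timeit.timeit(lambda: buffer.add(sample), number=10_000)
    sample_time = timeit.timeit(lambda: buffer.sample(32), number=1_000)
    print(f'add: {add_time / 10_000:.2e} s/op, sample: {sample_time / 1_000:.2e} s/op')

bench()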
