
ReplayTables

Benchmarks

Getting started

Installation:

pip install ReplayTables-andnp

Basic usage:

import numpy as np
from typing import NamedTuple
from ReplayTables.ReplayBuffer import ReplayBuffer

class Data(NamedTuple):
    x: np.ndarray
    a: np.ndarray
    r: np.ndarray

buffer = ReplayBuffer(
    max_size=100_000,
    structure=Data,
    rng=np.random.default_rng(0),
)

# x, a, r are the observation, action, and reward from a single environment step
buffer.add(Data(x, a, r))

batch = buffer.sample(32)
print(batch.x.shape) # -> (32, d)
print(batch.a.shape) # -> (32, )
print(batch.r.shape) # -> (32, )

Prioritized Replay

An implementation of prioritized experience replay from

Schaul, Tom, et al. "Prioritized experience replay." ICLR (2016).

The defaults for this implementation strictly adhere to the defaults from the original work, though several configuration options are available.

import numpy as np
from typing import NamedTuple
from ReplayTables.PER import PERConfig, PrioritizedReplay

class Data(NamedTuple):
    a: float
    b: float

# all configurables are optional.
config = PERConfig(
    # can also use "mean" mode to place new samples in the middle of the distribution
    # or "given" mode, which requires giving the priority when the sample is added
    new_priority_mode='max',
    # the sampling distribution is a mixture between uniform sampling and the priority
    # distribution. This specifies the weight given to the uniform sampler.
    # Setting to 1 reverts this back to an inefficient form of standard uniform replay.
    uniform_probability=1e-3,
    # this implementation assumes priorities are positive. Priorities can be scaled by
    # raising them to some power. Default is `priority**(1/2)`
    priority_exponent=0.5,
    # if `new_priority_mode` is 'max', then the buffer tracks the highest seen priority.
    # this can cause accidental saturation if outlier priorities are observed. This provides
    # an exponential decay of the max in order to prevent permanent saturation.
    max_decay=1,
)

# if no config is given, defaults to original PER parameters
buffer = PrioritizedReplay(
    max_size=100_000,
    structure=Data,
    rng=np.random.default_rng(0),
    config=config,
)

buffer.add(Data(a=1, b=2))

# if `new_priority_mode` is 'given':
buffer.add(Data(a=1, b=2), priority=1.3)

batch = buffer.sample(32)
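
To make the configuration above concrete, here is a small illustrative calculation (not library code) of how a priority exponent and a uniform mixing weight combine into sampling probabilities, mirroring the descriptions of `priority_exponent` and `uniform_probability` above:

import numpy as np

# illustrative only: combine a priority distribution with a uniform component
def mixture_probs(priorities, priority_exponent=0.5, uniform_probability=1e-3):
    p = np.asarray(priorities, dtype=np.float64) ** priority_exponent
    prioritized = p / p.sum()
    uniform = np.full_like(prioritized, 1.0 / len(prioritized))
    return (1 - uniform_probability) * prioritized + uniform_probability * uniform

print(mixture_probs([1.0, 2.0, 4.0]))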


Issues

Make samples delete-able

There are instances where I wish to be able to delete a sample from the buffer. Because the sampler depends on unique identifiers being contiguous integers, this is a little tricky. The basic algorithm should look something like:

def delete(self, idx: int):
  # swap-and-pop: move the most recently added sample into the deleted slot
  # so that the unique identifiers stay contiguous
  last = self.storage[self.i]
  del self.storage[self.i]
  if idx != self.i:
    self.storage[idx] = last
  self.i -= 1

[feat] Add combined experience replay

Paper here: https://arxiv.org/pdf/1712.01275. The basic idea is to ensure that the latest sample is always included in the mini-batch.

This could simply be added as one top-level replay type; however, it would be nice to think about composition of replay ideas. This type of component composes nicely with the vast majority of other components.
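
A minimal sketch of what a composable version could look like, assuming only the `add`/`sample` API shown in the README and that sampled batches are NamedTuples of arrays stacked along the first axis. The wrapper name and structure are illustrative, not part of the library:

import numpy as np

class CombinedReplayWrapper:
    # hypothetical wrapper: remembers the most recent transition and appends it
    # to every sampled mini-batch, as described in the paper above
    def __init__(self, buffer):
        self._buffer = buffer
        self._latest = None

    def add(self, sample):
        self._latest = sample
        self._buffer.add(sample)

    def sample(self, n: int):
        # assumes at least one sample has been added
        batch = self._buffer.sample(n - 1)
        cols = (
            np.concatenate([col, np.expand_dims(np.asarray(latest), 0)])
            for col, latest in zip(batch, self._latest)
        )
        return type(batch)(*cols)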

Allow partial specification of rows

When adding a row to a Table, allow adding only a subset of columns. The important (and extremely common) use-case is bootstrapping methods which will specify:

row created at t1:  t1 -> X, A, R    t2 -> X', gamma
row created at t2:  t2 -> X, A, R    t3 -> X', gamma

where at time t2 we will complete the data for the row created at t1 and also add a new (incomplete) row for t2 which will be completed at time t3.

Unfortunately, this will require maintaining individual indices for every column which will harm our performance moderately and readability greatly.
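
As a rough illustration of the per-row bookkeeping this would require, here is a self-contained sketch; `PartialTable`, `add_partial`, and `complete` are hypothetical names, not an API of this library:

from typing import Any, Dict

class PartialTable:
    # hypothetical sketch: rows are stored as dicts keyed by a row id so that
    # missing columns can be filled in later
    def __init__(self):
        self._rows: Dict[int, Dict[str, Any]] = {}
        self._next_id = 0

    def add_partial(self, **cols) -> int:
        rid = self._next_id
        self._next_id += 1
        self._rows[rid] = dict(cols)
        return rid

    def complete(self, rid: int, **cols) -> None:
        self._rows[rid].update(cols)

# the bootstrapping pattern above
table = PartialTable()
row1 = table.add_partial(x=0.0, a=1, r=0.5)    # created at t1
table.complete(row1, xp=0.1, gamma=0.99)       # completed at t2
row2 = table.add_partial(x=0.1, a=0, r=1.0)    # created at t2, completed at t3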

Internal organization

We should separate replay buffer components into their own parts, making it easier to build alternative replay mechanisms. Obviously, we should be aware of performance degradation as we build layers of abstraction; however, this could also allow for new optimizations to offset the cost. For instance, the storage mechanism can be smarter about when to compress/decompress and when to do copies (like in the "table" approach) vs. references (like in the dictionary approach) based on properties of the requested storage size. This also makes it easier to avoid duplicating state information. We can use this to combine the lag-buffer with the replay-buffer into a single coherent mechanism, as long as a lag-buffer without replay remains a possible configuration.

I can think of three core components currently, plus a possible fourth:

  • A storage mechanism which handles storage, ingress, and egress (i.e. ejecting oldest samples)
  • A sampler
  • A filtration mechanism that allows rejecting samples before being passed to the storage. Possibly this could replace the storage's ingress entirely
  • Possibly, we should break egress into its own component. This is slightly trickier because we want to make strong guarantees to the storage mechanism about the maximum size.
class ReplayBuffer:
  def __init__(self, ...):
    self.memory: StorageType
    self.sampler: IndexSampler
    # maybe self.ingress is a better name. This can be an observer that other buffers can also subscribe to
    self.filter: Filtration

This is likely a breaking change. Ideally, we take advantage of the refactor to eliminate some superfluous interfaces (view vs. buffer) in favor of giving users direct access to each individual component. That is, I'd like to be able to do the following:

buffer1 = ReplayBuffer()
buffer2 = ReplayBuffer(memory=buffer1.memory)

or something similar, in order to have two buffers referencing the same underlying memory. We need lifecycle hooks so that all components are informed of new samples (e.g. so the sampler can be updated). Possibly this can be accomplished with an observer pattern.
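
A rough sketch of what those lifecycle hooks could look like with an observer pattern; the names here (`ReplayObserver`, `on_add`, `Storage`) are illustrative only:

from typing import Any, Dict, List, Protocol

class ReplayObserver(Protocol):
    # hypothetical lifecycle hook: called whenever a new sample lands in storage
    def on_add(self, idx: int, sample: Any) -> None: ...

class Storage:
    def __init__(self):
        self._data: Dict[int, Any] = {}
        self._observers: List[ReplayObserver] = []
        self._next_idx = 0

    def subscribe(self, obs: ReplayObserver) -> None:
        self._observers.append(obs)

    def add(self, sample: Any) -> int:
        idx = self._next_idx
        self._next_idx += 1
        self._data[idx] = sample
        # notify every subscriber, e.g. the samplers of all buffers sharing this memory
        for obs in self._observers:
            obs.on_add(idx, sample)
        return idx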

Performance testing

Should add some basic performance testing to the library. Now that I'm starting to think through code optimizations, it's important to ensure we are actually tracking performance rigorously. We need to think through how to do this in a consistent way, i.e. do GitHub Actions guarantee consistent hardware? Maybe we could use external executors like my home server, but then what happens when I upgrade it (and how do we avoid abuse)?
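
As a starting point, even a local micro-benchmark would help catch regressions. The following is a rough sketch using `timeit` and the public `add`/`sample` API from the README; the shapes and iteration counts are arbitrary:

import timeit
from typing import NamedTuple
import numpy as np
from ReplayTables.ReplayBuffer import ReplayBuffer

class Data(NamedTuple):
    x: np.ndarray
    a: np.ndarray
    r: np.ndarray

def bench():
    buffer = ReplayBuffer(max_size=100_000, structure=Data, rng=np.random.default_rng(0))
    sample = Data(np.zeros(8), np.int64(0), np.float64(0.0))
    # time adds first so the buffer is populated before sampling
    add_time = timeit.timeit(lambda: buffer.add(sample), number=10_000)
    sample_time = timeit.timeit(lambda: buffer.sample(32), number=1_000)
    print(f'add: {add_time / 10_000:.2e} s/op, sample: {sample_time / 1_000:.2e} s/op')

bench()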
