rllte's Introduction

RLLTE: Long-Term Evolution Project of Reinforcement Learning is inspired by the long-term evolution (LTE) standard project in telecommunications. It aims to track the latest research progress in reinforcement learning (RL) and to provide stable and efficient baselines. rllte covers the main stages of an RL workflow, including training, evaluation, and deployment. Its highlight features are:

  • ⏱️ Latest algorithms and tricks;
  • 🧱 Highly modularized design for complete decoupling of RL algorithms;
  • 🚀 Optimized workflow for full hardware acceleration;
  • ⚙️ Support for custom environments;
  • 🖥️ Support for multiple computing devices like GPU and NPU;
  • 🛠️ Support for RL model engineering deployment (TensorRT, CANN, ...);
  • 💾 Large number of reusable benchmarks (see rllte-benchmark).

Quick Start

Installation

  • Prerequisites

Currently, we recommend Python>=3.8. You can create a virtual environment with:

conda create -n rllte python=3.8

  • with pip (recommended)

Open up a terminal and install rllte with pip:

pip install rllte # basic installation
pip install rllte[envs] # for pre-defined environments

  • with git

Open up a terminal and clone the repository from GitHub with git:

git clone https://github.com/RLE-Foundation/rllte.git

After that, run the following commands to install the package and its dependencies:

pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments

For more detailed installation instructions, see https://docs.rllte.dev/getting_started.
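Before or after installing, it can help to confirm that the active interpreter meets the recommended Python>=3.8 and to check whether rllte is importable. The snippet below is a minimal sketch that uses only the standard library; the helper name `check_environment` is hypothetical and is not part of rllte.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 8), package="rllte"):
    """Return (python_ok, package_found) for the current interpreter.

    Hypothetical helper for illustration; not part of rllte.
    """
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level package is not installed.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_environment()
    print(f"Python >= 3.8: {python_ok}")
    print(f"rllte importable: {package_found}")
```

If the second value is False inside the intended virtual environment, the pip or git installation steps above have probably not been run in that environment.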

Start Training

On NVIDIA GPU

For example, to solve a DeepMind Control Suite task with DrQ-v2, it suffices to write a train.py like this:

# Import the `env` and `agent` APIs
from rllte.env import make_dmc_env
from rllte.xploit.agent import DrQv2

if __name__ == "__main__":
    device = "cuda:0"
    # Create the training env; `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # Create the agent on the same device as the environments
    agent = DrQv2(env=env,
                  eval_env=eval_env,
                  device=device,
                  tag="drqv2_dmc_pixel")
    # Start training
    agent.train(num_train_steps=5000)

Run train.py to start training and print the training log.

On HUAWEI NPU

Similarly, to train an agent on a HUAWEI NPU, it suffices to replace DrQv2 with NpuDrQv2 and set the device to "npu:0":

# Import the `env` and `agent` APIs
from rllte.env import make_dmc_env
from rllte.xploit.agent import NpuDrQv2

if __name__ == "__main__":
    device = "npu:0"
    # Create the training env; `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # Create the agent on the same NPU device as the environments
    agent = NpuDrQv2(env=env,
                     eval_env=eval_env,
                     device=device,
                     tag="drqv2_dmc_pixel")
    # Start training
    agent.train(num_train_steps=5000)

Running this script starts training on the NPU and prints the training log.

Please refer to Implemented Modules for NPU compatibility.

For more detailed tutorials, see https://docs.rllte.dev/tutorials.

Implemented Modules

Roadmap

rllte evolves alongside reinforcement learning algorithms and integrates the latest tricks. The following figure shows the main evolution roadmap of rllte:

Project Structure

See the project structure below:

  • Common: Auxiliary modules like trainer and logger.

    • Engine: Engine for building rllte applications.
    • Logger: Logger for managing output information.
  • Xploit: Modules that focus on exploitation in RL.

    • Encoder: Neural network-based encoder for processing observations.
    • Agent: Agent for interacting and learning.
    • Storage: Storage for storing collected experiences.
  • Xplore: Modules that focus on exploration in RL.

    • Augmentation: PyTorch.nn-like modules for observation augmentation.
    • Distribution: Distributions for sampling actions.
    • Reward: Intrinsic reward modules for enhancing exploration.
  • Evaluation: Reasonable and reliable metrics for algorithm evaluation.

  • Env: Packaged environments (e.g., Atari games) for fast invocation.

  • Pre-training: Methods of pre-training in RL.

  • Deployment: Methods of model deployment in RL.

For more detailed descriptions of these modules, see https://docs.rllte.dev/api.
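To make the role of a storage module concrete, the following is a minimal, library-independent sketch of a fixed-capacity experience store with uniform sampling. The class `ReplayStorage` and its method names are hypothetical and chosen for illustration only; they are not rllte's storage API, which is documented at the link above.

```python
import random
from collections import deque


class ReplayStorage:
    """Toy fixed-capacity experience storage with uniform sampling.

    Hypothetical class for illustration; not rllte's storage API.
    """

    def __init__(self, capacity, seed=None):
        # deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniformly sample without replacement from the stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    storage = ReplayStorage(capacity=3, seed=0)
    for step in range(5):
        storage.add(obs=step, action=0, reward=0.0, next_obs=step + 1, done=False)
    print(len(storage), storage.sample(2))
```

An agent would typically call `add` after every environment step and `sample` when it updates its networks, which is the kind of separation between agent and storage that the structure above describes.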

RL Agents

| Module | Recurrent | Box | Discrete | MultiBinary | Multi Processing | NPU | Paper | Citations |
|---|---|---|---|---|---|---|---|---|
| SAC | ❌ | ✔️ | ❌ | ❌ | ❌ | 🐌 | Link | 5077⭐ |
| DrQ | ❌ | ✔️ | ❌ | ❌ | ❌ | 🐌 | Link | 433⭐ |
| DDPG | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 11819⭐ |
| DrQ-v2 | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 100⭐ |
| PPO | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Link | 11155⭐ |
| DrAC | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Link | 29⭐ |
| DAAC | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | 🐌 | Link | 56⭐ |
| PPG | ❌ | ✔️ | ✔️ | ❌ | ✔️ | 🐌 | Link | 82⭐ |
| IMPALA | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | 🐌 | Link | 1219⭐ |

  • 🐌: In development.
  • NPU: Supports neural-network processing units (NPUs).
  • Recurrent: Supports recurrent neural networks.
  • Box: An N-dimensional box that contains every point in the action space.
  • Discrete: A list of possible actions, where only one action can be used at each timestep.
  • MultiBinary: A list of possible actions, where any combination of the actions can be used at each timestep.
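The three action-space types in the legend above can be illustrated by how a random action would be drawn from each. This is a minimal sketch in plain Python; the function names are hypothetical, and rllte environments would normally expose these spaces through their own environment wrappers rather than through code like this.

```python
import random


def sample_box(low, high, rng):
    """Box: one real value per dimension, each within [low[i], high[i]]."""
    return [rng.uniform(lo, hi) for lo, hi in zip(low, high)]


def sample_discrete(n, rng):
    """Discrete: exactly one of n actions is chosen at each timestep."""
    return rng.randrange(n)


def sample_multi_binary(n, rng):
    """MultiBinary: each of n actions is independently on (1) or off (0)."""
    return [rng.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print("Box:", sample_box([-1.0, -1.0], [1.0, 1.0], rng))
    print("Discrete:", sample_discrete(4, rng))
    print("MultiBinary:", sample_multi_binary(4, rng))
```

A Box action is a vector of continuous values, a Discrete action is a single index, and a MultiBinary action is a vector of independent on/off flags, which is why an agent's policy head has to match the space type listed in the table.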

Intrinsic Reward Modules

| Module | Remark | Repr. | Visual | Reference |
|---|---|---|---|---|
| PseudoCounts | Count-based exploration | ✔️ | ✔️ | Never Give Up: Learning Directed Exploration Strategies |
| ICM | Curiosity-driven exploration | ✔️ | ✔️ | Curiosity-Driven Exploration by Self-Supervised Prediction |
| RND | Count-based exploration | ❌ | ✔️ | Exploration by Random Network Distillation |
| GIRM | Curiosity-driven exploration | ✔️ | ✔️ | Intrinsic Reward Driven Imitation Learning via Generative Model |
| NGU | Memory-based exploration | ✔️ | ✔️ | Never Give Up: Learning Directed Exploration Strategies |
| RIDE | Procedurally-generated environment | ✔️ | ✔️ | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments |
| RE3 | Entropy maximization | ❌ | ✔️ | State Entropy Maximization with Random Encoders for Efficient Exploration |
| RISE | Entropy maximization | ❌ | ✔️ | Rényi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning |
| REVD | Divergence maximization | ❌ | ✔️ | Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning |

  • 🐌: Developing.
  • Repr.: The method involves representation learning.
  • Visual: The method works well in visual RL.

See Tutorials: Use intrinsic reward and observation augmentation for usage examples.
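As a library-independent illustration of what an intrinsic reward is, the sketch below implements a simple tabular count-based bonus, where the bonus for a state shrinks as `scale / sqrt(N(s))` with the visit count `N(s)`. The class `CountBonus` is hypothetical and deliberately simplified; it is not the PseudoCounts module or any other rllte implementation, and it assumes states are hashable.

```python
import math
from collections import Counter


class CountBonus:
    """Toy count-based intrinsic reward: bonus = scale / sqrt(N(s)).

    Illustrative only; not the PseudoCounts module from rllte.
    """

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = Counter()

    def compute(self, state):
        # States must be hashable (e.g. tuples of discretized features).
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])


if __name__ == "__main__":
    bonus = CountBonus()
    for state in [(0, 0), (0, 0), (0, 1), (0, 0)]:
        print(state, round(bonus.compute(state), 3))
```

In training, such a bonus is typically added to the environment reward so that rarely visited states look more attractive to the agent; the modules in the table replace the exact visit count with learned or memory-based estimates.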

Model Zoo

rllte provides a large number of reusable benchmarks; see https://hub.rllte.dev/ and https://docs.rllte.dev/benchmarks/.

API Documentation

View our well-designed documentation: https://docs.rllte.dev/

How To Contribute

Contributions to this project are welcome! Before you begin writing code, please read CONTRIBUTING.md for guidance.

Acknowledgment

This project is supported by FUNDING.yml. Some code in this project is borrowed from or inspired by several excellent projects, and we greatly appreciate them. See ACKNOWLEDGMENT.md.

rllte's People

Contributors

heodel, shihaoluo, williamyangxu
