
rlzoo's Introduction

Reinforcement Learning Zoo




RLzoo is a collection of the most practical reinforcement learning algorithms, frameworks and applications. It is implemented with TensorFlow 2.0 and the neural network layer API of TensorLayer 2.0+, to provide a hands-on, fast-developing approach for reinforcement learning practice and benchmarks. It supports basic toy tests such as OpenAI Gym and the DeepMind Control Suite with very simple configurations. Moreover, RLzoo supports the robot learning benchmark environment RLBench, based on the V-REP/PyRep simulator. Other large-scale distributed training frameworks for more realistic scenarios with Unity 3D, MuJoCo, Bullet Physics, etc., will be supported in the future. A Springer textbook is also provided; you can get the free PDF if your institute has a Springer license.

In contrast to RLzoo, which targets simple usage with high-level APIs, we also provide an RL tutorial that aims to make reinforcement learning simple, transparent and straightforward with low-level APIs; this not only benefits new learners of reinforcement learning, but also makes it convenient for senior researchers to test their new ideas quickly.

Please check our Online Documentation for detailed usage, and the arXiv paper for a high-level description of the design choices and comparisons with other RL libraries. We suggest that users report bugs via GitHub issues. Users can also discuss how to use RLzoo in the following Slack channel.



[News] RLzoo's paper has been accepted at the ACM Multimedia 2021 Open Source Software Competition! See a simple presentation slide describing the key characteristics of RLzoo.


Status: Release

Current status [click to expand]
We are currently open to any suggestions or pull requests from the community to make RLzoo a better repository. Given the scope of this project, we expect there may be some issues over the coming months after the initial release. We will keep addressing potential problems and commit when significant changes are made. The current default hyperparameters for each algorithm and environment may not be optimal, so feel free to tune them to achieve the best performance. We will release a version with optimal hyperparameters and benchmark results for all algorithms in the future.
Version History [click to expand]
  • 1.0.4 (Current version)

    Changes:

    • Add distributed training for the DPPO algorithm, using KungFu
  • 1.0.3

    Changes:

    • Fix bugs in SAC algorithm
  • 1.0.1

    Changes:

    • Add interactive training configuration;
    • Better support for the RLBench environment, with multi-head network architectures supporting dictionary observation types;
    • Make the code cleaner.
  • 0.0.1

Installation

Ensure that you have Python >=3.5 (Python 3.6 is needed if using DeepMind Control Suite).

Direct installation:

pip3 install rlzoo --upgrade

Install RLzoo from Git:

git clone https://github.com/tensorlayer/RLzoo.git
cd RLzoo
pip3 install .

Prerequisites

pip3 install -r requirements.txt

List of prerequisites. [click to expand]
  • tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
  • tensorlayer >= 2.0.1
  • tensorflow-probability
  • tf-nightly-2.0-preview
  • Mujoco 2.0, dm_control, dm2gym (if using DeepMind Control Suite environments)
  • V-REP, PyRep, RLBench (if using RLBench environments; installation follows here, here and here)

Usage

For detailed usage, please check our online documentation.

Quick Start

Choose any environment and RL algorithm supported in RLzoo, and enjoy the game by running the following example in the root folder of the installed package:

# in the root folder of RLzoo package
cd rlzoo
python run_rlzoo.py

What's in run_rlzoo.py?

from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import TD3  # import the algorithm to use
# choose an algorithm
AlgName = 'TD3'
# choose an environment
EnvName = 'Pendulum-v0'  
# select a corresponding environment type
EnvType = 'classic_control'
# build an environment with wrappers
env = build_env(EnvName, EnvType)  
# call default parameters for the algorithm and learning process
alg_params, learn_params = call_default_params(env, EnvType, AlgName)  
# instantiate the algorithm
alg = eval(AlgName+'(**alg_params)')
# start the training
alg.learn(env=env, mode='train', render=False, **learn_params)  
# test after training 
alg.learn(env=env, mode='test', render=True, **learn_params)  

The main script run_rlzoo.py follows (almost) the same structure for all algorithms on all environments; see the full list of examples.
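For example, switching to another supported algorithm/environment combination only requires changing the three selection lines; the rest of run_rlzoo.py stays the same. The pairing below (SAC on LunarLanderContinuous-v2, a Box2D environment) is listed in the compatibility table further down, but is meant purely as an illustrative sketch, not a tuned benchmark.

# only these three lines change compared with the TD3/Pendulum example above
AlgName = 'SAC'
EnvName = 'LunarLanderContinuous-v2'
EnvType = 'box2d'
# the remaining steps (build_env, call_default_params, alg.learn) are unchanged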

General Descriptions: RLzoo provides at least two types of interfaces for running the learning algorithms, with (1) implicit configurations or (2) explicit configurations. Both start the learning program by running a Python script, instead of running a long command line with all configurations shortened into its arguments (as in, e.g., OpenAI Baselines). We find this approach more interpretable, flexible and convenient to apply in practice. According to the level of explicitness of the learning configurations, we provide two ways of setting them in Python scripts: the implicit way uses a default.py script to record all configurations for each algorithm, while the explicit way exposes all configurations in the running script. Both can run any RL algorithm on any environment supported in our repository with a simple command line.

1. Implicit Configurations [click to expand]

RLzoo with implicit configurations means that the configurations for learning are not explicitly contained in the main running script (i.e. run_rlzoo.py), but in the default.py file in each algorithm folder (for example, rlzoo/algorithms/sac/default.py holds the default parameter configuration for the SAC algorithm). All configurations, including (1) parameter values for the algorithm and the learning process, (2) the network structures and (3) the optimizers, are divided into configurations for the algorithm (stored in alg_params) and configurations for the learning process (stored in learn_params). Whenever you want to change them, you can either go to the folder of each algorithm and modify the parameters in default.py, or change the values in alg_params (a dictionary of configurations for the algorithm) and learn_params (a dictionary of configurations for the learning process) in run_rlzoo.py according to their keys, as sketched below.
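A minimal sketch of overriding a few defaults inside run_rlzoo.py; gamma and train_episodes are used here only as illustrative key names, so check the algorithm's default.py for the keys that actually exist:

alg_params, learn_params = call_default_params(env, EnvType, AlgName)
alg_params['gamma'] = 0.95             # e.g. change the reward discount factor
learn_params['train_episodes'] = 300   # e.g. shorten the training
alg = eval(AlgName + '(**alg_params)')
alg.learn(env=env, mode='train', render=False, **learn_params)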

Common Interface:

from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import *
# choose an algorithm
AlgName = 'TD3'
# choose an environment
EnvName = 'Pendulum-v0'  
# select a corresponding environment type
EnvType = ['classic_control', 'atari', 'box2d', 'mujoco', 'robotics', 'dm_control', 'rlbench'][0] 
# build an environment with wrappers
env = build_env(EnvName, EnvType)  
# call default parameters for the algorithm and learning process
alg_params, learn_params = call_default_params(env, EnvType, AlgName)  
# instantiate the algorithm
alg = eval(AlgName+'(**alg_params)')
# start the training
alg.learn(env=env, mode='train', render=False, **learn_params)  
# test after training 
alg.learn(env=env, mode='test', render=True, **learn_params)  
# in the root folder of rlzoo package
cd rlzoo
python run_rlzoo.py
2. Explicit Configurations [click to expand]

RLzoo with explicit configurations means that the configurations for learning, including parameter values for the algorithm and the learning process, the network structures used in the algorithms and the optimizers etc., are explicitly displayed in the main running script. The main scripts for demonstration are under the folder of each algorithm; for example, ./rlzoo/algorithms/sac/run_sac.py can be called with python algorithms/sac/run_sac.py from the folder ./rlzoo to run the same learning process as with the implicit configurations above.

A Quick Example

import gym
import tensorflow as tf  # needed for tf.name_scope, tf.nn.tanh and tf.optimizers below
from rlzoo.common.utils import make_env, set_seed
from rlzoo.algorithms import AC
from rlzoo.common.value_networks import ValueNetwork
from rlzoo.common.policy_networks import StochasticPolicyNetwork

''' load environment '''
env = gym.make('CartPole-v0').unwrapped
obs_space = env.observation_space
act_space = env.action_space
# reproducible
seed = 2
set_seed(seed, env)

''' build networks for the algorithm '''
num_hidden_layer = 4  # number of hidden layers for the networks
hidden_dim = 64  # dimension of hidden layers for the networks
with tf.name_scope('AC'):
    with tf.name_scope('Critic'):
        # choose the critic network, can be replaced with customized network
        critic = ValueNetwork(obs_space, hidden_dim_list=num_hidden_layer * [hidden_dim])
    with tf.name_scope('Actor'):
        # choose the actor network, can be replaced with customized network
        actor = StochasticPolicyNetwork(obs_space, act_space, hidden_dim_list=num_hidden_layer * [hidden_dim], output_activation=tf.nn.tanh)
net_list = [actor, critic] # list of the networks

''' choose optimizers '''
a_lr, c_lr = 1e-4, 1e-2  # a_lr: learning rate of the actor; c_lr: learning rate of the critic
a_optimizer = tf.optimizers.Adam(a_lr)
c_optimizer = tf.optimizers.Adam(c_lr)
optimizers_list = [a_optimizer, c_optimizer]  # list of optimizers

# initialize the algorithm model, with algorithm parameters passed in
model = AC(net_list, optimizers_list)
''' 
full list of arguments for the algorithm
----------------------------------------
net_list: a list of networks (value and policy) used in the algorithm, from common functions or customization
optimizers_list: a list of optimizers for all networks and differentiable variables
gamma: discount factor for rewards
action_range: scale of action values
'''

# start the training process, with learning parameters passed in
model.learn(env, train_episodes=500,  max_steps=200,
            save_interval=50, mode='train', render=False)
''' 
full list of parameters for training
---------------------------------------
env: learning environment
train_episodes:  total number of episodes for training
test_episodes:  total number of episodes for testing
max_steps:  maximum number of steps for one episode
save_interval: time steps for saving the weights and plotting the results
mode: 'train' or 'test'
render:  if true, visualize the environment
'''

# test after training
model.learn(env, test_episodes=100, max_steps=200,  mode='test', render=True)

In the package folder, we provide examples with explicit configurations for each algorithm.

# in the root folder of rlzoo package
cd rlzoo
python algorithms/<ALGORITHM_NAME>/run_<ALGORITHM_NAME>.py 
# for example: run actor-critic
python algorithms/ac/run_ac.py

Interactive Configurations

We also provide an interactive learning configuration with Jupyter Notebook and ipywidgets, where you can select the algorithm, environment, and general learning settings by simply clicking on dropdown lists and sliders! A video demonstrating the usage is shown below. The interactive mode can be used via rlzoo/interactive/main.ipynb; run $ jupyter notebook to open it.

Interactive Video
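For readers curious what this looks like under the hood, here is a minimal ipywidgets sketch (not the actual notebook code) of the kind of dropdowns and sliders used in rlzoo/interactive/main.ipynb:

import ipywidgets as widgets
from IPython.display import display

# dropdowns and a slider for selecting the algorithm, environment type and episode budget
alg_dropdown = widgets.Dropdown(
    options=['DQN', 'AC', 'A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'],
    value='TD3', description='Algorithm:')
env_type_dropdown = widgets.Dropdown(
    options=['atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'],
    value='classic_control', description='Env type:')
episode_slider = widgets.IntSlider(value=500, min=100, max=5000, step=100, description='Episodes:')

display(alg_dropdown, env_type_dropdown, episode_slider)
# read back the selections (e.g. alg_dropdown.value) when launching the training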

Distributed Training

RLzoo supports distributed training across multiple computational nodes with multiple CPUs/GPUs, using the KungFu package. Installing KungFu requires CMake and Golang; see the KungFu website for details. An example of distributed training is contained in the folder rlzoo/distributed. Running the following command launches the distributed training process:

rlzoo/distributed/run_dis_train.sh
Code in Bash script [click to expand]
#!/bin/sh
set -e

cd $(dirname $0)

# Flags passed to kungfu-run: quiet mode, log directory and the host list.
kungfu_flags() {
    echo -q
    echo -logdir logs

    local ip1=127.0.0.1   # first node
    local np1=$np

    local ip2=127.0.0.10  # second node (defined but not included in -H below)
    local np2=$np
    local H=$ip1:$np1,$ip2:$np2
    local m=cpu,gpu

    echo -H $ip1:$np1
}

# Run a command with kungfu-run, using $1 worker processes.
prun() {
    local np=$1
    shift
    kungfu-run $(kungfu_flags) -np $np $@
}

# Numbers of policy learners, actors and inference servers.
n_learner=2
n_actor=2
n_server=1

# Pass the learner/actor/server counts to the Python script.
flags() {
    echo -l $n_learner
    echo -a $n_actor
    echo -s $n_server
}

# Launch one process per learner/actor/server, all running training_components.py.
rl_run() {
    local n=$((n_learner + n_actor + n_server))
    prun $n python3 training_components.py $(flags)
}

main() {
    rl_run
}

main

The script specifies the IP addresses of the different computational nodes, as well as the numbers of policy learners (updating the models), actors (sampling through interaction with environments) and inference servers (performing policy forward inference during sampling) as n_learner, n_actor and n_server respectively. n_server can only be 1 in the current version.

Other training details are specified in a separate Python script named training_components.py in the same directory as run_dis_train.sh, shown below.

Code in Python script [click to expand]
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.policy_networks import *
from rlzoo.common.value_networks import *
from rlzoo.algorithms.dppo_clip_distributed.dppo_clip import DPPO_CLIP
from functools import partial

# Specify the training configurations
training_conf = {
    'total_step': int(1e7),  # overall training timesteps
    'traj_len': 200,         # length of the rollout trajectory
    'train_n_traj': 2,       # update the models after every certain number of trajectories for each learner 
    'save_interval': 10,     # saving the models after every certain number of updates
}

# Specify the environment and launch it
env_name, env_type = 'CartPole-v0', 'classic_control'
env_maker = partial(build_env, env_name, env_type)
temp_env = env_maker()
obs_shape, act_shape = temp_env.observation_space.shape, temp_env.action_space.shape

env_conf = {
    'env_name': env_name,
    'env_type': env_type,
    'env_maker': env_maker,
    'obs_shape': obs_shape,
    'act_shape': act_shape,
}


def build_network(observation_space, action_space, name='DPPO_CLIP'):
    """ build networks for the algorithm """
    hidden_dim = 256
    num_hidden_layer = 2
    critic = ValueNetwork(observation_space, [hidden_dim] * num_hidden_layer, name=name + '_value')

    actor = StochasticPolicyNetwork(observation_space, action_space,
                                    [hidden_dim] * num_hidden_layer,
                                    trainable=True,
                                    name=name + '_policy')
    return critic, actor


def build_opt(actor_lr=1e-4, critic_lr=2e-4):
    """ choose the optimizer for learning """
    import tensorflow as tf
    return [tf.optimizers.Adam(critic_lr), tf.optimizers.Adam(actor_lr)]


net_builder = partial(build_network, temp_env.observation_space, temp_env.action_space)
opt_builder = partial(build_opt, )

agent_conf = {
    'net_builder': net_builder,
    'opt_builder': opt_builder,
    'agent_generator': partial(DPPO_CLIP, net_builder, opt_builder),
}
del temp_env

from rlzoo.distributed.start_dis_role import main

print('Start Training.')
main(training_conf, env_conf, agent_conf)
print('Training Finished.')
	

Users can specify the environment, network architectures, optimizers and other training details in this script.

Note: if RLzoo is installed, you can create the two scripts run_dis_train.sh and training_components.py in any directory to launch distributed training, as long as the two scripts are in the same directory.

Contents

Algorithms

Choices for AlgName: 'DQN', 'AC', 'A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'

Algorithms Papers
Value-based
Q-learning Technical note: Q-learning. Watkins et al. 1992
Deep Q-Network (DQN) Human-level control through deep reinforcement learning, Mnih et al. 2015.
Prioritized Experience Replay Prioritized experience replay. Schaul et al. 2015.
Dueling DQN Dueling network architectures for deep reinforcement learning. Wang et al. 2015.
Double DQN Deep reinforcement learning with double Q-learning. van Hasselt et al. 2016.
Retrace Safe and efficient off-policy reinforcement learning. Munos et al. 2016.
Noisy DQN Noisy networks for exploration. Fortunato et al. 2017.
Distributional DQN (C51) A distributional perspective on reinforcement learning. Bellemare et al. 2017.
Policy-based
REINFORCE(PG) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Ronald J. Williams 1992.
Trust Region Policy Optimization (TRPO) Trust region policy optimization. Schulman et al. 2015.
Proximal Policy Optimization (PPO) Proximal policy optimization algorithms. Schulman et al. 2017.
Distributed Proximal Policy Optimization (DPPO) Emergence of locomotion behaviours in rich environments. Heess et al. 2017.
Actor-Critic
Actor-Critic (AC) Actor-critic algorithms. Konda et al. 2000.
Asynchronous Advantage Actor-Critic (A3C) Asynchronous methods for deep reinforcement learning. Mnih et al. 2016.
Deep Deterministic Policy Gradient (DDPG) Continuous control with deep reinforcement learning. Lillicrap et al. 2016.
Twin Delayed DDPG (TD3) Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018.
Soft Actor-Critic (SAC) Soft actor-critic algorithms and applications. Haarnoja et al. 2018.

Environments

Choices for EnvType: 'atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'
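As a quick orientation, here is a hedged sketch of building environments of different types with build_env. The names follow examples used elsewhere in this README (PongNoFrameskip-v4 is a standard Gym Atari id assumed to be in the supported list), and each type requires its optional dependencies to be installed:

from rlzoo.common.env_wrappers import build_env

classic_env = build_env('Pendulum-v0', 'classic_control')               # OpenAI Gym classic control
atari_env = build_env('PongNoFrameskip-v4', 'atari')                    # Atari (assumed id)
rlbench_env = build_env('ReachTarget', 'rlbench', state_type='vision')  # RLBench task with vision observations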

Some notes on environment usage. [click to expand]
  • Make sure the name of the environment matches the type of the environment in the main script. The supported types are: 'atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'.

  • When using the DeepMind Control Suite, install the dm2gym package with: pip install dm2gym

  • When using the RLBench environments, please add the path of your local rlbench repository to python: export PYTHONPATH=PATH_TO_YOUR_LOCAL_RLBENCH_REPO

  • A dictionary of all different environments is stored in ./rlzoo/common/env_list.py

  • Full list of environments in RLBench is here.

  • Installation of Vrep->PyRep->RLBench follows here->here->here.

Configurations:

The supported configurations for RL algorithms with corresponding environments in RLzoo are listed in the following table.

Algorithms Action Space Policy Update Envs
DQN (double, dueling, PER) Discrete Only -- Off-policy Atari, Classic Control
AC Discrete/Continuous Stochastic On-policy All
PG Discrete/Continuous Stochastic On-policy All
DDPG Continuous Deterministic Off-policy Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench
TD3 Continuous Deterministic Off-policy Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench
SAC Continuous Stochastic Off-policy Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench
A3C Discrete/Continuous Stochastic On-policy Atari, Classic Control, Box2D, Mujoco, Robotics, DeepMind Control
PPO Discrete/Continuous Stochastic On-policy All
DPPO Discrete/Continuous Stochastic On-policy Atari, Classic Control, Box2D, Mujoco, Robotics, DeepMind Control
TRPO Discrete/Continuous Stochastic On-policy All

Properties

1. Automatic model construction [click to expand]
We aim to make it easy to configure all components within RL, including replacing the networks, optimizers, etc. We also provide automatically adaptive policies and value functions in the common functions: for the observation space, vector states and raw-pixel (image) states are supported automatically according to the shape of the space; for the action space, discrete and continuous actions are likewise supported automatically according to the shape of the space. The deterministic or stochastic property of the policy needs to be chosen according to each algorithm (a short sketch illustrating this adaptivity follows this list). Some environments with raw-pixel observations (e.g. Atari, RLBench) may be hard to train, so be patient and play around with the hyperparameters!
2. Simple and flexible API [click to expand]
As described in the Usage section, we provide at least two ways of deploying RLzoo: the implicit and the explicit configuration processes. This design ensures maximum flexibility for different use cases.
3. Sufficient support for DRL algorithms and environments [click to expand]
As shown in the algorithm and environment tables above.
4. Interactive reinforcement learning configuration [click to expand]

As shown in the interactive use case in the Usage section, a Jupyter notebook (rlzoo/interactive/main.ipynb) is provided for configuring the whole learning process more intuitively.
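Below is a minimal sketch (not from the repository) illustrating the automatic adaptation described in property 1, assuming the same common network APIs as in the Quick Example above; the constructors are expected to adapt to the observation and action space shapes of each environment:

import gym
from rlzoo.common.value_networks import ValueNetwork
from rlzoo.common.policy_networks import StochasticPolicyNetwork

for env_id in ['CartPole-v0', 'Pendulum-v0']:  # discrete and continuous action spaces
    env = gym.make(env_id)
    # identical constructor calls for both environments
    critic = ValueNetwork(env.observation_space, hidden_dim_list=[64, 64])
    actor = StochasticPolicyNetwork(env.observation_space, env.action_space, hidden_dim_list=[64, 64])
    print(env_id, type(actor).__name__)
    env.close()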

Troubleshooting

  • If you meet the error AttributeError: module 'tensorflow' has no attribute 'contrib' when running the code after installing tensorflow-probability, try: pip install --upgrade tf-nightly-2.0-preview tfp-nightly
  • When trying to use RLBench environments, the error No module named rlbench can be caused by the RLBench package not being installed locally or by a mistake in the Python path. You should run export PYTHONPATH=PATH_TO_YOUR_LOCAL_RLBENCH_REPO every time before running a learning script with an RLBench environment, or add it to your ~/.bashrc file once and for all.
  • If you meet an error saying that the Qt platform is not loaded correctly when using DeepMind Control Suite environments, it is probably because your Ubuntu system is not version 14.04 or 16.04. Check here.

Credits

Our core contributors include:

Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Guo Li, Quancheng Guo, Luo Mai, Hao Dong

Citing

@article{ding2020rlzoo,
  title={RLzoo: A Comprehensive and Adaptive Reinforcement Learning Library},
  author={Ding, Zihan and Yu, Tianyang and Huang, Yanhua and Zhang, Hongming and Mai, Luo and Dong, Hao},
  journal={arXiv preprint arXiv:2009.08644},
  year={2020}
}

and

@book{deepRL-2020,
 title={Deep Reinforcement Learning: Fundamentals, Research, and Applications},
 editor={Hao Dong and Zihan Ding and Shanghang Zhang},
 author={Hao Dong and Zihan Ding and Shanghang Zhang and Hang Yuan and Hongming Zhang and Jingqing Zhang and Yanhua Huang and Tianyang Yu and Huaqing Zhang and Ruitong Huang},
 publisher={Springer Nature},
 note={\url{http://www.deepreinforcementlearningbook.org}},
 year={2020}
}

Other Resources






rlzoo's Issues

ImportError: cannot import name 'ArmActionMode'

Hi, I ran rlzoo/interactive/main.ipynb and got this error:
ImportError: cannot import name 'ArmActionMode'
I tried the solution suggested in #43, but now I get a new error and the problem isn't solved.
The error is: action_shape() missing 1 required positional argument: 'scene'
What should I do?

Results using RLBench as the environment

Hi,
first of all, let me say that I really appreciate the work done in this repo.
I would like to know if you have had success in training any algorithm using RLBench as the environment.
I'm currently trying to train the DDPG algorithm on the ReachTarget task using all the observations available with state_type='vision'. As suggested in issue #6, I modified the default params for DDPG, lowering max_steps and increasing train_episodes, but I can't seem to get any results.
Any feedback is much appreciated.

Mirko

Edit:
I noticed that RLBench doesn't provide "usable" reward metrics, am I wrong? All the episode rewards are either 0.000 or 1.000. Any insight into this problem?

Could PPO solve DM control tasks?

I installed RLzoo and used its PPO to train an agent for the DM Control Suite. I tested the environments CheetahRun-v0 and CartpoleSwingup-v0, but the current PPO could solve neither of them. Could you please help me? I attach the testing rewards for CartpoleSwingup-v0 below.

Testing... | Algorithm: PPO_CLIP | Environment: CartpoleSwingup-v0
Episode: 0/100 | Episode Reward: 28.4757 | Running Time: 0.5145
Episode: 1/100 | Episode Reward: 28.9385 | Running Time: 0.9912
Episode: 2/100 | Episode Reward: 28.6354 | Running Time: 1.4690
Episode: 3/100 | Episode Reward: 29.3395 | Running Time: 1.9561
Episode: 4/100 | Episode Reward: 28.6659 | Running Time: 2.4513
[... episodes 5-98 omitted; episode rewards stay in the 27.4-29.5 range ...]
Episode: 99/100 | Episode Reward: 28.3379 | Running Time: 49.3224

Results on Box2D environments

I tried to benchmark the following environments ['BipedalWalker-v2', 'BipedalWalkerHardcore-v2', 'CarRacing-v0', 'LunarLander-v2', 'LunarLanderContinuous-v2'] using the ['A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'] algorithms. Most of the combinations failed to learn the task and didn't converge. Only (SAC, LunarLanderContinuous-v2) and (TD3, LunarLanderContinuous-v2) learnt the task sub-optimally. Can someone address this issue?

Does RLzoo support Dict gym env state?

My customized gym env has a dict-type obs_space. Even though I also customized an ActorNetwork and CriticNetwork, I found that RLzoo's source code seems to only support a single input and cannot handle dict states.
Is there any plan to support dict gym env states?

ValueError: too many values to unpack

When I run run_dqn.py with the QNetwork(...) parameter state_only=False, "states, actions = inputs" in value_networks.py raises a ValueError as the title indicates. It occurs because of "obv = np.expand_dims(obv, 0).astype('float32')" in dqn.py. I think if state_only=False, obv should also include act_inputs. Hope you can fix this error.

ImportError: cannot import name 'ArmActionMode'

Screenshot from 2022-07-02 18-49-39

I am getting this issue (in the screenshot) while running RLzoo with RLBench using the following code:

from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *

EnvName = 'ReachTarget'
EnvType = 'rlbench'
env = build_env(EnvName, EnvType, state_type='vision')

AlgName = 'SAC'
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
alg = eval(AlgName+'(**alg_params)')
alg.learn(env=env, mode='train', render=False, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)

env.close()

I have also added export PYTHONPATH="/home/sidharth/RLBench" in .bashrc

Any help would be appreciated! Thanks.

A suggestion about the KL divergence and entropy implementation of the categorical distribution

In https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/distributions.py#L99, I think there is a more concise implementation:

@expand_dims
def kl(self, logits):
    p = tf.exp(self._logits)
    kl = tf.reduce_sum(p * (self._logits-logits), axis=-1)
    return kl

Similarly in https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/distributions.py#L115

@expand_dims
def entropy(_logits):
    p = tf.exp(_logits)
    return tf.reduce_sum(-p * _logits, axis=-1)

I don't know why you implemented it in a more complicated way in your code. Could you tell me the reason?

How to properly introduce a new RLBench task?

Hello!

I want to introduce a new RLBench task (or also override one). How do I accomplish this properly? The only way I can think of now is to rewrite parts of the code in the RLBench package, which I don't think is the proper way to do it. Should there be an argument to indicate where the task is defined?

Thank you!

One error when running run_rlzoo.py

Traceback (most recent call last):
File "D:/Anaconda3/Lib/site-packages/rlzoo/run_rlzoo.py", line 30, in
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
File "D:\Anaconda3\Lib\site-packages\rlzoo\common\utils.py", line 131, in call_default_params
default_seed) # need manually set seed in the main script if default_seed = False
File "D:\Anaconda3\Lib\site-packages\rlzoo\algorithms\sac\default.py", line 43, in classic_control
soft_q_net1 = QNetwork(env.observation_space, env.action_space,
NameError: name 'QNetwork' is not defined

RLBench learning speed

Hi, I am checking this repository, I was able to install everything without apparent problem.

I am testing the run_rlzoo.py script using RLBench with the ReachTarget task. I run it, but it is quite slow: one episode takes about 2-3 minutes. I wonder if you have seen the same behavior when training, or if it is somehow my configuration. At first, an episode took about 5 minutes; then I realized that TensorFlow was not using my GPU. I fixed that and now it is twice as fast, but still, 3 minutes per episode is quite slow, especially if it has to run for 1000 episodes.

Is that the normal speed using V-REP? Is there anything I can do to train using faster-than-real-time simulation with RLBench?
