
di-engine's Introduction



Updated on 2024.02.04 DI-engine-v0.5.1

Introduction to DI-engine

Documentation | Chinese Documentation | Tutorials | Feature | Task & Middleware | TreeTensor | Roadmap

DI-engine is a generalized decision intelligence engine for PyTorch and JAX.

It provides Python-first and asynchronous-native task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. Building on these mechanisms, DI-engine supports a wide range of deep reinforcement learning algorithms, with an emphasis on performance, efficiency, well-organized documentation and unit tests:

  • Most basic DRL algorithms: such as DQN, Rainbow, PPO, TD3, SAC, R2D2, IMPALA
  • Multi-agent RL algorithms: such as QMIX, WQMIX, MAPPO, HAPPO, ACE
  • Imitation learning algorithms (BC/IRL/GAIL): such as GAIL, SQIL, Guided Cost Learning, Implicit BC
  • Offline RL algorithms: BCQ, CQL, TD3BC, Decision Transformer, EDAC, Diffuser, Decision Diffuser, SO2
  • Model-based RL algorithms: SVG, STEVE, MBPO, DDPPO, DreamerV3, MuZero
  • Exploration algorithms: HER, RND, ICM, NGU
  • LLM + RL algorithms: PPO-max, DPO, MODPO, PromptPG
  • Other algorithms: such as PER, PLR, PCGrad

DI-engine aims to standardize different Decision Intelligence environments and applications, supporting both academic research and prototype applications. Various training pipelines and customized decision AI applications are also supported:

  • Traditional academic environments

    • DI-zoo: various decision intelligence demonstrations and benchmark environments with DI-engine.
  • Tutorial courses

  • Real world decision AI applications

    • DI-star: Decision AI in StarCraftII
    • DI-drive: Auto-driving platform
    • DI-sheep: Decision AI in 3 Tiles Game
    • DI-smartcross: Decision AI in Traffic Light Control
    • DI-bioseq: Decision AI in Biological Sequence Prediction and Searching
    • DI-1024: Deep Reinforcement Learning + 1024 Game
  • Research paper

    • InterFuser: [CoRL 2022] Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
    • ACE: [AAAI 2023] ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency
    • GoBigger: [ICLR 2023] Multi-Agent Decision Intelligence Environment
    • DOS: [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning
    • LightZero: [NeurIPS 2023 Spotlight] A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit
    • SO2: [AAAI 2024] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
    • LMDrive: LMDrive: Closed-Loop End-to-End Driving with Large Language Models
  • Docs and Tutorials

On the low-level end, DI-engine comes with a set of highly re-usable modules, including RL optimization functions, PyTorch utilities and auxiliary tools.

DI-engine also includes system optimizations and design choices aimed at efficient and robust large-scale RL training.

Have fun with exploration and exploitation.


Installation

You can simply install DI-engine from PyPI with the following command:

pip install DI-engine

If you use Anaconda or Miniconda, you can install DI-engine with conda from the opendilab channel through the following command:

conda install -c opendilab di-engine

For more information about installation, you can refer to installation.

Our Docker Hub repo can be found here. We provide a base image and env images with common RL environments:

  • base: opendilab/ding:nightly
  • rpc: opendilab/ding:nightly-rpc
  • atari: opendilab/ding:nightly-atari
  • mujoco: opendilab/ding:nightly-mujoco
  • dmc: opendilab/ding:nightly-dmc2gym
  • metaworld: opendilab/ding:nightly-metaworld
  • smac: opendilab/ding:nightly-smac
  • grf: opendilab/ding:nightly-grf
  • cityflow: opendilab/ding:nightly-cityflow
  • evogym: opendilab/ding:nightly-evogym
  • d4rl: opendilab/ding:nightly-d4rl

The detailed documentation is hosted on doc | Chinese doc.

Quick Start

3 Minutes Kickoff

3 Minutes Kickoff (colab)

DI-engine Huggingface Kickoff (colab)

How to migrate a new RL Env (English | Chinese)

How to customize the neural network model (English | Chinese)

Examples of testing/deploying RL policies (Chinese)

Comparison of the old and new pipelines (Chinese)

Feature

Algorithm Versatility

discrete  means discrete action space; action-space labels are only given for the normal DRL algorithms (1-23)

continuous  means continuous action space; action-space labels are only given for the normal DRL algorithms (1-23)

hybrid  means hybrid (discrete + continuous) action space (1-23)

dist  Distributed Reinforcement Learning

MARL  Multi-Agent Reinforcement Learning

exp  Exploration Mechanisms in Reinforcement Learning

IL  Imitation Learning

offline  Offline Reinforcement Learning

mbrl  Model-Based Reinforcement Learning

other  means algorithms from other sub-directions, usually used as plug-ins in the whole pipeline

P.S: The .py file in Runnable Demo can be found in dizoo

No. | Algorithm | Label | Doc and Implementation | Runnable Demo
1 | DQN | discrete | DQN doc, DQN doc (Chinese), policy/dqn | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0
2 | C51 | discrete | C51 doc, policy/c51 | ding -m serial -c cartpole_c51_config.py -s 0
3 | QRDQN | discrete | QRDQN doc, policy/qrdqn | ding -m serial -c cartpole_qrdqn_config.py -s 0
4 | IQN | discrete | IQN doc, policy/iqn | ding -m serial -c cartpole_iqn_config.py -s 0
5 | FQF | discrete | FQF doc, policy/fqf | ding -m serial -c cartpole_fqf_config.py -s 0
6 | Rainbow | discrete | Rainbow doc, policy/rainbow | ding -m serial -c cartpole_rainbow_config.py -s 0
7 | SQL | discrete, continuous | SQL doc, policy/sql | ding -m serial -c cartpole_sql_config.py -s 0
8 | R2D2 | dist, discrete | R2D2 doc, policy/r2d2 | ding -m serial -c cartpole_r2d2_config.py -s 0
9 | PG | discrete | PG doc, policy/pg | ding -m serial -c cartpole_pg_config.py -s 0
10 | PromptPG | discrete | policy/prompt_pg | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0
11 | A2C | discrete | A2C doc, policy/a2c | ding -m serial -c cartpole_a2c_config.py -s 0
12 | PPO/MAPPO | discrete, continuous, MARL | PPO doc, policy/ppo | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0
13 | PPG | discrete | PPG doc, policy/ppg | python3 -u cartpole_ppg_main.py
14 | ACER | discrete, continuous | ACER doc, policy/acer | ding -m serial -c cartpole_acer_config.py -s 0
15 | IMPALA | dist, discrete | IMPALA doc, policy/impala | ding -m serial -c cartpole_impala_config.py -s 0
16 | DDPG/PADDPG | continuous, hybrid | DDPG doc, policy/ddpg | ding -m serial -c pendulum_ddpg_config.py -s 0
17 | TD3 | continuous, hybrid | TD3 doc, policy/td3 | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0
18 | D4PG | continuous | D4PG doc, policy/d4pg | python3 -u pendulum_d4pg_config.py
19 | SAC/[MASAC] | discrete, continuous, MARL | SAC doc, policy/sac | ding -m serial -c pendulum_sac_config.py -s 0
20 | PDQN | hybrid | policy/pdqn | ding -m serial -c gym_hybrid_pdqn_config.py -s 0
21 | MPDQN | hybrid | policy/pdqn | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0
22 | HPPO | hybrid | policy/ppo | ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0
23 | BDQ | hybrid | policy/bdq | python3 -u hopper_bdq_config.py
24 | MDQN | discrete | policy/mdqn | python3 -u asterix_mdqn_config.py
25 | QMIX | MARL | QMIX doc, policy/qmix | ding -m serial -c smac_3s5z_qmix_config.py -s 0
26 | COMA | MARL | COMA doc, policy/coma | ding -m serial -c smac_3s5z_coma_config.py -s 0
27 | QTran | MARL | policy/qtran | ding -m serial -c smac_3s5z_qtran_config.py -s 0
28 | WQMIX | MARL | WQMIX doc, policy/wqmix | ding -m serial -c smac_3s5z_wqmix_config.py -s 0
29 | CollaQ | MARL | CollaQ doc, policy/collaq | ding -m serial -c smac_3s5z_collaq_config.py -s 0
30 | MADDPG | MARL | MADDPG doc, policy/ddpg | ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0
31 | GAIL | IL | GAIL doc, reward_model/gail | ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0
32 | SQIL | IL | SQIL doc, entry/sqil | ding -m serial_sqil -c cartpole_sqil_config.py -s 0
33 | DQFD | IL | DQFD doc, policy/dqfd | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0
34 | R2D3 | IL | R2D3 doc, R2D3 doc (Chinese), policy/r2d3 | python3 -u pong_r2d3_r2d2expert_config.py
35 | Guided Cost Learning | IL | Guided Cost Learning doc (Chinese), reward_model/guided_cost | python3 lunarlander_gcl_config.py
36 | TREX | IL | TREX doc, reward_model/trex | python3 mujoco_trex_main.py
37 | Implicit Behavioral Cloning (DFO+MCMC) | IL | policy/ibc, model/template/ebm | python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py
38 | BCO | IL | entry/bco | python3 -u cartpole_bco_config.py
39 | HER | exp | HER doc, reward_model/her | python3 -u bitflip_her_dqn.py
40 | RND | exp | RND doc, reward_model/rnd | python3 -u cartpole_rnd_onppo_config.py
41 | ICM | exp | ICM doc, ICM doc (Chinese), reward_model/icm | python3 -u cartpole_ppo_icm_config.py
42 | CQL | offline | CQL doc, policy/cql | python3 -u d4rl_cql_main.py
43 | TD3BC | offline | TD3BC doc, policy/td3_bc | python3 -u d4rl_td3_bc_main.py
44 | Decision Transformer | offline | policy/dt | python3 -u d4rl_dt_mujoco.py
45 | EDAC | offline | EDAC doc, policy/edac | python3 -u d4rl_edac_main.py
46 | QGPO | offline | QGPO doc, policy/qgpo | python3 -u ding/example/qgpo.py
47 | MBSAC(SAC+MVE+SVG) | continuous, mbrl | policy/mbpolicy/mbsac | python3 -u pendulum_mbsac_mbpo_config.py / python3 -u pendulum_mbsac_ddppo_config.py
48 | STEVESAC(SAC+STEVE+SVG) | continuous, mbrl | policy/mbpolicy/mbsac | python3 -u pendulum_stevesac_mbpo_config.py
49 | MBPO | mbrl | MBPO doc, world_model/mbpo | python3 -u pendulum_sac_mbpo_config.py
50 | DDPPO | mbrl | world_model/ddppo | python3 -u pendulum_mbsac_ddppo_config.py
51 | DreamerV3 | mbrl | world_model/dreamerv3 | python3 -u cartpole_balance_dreamer_config.py
52 | PER | other | worker/replay_buffer | rainbow demo
53 | GAE | other | rl_utils/gae | ppo demo
54 | ST-DIM | other | torch_utils/loss/contrastive_loss | ding -m serial -c cartpole_dqn_stdim_config.py -s 0
55 | PLR | other | PLR doc, data/level_replay/level_sampler | python3 -u bigfish_plr_config.py -s 0
56 | PCGrad | other | torch_utils/optimizer_helper/PCGrad | python3 -u multi_mnist_pcgrad_main.py -s 0

Environment Versatility

No. | Environment | Label | Code and Doc Links
1 | Atari | discrete | dizoo link, env tutorial, env guide (Chinese)
2 | box2d/bipedalwalker | continuous | dizoo link, env tutorial, env guide (Chinese)
3 | box2d/lunarlander | discrete | dizoo link, env tutorial, env guide (Chinese)
4 | classic_control/cartpole | discrete | dizoo link, env tutorial, env guide (Chinese)
5 | classic_control/pendulum | continuous | dizoo link, env tutorial, env guide (Chinese)
6 | competitive_rl | discrete, selfplay | dizoo link, env guide (Chinese)
7 | gfootball | discrete, sparse, selfplay | dizoo link, env tutorial, env guide (Chinese)
8 | minigrid | discrete, sparse | dizoo link, env tutorial, env guide (Chinese)
9 | MuJoCo | continuous | dizoo link, env tutorial, env guide (Chinese)
10 | PettingZoo | discrete, continuous, marl | dizoo link, env tutorial, env guide (Chinese)
11 | overcooked | discrete, marl | dizoo link, env tutorial
12 | procgen | discrete | dizoo link, env tutorial, env guide (Chinese)
13 | pybullet | continuous | dizoo link, env guide (Chinese)
14 | smac | discrete, marl, selfplay, sparse | dizoo link, env tutorial, env guide (Chinese)
15 | d4rl | offline | dizoo link, env guide (Chinese)
16 | league_demo | discrete, selfplay | dizoo link
17 | pomdp atari | discrete | dizoo link
18 | bsuite | discrete | dizoo link, env tutorial, env guide (Chinese)
19 | ImageNet | IL | dizoo link, env guide (Chinese)
20 | slime_volleyball | discrete, selfplay | dizoo link, env tutorial, env guide (Chinese)
21 | gym_hybrid | hybrid | dizoo link, env tutorial, env guide (Chinese)
22 | GoBigger | hybrid, marl, selfplay | dizoo link, env tutorial, env guide (Chinese)
23 | gym_soccer | hybrid | dizoo link, env guide (Chinese)
24 | multiagent_mujoco | continuous, marl | dizoo link, env guide (Chinese)
25 | bitflip | discrete, sparse | dizoo link, env guide (Chinese)
26 | sokoban | discrete | dizoo link, env tutorial, env guide (Chinese)
27 | gym_anytrading | discrete | dizoo link, env tutorial
28 | mario | discrete | dizoo link, env tutorial, env guide (Chinese)
29 | dmc2gym | continuous | dizoo link, env tutorial, env guide (Chinese)
30 | evogym | continuous | dizoo link, env tutorial, env guide (Chinese)
31 | gym-pybullet-drones | continuous | dizoo link, env guide (Chinese)
32 | beergame | discrete | dizoo link, env guide (Chinese)
33 | classic_control/acrobot | discrete | dizoo link, env guide (Chinese)
34 | box2d/car_racing | discrete, continuous | dizoo link, env guide (Chinese)
35 | metadrive | continuous | dizoo link, env guide (Chinese)
36 | cliffwalking | discrete | dizoo link, env tutorial, env guide (Chinese)
37 | tabmwp | discrete | dizoo link, env tutorial, env guide (Chinese)
38 | frozen_lake | discrete | dizoo link, env tutorial, env guide (Chinese)
39 | ising_model | discrete, marl | dizoo link, env tutorial, env guide (Chinese)

discrete means discrete action space

continuous means continuous action space

hybrid means hybrid (discrete + continuous) action space

MARL means multi-agent RL environment

sparse means environment which is related to exploration and sparse reward

offline means offline RL environment

IL means Imitation Learning or Supervised Learning Dataset

selfplay means environment that allows agent VS agent battle

P.S. Some environments in Atari, such as MontezumaRevenge, are also of the sparse-reward type.

General Data Container: TreeTensor

DI-engine uses TreeTensor as the basic data container in various components. It is easy to use and consistent across different code modules such as environment definition, data processing and DRL optimization. Here are some concrete code examples:

  • TreeTensor can easily extend all the operations of torch.Tensor to nested data:

    import treetensor.torch as ttorch
    
    
    # create random tensor
    data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})
    # clone+detach tensor
    data_clone = data.clone().detach()
    # access tree structure like attribute
    a = data.a
    c = data.b.c
    # stack/cat/split
    stacked_data = ttorch.stack([data, data_clone], 0)
    cat_data = ttorch.cat([data, data_clone], 0)
    data, data_clone = ttorch.split(stacked_data, 1)
    # reshape
    data = data.unsqueeze(-1)
    data = data.squeeze(-1)
    flatten_data = data.view(-1)
    # indexing
    data_0 = data[0]
    data_1to2 = data[1:2]
    # execute math calculations
    data = data.sin()
    data.b.c.cos_().clamp_(-1, 1)
    data += data ** 2
    # backward
    data.requires_grad_(True)
    loss = data.arctan().mean()
    loss.backward()
    # print shape
    print(data.shape)
    # result
    # <Size 0x7fbd3346ddc0>
    # ├── 'a' --> torch.Size([1, 3, 2])
    # └── 'b' --> <Size 0x7fbd3346dd00>
    #     └── 'c' --> torch.Size([1, 3])
  • TreeTensor makes it simple to implement a classic deep reinforcement learning pipeline:

    import torch
    import treetensor.torch as ttorch
    
    B = 4
    
    
    def get_item():
        return {
            'obs': {
                'scalar': torch.randn(12),
                'image': torch.randn(3, 32, 32),
            },
            'action': torch.randint(0, 10, size=(1,)),
            'reward': torch.rand(1),
            'done': False,
        }
    
    
    data = [get_item() for _ in range(B)]
    
    
    # execute `stack` op
    - def stack(data, dim):
    -     elem = data[0]
    -     if isinstance(elem, torch.Tensor):
    -         return torch.stack(data, dim)
    -     elif isinstance(elem, dict):
    -         return {k: stack([item[k] for item in data], dim) for k in elem.keys()}
    -     elif isinstance(elem, bool):
    -         return torch.BoolTensor(data)
    -     else:
    -         raise TypeError("not support elem type: {}".format(type(elem)))
    - stacked_data = stack(data, dim=0)
    + data = [ttorch.tensor(d) for d in data]
    + stacked_data = ttorch.stack(data, dim=0)
    
    # validate
    - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)
    - assert stacked_data['action'].shape == (B, 1)
    - assert stacked_data['reward'].shape == (B, 1)
    - assert stacked_data['done'].shape == (B,)
    - assert stacked_data['done'].dtype == torch.bool
    + assert stacked_data.obs.image.shape == (B, 3, 32, 32)
    + assert stacked_data.action.shape == (B, 1)
    + assert stacked_data.reward.shape == (B, 1)
    + assert stacked_data.done.shape == (B,)
    + assert stacked_data.done.dtype == torch.bool

Feedback and Contribution

We appreciate all feedback and contributions that improve DI-engine, in both algorithms and system design. CONTRIBUTING.md provides the necessary information.


Citation

@misc{ding,
    title={DI-engine: OpenDILab Decision Intelligence Engine},
    author={OpenDILab Contributors},
    publisher={GitHub},
    howpublished={\url{https://github.com/opendilab/DI-engine}},
    year={2021},
}

License

DI-engine is released under the Apache 2.0 license.

di-engine's People

Contributors

altmand, cloud-pku, davide97l, eltociear, hansbug, hcnaeg, hiha3456, jayyoung0802, karroyan, kelichloe, kxzxvbk, lixl-st, luciusmos, nighood, paparazz1, puyuan1996, robinc94, ruoyugao, sailxjx, simonat2011, song2181, super1ce, tutuhuss, weiyuhong-1998, will-nie, yinminzhang, zerlinwang, zhangpaipai, zhziszz, zjowowen


di-engine's Issues

Suggestion on unittest for ding/utils/plot.py

The unit test here can only check whether the file was produced successfully, but in practice image files frequently have generation problems (for example, wrong styles or elements that fail to render). As a result, such a test does not really verify the plot:

Here's an idea:

  1. Please test matplotlib image content using unit testing (see https://github.com/matplotlib/pytest-mpl).

  2. Use an image similarity unit test, such as https://github.com/Apkawa/pytest-image-diff.

  3. In addition, files will be generated under the project directory, which could have unintended consequences on the Git workspace and may be added to the repository in later git add commands. For unit tests, please use the mocked path, such as isolated_directory in hbutils.
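
The third point can be sketched with the standard library alone. The helper `run_in_temp_dir` below is a hypothetical stand-in written for illustration (it is not the `isolated_directory` API from hbutils), and `fake_plot` is a placeholder for whatever call in ding/utils/plot.py writes the image:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def run_in_temp_dir():
    """Hypothetical helper: run the body inside a throwaway directory."""
    original_dir = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp_dir:
        os.chdir(tmp_dir)
        try:
            yield tmp_dir
        finally:
            # Leave the temp directory before it is deleted.
            os.chdir(original_dir)


def fake_plot(path):
    """Placeholder for a plotting call that writes an image file."""
    with open(path, "wb") as f:
        f.write(b"not a real image")


def test_plot_creates_file():
    with run_in_temp_dir():
        fake_plot("ding_plot_test_output.png")
        assert os.path.exists("ding_plot_test_output.png")
    # Nothing is left behind in the project directory.
    assert not os.path.exists("ding_plot_test_output.png")
```

Because the working directory is switched only for the duration of the test, generated files never appear in the Git workspace.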

gtrxl

How do I run GTrXL with the PPO policy? Can someone provide an example?

Discussion channel for how to apply self-play to custom env?

Hi all,

Nice project. We want to start using it. After reading the doc and the config dizoo/competitive_rl/entry/cpong_dqn_default_config.py for league training, some things are still not clear to us. Is there a channel where small questions can be discussed regularly, such as a WeChat group or a Slack channel?


ImportError: cannot import name 'SampleCollector' from 'ding.worker'

From the quick start:

from ding.config import compile_config
from ding.envs import BaseEnvManager, DingEnvWrapper
from ding.model import DQN
from ding.policy import DQNPolicy
from ding.worker import BaseLearner, SampleCollector, BaseSerialEvaluator, AdvancedReplayBuffer
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import cartpole_dqn_config

# compile config
cfg = compile_config(
    cartpole_dqn_config,
    BaseEnvManager,
    DQNPolicy,
    BaseLearner,
    SampleCollector,
    BaseSerialEvaluator,
    AdvancedReplayBuffer,
    save_cfg=True
)

This raises

Traceback (most recent call last):
  File "r2d2/main.py", line 7, in <module>
    from ding.worker import BaseLearner, SampleCollector, BaseSerialEvaluator, AdvancedReplayBuffer
ImportError: cannot import name 'SampleCollector' from 'ding.worker' (/Users/ethanbrooks/Library/Caches/pypoetry/virtualenvs/r2d2-3EWJbHPG-py3.8/lib/python3.8/site-packages/ding/worker/__init__.py)

It appears that BaseSerialEvaluator is also not present in the library.
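
A quick way to see which of the quick-start names the installed version actually exports is to probe the module before importing from it. This is a minimal sketch using only `importlib`; the helper name `missing_names` is made up for illustration, and the output depends on which DI-engine version (if any) is installed:

```python
import importlib


def missing_names(module_name, names):
    """Return the names from `names` that `module_name` does not export."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package itself cannot be imported: treat everything as missing.
        return list(names)
    return [name for name in names if not hasattr(module, name)]


wanted = ["BaseLearner", "SampleCollector", "BaseSerialEvaluator", "AdvancedReplayBuffer"]
print(missing_names("ding.worker", wanted))
```

Any name printed here is absent from the installed `ding.worker`, which usually means the documentation and the installed package come from different versions.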

League Training for SlimeVolleyball Env

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

Overview

Add a league training pipeline for the slime_volleyball environment and achieve better performance than the self-play results (#23)
Related Discussion: #61

TODO

  • league pipeline
  • mutate from pre-trained model (vs bot)
  • policy behavior analysis

Visualize Training Progression

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable: N/A

RL training can be unstable and can easily get stuck in a locally optimal solution, so visualizing and monitoring metrics is extremely important. Assume there are 3 roles in league training (MA, ME, LE). It would be better to visualize the metrics for each of these roles over the training time.

A brave new interface

We will design a brave new interactive interface in DI-engine 1.0, including a program API and CLI commands. It will support most reinforcement learning scenarios, and the rest will be covered by our elastic atomic components.

Here are some design guidelines for the new interfaces:

  1. It should be compatible with the existing configurations, and easy to convert from the old code to the new style.
  2. It should be semantic and easy to understand, so that anyone with a little reinforcement learning experience can benefit from the policy examples.
  3. It should be easy to extend to multi-threaded or distributed environments.

Any suggestions are welcome, please leave your comments in this channel.

Add slimevolleygym into dizoo

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable (N/A)

slimevolleygym is a pong-like physics game env from the open source community. It follows the standard OpenAI gym interface. According to the linked report, naive PPO self-play achieves a score of -0.371 ± 1.085 in the slimeVolley-v0 env against the built-in AI.
It would be good to benchmark opendilab's league training and see whether it can achieve higher scores.

Problems installing DI-engine

  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
print(torch.__version__, sys.version, sys.platform)
1.11.0+cu113 3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)] win32

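I have already installed torch 1.11.0+cu113, but pip install DI-engine downloads torch-1.10.0-cp38-cp38-win_amd64.whl, which has no CUDA support.

This is inconsistent with the manual:
https://di-engine-docs.readthedocs.io/zh_CN/latest/01_quickstart/installation_zh.html

After CUDA is installed, PyTorch with Nvidia CUDA acceleration is fetched and installed automatically when you install DI-engine's dependencies.

After the installation finishes, the following conflicts appear: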

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.12.0+cu113 requires torch==1.11.0, but you have torch 1.10.0 which is incompatible.
torchaudio 0.11.0+cu113 requires torch==1.11.0, but you have torch 1.10.0 which is incompatible.
tianshou 0.4.9 requires gym>=0.23.1, but you have gym 0.20.0 which is incompatible.
nbconvert 6.5.0 requires jinja2>=3.0, but you have jinja2 2.11.3 which is incompatible.

Uninstalling torch, upgrading gym, and reinstalling torch 1.11.0+cu113 leads to the following conflicts:

di-engine 0.4.0 requires gym==0.20.0, but you have gym 0.25.1 which is incompatible.
di-engine 0.4.0 requires torch<=1.10.0,>=1.1.0, but you have torch 1.11.0+cu113 which is incompatible.

DI-engine's version requirements are too rigid.
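
To make the conflict concrete, the sketch below checks the installed versions against the constraints quoted in the pip output. It is a deliberately simplified comparison (only `==`, `<=` and `>=`, with the local `+cu113` tag ignored) and not a full implementation of pip's version rules; `parse_version` and `satisfies` are illustrative names:

```python
def parse_version(text):
    """Turn '1.11.0+cu113' into (1, 11, 0); the local '+cu113' tag is ignored."""
    public = text.split("+")[0]
    return tuple(int(part) for part in public.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '<=1.10.0,>=1.1.0'.

    Only the ==, <= and >= operators are handled in this sketch.
    """
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for op in ("==", "<=", ">="):
            if clause.startswith(op):
                bound = parse_version(clause[len(op):])
                ok = {"==": current == bound, "<=": current <= bound, ">=": current >= bound}[op]
                if not ok:
                    return False
                break
        else:
            raise ValueError("unsupported clause: " + clause)
    return True


# Constraints quoted in the pip output for di-engine 0.4.0.
print(satisfies("1.11.0+cu113", "<=1.10.0,>=1.1.0"))  # False
print(satisfies("1.10.0", "<=1.10.0,>=1.1.0"))        # True
print(satisfies("0.25.1", "==0.20.0"))                # False
```

With the pins as reported, no torch 1.11 build and no gym release other than 0.20.0 can satisfy di-engine 0.4.0, which is why both install orders end in a conflict.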

Initialization bug in RegressionHead

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

I ran the ddpg and td3 algorithms, which use RegressionHead, and checked their initialized weights. head.main.1 does not seem to have been initialized properly.
This only affects head.main.1; head.main.0 is initialized properly.

task.parallel_ctx always stays the same in parallel mode.

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

I followed the code in the "Distributed - Async and parallel" section of the doc to implement parallelism, but I found that task.parallel_ctx in both processes always kept its original value.
I also can't find a function that changes task.parallel_ctx in the files under DI-engine/ding/framework/, so I can't tell what is wrong.
I would like to know how task.parallel_ctx in one process is synchronized with the ctx of another process. Thanks!

The default n_sample in SAC Policy

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

If I use n_episode for the SAC policy, it raises the error 'AssertionError: n_episode/n_sample in policy cfg can't be not None at the same time'.
I found that the SAC policy has a default config value 'n_sample=1', so if I define n_episode, both config keys are set at the same time.
I suggest deleting the default config value for n_sample.
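
The interaction can be reproduced with plain dictionaries. This is a minimal sketch of the behaviour described in the issue, not DI-engine's actual config code: `merge_collect_cfg` and `check_collect_cfg` are hypothetical helpers, and whether DI-engine accepts an explicit `n_sample=None` override is an assumption:

```python
def merge_collect_cfg(default_cfg, user_cfg):
    """Overlay user settings on top of the policy defaults."""
    merged = dict(default_cfg)
    merged.update(user_cfg)
    return merged


def check_collect_cfg(cfg):
    """Require exactly one of n_sample / n_episode to be set."""
    n_sample = cfg.get("n_sample")
    n_episode = cfg.get("n_episode")
    if (n_sample is None) == (n_episode is None):
        raise ValueError("set exactly one of n_sample / n_episode")
    return cfg


default_collect = {"n_sample": 1}  # default reported in the issue

# Setting only n_episode leaves the default n_sample in place, so the check fails.
try:
    check_collect_cfg(merge_collect_cfg(default_collect, {"n_episode": 8}))
except ValueError as err:
    print("rejected:", err)

# Possible workaround: explicitly clear n_sample when n_episode is used.
cfg = check_collect_cfg(merge_collect_cfg(default_collect, {"n_episode": 8, "n_sample": None}))
print(cfg)
```

Removing the default, as suggested, would make the first call pass without the extra override.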

The output of sac_discrete with the random collect policy has no logit

  File "/root/cityflow/my_cityflow/SAC/cityflow_sac_train.py", line 177, in serial_pipeline
    random_collect(cfg.policy, policy, collector, collector_env, commander, replay_buffer)
  File "/root/DI-engine/ding/entry/utils.py", line 40, in random_collect
    new_data = collector.collect(n_sample=policy_cfg.random_collect_size, policy_kwargs=collect_kwargs)
  File "/root/DI-engine/ding/worker/collector/sample_serial_collector.py", line 251, in collect
    self._obs_pool[env_id], self._policy_output_pool[env_id], timestep
  File "/root/DI-engine/ding/policy/sac.py", line 453, in _process_transition
    'logit': model_output['logit'],
KeyError: 'logit'

Agent Demo List

This issue is a collection of interesting agent demonstrations trained with DI-engine. It will be updated continually.

  • Mario 1-1

    mario_trained.mp4
  • Mario 1-2

    mario_trained_1_2.mp4
  • rocket landing

    rocket_landing.mp4
  • SMAC 5m VS 6m

    5m6m.mp4
  • SMAC MMM

    mmm.mp4
  • SMAC MMM2

    mmm2.mp4
  • SMAC 3s5z

    3s5z.mp4
  • lunarlander

    lunarlander.mp4
  • gfootball

    • rule-based bot vs rule-based bot
    football_rvr.mp4
    • trained agent vs rule-based bot
    football_avr.mp4
  • slime_volley

    • rule-based bot vs trained agent

    slime_volley

League Evaluation Metric

Added this issue as suggested by @PaParaZz1.

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable: N/A

TrueSkill is a ranking metric developed by Microsoft for game matchmaking. Unlike Elo, which only measures an agent's strength, TrueSkill measures both strength and stability. Each player starts with mu=25.000 and sigma=8.333. The former (mu) measures strength and the latter (sigma) measures stability. After the payoffs of a match are received, mu and sigma are updated through the TrueSkill API. An agent's final score can be defined as mu - 3 * sigma to take both strength and stability into consideration.
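
The final score is simple enough to compute by hand. The sketch below only evaluates mu - 3 * sigma; it does not perform the TrueSkill rating update itself, and the ratings after the starting values are made-up numbers for illustration:

```python
def conservative_score(mu, sigma, k=3.0):
    """Lower-bound style rating: mean skill minus k standard deviations."""
    return mu - k * sigma


# A new player with the starting values quoted above.
new_player = conservative_score(25.0, 8.333)

# Hypothetical ratings after some matches (illustrative numbers, not real results).
steady = conservative_score(28.0, 1.5)
erratic = conservative_score(30.0, 4.0)

print(round(new_player, 3))  # 0.001
print(steady, erratic)       # 23.5 18.0
```

The agent with the higher mu (30.0) ends up with the lower score because its larger sigma signals less stable results.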

Currently this metric is missing in the league demo. It would be better to add it.

Entropy scheduling

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable: (Not applicable)

Reinforcement learning has a well-known explore-exploit dilemma. In league training it is crucial to have good entropy coefficient scheduling, for the following reasons:
(1) If the entropy of the policy drops to zero too fast, the policy might get stuck in a local optimum and fail to explore more states.
(2) If the entropy of the policy drops too slowly, the policy might fail to select the right action at pivotal moments, and training becomes very slow.

One way to address this problem is good scheduling. Assuming some validation measurement such as win rate is available, we only decrease the entropy coefficient when the win rate has plateaued.

This is similar to the learning rate scheduler ReduceLROnPlateau in PyTorch.
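
The idea can be sketched in a few lines of plain Python. `EntropyCoefOnPlateau` is a hypothetical class loosely modelled on the plateau idea, not an existing DI-engine or PyTorch API, and the parameter names and default values are assumptions:

```python
class EntropyCoefOnPlateau:
    """Hypothetical scheduler: shrink the entropy coefficient when a metric stalls."""

    def __init__(self, coef, factor=0.5, patience=3, min_coef=1e-4, threshold=1e-3):
        self.coef = coef
        self.factor = factor
        self.patience = patience
        self.min_coef = min_coef
        self.threshold = threshold
        self.best = float("-inf")
        self.bad_evals = 0

    def step(self, win_rate):
        """Record one evaluation result and return the coefficient to use next."""
        if win_rate > self.best + self.threshold:
            # Still improving: keep exploring at the current level.
            self.best = win_rate
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                # Plateau detected: reduce exploration pressure.
                self.coef = max(self.coef * self.factor, self.min_coef)
                self.bad_evals = 0
        return self.coef


scheduler = EntropyCoefOnPlateau(coef=0.01, patience=2)
for win_rate in [0.40, 0.50, 0.55, 0.55, 0.55, 0.56, 0.56, 0.56]:
    print(win_rate, scheduler.step(win_rate))
```

In this run the coefficient is halved twice, once after each stretch of two evaluations without improvement; the returned value would be passed to the policy's entropy loss weight.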

Could you let us know whether there is any documentation on how entropy scheduling can be supported?

On-policy PPO raises an error when handling a continuous action space

The error message is as follows.

Traceback (most recent call last):
  File "/root/cityflow/my_cityflow/PPO_Continuous/cityflow_ppo_continuous_train.py", line 201, in <module>
    serial_pipeline_onpolicy([main_config, create_config], seed=0)
  File "/root/cityflow/my_cityflow/PPO_Continuous/cityflow_ppo_continuous_train.py", line 193, in serial_pipeline_onpolicy
    learner.train(new_data, collector.envstep)
  File "/root/DI-engine/ding/worker/learner/base_learner.py", line 166, in wrapper
    ret = fn(*args, **kwargs)
  File "/root/DI-engine/ding/worker/learner/base_learner.py", line 203, in train
    log_vars = self._policy.forward(data)
  File "/root/DI-engine/ding/policy/ppo.py", line 214, in _forward_learn
    ppo_loss, ppo_info = ppo_error_continuous(ppo_batch, self._clip_ratio)
  File "/root/DI-engine/ding/rl_utils/ppo.py", line 181, in ppo_error_continuous
    dist_new = Independent(Normal(mu_sigma_new['mu'], mu_sigma_new['sigma']), 1)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributions/distribution.py", line 56, in __init__
    f"Expected parameter {param} "
ValueError: Expected parameter loc (Tensor of shape (64, 1)) of distribution Normal(loc: torch.Size([64, 1]), scale: torch.Size([64, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        ...,
        [nan]], grad_fn=)
(all 64 rows are nan)

The parameter configuration is as follows.

policy=dict(
    cuda=False,
    action_space='continuous',
    recompute_adv=True,
    model=dict(
        obs_shape=20,
        action_shape=1,
        action_space='continuous',
        share_encoder=True,
        encoder_hidden_size_list=[256, 64],
        actor_head_hidden_size=64,
        actor_head_layer_num=1,
        critic_head_hidden_size=64,
        critic_head_layer_num=1,
        activation=nn.ReLU(),
        norm_type=None,
        sigma_type='conditioned',
        fixed_sigma_value=0.3,
        bound_type='tanh',
    ),
    learn=dict(
        multi_gpu=False,
        epoch_per_collect=5, 
        batch_size=64,
        learning_rate=3e-4,   
        value_weight=0.5,     
        entropy_weight=0.01,  
        clip_ratio=0.2,
        adv_norm=True,
        value_norm=True,
        ignore_done=False,
        grad_clip_type='clip_norm',
        grad_clip_value=0.5,
    ),
    collect=dict(
        n_sample=640,
        unroll_len=1,
        discount_factor=0.99,
        gae_lambda=0.95,
    ),
)

Bug fixes and new feature requests for gfootball

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

There are many bugs in the gfootball environment of the current DI-engine version (v0.3.1). I have tried to fix some of them, but some problems remain that are beyond my ability, so I think it needs systematic maintenance and updates. Of the code I have tested, only the files in dizoo/gfootball/envs/tests work well (after some bug fixes), and the fundamental features mentioned in the doc (playing against the built-in AI and self-play) are basically unusable.

Besides, since gfootball is an environment with great potential in both academia and practice, I strongly recommend adding the following features:

  • Battles between customized models
  • Multi-agent support (5 vs 5, 11 vs 11)
  • League training support
  • More imitation learning algorithms (particularly GAIL and MAGAIL)

Thanks. I think DI-engine is an excellent framework with great potential, and I hope it keeps getting better.

How to get more info data in reward model?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • [+] documentation request
    • [+] new feature request
  • [+] I have visited the readme and doc
  • [+] I have searched through the issue tracker and pr tracker
  • [+] I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

image

Dear all,
I am trying to customize a reward model with DI-engine, but I found that only the following data are available (as the input of the collect_data function):

image

My question is: how can I get more data in the reward model, such as the 'info' returned by the env?
Looking forward to your reply, and thank you.

Example of MAPPO

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:

Would it be possible to get an example of training an MAPPO in a sample MA environment?

Should "log" and "ckpt_LEARNER_DATE_TIME" be put into the same folder?

image

Currently the checkpoints and the log data are stored in two separate folders. Should we introduce a higher-level folder, named EXPERIMENT_NAME or similar, to store all data of a single experiment?

By the way, the name format of the checkpoint folder looks quite odd to me: why are there two "_" in the date part? I suggest naming it "checkpoints_MODELNAME", which is enough. It is not reasonable to put the creation time in the folder name, since the folder contains many checkpoints created at different times.
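
As an illustration of the proposed layout only, the sketch below creates one folder per experiment with "log" and "checkpoints_MODELNAME" inside it. The function name make_experiment_dirs and the example experiment and model names are hypothetical; this is not how DI-engine currently lays out its output.

```python
from pathlib import Path


def make_experiment_dirs(root, exp_name, model_name):
    """Create <root>/<exp_name>/{log, checkpoints_<model_name>} and return both paths."""
    exp_dir = Path(root) / exp_name
    dirs = {
        "log": exp_dir / "log",
        "ckpt": exp_dir / f"checkpoints_{model_name}",
    }
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling make_experiment_dirs(".", "my_experiment", "dqn") would then keep everything from one run under ./my_experiment.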

How to create customized model (pointer network)

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

Hi there, I am new to DI-engine.
I am trying to implement a pointer network for my own environment.
The most relevant resource I could find is the docs about RNN here. It seems that I can treat the pointer network as a kind of RNN and wrap each decoding output as hidden_state. But the encoder output (also from an LSTM) is used in every decoding step as well. Can I wrap it as another hidden_state?
I noticed on Slack that a similar architecture has been implemented in DI-star.
Could you give me some directions on how to make it work?
Also, I am not sure which part of the code I should modify. It would be great if you could point me to the docs or a tutorial on customizing models.

Comparison of training efficiency between asynchronous mode and distributed mode based on Gobigger Env

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)
>>> print(ding.__version__, torch.__version__, sys.version, sys.platform)
v0.2.2 1.10.0+cu102 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)
[GCC 7.3.0] linux

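Following the usage of Task and Parallel described in the DI-engine documentation, I changed the GoBigger training di-baseline to the parallel and asynchronous forms.
The training tests show that using Parallel is slower than using Task alone. What might be the reason?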

    with Task(async_mode=True) as task:
        task.use_step_wrapper(StepTimer(print_per_step=1))
        task.use(evalute(random_evaluator, rule_evaluator, model, task), filter_labels=["standalone", "node.1"])
        task.use(collect(epsilon_greedy, collector, replay_buffer), filter_labels=["standalone", "node.0"])
        task.use(training(cfg, learner, replay_buffer, task, model), filter_labels=["standalone", "node.0"])
        task.run(max_step=max_iterations)

Invalid output of `ding --help`

Currently when I use ding --help, it prints out the following:

usage: ding [-h] [--cfg CFG] [--seed SEED] [--device DEVICE]

optional arguments:
  -h, --help       show this help message and exit
  --cfg CFG
  --seed SEED
  --device DEVICE

which is not the expected help information.

I searched the code and found that this behaviour is caused by this line of code (link). Please fix it.
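
The linked line is not reproduced here, so the following is only a guess at the mechanism, not a diagnosis. The printed usage has the shape that argparse generates, which would happen if some imported module builds a parser with --cfg, --seed and --device and parses sys.argv at import time. The sketch below rebuilds such a parser and only parses when called explicitly. build_parser and parse_cli are hypothetical names.

```python
import argparse


def build_parser():
    """A parser with the three options that appear in the unexpected help text."""
    parser = argparse.ArgumentParser(prog="ding")
    parser.add_argument("--cfg")
    parser.add_argument("--seed")
    parser.add_argument("--device")
    return parser


def parse_cli(argv=None):
    """Parse only on an explicit call from an entry point, never as an import side effect."""
    return build_parser().parse_args(argv)


# Explicit use with a given argument list; sys.argv is left alone.
args = parse_cli(["--seed", "0"])
print(args.seed)
```

If a call like parse_cli() ran at module import instead, any --help passed to the outer command would be consumed by this parser first, which would explain the output above.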

[Error] AttributeError: 'InteractionSerialEvaluator' object has no attribute '_end_flag'

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

v0.3.1 1.8.1+cpu 3.8.12 (default, Oct 12 2021, 03:01:40) [MSC v.1916 64 bit (AMD64)] win32

Running the basic example python3 -u dizoo/classic_control/cartpole/entry/cartpole_dqn_main.py produces the following error.

Traceback (most recent call last):
  File "C:/ProgramData/Anaconda3/envs/PYTORCH/Lib/site-packages/dizoo/classic_control/cartpole/entry/cartpole_dqn_main.py", line 91, in <module>
    main(cartpole_dqn_config)
  File "C:/ProgramData/Anaconda3/envs/PYTORCH/Lib/site-packages/dizoo/classic_control/cartpole/entry/cartpole_dqn_main.py", line 84, in main
    evaluator = InteractionSerialEvaluator(
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 56, in __init__
    self.reset(policy, env)
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 112, in reset
    self.reset_env(_env)
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 76, in reset_env
    self._env.launch()
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 199, in launch
[2022-05-22 16:15:02] ERROR    Env 0 reset has exceeded max retries(1)                                                                             base_env_manager.py:274
    self.reset(reset_param)
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 242, in reset
    self._reset(env_id)
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 281, in _reset
    raise runtime_error
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 259, in _reset
    obs = reset_fn()
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 251, in reset_fn
    return self._envs[env_id].reset(**self._reset_param[env_id])
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env\ding_env_wrapper.py", line 68, in reset
    obs = self._env.reset()
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\record_video.py", line 58, in reset
    self.start_video_recorder()
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\record_video.py", line 75, in start_video_recorder
    self.video_recorder.capture_frame()
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 155, in capture_frame
    self._encode_image_frame(frame)
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 213, in _encode_image_frame
    self.encoder = ImageEncoder(
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 337, in __init__
    raise error.DependencyNotInstalled(
RuntimeError: Env 0 reset has exceeded max retries(1), and the latest exception is: DependencyNotInstalled("Found neither the ffmpeg nor avconv executables. On OS X, you can install ffmpeg via `brew install ffmpeg`. On most Ubuntu variants, `sudo apt-get install ffmpeg` should do it. On Ubuntu 14.04, however, you'll need to install avconv with `sudo apt-get install libav-tools`.")
Exception ignored in: <function InteractionSerialEvaluator.__del__ at 0x0000017CA3252280>
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 138, in __del__
  File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 125, in close
AttributeError: 'InteractionSerialEvaluator' object has no attribute '_end_flag'

Process finished with exit code 1
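
The trace above shows two separate problems. The root cause is the missing ffmpeg/avconv executable reported by gym. The final AttributeError is a secondary effect: __init__ raised before _end_flag was assigned, and __del__ then called close() on the half-built object. A minimal sketch of the defensive pattern follows. The Evaluator class here is a stand-in written for illustration, not the actual InteractionSerialEvaluator code.

```python
class Evaluator:
    """Stand-in showing how to keep close() safe when __init__ fails part-way."""

    def __init__(self, reset_fn):
        # Assign every attribute that close()/__del__ reads BEFORE any call that may raise.
        self._end_flag = False
        reset_fn()  # stands in for self.reset(policy, env)

    def close(self):
        if self._end_flag:
            return
        self._end_flag = True

    def __del__(self):
        self.close()


def failing_reset():
    raise RuntimeError("Env 0 reset has exceeded max retries(1)")


try:
    Evaluator(failing_reset)
except RuntimeError as err:
    # Only the real error surfaces; no AttributeError from __del__.
    print("caught:", err)
```

With this ordering the user sees only the dependency error, which is the one that needs fixing (installing ffmpeg, per the message).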

Test trigger

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

CPU utilization problem

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
  # ding version `v0.2.0`, linux platform

Issue Description

CPU utilization is very low, far from 100% (below 5% on average).

Steps to Reproduce

Clone the repo and git checkout main (currently at 0fcfdf26). Run python3 dizoo/slime_volley/entry/slime_volley_selfplay_ppo_main.py. Open htop to check CPU usage: only one core is occupied on a multi-core machine.

What Do We Need?

During training, run the command mpstat 3. The %idle column should be below 20% (the current value is 97%).

PettingZoo for SMAC

As a follow-up to #153: you don't need to support the SMAC API separately. You can just use the PettingZoo API, since SMAC supports it and it is fairly heavily used.

Import error for TREX

Hello, I'm a beginner in IRL and I want to reproduce the results of the TREX algorithm by running "dizoo/mujoco/entry/mujoco_trex_main.py" in the repo. However, I get an ImportError: cannot import name 'serial_pipeline_trex_onpolicy' from 'ding.entry'. I think this is because there is no file named "serial_entry_trex.py" containing the functions 'serial_pipeline_trex_onpolicy' and 'serial_pipeline_trex'.
Could someone help me solve this? Thank you.

In the v2 version, can collectors on different workers send the data they collect to the worker where the learner runs?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

Multi GPU training problem

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

My DI-engine version is commit f1bf66 and my PyTorch version is 1.7.1+cu101. I run Python 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0] on Linux.
I followed the docs' guidance to enable multi-GPU training by adding the config term config.policy.learn.multi_gpu=True in demo/simple_rl/ppo_train.py, but I got the following exception:

WARNING:root:If you want to use numba to speed up segment tree, please install numba first
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
[ENV] Setting seed: 0
Traceback (most recent call last):
  File "imgppo_train.py", line 189, in <module>
    main(main_config)
  File "imgppo_train.py", line 158, in main
    policy = PPOPolicy(cfg.policy, model=model)
  File "/home/qhzhang/code/DI-engine/ding/policy/base_policy.py", line 81, in __init__
    self._init_multi_gpu_setting(model)
  File "/home/qhzhang/code/DI-engine/ding/policy/base_policy.py", line 101, in _init_multi_gpu_setting
    broadcast(param.data, 0)
  File "/home/qhzhang/anaconda3/envs/didrive/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 859, in broadcast
    _check_default_pg()
  File "/home/qhzhang/anaconda3/envs/didrive/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized

Maybe you should add a README to the PyPI page?

What did I find?

Look here: https://pypi.org/project/DI-engine/

It says

The author of this package has not provided a project description

Why is this not okay?

If I open the PyPI page, I will be confused about what this package is. 😕

How to solve this problem

This information should be configured in setup.py. Just take a look at the implementation in treevalue.

After that, the content of the README will be visible on the PyPI page, as it is for treevalue. (Some links are down; I'm fixing them 😸)
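
For reference, here is a minimal sketch of what such a setup.py change usually looks like with setuptools: the README text goes into long_description and its format into long_description_content_type. The helper name readme_metadata is hypothetical, and the sketch assumes the README is Markdown and lives next to setup.py.

```python
from pathlib import Path


def readme_metadata(readme_path="README.md"):
    """Return the setup() keyword arguments that make PyPI render the README."""
    text = Path(readme_path).read_text(encoding="utf-8")
    return {
        "long_description": text,
        "long_description_content_type": "text/markdown",
    }


# In setup.py these would be forwarded to setuptools, for example:
#     from setuptools import setup
#     setup(name="DI-engine", **readme_metadata())
```

The description only appears on PyPI after a new release is uploaded with this metadata.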

How to use ObsNormEnv?

I would like to ask about the details of using observation normalization.

The state is a one-dimensional vector composed of three feature vectors. The first has values in a range of about 0-40, for example: [2, 1, 8, 12, 12, 4, 1, 2]. The second has a range of approximately 0-11, for example: [2.3, 1.4, 0.2, 0.9, 8.4, 7.1, 8.3, 9.4]. The third is a one-hot vector, for example: [0,0,1,0,0,0,0,0].

In this case, can I use ObsNormEnv directly? I don't think so.

I would appreciate your advice. Thank you very much.
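
One possible way to handle such a mixed observation, independent of ObsNormEnv, is to scale each continuous segment by its own range and leave the one-hot segment untouched. The sketch below assumes the three segments have length 8 each and uses the approximate ranges quoted above. SEGMENTS and normalize_obs are hypothetical names for illustration.

```python
# (start, end, low, high); low/high of None means "leave this slice unchanged".
SEGMENTS = [
    (0, 8, 0.0, 40.0),    # first feature vector, roughly 0-40
    (8, 16, 0.0, 11.0),   # second feature vector, roughly 0-11
    (16, 24, None, None), # one-hot vector
]


def normalize_obs(obs):
    """Min-max scale each continuous segment to [0, 1]; keep one-hot entries as they are."""
    out = list(obs)
    for start, end, low, high in SEGMENTS:
        if low is None:
            continue
        for i in range(start, end):
            out[i] = (obs[i] - low) / (high - low)
    return out


obs = (
    [2, 1, 8, 12, 12, 4, 1, 2]
    + [2.3, 1.4, 0.2, 0.9, 8.4, 7.1, 8.3, 9.4]
    + [0, 0, 1, 0, 0, 0, 0, 0]
)
print(normalize_obs(obs))
```

The same function could be applied inside a custom environment wrapper before the observation reaches the policy.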

gfootball_ppo_parallel_config.py does not work

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)
    # v0.2.3 1.8.1 3.9.12 (main, Mar 26 2022, 15:51:15) 
WARNING:root:If you want to use numba to speed up segment tree, please install numba first
Traceback (most recent call last):
  File "/Users/zzhaoao/Documents/RL/New/DI-engine/dizoo/gfootball/entry/parallel/gfootball_ppo_parallel_config.py", line 102, in <module>
    parallel_pipeline(config, seed=0)
  File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/entry/parallel_entry.py", line 52, in parallel_pipeline
    launch_coordinator(config.seed, config, learner_handle=learner_handle, collector_handle=collector_handle)
  File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/entry/parallel_entry.py", line 125, in launch_coordinator
    coordinator = Coordinator(config)
  File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/worker/coordinator/coordinator.py", line 61, in __init__
    self._exp_name = cfg.main.exp_name
AttributeError: 'EasyDict' object has no attribute 'exp_name'
WARNING:root:If you want to use numba to speed up segment tree, please install numba first
WARNING:root:If you want to use numba to speed up segment tree, please install numba first
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
Exception ignored in: <function Coordinator.__del__ at 0x14d374790>
Traceback (most recent call last):
  File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/worker/coordinator/coordinator.py", line 289, in __del__
    self.close()
  File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/worker/coordinator/coordinator.py", line 268, in close
    if self._end_flag:
AttributeError: 'Coordinator' object has no attribute '_end_flag'

The default random_collect_size is not compatible with episode collector

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

When I used an episode collector together with the SAC policy, it raised the following exception.

Traceback (most recent call last):
  File "/home/tianhan/codes/astraea/src/train/astraea_episode_ma_sac_config.py", line 96, in <module>
    serial_pipeline([main_config, create_config], seed = 9)
  File "/home/tianhan/codes/astraea/third_party/DI-engine/ding/entry/serial_entry.py", line 91, in serial_pipeline
    new_data = collector.collect(n_sample=cfg.policy.random_collect_size, policy_kwargs=collect_kwargs)
TypeError: collect() got an unexpected keyword argument 'n_sample'

I think the reason is that the SAC policy has a default random_collect_size, which is not compatible with episode collectors (e.g. EpisodeSerialCollector, which requires n_episode as an argument instead of n_sample).
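
One way the entry code could avoid this TypeError is to check which keyword the collector accepts before calling it. The sketch below uses two stand-in collector classes and a hypothetical helper random_collect; it is not the DI-engine implementation. Whether random_collect_size should be read as an episode count for episode collectors is a design decision that this sketch does not settle.

```python
import inspect


class SampleCollector:
    """Stand-in for a collector whose collect() takes n_sample."""

    def collect(self, n_sample=None, policy_kwargs=None):
        return ("sample", n_sample)


class EpisodeCollector:
    """Stand-in for a collector whose collect() takes n_episode."""

    def collect(self, n_episode=None, policy_kwargs=None):
        return ("episode", n_episode)


def random_collect(collector, size, policy_kwargs=None):
    """Pass `size` under whichever keyword this collector's collect() accepts."""
    params = inspect.signature(collector.collect).parameters
    if "n_sample" in params:
        return collector.collect(n_sample=size, policy_kwargs=policy_kwargs)
    if "n_episode" in params:
        return collector.collect(n_episode=size, policy_kwargs=policy_kwargs)
    raise TypeError("collect() accepts neither n_sample nor n_episode")


print(random_collect(SampleCollector(), 10000))
print(random_collect(EpisodeCollector(), 8))
```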

Recommended changes to SAC, and how to inherit a policy

  1. Change how we transform a distribution. For example, https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py#L816-L822 can be changed to

     from torch.distributions import Independent, Normal, TransformedDistribution
     from torch.distributions.transforms import TanhTransform
     dist = TransformedDistribution(Independent(Normal(mu, sigma), 1), [TanhTransform()])
     next_action = dist.rsample()
     next_log_prob = dist.log_prob(next_action)

     This is much easier and, more importantly, more numerically stable.
  2. I also recommend the practice in mbsac when creating variants of a policy. A lot of the copies in configs and __init__learn are not necessary (for example, in SQILSACPolicy).
  3. There are two SAC papers, sac-v1 and sac-v2, from the same research group. I think we should include links to both papers, since it is sac-v2 that proposes automatic entropy adjustment.
  4. Maybe we can delete value_network instead of hardcoding value_network=False. It is not commonly used and is not even used in sac-v2.
  5. Maybe we can create a subdirectory for SAC instead of squeezing every variant of SAC into a single file, since SAC is a very good, commonly used baseline. If a subdirectory is not necessary, we should at least move SACPolicy to the top, ahead of SACDiscretePolicy, which is not commonly used.

PPO Policy Bug in Parallel Mode

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

value_norm is used in _get_train_sample of the PPO policy, which is called by the _process_timestep function of the collector. However, in parallel mode the collector does not have value_norm, which is only initialized in _init_learn. This raises the exception "AttributeError: 'PPOCommandModePolicy' object has no attribute '_value_norm'".

r2d2 atari

Hello there, I'm sort of a newbie here. I am trying to reproduce the R2D2 results on some of the Atari games, but I have been unable to. I've been blocked on this for quite some time, and it would be a great help if anyone could help me here.
Thank you.

A bug occurs when running cartpole_ppo_rnd_main.py

Hi, a bug occurs when running cartpole_ppo_rnd_main.py. I would like to know the reason and the corresponding solution. The error is shown below. Looking forward to your answer.

Traceback (most recent call last):
  File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/dizoo/classic_control/cartpole/entry/cartpole_ppo_rnd_main.py", line 70, in <module>
    main(cartpole_ppo_rnd_config)
  File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/dizoo/classic_control/cartpole/entry/cartpole_ppo_rnd_main.py", line 60, in main
    reward_model.train()
  File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/ding/reward_model/rnd_reward_model.py", line 97, in train
    self._train()
  File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/ding/reward_model/rnd_reward_model.py", line 81, in _train
    if self.cfg.obs_norm:
AttributeError: 'EasyDict' object has no attribute 'obs_norm'
