Specification-based library for automatic reward shaping.

⚠️ Disclaimer

This repository aims to provide a unified library for automatic reward shaping and reimplements the methods described in the table below. However, it is not meant to validate the results of the original papers.

If you are a reproducibility reviewer for HPRS, please refer to the original codebase.

Methods

Method	Signal Soundness	Dense Signal	Multi-Objective	Objective Prioritization	Status
TLTL [1]	✔️	❌	❌	❌	✔️
BHNR [2]	❌	✔️	❌	❌	✔️
HPRS [4]	✔️	✔️	✔️	✔️	✔️
PAM [4]	✔️	❌	✔️	✔️	✔️
Rank-Preserving Reward [5]	✔️	❌	✔️	✔️	✔️

✔️ Supported

❌ Not supported

👷 Work in progress

Specification Language

The task specification consists of a set of requirements, as in [4]. The requirement syntax is as follows:

formula ::= f(state) ~ 0
requirement ::= ensure <formula> | achieve <formula> | conquer <formula> | encourage <formula>

where f is a function of the state dictionary (named state) and ~ is one of the comparison operators <, <=, >, >=.
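To make the grammar concrete, the following is a minimal sketch of how a requirement could be represented and checked against a state dictionary. It is not the library's API: the `Requirement` class, the `holds` method, and the state keys `x` and `dist_to_goal` are hypothetical names chosen for illustration, and f is given as a plain Python callable rather than parsed from text.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Mapping

# Comparison operators and keywords allowed by the grammar above.
COMPARISONS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}
KEYWORDS = ("ensure", "achieve", "conquer", "encourage")

State = Mapping[str, float]


@dataclass(frozen=True)
class Requirement:
    """Hypothetical container for one requirement: <keyword> f(state) ~ 0."""

    keyword: str
    f: Callable[[State], float]
    comparison: str

    def __post_init__(self) -> None:
        if self.keyword not in KEYWORDS:
            raise ValueError(f"unknown keyword: {self.keyword!r}")
        if self.comparison not in COMPARISONS:
            raise ValueError(f"unknown comparison: {self.comparison!r}")

    def holds(self, state: State) -> bool:
        """Evaluate the formula f(state) ~ 0 on a single state."""
        return COMPARISONS[self.comparison](self.f(state), 0.0)


# Illustrative task (assumed, not from the repository): stay within |x| <= 2.4
# and get closer than 0.05 to the goal.
spec = [
    Requirement("ensure", lambda s: abs(s["x"]) - 2.4, "<="),
    Requirement("achieve", lambda s: s["dist_to_goal"] - 0.05, "<"),
]

state = {"x": 1.0, "dist_to_goal": 0.5}
results = [req.holds(state) for req in spec]
```

Writing every formula as a comparison of f(state) against 0 keeps the representation uniform: any bound such as |x| <= 2.4 is expressed by moving the constant into f.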

Examples

To run the examples, first install the extra requirements:

pip install -r examples/requirements.txt

Then you can train an agent with stable-baselines3 and auto-shaping in any of the following ways:

- using default specifications from the configuration file in configs/
- using a custom specification by passing it as an argument
- benchmarking the agent with multiple reward shaping methods
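For intuition on how a shaped reward can be placed between an environment and a training algorithm, here is a minimal, dependency-free sketch of generic potential-based shaping, r' = r + gamma * Phi(s') - Phi(s). It is not the repository's code and does not reproduce any method from the table: `ToyLineEnv`, `PotentialShapingWrapper`, and the distance-based potential are hypothetical, and the environment only imitates the Gymnasium convention of `reset()` returning `(state, info)` and `step()` returning a 5-tuple.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, float]


class ToyLineEnv:
    """Hypothetical 1-D environment: the agent moves along a line toward x = goal."""

    def __init__(self, goal: float = 5.0) -> None:
        self.goal = goal
        self.x = 0.0

    def reset(self) -> Tuple[State, dict]:
        self.x = 0.0
        return {"x": self.x}, {}

    def step(self, action: float) -> Tuple[State, float, bool, bool, dict]:
        self.x += action
        terminated = self.x >= self.goal
        reward = 1.0 if terminated else 0.0  # sparse reward, only at the goal
        return {"x": self.x}, reward, terminated, False, {}


class PotentialShapingWrapper:
    """Adds gamma * Phi(s') - Phi(s) to the reward of the wrapped environment."""

    def __init__(self, env, potential: Callable[[State], float], gamma: float = 0.99) -> None:
        self.env = env
        self.potential = potential
        self.gamma = gamma
        self._last_potential = 0.0

    def reset(self) -> Tuple[State, dict]:
        state, info = self.env.reset()
        self._last_potential = self.potential(state)
        return state, info

    def step(self, action: float) -> Tuple[State, float, bool, bool, dict]:
        state, reward, terminated, truncated, info = self.env.step(action)
        # Common convention: the potential of a terminal state is taken as zero.
        new_potential = 0.0 if terminated else self.potential(state)
        shaped = reward + self.gamma * new_potential - self._last_potential
        self._last_potential = new_potential
        return state, shaped, terminated, truncated, info


# Potential (assumed for illustration): negative distance to the goal.
env = PotentialShapingWrapper(
    ToyLineEnv(goal=5.0), potential=lambda s: -abs(5.0 - s["x"]), gamma=0.99
)

state, info = env.reset()
shaped_rewards = []
done = False
while not done:
    state, reward, terminated, truncated, info = env.step(1.0)
    shaped_rewards.append(reward)
    done = terminated or truncated
```

With an actual Gymnasium environment, the same logic would typically live in a `gymnasium.Wrapper` subclass, and the wrapped environment could then be handed to a stable-baselines3 algorithm in place of the original one.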

Citation

If you use this code in your research, please cite the following paper:

@misc{berducci2022hierarchical,
    title={Hierarchical Potential-based Reward Shaping from Task Specifications}, 
    author={Luigi Berducci and Edgar A. Aguilar and Dejan Ničković and Radu Grosu},
    year={2022},
    eprint={2110.02792},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

References

[1] "Reinforcement learning with temporal logic rewards." Li, et al. IROS 2017.

[2] "Structured reward shaping using signal temporal logic specifications." Balakrishnan, et al. IROS 2019.

[3] "Multi-objectivization of reinforcement learning problems by reward shaping." Brys, et al. IJCNN 2014.

[4] "Hierarchical Potential-based Reward Shaping from Task Specifications." Berducci, et al. arXiv:2110.02792, 2022. Under review.

[5] "Receding Horizon Planning with Rule Hierarchies for Autonomous Vehicles." Veer, et al. ICRA 2023.
