maximilianb2 / pc-gym

Reinforcement learning environments for process control applications.

Home Page: https://maximilianb2.github.io/pc-gym/

License: MIT License

Python 100.00%

chemical-engineering control reinforcement-learning-environments

pc-gym's Introduction

Reinforcement learning environments for process control

Quick start ⚡

Set up a CSTR environment with a setpoint change:

import numpy as np
import pcgym

# Simulation variables
nsteps = 100
T = 25

# Setpoint
SP = {'Ca': [0.85] * (nsteps // 2) + [0.9] * (nsteps // 2)}

# Action and observation Space
action_space = {'low': np.array([295]), 'high': np.array([302])}
observation_space = {'low': np.array([0.7,300,0.8]),'high': np.array([1,350,0.9])}

# Construct the environment parameter dictionary
env_params = {
    'N': nsteps, # Number of time steps
    'tsim':T, # Simulation Time
    'SP' :SP, 
    'o_space' : observation_space, 
    'a_space' : action_space, 
    'x0': np.array([0.8, 330, 0.8]), # Initial conditions [Ca, T, Ca_SP]
    'model': 'cstr_ode', # Select the model
}

# Create environment
env = pcgym.make_env(env_params)

# Reset the environment
obs, state = env.reset()

# Sample a random action
action = env.action_space.sample()

# Perform a step in the environment
obs, rew, done, term, info = env.step(action)
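
To run a full episode rather than a single step, the calls above can be wrapped in a loop. The sketch below is only illustrative: `StubEnv` and `StubActionSpace` are hypothetical stand-ins (not part of pc-gym) so that the loop runs without the package installed, and the placeholder reward is invented. It assumes only the `reset`, `step` and `action_space.sample` signatures shown in the quick start; with the real package, `env` would come from `pcgym.make_env(env_params)`.

```python
import random


class StubActionSpace:
    """Hypothetical action space exposing the sample() call used above."""

    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.uniform(self.low, self.high)


class StubEnv:
    """Hypothetical stand-in for the object returned by pcgym.make_env."""

    def __init__(self, nsteps):
        self.nsteps = nsteps
        self.action_space = StubActionSpace(295.0, 302.0)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.8, 330.0, 0.85], {}

    def step(self, action):
        self.t += 1
        obs = [0.8, 330.0, 0.85]
        reward = -abs(action - 298.5)  # placeholder reward, not pc-gym's
        done = self.t >= self.nsteps
        return obs, reward, done, False, {}


def rollout(env, max_steps):
    """Run one episode with random actions and return (total reward, steps)."""
    obs, state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = env.action_space.sample()
        obs, rew, done, term, info = env.step(action)
        total_reward += rew
        steps += 1
        if done or term:  # stop when the episode ends
            break
    return total_reward, steps


total_reward, steps = rollout(StubEnv(nsteps=100), max_steps=1000)
```

Replacing `env.action_space.sample()` with a trained policy's action would turn the same loop into a simple evaluation run.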

Documentation

You can read the full documentation on the project home page: https://maximilianb2.github.io/pc-gym/

Installation ⏳

The latest pc-gym version can be installed from PyPI:

pip install pcgym

Examples

Example notebooks covering training walkthroughs, constraint implementation, disturbances, and the policy evaluation tool can be found here.

Implemented Process Control Environments 🎛️

Environment	Reference	Source
CSTR	Hedengren, 2022	Source
First Order System	N/A	Source
Multistage Extraction Column	Ingham et al., 2007 (pg 471)	Source
Nonsmooth Control	Lim, 1969	Source

Citing `pc-gym`

If you use pc-gym in your research, please cite it using the following BibTeX entry:

@software{pcgym2024,
  author = {Max Bloor and Jose Neto and Ilya Sandoval and Max Mowbray and Akhil Ahmed and Mehmet Mercangoz and Calvin Tsay and Antonio Del Rio-Chanona},
  title = {{pc-gym}: Reinforcement Learning Environments for Process Control},
  url = {https://github.com/MaximilianB2/pc-gym},
  version = {0.1.6},
  year = {2024},
}

Other Great Gyms 🔍

pc-gym's Issues

Citation Typo

The citation has an extra "and" after Max's name.

Add more unique plots and optimality gap information

Our MPC oracle setting allows us to provide more information about the performance of RL policies:

Optimality gaps
- Overall gap in reward
- Gap in value function per state.
- Gap in Q function per state-action pair.
Identify local optima by comparing control trajectories of MPC oracle and RL policy.
State and action distributions of trained policies (example).
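
As a rough illustration of the first item, the overall gap in reward can be read as the difference between the mean episode return of the MPC oracle and that of the RL policy. The sketch below assumes per-step rewards have already been logged for both controllers; the function names and the numbers are hypothetical and are not part of pc-gym.

```python
from statistics import mean


def episode_returns(reward_trajectories):
    """Sum the per-step rewards of each episode."""
    return [sum(rewards) for rewards in reward_trajectories]


def reward_optimality_gap(oracle_rewards, policy_rewards):
    """Mean oracle return minus mean policy return (larger means worse policy)."""
    oracle_return = mean(episode_returns(oracle_rewards))
    policy_return = mean(episode_returns(policy_rewards))
    return oracle_return - policy_return


# Made-up reward logs: two episodes of two steps for each controller.
oracle_rewards = [[-1.0, -0.5], [-1.0, -0.5]]
policy_rewards = [[-2.0, -1.0], [-1.5, -0.5]]

gap = reward_optimality_gap(oracle_rewards, policy_rewards)
```

The per-state and per-state-action gaps would need value and Q estimates for both controllers, which this sketch does not attempt.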

Handle the parameters of models with classes

Here is a proposal for defining a model whose parameters are set at initialisation and which can then be called with the expected signatures for other methods.

from diffrax import diffeqsolve, ODETerm, Dopri5
import jax.numpy as jnp

def f(t, y, args):
    return -y

term = ODETerm(f)
solver = Dopri5()
y0 = jnp.array([2., 3.])
solution = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0)

# Dataclass version

from dataclasses import dataclass

# frozen: makes the objects immutable after creation,
# so parameters cannot be modified at runtime.
# It also makes the class hashable, as required by Equinox:
# ValueError: Non-hashable static arguments are not supported.

# kw_only: parameters must be passed by keyword
# when the object is created

@dataclass(frozen=True, kw_only=True)
class Model:
  a: float = 1.0
  def __call__(self, t, y, args):
    return -self.a*y

m = Model(a=2.0)
sol = diffeqsolve(ODETerm(m), solver, t0=0, t1=1, dt0=0.1, y0=y0)

# can also pass the complete or partial parameters from a dict
# params = {"a": 2.0}
# m = Model(**params)

# No measurable performance difference between the variants below;
# wrapping the model in jax.jit seems to have no effect

# term = ODETerm(f)
# term = ODETerm(Model())
# term = ODETerm(jax.jit(Model()))
# %timeit sol = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0)

Before the first internal tests

Customisation Documentation
- Params
- Model
- Constraints
Model description, including hard-to-operate parameters/setpoints
Example Notebooks
Constraint violation plots
Reproducibility Metric
Multi Timescale model
Jose pipeline model

Feature Ideas

Policy evaluation
- Learning curve plot
- Cross-validation
- Plot custom constraints
Customisation
- Reward function
- Update the MPC to handle control and custom constraints, as it currently handles state constraints only
Oracle
- IMC-tuned feedback controller (e.g. could this be used as a backup if the MPC fails to converge?)
- Option to allow/disallow disturbance and setpoint foresight
Other
- Ability to specify observable states
- Leaderboard / Hackathon
- Compatibility with JAX parallelisation/vectorisation

Done

Policy evaluation tool
- ~~Oracle~~ MPC with perfect model?
- ~~Return distribution~~
- Reproducibility Metric
- ~~Real plot axis naming~~
Customisation
- ~~Model parameters~~
- ~~Model Dynamics~~
- ~~Constraint Functions~~
Model Reformulation as Python classes
- ~~Allow disturbances for JAX models~~
- ~~Expose model details (i.e. m.info returns variable names for states, controls etc.)~~
- ~~Change SP, Constraints, and disturbances to use variable names instead of '0', '1' etc.~~
- ~~Allow for non-sequential definition of disturbances/constraints~~
- ~~First Order system and Multistage extraction reformulation~~

Setup formatter and coding style before release

Also add a badge, of course.

One trendy option: ruff

Installation as a package

The guide below lists the steps to follow if we want the library to be a PyPI package.
It also explains how to set up the code structure.
https://realpython.com/pypi-publish-python-package/#publish-your-package-to-pypi

IMHO this is nice to have but not really necessary, certainly not for the internal testing.

Problem tracks for demo day

Potentially prepare 2 problem tracks for the audience

For PSE backgrounds: a walkthrough/playground showing how to set up a model as an RL problem and how to optimize it.
For CS backgrounds: a challenge to optimize an RL policy until it matches the performance of an oracle.

These could be based on the same model to reduce the workload.

Organised collaboration workflow

Hey guys, I noticed there have been many recent commits that do not have a very clear purpose.
This makes it hard to understand the state of the code and what still needs to be done.
I took the liberty of moving these to a specific branch and reverting, so that we can organise better and plan ahead. Hope you do not mind!

It would be good to move on from here with a more organised workflow so that we can collaborate better and the repo ends up in an attractive state when it gets released.

IMHO the GitHub Flow branching strategy is the best suited for research development, followed by GitLab Flow for maintenance after the library is released (this one might be overkill for a small library though).

Let me know your thoughts! 😄 This would imply that we work on branches different to main and only contribute back to it through pull requests that we can all understand.

Add setup instructions for challenge or training example

Make sure the instructions are clear enough for anyone to install the package and its dependencies in a fresh environment and to build the docs locally.

Provide options in venv, conda/mamba and pixi #6 (comment).

Jax and PyTorch tend to be the problematic ones.