pc-gym's Introduction


Reinforcement learning environments for process control

Quick start ⚡

Setup a CSTR environment with a setpoint change

import numpy as np
import pcgym

# Simulation variables
nsteps = 100
T = 25

# Setpoint
SP = {'Ca': [0.85 for i in range(int(nsteps/2))] + [0.9 for i in range(int(nsteps/2))]} 

# Action and observation Space
action_space = {'low': np.array([295]), 'high': np.array([302])}
observation_space = {'low': np.array([0.7,300,0.8]),'high': np.array([1,350,0.9])}

# Construct the environment parameter dictionary
env_params = {
    'N': nsteps, # Number of time steps
    'tsim':T, # Simulation Time
    'SP' :SP, 
    'o_space' : observation_space, 
    'a_space' : action_space, 
    'x0': np.array([0.8, 330, 0.8]), # Initial conditions [Ca, T, Ca_SP]
    'model': 'cstr_ode', # Select the model
}

# Create environment
env = pcgym.make_env(env_params)

# Reset the environment
obs, state = env.reset()

# Sample a random action
action = env.action_space.sample()

# Perform a step in the environment
obs, rew, done, term, info = env.step(action)
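The setpoint dictionary in the quick start maps a state name to one value per time step. A minimal sketch of a helper that builds such a step-change schedule is given below; `step_setpoint` is a hypothetical name and not part of pc-gym. It also covers odd `nsteps`, where two `int(nsteps/2)` halves would come up one entry short.

```python
def step_setpoint(nsteps, before, after, switch_at=None):
    """Return a list of length nsteps that steps from `before` to `after`.

    Hypothetical helper, not part of pc-gym. `switch_at` is the index of the
    first step that uses `after`; it defaults to the midpoint.
    """
    if switch_at is None:
        switch_at = nsteps // 2
    if not 0 <= switch_at <= nsteps:
        raise ValueError("switch_at must lie between 0 and nsteps")
    return [before] * switch_at + [after] * (nsteps - switch_at)


# Same schedule as the quick start: 50 steps at 0.85, then 50 steps at 0.9
SP = {'Ca': step_setpoint(100, 0.85, 0.9)}
```

The resulting `SP` can be passed as the `'SP'` entry of `env_params` in place of the list comprehension.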

Documentation

You can read the full documentation here!

Installation ⏳

The latest pc-gym version can be installed from PyPI:

pip install pcgym

Examples

Example notebooks covering training walkthroughs, constraints, disturbances and the policy evaluation tool can be found here.

Implemented Process Control Environments 🎛️

Environment | Reference | Source | Documentation
CSTR | Hedengren, 2022 | Source |
First Order System | N/A | Source |
Multistage Extraction Column | Ingham et al., 2007 (p. 471) | Source |
Nonsmooth Control | Lim, 1969 | Source |

Citing pc-gym

If you use pc-gym in your research, please cite it using the following:

@software{pcgym2024,
  author = {Max Bloor and Jose Neto and Ilya Sandoval and Max Mowbray and Akhil Ahmed and Mehmet Mercangoz and Calvin Tsay and Antonio Del Rio-Chanona},
  title = {{pc-gym}: Reinforcement Learning Environments for Process Control},
  url = {https://github.com/MaximilianB2/pc-gym},
  version = {0.1.6},
  year = {2024},
}

Other Great Gyms 🔍


pc-gym's Issues

Add more unique plots and optimality gap information

Our MPC oracle setting allows us to provide more information about the performance of RL policies:

  • Optimality gaps
    • Overall gap in reward
    • Gap in value function per state.
    • Gap in Q function per state-action pair.
  • Identify local optima by comparing control trajectories of MPC oracle and RL policy.
  • State and action distributions of trained policies (example).
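As an illustration of the first item, the overall gap in reward could be estimated from episode returns collected under the MPC oracle and under the RL policy. The sketch below assumes both are available as sequences of per-episode returns; `optimality_gap` and the sample numbers are hypothetical and not part of pc-gym.

```python
from statistics import fmean


def optimality_gap(oracle_returns, policy_returns):
    """Mean oracle return minus mean policy return.

    Hypothetical helper: a positive gap means the policy underperforms the
    oracle on average. The second value is the gap relative to |oracle mean|.
    """
    oracle_mean = fmean(oracle_returns)
    policy_mean = fmean(policy_returns)
    gap = oracle_mean - policy_mean
    relative = gap / abs(oracle_mean) if oracle_mean != 0 else float('nan')
    return gap, relative


# Made-up returns, for illustration only
oracle = [-10.0, -12.0, -11.0]
policy = [-15.0, -14.0, -16.0]
gap, relative = optimality_gap(oracle, policy)
```

The per-state value gap and per-state-action Q gap would need value estimates from both controllers, which this sketch does not attempt.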

Handle the parameters of models with classes

Here is a proposal for defining a model whose parameters are set at initialisation and which can then be called with the expected signatures for other methods.

from diffrax import diffeqsolve, ODETerm, Dopri5
import jax.numpy as jnp

def f(t, y, args):
    return -y

term = ODETerm(f)
solver = Dopri5()
y0 = jnp.array([2., 3.])
solution = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0)

# Dataclass version

from dataclasses import dataclass

# frozen: makes the objects immutable after creation,
# so parameters cannot be modified at runtime.
# It also makes the class hashable, as required by Equinox:
# ValueError: Non-hashable static arguments are not supported.

# kw_only: parameters must be passed by name
# when the object is created

@dataclass(frozen=True, kw_only=True)
class Model:
  a:float = 1.0
  def __call__(self, t, y, args):
    return -self.a*y

m = Model(a=2.0)
sol = diffeqsolve(ODETerm(m), solver, t0=0, t1=1, dt0=0.1, y0=y0)

# can also pass the complete or partial parameters from a dict
# params = {"a": 2.0}
# m = Model(**params)

# no performance difference
# jax.jit seems to have no effect

# term = ODETerm(f)
# term = ODETerm(Model())
# term = ODETerm(jax.jit(Model()))
# %timeit sol = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0)

Feature timeline

Before the first internal tests

  • Customisation Documentation
    • Params
    • Model
    • Constraints
  • Model description, incl. hard-to-operate params/setpoints
  • Example Notebooks
  • Constraint violation plots
  • Reproducibility Metric
  • Multi Timescale model
  • Jose pipeline model

Feature Ideas

  • Policy evaluation
    • Learning curve plot
    • cross-validation
    • Plot custom constraints
  • Customisation
    • Reward function
    • Update MPC to handle control/custom constraints, as it currently only handles state constraints
  • Oracle
    • IMC Tuned FB controller (i.e. if MPC fails to converge this could be
      used as a backup?)
    • Option to allow/disallow disturbance and setpoint foresight
  • Other
    • Ability to specify observable states
    • Leaderboard / Hackathon
    • Compatibility with JAX parallelisation/vectorisation

Done

  • Policy evaluation tool

    • Oracle MPC with perfect model?
    • Return distribution
    • Reproducibility Metric
    • Real plot axis naming
  • Customisation

    • Model parameters
    • Model Dynamics
    • Constraint Functions
  • Model Reformulation as Python classes

    • Allow disturbances for JAX models
    • Expose model details (i.e. m.info returns variable names for states, controls etc.)
    • Change SP, Constraints, and disturbances to use variable names instead of '0', '1' etc.
    • Allow for non-sequential definition of disturbances/constraints
    • First Order system and Multistage extraction reformulation

Problem tracks for demo day

Potentially prepare 2 problem tracks for the audience

  • For PSE backgrounds: a walkthrough/playground to setup a model as an RL problem and how to optimize it.
  • For CS backgrounds: a challenge to optimize an RL policy to achieve the same level of performance as an oracle.

These could be based on the same model to reduce the workload.

Organised collaboration workflow

Hey guys, I noticed there have been many recent commits without a very clear purpose.
This makes it hard to understand the state of the code and what still needs to be done.
I took the liberty of moving these to a specific branch and reverting back so we can organise better and plan ahead, hope you do not mind!

It would be good to move on from here with a more organised workflow so that we can collaborate better and the repo ends up in an attractive state when it gets released.

IMHO the GitHub Flow branching strategy is best suited to research development, followed by GitLab Flow for maintenance after the library is released (this one might be overkill for a small library though).

Let me know your thoughts! 😄 This would imply that we work on branches different to main and only contribute back to it through pull requests that we can all understand.
