
deepbeerinventory-rl's Introduction

A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization

This repository contains the code for the paper A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization. The code works with Python 2.7 and Python 3.4 to 3.7. For more information, see the list of requirements (you can install them with pip install -r requirements.txt). main.py is the file to call to start the training. BGAgent.py provides the beer-game agent, with all the properties and functionality of an agent. clBeergame.py instantiates the agents and runs the beer-game simulation; once the number of observations in the replay buffer reaches the minimum requirement, it also calls the train step of the SRDQN algorithm. The DNN approximator and the SRDQN algorithm are implemented in SRDQN.py. config.py introduces all arguments and their default values, as well as some functions that build the simulation scenarios for different instances of the game. The following sections describe how to run the training and how to set different values for the arguments.
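The interplay between the replay buffer and the train step can be illustrated with a minimal sketch. It is not the code in clBeergame.py or SRDQN.py: the class ReplayBuffer, the function run_steps, and the parameters capacity, min_size, and batch_size are hypothetical names chosen for this illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions (hypothetical helper, not the repo's class)."""

    def __init__(self, capacity, min_size):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped first
        self.min_size = min_size              # minimum observations before training

    def add(self, transition):
        self.buffer.append(transition)

    def ready(self):
        # Training is allowed only once enough observations are stored.
        return len(self.buffer) >= self.min_size

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)


def run_steps(buffer, transitions, batch_size, train_step):
    """Store each transition and call train_step whenever the buffer is ready."""
    calls = 0
    for transition in transitions:
        buffer.add(transition)
        if buffer.ready():
            train_step(buffer.sample(batch_size))
            calls += 1
    return calls
```

With min_size=5 and eight incoming transitions, train_step would be called on the fifth through eighth transitions, each time with a random mini-batch.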

Play beer-game and compare your result with AI!

You can play the beer-game and compare your result on the same game with the result that our RL algorithm achieves. See https://beergame.opexanalytics.com/

Note that this code does not work with TensorFlow 2+.

Some Notations

Each agent can use any of the srdqn, bs (base-stock), Strm (Sterman), or rnd (random) algorithms to decide on its action (order quantity). With four agents, there are 4^4 = 256 combinations of agent types, of which we consider 23 cases in this study. To select a case, set config.gameConfig, which picks one of the pre-defined agent-type combinations for the four agents in the game. For example, config.gameConfig=3 sets config.agentTypes = ["srdqn", "bs","bs","bs"], in which the retailer follows the srdqn algorithm and the other agents use the base-stock policy to decide the order quantity. The main gameConfig values are listed below:

Base-stock co-players

if config.gameConfig == 3: 
	config.agentTypes = ["srdqn", "bs","bs","bs"]
if config.gameConfig == 4: 
	config.agentTypes = ["bs", "srdqn","bs","bs"]
if config.gameConfig == 5: 
	config.agentTypes = ["bs", "bs","srdqn","bs"]
if config.gameConfig == 6: 
	config.agentTypes = ["bs", "bs","bs","srdqn"]

Sterman co-players

if config.gameConfig == 7: 
	config.agentTypes = ["srdqn", "Strm","Strm","Strm"]
if config.gameConfig == 8: 
	config.agentTypes = ["Strm", "srdqn","Strm","Strm"]
if config.gameConfig == 9: 
	config.agentTypes = ["Strm", "Strm","srdqn","Strm"]
if config.gameConfig == 10: 
	config.agentTypes = ["Strm", "Strm","Strm","srdqn"]

Random co-players

if config.gameConfig == 11: 
	config.agentTypes = ["srdqn", "rnd","rnd","rnd"]
if config.gameConfig == 12: 
	config.agentTypes = ["rnd", "srdqn","rnd","rnd"]
if config.gameConfig == 13: 
	config.agentTypes = ["rnd", "rnd","srdqn","rnd"]
if config.gameConfig == 14: 
	config.agentTypes = ["rnd", "rnd","rnd","srdqn"]

The full list of gameConfig values is defined in the setAgentType() function in config.py.
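The twelve configurations listed above follow a regular pattern: the co-player type changes every four values, and the position of the srdqn agent cycles through the four agents. The sketch below reproduces only that pattern for gameConfig 3 to 14. The function agent_types is a hypothetical helper and not the repository's setAgentType(), which covers all 23 cases.

```python
from itertools import product

ALGORITHMS = ["srdqn", "bs", "Strm", "rnd"]
NUM_AGENTS = 4  # index 0 is the retailer, as in the gameConfig=3 example

# Every agent independently uses one of four algorithms: 4**4 = 256 combinations.
all_combinations = list(product(ALGORITHMS, repeat=NUM_AGENTS))


def agent_types(game_config):
    """Return the agent-type list for gameConfig values 3 to 14 only (sketch)."""
    if not 3 <= game_config <= 14:
        raise ValueError("this sketch covers gameConfig 3..14 only")
    co_player = ["bs", "Strm", "rnd"][(game_config - 3) // 4]
    srdqn_position = (game_config - 3) % 4
    types = [co_player] * NUM_AGENTS
    types[srdqn_position] = "srdqn"
    return types


print(len(all_combinations))  # 256
print(agent_types(8))         # ['Strm', 'srdqn', 'Strm', 'Strm']
```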

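The SRDQN model is trained with the d+x rule: the agent orders d + x, where d is the demand (order) it received and x is the adjustment chosen by the agent. The value of x is bounded by a lower and an upper limit, which are set with config.actionLow and config.actionUp.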
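As a small illustration of the d+x rule, the sketch below builds the discrete set of x values from the two limits and turns a chosen x into an order quantity. It assumes that d is the order the agent received in the current period and that orders are clamped at zero; both function names are hypothetical and the clamping is an assumption of this sketch, not something stated in this README.

```python
def action_space(action_low, action_up):
    """All integer adjustments x with action_low <= x <= action_up."""
    return list(range(action_low, action_up + 1))


def order_quantity(d, x):
    """Order d + x, clamped at zero (the clamp is an assumption of this sketch)."""
    return max(0, d + x)


print(action_space(-2, 2))    # [-2, -1, 0, 1, 2]
print(order_quantity(1, -2))  # 0
print(order_quantity(2, 1))   # 3
```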

In addition, for each of the four agents one can set the lead time for receiving shipments via config.leadRecItem1, config.leadRecItem2, config.leadRecItem3, config.leadRecItem4 and the lead time for receiving orders via config.leadRecOrder1, config.leadRecOrder2, config.leadRecOrder3, config.leadRecOrder4. Similarly, the initial inventory level, initial arriving order, and initial arriving shipment can be set by config.ILInit1, config.ILInit2, config.ILInit3, config.ILInit4, config.AOInit1, config.AOInit2, config.AOInit3, config.AOInit4, config.ASInit1, config.ASInit2, config.ASInit3, config.ASInit4, respectively for the four agents.

config.maxEpisodesTrain determines the number of episodes to train the srdqn agent.

To run the base-stock policy (bs), you need to set the base-stock level of each agent with config.f1, config.f2, config.f3, config.f4. We obtained those values by running the Clark-Scarf algorithm for each instance.
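For readers unfamiliar with base-stock policies, here is a minimal sketch of the textbook rule: order enough to raise the inventory position back to the base-stock level. This is the standard formulation and may differ in detail from what BGAgent.py does; the function name and its arguments are hypothetical.

```python
def base_stock_order(base_stock_level, inventory_level, open_order):
    """Textbook base-stock rule (sketch; not necessarily the repo's exact logic).

    inventory_level is on-hand stock minus backorders; open_order is the
    quantity already ordered but not yet received.
    """
    inventory_position = inventory_level + open_order
    return max(0.0, base_stock_level - inventory_position)


# Base-stock level 19 with 12 units in inventory and 4 on order:
print(base_stock_order(19.0, 12, 4))  # 3.0
```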

Unzip the data

data.zip includes all the datasets required to train the model on the basic case, the literature cases, the basket dataset, and the forecasting dataset. Unzipping this file creates a data directory, which contains a Python file (createDemand.py) as well as the mentioned datasets. createDemand.py can be used to create datasets of any size for the literature cases.

Train the basic model

The basic model uses the uniform demand distribution U[0,2] with the action space {-2, -1, 0, 1, 2}. All the default values are set to run this experiment for the case in which srdqn plays the retailer and the other agents follow the base-stock policy. For any other case, the training can be started by setting the corresponding arguments. For example, to train an srdqn warehouse with an initial inventory of 10 units that plays with Sterman co-players, the following line runs the training for 50000 episodes:

python main.py --gameConfig=8 --maxEpisodesTrain=50000 --ILInit2=10 --batchSize=128

Train the literature cases

To train each of the literature cases, you first need to set config.demandDistribution, config.actionUp, and config.actionLow, as well as the other parameters of the agents, as follows:

For U[0,8]:

python main.py --demandDistribution=0 --demandUp=9  --actionUp=8  --actionLow=-8 --ch1=0.5 --ch2=0.5 --ch3=0.5 --ch4=0.5 --cp1=1.0 --cp2=1.0 --cp3=1.0 --cp4=1.0 --f1=19.0 --f2=20.0 --f3=20.0 --f4=14.0  --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --ILInit1=12 --ILInit2=12 --ILInit3=12 --ILInit4=12 --AOInit1=4 --AOInit2=4 --AOInit3=4 --AOInit4=4 --ASInit1=4 --ASInit2=4 --ASInit3=4 --ASInit4=4 --gameConfig=6 

For N(10,2):

python main.py --demandDistribution=1 --demandMu=10  --demandSigma=2 --actionUp=5  --actionLow=-5 --ch1=1 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0 --cp3=0 --cp4=0 --f1=48.0 --f2=43.0 --f3=41.0 --f4=30.0 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --ILInit1=10 --ILInit2=10 --ILInit3=10 --ILInit4=10 --AOInit1=10 --AOInit2=10 --AOInit3=10 --AOInit4=10 --ASInit1=10 --ASInit2=10 --ASInit3=10 --ASInit4=10 --gameConfig=6

For C(4,8):

python main.py --demandDistribution=2 --actionUp=8  --actionLow=-8 --ch1=0.5 --ch2=0.5 --ch3=0.5 --ch4=0.5 --cp1=1.0 --cp2=1.0 --cp3=1.0 --cp4=1.0 --demandUp=9 --f1=32.0 --f2=32.0 --f3=32.0 --f4=24.0 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --ILInit1=12 --ILInit2=12 --ILInit3=12 --ILInit4=12 --AOInit1=4 --AOInit2=4 --AOInit3=4 --AOInit4=4 --ASInit1=4 --ASInit2=4 --ASInit3=4 --ASInit4=4 --gameConfig=6
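Because these commands repeat the same flag for each of the four agents, it can be convenient to generate them. The sketch below rebuilds the U[0,8] command from a compact description. The helpers per_agent and build_command are hypothetical and not part of the repository; the flag names and values are copied from the command above.

```python
import shlex


def per_agent(prefix, values):
    """Expand ('ch', [0.5] * 4) into ['--ch1=0.5', ..., '--ch4=0.5']."""
    return ["--{}{}={}".format(prefix, i, v) for i, v in enumerate(values, start=1)]


def build_command(scalars, per_agent_values):
    """Build the argument list for main.py (hypothetical helper)."""
    args = ["python", "main.py"]
    args += ["--{}={}".format(k, v) for k, v in scalars.items()]
    for prefix, values in per_agent_values.items():
        args += per_agent(prefix, values)
    return args


u08 = build_command(
    {"demandDistribution": 0, "demandUp": 9, "actionUp": 8,
     "actionLow": -8, "gameConfig": 6},
    {"ch": [0.5] * 4, "cp": [1.0] * 4, "f": [19.0, 20.0, 20.0, 14.0],
     "leadRecItem": [2, 2, 2, 2], "leadRecOrder": [2, 2, 2, 1],
     "ILInit": [12] * 4, "AOInit": [4] * 4, "ASInit": [4] * 4},
)
print(" ".join(shlex.quote(a) for a in u08))
```

The resulting list can also be passed directly to subprocess.run, which avoids shell quoting issues.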

Train the basket dataset

For the basket dataset you need to set config.demandDistribution=3, and config.data_id can then be 6, 13, or 22. To train with the scaled dataset, which is the one reported in the paper, config.scaled=True is required too. See the following commands for the three cases:

python main.py --demandDistribution=3 --data_id=6 --demandMu=3 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=19.0 --f2=12.0 --f3=12.0 --f4=8.0 --ILInit1=3 --ILInit2=3 --ILInit3=3 --ILInit4=3 --AOInit1=3 --AOInit2=3 --AOInit3=3 --AOInit4=3 --ASInit1=3 --ASInit2=3 --ASInit3=3 --ASInit4=3

python main.py --demandDistribution=3 --data_id=13 --demandMu=3  --demandSigma=2  --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=19.0 --f2=13.0 --f3=11.0 --f4=8.0 --ILInit1=3  --ILInit2=3  --ILInit3=3  --ILInit4=3  --AOInit1=3  --AOInit2=3  --AOInit3=3  --AOInit4=3  --ASInit1=3  --ASInit2=3  --ASInit3=3  --ASInit4=3 

python main.py --demandDistribution=3 --data_id=22 --demandMu=2 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=14.0 --f2=9.0 --f3=9.0 --f4=5.0 --ILInit1=2 --ILInit2=2 --ILInit3=2 --ILInit4=2 --AOInit1=2 --AOInit2=2 --AOInit3=2 --AOInit4=2 --ASInit1=2 --ASInit2=2 --ASInit3=2 --ASInit4=2

Train the forecasting dataset

For the forecasting dataset you need to set config.demandDistribution=4, and config.data_id can then be 5, 34, or 46. To train with the scaled dataset, which is the one reported in the paper, config.scaled=True is required too. See the following commands for the three cases:

python main.py --demandDistribution=4 --data_id=5 --demandMu=4 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=21.0 --f2=16.0 --f3=16.0 --f4=11.0 --ILInit1=4  --ILInit2=4  --ILInit3=4  --ILInit4=4  --AOInit1=4  --AOInit2=4  --AOInit3=4  --AOInit4=4  --ASInit1=4  --ASInit2=4  --ASInit3=4  --ASInit4=4 

python main.py --demandDistribution=4 --data_id=34 --demandMu=4 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=18.0 --f2=15.0 --f3=14.0 --f4=10.0 --ILInit1=4  --ILInit2=4  --ILInit3=4  --ILInit4=4  --AOInit1=4  --AOInit2=4  --AOInit3=4  --AOInit4=4  --ASInit1=4  --ASInit2=4  --ASInit3=4  --ASInit4=4 

python main.py --demandDistribution=4 --data_id=46 --demandMu=4 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=21.0 --f2=16.0 --f3=18.0 --f4=12.0 --ILInit1=4  --ILInit2=4  --ILInit3=4  --ILInit4=4  --AOInit1=4  --AOInit2=4  --AOInit3=4  --AOInit4=4  --ASInit1=4  --ASInit2=4  --ASInit3=4  --ASInit4=4 

Use Transfer Learning

We have provided the trained models of the basic case, which are used in the transfer learning section. The saved models are available in pre_model\uniform\0-3\brainX, in which X is in {3, 4, 5, 6}. The value of X follows the same pattern as config.gameConfig. To train a new model starting from one of these trained models, you need to set config.tlBaseBrain, which determines which trained model is used as the base model. For example:

python main.py --gameConfig=3  --iftl=True --ifUsePreviousModel=True  --tlBaseBrain=3 --baseDemandDistribution=0

In addition, if you trained a model with another demand distribution, e.g., N(10,2), you need to move the saved models into pre_model\normal\10-2\brainX and then set config.baseDemandDistribution=1 for the new training. config.baseDemandDistribution follows the same pattern as config.demandDistribution.
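The two directory names above suggest the layout pre_model, then the distribution name, then the two distribution parameters joined by a hyphen, then brainX. The sketch below builds such a path under that assumption. The helper pre_model_dir and the meaning of its two parameters are inferred from the two examples only and are not taken from the repository's code.

```python
import os

# Distribution codes as used by demandDistribution in the commands above.
DISTRIBUTION_DIRS = {0: "uniform", 1: "normal"}


def pre_model_dir(base_demand_distribution, param_a, param_b, brain):
    """Build the directory of a saved base model (hypothetical helper).

    param_a and param_b are the two numbers in the folder name,
    e.g. 0 and 3 for the basic uniform case or 10 and 2 for N(10,2).
    """
    name = DISTRIBUTION_DIRS[base_demand_distribution]
    folder = "{}-{}".format(param_a, param_b)
    return os.path.join("pre_model", name, folder, "brain{}".format(brain))


print(pre_model_dir(0, 0, 3, 3))
print(pre_model_dir(1, 10, 2, 4))
```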

Other utilities

If you set config.ifSaveFigure=True, the code saves the trajectories of inventory level, reward, action, open order, and order-up-to level for each agent in an episode. config.saveFigIntLow and config.saveFigIntUp determine the range of episodes for which the figures are saved.

Setting config.ifsaveHistInterval=True activates saving the trajectories of the received order, received shipment, inventory level, reward, action, open order, and order-up-to level for each agent in an episode. With this argument, you also need to set config.saveHistInterval, the number of episodes between two consecutive saves of the history.
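One plausible reading of these options is sketched below: figures are saved for episodes inside a range, and the history is saved every fixed number of episodes. The inclusive bounds and the modulo test are assumptions of this sketch, and both function names are hypothetical; check the repository code for the exact behavior.

```python
def should_save_figure(episode, save_fig_int_low, save_fig_int_up):
    """True when the episode lies in the figure range (bounds assumed inclusive)."""
    return save_fig_int_low <= episode <= save_fig_int_up


def should_save_history(episode, save_hist_interval):
    """True every save_hist_interval episodes (modulo test is an assumption)."""
    return episode % save_hist_interval == 0


episodes = range(1, 11)
print([e for e in episodes if should_save_figure(e, 3, 5)])  # [3, 4, 5]
print([e for e in episodes if should_save_history(e, 4)])    # [4, 8]
```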

Paper citation

If you used this code for your experiments or found it helpful, consider citing the following paper:

@article{oroojlooyjadid2017deep,
  title={A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization},
  author={Oroojlooyjadid, Afshin and Nazari, MohammadReza and Snyder, Lawrence and Tak{\'a}{\v{c}}, Martin},
  journal={MSOM},
  year={2020}
}
