A Python implementation of Q-Learning, a well-known off-policy reinforcement learning algorithm, applied to pricing financial derivatives. Rather than working in the academic continuous-time limit, we discretize the classic Black-Scholes model and recast the pricing problem as a risk-neutral Markov Decision Process (MDP), in which the option price is an optimal Q-function. This makes it possible to price and hedge an option even when the stock dynamics, such as the volatility, are unknown: the Q-Learning algorithm makes no assumptions about the structure of the data it is given, and is guaranteed to converge asymptotically to the optimal solution given enough data and time. This suggests that a reinforcement learning approach could provide more accurate pricing of financial derivatives in a practical, real-world setting.
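To illustrate the discretized dynamics underlying the MDP, here is a minimal sketch of simulating risk-neutral Black-Scholes (geometric Brownian motion) paths on a discrete time grid. This is not code from the repository; the function name and parameter defaults are illustrative only.

```python
import numpy as np

def simulate_gbm_paths(S0=100.0, r=0.05, sigma=0.2, T=1.0,
                       n_steps=24, n_paths=10000, seed=0):
    """Simulate risk-neutral GBM paths on a discrete time grid.

    Uses the exact discretization of the GBM SDE under the
    risk-neutral measure: log-increments are normal with mean
    (r - sigma^2/2) * dt and variance sigma^2 * dt.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.cumsum(increments, axis=1)
    # Prepend the (log) starting point so column 0 holds S0 on every path
    paths = S0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))
    return paths  # shape (n_paths, n_steps + 1)

paths = simulate_gbm_paths()
print(paths.shape)  # (10000, 25)
```

A simulated set of paths like this is the raw "data" the Q-Learner consumes; under the risk-neutral measure the average terminal price should be close to S0 * exp(r * T).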
thesisQLBS.py contains the testing environment, and the helper file QLBSHelper.py contains the Q-Learning algorithm itself. I implemented my own version of the Fitted Q-Iteration algorithm and applied it to the backward MDP formulation of the Black-Scholes-Merton (BSM) process.
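The backward recursion on the MDP can be sketched as follows. This is a simplified illustration, not the code in QLBSHelper.py: it drops the hedge-action dimension of the full QLBS Q-function and only propagates the discounted option value backward in time by least-squares regression on a polynomial basis, which amounts to pricing a European option on simulated paths. The function name, basis choice, and parameters are all illustrative assumptions.

```python
import numpy as np

def price_european_put(paths, K=100.0, r=0.05, T=1.0, degree=3):
    """Backward fitted-value recursion on simulated stock paths.

    paths: array of shape (n_paths, n_steps + 1), column 0 = spot at t=0.
    """
    n_paths, n_cols = paths.shape
    n_steps = n_cols - 1
    dt = T / n_steps
    disc = np.exp(-r * dt)

    # Terminal condition: the option value at maturity is the payoff
    value = np.maximum(K - paths[:, -1], 0.0)

    # Backward induction: at each step, regress the discounted next-step
    # value on a polynomial basis in the current stock price
    for t in range(n_steps - 1, 0, -1):
        X = np.vander(paths[:, t], degree + 1)  # [S^3, S^2, S, 1] for degree 3
        coef, *_ = np.linalg.lstsq(X, disc * value, rcond=None)
        value = X @ coef

    # At t=0 every path starts from the same state, so the regression
    # degenerates; just discount one more step and average
    return disc * np.mean(value)
```

Fed with risk-neutral GBM paths, the result should approximate the Black-Scholes put price (about 5.57 for S0 = K = 100, r = 0.05, sigma = 0.2, T = 1). The full QLBS learner additionally regresses over hedge positions at each step, so its Q-function yields both the price and the optimal hedge.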
russellkim98 / qlearner_blackscholes
License: GNU Lesser General Public License v3.0