This repo is a template for building a Reinforcement Learning (RL) system. The system is meant to be general-purpose, with multiple possible applications.
Add the base classes and tests needed to enforce structure on any subsequent code
- Make generic base class for actor (policy) network
- Make generic base class for critic (value) network
- Make generic base class for reward function
- Finish designing the generic lactchain base class. Yes, it is state-->action-->state, but what counts as an action? Does an action take in a fluid prompt? A choice from a prompt menu? Something else?
- Write unit tests
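
One possible shape for the base classes listed above is sketched here, using Python's standard `abc` module. All names (`Actor`, `Critic`, `RewardFunction`, `act`, `value`, and the toy subclasses) are hypothetical placeholders, not existing code in this repo, and states and actions are typed as `Any` because their format is still undecided.

```python
from abc import ABC, abstractmethod
from typing import Any


class Actor(ABC):
    """Policy: maps a state to an action (hypothetical interface)."""

    @abstractmethod
    def act(self, state: Any) -> Any:
        ...


class Critic(ABC):
    """Value function: maps a state to a scalar value estimate (hypothetical interface)."""

    @abstractmethod
    def value(self, state: Any) -> float:
        ...


class RewardFunction(ABC):
    """Scores a single state --> action --> state transition (hypothetical interface)."""

    @abstractmethod
    def __call__(self, state: Any, action: Any, next_state: Any) -> float:
        ...


# Toy subclasses, only to show how the structure is enforced.
class EchoActor(Actor):
    def act(self, state: Any) -> Any:
        # Placeholder policy: return the state unchanged as the "action".
        return state


class ZeroCritic(Critic):
    def value(self, state: Any) -> float:
        # Placeholder critic: every state is worth 0.
        return 0.0


class ChangeReward(RewardFunction):
    def __call__(self, state: Any, action: Any, next_state: Any) -> float:
        # Placeholder reward: 1 if the state changed, else 0.
        return 1.0 if next_state != state else 0.0
```

Because the methods are marked `@abstractmethod`, instantiating `Actor()` directly, or a subclass that forgets to implement `act`, raises `TypeError`. That gives the unit tests something concrete to check.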
Build out specific use cases
- Draw schematic of simple use case
- Add plausibly useful language action chains using lactchain class
- Add code extractor and other functions in state class
- Add other extractors to lactchains if you need to pull certain things (like code) from GPT-4 responses
- Define example format for textblock in state class
- Define Policy and Value Function networks
- Define Actor-Critic teaching moments, i.e. the update rule (temporal-difference (TD) learning?)
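
For the last item, a minimal sketch of a one-step temporal-difference (TD(0)) update for the critic is given below, assuming a tabular value function stored in a dict. The function names, the dict representation, and the default `alpha` and `gamma` values are illustrative assumptions, not decisions made in this repo. A network-based critic would replace the dict update with a gradient step on the same error.

```python
from typing import Dict, Hashable


def td_error(reward: float, value: float, next_value: float,
             gamma: float = 0.99, done: bool = False) -> float:
    """One-step TD error: delta = r + gamma * V(s') - V(s).

    If the episode ended at s', the bootstrap term is dropped.
    """
    target = reward + (0.0 if done else gamma * next_value)
    return target - value


def td0_update(values: Dict[Hashable, float], state: Hashable, reward: float,
               next_state: Hashable, alpha: float = 0.1, gamma: float = 0.99,
               done: bool = False) -> float:
    """Move V(state) toward the TD target in place and return the TD error.

    Unseen states are assumed to have value 0.0 (an arbitrary choice).
    """
    v = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    delta = td_error(reward, v, v_next, gamma, done)
    values[state] = v + alpha * delta
    return delta


# Usage: one transition s0 --> s1 with reward 1.0.
values: Dict[Hashable, float] = {}
delta = td0_update(values, "s0", reward=1.0, next_state="s1", alpha=0.5, gamma=0.9)
```

In an actor-critic setup the same `delta` is commonly reused as the signal that scales the actor's policy update, so the critic and actor learn from a single quantity per transition.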