Learn & Apply reinforcement learning techniques on complex continuous control domain to achieve maximum rewards. In the continuous control domain, where actions are continuous and often high-dimensional such as OpenAI-Gym environment Humanoid-V2. The Humanoid environment has 377 Observation dimensions and 17 action dimensions. This problem requires temporal difference learning compared to supervised learning since it has so many moving parts that are hard to debug, and they require substantial efforts in tuning in order to get good results. Also, in supervised learning problems, progress has been driven by large labeled datasets like ImageNet. In Reinforcement Learning, the closest equivalent would be a large and diverse collection of environments.
raphaelsc19 / msds696 Goto Github PK
View Code? Open in Web Editor NEWThis project forked from rajkumargithub/msds696
Humanoid V2