declanoller / rwg_benchmarking

Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing

License: MIT License
There are several slight variations that would make sense to test. For example, for discrete action spaces I currently just take an argmax across the outputs; sampling from a softmax might be more effective for some envs.
Similarly, the network currently has a nonlinearity, but that may not be necessary for some envs (see the winning agents for LunarLander-v2 and CartPole-v0 here: https://www.declanoller.com/2019/01/25/beating-openai-games-with-neuroevolution-agents-pretty-neat/ ; they are completely linear).
More broadly: restructure the code so that many such variations can be tested for each env.
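To make the argmax-vs-softmax variation concrete, here is a minimal sketch of the two action-selection strategies for a discrete action space (function names are my own, not from the repo):

```python
import numpy as np

def argmax_action(outputs):
    # Deterministic: pick the output unit with the largest activation.
    return int(np.argmax(outputs))

def softmax_action(outputs, rng):
    # Stochastic: sample an action with probability proportional to exp(output).
    z = outputs - np.max(outputs)            # shift for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))
    return int(rng.choice(len(outputs), p=probs))

outputs = np.array([0.2, 1.5, -0.3])
rng = np.random.default_rng(0)
print(argmax_action(outputs))        # -> 1
print(softmax_action(outputs, rng))  # some action in {0, 1, 2}
```

The softmax version can help in envs where a deterministic policy gets stuck in a repetitive loop, at the cost of added score variance.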
Because of randomness, a single solve time / average score is not very informative. For benchmarking, it would be better to run an ensemble of agents and form a distribution of their scores. Even 10 of them would give a sense of the spread.
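A minimal sketch of summarizing an ensemble instead of reporting a single number (the scores here are made-up illustrative values, not real results):

```python
import statistics

def summarize_ensemble(scores):
    # Given per-agent mean scores from an ensemble of runs, report the
    # distribution's spread instead of a single solve time / average score.
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }

# Hypothetical mean scores from 10 independent runs on one env:
scores = [180.0, 195.5, 160.2, 200.1, 175.4, 188.8, 170.0, 199.3, 182.7, 165.9]
print(summarize_ensemble(scores))
```

Reporting mean plus spread (or a full histogram) makes comparisons between envs and between variations much more trustworthy.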
I'm currently testing whether an agent has reached "benchmark level" as follows: every agent produced is tested for N (usually 3) episodes, and the mean score is taken. If that mean is better than the best mean found so far (from previous agents), the agent is tested for 100 episodes (typically) to see if it reaches the "benchmark 100-episode" score.
However, this slows things down too much: if the 3-episode average is the best found so far but still far below the benchmark score, there's little point in running the long evaluation. A better rule would be: only run the 100-episode test if the agent's 3-episode mean is >= 80% of the benchmark score. That threshold can be fine-tuned, but it should speed things up.
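The proposed gating rule is simple enough to state as a predicate; this is a sketch of the idea described above (the function name and the 80% default are my own framing):

```python
def should_run_full_eval(short_mean, best_so_far, benchmark, frac=0.8):
    # Only pay for the expensive 100-episode evaluation when the cheap
    # 3-episode mean is both a new best AND within frac of the benchmark.
    return short_mean > best_so_far and short_mean >= frac * benchmark

# CartPole-v0's benchmark score is 195:
print(should_run_full_eval(190, best_so_far=150, benchmark=195))  # -> True
print(should_run_full_eval(100, best_so_far=90, benchmark=195))   # -> False
```

Dropping the "new best" condition is another option worth testing: any agent near the benchmark might deserve a full evaluation, even if a slightly better 3-episode mean was already seen.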
Many envs have randomness in their initial conditions that affects the outcomes. To be more systematic, it sometimes makes sense to specify a random seed so that runs at different times can be compared.
Gathering enough statistics (trials) should hopefully wash this out, but it should be done anyway.
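A minimal seeding sketch, assuming the old gym API where `env.seed(seed)` sets the env's RNG (newer gym/gymnasium versions pass the seed to `env.reset(seed=...)` instead); the helper name is my own:

```python
import random

import numpy as np

def seed_everything(seed, env=None):
    # Fix all RNG sources so runs at different times are comparable.
    random.seed(seed)
    np.random.seed(seed)
    if env is not None:
        # Old gym API; in newer gym/gymnasium use env.reset(seed=seed) instead.
        env.seed(seed)

seed_everything(123)
a = np.random.rand()
seed_everything(123)
b = np.random.rand()
print(a == b)  # -> True: identical seeds give identical draws
```

Seeding the agent's weight-sampling RNG separately from the env's RNG is also worth considering, so the two sources of variance can be studied independently.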
It would be good to add a recording of an episode with the best weights found, for each env.
However, last I checked, gym has a very annoying bug with recording multiple episodes, or with having multiple envs monitored at once. Solve this one. It may need to be done in a hacky way, after the main optimization runs.
Additionally, maybe add "grid"-style envs like the ones here: https://www.declanoller.com/2019/01/25/beating-openai-games-with-neuroevolution-agents-pretty-neat/