declanoller / rwg_benchmarking

Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing

License: MIT License
There are several slight variations that would make sense to test. For example, for discrete action spaces I currently just take an argmax across the outputs; sampling from a softmax might be more effective for some envs.
Similarly, the network currently has a nonlinearity, but that may not be necessary for some envs (see the winning agents for LunarLander-v2 and CartPole-v0 here: https://www.declanoller.com/2019/01/25/beating-openai-games-with-neuroevolution-agents-pretty-neat/ ; they are completely linear).
More broadly: restructure the code so that many such variations can be tested for each env.
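To make the argmax-vs-softmax variation concrete, here is a minimal sketch of the two action-selection strategies for a discrete action space (function names are my own, not from the repo):

```python
import numpy as np

def argmax_action(outputs):
    # Deterministic: pick the output unit with the largest activation.
    return int(np.argmax(outputs))

def softmax_action(outputs, rng):
    # Stochastic: sample an action with probability proportional to exp(output).
    z = outputs - np.max(outputs)            # shift for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))
    return int(rng.choice(len(outputs), p=probs))

outputs = np.array([0.2, 1.5, -0.3])
rng = np.random.default_rng(0)
print(argmax_action(outputs))        # -> 1
print(softmax_action(outputs, rng))  # some action in {0, 1, 2}
```

The softmax version can help in envs where a deterministic policy gets stuck in a repetitive loop, at the cost of added score variance.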
Because of randomness, a single solve time / average score is not very informative. For benchmarking, it would be better to run an ensemble of agents and form a distribution of their scores. Even 10 of them would give a sense of the spread.
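A minimal sketch of summarizing an ensemble instead of reporting a single number (the scores here are made-up illustrative values, not real results):

```python
import statistics

def summarize_ensemble(scores):
    # Given per-agent mean scores from an ensemble of runs, report the
    # distribution's spread instead of a single solve time / average score.
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }

# Hypothetical mean scores from 10 independent runs on one env:
scores = [180.0, 195.5, 160.2, 200.1, 175.4, 188.8, 170.0, 199.3, 182.7, 165.9]
print(summarize_ensemble(scores))
```

Reporting mean plus spread (or a full histogram) makes comparisons between envs and between variations much more trustworthy.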
I'm currently testing whether an agent has reached "benchmark level" as follows: every agent produced is tested for N (usually 3) episodes, and the mean score is taken. If that mean is better than the best mean found so far (from previous agents), the agent is tested for 100 episodes (typically) to see if it reaches the "benchmark 100-episode" score.
However, this slows things down too much: if the 3-episode average is the best found so far but still far below the benchmark score, there's little point in running the long evaluation. A better rule would be: only run the 100-episode test if the agent's 3-episode mean is >= 80% of the benchmark score. That threshold can be fine-tuned, but it should speed things up.
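The proposed gating rule is simple enough to state as a predicate; this is a sketch of the idea described above (the function name and the 80% default are my own framing):

```python
def should_run_full_eval(short_mean, best_so_far, benchmark, frac=0.8):
    # Only pay for the expensive 100-episode evaluation when the cheap
    # 3-episode mean is both a new best AND within frac of the benchmark.
    return short_mean > best_so_far and short_mean >= frac * benchmark

# CartPole-v0's benchmark score is 195:
print(should_run_full_eval(190, best_so_far=150, benchmark=195))  # -> True
print(should_run_full_eval(100, best_so_far=90, benchmark=195))   # -> False
```

Dropping the "new best" condition is another option worth testing: any agent near the benchmark might deserve a full evaluation, even if a slightly better 3-episode mean was already seen.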
Many envs have randomness in their initial conditions that affects the outcomes. To be more systematic, it sometimes makes sense to specify a random seed so that runs at different times can be compared.
Gathering enough statistics (trials) should hopefully wash this out, but it should be done anyway.
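A minimal seeding sketch, assuming the old gym API where `env.seed(seed)` sets the env's RNG (newer gym/gymnasium versions pass the seed to `env.reset(seed=...)` instead); the helper name is my own:

```python
import random

import numpy as np

def seed_everything(seed, env=None):
    # Fix all RNG sources so runs at different times are comparable.
    random.seed(seed)
    np.random.seed(seed)
    if env is not None:
        # Old gym API; in newer gym/gymnasium use env.reset(seed=seed) instead.
        env.seed(seed)

seed_everything(123)
a = np.random.rand()
seed_everything(123)
b = np.random.rand()
print(a == b)  # -> True: identical seeds give identical draws
```

Seeding the agent's weight-sampling RNG separately from the env's RNG is also worth considering, so the two sources of variance can be studied independently.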
It would be good to add a recording of an episode with the best weights found, for each env.
However, last I checked, gym has a very annoying bug with recording multiple episodes, or with having multiple envs monitored at once. Solve this one. It may need to be done in a hacky way, after the main optimization runs.
Additionally, maybe add "grid"-style envs like the ones here: https://www.declanoller.com/2019/01/25/beating-openai-games-with-neuroevolution-agents-pretty-neat/