Comments (5)
I think that's a very interesting kind of environment.
David Duvenaud has done interesting work in this area. His paper Gradient-based Hyperparameter Optimization with Reversible Learning [arXiv, slides] describes one approach.
Check out this graph from the above paper, plotting loss vs the learning rate hyperparameter:
I think that sort of curve is common. If you start on the left, a simple convex optimizer might find its way down to the minimum. But if you start on the right, chaos. That's where prior knowledge is helpful, and where your idea of transfer learning from other domains might be a big win.
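That curve shape is easy to reproduce even in one dimension. As a toy sketch (a quadratic loss standing in for a real network; nothing here comes from the paper), small learning rates converge while rates past a stability threshold blow up:

```python
def final_loss(lr, steps=100):
    """Toy stand-in for "train a network with this learning rate":
    gradient descent on f(x) = x^2 starting from x = 1.0."""
    x = 1.0
    for _ in range(steps):
        x -= lr * 2.0 * x        # gradient of x^2 is 2x
        if abs(x) > 1e6:         # diverged -- the "chaos" regime
            return float("inf")
    return x * x

# Sweeping lr traces out the curve: flat and low on the left,
# blowing up past the stability threshold (here lr = 1.0).
losses = {lr: final_loss(lr) for lr in (0.01, 0.1, 0.5, 1.1, 2.0)}
```

Starting "on the left" every step shrinks the loss; starting "on the right" every step overshoots.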
One issue is that if the steps are very slow, it may be frustrating to test. So it'd be good to have some nontrivial networks that can converge in a fraction of a second for agent development purposes.
Another thought: the use case is different for model training in production vs research. In production, you care about improving convergence time and minimizing overfitting. In research many attempts don't converge at all, and you don't know if the model is wrong, the problem is unsolvable, or your hyperparameters need a slight tweak. I assume you're mainly thinking about the production case, but the research case would also be really interesting.
Anyway, we'd be delighted if you would contribute some hyperparameter choosing environments. I'm happy to answer questions about integrating into gym.
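To make the integration concrete, here is a minimal sketch of what such an environment's interface could look like. It only mirrors the gym `reset`/`step` contract; the class name is invented, and the reward is a made-up stand-in (distance of a proposed log-learning-rate from a hidden optimum) instead of actually training a model:

```python
import random

class HyperparamTuneEnvSketch:
    """Hypothetical hyperparameter-choosing environment following the
    gym reset/step convention.  A real version would train a model in
    step(); here the reward simply peaks at a hidden optimal setting."""

    def __init__(self, max_trials=10):
        self.max_trials = max_trials

    def reset(self):
        # Each episode hides a different optimal log10 learning rate.
        self.best_log_lr = random.uniform(-5.0, -1.0)
        self.trials = 0
        return 0.0  # first observation: no feedback yet

    def step(self, action):
        # action: the agent's proposed log10 learning rate.
        self.trials += 1
        # Stand-in for "train and evaluate with this setting".
        reward = max(0.0, 1.0 - abs(action - self.best_log_lr))
        done = self.trials >= self.max_trials
        return reward, reward, done, {}  # obs, reward, done, info
```

An agent would then run the usual loop: `obs = env.reset()`, followed by repeated `obs, r, done, info = env.step(a)` until `done`.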
from gym.
It's an interesting idea that's worth exploring. Several recommendations:
- It would be much more interesting if the datasets were not random; for example, using Penn Treebank, MNIST, CIFAR-10, and a few others like them would be much better.
- At the beginning of each episode, the first observation should represent the dataset, the architecture, and a depiction of all the different hyperparameters.
- It would be good to set things up so that each step would take no more than 10-20 minutes, at least for the early versions.
If all three points are done, it could become an interesting environment that would easily defeat pretty much all RL algorithms.
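The second point (an initial observation describing the task) could be encoded, for instance, as a flat feature vector. This is purely illustrative: the dataset statistics, architecture list, and hyperparameter names below are invented for the sketch, not part of any existing environment:

```python
import math

ARCHS = ["linear_svm", "mlp", "conv_net"]                       # illustrative
HPARAMS = ["learning_rate", "l2", "batch_size", "hidden_units"]  # illustrative

def encode_task(n_samples, n_features, n_classes, arch, tunable):
    """First observation of an episode: log-scaled dataset statistics,
    a one-hot architecture id, and a mask of tunable hyperparameters."""
    stats = [math.log10(n_samples), math.log10(n_features), float(n_classes)]
    arch_onehot = [1.0 if a == arch else 0.0 for a in ARCHS]
    mask = [1.0 if h in tunable else 0.0 for h in HPARAMS]
    return stats + arch_onehot + mask
```

For example, `encode_task(60000, 784, 10, "mlp", {"learning_rate", "hidden_units"})` yields a 10-dimensional vector the agent can condition on.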
Thank you everyone for your informative responses! I will convert my existing code into an environment and test it. Once I have something interesting, I will let you know.
@tlbtlbtlb:
Thanks for the fascinating reference! Indeed, maybe some years from now the training of deep nets will be done in a completely new way to accomplish better hyperparameter selection.
I agree that training steps should not take too much time. Therefore, for the first environments I think it makes sense to use small datasets (e.g. from the UCI ML repository) and "classical" ML models from sklearn (which usually do not take much time to train).
The different use cases of industry and academia are definitely something to consider. I think the objective of the environment should be engineered accordingly: for one environment, the performance measure would be proportional to the speed of convergence, the complexity of the model, and the achieved validation accuracy; for the other, the best achieved performance and the fraction of successfully converged trials. Also, for academic applications, control over the model should be more fine-grained.
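Those two objectives could be sketched as separate reward functions. The weights below are placeholders, not tuned values, and the signatures are invented for illustration:

```python
def production_reward(val_acc, train_seconds, n_params,
                      time_weight=0.01, size_weight=1e-7):
    """Production-style objective: reward validation accuracy,
    penalize slow training and large models."""
    return val_acc - time_weight * train_seconds - size_weight * n_params

def research_reward(best_val_acc, n_trials, n_converged):
    """Research-style objective: best result found, plus the
    fraction of trials that converged at all."""
    return best_val_acc + n_converged / max(n_trials, 1)
```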
- Indeed, in the end I think it only makes sense to use real datasets, as otherwise the results might not transfer into practice. Artificial data makes sense only for a "Hello World" environment of this kind, to serve as an example for more complicated ones.
- Right. For the initial environments, however, I will keep the set of optimized model parameters fixed per environment, to keep things simple.
- The most expensive step in my current code (a linear SVM on a modified MNIST dataset) usually takes less than 1 min. It might make sense to use a subset of MNIST to allow some more complicated models, or to use small datasets from the UCI ML repository.
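As a rough illustration of those timings, here is a sketch using sklearn's small bundled digits dataset as a stand-in for a MNIST subset (the subset size is arbitrary):

```python
import time
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)   # 1797 8x8 images, MNIST-like
X_small, y_small = X[:500], y[:500]   # subsetting keeps each step fast

start = time.time()
clf = LinearSVC(dual=False).fit(X_small, y_small)
elapsed = time.time() - start         # typically well under a second
```

At this scale, even a full hyperparameter sweep stays in the seconds range, which matters for agent development.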
- +1
- OK
- A linear SVM is a good starting point, but to make it interesting, even a neural net with 30 hidden units would most likely be a lot better than a linear SVM.
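A quick comparison on a synthetic nonlinear problem (sklearn's two-moons data, chosen here only because a linear model cannot separate it) illustrates the gap a 30-unit net can open up:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm_acc = LinearSVC(dual=False).fit(X_tr, y_tr).score(X_te, y_te)
mlp = MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000, random_state=0)
mlp_acc = mlp.fit(X_tr, y_tr).score(X_te, y_te)
# The 30-unit net fits the curved decision boundary; the linear SVM cannot.
```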
I think this issue can be closed now.
As for the environments that I committed, I think the next step would be to make versions with different objectives, to represent different use cases as discussed with tlbtlbtlb.