metalearncuriosity's Issues
Change templates
Problem Description:
Change template style
Proposed Solution:
Edit the template files.
Alternative Solutions (if any):
Additional Context:
Update RNN code
Problem Description
The flax `RNNCellBase` API has undergone some key updates. We need to update our code to match these changes.
Proposed Solution
Follow the tutorial here.
Alternative Solutions (if any)
Downgrade to an older version of flax.
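For reference, a minimal sketch of the updated recurrent API, assuming a GRU cell and a recent flax version (the exact version cut-over is an assumption): the cell now takes `features` at construction time, and `initialize_carry` takes `(rng, input_shape)` rather than the old `(rng, batch_dims, size)`.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

# New API: features at construction, initialize_carry(rng, input_shape).
cell = nn.GRUCell(features=64)
carry = cell.initialize_carry(jax.random.PRNGKey(0), (8, 32))  # (batch, input_dim)

x = jnp.ones((8, 32))
params = cell.init(jax.random.PRNGKey(1), carry, x)
new_carry, y = cell.apply(params, carry, x)  # y has shape (8, 64)
```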
RL loss for BYOL-explore is too high
Bug Description:
RL loss for BYOL-explore is too high.
Steps to Reproduce:
- Simply run the BYOL-explore file and monitor the RL loss.
Expected Behavior:
The loss should decrease as training progresses.
Actual Behavior:
The loss decreases but the magnitude of the loss is still way too high.
Environment:
Empty-misc
Feature Request: Add pmapped reward combiner
Problem Description
With TPU access, we need to adjust the reward-combiner training code to take full advantage of the TPUs.
Proposed Solution
Code up a reward-combiner script that uses `pmap` (see the sketch below).
Alternative Solutions (if any)
Additional Context
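A minimal sketch of what a pmapped update could look like, assuming a toy combiner of the form r = w_e * r_ext + w_i * r_int; the names and loss here are illustrative, not the repo's actual combiner.

```python
import jax
import jax.numpy as jnp

def train_step(params, r_ext, r_int, target):
    def loss_fn(p):
        combined = p["w_e"] * r_ext + p["w_i"] * r_int
        return jnp.mean((combined - target) ** 2)

    grads = jax.grad(loss_fn)(params)
    # Average gradients across TPU cores before the SGD step.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)

p_train_step = jax.pmap(train_step, axis_name="devices")

n = jax.local_device_count()
params = jax.device_put_replicated(
    {"w_e": jnp.float32(1.0), "w_i": jnp.float32(0.1)}, jax.local_devices()
)
r_ext, r_int, target = (jnp.ones((n, 128)) for _ in range(3))  # sharded on axis 0
params = p_train_step(params, r_ext, r_int, target)
```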
Add other metrics/hyperparameter versions of agents.
Problem Description
Add other metrics to view besides the average reward, such as the standard error, and add hyperparameter-sweep versions of the code.
Proposed Solution
Add a file so that we may view these metrics. Have a folder that contains files that sweep through the hyperparameters of the algorithms.
Alternative Solutions (if any)
Additional Context
Add FAST algorithm.
Problem Description
Add the FAST algorithm, which was meta-learned in the Meta-Learning Curiosity Algorithms paper.
Proposed Solution
Add a single file implementation of the FAST algorithm.
Additional Context
Mainly for the ICLR blog post, but it can also serve as one of the baselines for my master's.
Add delayed wrapper for Brax.
Problem Description
To test the exploration power of our RL algorithms, we will wrap Brax environments with delayed rewards. This idea is taken from here.
Proposed Solution
Add a function that is able to provide the reward at the specified step-interval.
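One possible shape for such a wrapper, assuming brax v2's `Wrapper` base class (the import path differs in brax v1); the class and info-field names are hypothetical. Rewards are accumulated and only released every `step_interval` steps or at episode end.

```python
import jax.numpy as jnp
from brax.envs.base import State, Wrapper  # assuming brax v2's wrapper base class

class DelayedRewardWrapper(Wrapper):
    def __init__(self, env, step_interval: int = 10):
        super().__init__(env)
        self.step_interval = step_interval

    def reset(self, rng) -> State:
        state = self.env.reset(rng)
        info = dict(state.info, acc_reward=jnp.zeros_like(state.reward),
                    steps=jnp.zeros_like(state.reward))
        return state.replace(info=info)

    def step(self, state: State, action) -> State:
        state = self.env.step(state, action)
        acc = state.info["acc_reward"] + state.reward
        steps = state.info["steps"] + 1.0
        # Release the accumulated reward at the interval or when the episode ends.
        release = jnp.logical_or(steps % self.step_interval == 0, state.done > 0)
        reward = jnp.where(release, acc, 0.0)
        info = dict(state.info, acc_reward=jnp.where(release, 0.0, acc), steps=steps)
        return state.replace(reward=reward, info=info)
```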
Add saver file or checkpoint saver and plotter file.
Problem Description
Add a way to save the `runner_state`, and hence the network params. This will help with visualisation as well. Also add a file that plots the W&B CSV files.
Proposed Solution
Provide a file that saves the `runner_state` and the config used. Save the file locally.
Alternative Solutions (if any)
Perhaps use Weights & Biases to save the models.
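A minimal sketch, assuming `runner_state` is a pytree (network params included) and `config` is a plain dict; `flax.training.checkpoints` is one option (orbax is the newer alternative), and the directory layout is hypothetical.

```python
import json
import os
from flax.training import checkpoints  # orbax is the newer alternative

def save_run(runner_state, config, ckpt_dir="checkpoints"):
    ckpt_dir = os.path.abspath(ckpt_dir)  # flax expects an absolute path
    os.makedirs(ckpt_dir, exist_ok=True)
    checkpoints.save_checkpoint(ckpt_dir, target=runner_state, step=0, overwrite=True)
    with open(os.path.join(ckpt_dir, "config.json"), "w") as f:
        json.dump(config, f)
```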
BYOL-Explore-toy-example does not learn with intrinsic rewards.
Bug Description:
As soon as we add the intrinsic rewards, BYOL-Explore does not learn. The learning curve crashes.
Steps to Reproduce:
- Simply run the experiments with BYOL-Explore and set `INT_LAMBDA` to a positive number.
Expected Behavior:
We see BYOL-Explore perform better with curiosity in the latter stages of training.
Actual Behavior:
We see BYOL-Explore perform considerably worse with curiosity.
Environment:
This was for the Empty-misc environment.
Have two value heads for BYOL-Explore.
Problem Description
Implement BYOL-explore with two value heads. This could help determine why the value loss is so high.
Proposed Solution
Add a second value head for BYOL-Explore, which means we also have a second advantage and target for the intrinsic rewards. This will help keep the intrinsic reward non-episodic and the extrinsic reward episodic.
Additional Context
Since we have one value head, if we make the intrinsic reward non-episodic we make the extrinsic reward non-episodic as well. This could cause issues, as we are leaking information about the task to the agent. See the original Random Network Distillation paper.
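A hedged sketch of what the two-headed network could look like; layer sizes are illustrative.

```python
import flax.linen as nn
import jax.numpy as jnp

class ActorCriticTwoHeads(nn.Module):
    action_dim: int

    @nn.compact
    def __call__(self, x):
        h = nn.relu(nn.Dense(64)(x))
        logits = nn.Dense(self.action_dim)(h)  # policy head
        v_ext = nn.Dense(1)(h)                 # episodic, extrinsic value head
        v_int = nn.Dense(1)(h)                 # non-episodic, intrinsic value head
        return logits, jnp.squeeze(v_ext, -1), jnp.squeeze(v_int, -1)
```

The two heads then give separate advantage estimates, which can be summed (e.g. A = A_ext + c * A_int) for the PPO update, as in the RND paper.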
TPU Branch
Problem Description
The current code base does not take advantage of `pmap`. This is needed since we are using TPUs.
Proposed Solution
Make the code base suited for TPUs.
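A sketch of the purejaxrl-style pattern adapted for TPUs, vmapping over seeds on each core and pmapping across cores; `make_train` here is a stand-in for the repo's real training factory.

```python
import jax

def make_train(config):
    def train(rng):
        # Placeholder for a full training run; returns a dict of results.
        return {"dummy_return": jax.random.uniform(rng)}
    return train

config = {"SEED": 42, "NUM_SEEDS": 8}
n_dev = jax.local_device_count()
rngs = jax.random.split(jax.random.PRNGKey(config["SEED"]), n_dev * config["NUM_SEEDS"])
rngs = rngs.reshape(n_dev, config["NUM_SEEDS"], -1)  # (devices, seeds, key)
train_fn = jax.pmap(jax.vmap(make_train(config)))
outs = train_fn(rngs)  # outs["dummy_return"].shape == (n_dev, NUM_SEEDS)
```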
Feature Request: Add gymnax and brax wrappers from purejaxrl
Feature Title:
Wrappers for brax and gymnax needed.
Problem Description:
The wrappers are needed so that we are able to train our agents on the environments.
Proposed Solution:
A file that contains the function wrappers.
Using Episode Horizon Instead of Training Horizon
Bug Description:
The temporal reward combiner is using the normalised episode time step. It should use the normalised training time step.
Expected Behavior:
Reward combiner should generalise to other environments.
Actual Behavior:
It struggles to generalise beyond the MiniGrid environments.
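A hedged sketch of the fix, normalising by the global update step rather than the episode step; the decay schedule itself is illustrative, not the repo's actual combiner.

```python
import jax.numpy as jnp

def combine_rewards(r_ext, r_int, update_step, num_updates, int_lambda=0.1):
    # t_norm is in [0, 1] over the whole training run, not a single episode.
    t_norm = jnp.clip(update_step / num_updates, 0.0, 1.0)
    return r_ext + (1.0 - t_norm) * int_lambda * r_int
```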
Feature Request: Add Discrete PPO agent
Feature Title:
Add PPO agent from purejaxrl
Problem Description:
The base PPO is necessary to ensure we are able to implement benchmarks.
Proposed Solution:
This should implement the PPO agent for discrete-action gymnax environments.
Feature Request: Add toy example for BYOL-Explore
Feature Title:
Add toy example for BYOL-Explore
Problem Description:
We need a file that implements BYOL-Explore.
Proposed Solution:
This file should contain a toy example of BYOL-Explore where we simply add the world model, the encoder, and the normalisation methods.
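For orientation, a hedged sketch of the loss at the core of BYOL-Explore: the world model predicts the EMA target encoder's embedding of the next observation, and the per-step loss is the squared L2 distance between the normalised embeddings. The epsilon and shapes are illustrative.

```python
import jax
import jax.numpy as jnp

def byol_loss(pred_emb, target_emb):
    pred = pred_emb / (jnp.linalg.norm(pred_emb, axis=-1, keepdims=True) + 1e-8)
    target = target_emb / (jnp.linalg.norm(target_emb, axis=-1, keepdims=True) + 1e-8)
    target = jax.lax.stop_gradient(target)  # gradients flow only into the predictor
    return jnp.sum((pred - target) ** 2, axis=-1)  # also used as the intrinsic reward
```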
Add Meta-learner for extrinsic and intrinsic reward combiner.
Problem Description
In the curiosity literature, it is not clear how one should combine the intrinsic and extrinsic rewards. The reward combiner has typically been handcrafted, and perhaps it can be meta-learned.
Proposed Solution
Add a meta-learner that tries to learn how to combine the intrinsic and extrinsic rewards.
It will be a single-file implementation.
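One possible shape for the learned combiner: a tiny MLP over the two rewards and a normalised time step. Purely a sketch; the inputs and architecture are open design choices, not settled.

```python
import flax.linen as nn
import jax.numpy as jnp

class RewardCombiner(nn.Module):
    @nn.compact
    def __call__(self, r_ext, r_int, t_norm):
        # Stack the combiner inputs along the last axis: (batch, 3).
        x = jnp.stack([r_ext, r_int, t_norm], axis=-1)
        h = nn.tanh(nn.Dense(32)(x))
        return jnp.squeeze(nn.Dense(1)(h), axis=-1)
```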
Add episodic intrinsic reward for BYOL-explore
Feature Title:
Add episodic intrinsic reward for BYOL-explore
Problem Description:
Intrinsic rewards should be episodic if desired by the user. Currently they are just non-episodic.
Proposed Solution:
Make the intrinsic reward episodic by adding another batch of `last_dones` so that we may multiply the intrinsic reward by `(1 - last_dones)`.
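The mechanics in one line, with illustrative shapes:

```python
import jax.numpy as jnp

r_int = jnp.ones((128, 4))        # (num_steps, num_envs)
last_dones = jnp.zeros((128, 4))  # 1.0 where the previous step ended an episode
# Zeroing the intrinsic reward at episode boundaries makes it episodic.
r_int_episodic = r_int * (1.0 - last_dones)
```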
Implement BYOL-Explore Properly
Problem Description
The last baseline to implement is BYOL-Explore.
Proposed Solution
Add a single file implementation of it.
Alternative Solutions (if any)
Additional Context
Add Reward Prioritisation for BYOL-explore
Feature Title:
Add Reward Prioritisation for BYOL-explore
Problem Description:
Add reward prioritisation for BYOL-Explore, as this could help stabilise training for lower values of `int_lambda`.
Proposed Solution:
Add the reward prioritisation function as described in the paper.
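A hedged sketch under one reading of the paper's reward prioritisation: clip normalised intrinsic rewards below a running mean to zero, so only the most novel transitions are rewarded. The EMA bookkeeping is illustrative and should be checked against the paper.

```python
import jax.numpy as jnp

def prioritise(r_int_norm, mean_ema, decay=0.99):
    # Track a running mean of the normalised intrinsic rewards.
    new_mean = decay * mean_ema + (1.0 - decay) * jnp.mean(r_int_norm)
    # Rewards below the running mean are clipped to zero.
    return jnp.maximum(r_int_norm - new_mean, 0.0), new_mean
```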
Add Random Agent Code
Problem Description
Add code that runs the random agent on the environment. This is to normalise the scores from the agents.
Proposed Solution
Add random agent code for each macro-environment.
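A minimal sketch of such a random agent on a gymnax environment; CartPole-v1 is a stand-in for the actual benchmark environments, and early termination is ignored for brevity.

```python
import jax
import jax.numpy as jnp
import gymnax

env, env_params = gymnax.make("CartPole-v1")  # stand-in environment

def random_episode(rng, max_steps=500):
    rng, rng_reset = jax.random.split(rng)
    obs, state = env.reset(rng_reset, env_params)

    def step(carry, _):
        rng, state, total = carry
        rng, rng_act, rng_step = jax.random.split(rng, 3)
        action = env.action_space(env_params).sample(rng_act)  # uniform random action
        obs, state, reward, done, _ = env.step(rng_step, state, action, env_params)
        return (rng, state, total + reward), None

    (_, _, total), _ = jax.lax.scan(step, (rng, state, jnp.float32(0.0)), None,
                                    length=max_steps)
    return total

# Average random-agent return over 30 seeds, used to normalise agent scores.
returns = jax.vmap(random_episode)(jax.random.split(jax.random.PRNGKey(0), 30))
```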
Cannot divide evenly, which means we cannot form batches
Bug Description:
When attempting to create a batch, we run into an issue whereby we cannot evenly divide shapes of sizes 128 and 512 when we do not multiply the intrinsic reward by `(1 - done)`.
Steps to Reproduce:
- Do not multiply the intrinsic reward by `(1 - done)` in BYOL-Explore.
- Then run the BYOL-Explore file.
Expected Behavior:
We should expect the file to execute and we should be able to run the experiments.
Actual Behavior:
We obtain this error:
jax._src.core.InconclusiveDimensionOperation: Cannot divide evenly the sizes of shapes (128,) and (512,)
Environment:
Empty-misc
Additional Information:
```python
config = {
    "SEED": 42,
    "NUM_SEEDS": 30,
    "LR": 2.5e-4,
    "NUM_ENVS": 4,
    "NUM_STEPS": 128,
    "TOTAL_TIMESTEPS": 5e5,
    "UPDATE_EPOCHS": 4,
    "NUM_MINIBATCHES": 4,
    "GAMMA": 0.99,
    "GAE_LAMBDA": 0.95,
    "CLIP_EPS": 0.2,
    "ENT_COEF": 0.01,
    "VF_COEF": 0.5,
    "MAX_GRAD_NORM": 0.5,
    "ACTIVATION": "tanh",
    "ENV_NAME": "Empty-misc",
    "ANNEAL_LR": True,
    "DEBUG": False,
    "EMA_PARAMETER": 0.99,
    "REW_NORM_PARAMETER": 0.99,
    "INT_LAMBDA": 0.1,
}
```
Update loggers
Problem Description
Currently we log only after training is completed. It would be ideal to log during training, and to be able to pause and resume training.
Proposed Solution
It is possible to log results during training using `jax.experimental.io_callback()`. For example, we can collect the metrics after every update step and pass them to a print function to log training.
Alternative Solutions (if any)
Additional Context
Please see this issue.
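A minimal sketch of the idea; the metrics dict and the print sink are illustrative (a W&B call could live in the same callback).

```python
import jax
import jax.numpy as jnp
from jax.experimental import io_callback

def log_metrics(metrics):
    # Host-side side effect; could also be wandb.log(metrics).
    print({k: float(v) for k, v in metrics.items()})

@jax.jit
def update_step(state):
    metrics = {"loss": jnp.mean(state ** 2)}
    io_callback(log_metrics, None, metrics)  # None: the callback returns nothing
    return state - 0.1 * state

state = jnp.ones(4)
for _ in range(3):
    state = update_step(state)  # prints the loss every update, inside jit
```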
Add visualisation file
Problem Description
We need a way to visualise the agent's performance on environments.
Proposed Solution
Add a file that takes a sequence of states and produces a GIF.
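One lightweight option, assuming frames have already been rendered to (H, W, 3) uint8 arrays and that imageio is available; how states are rendered to frames depends on the environment.

```python
import numpy as np
import imageio  # assumed dependency for GIF writing

def save_gif(frames, path="rollout.gif", fps=8):
    # Coerce each frame to uint8 before writing the animated GIF.
    imageio.mimsave(path, [np.asarray(f, dtype=np.uint8) for f in frames], fps=fps)
```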
Add continuous PPO, RNN PPO, and DPO
Feature Title:
Add other PPO types from PureJaxRL.
Problem Description:
So that we can test PPO on other types of environments.
Proposed Solution:
Add the single-file implementations.
Add loggers for intrinsic reward, and the losses (RL-loss, BYOL-Loss, Encoder Loss)
Feature Title:
Add loggers for intrinsic reward, and the losses (RL-loss, BYOL-Loss, Encoder Loss)
Problem Description:
Currently BYOL-Explore doesn't work with the intrinsic reward added. The loggers will allow us to monitor why it can't learn. They will also help in the future in case we run into other bugs.
Proposed Solution:
Add loggers that keep track of the losses, and intrinsic reward.
Add logger to log results (W&B)
Feature Title:
W&B logger
Problem Description:
There's no logger to log results and experiments.
Proposed Solution:
Add a run file and a logger file.
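A minimal sketch of the W&B flow; the project name and metrics are placeholders.

```python
import wandb

config = {"ENV_NAME": "Empty-misc", "SEED": 42}
run = wandb.init(project="metalearncuriosity", config=config)  # placeholder project
for step in range(3):
    wandb.log({"episode_return": float(step)}, step=step)
run.finish()
```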
Add Random Network Distillation (RND).
Problem Description
Add the Random Network Distillation single file implementation.
Proposed Solution
An executable file so that we may run RND.
Additional Context
Serves as one of the baselines for our master's.
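For orientation, a hedged sketch of the core of RND: the intrinsic reward is the predictor's error against a fixed, randomly initialised target network. Network sizes are illustrative.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class RNDNetwork(nn.Module):
    out_dim: int = 64

    @nn.compact
    def __call__(self, obs):
        return nn.Dense(self.out_dim)(nn.relu(nn.Dense(128)(obs)))

net = RNDNetwork()
obs = jnp.ones((4, 8))
target_params = net.init(jax.random.PRNGKey(0), obs)     # frozen forever
predictor_params = net.init(jax.random.PRNGKey(1), obs)  # trained to match the target

def intrinsic_reward(predictor_params, obs):
    err = net.apply(predictor_params, obs) - net.apply(target_params, obs)
    return jnp.mean(err ** 2, axis=-1)  # also the predictor's training loss
```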
Add heat map for grid world environments.
Problem Description
Add heat maps for the grid world environments so that we may better understand the RL algorithms' behaviour.
Proposed Solution
Load the params of the models and see how they move in the environment. The user can specify which seeds to use.
Alternative Solutions (if any)
Additional Context
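A possible sketch, assuming the agent's (x, y) positions have already been extracted from rollouts; the grid size and `positions` array are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_heatmap(positions, grid=(13, 13), path="heatmap.png"):
    # Count visits per grid cell over all rollouts.
    counts = np.zeros(grid)
    for x, y in positions:
        counts[int(y), int(x)] += 1
    plt.imshow(counts, cmap="hot")
    plt.colorbar(label="visit count")
    plt.savefig(path)
```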
Add Cycle-consistency intrinsic motivation
Problem Description
Add Cycle-consistency intrinsic motivation
Proposed Solution
Just add a single file implementation of it.
Alternative Solutions (if any)
Additional Context
Add logger for the individual RL losses
Feature Title:
Add logger for the individual RL losses
Problem Description:
Add a logger for the individual RL losses. This will help with issue #15 and with the debugging process for that loss.
Proposed Solution:
Add loggers for the `entropy`, `actor_loss`, and `value` losses.
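One low-friction way to expose the terms is `has_aux` on the loss function; the formulas below are placeholders for the actual PPO terms.

```python
import jax
import jax.numpy as jnp

def loss_fn(params):
    # Placeholder loss terms standing in for the real PPO computation.
    actor_loss = jnp.mean(params ** 2)
    value_loss = jnp.mean(jnp.abs(params))
    entropy = jnp.float32(0.5)
    total = actor_loss + 0.5 * value_loss - 0.01 * entropy
    return total, {"actor_loss": actor_loss, "value_loss": value_loss,
                   "entropy": entropy}

# The aux dict rides along with the gradients and can be passed to the logger.
(total, aux), grads = jax.value_and_grad(loss_fn, has_aux=True)(jnp.ones(4))
```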