Comments (5)
Thanks for the reply. I have a related question. I was considering exporting only the actor network, but I noticed that model.predict and model.actor.predict return different values. Is this expected behavior?
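import gym
from stable_baselines3 import SAC
# Note: SAC requires a continuous (Box) action space, which CartPole-v1 does not have.
model = SAC('MlpPolicy', 'CartPole-v1', verbose=1)
env = gym.make('CartPole-v1')
obs = env.reset()
print(model.predict(obs))
print(model.actor.predict(obs))
Returns:
(array([-0.20559013], dtype=float32), None)
(array([-0.9510132], dtype=float32), None)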
Edit: My mistake. It works as expected when both are called with deterministic=True.
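For anyone hitting the same mismatch: predict defaults to deterministic=False, so each call samples a fresh action from the policy distribution. The snippet below is only a toy NumPy sketch of that effect, assuming a tanh-squashed Gaussian policy. The function toy_predict and the mean/log_std values are made up for illustration and are not SB3 code.

```python
import numpy as np


def toy_predict(mean, log_std, deterministic, rng):
    """Hypothetical stand-in for a squashed Gaussian policy head (not an SB3 function)."""
    if deterministic:
        # Deterministic mode: use the mean of the Gaussian directly.
        pre_squash = mean
    else:
        # Stochastic mode: draw a new sample on every call.
        pre_squash = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
    # Squash into (-1, 1) with tanh.
    return np.tanh(pre_squash)


rng = np.random.default_rng(0)
mean = np.array([0.3])      # made-up policy output
log_std = np.array([-0.5])  # made-up policy output

stochastic_a = toy_predict(mean, log_std, deterministic=False, rng=rng)
stochastic_b = toy_predict(mean, log_std, deterministic=False, rng=rng)
deterministic_a = toy_predict(mean, log_std, deterministic=True, rng=rng)
deterministic_b = toy_predict(mean, log_std, deterministic=True, rng=rng)

print("stochastic:   ", stochastic_a, stochastic_b)
print("deterministic:", deterministic_a, deterministic_b)
```

The two stochastic calls generally disagree, while the two deterministic calls always match, which is consistent with what was observed above.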
from stable-baselines3.
You can do this with the original stable-baselines using get_parameters and load_parameters, plus a bit of manual tinkering. You need to manually create the mismatching parameter arrays for the env_2 agent and fill in the matching entries with the ones from the env_1 agent. E.g. if only the last fully-connected layer changes, you need to manually create final_params = np.zeros((N, 20)) and then assign the weights of the original parameters to it: final_params[:, :10] = original_params. If you want to modify the saved .zip files, the format is specified here.
Similar support is planned / partially working in SB3, but still needs to go through a check and review.
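A minimal sketch of the zero-padding step described above, using NumPy only. The helper name expand_last_layer, the value N = 64, and the random weights are assumptions for illustration. Getting the real arrays out of the agent and loading them back in (get_parameters / load_parameters) is not shown.

```python
import numpy as np


def expand_last_layer(original_params, new_width):
    """Hypothetical helper: copy old weights into a wider, zero-initialised array."""
    n_rows, old_width = original_params.shape
    if new_width < old_width:
        raise ValueError("new_width must be at least the original width")
    final_params = np.zeros((n_rows, new_width), dtype=original_params.dtype)
    # Keep the original weights in the first columns; the new columns stay zero.
    final_params[:, :old_width] = original_params
    return final_params


N = 64  # assumed size of the layer feeding into the last layer
# Random placeholder standing in for the env_1 agent's last-layer weights.
original_params = np.random.default_rng(0).normal(size=(N, 10)).astype(np.float32)

final_params = expand_last_layer(original_params, 20)
print(final_params.shape)
```

The bias vector of the same layer would need the same treatment with a 1-D array.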
from stable-baselines3.
So, is exporting a saved model as a PyTorch model not supported yet in SB3? Is there a way to get the model parameters, as with stable-baselines' get_parameters function?
from stable-baselines3.
So, is exporting a saved model as a PyTorch model not supported yet in SB3?
It is supported for the policy (though not properly documented yet). The policies are nn.Module objects anyway.
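Since the policy is an nn.Module, the usual PyTorch state_dict workflow should apply to it. Below is a rough sketch of that workflow on a stand-in module. TinyPolicy and its layer sizes are hypothetical and not part of SB3, and an in-memory buffer is used instead of a file only to keep the example self-contained.

```python
import io

import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Hypothetical stand-in for a policy network; not an SB3 class."""

    def __init__(self, obs_dim=4, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16),
            nn.ReLU(),
            nn.Linear(16, action_dim),
        )

    def forward(self, obs):
        return self.net(obs)


policy = TinyPolicy()

# Inspect the parameters by name, similar in spirit to get_parameters.
param_shapes = {name: tuple(p.shape) for name, p in policy.state_dict().items()}
print(param_shapes)

# Save the weights and load them into a freshly constructed module.
buffer = io.BytesIO()
torch.save(policy.state_dict(), buffer)
buffer.seek(0)
restored = TinyPolicy()
restored.load_state_dict(torch.load(buffer))

# Both modules should now produce the same output for the same input.
obs = torch.zeros(1, 4)
with torch.no_grad():
    same = torch.allclose(policy(obs), restored(obs))
print(same)
```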
from stable-baselines3.
Similar support is planned / partially working in SB3, but still needs to go through a check and review.
done in #138 (will merge today)
from stable-baselines3.