Comments (7)
Yes this is closed now!
from rl.
@vmoens Is this issue still open? @BY571 Does #1038 close it?
Nice! @vmoens Do you know if this is meant to handle truncation correctly (e.g. as introduced in Gymnasium)? I.e., does it allow supplying a value for the truncated state?
Or, if this is deferred to the learning logic, is there a flag marking states that belong to truncated episodes, so that one could bootstrap their values?
I'm not aware of the updates since #403.
Similarly, for episodes that have not finished yet (typically the last episode in each concurrent environment), is there a way to find those and mask them out in the loss?
Thanks!
I think that -- provided that you pass the correct mask to the function -- truncation should be handled properly.
@BY571 can you confirm?
This is the transform. It looks at done or truncated here.
The functional treats these two as a single "done", but as you can see, upstream the transform computes done = done | truncated.
Let us know if something is not clear!
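For intuition, here is a minimal standalone sketch of what such a reward-to-go computation looks like once done and truncated have been merged into a single flag. This is my own illustrative code, not TorchRL's actual implementation:

```python
import torch

def reward_to_go(reward: torch.Tensor, done: torch.Tensor,
                 gamma: float = 1.0) -> torch.Tensor:
    """Discounted reward-to-go; `done` is assumed to already be done | truncated."""
    # reward: [T, 1] float; done: [T, 1] bool, True on the last step
    # of each (terminated or truncated) episode.
    rtg = torch.zeros_like(reward)
    running = torch.zeros_like(reward[0])
    for t in reversed(range(reward.shape[0])):
        # The running return resets whenever step t closes an episode,
        # so later episodes never leak into earlier ones.
        running = reward[t] + gamma * running * (~done[t]).float()
        rtg[t] = running
    return rtg
```

With four unit rewards and the flag set only on the last step, this reproduces the `[4, 3, 2, 1]` pattern shown in the examples below.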
Yes, when an episode ends without done=True, truncated is set to True on that last state, and the transform then treats it as the final state of the episode:
>>> from torchrl.envs.transforms import Reward2GoTransform
>>> import torch
>>> from tensordict import TensorDict
>>> r2g = Reward2GoTransform(in_keys=["reward"], out_keys=["reward_to_go"])
>>> td = TensorDict({"reward": torch.ones(4,1), "next": {"done":torch.zeros(4,1).to(dtype=bool), "truncated": torch.zeros(4,1).to(dtype=bool)}}, batch_size=())
>>> td["next"]["truncated"][-1]=True
>>> r2g._inv_call(td)["reward_to_go"]
tensor([[4.],
[3.],
[2.],
[1.]])
If you want to mask these episodes out completely, you would have to set the states, rewards, actions, etc. to zero as well. Simply setting truncated to True for all of those steps would not work: the reward-to-go transform would then return only the current reward at each step, since it treats every step as its own episode of length 1:
>>> td = TensorDict({"reward": torch.ones(4,1), "next": {"done":torch.zeros(4,1).to(dtype=bool), "truncated": torch.zeros(4,1).to(dtype=bool)}}, batch_size=())
>>> td["next"]["truncated"][-3:]=True
>>> r2g._inv_call(td)["reward_to_go"]
tensor([[2.],
[1.],
[1.],
[1.]])
>>> td = TensorDict({"reward": torch.ones(4,1), "next": {"done":torch.zeros(4,1).to(dtype=bool), "truncated": torch.zeros(4,1).to(dtype=bool)}}, batch_size=())
>>> td["next"]["truncated"][-3:]=True
>>> td["reward"][-3:]=0
>>> r2g._inv_call(td)["reward_to_go"]
tensor([[1.],
[0.],
[0.],
[0.]])
Let me know if this helped to clarify
Thanks both for the clarifications.
So I guess the answer to my question is that the transform is aware of truncation and handles it as termination.
So it will not bootstrap truncated episodes or mask out unfinished ones; that is left to the user.
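For anyone who does want to bootstrap truncated episodes themselves, one possible sketch is below. This is a hypothetical helper, not part of TorchRL, and it assumes `values[t]` holds your critic's estimate V(s_{t+1}) for the state following step t:

```python
import torch

def reward_to_go_bootstrap(reward, done, truncated, values, gamma=0.99):
    # reward, values: [T, 1] float; done, truncated: [T, 1] bool.
    rtg = torch.zeros_like(reward)
    running = torch.zeros_like(reward[0])
    for t in reversed(range(reward.shape[0])):
        end = done[t] | truncated[t]
        # At a truncation (not a true termination), seed the return with
        # gamma * V(s_{t+1}) instead of resetting it to zero.
        boot = torch.where(truncated[t] & ~done[t],
                           gamma * values[t],
                           torch.zeros_like(values[t]))
        running = reward[t] + torch.where(end, boot, gamma * running)
        rtg[t] = running
    return rtg
```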
Also, it does not require the last state to have ["next"]["truncated"] = True or ["next"]["done"] = True; it only complains if there is no done or truncated flag anywhere in the batch:
>>> from torchrl.envs.transforms import Reward2GoTransform
>>> import torch
>>> from tensordict import TensorDict
>>> r2g = Reward2GoTransform(in_keys=["reward"], out_keys=["reward_to_go"])
>>> td = TensorDict({"reward": torch.ones(4,1), "next": {"done":torch.zeros(4,1).to(dtype=bool), "truncated": torch.zeros(4,1).to(dtype=bool)}}, batch_size=())
>>> td["next"]["truncated"][1]=True
>>> r2g._inv_call(td)["reward_to_go"]
tensor([[2.], # Belongs to truncated episode.
[1.], # Belongs to truncated episode. Next is truncated.
[2.], # Belongs to unfinished episode.
[1.]]) # Belongs to unfinished episode. Next is not truncated, nor done.
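The finished/unfinished distinction annotated above can also be computed mechanically: a step belongs to a finished episode iff some done-or-truncated flag is set at or after it. A sketch of such a helper (my own code, not a TorchRL API):

```python
import torch

def finished_mask(end_flags: torch.Tensor) -> torch.Tensor:
    # end_flags: [T, 1] bool, True where done | truncated closes an episode.
    # Reverse cumulative max: True at step t iff an episode end exists
    # at step t or later.
    return end_flags.int().flip(0).cummax(dim=0).values.flip(0).bool()
```

With the flag at index 1 as in the example above, this yields `[True, True, False, False]`, which could then be used to mask the unfinished steps out of a loss.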
Otherwise, regarding the last step truncation:
Yes, when an episode was ended (without done=True) truncated is set true
Where does this happen? It doesn't seem to be done by a collector at the last frame of a batch. (I’m new to TorchRL and in the process of deciding whether I should adopt it!)