Comments (6)
Why are parts of the next_state equal to the next_state2 which came from a different action on a copied environment?
Looking at the observation space (https://gymnasium.farama.org/environments/classic_control/cart_pole/#observation-space), all this means is that a single action hasn't changed the cart position or pole angle.
It is not necessary for an action to affect the whole observation.
But going back to your original post, this is not a bug and deepcopy of CartPole works as expected.
from gym.
This is an interesting research question. The problem is that you have a bug and an incorrect assumption.
The bug: after you take the alternative action, you need to modify your env to use the prev_env, as this is the new next state that you actually want.
The incorrect assumption: if action X fails, then reverting and taking action Y won't fail. The problem is that in this example, for certain states, both action X and action Y will cause the environment to terminate.
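A minimal sketch of why both actions can fail, assuming CartPole's default parameters (tau = 0.02, 12-degree angle limit); the near-boundary state here is hypothetical:

```python
import math

tau = 0.02                        # CartPole's integration timestep
theta_limit = 12 * math.pi / 180  # termination threshold on the pole angle

def next_theta(theta, theta_dot):
    # CartPole advances the angle with the OLD angular velocity (explicit
    # Euler), so the next angle is the same no matter which action is taken.
    return theta + tau * theta_dot

# Hypothetical near-boundary state: angle 0.20 rad, angular velocity 0.5 rad/s
theta_next = next_theta(0.20, 0.5)
print(abs(theta_next) > theta_limit)  # -> True: both actions terminate here
```

In other words, once the angle is about to cross the threshold, it crosses it under either action, because the action only affects the velocities on that step.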
My code:
import copy
import random

import gym

env = gym.make("CartPole-v1")
state, info = env.reset()
for step in range(200):
    action = random.randint(0, 1)
    prev_env = copy.deepcopy(env)  # snapshot of the env before stepping
    next_state, reward, terminated, truncated, _ = env.step(action)
    if terminated:
        other_action = 1 if action == 0 else 0
        print(f'terminated, taking alternative action ({other_action})')
        next_state, reward, terminated, truncated, _ = prev_env.step(other_action)
        if terminated:
            print('both actions cause termination, ending')
            break
        else:
            env = prev_env
Repeating this experiment 1000 times, I didn't find a single case where the alternative action didn't also cause the environment to terminate.
Thanks for looking!
It does seem that terminated states can't be avoided by replaying with a different last action. I wonder why not.
Anyway, you can rewind 2 actions and it won't terminate :)
I'm considering storing these states and rewarding them differently. The state is on a boundary where one of the actions leads to an unrecoverable termination.
The bug still stands:
I think you are running into the same bug. In my original code I do use prev_env.step() as you suggest. But if you inspect the state of the prev_env, you'll notice it changes even though you don't call step on it! That was the bug.
So instead, I just store all the actions in a replay memory, make a new environment with the same seed, and then replay those actions until one before the terminating step.
Try this and you'll find states that you can recover from:
import random
from collections import namedtuple
from itertools import count

import gym

Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'terminated'))
terminated_replay_memory = []

env = gym.make("CartPole-v1")

def reset_and_replay(actions_taken, seed):
    # Re-seed the env and replay all but the last action to rebuild the
    # state just before termination, then try the other action.
    state, info = env.reset(seed=seed)
    last_action = actions_taken[-1]
    other_action = 1 if last_action == 0 else 0
    for a in actions_taken[:-1]:
        state, _, _, _, _ = env.step(a)
    next_state, reward, terminated, truncated, _ = env.step(other_action)
    if terminated:
        print("unrecoverable!")
        reset_and_replay(actions_taken[:-1], seed)
    else:
        print("taking the other action didn't terminate!")
        if not truncated:
            terminated_replay_memory.append(
                Transition(state, last_action, next_state, terminated))

seed = random.randint(0, 10000)
state, info = env.reset(seed=seed)
actions_taken = []
for step in count():
    action = random.randint(0, 1)
    actions_taken.append(action)
    prev_state = state
    next_state, reward, terminated, truncated, _ = env.step(action)
    state = next_state
    if terminated or truncated:
        reset_and_replay(actions_taken, seed)
        break
I can't replicate the problem you are describing using either Gym or Gymnasium.
What version of Gym are you using? It looks like v0.26 (which I'm using as well):
import copy

import gym
import gymnasium

print("Gym")
env = gym.make("CartPole-v1")
env.reset()
print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)  # unchanged by stepping the original env

print("Gymnasium")
env = gymnasium.make("CartPole-v1")
env.reset()
print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)  # unchanged by stepping the original env
Here's how to see the issue:
import copy

import gymnasium as gym
env = gym.make("CartPole-v1")
env.reset()
copied_env = copy.deepcopy(env)
print(env.unwrapped.state)
print(copied_env.unwrapped.state)
next_state, reward, terminated, truncated, _ = env.step(1)
print(env.unwrapped.state)
print(copied_env.unwrapped.state)
print("next_state, reward, terminated, truncated, _ = ", next_state, reward, terminated, truncated, _)
next_state2, reward2, terminated2, truncated2, _ = copied_env.step(0)
print(env.unwrapped.state)
print(copied_env.unwrapped.state)
print("next_state2, reward2, terminated2, truncated2, _ = ", next_state2, reward2, terminated2, truncated2, _)
print()
print("why?", next_state == next_state2)
Output to look at:
why? [ True False True False]
Why are parts of the next_state equal to the next_state2 which came from a different action on a copied environment?
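Two things worth noting about that output: `next_state == next_state2` compares NumPy arrays elementwise (so it yields a boolean array, not a single bool), and the matching indices 0 and 2 are the cart position and pole angle. A small sketch with hypothetical observation values shaped like CartPole's `[x, x_dot, theta, theta_dot]`:

```python
import numpy as np

# Hypothetical observations: positions/angles equal, velocities opposite
next_state = np.array([0.002, 0.19, 0.001, -0.28])
next_state2 = np.array([0.002, -0.19, 0.001, 0.28])

# Elementwise comparison: boolean array, True where components match
print(next_state == next_state2)  # positions and angles match, velocities differ
```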
This has been bugging me.
How can it be that if you go right, right, right, right, right, right and build up lots of velocity, your position is the same as if you go right, right, right, right, right, left?
This isn't a new issue:
#1019
Per discussion with @joschu - both versions are correct. The old one is the vanilla Euler, and the suggested is semi-implicit Euler. While the new one is more stable, we'd rather not change the behavior of the existing environment. If you'd still like to use semi-implicit Euler, could you add a separate environment flag that turns it on (i.e. by default it is off, and a flag can turn it on)?
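To illustrate the difference between the two integrators on a single state variable: here the step size matches CartPole's default tau = 0.02, and the opposite accelerations are hypothetical stand-ins for the two actions.

```python
tau = 0.02  # CartPole's default integration timestep

def explicit_euler(x, x_dot, x_acc):
    # Vanilla (explicit) Euler: position advances with the OLD velocity,
    # so the new position is independent of this step's acceleration.
    return x + tau * x_dot, x_dot + tau * x_acc

def semi_implicit_euler(x, x_dot, x_acc):
    # Semi-implicit Euler: velocity is updated first, and the NEW velocity
    # moves the position, so the action affects the position immediately.
    x_dot_new = x_dot + tau * x_acc
    return x + tau * x_dot_new, x_dot_new

# Same state, opposite accelerations (push left vs push right):
x_l, _ = explicit_euler(0.0, 0.1, -1.0)
x_r, _ = explicit_euler(0.0, 0.1, +1.0)
print(x_l == x_r)  # -> True: identical positions after one explicit-Euler step

y_l, _ = semi_implicit_euler(0.0, 0.1, -1.0)
y_r, _ = semi_implicit_euler(0.0, 0.1, +1.0)
print(y_l == y_r)  # -> False: positions differ immediately
```

This is exactly why, in the output above, the position and angle components match for one step even though the actions differ.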
I'm not sure why the team decided back then not to change the behavior; I'd recommend doing so.
If changing the default to semi-implicit Euler isn't possible, then at least document that it is the preferred method and that it matches how game engines and reality work.