
Comments (6)

pseudo-rnd-thoughts commented on May 27, 2024

Why are parts of the next_state equal to the next_state2 which came from a different action on a copied environment?

Looking at the observation space (https://gymnasium.farama.org/environments/classic_control/cart_pole/#observation-space), all this means is that a single action hasn't caused the cart position or pole angle to change.
It is not necessary for an action to affect the whole observation.
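For context, here is a minimal sketch of the default Euler update inside CartPole's step() (adapted from gymnasium's cartpole.py; xacc and thetaacc stand for the accelerations derived from the applied force). The new position and angle are computed from the pre-step velocities, so a single step leaves them identical under either action:

tau = 0.02  # seconds between state updates

def euler_update(x, x_dot, theta, theta_dot, xacc, thetaacc):
    x = x + tau * x_dot              # position advances with the OLD velocity: action-independent
    x_dot = x_dot + tau * xacc       # velocity advances with the acceleration: action-dependent
    theta = theta + tau * theta_dot  # angle advances with the OLD angular velocity: action-independent
    theta_dot = theta_dot + tau * thetaacc
    return x, x_dot, theta, theta_dot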

But going back to your original post, this is not a bug; deepcopy of CartPole works as expected.


pseudo-rnd-thoughts commented on May 27, 2024

This is an interesting research question; the problem is that you have both a bug and an incorrect assumption.

The bug: after you take the alternative action, you need to update your env to be the prev_env, as this now holds the next state that you actually want.

The incorrect assumption: if action X fails, then reverting and taking action Y won't fail. The problem is that, for certain states in this environment, both action X and action Y will cause the environment to terminate.

My code:

import copy
import random

import gym

env = gym.make("CartPole-v1")
state, info = env.reset()

for step in range(200):
    action = random.randint(0, 1)
    # Snapshot the environment before stepping so that we can rewind.
    prev_env = copy.deepcopy(env)

    next_state, reward, terminated, truncated, _ = env.step(action)

    if terminated:
        other_action = 1 if action == 0 else 0

        print(f'terminated, taking alternative action ({other_action})')
        next_state, reward, terminated, truncated, _ = prev_env.step(other_action)
        if terminated:
            print('both actions cause termination, ending')
            break
        else:
            # The snapshot now holds the next state we actually want; continue from it.
            env = prev_env

Repeating this experiment 1000 times, I didn't find a single case where the alternative action didn't also cause the environment to terminate.
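That matches the integrator sketch above: CartPole terminates on the cart position and pole angle only, and under vanilla Euler those are computed from the pre-step velocities, so whether a single step terminates is decided before the action is chosen. A quick check (a sketch along the lines of the code above):

import copy

import gym

env = gym.make("CartPole-v1")
env.reset(seed=0)

for _ in range(500):
    twin = copy.deepcopy(env)
    s0, _, term0, trunc0, _ = env.step(0)
    s1, _, term1, _, _ = twin.step(1)
    # Position (index 0) and angle (index 2) are identical under both actions,
    # and termination depends only on those two, so it is identical too.
    assert s0[0] == s1[0] and s0[2] == s1[2]
    assert term0 == term1
    if term0 or trunc0:
        env.reset()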


wjessup commented on May 27, 2024

Thanks for looking!

It does seem that no terminated state can be replayed with a different action to avoid termination. I wonder why not.

Anyway, you can rewind 2 actions and it won't terminate :)

I'm considering storing these states and rewarding them differently. The state is on a boundary where taking one action leads to an unrecoverable termination....

The bug still stands:

I think you are running into the same bug. In my original code I do use prev_env.step() as you suggest. But if you inspect the state of prev_env, you'll notice it changes even though you don't call step on it! That was the bug.

So instead, I just store all the actions taken, make a new environment with the same seed, and replay those actions up to 1 step before termination.

Try this and you'll find states that you can recover from:

import random
from collections import namedtuple
from itertools import count

import gym

# Minimal stand-ins so this snippet runs on its own; in my real code these
# come from my replay-memory setup.
Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'terminated'))
terminated_replay_memory = []

env = gym.make("CartPole-v1")


def reset_and_replay(actions_taken, seed):
    if not actions_taken:
        print("rewound past the start of the episode")
        return

    # Rebuild the exact trajectory: same seed, then replay all but the last action.
    state, info = env.reset(seed=seed)
    last_action = actions_taken[-1]
    other_action = 1 if last_action == 0 else 0
    for replayed_action in actions_taken[:-1]:
        state, _, _, _, _ = env.step(replayed_action)

    prev_state = state
    next_state, reward, terminated, truncated, _ = env.step(other_action)

    if terminated:
        print("unrecoverable!")
        # Rewind one more action and try flipping that one instead.
        reset_and_replay(actions_taken[:-1], seed)
    else:
        print("taking the other action didn't terminate!")
        if not truncated:
            terminated_replay_memory.append(
                Transition(prev_state, other_action, next_state, terminated))


seed = random.randint(0, 10000)
state, info = env.reset(seed=seed)
actions_taken = []
for step in count():
    action = random.randint(0, 1)
    actions_taken.append(action)

    next_state, reward, terminated, truncated, _ = env.step(action)
    state = next_state

    if terminated or truncated:
        reset_and_replay(actions_taken, seed)
        break


pseudo-rnd-thoughts commented on May 27, 2024

I can't replicate the problem you are describing using either Gym or Gymnasium.
What version of Gym are you using? It looks like v0.26 (which is what I'm using as well).

import copy

import gym
import gymnasium

print("Gym")
env = gym.make("CartPole-v1")
env.reset()

# Stepping the original env must not change the deep-copied env's state.
print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)

print("Gymnasium")
env = gymnasium.make("CartPole-v1")
env.reset()

print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)


wjessup commented on May 27, 2024

Here's how to see the issue:

import copy

import gymnasium as gym

env = gym.make("CartPole-v1")
env.reset()
copied_env = copy.deepcopy(env)

print(env.unwrapped.state)
print(copied_env.unwrapped.state)

next_state, reward, terminated, truncated, _ = env.step(1)

print(env.unwrapped.state)
print(copied_env.unwrapped.state)
print("next_state, reward, terminated, truncated, _ = ", next_state, reward, terminated, truncated, _)

next_state2, reward2, terminated2, truncated2, _ = copied_env.step(0)

print(env.unwrapped.state)
print(copied_env.unwrapped.state)
print("next_state2, reward2, terminated2, truncated2, _ = ", next_state2, reward2, terminated2, truncated2, _)
print()
# Elementwise comparison of the two numpy observation arrays.
print("why?", next_state == next_state2)

Output to look at:

why? [ True False True False]

Why are parts of the next_state equal to the next_state2 which came from a different action on a copied environment?


wjessup commented on May 27, 2024

This has been bugging me.

How can it be that if you go: right, right, right, right, right, right
and build up lots of velocity...
your position is the same as if you go: right, right, right, right, right, left?

This isn't a new issue:
#1019

Per discussion with @joschu - both versions are correct. The old one is the vanilla Euler, and the suggested is semi-implicit Euler. While the new one is more stable, we'd rather not change the behavior of the existing environment. If you'd still like to use semi-implicit Euler, could you add a separate environment flag that turns it on (i.e. by default it is off, and a flag can turn it on)?

I'm not sure why the team decided back then not to change the behavior; I'd recommend doing so.

If changing the default behavior to semi-implicit Euler isn't possible, then at least document that it is the preferred method and that it matches how game engines and reality work.
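For reference, the two integrators differ only in the update order. A minimal sketch of the semi-implicit variant (tau is CartPole's 0.02 s timestep; compare the vanilla Euler sketch earlier in the thread):

tau = 0.02

def semi_implicit_euler(x, x_dot, theta, theta_dot, xacc, thetaacc):
    # Velocities update first; positions then advance with the NEW velocities,
    # so the action affects position and angle within the same step.
    x_dot = x_dot + tau * xacc
    x = x + tau * x_dot
    theta_dot = theta_dot + tau * thetaacc
    theta = theta + tau * theta_dot
    return x, x_dot, theta, theta_dot

This ordering is what most game engines use, and cartpole.py already contains this branch for any kinematics_integrator setting other than "euler".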

