
Comments (6)

pseudo-rnd-thoughts commented on May 27, 2024

Why are parts of the next_state equal to the next_state2 which came from a different action on a copied environment?

Looking at the observation space (https://gymnasium.farama.org/environments/classic_control/cart_pole/#observation-space), all this means is that a single action hasn't caused the cart position or pole angle to change.
It is not necessary for an action to affect the whole observation.
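For context, here is a minimal sketch of the default Euler update inside CartPole's step() (adapted from gymnasium's cartpole.py; xacc and thetaacc stand for the accelerations derived from the applied force). The new position and angle are computed from the pre-step velocities, so a single step leaves them identical under either action:

tau = 0.02  # seconds between state updates

def euler_update(x, x_dot, theta, theta_dot, xacc, thetaacc):
    x = x + tau * x_dot              # position advances with the OLD velocity: action-independent
    x_dot = x_dot + tau * xacc       # velocity advances with the acceleration: action-dependent
    theta = theta + tau * theta_dot  # angle advances with the OLD angular velocity: action-independent
    theta_dot = theta_dot + tau * thetaacc
    return x, x_dot, theta, theta_dot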

But going back to your original post, this is not a bug; deepcopy of CartPole works as expected.


pseudo-rnd-thoughts commented on May 27, 2024

This is an interesting research question; the problem is that you have both a bug and an incorrect assumption.

The bug: after you take the alternative action, you need to update your env to be the prev_env, as this now holds the next state that you actually want.

The incorrect assumption: if action X fails, then reverting and taking action Y won't fail. The problem is that, for certain states in this environment, both action X and action Y will cause the environment to terminate.

My code:

import copy
import random

import gym

env = gym.make("CartPole-v1")
state, info = env.reset()

for step in range(200):
    action = random.randint(0, 1)
    # Snapshot the environment before stepping so that we can rewind.
    prev_env = copy.deepcopy(env)

    next_state, reward, terminated, truncated, _ = env.step(action)

    if terminated:
        other_action = 1 if action == 0 else 0

        print(f'terminated, taking alternative action ({other_action})')
        next_state, reward, terminated, truncated, _ = prev_env.step(other_action)
        if terminated:
            print('both actions cause termination, ending')
            break
        else:
            # The snapshot now holds the next state we actually want; continue from it.
            env = prev_env

Repeating this experiment 1000 times, I didn't find a single case where the alternative action didn't also cause the environment to terminate.
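That matches the integrator sketch above: CartPole terminates on the cart position and pole angle only, and under vanilla Euler those are computed from the pre-step velocities, so whether a single step terminates is decided before the action is chosen. A quick check (a sketch along the lines of the code above):

import copy

import gym

env = gym.make("CartPole-v1")
env.reset(seed=0)

for _ in range(500):
    twin = copy.deepcopy(env)
    s0, _, term0, trunc0, _ = env.step(0)
    s1, _, term1, _, _ = twin.step(1)
    # Position (index 0) and angle (index 2) are identical under both actions,
    # and termination depends only on those two, so it is identical too.
    assert s0[0] == s1[0] and s0[2] == s1[2]
    assert term0 == term1
    if term0 or trunc0:
        env.reset()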


wjessup commented on May 27, 2024

Thanks for looking!

It does seem that no terminated state can be replayed with a different action to avoid termination. I wonder why not.

Anyway, you can rewind 2 actions and it won't terminate :)

I'm considering storing these states and rewarding them differently. The state is on a boundary where taking one action leads to an unrecoverable termination....

The bug still stands:

I think you are running into the same bug. In my original code I do use prev_env.step() as you suggest. But if you inspect the state of prev_env, you'll notice it changes even though you don't call step on it! That was the bug.

So instead, I just store all the actions taken, make a new environment with the same seed, and replay those actions up to 1 step before termination.

Try this and you'll find states that you can recover from:

import random
from collections import namedtuple
from itertools import count

import gym

# Minimal stand-ins so this snippet runs on its own; in my real code these
# come from my replay-memory setup.
Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'terminated'))
terminated_replay_memory = []

env = gym.make("CartPole-v1")


def reset_and_replay(actions_taken, seed):
    if not actions_taken:
        print("rewound past the start of the episode")
        return

    # Rebuild the exact trajectory: same seed, then replay all but the last action.
    state, info = env.reset(seed=seed)
    last_action = actions_taken[-1]
    other_action = 1 if last_action == 0 else 0
    for replayed_action in actions_taken[:-1]:
        state, _, _, _, _ = env.step(replayed_action)

    prev_state = state
    next_state, reward, terminated, truncated, _ = env.step(other_action)

    if terminated:
        print("unrecoverable!")
        # Rewind one more action and try flipping that one instead.
        reset_and_replay(actions_taken[:-1], seed)
    else:
        print("taking the other action didn't terminate!")
        if not truncated:
            terminated_replay_memory.append(
                Transition(prev_state, other_action, next_state, terminated))


seed = random.randint(0, 10000)
state, info = env.reset(seed=seed)
actions_taken = []
for step in count():
    action = random.randint(0, 1)
    actions_taken.append(action)

    next_state, reward, terminated, truncated, _ = env.step(action)
    state = next_state

    if terminated or truncated:
        reset_and_replay(actions_taken, seed)
        break


pseudo-rnd-thoughts commented on May 27, 2024

I can't replicate the problem you are describing using either Gym or Gymnasium.
What version of Gym are you using? It looks like v0.26 (which is what I'm using as well).

import copy

import gym
import gymnasium

print("Gym")
env = gym.make("CartPole-v1")
env.reset()

# Stepping the original env must not change the deep-copied env's state.
print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)

print("Gymnasium")
env = gymnasium.make("CartPole-v1")
env.reset()

print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)


wjessup commented on May 27, 2024

Here's how to see the issue:

import copy

import gymnasium as gym

env = gym.make("CartPole-v1")
env.reset()
copied_env = copy.deepcopy(env)

print(env.unwrapped.state)
print(copied_env.unwrapped.state)

next_state, reward, terminated, truncated, _ = env.step(1)

print(env.unwrapped.state)
print(copied_env.unwrapped.state)
print("next_state, reward, terminated, truncated, _ = ", next_state, reward, terminated, truncated, _)

next_state2, reward2, terminated2, truncated2, _ = copied_env.step(0)

print(env.unwrapped.state)
print(copied_env.unwrapped.state)
print("next_state2, reward2, terminated2, truncated2, _ = ", next_state2, reward2, terminated2, truncated2, _)
print()
# Elementwise comparison of the two numpy observation arrays.
print("why?", next_state == next_state2)

Output to look at:

why? [ True False True False]

Why are parts of the next_state equal to the next_state2 which came from a different action on a copied environment?


wjessup commented on May 27, 2024

This has been bugging me.

How can it be that if you go: right, right, right, right, right, right
and build up lots of velocity...
your position is the same as if you go: right, right, right, right, right, left?

This isn't a new issue:
#1019

Per discussion with @joschu - both versions are correct. The old one is the vanilla Euler, and the suggested is semi-implicit Euler. While the new one is more stable, we'd rather not change the behavior of the existing environment. If you'd still like to use semi-implicit Euler, could you add a separate environment flag that turns it on (i.e. by default it is off, and a flag can turn it on)?

I'm not sure why the team decided back then not to change the behavior; I'd recommend doing so.

If changing the default behavior to semi-implicit Euler isn't possible, then at least document that it is the preferred method and that it matches how game engines and reality work.
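For reference, the two integrators differ only in the update order. A minimal sketch of the semi-implicit variant (tau is CartPole's 0.02 s timestep; compare the vanilla Euler sketch earlier in the thread):

tau = 0.02

def semi_implicit_euler(x, x_dot, theta, theta_dot, xacc, thetaacc):
    # Velocities update first; positions then advance with the NEW velocities,
    # so the action affects position and angle within the same step.
    x_dot = x_dot + tau * xacc
    x = x + tau * x_dot
    theta_dot = theta_dot + tau * thetaacc
    theta = theta + tau * theta_dot
    return x, x_dot, theta, theta_dot

This ordering is what most game engines use, and cartpole.py already contains this branch for any kinematics_integrator setting other than "euler".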

