
irl-maxent's People

Contributors

niftyj, qzed


irl-maxent's Issues

Multiple terminal states

Thanks a lot for such well-documented code. It is making it really easy for me to adapt to my use case.

My MDP has multiple terminal states, and I was wondering how to change the local_action_probabilities() code for that scenario. The Ziebart paper mentions that you have to do it for all terminal states, but I am not sure how to combine them. Thanks for your help!
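
A hedged sketch of how a set of terminal states can be handled in the backward pass (this is not the repo's local_action_probabilities(); the function name and the transition-tensor layout below are assumptions): Ziebart's algorithm seeds Z_s = 1 at every terminal state and re-adds that indicator on each sweep, so the only change needed is to iterate over a collection of terminals instead of a single state.

import numpy as np

def local_action_probabilities_sketch(p_transition, terminals, reward, eps=1e-5, max_iter=1000):
    """Backward pass with a *set* of terminal states (sketch, not the repo's API).

    Assumed shapes: p_transition[s, s_next, a] = P(s_next | s, a),
    reward has shape (n_states,), terminals is an iterable of state indices.
    """
    n_states, _, n_actions = p_transition.shape
    terminals = list(terminals)
    er = np.exp(reward)

    # seed z_s with 1 at *every* terminal state, not just one
    zs = np.zeros(n_states)
    zs[terminals] = 1.0

    za = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        # z_a[s, a] = exp(reward[s]) * sum_s' P(s' | s, a) * z_s[s']
        za = er[:, None] * np.einsum('ijk,j->ik', p_transition, zs)

        # z_s[s] = sum_a z_a[s, a], plus 1 for every terminal state
        zs_new = za.sum(axis=1)
        zs_new[terminals] += 1.0

        if np.max(np.abs(zs_new - zs)) < eps:
            zs = zs_new
            break
        zs = zs_new

    # local action probabilities P(a | s) = z_a[s, a] / z_s[s]
    # (states that cannot reach any terminal keep z_s = 0 and may need masking)
    return za / np.maximum(zs[:, None], 1e-300)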

Supporting MDPs with negative reward states?

Hello, thanks for sharing your code. Is it possible to use this for MDPs with negative reward states?

I've tried setting negative rewards inside setup_mdp() in example.py, e.g. like:

def setup_mdp():
    """
    Set-up our MDP/GridWorld
    """
    # create our world
    world = W.IcyGridWorld(size=5, p_slip=0.2)

    # set up the reward function
    reward = np.zeros(world.n_states)
    reward[-1] = 1.0
    reward[17] = -0.75
    reward[18] = -0.75
    reward[19] = -0.75

    # set up terminal states
    terminal = [24]

    return world, reward, terminal

-0.75 seems to be around the lowest reward I can set; anything lower than that, and running example.py results in an error:

Traceback (most recent call last):
  File "/Users/kierad/Documents/GitHub/irl-maxent/src/example.py", line 141, in <module>
    main()
  File "/Users/kierad/Documents/GitHub/irl-maxent/src/example.py", line 113, in main
    trajectories, expert_policy = generate_trajectories(world, reward, terminal)
  File "/Users/kierad/Documents/GitHub/irl-maxent/src/example.py", line 51, in generate_trajectories
    tjs = list(T.generate_trajectories(n_trajectories, world, policy_exec, initial, terminal))
  File "/Users/kierad/Documents/GitHub/irl-maxent/src/irl_maxent/trajectory.py", line 128, in <genexpr>
    return (_generate_one() for _ in range(n))
  File "/Users/kierad/Documents/GitHub/irl-maxent/src/irl_maxent/trajectory.py", line 126, in _generate_one
    return generate_trajectory(world, policy, s, final)
  File "/Users/kierad/Documents/GitHub/irl-maxent/src/irl_maxent/trajectory.py", line 77, in generate_trajectory
    action = policy(state)
  File "/Users/kierad/Documents/GitHub/irl-maxent/src/irl_maxent/trajectory.py", line 169, in <lambda>
    return lambda state: np.random.choice([*range(policy.shape[1])], p=policy[state, :])
  File "mtrand.pyx", line 956, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities are not non-negative

And even with the above setup_mdp, the IRL methods don't seem to produce negative reward estimates (see the colourbar I've added):

True rewards: (figure: reward_estimate_true)

Estimated rewards with maxent: (figure: reward_estimate_maxent)

Estimated rewards with causal maxent: (figure: reward_estimate_maxent_causal)
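
A hedged note and sketch (not the repo's code): the traceback shows np.random.choice being handed a policy row with negative entries, which suggests the stochastic expert policy's action weights are derived directly from the now-negative values. A softmax (Boltzmann) policy over action values stays a valid probability distribution for any sign of the reward; the q_values array and function below are hypothetical, not part of the repo's API. Separately, IRL reward estimates are generally only meaningful up to offset and scale, which may be why the plotted estimates never dip below zero even when the true reward does.

import numpy as np

def boltzmann_policy_sketch(q_values, temperature=1.0):
    """Softmax policy over action values (sketch; q_values is a hypothetical
    (n_states, n_actions) array, not something the repo exposes under this name)."""
    # subtract the per-state maximum before exponentiating, for numerical stability
    z = (q_values - q_values.max(axis=1, keepdims=True)) / temperature
    e = np.exp(z)
    # probabilities are strictly positive regardless of the sign of the values
    return e / e.sum(axis=1, keepdims=True)

# usage sketch, mirroring the np.random.choice call in the traceback:
# policy = boltzmann_policy_sketch(q)              # q: hypothetical Q-table
# action = np.random.choice(policy.shape[1], p=policy[state, :])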

If the trajectory stays in the terminal state (for a limited number of steps)

Hi, thank you so much for this amazing repo.
I have been trying to build my own environment, but I have run into some issues.
What if we have something like this: going from s0 to s1 to s2 and then staying in s3 forever?
(I changed the value iteration so that my trajectories are all 50 steps long), so my SVF is something like (1, 1, 1, 47, 0, ..., 0).
However, I am facing some difficulties: my zs and za keep growing until they become NaN, and this ends up making my omega NaN as well.
I was wondering whether you have any idea what the problem is and how I can fix it.
I am reading Dr. Ziebart's thesis but still have no clue how to tackle this problem (since z_terminal is 1, I am wondering whether that is the cause).
If you have any ideas, I would be grateful if you shared your thoughts.
Thanks again!
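
A hedged sketch of one way to keep the backward pass from overflowing (not the repo's code; the transition-tensor layout and function name are assumptions): when a trajectory self-loops in an absorbing state for many steps, zs and za can grow past the floating-point range and turn into inf/NaN. Running the same recursion in log-space with logsumexp keeps the numbers finite over long horizons such as the fixed 50-step trajectories described above.

import numpy as np
from scipy.special import logsumexp

def log_backward_pass_sketch(p_transition, terminals, reward, n_iter=50):
    """Ziebart-style backward pass in log-space (sketch, not the repo's API).

    Assumed shapes: p_transition[s, s_next, a] = P(s_next | s, a),
    reward has shape (n_states,), terminals is an iterable of state indices,
    n_iter is the horizon (e.g. the fixed 50-step trajectory length).
    """
    n_states, _, n_actions = p_transition.shape
    terminals = list(terminals)

    # log z_s: 0 at terminal states (z_s = 1), -inf elsewhere (z_s = 0)
    log_zs = np.full(n_states, -np.inf)
    log_zs[terminals] = 0.0

    # log transition probabilities; log(0) -> -inf is handled by logsumexp
    with np.errstate(divide='ignore'):
        log_p = np.log(p_transition)

    log_za = np.full((n_states, n_actions), -np.inf)
    for _ in range(n_iter):
        # log z_a[s, a] = reward[s] + logsumexp_s'(log P(s'|s,a) + log z_s[s'])
        log_za = reward[:, None] + logsumexp(log_p + log_zs[None, :, None], axis=1)

        # log z_s[s] = log(sum_a z_a[s, a] + [s is terminal])
        log_zs = logsumexp(log_za, axis=1)
        log_zs[terminals] = np.logaddexp(log_zs[terminals], 0.0)

    # local action probabilities; states that never reach a terminal keep
    # log_zs = -inf and may need masking before use
    return np.exp(log_za - log_zs[:, None])

If the absorbing state has positive reward, the partition values still grow with the horizon, but representing them in log-space avoids the overflow that produces NaN.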
