
Comments (11)

ppaquette avatar ppaquette commented on August 11, 2024
  • 3200% seems to be the maximum speed allowed by fceux.
  • Not sure how to skip the intro screens faster. Right now I start fceux, force values into the world and level memory addresses, skip frames until the timer starts to decrease, and then establish the pipe with Python.

Maximum efficiency could be achieved by coding directly in fceux (Lua), but that would make it incompatible with gym / Python.

Another option is probably to run iterations in parallel and update weights on a central server.

from gym-super-mario.

gabegrand avatar gabegrand commented on August 11, 2024

Hi Philip, we're still having some trouble achieving enough training iterations in a reasonable amount of time. The RL methods we're using are pretty standard (Q-learning, SARSA, approximate Q-learning), and they would be difficult to parallelize, since they rely on iterative updates where each step depends on the previous one. Training time is a major bottleneck for us, since we need to test several different algorithm variations and hyperparameter configurations in order to write our final paper for our course at Harvard.

If the emulator speed is already maxed out, we should look into ways to decrease the amount of time spent on the intro screens. Is there any way to skip all frames before the timer starts? Another approach to consider would be to keep the emulator open for the entire duration of training and manually reset the number of lives to 3 after every death. That way, the pipe with Python would only have to be established once for the whole training sequence. What do you think?

ppaquette avatar ppaquette commented on August 11, 2024

I'll try to see if I can skip the intro by saving the memory state.

What kind of % improvement do you need vs the current speed?
Should I only optimize the tiles version?

gabegrand avatar gabegrand commented on August 11, 2024

Currently, it takes approx. 4500s = 75 mins to train 100 iterations on World 1-3. That particular level has a cliff right at the beginning, so Mario usually dies very quickly, which means that the training speed we achieved of 45s / iteration on that level is probably a best case scenario. In order to make it serviceable, we'd ideally like to see a 10x increase in training speed, which would allow us to get close to 1000 iterations per hour. We would need that kind of speed in order to test out different combinations of hyperparameters of our model.
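A quick back-of-the-envelope sketch of the arithmetic above (variable names are illustrative):

```python
# Speedup target implied by the numbers above.
base_s_per_iter = 4500 / 100               # 45 s per iteration on World 1-3
target_s_per_iter = base_s_per_iter / 10   # a 10x speedup -> 4.5 s per iteration
iters_per_hour = 3600 / target_s_per_iter
print(iters_per_hour)  # -> 800.0, i.e. on the order of 1000 iterations per hour
```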

We're only using the tiles version, so from our perspective, it's fine if you'd like to focus on optimizing that. Thank you again for your efforts.

ppaquette avatar ppaquette commented on August 11, 2024

I should have something ready by Tuesday or Wednesday.

ppaquette avatar ppaquette commented on August 11, 2024

OpenAI released 'Universe' today, a way to convert any game to a gym env through a docker container (communication is done through VNC).

I'll do a quick patch for you, but I'll probably need to make this env compatible with Universe in the future.

Universe also has an A3C (asynchronous advantage actor-critic) learning algorithm available that can be run across a cluster (see https://github.com/openai/universe-starter-agent).

ppaquette avatar ppaquette commented on August 11, 2024

Pushed the fix to the 'gabegrand' branch. Mario is on steroids.

  • The info var now returns an 'iteration' key, which is incremented when the level is restarted.
  • You don't need to call reset(), except to initialize the env the first time.
  • To check whether Mario has completed the level, check the value of the distance key when the iteration key is incremented. The flag pole is 40 'meters' before the castle. The castle distances are listed here (e.g. if Mario reaches 2474 (2514 - 40) in level 1-3, he has successfully completed the level).

Here is a quick python script that works for me:

import gym
import ppaquette_gym_super_mario

env = gym.make('ppaquette/SuperMarioBros-1-3-Tiles-v0')
env.reset()

curr_iter = 1
max_iter = 2
while curr_iter <= max_iter:
    action = env.action_space.sample()  # random action
    obs, rew, done, info = env.step(action)
    if info['iteration'] > curr_iter:  # the level was restarted
        print('Max Distance Achieved', info['distance'])
        curr_iter = info['iteration']

env.close()
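The completion rule above can be factored into a small helper for the post-episode check. This is a sketch of my own: the function name and constant are illustrative, and the castle distance 2514 for World 1-3 is taken from the example above.

```python
# Completion rule from the comment above: the flag pole sits 40 'meters'
# before the castle, so the level is complete if the final distance
# reached (castle_distance - 40).
CASTLE_DISTANCE_1_3 = 2514  # castle position for World 1-3 (per the example above)

def completed_level(final_distance, castle_distance=CASTLE_DISTANCE_1_3):
    """Return True if the distance implies Mario reached the flag pole."""
    return final_distance >= castle_distance - 40

print(completed_level(2474))  # True  (2514 - 40 = 2474)
print(completed_level(2000))  # False
```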

gabegrand avatar gabegrand commented on August 11, 2024

Hi Philip, thanks for the fix. I see that the number of lives now starts at 9x, and that the info var / iteration key is behaving as expected. However, I'm still not really seeing an increase in the game speed - it seems to be running at roughly the same speed as before. Are you seeing significant speedup in the framerate on your end?

ppaquette avatar ppaquette commented on August 11, 2024

I just ran 100 episodes (random actions) on level 1-3, and it took 391.89 seconds (so ~ 900 episodes / hour).

Try running it in a cloud VM and compare it to my benchmark using random actions.

  • The game saves an initial state when first loaded, and reloads that state when Mario dies (much faster than killing and restarting fceux at every iteration), which should give roughly a 2x increase.
  • The game repeats every action for 6 frames (1 processed, 6 repeated; it used to be 1 processed, 1 repeated), which should give roughly a 5-6x increase.
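The benchmark above (100 episodes in 391.89 s, roughly 900 episodes/hour) can be reproduced with a small timing harness. This is a sketch: benchmark and run_one_episode are names of my own, and run_one_episode stands in for a real episode loop against the env.

```python
import time

def benchmark(run_one_episode, n_episodes=100):
    """Time n_episodes calls and return (elapsed seconds, episodes per hour)."""
    start = time.perf_counter()
    for _ in range(n_episodes):
        run_one_episode()
    elapsed = time.perf_counter() - start
    return elapsed, n_episodes * 3600.0 / elapsed

# Sanity check on the reported figure: 100 episodes in 391.89 s.
print(round(100 * 3600.0 / 391.89))  # -> 919, i.e. ~900 episodes / hour
```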

gabegrand avatar gabegrand commented on August 11, 2024

On World 1-3, running on my machine, the script you provided took 910.974s for 100 episodes. That's not quite up to what you recorded, but there is definitely some speedup from the previous version. Also, we no longer have to close and re-open the emulator every time, which is nice.

I have a couple questions / comments about the new code:

  • Previously, we had written our code to duplicate actions for a certain number of frames (otherwise, Mario's behavior is too frantic/jumpy, since he takes a new action on every frame). However, you mentioned that the game now repeats every action for 6 frames. Should we now remove this behavior from our code, to avoid repeating actions for too many frames?

  • Does the done variable in obs, rew, done, info = env.step(action) ever return True? Or do we need to just replace all done conditions with if (info['iteration'] > curr_iter)?

  • env.close() doesn't seem to work. The emulator just beachballs and never closes.

from gym-super-mario.

ppaquette avatar ppaquette commented on August 11, 2024
  1. Yes, you should remove the skip-actions logic from your code. If you want to adjust the value, just edit this line: https://github.com/ppaquette/gym-super-mario/blob/gabegrand/ppaquette_gym_super_mario/lua/super-mario-bros.lua#L48
  2. done will always be False, since reset() no longer needs to be called. You need to replace all done checks with info['iteration'] > curr_iter.
  3. Just kill the fceux process, or press Ctrl-C and close it manually.
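To make point 2 concrete, here is a sketch of an episode loop keyed on info['iteration'] instead of done. A minimal stand-in env (FakeMarioEnv, my own invention) is used so the control flow runs without fceux; with the real env you would call env.step(action) exactly as in the earlier script.

```python
class FakeMarioEnv:
    """Stand-in (not the real env): bumps 'iteration' every 5 steps."""
    def __init__(self):
        self.steps = 0

    def step(self, action):
        self.steps += 1
        info = {'iteration': 1 + self.steps // 5,
                'distance': 40 * self.steps}
        return None, 0.0, False, info  # done stays False, as noted above

def run_episodes(env, max_iter):
    """Run until 'iteration' exceeds max_iter; return distance reached per episode."""
    distances, curr_iter, last_distance = [], 1, 0
    while True:
        _, _, done, info = env.step(0)
        if info['iteration'] > curr_iter:    # episode boundary, not `done`
            distances.append(last_distance)  # distance reached before the restart
            curr_iter = info['iteration']
            if curr_iter > max_iter:
                return distances
        last_distance = info['distance']

print(run_episodes(FakeMarioEnv(), 2))  # -> [160, 360]
```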
