
Comments (5)

AjayTalati commented on August 25, 2024

Unfortunately, I couldn't get any improvement from the suggested change after 15 epochs.

The numbers in results.csv are all roughly the same as in the first epoch, and if I play the .pkl file from the 15th epoch, it looks like it's got Alzheimer's.

Just out of curiosity, I wonder what your views are on adding Monte Carlo Tree Search to select the training data? It seems to significantly improve performance, and there are a few well-documented implementations. It's interesting too:

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

from deep_q_rl.

alito commented on August 25, 2024

The changes don't touch the learning code. They just change what is recorded to results.csv.

(15 epochs is not enough to see much, at least for Breakout. At 60 epochs it should be pretty clear, though.)


AjayTalati commented on August 25, 2024

Hi, sorry! Yes, I understand that script a bit better now.

I'm trying the changes now with Pong; that should be quick. I just wonder if you have a working ROM of Othello, or another game to do quick tests with?

One last thing: is there a simple way of restarting the training of a saved network? It takes 36 hrs to run 100 epochs of Breakout, and it got up to a per-episode average of around 60.

I tried to restart the training by replacing the environment and agent process start-up lines, p3 and p4 in ale_run.py, with:

p3 = subprocess.Popen(['./rl_glue_ale_experiment.py', '--epoch_length', '50000'], env=my_env)

p4 = subprocess.Popen(['./rl_glue_ale_agent.py', "--nn_file", "/home/ajay/PythonProjects/deep_q_rl/_01-02-14-20_0p0001_0p9/network_file_100.pkl"], env=my_env)

This loads the network and restarts the training fine, but it still hasn't managed to get above the 60 level. Is this to be expected? Is it because the history of the dataset class is empty when the experiment is started again?

Output of results.csv:

epoch num_episodes total_reward reward_per_episode
1 10 439 43.9
2 10 445 44.5
3 10 459 45.9
4 9 421 46.7777777778
5 9 406 45.1111111111
6 10 420 42
7 10 400 40
8 9 462 51.3333333333
9 9 423 47
10 9 440 48.8888888889
11 10 438 43.8
12 10 396 39.6
13 9 380 42.2222222222
14 10 397 39.7
15 9 431 47.8888888889
16 8 459 57.375
17 10 418 41.8
18 11 346 31.4545454545
19 11 342 31.0909090909
20 11 401 36.4545454545
21 8 460 57.5
22 12 294 24.5
23 9 477 53
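
Each epoch in the listing above is averaged over only 8 to 12 test episodes, so the per-epoch figures are noisy. A minimal sketch of one way to read the trend, assuming the rows are copied by hand from the listing and run under Python 3. The `moving_average` helper and the window of 5 are illustrative choices, not part of deep_q_rl:

```python
# (epoch, num_episodes, total_reward) rows copied from the results.csv listing above.
rows = [
    (1, 10, 439), (2, 10, 445), (3, 10, 459), (4, 9, 421), (5, 9, 406),
    (6, 10, 420), (7, 10, 400), (8, 9, 462), (9, 9, 423), (10, 9, 440),
    (11, 10, 438), (12, 10, 396), (13, 9, 380), (14, 10, 397), (15, 9, 431),
    (16, 8, 459), (17, 10, 418), (18, 11, 346), (19, 11, 342), (20, 11, 401),
    (21, 8, 460), (22, 12, 294), (23, 9, 477),
]


def moving_average(values, window):
    """Trailing mean over each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[end - window:end]) / window
        for end in range(window, len(values) + 1)
    ]


# Recompute reward_per_episode from the raw columns instead of trusting the
# rounded last column.
per_episode = [total / episodes for _, episodes, total in rows]
smoothed = moving_average(per_episode, window=5)

# Weight by episode count so epochs with few episodes do not dominate.
overall = sum(total for _, _, total in rows) / sum(ep for _, ep, _ in rows)

print("overall reward per episode: %.2f" % overall)
for last_epoch, value in zip(range(5, len(rows) + 1), smoothed):
    print("epochs %2d-%2d: %.2f" % (last_epoch - 4, last_epoch, value))
```

A flat or falling smoothed curve would be consistent with the plateau described above, though 23 epochs of such small samples may not be enough to say much either way.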


spragunr commented on August 25, 2024

@alito Thanks for pointing this out. I've addressed it in master by changing

self.holdout_data = self.data_set.random_batch(holdout_size * self.batch_size)[0]

to

self.holdout_data = self.data_set.random_batch(holdout_size)[0]

and increasing holdout size to 3200. I think this is a bit clearer because the batch size doesn't really have anything to do with this calculation.
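
The effect of that change can be sketched with a toy data set. This is only an illustration: `ToyDataSet` is a hypothetical stand-in whose `random_batch` returns a tuple with the states first (matching the `[0]` indexing above), and the values `batch_size = 32` and the old holdout size of 100 are assumptions, not taken from the thread. Only the new holdout size of 3200 comes from the comment.

```python
import random


class ToyDataSet:
    """Hypothetical stand-in for the replay data set; not deep_q_rl's class."""

    def __init__(self, states):
        self.states = list(states)

    def random_batch(self, count):
        """Return (states, indices) for `count` randomly chosen entries."""
        indices = [random.randrange(len(self.states)) for _ in range(count)]
        return [self.states[i] for i in indices], indices


data_set = ToyDataSet(range(10000))

# Old form: the number of holdout states depends on two settings at once.
batch_size = 32         # assumed value, for illustration only
old_holdout_size = 100  # assumed value, for illustration only
old_holdout = data_set.random_batch(old_holdout_size * batch_size)[0]

# New form: holdout_size alone states how many states are held out.
new_holdout_size = 3200  # value given in the comment above
new_holdout = data_set.random_batch(new_holdout_size)[0]

print(len(old_holdout), len(new_holdout))
```

Under these assumed values both calls draw the same number of states; the new form just makes that number explicit in a single setting.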

As for cuDNN: that's a good idea, but it is unlikely to make it to the top of my todo list soon. For one thing, I'm still on CUDA 5.5. I would be willing to incorporate a pull request if you are interested in taking this on.


spragunr commented on August 25, 2024

@AjayTalati I don't remember where I found my ROM files, but if you google around you should be able to find any game you are interested in without too much difficulty.

It looks like your approach to resuming learning is correct. It may be that performance doesn't improve because the network has reached a local maximum.
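
If the replay history really is empty after a restart (the thread does not confirm this), one option would be to checkpoint it alongside the network parameters. A minimal sketch under that assumption: `ReplayMemory`, `save_checkpoint` and `load_checkpoint` are hypothetical names, the parameters are a plain dict rather than a real network, and none of this is deep_q_rl's own API.

```python
import os
import pickle
import random
import tempfile
from collections import deque


class ReplayMemory:
    """Hypothetical fixed-size store of (state, action, reward, next_state)."""

    def __init__(self, capacity, transitions=()):
        # deque(maxlen=...) silently drops the oldest entries once full.
        self.transitions = deque(transitions, maxlen=capacity)

    def add(self, transition):
        self.transitions.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.transitions), batch_size)


def save_checkpoint(path, params, memory):
    """Pickle the parameters and the replay history as plain Python data."""
    state = {
        "params": params,
        "capacity": memory.transitions.maxlen,
        "transitions": list(memory.transitions),
    }
    with open(path, "wb") as handle:
        pickle.dump(state, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_checkpoint(path):
    """Rebuild the parameters and replay memory saved by save_checkpoint."""
    with open(path, "rb") as handle:
        state = pickle.load(handle)
    memory = ReplayMemory(state["capacity"], state["transitions"])
    return state["params"], memory


# Toy usage: fill a memory, save it, and restore it from disk.
memory = ReplayMemory(capacity=1000)
for step in range(50):
    memory.add((step, step % 4, 1.0, step + 1))
params = {"w": [0.1, 0.2, 0.3]}  # placeholder for real network weights

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
save_checkpoint(path, params, memory)
restored_params, restored_memory = load_checkpoint(path)
print(len(restored_memory.transitions))
```

A full Atari replay history can be very large, so in practice the checkpoint size and write time would need checking before adopting something like this.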

