Mean Q calculations different from paper (or incorrect?) about deep_q_rl HOT 5 CLOSED

spragunr commented on August 25, 2024

Mean Q calculations different from paper (or incorrect?)

from deep_q_rl.

Comments (5)

AjayTalati commented on August 25, 2024

Unfortunately, I couldn't get any improvement from the suggested change after 15 epochs.

The numbers in results.csv are all roughly the same as in the first epoch, and if I play the .pkl file from the 15th epoch, it looks like it's got Alzheimer's.

Just out of curiosity, what are your views on adding Monte Carlo Tree Search for selecting training data? It seems to significantly improve performance, and there are a few well-documented implementations. It's interesting too:

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning


alito commented on August 25, 2024

The changes don't touch the learning code. They just change what is recorded to results.csv.

(15 epochs is not enough to see much, at least for breakout. At 60 epochs it should be pretty clear though)


AjayTalati commented on August 25, 2024

Hi, sorry! Yes, I understand that script a bit better now.

I'm trying the changes now with Pong, which should be quick. Do you have a working ROM of Othello, or another game that's good for quick tests?

One last thing: is there a simple way of restarting the training of a saved network? It takes 36 hrs to run 100 epochs of Breakout, and it got up to a per-episode average of around 60.

I tried to restart the training by replacing the environment and agent process startup lines p3 and p4 in ale_run.py with:

p3 = subprocess.Popen(['./rl_glue_ale_experiment.py', '--epoch_length', '50000'], env=my_env)

p4 = subprocess.Popen(['./rl_glue_ale_agent.py', "--nn_file", "/home/ajay/PythonProjects/deep_q_rl/_01-02-14-20_0p0001_0p9/network_file_100.pkl"], env=my_env)

This loads the network and restarts the training fine, but it still hasn't managed to get above the 60 level. Is this to be expected? Is it because the history of the dataset class is empty when the experiment is started again?

Output of results.csv:

epoch  num_episodes  total_reward  reward_per_episode
1      10            439           43.9
2      10            445           44.5
3      10            459           45.9
4      9             421           46.7777777778
5      9             406           45.1111111111
6      10            420           42
7      10            400           40
8      9             462           51.3333333333
9      9             423           47
10     9             440           48.8888888889
11     10            438           43.8
12     10            396           39.6
13     9             380           42.2222222222
14     10            397           39.7
15     9             431           47.8888888889
16     8             459           57.375
17     10            418           41.8
18     11            346           31.4545454545
19     11            342           31.0909090909
20     11            401           36.4545454545
21     8             460           57.5
22     12            294           24.5
23     9             477           53
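To check whether there is any trend in these numbers, a quick sketch like the following can pool the rewards over epochs. The rows are copied from the table above; the helper name `pooled_mean` is just something made up for illustration and is not part of deep_q_rl.

```python
# (epoch, num_episodes, total_reward) rows copied from results.csv above.
rows = [
    (1, 10, 439), (2, 10, 445), (3, 10, 459), (4, 9, 421), (5, 9, 406),
    (6, 10, 420), (7, 10, 400), (8, 9, 462), (9, 9, 423), (10, 9, 440),
    (11, 10, 438), (12, 10, 396), (13, 9, 380), (14, 10, 397), (15, 9, 431),
    (16, 8, 459), (17, 10, 418), (18, 11, 346), (19, 11, 342), (20, 11, 401),
    (21, 8, 460), (22, 12, 294), (23, 9, 477),
]


def pooled_mean(subset):
    """Reward per episode over a group of epochs (total reward / total episodes)."""
    episodes = sum(n for _, n, _ in subset)
    reward = sum(r for _, _, r in subset)
    return reward / episodes


half = len(rows) // 2
print("all epochs:  ", pooled_mean(rows))
print("first half:  ", pooled_mean(rows[:half]))
print("second half: ", pooled_mean(rows[half:]))
```

Pooling weights each epoch by its episode count, which is less noisy than eyeballing the per-epoch column when an epoch only contains 8 to 12 episodes.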


spragunr commented on August 25, 2024

@alito Thanks for pointing this out. I've addressed it in master by changing

self.holdout_data = self.data_set.random_batch(holdout_size * self.batch_size)[0]

to

self.holdout_data = self.data_set.random_batch(holdout_size)[0]

and increasing the holdout size to 3200. I think this is a bit clearer because the batch size doesn't really have anything to do with this calculation.
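For reference, the mean Q statistic can be sketched roughly as below: take a fixed set of held-out states, compute the maximum Q value over actions for each one, and average. This is a minimal sketch and not the actual deep_q_rl code; `mean_max_q`, `toy_q_values`, the 8-dimensional states, and the linear "network" are all stand-ins assumed for illustration. Only the holdout size of 3200 comes from the change described above.

```python
import numpy as np


def mean_max_q(q_values_fn, holdout_states, chunk_size=32):
    """Average of max_a Q(s, a) over a fixed set of held-out states.

    q_values_fn is a hypothetical callable mapping an array of states with
    shape (n, ...) to Q values with shape (n, num_actions).
    chunk_size only controls how many states are evaluated per call.
    """
    total = 0.0
    for start in range(0, len(holdout_states), chunk_size):
        chunk = holdout_states[start:start + chunk_size]
        q = q_values_fn(chunk)
        total += np.max(q, axis=1).sum()
    return total / len(holdout_states)


# Toy stand-in for the network: a fixed linear map from states to 4 actions.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4))


def toy_q_values(states):
    return states @ weights


holdout = rng.normal(size=(3200, 8))  # 3200 held-out states
print(mean_max_q(toy_q_values, holdout))
```

The chunk size affects only how the evaluation is split up, not the resulting average, which is the sense in which the batch size is unrelated to this calculation.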

As for cuDNN: that's a good idea, but it is unlikely to make it to the top of my todo list soon. For one thing, I'm still on CUDA 5.5. I would be willing to incorporate a pull request if you are interested in taking this on.


spragunr commented on August 25, 2024

@AjayTalati I don't remember where I found my ROM files, but if you google around you should be able to find any game you are interested in without too much difficulty.

It looks like your approach to resuming learning is correct. It may be that performance doesn't improve because the network has reached a local optimum.
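On the question of the empty history: this thread doesn't confirm whether the dataset is restored on restart. If it turns out not to be, one option is to save the replay history alongside the network. The following is only a sketch of that idea using the standard pickle module; `ReplayMemory`, `save_checkpoint`, and `load_checkpoint` are hypothetical names and are not part of deep_q_rl.

```python
import os
import pickle
import tempfile
from collections import deque


class ReplayMemory:
    """Hypothetical fixed-capacity store of (state, action, reward, terminal)."""

    def __init__(self, capacity):
        self.transitions = deque(maxlen=capacity)

    def add(self, state, action, reward, terminal):
        self.transitions.append((state, action, reward, terminal))

    def __len__(self):
        return len(self.transitions)


def save_checkpoint(path, network_params, memory):
    """Write the network parameters and the replay memory to a single file."""
    with open(path, "wb") as f:
        pickle.dump({"network": network_params, "memory": memory}, f,
                    pickle.HIGHEST_PROTOCOL)


def load_checkpoint(path):
    """Return (network_params, memory) as stored by save_checkpoint."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    return checkpoint["network"], checkpoint["memory"]


# Small round trip: the oldest transition is dropped once capacity is exceeded.
memory = ReplayMemory(capacity=3)
for step in range(4):
    memory.add(state=step, action=0, reward=1.0, terminal=False)

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
save_checkpoint(path, {"w": [0.1, 0.2]}, memory)
restored_params, restored_memory = load_checkpoint(path)
print(len(restored_memory), restored_params)
```

A full-size replay memory of image frames would make for a very large file, so this is a trade-off between disk space and having to refill the history after a restart.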
