
ddpg-keras-torcs's Introduction

Using Keras and Deep Deterministic Policy Gradient to play TORCS

300 lines of Python code to demonstrate DDPG with Keras.

Please read the following blog post for details:

https://yanpanlau.github.io/2016/10/11/Torcs-Keras.html

Installation Dependencies:

  • Python 2.7
  • Keras 1.1.0
  • TensorFlow r0.10
  • gym_torcs

How to Run?

git clone https://github.com/yanpanlau/DDPG-Keras-Torcs.git
cd DDPG-Keras-Torcs
cp *.* ~/gym_torcs
cd ~/gym_torcs
python ddpg.py 

(Set the flag train_indicator=1 in ddpg.py if you want to train the network.)
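For reference, train_indicator is the default argument of the entry point in ddpg.py; a minimal sketch of how the flag is wired (paraphrased, not the verbatim source):

    # Sketch of ddpg.py's entry point: 1 means train the networks,
    # 0 means simply run with the saved weights.
    def playGame(train_indicator=0):
        pass  # training / inference loop elided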

ddpg-keras-torcs's People

Contributors: yanpanlau

ddpg-keras-torcs's Issues

Reward and action do not pair?

Hi,
From the gym_torcs.py code I can see how env.step() works:

        # Apply the Agent's action into torcs
        client.respond_to_server()
        # Get the response of TORCS
        client.get_servers_input()

If I'm not mistaken, this code gets the reward immediately after the action is applied.
So the reward should have no relationship with the action?
If so, why does the result come out right?
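For context, a minimal sketch of the full step order as gym_torcs.py implements it (paraphrased; compute_reward is an illustrative placeholder, not the repo's function name). The sensors are read after the action takes effect, so the reward computed from them does depend on the action:

    # Paraphrased flow of TorcsEnv.step():
    client.respond_to_server()    # 1. send the agent's action to TORCS
    client.get_servers_input()    # 2. read sensors AFTER the action took effect
    obs = client.S.d              # 3. the new observation reflects the action
    reward = compute_reward(obs)  # 4. hence the reward depends on the action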

screen input

Hi,

I am trying to use the raw screen image as input to train TORCS based on your code, but every time TORCS restarts (every 3 episodes), the initial screen shows an awkward camera pose and I need to press F2 to switch it.

Is there any solution to this? Thanks.

Unlearning After Completing One Lap

Hello,
I have a problem which I don't know where to start on. The start of training is fine: the reward increases with almost every episode, and the car gets further and further before leaving the lane. However, after the car gets good enough to complete a lap (on a simple track), it unlearns what it has learned. It becomes very unstable: the steering outputs saturate, causing massive oscillation, and the car spins out of control very quickly. Does anyone have any idea what could be the cause of this? I am using TensorFlow itself rather than Keras, with Xavier initialization on my network weights. The network structures and activation functions are the same as those in the DDPG paper.

Unable to run the package. TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Hi,

Thanks for your package and the article along.

Unfortunately, I am not able to test your package; I receive the following error after issuing the command python ddpg.py in the gym_torcs directory:

--
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.885
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.73GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:948] Ignoring visible gpu device (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
Now we build the model
Traceback (most recent call last):
File "ddpg.py", line 162, in
playGame()
File "ddpg.py", line 52, in playGame
actor = ActorNetwork(sess, state_dim, action_dim, BATCH_SIZE, TAU, LRA)
File "/home/learning/gym_torcs/ActorNetwork.py", line 25, in init
self.model , self.weights, self.state = self.create_actor_network(state_size, action_size)
File "/home/learning/gym_torcs/ActorNetwork.py", line 54, in create_actor_network
V = merge([Steering,Acceleration,Brake],mode='concat')
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1528, in merge
name=name)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1188, in init
self.add_inbound_node(layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 154, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1275, in call
return K.concatenate(inputs, axis=self.concat_axis)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 716, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1029, in concat
dtype=dtypes.int32).get_shape(
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 637, in convert_to_tensor
as_ref=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 702, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 110, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 99, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

--

Can you tell me what the problem might be?

Sincerely,
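For context: this is the tf.concat argument-order change in TensorFlow 1.0. The Keras 1.x backend calls tf.concat(axis, tensors), which newer TensorFlow rejects, so pinning the versions in the README (Keras 1.1.0, TensorFlow r0.10) avoids the error. Alternatively, under Keras 2 the merge call can be rewritten with the Concatenate layer; a hedged sketch of that rewrite, assuming Steering, Acceleration and Brake are the three output tensors in create_actor_network:

    from keras.layers import Concatenate

    # Keras 2 replacement for merge([...], mode='concat'):
    V = Concatenate()([Steering, Acceleration, Brake])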

Some errors when running on Keras 2.0.3

When I tried your code under Keras 2.0.3 with the TensorFlow 1.0.1 backend, an error occurred.
The Python version is 3.6.

Which part should I modify if I want to run it correctly and why?

Thanks a lot.

Here is the detailed information:

Traceback (most recent call last):
File "ddpg.py", line 162, in
playGame()
File "ddpg.py", line 52, in playGame
actor = ActorNetwork(sess, state_dim, action_dim, BATCH_SIZE, TAU, LRA)
File "/home/test/gym_torcs/ActorNetwork.py", line 25, in init
self.model , self.weights, self.state = self.create_actor_network(state_size, action_size)
File "/home/test/gym_torcs/ActorNetwork.py", line 51, in create_actor_network
Steering = Dense(1,activation='tanh',init=lambda shape, name: normal(shape, scale=1e-4, name=name))(h1)
File "/home/test/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py", line 551, in call
self.build(input_shapes[0])
File "/home/test/anaconda3/lib/python3.6/site-packages/keras/layers/core.py", line 827, in build
constraint=self.kernel_constraint)
File "/home/test/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py", line 384, in add_weight
weight = K.variable(initializer(shape), dtype=K.floatx(), name=name)
TypeError: <lambda>() missing 1 required positional argument: 'name'
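The cause: Keras 2 initializers are called with the shape only, so the two-argument lambda passed as init no longer matches. A hedged Keras 2 style rewrite of the layer using the built-in RandomNormal initializer (the parameter names are Keras 2's; h1 is the hidden layer from the original code):

    from keras.initializers import RandomNormal
    from keras.layers import Dense

    # Keras 2 replaces init=<two-arg lambda> with kernel_initializer,
    # which is invoked with the shape alone.
    Steering = Dense(1, activation='tanh',
                     kernel_initializer=RandomNormal(mean=0.0, stddev=1e-4))(h1)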

The car runs very slowly.

I installed all the dependencies the code needs and just ran ddpg.py, but the car runs very slowly. Why does that happen? Is there any solution to this problem?

question on replay buffer

Hello, thanks for the nice and simple code. But I am confused about this line:

            batch = buff.getBatch(BATCH_SIZE)

It is at line 110 of ddpg.py. The problem is that at first the buffer holds only one experience. How can you sample a batch of data if the buffer has less data than the batch size?
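For reference, a sketch of how the repo's ReplayBuffer typically handles this (a paraphrase, not the verbatim source): when fewer experiences are stored than the batch size, it simply samples everything available.

    import random
    from collections import deque

    class ReplayBufferSketch(object):
        """Paraphrase of the ReplayBuffer sampling behaviour."""
        def __init__(self):
            self.buffer = deque()
            self.num_experiences = 0

        def getBatch(self, batch_size):
            # With fewer stored experiences than batch_size, return them
            # all; otherwise draw a uniform random sample of batch_size.
            if self.num_experiences < batch_size:
                return random.sample(self.buffer, self.num_experiences)
            return random.sample(self.buffer, batch_size)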

TORCS is continuously relaunching!

Hi,

I am trying to run your ddpg code. I set up TORCS and all required packages. When I run torcs in my terminal, it runs well and I can play without any problem. But whenever I try to execute the ddpg code, the TORCS environment continuously relaunches. I have not been able to find the issue.

Here is the snapshot: (screenshot omitted)

I also studied gym_torcs.py, where the reset() functions are written, as well as autostart.sh and snakeoil3_gym.py, but couldn't resolve it. Can you please tell me if I'm missing something?

Backward image

Hi, thanks a lot for uploading your code!
During training or testing, after every episode there seems to be another run where the agent appears to be looking backwards, or from a different perspective.
Also, when I run the code initially, it is not able to drive smoothly on the track, despite setting train to zero.
I would be glad if you could help!
Thanks :)

DDPG replication

Hi,

I believe that in DDPG the value function output is a single scalar, not the same size as the action. Hence this line in CriticModel.py should be:

V = Dense(1,activation='linear')(h3)

Correspondingly, in ddpg.py the definition of y_t can be changed to:

y_t = np.zeros((states.shape[0],1))

Although I'm not sure how this would affect learning, I believe this is the right way to replicate DDPG.

OU noise

I believe the OU noise should be something like this:

action_noise = theta * (mu - action) + sigma * np.sqrt(dt) * np.random.normal(size=mu.shape)

We need the "dt" as Torcs uses dt = 0.2 seconds, not 1

Failed building wheel for Box2D-kengz

Hi Guys,

I have started working on Torcs using DDPG-Keras. This question is quite long, please bear with me.

As mentioned in the README, I followed the procedure below.

git clone https://github.com/yanpanlau/DDPG-Keras-Torcs.git
cd DDPG-Keras-Torcs
cp *.* ~/gym_torcs
cd ~/gym_torcs
python ddpg.py 

While running python ddpg.py I got the error below:

kk@kk-Lenovo-ideapad-320-15ISK:~/gym_torcs$ python ddpg.py 
Traceback (most recent call last):
  File "ddpg.py", line 1, in <module>
    from gym_torcs import TorcsEnv
  File "/home/kk/gym_torcs/gym_torcs.py", line 1, in <module>
    import gym
ImportError: No module named gym

To fix this error, I started installing gym by running the command below:

kk@kk-Lenovo-ideapad-320-15ISK:~/gym_torcs$ pip install gym[all]

But I ran into the errors below while installing gym:

  Failed building wheel for Box2D-kengz

Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-nlkdIE/Box2D-kengz/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-6P0gPk-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-build-nlkdIE/Box2D-kengz/

I tried running python ddpg.py again, but I am still getting the same error as before:

kk@kk-Lenovo-ideapad-320-15ISK:~/gym_torcs$ python ddpg.py 
Traceback (most recent call last):
  File "ddpg.py", line 1, in <module>
    from gym_torcs import TorcsEnv
  File "/home/kk/gym_torcs/gym_torcs.py", line 1, in <module>
    import gym
ImportError: No module named gym

Please help me to fix this error.

Thank you,
KK

<lambda>() takes exactly 2 arguments (1 given)

when I run "python ddpg.py" I got this problem, the detail is

File "/home/f84106612/test03/keras/engine/base_layer.py", line 249, in add_weight
weight = K.variable(initializer(shape), # 2018.7.25
TypeError: <lambda>() takes exactly 2 arguments (1 given)

How can I solve this? Thank you.

Run errors!

I get this error when I run the program. Can anyone help me? Thanks!

''cp: cannot open '/home/bcwang/.torcs/config/graph.xml' for reading: Permission denied
cp: cannot remove '/home/bcwang/.torcs/config/graph.xml': Permission denied
chmod: changing permissions of '/home/bcwang/.torcs/config/graph.xml': Operation not permitted
fopen(config/graph.xml) failed ''

Can I just use "pip install torcs"?

I failed to install libglu1-mesa-dev, so I just used "pip install torcs", but it seems many problems occur.
When I run snakeoil3_gym.py, it looks like this and always relaunches:
/usr/bin/python2.7 /home/hanzy/software/gym_torcs-master/snakeoil3_gym.py
Waiting for server on 3101............
Count Down : 5
Waiting for server on 3101............
Count Down : 4
Waiting for server on 3101............
Count Down : 3
Waiting for server on 3101............
Count Down : 2
Waiting for server on 3101............
Count Down : 1
Waiting for server on 3101............
Count Down : 0
Waiting for server on 3101............
Count Down : -1
relaunch torcs
Waiting for server on 3101............
Count Down : 4
Waiting for server on 3101............
Count Down : 3
Waiting for server on 3101............
Count Down : 2
Waiting for server on 3101............
Count Down : 1
Waiting for server on 3101............
Count Down : 0
Waiting for server on 3101............
Count Down : -1
relaunch torcs
Waiting for server on 3101............

When I run ddpg.py, the error below happens:
/usr/bin/python2.7 /home/hanzy/software/gym_torcs-master/ddpg.py
Using TensorFlow backend.
Traceback (most recent call last):
File "/home/hanzy/software/gym_torcs-master/ddpg.py", line 10, in
from keras.engine.training import collect_trainable_weights
ImportError: cannot import name collect_trainable_weights

Process finished with exit code 1

WHY!!!

Code does not converge to the optimal policy (TensorFlow 0.12.1 and Keras 1.1.0)

Hi,

I tested your code after downgrading TensorFlow to version 0.12.1 and Keras to version 1.1.0. The program then ran without errors, but it does not converge to the optimal policy, as you can see in the following result:

('Episode', 4, 'Step', 268993, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.098257193098788914, 'Loss', 0.021010929718613625)
('Episode', 4, 'Step', 268994, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0065970534669316794, 'Loss', 0.042674191296100616)
('Episode', 4, 'Step', 268995, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.00031545438826979712, 'Loss', 0.024418037384748459)
('Episode', 4, 'Step', 268996, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.019568415147379711, 'Loss', 0.014433514326810837)
('Episode', 4, 'Step', 268997, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.018506277374548623, 'Loss', 0.041520103812217712)
('Episode', 4, 'Step', 268998, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.010767906475774695, 'Loss', 0.015593868680298328)
('Episode', 4, 'Step', 268999, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.002420423072458714, 'Loss', 0.0040719900280237198)
('Episode', 4, 'Step', 269000, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.004262467498310836, 'Loss', 0.037771023809909821)
('Episode', 4, 'Step', 269001, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.026749610641322499, 'Loss', 0.028206927701830864)
('Episode', 4, 'Step', 269002, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0030027620641139251, 'Loss', 0.015061482787132263)
('Episode', 4, 'Step', 269003, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.081173965529346151, 'Loss', 0.036130554974079132)
('Episode', 4, 'Step', 269004, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.10121600213326937, 'Loss', 0.014408881776034832)
('Episode', 4, 'Step', 269005, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.030823456600634971, 'Loss', 0.0036484464071691036)
('Episode', 4, 'Step', 269006, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.01758625249384254, 'Loss', 0.016869166865944862)
('Episode', 4, 'Step', 269007, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.0040950106730622809, 'Loss', 0.0061152027919888496)
('Episode', 4, 'Step', 269008, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0015352772668305245, 'Loss', 0.0020759631879627705)
('Episode', 4, 'Step', 269009, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.013906418235719725, 'Loss', 0.013349947519600391)
('Episode', 4, 'Step', 269010, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.00019903168120680918, 'Loss', 0.033619172871112823)
('Episode', 4, 'Step', 269011, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.00017914316922484539, 'Loss', 0.026164039969444275)
('Episode', 4, 'Step', 269012, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.003975938587499582, 'Loss', 0.0075099128298461437)
('Episode', 4, 'Step', 269013, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0088532675322910981, 'Loss', 0.051988624036312103)
('Episode', 4, 'Step', 269014, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.0022390205788955062, 'Loss', 0.021969024091959)
('Episode', 4, 'Step', 269015, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.033058362379728756, 'Loss', 0.0012654899619519711)
('Episode', 4, 'Step', 269016, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.019517203300999583, 'Loss', 0.012628044933080673)
('Episode', 4, 'Step', 269017, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0091726067061909666, 'Loss', 0.032372094690799713)

As you can see, the actions are saturated. Could you help me fix this problem?

Thanks,

How to get some host car data in global coordinates?

I am wondering how to get global coordinates for the host car, such as X, Y, and Z.
(I think the Z position can be obtained from a sensor value.)
I need to calculate acceleration and jerk values from global-coordinate data,
so I want to pull the relevant values from the TORCS environment.

In the cpp files, car->_accel_x means the host car's X-axis acceleration in global coordinates.
And, if possible, I hope to also get car->_yaw, car->_yaw_rate, and car->_accel_y.
However, I am having trouble getting these values from the cpp files into the Python files.

Could anyone help me with this problem?

Thanks

malloc(): memory corruption

After training the network for about 800 episodes, an error happened: Error in `/usr/local/lib/torcs/torcs-bin': malloc(): memory corruption: 0x00007f9da0c6ff10.
I also tested my parameters and found that the learning effect is not good. What method can solve this memory problem, so that I can continue training my network? Thanks!

ImportError: cannot import name collect_trainable_weights

When I run the code, it displays:

Using TensorFlow backend.
Traceback (most recent call last):
File "ddpg.py", line 11, in
from keras.engine.training import collect_trainable_weights
ImportError: cannot import name collect_trainable_weights

Please tell me how to solve it. Thank you.

Cannot See the Cars.

Hi everyone, I've just run this interesting code and decided to learn from it. But I'm wondering: can you guys see the cars in the game? I have checked gym_torcs.py, but found nothing useful for this issue.

Are there any options I should set to visualize the cars? Thanks~

Has anyone tried using image input?

Hello,

Has anyone tried using images as input to train the network? I have worked on this for a couple of days, using a 3-layer conv net to process the image in place of the original low-dimensional states, but it doesn't work properly.

How to switch TORCS from GUI mode to text mode to accelerate the training process

If you assign "False" to the "vision" variable in the "ddpg.py" file, you may want to disable the GUI mode during training process. But it doesn't work if you only change the "vision" variable. Here I find a way to significantly switch torcs to text mode when training: Modify the 33rd line in "gym_torcs.py" like this:
os.system('torcs -T -nofuel -nolaptime &')
Here we append a new option "-T" for "torcs" command to enable the text mode.
It's really easy, but have bothered me for minutes. Hope this can help u guys getting started with the code.

Some errors when running on TensorFlow 1.1.0 and Keras 2.0.5

When I tried the code under Keras 2.0.3 and TensorFlow 1.1.0, an error occurs.
The Python version is 3.5.2.

When I execute the command python ddpg.py,
the detailed information shown below appears:

Traceback (most recent call last):
File "ddpg.py", line 10, in
from keras.engine.training import collect_trainable_weights
ImportError: cannot import name 'collect_trainable_weights'

I cannot solve this problem, and I want to know which part I should modify to run it correctly, and why.

Thank you very much!!!

Training does not learn anything

Hi,

After training for about 75000 steps, only 8 episodes have passed and the agent has not learnt anything useful. My question is: does the current ddpg.py file have the correct hyper-parameters? The saved .h5 model in this repo does indeed work well, so I was wondering if it was trained using the same hyper-parameters.

A part of the output:

('Episode', 8, 'Step', 76169, 'Action', array([[ 0.65779338,  0.13147314,  0.72898711]]), 'Reward', -0.11919999787368826, 'Loss', 0.76000809669494629)
('Episode', 8, 'Step', 76170, 'Action', array([[ 0.72513585,  0.05947028,  0.72827826]]), 'Reward', -0.23072561310301709, 'Loss', 0.12980197370052338)
('Episode', 8, 'Step', 76171, 'Action', array([[ 0.71874098,  0.12731329,  0.73625542]]), 'Reward', 0.012969128165889354, 'Loss', 0.46787607669830322)
('Episode', 8, 'Step', 76172, 'Action', array([[ 0.6552096 ,  0.11575382,  0.72765934]]), 'Reward', -0.0072166273178536217, 'Loss', 0.9427763819694519)

How the TORCS screen looks right now: (screenshot omitted)

In short, the training does not seem to proceed smoothly. Can you please verify if the current version of the code trains well? If not, I suspect the hyper-parameters may have changed between the model which worked well and this version of the code.

Thanks,

Training problem!

When I run ddpg.py, I find the car cannot move (the indicator is zero). But when I change the indicator to one, the terminal outputs "Timeout for client answer". Do you have a solution for this problem? Thanks!
Like this:

'Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer'

Episode error

(screenshot omitted)
When I tried to run your file ddpg.py following your introduction, there were some problems. As shown in the picture, the episode stays at zero all the time, and the car doesn't move at all. I have only been learning RL for a short time, so I don't understand why. Would you please help me? This is my WeChat number: wa7739977526. Thank you very much.

getting error for vision

Changing vision = True gives the following error:

  File "ddpg2.py", line 192, in <module>
    playGame()
  File "ddpg2.py", line 84, in playGame
    ob = env.reset(relaunch=True)   #relaunch TORCS every 3 episode because of the memory leak error
  File "/home/user/gym_torcs/gym_torcs.py", line 192, in reset
    self.observation = self.make_observaton(obs)
  File "/home/user/gym_torcs/gym_torcs.py", line 275, in make_observaton
    image_rgb = self.obs_vision_to_image_rgb(raw_obs[names[8]])
  File "/home/user/gym_torcs/gym_torcs.py", line 232, in obs_vision_to_image_rgb
    r = image_vec[0:len(image_vec):3]
TypeError: object of type 'float' has no len()
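A hedged guess at the usual cause: with vision enabled, TORCS itself must be relaunched with the -vision flag so the server actually streams pixel data; otherwise the image field arrives as a plain float and the slicing above fails. gym_torcs.py's relaunch logic is roughly the following (a paraphrase, not the verbatim source; vision is the flag passed to TorcsEnv):

    import os

    # Paraphrase of gym_torcs.py's TORCS relaunch: the -vision switch must
    # match the vision flag passed to TorcsEnv, or no image data is sent.
    if vision is True:
        os.system('torcs -nofuel -nodamage -nolaptime -vision &')
    else:
        os.system('torcs -nofuel -nolaptime &')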

collect_trainable_weights seems to be out of date

Hi Lau,

Thanks for sharing your work! Quite nice!
I found that in Keras 1.1.2 the collect_trainable_weights function has been removed, and in fact this function is not used in your source, so why keep the line importing it?
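For anyone hitting the ImportError reported in several issues above: since ddpg.py never calls the function, deleting the import line is enough. If the trainable weights are ever needed, Keras models expose them directly; a minimal sketch (actor as in ddpg.py):

    # Under Keras >= 1.1.2 this line in ddpg.py can simply be removed:
    # from keras.engine.training import collect_trainable_weights

    # If trainable weights are needed, every Keras model provides them:
    weights = actor.model.trainable_weights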

How to test the trained model

Hi,

Thanks for this excellent model. It was really helpful.

My question is:

As part of initial testing, I uncommented lines 146 to 155 in gym_torcs.py and trained the model for just 200 episodes by setting train_indicator = 1 and running python ddpg.py, so that I could test the model after training with fewer episodes. Then, in order to test, I commented lines 146 to 155 again, set train_indicator to 0, and ran python ddpg.py.

However, I am not able to see any movement of the car. Could anybody please help me validate this?

Thank you,
KK

Dimension mismatch for self.action_grads = tf.gradients(self.model.output, self.action)

I feel like I must be missing something, considering nobody has brought this up yet. However, I'm getting an error when trying to feed my states and a_for_grad into grads = critic.gradients(states, a_for_grad).

Why is self.action_grads = tf.gradients(self.model.output, self.action) created with self.model.output if we are then feeding in the states, which are of different dimensions (and which is what is throwing the error for me)?
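For context, a sketch of how this is usually wired in the repo's CriticNetwork (a paraphrase, not verbatim): the critic model takes both a state input and an action input, so evaluating dQ/da means feeding both placeholders; the gradient's shape matches the action input, not the states.

    import tensorflow as tf

    class CriticGradientsSketch(object):
        # Paraphrase of CriticNetwork's gradient plumbing:
        # Q = model([state, action]).
        def __init__(self, sess, model, state_ph, action_ph):
            self.sess, self.model = sess, model
            self.state, self.action = state_ph, action_ph
            # Differentiate the Q output w.r.t. the action input only.
            self.action_grads = tf.gradients(self.model.output, self.action)

        def gradients(self, states, actions):
            # Both placeholders are fed, even though the derivative is
            # taken w.r.t. the action input alone.
            return self.sess.run(self.action_grads, feed_dict={
                self.state: states,
                self.action: actions,
            })[0]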

Model doesn't seem to save weights

Hi -

Whenever I load previous weights, they don't seem to actually be loaded. Case in point: my previous run ended with ~11k reward on the last few episodes. When I load the model that it saves (lines 67-70 in ddpg.py), it reverts to <300 reward per episode. Please advise.

Lou

Track change failed

I trained a model on the track CG Speedway number 1, and it works well when tested on this track. But when I change the track (e.g. to CG track 2), the trained model doesn't seem to work: the car runs off the track. Does that mean I have to train a different model for every track?

ImportError: No module named 'gym'

When I run python ddpg.py, I have the following problem:
terry@terry:~/gym_torcs$ python3 ddpg.py
Traceback (most recent call last):
File "ddpg.py", line 1, in
from gym_torcs import TorcsEnv
File "/home/terry/gym_torcs/gym_torcs.py", line 1, in
import gym
ImportError: No module named 'gym'

How am I supposed to solve this issue?

Some problems with Keras 2.0.3

Using TensorFlow backend.
Traceback (most recent call last):
File "ddpg.py", line 10, in
from keras.engine.training import collect_trainable_weights
ImportError: cannot import name collect_trainable_weights

gym_torcs error: cannot send car state after about 10700 steps

Hi, I am trying vision input for the DDPG algorithm. Everything seems OK, but after about 10770 steps something goes wrong with the socket communication. TORCS complains: Error: cannot send car state / Timeout for client answer.

I checked the file scr_server.cpp (around line 570) in gym_torcs, and the reason is that the socket function sendto() fails (the following code):

if (sendto(listenSocket[index], stateString.c_str(),stateString.length() + 1, 0, 
        (struct sockaddr *) &clientAddress[index],sizeof(clientAddress[index])) < 0)
std::cerr << "Error: cannot send car state";

After that step, every call to sendto() fails. Has anyone had the same problem? Thanks.

Note: the networks start to train after 10000 steps. I have tried batch_size = 16 and 64, but ended up with the same problem.

Why do we resample new actions before finding their gradients?

a_for_grad = actor.model.predict(states)

Hi Ben, great work!
I wonder why you resample new actions at line 128; this step has already been done at line 89:

a_t_original = actor.model.predict(s_t.reshape(1, s_t.shape[0]))

From the way I understand the paper, I think it is better to find the gradients of the actions with OU noise, which are the ones added to the buffer. Correct me if I am wrong!
And btw, I am using a good portion of your code in my DDPG project. Thanks a million; I will make sure to cite you!

Training Time and hardware used for training

Hi,
Great work! I'm trying to run it (training) on my laptop. It's been training for 2 hours and it is still on the first episode (17000 steps). By the way, the car is not moving; it used to, but it just stopped at some point and I don't know why. I did not change anything in the code apart from setting train_indicator from 0 to 1, and I deleted the .h5 files to train from zero knowledge, so I'm wondering if this is normal. I would also really like to know how long training took for you, and on what hardware.
My laptop is a Samsung Series 7 Ultra notebook.
