
packtpublishing / hands-on-intelligent-agents-with-openai-gym

Code for the book Hands-On Intelligent Agents with OpenAI Gym: get started with and learn to build deep reinforcement learning agents using PyTorch.

Home Page: https://www.packtpub.com/big-data-and-business-intelligence/hands-intelligent-agents-openai-gym

License: MIT License

Languages: Python 96.77%, Shell 3.23%
Topics: intelligent-agents, deep-reinforcement-learning, openai-gym, carla-simulator, dqn, pytorch, learning-agents, pytorch-a3c, pytorch-carla, advantage-actor-critic

hands-on-intelligent-agents-with-openai-gym's Introduction

Hands-on Intelligent Agents with OpenAI Gym (HOIAWOG)

(Figures: the book cover, and examples of agents you will learn to develop)

Topics Covered

HOIAWOG!: Your guide to developing AI agents using deep reinforcement learning. Implement intelligent agents using PyTorch to solve classic AI problems, play console games like Atari, and perform tasks such as autonomous driving using the CARLA driving simulator.

(Figures: Chapter 8 demo; BookAuthority "Best Reinforcement Learning eBooks of All Time" badge)

Chapter list:


Citing

If you use the code samples in your work or want to cite the book, please use:

@book{Palanisamy:2018:HIA:3285236,
 author = {Palanisamy, Praveen},
 title = {Hands-On Intelligent Agents with OpenAI Gym: Your Guide to Developing AI Agents Using Deep Reinforcement Learning},
 year = {2018},
 isbn = {178883657X, 9781788836579},
 publisher = {Packt Publishing},
}
Other formats:

MLA
Palanisamy, Praveen. Hands-On Intelligent Agents with OpenAI Gym: Your guide to developing AI agents using deep reinforcement learning. Packt Publishing Ltd, 2018.
APA
Palanisamy, P. (2018). Hands-On Intelligent Agents with OpenAI Gym: Your guide to developing AI agents using deep reinforcement learning. Packt Publishing Ltd.
Chicago
Palanisamy, Praveen. Hands-On Intelligent Agents with OpenAI Gym: Your guide to developing AI agents using deep reinforcement learning. Packt Publishing Ltd, 2018.
Harvard
Palanisamy, P., 2018. Hands-On Intelligent Agents with OpenAI Gym: Your guide to developing AI agents using deep reinforcement learning. Packt Publishing Ltd.
Vancouver
Palanisamy P. Hands-On Intelligent Agents with OpenAI Gym: Your guide to developing AI agents using deep reinforcement learning. Packt Publishing Ltd; 2018 Jul 31.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781788836579

hands-on-intelligent-agents-with-openai-gym's People

Contributors

packt-itservice, packtutkarshr, praveen-palanisamy


hands-on-intelligent-agents-with-openai-gym's Issues

Transfer a policy to CARLA

Hi @praveen-palanisamy, thanks for doing great work.

I have a policy from an RL algorithm saved in .pk format and I want to see whether the car drives well in CARLA. Do you know how I can do that, or could you refer me to a good tutorial for this purpose?

test_agent_proc.start()
test_agent_proc.join()

Thanks.

A2C agent doesn't act

Hi,

For some reason, when I try to run the trained A2C agent in CARLA it doesn't take any actions; it just sits there doing nothing. These are my terminal outputs:

(rl_gym_book) amakri@amakri-Zephyrus-M-GM501GS:~/Hands-On-Intelligent-Agents-with-OpenAI-Gym-master/ch8$ python a2c_agent.py --env Carla-v0 --gpu-id 0
Loaded Advantage Actor-Critic model state from trained_models/A2C_Carla-v0.ptm which fetched a best mean reward of: -0.08981079755592294 and an all time best reward of: 1.0340544147491457
Initializing new Carla server...
Error connecting: (localhost:56639) failed to connect: [Errno 111] Connection refused, attempt 0
Start pos 36 ([0.0, 3.0]), end 40 ([0.0, 3.0])
Starting new episode...
actor0:Episode#:0 ep_reward:5.085380562726176e-10 mean_ep_rew:5.085380562726176e-10 best_ep_reward:1.0340544147491457
ERROR: tcpserver 56641 : error reading message: End of file
Start pos 36 ([0.0, 2.0]), end 40 ([-1.0, 2.0])
Starting new episode...
actor0:Episode#:1 ep_reward:1.574227504116339e-09 mean_ep_rew:1.0413827801944784e-09 best_ep_reward:1.0340544147491457
ERROR: tcpserver 56641 : error reading message: End of file
Start pos 36 ([0.0, 2.0]), end 40 ([-1.0, 2.0])
Starting new episode...
actor0:Episode#:2 ep_reward:1.461007756921785e-20 mean_ep_rew:6.94255186801189e-10 best_ep_reward:1.0340544147491457
ERROR: tcpserver 56641 : error reading message: End of file
Start pos 36 ([0.0, 2.0]), end 40 ([-1.0, 2.0])
Starting new episode...
actor0:Episode#:3 ep_reward:-1.8116123954554487e-19 mean_ep_rew:5.206913900556014e-10 best_ep_reward:1.0340544147491457
ERROR: tcpserver 56641 : error reading message: End of file
Start pos 36 ([0.0, 2.0]), end 40 ([-1.0, 2.0])
Starting new episode...
actor0:Episode#:4 ep_reward:8.738404666586943e-10 mean_ep_rew:5.9132120537622e-10 best_ep_reward:1.0340544147491457
ERROR: tcpserver 56641 : error reading message: End of file
Start pos 36 ([0.0, 2.0]), end 40 ([-1.0, 2.0])
Starting new episode...

Any ideas on why this is happening?

ERROR in Class Q_Learner

Following your example, the code keeps throwing this error:

      1 agent = Q_Learner(env)
----> 2 learned_policy = train(agent, env)

<string> in train(agent, env)

<string> in learn(self, obs, action, reward, next_obs)

IndexError: too many indices for array 

in the line
td_target = reward + self.gamma * np.max(self.Q[discretized_next_obs])
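
For context, a minimal sketch of the tabular Q-learning update this error points at, assuming the continuous observation is discretized into a tuple of per-dimension bin indices before indexing the Q array (the variable names are illustrative, not the book's exact code). The IndexError usually means the object used to index Q carries more indices than the array has dimensions, for example the raw observation instead of the discretized tuple:

import numpy as np

# Hypothetical discretization: 2-D continuous observation, 30 bins per
# dimension, 3 discrete actions. The Q array must be indexed with one bin
# index per observation dimension plus (optionally) the action.
NUM_BINS, NUM_ACTIONS = 30, 3
Q = np.zeros((NUM_BINS, NUM_BINS, NUM_ACTIONS))

def discretize(obs, low, high, num_bins=NUM_BINS):
    """Map a continuous observation to a tuple of per-dimension bin indices."""
    ratios = (np.asarray(obs, dtype=np.float64) - low) / (high - low)
    bins = (ratios * (num_bins - 1)).astype(int)
    return tuple(np.clip(bins, 0, num_bins - 1))  # a tuple, not a raw array

def learn(obs, action, reward, next_obs, low, high, gamma=0.98, alpha=0.05):
    s = discretize(obs, low, high)
    s_next = discretize(next_obs, low, high)
    # If s_next carries more indices than Q has dimensions (e.g. the raw,
    # undiscretized observation is passed in), NumPy raises
    # "IndexError: too many indices for array" on the next line.
    td_target = reward + gamma * np.max(Q[s_next])
    td_error = td_target - Q[s + (action,)]
    Q[s + (action,)] += alpha * td_error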

ch7 error during reset

When I run python carla-gym/carla_gym/envs/carla_env.py

I always have this error:

Initializing new Carla server...
Start pos 36 ([0.0, 3.0]), end 40 ([0.0, 3.0])
Starting new episode...
Error during reset: Traceback (most recent call last):
  File "/Hands-On-Intelligent-Agents-with-OpenAI-Gym/ch7/carla-gym/carla_gym/envs/carla/client.py", line 174, in _read_sensor_data
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Hands-On-Intelligent-Agents-with-OpenAI-Gym/ch7/carla-gym/carla_gym/envs/carla_env.py", line 223, in reset
    return self.reset_env()
  File "/Hands-On-Intelligent-Agents-with-OpenAI-Gym/ch7/carla-gym/carla_gym/envs/carla_env.py", line 290, in reset_env
    image, py_measurements = self._read_observation()
  File "/Hands-On-Intelligent-Agents-with-OpenAI-Gym/ch7/carla-gym/carla_gym/envs/carla_env.py", line 397, in _read_observation
    measurements, sensor_data = self.client.read_data()
  File "/Hands-On-Intelligent-Agents-with-OpenAI-Gym/ch7/carla-gym/carla_gym/envs/carla/client.py", line 127, in read_data
    return pb_message, dict(x for x in self._read_sensor_data())
  File "/Hands-On-Intelligent-Agents-with-OpenAI-Gym/ch7/carla-gym/carla_gym/envs/carla/client.py", line 127, in <genexpr>
    return pb_message, dict(x for x in self._read_sensor_data())
RuntimeError: generator raised StopIteration

Clearing Carla server state
ERROR:Initializing new Carla server...

Do you know how to fix it? Thanks a lot!
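
For reference, this traceback is the behaviour defined by PEP 479: from Python 3.7 onwards, a StopIteration raised inside a generator is re-raised as RuntimeError. A minimal sketch of the usual fix, assuming the sensor-reading generator in client.py signals exhaustion by raising StopIteration (the function and variable names below are illustrative, not the repo's exact code):

def read_sensor_stream(read_chunk):
    """Yield sensor messages until the stream is exhausted.

    Hypothetical stand-in for client.py's _read_sensor_data(); the real
    method reads framed data from CARLA's TCP stream client.
    """
    while True:
        data = read_chunk()
        if not data:
            # Pre-3.7 code often wrote `raise StopIteration` here; under
            # PEP 479 that now surfaces as
            # "RuntimeError: generator raised StopIteration".
            return  # a plain return ends the generator cleanly on every Python version
        yield data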

ModuleNotFoundError: No module named 'gym.envs.atari' still happening

I've seen this issue closed already but I tried the proposed solutions and am getting the same errors:

(base) profversaggi@ubuntu-nuc:~/OpenAIGym/gym$ python run_gym_env.py Alien-ram-v4 2000
/home/profversaggi/OpenAIGym/openai-gym/lib/python3.9/site-packages/ale_py/roms/__init__.py:94: DeprecationWarning: Automatic importing of atari-py roms won't be supported in future releases of ale-py. Please migrate over to using ale-import-roms OR an ALE-supported ROM package. To make this warning disappear you can run ale-import-roms --import-from-pkg atari_py.atari_roms. For more information see: https://github.com/mgbellemare/Arcade-Learning-Environment#rom-management
_RESOLVED_ROMS = _resolve_roms()
/home/profversaggi/OpenAIGym/gym/gym/envs/registration.py:505: UserWarning: WARN: The environment Alien-ram-v4 is out of date. You should consider upgrading to version v5 with the environment ID ALE/Alien-ram-v5.
logger.warn(
Traceback (most recent call last):
File "/home/profversaggi/OpenAIGym/gym/run_gym_env.py", line 17, in
run_gym_env(sys.argv)
File "/home/profversaggi/OpenAIGym/gym/run_gym_env.py", line 9, in run_gym_env
env = gym.make(argv[1]) # Name of the environment supplied as 1st argument
File "/home/profversaggi/OpenAIGym/gym/gym/envs/registration.py", line 676, in make
return registry.make(id, **kwargs)
File "/home/profversaggi/OpenAIGym/gym/gym/envs/registration.py", line 520, in make
return spec.make(**kwargs)
File "/home/profversaggi/OpenAIGym/gym/gym/envs/registration.py", line 139, in make
cls = load(self.entry_point)
File "/home/profversaggi/OpenAIGym/gym/gym/envs/registration.py", line 55, in load
mod = importlib.import_module(mod_name)
File "/home/profversaggi/anaconda3/lib/python3.9/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1030, in _gcd_import
File "", line 1007, in _find_and_load
File "", line 984, in _find_and_load_unlocked

ModuleNotFoundError: No module named 'gym.envs.atari'

(base) profversaggi@ubuntu-nuc:~/OpenAIGym/gym$ pip list
Package Version


ale-py 0.7.4
atari-py 0.2.6

Box2D 2.3.10
box2d-py 2.3.5
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.12
cloudpickle 1.6.0
Cython 0.29.28
future 0.18.2
glfw 2.5.1
gym 0.23.1
gym-notices 0.0.6
idna 3.3
imageio 2.16.1
importlib-metadata 4.11.3
importlib-resources 5.6.0
lockfile 0.12.2
lz4 4.0.0
mujoco-py 1.50.1.68
numpy 1.22.3
opencv-python 4.5.5.64
Pillow 9.0.1
pip 22.0.4
pycparser 2.21
pygame 2.1.0
pyglet 1.5.0
requests 2.27.1
scipy 1.8.0
setuptools 61.0.0
six 1.16.0
torch 1.11.0
torchvision 0.12.0
typing_extensions 4.1.1
urllib3 1.26.9
wheel 0.37.1
zipp 3.7.0

It might be helpful to note that I'm using python virtual environments.
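
As the deprecation and upgrade warnings in the log already hint, recent gym releases moved the Atari environments out of gym.envs.atari and into the ale-py package. A minimal hedged sketch of the newer invocation, assuming ale-py is installed and the ROMs have been imported (e.g. with ale-import-roms):

import gym

# With gym >= 0.21 plus ale-py installed, the Atari environments live under
# the "ALE/" namespace; the legacy "Alien-ram-v4" id depends on gym.envs.atari,
# which newer gym releases no longer ship.
env = gym.make("ALE/Alien-ram-v5")

obs = env.reset()
for _ in range(1000):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()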

Car keeps turning right when using non-discrete actions

Hi @praveen-palanisamy ,

I want to test how the model performs in a continuous action space, so I changed the config to this:

ENV_CONFIG = {
    "discrete_actions": False,
    "use_image_only_observations": True,  # Exclude high-level planner inputs & goal info from the observations
    "server_map": "/Game/Maps/" + city,
    "scenarios": [scenario_config["Lane_Keep_Town2"]],
    "framestack": 2,  # note: only [1, 2] currently supported
    "enable_planner": True,
    "use_depth_camera": False,
    "early_terminate_on_collision": True,
    "verbose": False,
    "render" : True,  # Render to display if true
    "render_x_res": 800,
    "render_y_res": 600,
    "x_res": 80,
    "y_res": 80,
    "seed": 1
}

However, I only see the car keep turning right after training it for around 10M steps. Do you have any idea how to solve this problem?

Thanks a lot!

About carla_env.py

I have two questions about carla_env.py:

1 - I am interested in trying other scenarios, such as turn or navigation. However, when I change the following part:

"scenarios": [scenario_config["Lane_Keep_Town2"]],

to:

"scenarios": [scenario_config["Curve_Poses_Town2"]],

I get an error. I have also changed the scenario in the JSON file to the following:

 "Curve_Poses_Town2": {
    "city": "Town02",
    "start_pos_id": 8,
    "end_pos_id": 24,
    "num_vehicles": 25,
    "num_pedestrians": 30,
    "weather_distribution": [0],
    "max_steps": 200
  },

2 - If I wanted to save the observations as, for example, 64x64x3, should I modify carla_env.py to the following:

    "framestack": 1,  # note: only [1, 2] currently supported
    "enable_planner": True,
    "use_depth_camera": False,
    "early_terminate_on_collision": True,
    "verbose": False,
    "render" : True,  # Render to display if true
    "render_x_res": 1800,
    "render_y_res": 1600,
    "x_res": 64,
    "y_res": 64,
    "seed": 1
}

In particular, does framestack mean the same thing as the RGB channel count?

I am asking because when I gather some observations as NumPy arrays and convert them to RGB images, I get meaningless images that are not images of the road.
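
For clarity, framestack is not the RGB channel count: with framestack: 2 two consecutive RGB frames are concatenated along the channel axis, giving a 6-channel observation, which is why interpreting it as a single RGB image yields meaningless pictures. A small illustrative sketch, assuming HWC layout:

import numpy as np

x_res, y_res = 64, 64
frame_prev = np.zeros((y_res, x_res, 3), dtype=np.uint8)  # RGB frame at t-1
frame_curr = np.zeros((y_res, x_res, 3), dtype=np.uint8)  # RGB frame at t

# framestack = 2: two RGB frames stacked along the channel axis -> 6 channels
obs = np.concatenate([frame_prev, frame_curr], axis=2)
print(obs.shape)  # (64, 64, 6)

# To visualize a single image, take one 3-channel slice, not the whole stack:
latest_rgb = obs[:, :, 3:6]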

wrapper for CARLA 0.9.x

Hello there!
First off, thank you for the dedicated work!
I am having an issue when trying to use the carla_env.py wrapper with CARLA 0.9.5. I am aware the wrapper was written for 0.8.x, but the sections in the book concerning version updates gave me hope that you might be able to help.

The error message upon running the carla_env.py script is as follows:

`Initializing new Carla server...
terminating with uncaught exception of type clmdep_msgpack::v1::type_error: std::bad_cast
Signal 6 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=115872 LargeMemoryPoolOffset=247008
Error during reset: Traceback (most recent call last):
File "/envs/carla_env.py", line 223, in reset
return self.reset_env()
File "/envs/carla_env.py", line 271, in reset_env
scene = self.client.load_settings(settings)
File "/envs/carla/client.py", line 75, in load_settings
return self._request_new_episode(carla_settings)
File "/envs/carla/client.py", line 160, in _request_new_episode
data = self._world_client.read()
File "/envs/carla/tcp.py", line 73, in read
header = self._read_n(4)
File "/envs/carla/tcp.py", line 91, in _read_n
raise TCPConnectionError(self._logprefix + 'connection closed')
carla.tcp.TCPConnectionError: (localhost:54854) connection closed

Clearing Carla server state
Initializing new Carla server...`
...
and the connection process restarts but keeps failing.

Do you have any suggestions for what one could try to fix this error?
Thanks in advance!

How to replace self._spec = lambda: None

Hi @praveen-palanisamy ,

I am trying to use multiprocessing with carla_env in ch8, but I hit this error:

File "/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'CarlaEnv.__init__.<locals>.Data'

I think it might be because the carla_env class uses a lambda in its __init__ method.

I tried to use:

def _spec():
     return 0

or

class NS(object):
    pass

self._spec = NS()

They all failed; I guess pickling these objects just doesn't work. Can you give me some ideas on how to fix this problem? What should I replace the lambda: None with?
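
A minimal sketch of the usual workaround: pickle can only serialize functions and classes defined at module level, so both the lambda and any class defined inside __init__ (such as the local Data class named in the traceback) need to be hoisted out of the method. The names below are illustrative, not the repo's exact code:

import types

def _null_spec():
    # Module-level replacement for `self._spec = lambda: None`; plain
    # module-scope functions pickle by reference, lambdas and locals do not.
    return None

class CarlaEnvSketch:
    def __init__(self):
        self._spec = _null_spec
        # A `class Data: ...` defined inside __init__ cannot be pickled
        # ("Can't pickle local object ..."); a SimpleNamespace, or a class
        # moved to module level, works with multiprocessing's pickler.
        self.data = types.SimpleNamespace()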

Deep Q-learning on Carla Env

Hello,

I intend to apply the Chapter 6 deep Q-learner algorithm to the CARLA environment. I noticed that the call below is commented out:

#agent.learn(obs, action, reward, next_obs, done)

At which point does the algorithm perform gradient descent and compute the new model parameters?

When I uncommented the line indicated above, I got this error message:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 6, 8, 8], but got 3-dimensional input of size [6, 84, 84] instead

generated at

function_approximator\cnn.py", line 34, in forward
x = self.layer1(x)

Could you please help?

Regards
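
For reference, the error says the convolutional layer received a single unbatched observation of shape [6, 84, 84] where it expects a batch dimension; a common remedy is to add one before the forward pass. A minimal hedged sketch, assuming a PyTorch CHW observation tensor:

import torch

obs = torch.zeros(6, 84, 84)        # one stacked-frame observation, (C, H, W)
batched_obs = obs.unsqueeze(0)      # -> (1, 6, 84, 84), the NCHW shape Conv2d expects
print(batched_obs.shape)            # torch.Size([1, 6, 84, 84])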

Carla agent performance and training time

@praveen-palanisamy thank you for your extremely helpful code base. I have some questions that I hope you could give some insights into:

  • I noticed that the reward function is the one introduced in the 2017 CARLA paper. The team who wrote the paper trained A2C with 10 agents for 10 million steps, and they noted that even then their final agent was not a good one. Were you able to achieve a reasonably good agent, and how long did training take?

  • I tried reimplementing non-async A2C with the same reward function and discrete action space, because my limited hardware can't handle multiple agents. Can I achieve the same level of performance as the async version of A2C? So far, after several hundred thousand steps, the agent seems to only move straight and cannot make any turns at all; is that expected behavior?

  • In the scenario file, I noticed that you only use one pose in each scenario; did you train on only one too? Currently my implementation switches back and forth between poses in the lists of straight poses, curve poses, etc. Will that make any difference?

Once again, thank you for your time and insights.

Question about ch7

Hi Praveen,
I have a basic question about Chapter 7. I am interested in collecting some observations and actions in CARLA, similar to a regular Gym environment. To achieve that, one can use the simple code below, but running it leads to the following error:

File "environment/carla_gym/envs/carla_env.py", line 326, in step_env
action = DISCRETE_ACTIONS[int(action)]
TypeError: only size-1 arrays can be converted to Python scalars

import random

import gym
import numpy as np

from environment import carla_gym  # registers Carla-v0
env = gym.make("Carla-v0")

def pick_random_action(t, current_action):
    # a = env.action_space.sample()
    # return a
    if t < 60:
        return np.array([0, 1, 0])

    if t % 5 > 0:
        return current_action

    rn = random.randint(0, 9)
    if rn in [0]:
        return np.array([0, 0, 0])
    if rn in [1, 2, 3, 4]:
        return np.array([0, 1, 0])
    if rn in [5, 6, 7]:
        return np.array([-1, 0, 0])
    if rn in [8]:
        return np.array([1, 0, 0])
    if rn in [9]:
        return np.array([0, 0, 1])


obs_data = []     # per-episode observation sequences
action_data = []  # per-episode action sequences
action = np.array([0, 1, 0])
for i_episode in range(500):
    print('-----')
    observation = env.reset()
    # env.render()
    t = 0
    done = False
    obs_sequence = []
    action_sequence = []
    while t < 300:
        t = t + 1
        action = pick_random_action(t, action)

        observation = observation.astype('float32') / 255.

        obs_sequence.append(observation)
        action_sequence.append(action)

        observation, reward, done, info = env.step(action)

    obs_data.append(obs_sequence)
    action_data.append(action_sequence)

    print("Episode {} finished after {} timesteps".format(i_episode, t + 1))
    print("Dataset contains {} observations".format(sum(map(len, obs_data))))

Do you know how I can fix it?
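
For reference, the TypeError comes from the step_env line quoted above: with "discrete_actions": True the wrapper indexes DISCRETE_ACTIONS with int(action), so it expects a single integer rather than a 3-element array. A hedged sketch of the discrete-action path (the index value is hypothetical):

import gym
from environment import carla_gym  # registers Carla-v0, as in the snippet above

env = gym.make("Carla-v0")
obs = env.reset()

# With "discrete_actions": True the wrapper runs DISCRETE_ACTIONS[int(action)],
# so the action passed to step() must be a single integer index:
obs, reward, done, info = env.step(4)  # 4 is a hypothetical index into DISCRETE_ACTIONS

# To keep vector-valued actions such as np.array([0, 1, 0]) instead, set
# "discrete_actions": False in ENV_CONFIG before the environment is created.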

Question about "import carla_gym" (register new env)

Hi,
When I run carla_env.py it works great. However, I am not able to run import carla_gym in python.

There is a sentence in a book which I have doubt about it:
You can then create new custom CARLA environments for each of those scenarios, which you can use with the usual gym.make(...) command after you have registered the custom environment, for example, gym.make("Carla-v0") .

In particular, my question is how I can import it in Python.

What do you mean by "registered"? Would you elaborate on this point?

Thanks.
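
For reference, "registering" means adding the environment ID to Gym's registry so that gym.make() can find it; a package usually does this in its __init__.py, which is why import carla_gym needs to succeed first. A minimal sketch, where the entry_point path is illustrative and should match where CarlaEnv actually lives in this repo:

# carla_gym/__init__.py (illustrative layout)
from gym.envs.registration import register

register(
    id="Carla-v0",
    entry_point="carla_gym.envs.carla_env:CarlaEnv",  # "module.path:ClassName"; adjust to the repo's layout
)

# Once `import carla_gym` has executed this registration, the usual call works:
#   import gym, carla_gym
#   env = gym.make("Carla-v0")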

Error when using deep_Q_learner.py to train an agent on Atari

Today I used the code in ch6/deep_Q_learner.py to train an agent on Atari, following the instructions in the book, but I ran into a problem when I ran the following command:
python deep_Q_learner.py --env RiverraidNoFrameskip-v4
This is the error output:
Traceback (most recent call last):
  File "deep_Q_learner.py", line 287, in <module>
    agent.replay_experience()
  File "deep_Q_learner.py", line 170, in replay_experience
    self.learn_from_batch_experience(experience_batch)
  File "deep_Q_learner.py", line 151, in learn_from_batch_experience
    self.Q_target(next_obs_batch).max(1)[0].data
TypeError: mul(): argument 'other' (position 1) must be Tensor, not numpy.ndarray

I'm not familiar with PyTorch; it seems to be related to the PyTorch version. The torch I installed in the virtualenv is 1.0 (via pip install torch torchvision). However, I ran the Shallow_Q_Learner.py introduced in the book successfully. Can anybody help me locate and solve the error? Thanks!
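
For reference, the TypeError indicates a raw NumPy array being multiplied with a torch.Tensor inside learn_from_batch_experience(). A hedged sketch of the kind of conversion that resolves it (the function and argument names are illustrative of the TD-target computation, not the book's exact code):

import numpy as np
import torch

def td_targets(reward_batch, done_batch, next_q_max, gamma=0.98):
    """Compute TD targets, converting NumPy batches to tensors first.

    next_q_max is assumed to be a torch.Tensor such as
    Q_target(next_obs_batch).max(1)[0]; multiplying it directly by a raw
    NumPy array triggers "TypeError: mul(): argument 'other' ... must be
    Tensor, not numpy.ndarray".
    """
    rewards = torch.from_numpy(np.asarray(reward_batch, dtype=np.float32))
    not_done = torch.from_numpy(1.0 - np.asarray(done_batch, dtype=np.float32))
    return rewards + gamma * not_done * next_q_max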

Change weather during the training

Hi @praveen-palanisamy ,

In my training environment, I want to change the weather during training. For instance, rain in the first 50 steps and sunshine in the last 50 steps. But I have no idea how to do it; could you help me?

Thank you very much.

TCP connection error in ch 7

CARLA version: 0.9.13
Platform/OS: Linux 20.04
Problem you have experienced: TCP Connection error : Port closed -> Carla server is crashing.
What you expected to happen: Rendering carla simulator.
Steps to reproduce: python3 carla-gym/carla_gym/envs/carla_env.py

This the error I am getting while trying to run carla_env.py:


Clearing Carla server state
Traceback (most recent call last):
  File "carla_env.py", line 554, in <module>
    obs = env.reset()
  File "carla_env.py", line 231, in reset
    raise error
  File "carla_env.py", line 226, in reset
    return self.reset_env()
  File "carla_env.py", line 274, in reset_env
    scene = self.client.load_settings(settings)
  File "/home/ubuntu/carla-gym/carla_gym/envs/carla/client.py", line 75, in load_settings
    return self._request_new_episode(carla_settings)
  File "/home/ubuntu/carla-gym/carla_gym/envs/carla/client.py", line 160, in _request_new_episode
    data = self._world_client.read()
  File "/home/ubuntu/carla-gym/carla_gym/envs/carla/tcp.py", line 73, in read
    header = self._read_n(4)
  File "/home/ubuntu/carla-gym/carla_gym/envs/carla/tcp.py", line 91, in _read_n
    raise TCPConnectionError(self._logprefix + 'connection closed')
carla.tcp.TCPConnectionError: (localhost:44405) connection closed
Killing live carla processes set()
Clearing Carla server state

Sensors problem ch7

Hi @praveen-palanisamy ,
I am trying to run the code on Windows. CARLA is running OK, but when I try to run python carla_env.py,
I run into the following problem:
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File ".\carla_env.py", line 228, in reset
return self.reset_env()
File ".\carla_env.py", line 295, in reset_env
image, py_measurements = self._read_observation()
File ".\carla_env.py", line 402, in _read_observation
measurements, sensor_data = self.client.read_data()
File "C:\Users\R4A\OpenAi-Gym\HOIAWOG\ch7\carla-gym\carla_gym\envs\carla\client.py", line 127, in read_data
return pb_message, dict(x for x in self._read_sensor_data())
File "C:\Users\R4A\OpenAi-Gym\HOIAWOG\ch7\carla-gym\carla_gym\envs\carla\client.py", line 127, in
return pb_message, dict(x for x in self._read_sensor_data())
RuntimeError: generator raised StopIteration

input shape of carla-v0.ptm

Hi Praveen Palanisamy, your book is great!
Actually, I was trying to load the model carla-v0.ptm into the "HomoNcomIndePOIntrxMASS3CTWN3-v0" env in macad-gym to see its behavior.
I found that obs.shape from the env is (168, 168, 3), yet the input shape of the pretrained model is (168, 168, 6).
I wonder what inputs the model requires: are those two RGB cameras, or one RGB camera plus a depth camera?
I'm not sure if you still have the model's info; please take a look, thanks!

Getting actions while training A2C RL

Hello @praveen-palanisamy

I'm now evaluating many strategies for training A2C in CARLA. Since visual evaluation through TensorBoard is not showing the expected progress in returns, I'm checking parts of the code that I could probably improve.

For example at this level:

action = action_distribution.sample()

It seems that actions are still being sampled randomly while training; aren't they supposed to be predicted by the current policy? Did I misunderstand or miss some details?

Thanks
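
For context, sampling from the policy's action distribution during training is intentional in A2C: the policy is stochastic and sampling is what provides exploration, while the distribution's parameters are still produced by the current policy network. For evaluation it is common to act deterministically instead. A small hedged sketch with a PyTorch distribution:

import torch
from torch.distributions import Normal

# Illustrative policy output for a single continuous action dimension.
mu, sigma = torch.tensor([0.2]), torch.tensor([0.5])
action_distribution = Normal(mu, sigma)

action = action_distribution.sample()      # training: stochastic, drives exploration
greedy_action = action_distribution.mean   # evaluation: deterministic choice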

Question about A3C (ch8)

Hi,

I have a question about the plots presented in Ch8, in the section of "Training and testing the deep n-step advantage actor-critic agent" in the book.

The TensorBoard plots in this section show three graphs: actor0/actor_loss, actor0/critic_loss, and actor0/ep_reward. My question is: what does the x-axis represent? I assume it is the number of frames. If so, it shows that for a simple Pendulum-v0 it takes roughly 3M frames to get some results; am I right? I am aware that A3C is not sample efficient, but this number seems kind of unreasonable.

Tensorboard with pytorch

Hello @praveen-palanisamy

I went through your book and it's really helpful.
I have a question regarding "Using TensorBoard for logging and visualizing a PyTorch RL agent's progress", p. 107 (Chapter 6). Which tool did you use to generate the graph reflecting the algorithm's performance? While TensorBoard is more or less obvious to use with TensorFlow, I haven't been able to find an equivalent tool for PyTorch.

Thank you in advance.
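
For reference, TensorBoard itself is framework-agnostic; on the PyTorch side the logging can be done with the tensorboardX package or, in newer PyTorch versions, the built-in torch.utils.tensorboard. A minimal sketch:

from torch.utils.tensorboard import SummaryWriter  # or: from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="logs/run0")
for step in range(1000):
    episode_reward = 0.01 * step  # placeholder value; log the agent's real metric here
    writer.add_scalar("main/ep_reward", episode_reward, step)
writer.close()
# View the curves with: tensorboard --logdir=logs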

Best episode reward didn't save

So I trained the asyn_a2c model for 1.6M steps and obtained a best reward of 1.105. After ending the training, I tried to run with --test and it says my best reward from the model is now 0.036. I also tried continuing the training and am still getting a lower best reward of 0.036 instead of 1.105. What am I missing?

Problem with carla_env.py

Congratulations on this work!
I'm just a beginner in the field of reinforcement learning and I have some basic questions.
I'm trying to get "a2c_gym.py" working in the CARLA environment on Windows.
I had a problem when calling carla_env.py due to "os.setsid" and "os.getpsid", which are specific to Linux.
I would like to know how I can replace them on Windows. What exactly is their role in the process?
Another request, please: how is the RL model saved after training? For example, in a CNN case the output is an .h5 file containing the resulting weights, which can then be used for further training or testing. For the RL implementation here, how do we get the model for further use?

Questions about how the reward is assigned in the CARLA environment

Hi, recently I have been concentrating on training my agent in CARLA, and my DQN-based agent seems to be doing not bad. But I still cannot understand why you calculate the reward in this way:

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/master/ch8/environment/carla_gym/envs/carla_env.py

def calculate_reward(self, current_measurement):
    """
    Calculate the reward based on the effect of the action taken using the previous and the current measurements
    :param current_measurement: The measurement obtained from the Carla engine after executing the current action
    :return: The scalar reward
    """
    reward = 0.0

    cur_dist = current_measurement["distance_to_goal"]

    prev_dist = self.prev_measurement["distance_to_goal"]

    if self.config["verbose"]:
        print("Cur dist {}, prev dist {}".format(cur_dist, prev_dist))

    # Distance travelled toward the goal in m
    reward += np.clip(prev_dist - cur_dist, -10.0, 10.0)

    # Change in speed (km/hr)
    reward += 0.05 * (current_measurement["forward_speed"] - self.prev_measurement["forward_speed"])

    # New collision damage
    reward -= .00002 * (
        current_measurement["collision_vehicles"] + current_measurement["collision_pedestrians"] +
        current_measurement["collision_other"] - self.prev_measurement["collision_vehicles"] -
        self.prev_measurement["collision_pedestrians"] - self.prev_measurement["collision_other"])

    # New sidewalk intersection
    reward -= 2 * (
        current_measurement["intersection_offroad"] - self.prev_measurement["intersection_offroad"])

    # New opposite lane intersection
    reward -= 2 * (
        current_measurement["intersection_otherlane"] - self.prev_measurement["intersection_otherlane"])

    return reward

Is this really a well-considered way to calculate the reward? For example, does it take into account traffic lights, speed limits and things like that, and what does each coefficient mean?
I want to formulate a comprehensive way to calculate the reward, but I don't have any good ideas; I'm looking forward to your reply. @praveen-palanisamy

Question about carla_env.py

Well, today I have been reviewing and debugging the implementation of the Gym-compatible environment, carla_env.py, and I have found several possible bugs.

First, it's not necessary to define GROUND_Z, which is mainly passed to the Planner instance. In fact, I think it is not advisable to use GROUND_Z, because different game maps have different z values; we cannot just fix the ground z value of the vehicle.

Second, I was confused by this code snippet:
elif self.config["enable_planner"]: distance_to_goal = self.planner.get_shortest_path_distance( [current_measurement.transform.location.x, current_measurement.transform.location.y, current_measurement.transform.location.z], [current_measurement.transform.orientation.x, current_measurement.transform.orientation.y, current_measurement.transform.orientation.z], [self.end_pos.location.x, self.end_pos.location.y, self.end_pos.location.z], [self.end_pos.orientation.x, self.end_pos.orientation.y, self.end_pos.orientation.z]) / 100
while you calculate the distance to destination by deviding 100?
I do need your help, thanks a lot!

Problems running a2c_agent.py for Carla-v0

Hi, today I studied a2c_agent.py, the actor-critic implementation. I tested it in several simple environments, and it seems this implementation needs millions of steps to reach a good policy.
Then I wanted to try it in the carla-gym environment, but I always get this kind of error output:

`Initializing new Carla server...
Start pos 36 ([0.0, 3.0]), end 40 ([0.0, 3.0])

Starting new episode...
actor0:Episode#:0 ep_reward:0.08275407035052769 mean_ep_rew:0.08275407035052769 best_ep_reward:0.08275407035052769
ERROR: tcpserver 35940 : error reading message: End of file

Start pos 36 ([0.0, 2.0]), end 40 ([-1.0, 2.0])
Starting new episode...
actor0:Episode#:1 ep_reward:0.003965873271226895 mean_ep_rew:0.04335997181087729 best_ep_reward:0.08275407035052769
ERROR: tcpserver 35940 : error reading message: End of file

Start pos 36 ([0.0, 2.0]), end 40 ([-1.0, 2.0])
Starting new episode...
actor0:Episode#:2 ep_reward:0.010448467731475838 mean_ep_rew:0.03238947045107681 best_ep_reward:0.08275407035052769
ERROR: tcpserver 35940 : error reading message: End of file
`

It seems the program has fallen into a loop of resuming; I have reviewed the code but failed to locate the bug. Could you help me?

Updates to setup

I had setup issues and fixed them as follows:

I ran the following after installing Conda on ubuntu:
source $HOME/anaconda/bin/activate
conda update conda -y

I also modified conda_env.yaml to remove all errors:
added cython as a dependency
changed the version of atari-py to atari-py==0.1.6

Different scene after calling reset_env()

Hi, recently I have made many improvements to my agent and to carla_env. Today I wanted to replace the env.reset() call with env.reset_env() in my training loop, because each call to env.reset() kills and sweeps away the old CARLA server process and then creates a new one, which is too time-consuming and occupies most of the CPU during stepping.
So I replaced env.reset() with env.reset_env(); however, something unexpected happened. The first time I reset the CARLA environment, it shows me the first scene, as in the picture below. The next time, after reset_env() (which does not kill the process), it shows me a second scene, as in the picture below; then every time I call reset_env(), the scene is always the second scene. I'm sure it is Town01. I have reviewed the code in carla_env.py and did not find the cause of this problem.

Weird RGB camera position

Hi @praveen-palanisamy , thanks for the great work!

  1. When I imshow the observation in carla_env.py:
def _read_observation(self):
    ...
    cv2.imshow("obs", to_rgb_array(observation))
    cv2.waitKey()

It produced a weird image like the one below (screenshot: obs_screenshot_06 03 2019).

Then I noticed that

camera2 = Camera("CameraRGB")
camera2.set_image_size(
    self.config["render_x_res"], self.config["render_y_res"])
camera2.set_position(30, 0, 130)
settings.add_sensor(camera2)

is different from the official client_example.py in 0.8.2 in line 267.

So I modified it to camera2.set_position(0.30, 0, 1.30) and the result of imshow became as expected (second screenshot: obs).

Could the reason for the weird observation be the incorrect camera position?

  2. Could you please explain what Straight_Poses_Town2 (and the other pose lists) in scenario.json are for?

  3. At the end of training the a2c_agent, do we expect the agent to drive safely (no collisions or lane crossings) around the town?
    If so, what is the Lane_Keep_Town2 scenario for?

Gathering observations and actions in CARLA

Collecting samples in OpenAI Gym is fairly straightforward, but what is the best way to collect observations and actions in a complex environment such as CARLA, covering all the driving scenarios such as straight, turn, navigation, and dynamic navigation?

I guess it would be beneficial to add some info or a code snippet to the Wiki or repository, as this is the first step of many RL algorithms.
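
In the meantime, a minimal hedged sketch of random-policy data collection against the Gym-wrapped CARLA environment (the import path and the use of a random policy are assumptions; scenario selection is governed by the "scenarios" entry in ENV_CONFIG, as in the config snippets quoted in other issues above):

import gym
import carla_gym  # import path may differ per chapter; importing registers Carla-v0

env = gym.make("Carla-v0")
observations, actions, rewards = [], [], []

for episode in range(10):
    obs, done = env.reset(), False
    while not done:
        action = env.action_space.sample()  # random policy; swap in a scripted or learned one
        next_obs, reward, done, info = env.step(action)
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        obs = next_obs
env.close()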

Problem with CARLA and Gym

Hi,
Thanks for great work!

I have successfully installed CARLA (0.8.2) on Ubuntu 18 and am able to see the car in the environment when I run:

~/software/CARLA_0.8.2/CarlaUE4.sh

However, when I run this command:

~/HOIAWOG/ch7$ python carla-gym/carla_gym/envs/carla_env.py

I get the following message:

`Traceback (most recent call last):
File "/home/baheri/HOIAWOG/ch7/carla-gym/carla_gym/envs/carla/transform.py", line 18, in
from . import carla_server_pb2 as carla_protocol
File "/home/baheri/HOIAWOG/ch7/carla-gym/carla_gym/envs/carla/carla_server_pb2.py", line 6, in
from google.protobuf import descriptor as _descriptor
ImportError: No module named 'google'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "carla-gym/carla_gym/envs/carla_env.py", line 32, in
from carla.client import CarlaClient
File "/home/baheri/HOIAWOG/ch7/carla-gym/carla_gym/envs/carla/client.py", line 14, in
from . import sensor
File "/home/baheri/HOIAWOG/ch7/carla-gym/carla_gym/envs/carla/sensor.py", line 19, in
from .transform import Transform, Translation, Rotation, Scale
File "/home/baheri/HOIAWOG/ch7/carla-gym/carla_gym/envs/carla/transform.py", line 20, in
raise RuntimeError('cannot import "carla_server_pb2.py", run '
RuntimeError: cannot import "carla_server_pb2.py", run the protobuf compiler to generate this file`

Not sure what happened... It seems that the code is not able to connect with the client.

Any help appreciated.

DDPG CARLA action space (Ch 8)

Hi!
Thanks for your great work.

I wanted to collect data (i.e., a vast range of observations and a vast range of actions) in CARLA to test an algorithm. There are two options:

Option 1: Use human-recorded data. For example, the following link, where people have gathered a good dataset in CARLA:

https://github.com/carla-simulator/imitation-learning

In their dataset, each observation comes with 28 keys:

  1. Steer, float
  2. Gas, float
  3. Brake, float
  4. Hand Brake, boolean
  5. Reverse Gear, boolean
  6. Steer Noise, float
  7. Gas Noise, float
  8. Brake Noise, float
  9. Position X, float
  10. Position Y, float
  11. Speed, float
  12. Collision Other, float
  13. Collision Pedestrian, float
  14. Collision Car, float
  15. Opposite Lane Inter, float
  16. Sidewalk Intersect, float
  17. Acceleration X,float
  18. Acceleration Y, float
  19. Acceleration Z, float
  20. Platform time, float
  21. Game Time, float
  22. Orientation X, float
  23. Orientation Y, float
  24. Orientation Z, float
  25. High level command, int ( 2 Follow lane, 3 Left, 4 Right, 5 Straight)
  26. Noise, Boolean ( If the noise, perturbation, is activated, (Not Used) )
  27. Camera (Which camera was used)
  28. Angle (The yaw angle for this camera)

My immediate question is: which of these are considered in the Chapter 8 DDPG example? What was the action space in Chapter 8? I do not think you considered all 9 possible actions in that code?

Option 2: We can also collect data with a random policy... but it is important for me to have a diverse range of observations and actions. How can I do that in carla_gym? The fixed starting and ending positions prevent me from collecting a diverse range of observations and actions. Does the code only consider throttle, steer, and brake? Could you please clarify this point?

Kind regards,
