
dreamer-torch's Introduction

Hi there 👋

I am Yoon, Jaesik (윤재식). I am currently working as a senior machine learning developer at SAP, and I usually research a variety of ways to improve the generality of Artificial Intelligence. This GitHub account is used for that purpose, such as sharing our proposed models or reproduced source code of published papers. I hope you and your family who visit here stay healthy, and that my logs are helpful! Details of my work are on My HomePage / Google Scholar.

dreamer-torch's People

Contributors

jsikyoon, jsikyoon-sap, tachikakamin


dreamer-torch's Issues

Request for the hyperparameters of Humanoid-Walk

When I was using your code, I only found the hyperparameters for Walker-Walk, but not for Humanoid-Walk. The official paper reports results for the Humanoid-Walk environment. So could you please provide the Humanoid-Walk hyperparameters in your config.yaml file?

Help with implementing the latest dreamerv2

I have implemented dreamerv2's current TensorFlow code here. Your code helped a lot with the parts of TF that can't be translated 1-to-1. I tried to keep the torch code as faithful to the TensorFlow implementation as possible. The model starts training, but after 20k steps the return gradually drops instead of going up. I tried the cheetah and cartpole_swingup environments up to 100k steps on a few seeds. I also compared all the curves with the TensorFlow results, and most of them look similar. The world model, actor, and critic losses go down, and the grad norms are not crazy. I will try to attach some logs soon.

License?

Hi Jaesik,

This looks very promising. Thank you. Is there a license which would allow us to use the code?

Failed to reproduce results on Atari Pong

Hi,
thank you very much for contributing a PyTorch implementation of Dreamer-v2!
However, I have found it hard to reproduce the results on Atari Pong. I noticed the previous discussion in issue #2 and tried configurations with both 32-bit and 16-bit precision. However, I still failed to reproduce the results you have provided with the latest code, and could only get about -6 as the maximum reward after training with several different seeds. The following is my training curve.
So I was wondering whether there is a configuration or version of the code that works for Atari Pong.
[training curve image]

Thank you very much for your effort.
Best regards.

Atari Pong dreamer.py not working with 16-bit precision: "No inf checks were recorded for this optimizer" error

Greetings.

Thank you very much for your work on porting the original TF2 implementation to pytorch, as well as open sourcing it.
I have recently been looking into projects like this one that port Dreamer-v1 / v2 to PyTorch for some experiments in my research, hence my interest in your code.

I was wondering if you had some insight on the following problems I ran into when trying to run the code from this repository.

1. 16 bit precision error.

As a first try, I attempted to run the default dreamer agent on Atari's Pong, using the same command as in the original implementation: python3 dreamer.py --logdir ~/logdir/atari_pong/dreamerv2/1 --configs defaults atari --task atari_pong.
However, after the experience buffer was pre-filled and training was about to start, the script returned the following error, which seems to be related to Torch's AMP, i.e. the mixed-precision computation:

(dreamer-torch) dreamer-torch$ python dreamer.py --logdir ./logdir/atari_pong/dreamerv2/2 --configs defaults atari --task atari_pong
Logdir logdir/atari_pong/dreamerv2/2
Create envs.
Prefill dataset (0 steps).
Eval episode has 934 steps and return -20.0.
[201372] eval_return -20.0 / eval_length 934.0 / eval_episodes 1.0
Invalid MIT-MAGIC-COOKIE-1 keySimulate agent.
[201372] 
Start evaluation.
Eval episode has 832 steps and return -20.0.
[201372] eval_return -20.0 / eval_length 832.0 / eval_episodes 1.0
Start training.
[201372] fps 0.0
Traceback (most recent call last):
  File "dreamer.py", line 317, in <module>
    main(parser.parse_args(remaining))
  File "dreamer.py", line 295, in main
    state = tools.simulate(agent, train_envs, config.eval_every, state=state)
  File "/home/d055/random/rl/dreamer-torch/tools.py", line 136, in simulate
    action, agent_state = agent(obs, done, agent_state, reward)
  File "dreamer.py", line 75, in __call__
    self._train(next(self._dataset))
  File "dreamer.py", line 144, in _train
    metrics.update(self._task_behavior._train(start, reward)[-1])
  File "/home/d055/random/rl/dreamer-torch/models.py", line 207, in _train
    metrics.update(self._actor_opt(actor_loss, self.actor.parameters()))
  File "/home/d055/random/rl/dreamer-torch/tools.py", line 481, in __call__
    self._scaler.step(self._opt)
  File "/home/d055/anaconda3/envs/dreamer-torch/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 337, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.

Do you happen to have any idea why this might be?

I am using Python 3.8.12, and installed the dependencies using pip install -r requirements.txt provided in the repository.
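
For reference, my understanding is that this assertion fires when scaler.step(optimizer) runs but none of that optimizer's parameters received gradients from the preceding scaled backward pass. A minimal sketch of the standard torch.cuda.amp pattern, for comparison only and not the repository's code:

import torch

model = torch.nn.Linear(4, 1).cuda()
opt = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 4, device='cuda')
with torch.cuda.amp.autocast():
  loss = model(x).mean()

opt.zero_grad()
scaler.scale(loss).backward()  # produces gradients and records the inf/nan checks for opt's parameters
scaler.step(opt)               # would raise the assertion above if the backward pass were skipped
scaler.update()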

2. Slightly off-topic issue: using 32-bit precision

I tried to play with the precision the scripts use by setting --precision 32 (to my understanding, this is the default precision of PyTorch), and the code does run without the previous error.
However, even after training the agent for around 16 million steps, there is no improvement in performance, unlike the curves you have provided for Atari Pong.
I was thus wondering whether the change of precision might be the reason for this, and would be interested in your opinion about it, as well as any recommendations for running the code in this repository so as to at least reproduce the original paper's results.

Thank you very much for your time.
Looking forward to hearing back from you.
Best regards.

stop-grad on the actor in the imagine() function

Hi,

While looking at the imagine() function in models.py,
I wondered why the gradient from the input feature to the action output should be stopped.

(line 221 of models.py)

(detach here means the stop-gradient)

inp = feat.detach() if self._stop_grad_actor else feat

action = policy(inp).sample()
succ = dynamics.img_step(state, action, sample=self._config.imag_sample)

You introduced the boolean variable self._stop_grad_actor so that the user can choose whether to stop it or not. But in my opinion, for the full gradient to flow from the initial state to the last step of the sequence, shouldn't feat pass through the computation graph without the stop-gradient? I just wonder why there is a stop-gradient here. Have you tried the code without it? What was the result like?
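
As a toy illustration of the behavior I am asking about (not repository code), detach() blocks the gradient from the action path back into feat:

import torch

feat = torch.randn(3, requires_grad=True)
policy = torch.nn.Linear(3, 2)

policy(feat.detach()).sum().backward()   # with the stop-gradient
print(feat.grad)                         # None: no gradient reached feat

policy(feat).sum().backward()            # without the stop-gradient
print(feat.grad)                         # a (3,)-shaped gradient tensor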

I'm struggling to figure out the reason for it, and I hope for a good discussion!
Thanks!

About data replay

Thanks for the great work. I'm learning from this repo and I'm confused about how old data is used for training. As I understand it, after every episode is done, the environment wrapper saves the episode information. Unfortunately, the only function I can find that loads these episodes is tools.load_episodes, which is used before training. Could you please tell me how this code loads the historical episodes during training? Thanks a lot!
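
Below is a hypothetical sketch (not this repository's actual code) of the replay pattern typically used in Dreamer-style implementations: the episodes held in memory are sampled endlessly by a generator that yields fixed-length chunks, and the training loop simply calls next(dataset) each update instead of reloading files. The key name 'reward' is assumed only for illustration.

import numpy as np

def sample_episodes(episodes, length):
  # episodes: dict mapping episode filename -> dict of numpy arrays
  keys = list(episodes.keys())
  while True:
    ep = episodes[np.random.choice(keys)]
    total = len(ep['reward'])
    if total < length:
      continue
    start = np.random.randint(0, total - length + 1)
    yield {k: v[start:start + length] for k, v in ep.items()}

# dataset = sample_episodes(tools.load_episodes(directory, limit), length=50)
# batch = next(dataset)  # called from inside the training loop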

Bug when setting config.precision=16

There is a little bug when setting config.precision=16:

env = wrappers.CollectDataset(env, callbacks)

which forgets to pass config.precision, so data is still collected at the default float precision. I think it should be like below:

env = wrappers.CollectDataset(env, callbacks, config.precision)

Specifically, the lines below show that if config.precision is not passed, the code will use dtype=np.float32 instead of np.float16 as expected:

dreamer-torch/wrappers.py

Lines 225 to 230 in 7c2331a

class CollectDataset:

  def __init__(self, env, callbacks=None, precision=32):
    self._env = env
    self._callbacks = callbacks or ()
    self._precision = precision

dreamer-torch/wrappers.py

Lines 271 to 274 in 7c2331a

if np.issubdtype(value.dtype, np.floating):
  dtype = {16: np.float16, 32: np.float32, 64: np.float64}[self._precision]
elif np.issubdtype(value.dtype, np.signedinteger):
  dtype = {16: np.int16, 32: np.int32, 64: np.int64}[self._precision]

Actor and value loss: actor entropy and state entropy addition during the ImagBehavior update

Greetings.
Thank you very much for your assistance last time regarding the precision=32 issue #2.
Your implementation has been a great reference for me.

This issue is not so much of an issue, and more of a query about the logic for the actor update.

Namely, when training the ImagBehavior component of Dreamer, the target and weights are first computed using the _compute_target method.
In _compute_target, when the actor-entropy and state-entropy flags are enabled, they are respectively added to the predicted imagined rewards, as per the following lines:

dreamer-torch/models.py

Lines 236 to 247 in e42d504

def _compute_target(
    self, imag_feat, imag_state, imag_action, reward, actor_ent, state_ent,
    slow):
  if 'discount' in self._world_model.heads:
    inp = self._world_model.dynamics.get_feat(imag_state)
    discount = self._world_model.heads['discount'](inp).mean
  else:
    discount = self._config.discount * torch.ones_like(reward)
  if self._config.future_entropy and self._config.actor_entropy() > 0:
    reward += self._config.actor_entropy() * actor_ent
  if self._config.future_entropy and self._config.actor_state_entropy() > 0:
    reward += self._config.actor_state_entropy() * state_ent

Then, before the actor_loss itself is computed in _compute_actor_loss, the actor entropy and state entropy are added one more time, as per the following lines:

dreamer-torch/models.py

Lines 280 to 283 in e42d504

if not self._config.future_entropy and (self._config.actor_entropy() > 0):
  actor_target += self._config.actor_entropy() * actor_ent[:-1][:, :, None]
if not self._config.future_entropy and (self._config.actor_state_entropy() > 0):
  actor_target += self._config.actor_state_entropy() * state_ent[:-1]

From my perspective, it looks like the entropy bonus is added twice in total, and I was wondering if you have any insight into why it is done this way?

Looking forward to hearing back from you again, and in the meantime,
Best regards.

JIT for faster training?

Hi,

While I'm running the code,
I found that PyTorch JIT (for faster execution) is not used in the code.
I'm thinking of applying this feature for faster training, along the lines of the sketch below.
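
This is just standard torch.jit.script usage on a made-up module, not this repository's networks; whether it actually speeds up Dreamer's training loop is untested here.

import torch
import torch.nn as nn

class MLP(nn.Module):
  def __init__(self):
    super().__init__()
    self.net = nn.Sequential(nn.Linear(8, 32), nn.ELU(), nn.Linear(32, 8))

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.net(x)

scripted = torch.jit.script(MLP())   # compile the module with TorchScript
out = scripted(torch.randn(2, 8))    # used like a normal nn.Module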
Do you have any experience with this feature that you could share?

Thanks!

replay data memory usage?

Hi, thank you for the good code base.
I just wonder whether a normal PC can hold all the replay data in memory once the agent step count goes over 1 million. If I have about 16 GB of memory, can this agent be trained to the end?
It seems like the replay data size keeps increasing as training proceeds. Do you have any idea how the agent could be trained with a small amount of memory?
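
For reference, a rough back-of-the-envelope estimate (assuming 64x64x3 uint8 image observations plus a few float32 scalars per step, which may not match this repository exactly) suggests that the raw observations alone already approach a 16 GB budget at 1 million stored steps:

obs_bytes = 64 * 64 * 3          # one uint8 image observation
extra_bytes = 4 * 4              # e.g. action, reward, discount as float32 (assumed)
steps = 1_000_000
print(steps * (obs_bytes + extra_bytes) / 1e9)  # ~12.3 GB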
Thanks!

Bug about GRU

x, deter = self._cell(x, [deter])

CUDA_VISIBLE_DEVICES=0 python dreamer.py --logdir logdir/walker/dreamerv2/1 --configs defaults

defaults:
  dyn_cell: 'gru'

Running the command above with dyn_cell: 'gru' in the defaults causes a bug at the line quoted at the top.
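
For context, and only as a guess since the exact traceback is not shown: torch.nn.GRUCell returns a single hidden-state tensor rather than an (output, [state]) pair, so two-way unpacking of its result fails. Whether that is what happens here depends on which cell class the 'gru' setting actually selects.

import torch

cell = torch.nn.GRUCell(4, 8)
x, h = torch.randn(1, 4), torch.randn(1, 8)
h_new = cell(x, h)        # a single tensor of shape (1, 8)
# x, deter = cell(x, [h]) # fails: hx must be a tensor, and the result cannot be unpacked into two values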
