nerorl's Issues

LSTM use in PPO

Hello,

Thank you for the nice implementation. This is more of a question than a problem with the code. I am finding it quite hard to pin down the process when using LSTM cells in the policy. Assume you have a sequence length of 10 timesteps and you sample 128 sequences, so your observations have shape (10, 128, state_dim). Your actions and log_probs will then have shape (10, 128, 1). Am I right about this? Do you then leave those dimensions as they are when calculating the loss? Can you elaborate a bit more on how to construct the loss with these dimensions?

If you could elaborate a bit on the process I would greatly appreciate it!
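One common way to handle this (a sketch of the general idea, not necessarily what this repository does) is to keep the sampled tensors in their (seq_len, num_seqs, ...) layout and simply flatten them before reducing the PPO surrogate to a scalar, since the clipped objective is computed element-wise. The shapes below mirror the question; all tensors here are random placeholders:

```python
import torch

# Hypothetical shapes from the question: 10 timesteps, 128 sequences.
seq_len, num_seqs = 10, 128

# Placeholder data standing in for sampled rollouts.
log_probs     = torch.randn(seq_len, num_seqs, 1)
old_log_probs = torch.randn(seq_len, num_seqs, 1)
advantages    = torch.randn(seq_len, num_seqs, 1)

# The clipped surrogate is element-wise, so the sequence and batch
# dimensions can be flattened together before taking the mean.
ratio   = torch.exp(log_probs - old_log_probs)
clipped = torch.clamp(ratio, 1.0 - 0.2, 1.0 + 0.2) * advantages
loss    = -torch.min(ratio * advantages, clipped).flatten().mean()

assert loss.dim() == 0  # a single scalar to backpropagate
```

The sequence structure matters for the LSTM forward pass (and for masking padded steps, if any), but once per-step ratios and advantages exist, the loss reduction itself does not need the (10, 128, 1) layout.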

Open questions recurrent policy

Assumptions

  • Training data has to be processed sequentially
  • Gradients should not be backpropagated through data that is not part of the
    sequence being trained

Open Questions

  1. Shall the training data be arranged in sequences of fixed length?

  2. Is it sufficient to arrange the data into their respective episodes? (That could cause inconsistent mini-batch sizes, and an episode might be incomplete.)

  3. What if a sequence ends up being shorter due to episode termination?

  4. If those sequences are padded to maintain the fixed sequence length, doesn’t that increase the overall batch size (i.e. number of training samples)?

  5. When optimizing the model, is it smarter to recompute the hidden states, since they are modified by the optimization? (The original hidden states from the sampling phase might already be stale after processing the first mini batch.)

  6. What if a sequence/episode was not completed during the data sampling phase?

  7. How can you check in PyTorch how far the gradients are being backpropagated through the hidden states in the training data?

# Render the autograd graph to see how far gradients flow back
# through the hidden states (final_tensor is any tensor produced
# by the forward pass, e.g. the loss):
from torchviz import make_dot

graph = make_dot(final_tensor)
graph.view()  # renders the graph to a PDF and opens it
  8. If sequences of fixed length are being trained, do you have to reset the hidden state for inference?
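Regarding questions 3 and 4, one common approach (an illustrative sketch under that assumption, not necessarily this repository's method) is to zero-pad shorter sequences up to the fixed length and mask the padded steps out of the loss. The padded entries then enlarge the tensors but contribute nothing to the gradient, because the masked mean divides by the number of real steps only:

```python
import torch

# Hypothetical per-step losses from three episodes of different
# lengths, padded to a fixed sequence length of 5.
seq_len = 5
episodes = [torch.randn(3), torch.randn(5), torch.randn(2)]

padded = torch.zeros(len(episodes), seq_len)
mask   = torch.zeros(len(episodes), seq_len)
for i, ep in enumerate(episodes):
    padded[i, :len(ep)] = ep  # real steps
    mask[i, :len(ep)] = 1.0   # 1 = real step, 0 = padding

# Masked mean: padded steps are zeroed out, and the divisor is the
# count of real steps (10 here), not the padded total (15).
loss = (padded * mask).sum() / mask.sum()
```

Under this scheme the effective number of training samples stays the same; only the tensor shapes grow, which answers the concern in question 4.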
