nerorl's Issues

LSTM use in PPO

Hello,

Thank you for the nice implementation. This is more of a question than a problem with the code. I am finding it quite hard to pin down the process when using LSTM cells in the policy. Assume you have a sequence length of 10 timesteps and you sample 128 sequences, so your observations have shape (10, 128, state_dim). Your actions and log_probs will then have shape (10, 128, 1). Am I right about this? Do you then leave those dimensions as they are when calculating the loss? Can you elaborate a bit more on how to construct the loss with these dimensions?

If you could elaborate a bit on the process I would greatly appreciate it!
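One common way to handle this (a sketch of the general idea, not necessarily what this repository does) is to keep the sampled tensors in their (seq_len, num_seqs, ...) layout and simply flatten them before reducing the PPO surrogate to a scalar, since the clipped objective is computed element-wise. The shapes below mirror the question; all tensors here are random placeholders:

```python
import torch

# Hypothetical shapes from the question: 10 timesteps, 128 sequences.
seq_len, num_seqs = 10, 128

# Placeholder data standing in for sampled rollouts.
log_probs     = torch.randn(seq_len, num_seqs, 1)
old_log_probs = torch.randn(seq_len, num_seqs, 1)
advantages    = torch.randn(seq_len, num_seqs, 1)

# The clipped surrogate is element-wise, so the sequence and batch
# dimensions can be flattened together before taking the mean.
ratio   = torch.exp(log_probs - old_log_probs)
clipped = torch.clamp(ratio, 1.0 - 0.2, 1.0 + 0.2) * advantages
loss    = -torch.min(ratio * advantages, clipped).flatten().mean()

assert loss.dim() == 0  # a single scalar to backpropagate
```

The sequence structure matters for the LSTM forward pass (and for masking padded steps, if any), but once per-step ratios and advantages exist, the loss reduction itself does not need the (10, 128, 1) layout.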

Open questions recurrent policy

Assumptions

  • Training data has to be processed sequentially
  • Gradients should not be backpropagated through data that is not part of the
    sequence being trained

Open Questions

  1. Shall the training data be arranged in sequences of fixed length?

  2. Is it sufficient to arrange the data into their respective episodes? (That could cause inconsistent mini-batch sizes, and an episode might be incomplete.)

  3. What if a sequence ends up being shorter due to episode termination?

  4. If those sequences are padded to maintain the fixed sequence length, doesn’t that increase the overall batch size (i.e. number of training samples)?

  5. When optimizing the model, is it smarter to recompute the hidden states, since they are modified by the optimization? (The original hidden states from the sampling phase might already be stale after processing the first mini batch.)

  6. What if a sequence/episode was not completed during the data sampling phase?

  7. How can you check in PyTorch how far the gradients are being backpropagated through the hidden states in the training data?

# Render the autograd graph to see how far gradients flow back
# through the hidden states (final_tensor is any tensor produced
# by the forward pass, e.g. the loss):
from torchviz import make_dot

graph = make_dot(final_tensor)
graph.view()  # renders the graph to a PDF and opens it
  8. If sequences of fixed length are being trained, do you have to reset the hidden state for inference?
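Regarding questions 3 and 4, one common approach (an illustrative sketch under that assumption, not necessarily this repository's method) is to zero-pad shorter sequences up to the fixed length and mask the padded steps out of the loss. The padded entries then enlarge the tensors but contribute nothing to the gradient, because the masked mean divides by the number of real steps only:

```python
import torch

# Hypothetical per-step losses from three episodes of different
# lengths, padded to a fixed sequence length of 5.
seq_len = 5
episodes = [torch.randn(3), torch.randn(5), torch.randn(2)]

padded = torch.zeros(len(episodes), seq_len)
mask   = torch.zeros(len(episodes), seq_len)
for i, ep in enumerate(episodes):
    padded[i, :len(ep)] = ep  # real steps
    mask[i, :len(ep)] = 1.0   # 1 = real step, 0 = padding

# Masked mean: padded steps are zeroed out, and the divisor is the
# count of real steps (10 here), not the padded total (15).
loss = (padded * mask).sum() / mask.sum()
```

Under this scheme the effective number of training samples stays the same; only the tensor shapes grow, which answers the concern in question 4.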
