
Comments (7)

lxuechen commented on May 29, 2024

Thanks for your interest!

The GRU consumes a (time reversed) sequence of inputs in observation space and outputs a sequence in a 4D latent space.

You're absolutely right!
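The encoder described above can be sketched roughly as follows. This is an illustrative reconstruction, not the torchsde example verbatim; the dimensions and variable names are assumptions:

```python
# Sketch: a GRU consumes the time-reversed observation sequence and its
# outputs are projected to a small (here 4D) context/latent space.
import torch
import torch.nn as nn

obs_dim, hidden_dim, context_dim = 3, 64, 4   # illustrative sizes
T, batch = 20, 8

gru = nn.GRU(input_size=obs_dim, hidden_size=hidden_dim)
to_context = nn.Linear(hidden_dim, context_dim)  # project GRU output to 4D

xs = torch.randn(T, batch, obs_dim)           # observations, time-major
out, _ = gru(torch.flip(xs, dims=(0,)))       # run over the reversed sequence
ctx = to_context(torch.flip(out, dims=(0,)))  # flip back: ctx[t] summarizes xs[t:]
```

Because the GRU ran over the reversed sequence, the context at index `t` depends only on observations at time `t` and later, which is what the variational posterior needs.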

What's a little unclear to me, is what the context would be. Just one (the last) latent variable from each of the GRU outputs?

The GRU outputs at intermediate times can also be used for practical performance benefits (e.g. check this out). The searchsorted operation here is used to find the "right" context that's produced only using future observations.
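A minimal sketch of that lookup, assuming a sorted tensor of observation times and one precomputed context vector per time (the names here are mine, not torchsde API):

```python
# Sketch: given context vectors produced at observation times ts, pick the
# context whose producing time is at or after the query time t, so the
# selected context was built only from future observations.
import torch

ts = torch.tensor([0.0, 0.5, 1.0, 1.5])   # observation times (sorted)
ctx = torch.randn(4, 4)                    # one context vector per time

def context_at(t):
    # Index of the first observation time strictly after t; clamp so that
    # queries past the last observation reuse the final context.
    i = torch.searchsorted(ts, torch.as_tensor(t), right=True)
    i = int(torch.clamp(i, 0, len(ts) - 1))
    return ctx[i]
```

During the SDE solve, the drift at time `t` would concatenate `context_at(t)` onto the latent state.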

However, in your implementation in the example, all the dynamics happen in observation space directly.

I think it's fair to put it that way, and to say that points in the latent space are mapped to the observed space via an identity transform.

So this is not an actual "Latent SDE" model, is it? More like a "Latent informed/controlled SDE"?

I think you have a fair point. I agree that I was somewhat sloppy with the re-implementation. Essentially, the things that are simplified are 1) variational inference at time t0 (I didn't include the KL penalty, or a prior), and 2) an actual non-trivial decoder that maps points in the latent space back to the observed space.

To properly do 1), one would first select a good prior (say N(mu, sigma); mu and sigma can be optimized during training). To compute the KL penalty, one would need also the variational distribution given by the encoder (e.g. dependent on the last output of the GRU or return value of some other encoder).
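A sketch of that KL penalty, assuming diagonal Gaussians; the names (`pz0_mean`, `qz0_mean`, etc.) are illustrative, and in practice `qz0_mean`/`qz0_logstd` would come from the encoder rather than being random:

```python
# Sketch: trainable Gaussian prior N(mu, sigma) at t0, plus the KL penalty
# against an encoder-dependent variational posterior.
import torch
from torch.distributions import Normal, kl_divergence

latent_dim = 4
pz0_mean = torch.zeros(latent_dim, requires_grad=True)    # trainable prior params
pz0_logstd = torch.zeros(latent_dim, requires_grad=True)

# Stand-ins for encoder outputs (e.g. from the GRU's last output):
qz0_mean = torch.randn(8, latent_dim)
qz0_logstd = torch.randn(8, latent_dim) * 0.1

pz0 = Normal(pz0_mean, pz0_logstd.exp())
qz0 = Normal(qz0_mean, qz0_logstd.exp())

kl_t0 = kl_divergence(qz0, pz0).sum(dim=-1).mean()  # KL penalty at t0
z0 = qz0.rsample()  # reparameterized draw to start the SDE solve
```

The `rsample` call keeps the draw differentiable, so the KL term and the path-wise loss can be optimized jointly.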

In general, I think this package could greatly benefit from more in-depth documentation on training models with it and/or better-explained examples. One or two examples of standard use cases in Jupyter notebooks with detailed explanations could help a lot to make this more accessible to people who don't have much background in SDEs (like me).

Thanks for the suggestion and I totally agree! Happy to spend time in documenting the model better in the future, though quite unfortunately my schedule in the near future seems quite packed.

from torchsde.

stebuechho commented on May 29, 2024

Thanks for answering! That cleared things up!

What's a little unclear to me, is what the context would be. Just one (the last) latent variable from each of the GRU outputs?

The GRU outputs at intermediate times can also be used for practical performance benefits (e.g. check this out). The searchsorted operation here is used to find the "right" context that's produced only using future observations.

What I meant was: in the paper a context size of 1 is mentioned, while the latent space is 4-dimensional. I assumed the context size of 1 referred to the latent dimension, so that you took only one of the 4 latent variables at the corresponding timestep of the GRU output sequence as the context. But it refers to the time dimension, so all four latent variables, got it!

In general, I think this package could greatly benefit from more in-depth documentation on training models with it and/or better-explained examples. One or two examples of standard use cases in Jupyter notebooks with detailed explanations could help a lot to make this more accessible to people who don't have much background in SDEs (like me).

Thanks for the suggestion and I totally agree! Happy to spend time in documenting the model better in the future, though quite unfortunately my schedule in the near future seems quite packed.

Awesome! I am looking forward to it, whenever it may be!

Even though this is probably not the right place, as GitHub issues are not meant as a forum to ask for help, I'll still try to sneak in a few practical questions; I hope you don't mind. I would be thrilled if you could give some answers! However, if that is not appropriate, feel free to shoot me down and close.

In my potential use case, I have a bunch of time-series data. I would like to fit one SDE model to it to make predictions into the future, based on a given portion of timesteps. So it is pretty similar to the latent_sde_lorenz example, and even more like the geometric Brownian motion example in your paper, I think.

  1. I am a little confused about the roles of the prior h and the approximate posterior f in this setting. It's probably due to my lack of knowledge in the field; I think I know what prior and posterior distributions are, but I am not entirely clear on their roles here: is the approximate posterior's role only to condition the prior, so that we can make good predictions with it? So inference would always happen with the prior drift? In the geometric Brownian motion example, you make future predictions with both. But for the model setup as described in the last posts, that would mean that the context for the approximate posterior drift would remain static for all t > 1, as the GRU didn't give outputs there, is that correct? So I guess that would deteriorate predictions the farther out in the future they are, right?

  2. Instead of the output of one discrete path, a more relevant prediction would be the expected value and its variance at a (future) timestep. Computing those for constant drift and diffusion seems easily possible, but in the case of a neural SDE, do you know if it is possible to compute them directly? In the latent_sde example, I think you kind of compute them by sampling a bunch of paths in order to show something like the pdf at each timestep (color-coded in blue). Having an option to make sdeint compute the expectation and variance natively might be very useful actually!

  3. Since I think this is how the adjoint backward pass is implemented anyway, having an option to integrate backward through time might be very useful too! But to implement it myself, this is a good starting point, I guess? (I only need to figure out why g**2 is scaled by score and how that translates to my use case?)


lxuechen commented on May 29, 2024

What I meant was: in the paper a context size of 1 is mentioned, while the latent space is 4-dimensional. I assumed the context size of 1 referred to the latent dimension, so that you took only one of the 4 latent variables at the corresponding timestep of the GRU output sequence as the context. But it refers to the time dimension, so all four latent variables, got it!

The context vector is an extra piece of information that isn't really related to the latent space. It doesn't really refer to the timestamps either. On the other hand, the timestamp is used to select which context vector we should use for integrating the SDE in a specific time interval.

Even though this is probably not the right place, as GitHub issues are not meant as a forum to ask for help, I'll still try to sneak in a few practical questions; I hope you don't mind. I would be thrilled if you could give some answers! However, if that is not appropriate, feel free to shoot me down and close.

I'm closing the issue for now after the fixes, but I'm also happy to keep chatting here if that may be helpful.


lxuechen commented on May 29, 2024

I'm splitting this reply into several segments, as one giant wall of text may seem intimidating.

I am a little confused about the roles of the prior h and the approximate posterior f in this setting.

If you're familiar with Gaussian processes, I'd say that it's reasonable to think that the prior here is analogous to the prior there when you're only fitting a single time series sequence. In fact, the OU process is a Gaussian process, and this is a special case where the two model classes somewhat coincide. Notably, things are a bit different when one is fitting multiple time series sequences and trying to do interpolation/extrapolation for each sequence individually.
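For concreteness, the OU process dX = theta * (mu - X) dt + sigma dW has Gaussian conditional marginals with closed-form mean and variance, which is what makes it a Gaussian process. A small sketch with illustrative parameter values:

```python
# Closed-form transition statistics of the OU process
# dX = theta * (mu - X) dt + sigma dW, started at X_0 = x0.
import math

theta, mu, sigma = 2.0, 0.5, 1.0   # illustrative parameters

def ou_mean(x0, t):
    # E[X_t | X_0 = x0] decays exponentially toward mu
    return mu + (x0 - mu) * math.exp(-theta * t)

def ou_var(t):
    # Var[X_t | X_0] saturates at the stationary value sigma**2 / (2 * theta)
    return sigma**2 / (2 * theta) * (1.0 - math.exp(-2.0 * theta * t))
```

For large t these approach the stationary distribution N(mu, sigma**2 / (2 * theta)), i.e. N(0.5, 0.25) with the values above.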

But for the model setup as described in the last posts, that would mean that the context for the approximate posterior drift would remain static for all t > 1, as the GRU didn't give outputs there, is that correct?

If the goal is extrapolation based on observations of a single time series sequence, I'd recommend using the posterior drift.


lxuechen commented on May 29, 2024

Instead of the output of one discrete path, a more relevant prediction would be the expected value and its variance at a (future) timestep. Computing those for constant drift and diffusion seems easily possible, but in the case of a neural SDE, do you know if it is possible to compute them directly? In the latent_sde example, I think you kind of compute them by sampling a bunch of paths in order to show something like the pdf at each timestep (color-coded in blue). Having an option to make sdeint compute the expectation and variance natively might be very useful actually!

I'd agree the naive method would be to estimate the statistics with samples. I'm aware of works that intend to approximately simulate SDEs by only simulating the marginal mean and covariance ODEs. I may not be up-to-date on the latest developments there, but I haven't seen a paper that convincingly demonstrated that such a method is consistently accurate and leads to models of good utility.
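A sketch of that naive sampling estimate. Plain Euler-Maruyama is used here instead of `torchsde.sdeint` purely to keep the snippet self-contained, and the drift/diffusion are a toy OU process rather than a trained model:

```python
# Estimate the marginal mean and variance at a future time by simulating
# many independent sample paths of dY = f dt + g dW.
import torch

torch.manual_seed(0)

def f(t, y):           # toy drift: pull toward 0
    return -y

def g(t, y):           # toy diffusion: constant
    return 0.3 * torch.ones_like(y)

n_paths, n_steps, dt = 4096, 100, 0.01
y = torch.full((n_paths, 1), 1.0)   # all paths start at y0 = 1
t = 0.0
for _ in range(n_steps):            # Euler-Maruyama to T = 1.0
    dw = torch.randn_like(y) * dt ** 0.5
    y = y + f(t, y) * dt + g(t, y) * dw
    t += dt

mean = y.mean().item()   # Monte Carlo estimate of E[Y_T]
var = y.var().item()     # ... and Var[Y_T]
```

For this toy OU process the exact values are known (the mean is exp(-1), roughly 0.368), so the Monte Carlo estimates can be sanity-checked; for a neural SDE, sampling like this is the generic fallback.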

More generally, the problem is related to simulating the Fokker-Planck equation, which is known to be difficult aside from special cases.


lxuechen commented on May 29, 2024

Since I think this is how the adjoint backward pass is implemented anyway, having an option to integrate backward through time might be very useful too! But to implement it myself, this is a good starting point, I guess? (I only need to figure out why g**2 is scaled by score and how that translates to my use case?)

Our adjoint implementation is in this file. The core functions of interest are the reverse drift and diffusions (e.g. here and here).

What you listed in the example is something very different, and comes from another paper. I implemented it for MNIST before they released their codebase, and it was mostly for fun for myself.

Notably, the reverse SDE formulation in that paper is totally different from ours. The backward/time-reverse-SDE formulation in our paper ensures that individual sample paths can be reversed given a fixed Brownian motion sample. The time-reverse SDE in their paper only ensures that the marginal distributions can be reconstructed. Note, one could get the same marginals even with different sample paths.

For their purposes and applications, their reverse-time SDE formulation was sufficient.
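For reference, the marginal-reversing drift in that score-based formulation is f(x, t) - g(t)**2 * grad_x log p_t(x), which is where the "g**2 scaled by score" term comes from. A minimal sketch, with a toy linear score function chosen purely for illustration:

```python
# Sketch: build the reverse-time drift of dx = f dt + g dW under the
# marginal-reversing (score-based) formulation. `score` is assumed to
# approximate grad_x log p_t(x).
def reverse_drift(f, g, score):
    def f_rev(t, x):
        return f(t, x) - g(t) ** 2 * score(t, x)
    return f_rev

f = lambda t, x: -x          # toy forward drift
g = lambda t: 1.0            # scalar diffusion coefficient
score = lambda t, x: -x      # toy score function, for illustration only
f_rev = reverse_drift(f, g, score)

print(f_rev(0.0, 2.0))  # -2 - 1**2 * (-2) = 0.0
```

Note this only reconstructs the marginal distributions, as discussed above; it does not reverse individual sample paths the way the adjoint formulation does.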


stebuechho commented on May 29, 2024

Thank you for the answers! I think I'll have to read a little further into the topic and try to set up a small model for my use case when I have the time (it's just a little side project right now, out of interest) before asking any more questions here. Thanks for offering to keep on chatting!

