google-research / torchsde Goto Github PK

Differentiable SDE solvers with GPU support and efficient sensitivity analysis.

License: Apache License 2.0

Python 100.00%

deep-learning deep-neural-networks pytorch dynamical-systems differential-equations stochastic-processes stochastic-differential-equations stochastic-volatility-models neural-differential-equations

torchsde's Introduction

PyTorch Implementation of Differentiable SDE Solvers

This library provides stochastic differential equation (SDE) solvers with GPU support and efficient backpropagation.

Installation

pip install torchsde

Requirements: Python >=3.8 and PyTorch >=1.6.0.

Documentation

Available here.

Examples

Quick example

import torch
import torchsde

batch_size, state_size, brownian_size = 32, 3, 2
t_size = 20

class SDE(torch.nn.Module):
    noise_type = 'general'
    sde_type = 'ito'

    def __init__(self):
        super().__init__()
        self.mu = torch.nn.Linear(state_size, 
                                  state_size)
        self.sigma = torch.nn.Linear(state_size, 
                                     state_size * brownian_size)

    # Drift
    def f(self, t, y):
        return self.mu(y)  # shape (batch_size, state_size)

    # Diffusion
    def g(self, t, y):
        return self.sigma(y).view(batch_size, 
                                  state_size, 
                                  brownian_size)

sde = SDE()
y0 = torch.full((batch_size, state_size), 0.1)
ts = torch.linspace(0, 1, t_size)
# Initial state y0, the SDE is solved over the interval [ts[0], ts[-1]].
# ys will have shape (t_size, batch_size, state_size)
ys = torchsde.sdeint(sde, y0, ts)

Notebook

examples/demo.ipynb gives a short guide on how to solve SDEs, including subtle points such as fixing the randomness in the solver and the choice of noise types.

Latent SDE

examples/latent_sde.py learns a latent stochastic differential equation, as in Section 5 of [1]. The example fits an SDE to data, whilst regularizing it to be like an Ornstein-Uhlenbeck prior process. The model can be loosely viewed as a variational autoencoder with its prior and approximate posterior being SDEs. This example can be run via

python -m examples.latent_sde --train-dir <TRAIN_DIR>

The program outputs figures to the path specified by <TRAIN_DIR>. Training should stabilize after 500 iterations with the default hyperparameters.

Neural SDEs as GANs

examples/sde_gan.py learns an SDE as a GAN, as in [2], [3]. The example trains an SDE as the generator of a GAN, whilst using a neural CDE [4] as the discriminator. This example can be run via

python -m examples.sde_gan

Citation

If you found this codebase useful in your research, please consider citing either or both of:

@article{li2020scalable,
  title={Scalable gradients for stochastic differential equations},
  author={Li, Xuechen and Wong, Ting-Kam Leonard and Chen, Ricky T. Q. and Duvenaud, David},
  journal={International Conference on Artificial Intelligence and Statistics},
  year={2020}
}

@article{kidger2021neuralsde,
  title={Neural {SDE}s as {I}nfinite-{D}imensional {GAN}s},
  author={Kidger, Patrick and Foster, James and Li, Xuechen and Oberhauser, Harald and Lyons, Terry},
  journal={International Conference on Machine Learning},
  year={2021}
}

References

[1] Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David Duvenaud. "Scalable Gradients for Stochastic Differential Equations". International Conference on Artificial Intelligence and Statistics. 2020. [arXiv]

[2] Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, Terry Lyons. "Neural SDEs as Infinite-Dimensional GANs". International Conference on Machine Learning 2021. [arXiv]

[3] Patrick Kidger, James Foster, Xuechen Li, Terry Lyons. "Efficient and Accurate Gradients for Neural SDEs". 2021. [arXiv]

[4] Patrick Kidger, James Morrill, James Foster, Terry Lyons, "Neural Controlled Differential Equations for Irregular Time Series". Neural Information Processing Systems 2020. [arXiv]

This is a research project, not an official Google product.

torchsde's People

Contributors

Stargazers

Watchers

Forkers

ankitshah009 richardkelley arijitthegame smartypans rb512 kurhula francishzq suvrajeet01 seanahmad fagan2888 shafiahmed adithya91 stjordanis hadryan naouarelkhadiri khalilbelghouat belkhir-nacim zeta1999 quantolope hilbertvan coorung voletiv ml-ai-nlp-ir mtsokol chiccoguanzi patrick-kidger muskanmahajan37 pengfight dboyliao iem-computer-vision hiddst davrempe raahul-singh mkofinas xanderyin shrimpceviche surajitdb tianyu-lu wangyongguang matherheart joaquingajardo wegamekinglc algoskynet geofiber bayesianbrad nidhalrahali cobalamin dnguyengithub ahoyosid saraswat wtwong316 rainman21 shi27feng matthewbegun samb-t sailfish009 olegfs zymrael matthewvowels1 xinzhang525 rishirelan reactive-capital udemirezen linglingfan7 dhirschfeld asahin01 melifluos shaun95 sarvex vishalbelsare alfatti wn1695173791 arnabkar shuowang-ai scumechanics jiwoncpark trendingtechnology mdiamore nickdst fundou mr-siddy zehsilva profsumitjha seoyoungh jeongwhanchoi spencerfar openanalytica jiangsy gabrielnobis linhduongtuan isabella232 dashstander hit636 python-repository-hub ilzeamandaa dx122333 suelenferraz xueliu8617112 pierreglaser antarkic

torchsde's Issues

New solvers

Creating an issue to keep track of the various solvers that would be nice to add (eventually):
Ito

Derivative-free Milstein ref (#31)
Talay
SRK: DRI1 ref

Stratonovich

Ridding `_check_2d`

I'm proposing to get rid of the 2d shape checks. These were added in #88 as part of the v0.2.4.

These checks are creating a huge barrier for applications that don't have vectorized data naturally. Flattening and unflattening will hurt efficiency tremendously.

Install command fails if SSH key not set

pip install git+ssh://[email protected]/google-research/torchsde

throws an error for me ([email protected]: Permission denied (publickey)), presumably because one is required to have set their SSH key with GitHub.

Is there a particular reason for this choice? Rather than the usual

pip install git+https://github.com/google-research/torchsde.git

which is a little more user-friendly. :)

Calling `BrownianTree` with `t > t1` returns `None`

Repro:

torchsde.BrownianTree(t0=torch.tensor(0.), w0=torch.tensor(0.))(2.)

Not helped by the implicit assumption that t1 = t0 + 1.

demo emaple code?

Hey,

Does this project include the code for the demo gif in the readme file?

BR,

`dg_ga_jvp_column_sum_v2` can break

It assumes that the batch dimension can be modified, but something like e.g.

z = (batch, channel)
def g(self, t, y):
    return torch.stack([z, y], dim=2)

will then throw an error because of the different batch dimensions. (This being a scenario that I've actually run into.)

Incidentally I'm also concerned (but have not tested) that multiplying the batch dimension will put a peak in our memory usage. Having a dramatically non-rectangular memory profile could reduce the maximum possible batch size and thus slow things down overall.

I'm wondering about switching to v1 for now? At least on test_strat.py I don't find it that much slower. (And as per above, may even be faster across a batch.)

Replace `trapezoidal_approx` with integrated Brownian bridge for speed

Prototype here

Need to add OpenMP flag.

setup.py doesn't currently enable OpenMP for the C++ extensions. (Which by-the-by we may wish to think about the future of as maintaining those is going to be rather a hassle.)

See e.g.
https://github.com/patrick-kidger/signatory/blob/master/setup.py#L33
for an example of how to enable the OpenMP flags.

(Whilst we're at it we should also add the fvisibility flag; see the same example.)

Memory usage

Hi,
I observed that the memory usage increases while training goes on. This is unusual in standard NN, since the size of the parameters is fixed. Is this normal in this model or did I get something wrong? I also noticed there is a cache_size parameter. What is the unit of this integer, is it in MB?

thanks

Build Fails

Reproduce

https://colab.research.google.com/gist/ucalyptus/39977d8f157270a330ef04fc0c9707a9/test-torchsde.ipynb

Correlated Brownian Motions?

Hello!

First of all, fantastic work!! This framework is amazing and I've managed to implement some basic SDE's and solve them. My current goal is to go through some of the canonical SDEs used in Finance and implement them, which leads to my question.

Q: I discovered some slides pertaining to the Neural SDE as GANs paper, in which you reference the Heston model, in torchsde how would I link the two Brownian motions together by their correlation factor rho ? I think I managed to do it in a quick and dirty way below

class CorrelatedBrownianIntervals(BrownianInterval):

    def __init__(self, t0, t1, rho, w1, W=None, H=None):
        super().__init__(t0, t1, (batch_size, 1), W, H)
        self.rho = rho  # Correlation Coefficient between the 2 Brownians
        self.w1 = w1  # The first correlated Brownian

    def correlated_brownian(self, ta, tb):
        z1 = super().__call__(ta, tb)  # This generates an uncorrelated Brownian motion Z1
        corr = (self.rho * self.w1_val + torch.sqrt(1 - self.rho ** 2) * z1)  ## Creates the correlated Brownian W2
        return corr

    def __call__(self, ta, tb):
        self.w1_val = self.w1(tb) - self.w1(ta)
        return self.correlated_brownian(ta, tb)

Is this the right approach? Or is there a better / more flexible way to do this?

Thanks again!

0.2.0 TODO list

To-do list so far:

Brownian motions:

Switch to interval interface.
- Brownian
- BrownianInterval
- BrownianPath (both Python and C++)
- BrownianTree (both Python and C++)
- calls in adjoint
- calls in solvers
- ~~calls in examples/demo.ipynb~~ As of #19 support single-point evaluation
- ~~calls in tests/problems.py~~ As of #19 support single-point evaluation
- calls in tests/test_brownian_path.py
- calls in tests/test_brownian_tree.py
Levy area support:
- BrownianInterval
- BrownianPath (#61)
- BrownianTree (#61)
- Removed 'trapezoidal_approx' as option from SRK solvers (see #12)
- Removed 'trapezoidal_approx' from examples (see #12) (Done in #19)
Tidy up __init__:
- BrowianPath (Chen) (#21)
- BrownianTree (#61)
Misc:
- Handle t outside of t0, t1. (Patrick) (#9, #10, #28)

Solvers:

Not split up by noise type any more, to the extent that's reasonable.
Add Midpoint solver
Add log-ODE solver (#43)
Allow picking the solver for the adjoint pass.
Remove integrate_logqp and interp_logqp
Fix step sizes leaving the region of integration. (Patrick) (#28)

SDEs:

Add gdg_prod for the adjoint. (Chen)
Add general-noise adjoint (Chen)
Tidy up the directory structure for the adjoints a bit.
Add checks in the adjoint for Ito/Stratonovich correction terms. I think the checks should depend on both the sde_type and the method.

Misc:

Bumped version number, python_requires.
diagnostics/ito_scalar.py appears to be using huge amount of memory, and is also using a diagonal (and thus not compatible) problem.
Update documentation. (#71)
Better adaptivity checks. (e.g. general-noise Heun should be possible with adaptive stepping, I believe.) (#73)
Support logqp (#75)
Change sdeint, sdeint_adjoint to automatically select the probably-best solver for a given noise type / SDE type combination. (Rather than just always using SRK.) (#73)
SDE-GAN example

Tests:

Fix broken tests test_sdeint.py, test_adjoint.py, test_adjoint_logqp.py (Chen) (#21)
Write tests for BrownianInterval (Patrick) (#28)
Add repetitions back to test_brownian_interval.py::test_normality_conditional (#44)
Write distribution tests for space-time Levy area. (#44)
Update test_sdeint.py to pytest. (#73)
Add tests checking that every solver works. (#73)
Fix flakiness+slowness of BInterval tests (#76)

Bugs to fix:

Correction a from space-time Levy area to Levy area is of the wrong shape. (Patrick) (#28)
Davie's approximation is currently missing its correction term from the Brownian parabola. (Patrick) (#28)
Use the correct noise term for I_k0 in SRK.
Default values of method='srk' and levy_area_approximation='none' are incompatible with one another. (Done in #19)
If bm isn't passed then it defaults to being a BrownianPath on the CPU, potentially causing device errors later. (Done in #19)

Further Python-end optimizations

Avoid re-evaluating g in ForwardSDE and AdjointSDE for g_prod and gdg_prod/gdg_jvp/gdg_jvp.
~~Dictionary lookup -> if/else statements after profiling.~~ (I know you put this in because I commented on it, but it's not important enough that I'm fussed. - Patrick)
Don't flatten and force all params to require gradient in adjoint.

Lots of code reuse between solvers, and between adjoint SDEs

To be clear, not raising this because I feel like being nitpicky - read on!

Other than very minor differences (e.g. order), the code for each Euler--Maruyama/Milstein/SRK method is essentially the same across noise types. Instead one can have e.g. a single Euler that accepts order as an argument, and four different wrappers that set that argument.

Similarly, the adjoint SDE definitions have lots of code reuse; it'd easier to write down a single adjoint SDE for the general noise case. (And any unnecessary computation can be masked out with if statements checking the noise type.)

I think the latter in particular offers an easy opportunity to unify Ito and Stratonovich: it'd just be another if statement checking if the correction term should be applied.

As for why I'm raising this - I'm interested in:

Defining custom SDE solvers.
Solving the adjoint SDE with general noise. (And I'm happy to do so with Euler--Maruyama, knowing the strong order is 0.5. It should still work, right?)
Solving Stratonovich SDEs.

The first is aided by tidying up the existing code for solvers (I don't want to define MyCustomSolverAdditive, MyCustomSolverGeneral,...), the second is made possible by improving the adjoint SDE, and the third is a stretch goal that'll probably be easier if it can just be implemented with if sde_type= corrections in the right places.

Tidying up the solvers at least is probably something I can offer a PR on, but I wanted to check if this fit your general vision, or if you think there's anything that might go wrong, as it pretty much involves removing the whole methods directory structure.

Using `names` argument without a prior drift on the SDE fails.

From:

torchsde/torchsde/_core/base_sde.py

Line 148 in 647b6e5

self.h = getattr(base_sde, prior_drift)

A prior drift is assumed to exist on base_sde.

Commit "Mask out private packages." broke library import

See this comment:

41fb567#r41181450

Operate on tensors rather than tuple-of-tensors?

Hi Xuechen - I appreciate I'm bombarding this repository with a lot of issues/PRs.

Something we've recently found over in torchdiffeq#115 is that it's noticeably more efficient to have the internals of the library operate on tensors, rather than tuples-of-tensors, and have the adjoint method simply torch.cat its different channels together.

Of course this also makes the code a whole lot cleaner, and I expect easier to develop going forward.

Not something I've got time to put together a PR for I'm afraid, but I thought I'd flag this up as something you might be interested in over here.

Computational time for Brownian Interval

I observed something strange about computation time for brownain interval

sde.cnt = 0
%timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 1.0]), dt=0.01, method="euler")
print(sde.cnt)
# 1.87 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
# 1428

sde.cnt = 0
%timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 5.0]), dt=0.05, method="euler")
print(sde.cnt)
# 57.3 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
# 1414

sde.cnt = 0
%timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 10.0]), dt=0.1, method="euler")
print(sde.cnt)
# 57.2 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
# 1414

where the sde is very similar to the one defined in the Quick example in README. In the above three examples, I change the different ts and dt. I think they should have roughly the same computation time. But it turns out the time used by the line are very different. According to the paper, the worse case should roughly be O(log T/dt) if I understand correctly. Why the first case is so slow?

On BrownianInterval/Path/Tree

@lxuechen soliciting your thoughts on something I've been contemplating, namely to make BrownianPath and BrownianTree special cases of BrownianInterval.

BrownianInterval w/ infinite cache is essentially the same as BrownianPath.
BrownianInterval w/ some tolerance at which the search is terminated, and a prespecified tree structure, is essentially the same as BrownianTree.

Not so many Brownian* objects to support.
Avoid the need to add Levy area support to BrownianPath/Tree, which is currently an outstanding task.
Can remove the C++ code and just use the existing Python implementation (which seems performant enough in my experience).
Can remove the dependence on blist, which was last updated in 2014 and has been spitting out warnings about running afoul of Python 3.9 deprecations.

Would need to add a tolerance to the search in BrownianInterval, and for robustness a trampoline to the relevant parts, neither of which should be very hard.

Make `dtype` of BrownianInterval constructor an optional argument

Could we make this argument an optional one so that users are freed from typing this in when they explicitly initialize BrownianInterval? An optional could be torch.get_default_dtype().

Latent SDE for irregularly spaced time series?

Hi, thanks for the great work. I have questions on using Latent SDE for irregularly spaced time series.

As far as I can tell there are no code examples covering latent SDE for irregularly spaced time series in this repo. latent_sde_lorenz.py is regularly spaced and the encoder is just a regular GRU with no time information. Is there any publicly available code on this?
If we would like to adapt latent SDE for irregularly spaced time series, what kind of encoder would you suggest? Would an ODE-RNN work for example?

Thanks!

Global entropy doesn't determine sampled values outside of [t0, t1] for `BrownianTree`

Following the note, the entropy argument passed to initialize BrownianTree doesn't govern the random numbers sampled outside the predetermined range [t0, t1].

Queries of this type is rate. Though, at the very least, there should be some warning or documentation on this.

Error in long simulation

Hello,

I've tried to simulation the basic model supplied in the example for 20 seconds. t_size remained 100.
The method used us euler as in the supplied example. It results in an error. The file is attached.
long_SDE_error.zip

Thanks in advance,
Ron Teichner

The error is:
/torchsde/_brownian/brownian_interval.py", line 322, in _split_exact
generator = np.random.SeedSequence(entropy=self._top._entropy,
File "bit_generator.pyx", line 316, in numpy.random.bit_generator.SeedSequence.init
File "bit_generator.pyx", line 391, in numpy.random.bit_generator.SeedSequence.get_assembled_entropy
File "bit_generator.pyx", line 156, in numpy.random.bit_generator._coerce_to_uint32_array
File "<array_function internals>", line 5, in concatenate
RecursionError: maximum recursion depth exceeded while calling a Python object

sdeint_adjoint problem

Hello, thanks for the great work!

I was trying the library and I think there is a problem with sdeint_adjoint when it's used in custom loss functions (sdeint works however). Below I give an example based on the LatentSDE which shows a case of the error. The example doesn't have any meaningful interpretation, it is just for illustration purposes.

import torch
from torch import nn

from torchsde import sdeint, sdeint_adjoint, SDEIto
from torch.distributions import Normal

class LatentSDE(SDEIto):
    def __init__(self, drift, sigma=0.5):
        super(LatentSDE, self).__init__(noise_type="diagonal")
        self.drift = torch.tensor(drift, requires_grad=False)
        self.sigma = torch.tensor(sigma, requires_grad=False)

        # Approximate posterior drift: Takes in 2 positional encodings and the state.
        self.net = nn.Sequential(
            nn.Linear(2, 10),
            nn.Tanh(),
            nn.Linear(10, 1)
        )
        self.net[-1].weight.data.fill_(0.)
        self.net[-1].bias.data.fill_(0.)

    def h(self, t, y):  # Prior drift.
        return self.drift * y

    def f(self, t, y):  # Approximate posterior drift.
        if t.dim() == 0:
            t = float(t) * torch.ones_like(y)
        # Positional encoding in transformers; must use `t`, since the posterior is likely inhomogeneous.
        inp = torch.cat((torch.sin(t), torch.cos(t)), dim=-1)
        return self.net(inp)

    def g(self, t, y):  # Shared diffusion.
        return self.sigma

batch_size, d, T = 3, 1, 20
sde = LatentSDE(1.0)
ts = torch.linspace(0, 1, T)
y0 = torch.zeros(batch_size, 1).fill_(0.1)  # (batch_size, d)
dist = Normal(0,1.0)

def loss():
    ys, logqp = sdeint_adjoint(sde, y0, ts, logqp=True)
    data = ys.sum(0).mean()
    res = -dist.log_prob(data)
    return res

learning_rate = 1e-4
optimizer = torch.optim.Adam(sde.parameters(), lr=learning_rate)

for t in range(3):
    optimizer.zero_grad()
    l = loss()
    print(l)
    l.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

The error is

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-65102f371b73> in <module>
      3     l = loss()
      4     print(l)
----> 5     l.backward()
      6 
      7     # Calling the step function on an Optimizer makes an update to its

~/opt/anaconda3/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    196                 products. Defaults to ``False``.
    197         """
--> 198         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    199 
    200     def register_hook(self, hook):

~/opt/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     98     Variable._execution_engine.run_backward(
     99         tensors, grad_tensors, retain_graph, create_graph,
--> 100         allow_unreachable=True)  # allow_unreachable flag
    101 
    102 

RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([3, 1]) and output[0] has a shape of torch.Size([]).

Is this a known issue or am I doing something wrong?

Thanks a lot in advance.

Upload to PyPI

It'd be nice to make torchsde available on PyPI. There's a few limitations with hosting it only on GitHub, most notably that (a) you can't specify a range of acceptable versions (torchsde>=0.2.4), and (b) it's not possible to upload any other package to PyPI that depends on torchsde: PyPI disallows uploading packages that have dependencies outside of PyPI.

Solver for "general"-type noise missing...

I was trying to use general noise with either sdeint or sdeint_adjoint. But I always get the error message below:

SDE has noise type general but solver only supports noise types ('additive', 'diagonal', 'scalar')

Would you mind implementing the corresponding solver?

Another question: what are the meanings of g_prod, gdg_prod and g_prod_and_gdg_prod?

Thanks

Scalar noise SDE shape check fails

Latest on dev, the contract checking fails for the following

import torch

from torchsde import BrownianInterval, BaseSDE, sdeint

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

batch_size, d, m = 16, 3, 2
t0, t1 = 0.0, 0.3
dtype = torch.get_default_dtype()
y0 = torch.zeros(batch_size, d, device=device)


class ScalarSDE(BaseSDE):

    def f(self, t, y):
        return torch.sin(y)

    def g(self, t, y):
        return torch.cos(y).sigmoid()


sde = ScalarSDE(sde_type='ito', noise_type='scalar')
bm = BrownianInterval(t0=t0, t1=t1, shape=(batch_size, 1), dtype=dtype, device=device, levy_area_approximation='foster')
sdeint(sde=sde, y0=y0, ts=[t0, t1], bm=bm)

Also it might be worthwhile to characterize exactly what API we'd like to use. I think we should go with g outputting (batch_size, d, 1) or (batch_size, d) or preferably both if possible for backward compatibility.

how to perform the sde with decreasing time

When I run the default sdeint, it says "evaluation times 'ts' must be strictly increasing", is there any setting or solver that can be used to solve the problem with decreasing time.

Project

Projet

Question about the "increment of Brownian motion"

Can someone explain what exactly does the parameter W (The increment of the Brownian motion over the interval [t0, t1]) in BrownianInterval mean mathematically?

Let's say that I have a Brownian motion with E[dB_t dB_t^T] = Qdt. Is the parameter W related to Q? If yes, how should I set the value of W given that I have Q?

Use of `np.searchsorted` in `BrownianTree`.

I think this line is slightly wrong:

torchsde/torchsde/_brownian/brownian_tree.py

Line 175 in 381f4e9

i = np.searchsorted(self._ts, t)

and should have side='right' set, as np.searchsorted([0., 1.], 0.) == 0 for example.

searchsorted is used in a few places elsewhere and I'm not completely sure if they're correct either, which is why I'm opening an issue rather than just fixing it.

Incidentally torch.bucketize is available as of 1.6 so we can switch to that now.

Slight refactor, eventually

Not suggesting that this happen now; I'm just opening an issue to keep track of it.

I think the current hierarchy of:

_brownian
_core
    methods
brownian_lib
_brownian_lib [from C++]

(not counting _core/adjoint_sdes which I know you've removed in your latest branch)

should be switched to:

_impl
    brownian
    methods
    core

(So additionally tidying the Brownian things together somehow. If we end up implementing every Brownian* in C++ then that would suffice, for example.)

Bump dependencies

It would be great if torchsde could be installed into an environment with numpy==1.20.* and scipy==1.6.*

torchsde/setup.py

Line 37 in 4c84649

 install_requires=['torch>=1.6.0', 'numpy==1.19.*', 'boltons>=20.2.1', 'trampoline>=0.1.2', 'scipy==1.5.*'], 

Query about the reproducibility of the Motion Capture dataset in "Scalable Gradients..." (Li et al., 2020)

I am trying to reproduce the results of the CMU Motion Capture dataset. I use references from examples/latent_sde_lorenz.py, the paper, and the preprocessed dataset linked by ODE2VAE's repo.

My current runs' results have large discrepancies from the results in the paper so I want to check if there are any training details I'm missing. (I am not very familiar with Bayesian modeling so I try to follow the hyperparameters in the paper as closely as possible.)

Here are the main issues:

The validation and the test MSE for Latent SDE are in the range of 20-30, while Latent SDE's Test MSE in the paper is 4.03 +- 0.20.
The log-likelihood term in the ELBO for Latent SDE is in the magnitude of 10^6 to 10^9 depending on the choice of hyperparameters, while the code for ODE2VAE has the log-likelihood for ODE2VAE in the magnitude of 10^4.

Here are the main training details I think I am most likely to wrongly interpret from the paper:

When calculating the log-likelihood, I am following your example and take the sum over all timesteps and the data dimension, then take the mean over the number of samples. Please let me know if this is correct for the CMU mocap dataset.
In training, the prediction and the log-likelihood should only be calculated for the last 297 predictions (since the first 3 observations are used to encode the context).
The solver used is not mentioned. I have tried euler_heun, milstein, and even reversible_heun.
The initial learning rate for Adam is 0.01 in the paper, but in some runs with the initial lr to be 0.001, they seem to be more stable. I'm curious if you have any comments about this.
The dt for the solver is said to be 1/5 of the minimum time difference between two observations. All observations are regular so we can choose the minimum time to be a particular value a (eg., 1), then dt would be 0.2 * a. I want to know if my interpretation here is correct. The paper didn't mention this value a or the start/end time. It would be nice if you remembered this.

Tagging @lxuechen since you know the most about the exp details. Thank you for your time!

Bug in BrownianInterval

This bug relates to my concern that there's virtually no test for BrownianInterval at the moment, and in retrospect it shouldn't have been merged into dev without testing out the aspects I mentioned in #15. Shape tests are basic, though it is helpful at most times. The change of API for the solvers also breaks backward compatibility.

The data structure couldn't produce A with the correct shape

import torch

from torchsde import BrownianInterval

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

batch_size, d, m = 16, 3, 2
t0, t1 = 0.0, 0.3
dtype = torch.get_default_dtype()

bm_general = BrownianInterval(
    t0=t0, t1=t1, shape=(batch_size, m), dtype=dtype, device=device, levy_area_approximation='foster'
)
W, U, A = bm_general(0.0, 0.1)  # Fails.
print(W.size())
print(U.size())
print(A.size())

sdeint_adjoint doesn't seem to work on GPU

Run the code in latent_sde.py. On cpu everything works fine, but when switching to GPU and using sdeint_adjoint() the following error appears:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

As mentioned above, the only difference is the use of a GPU. Tried, just in case, moving leaf variables to device and it didn't fix the problem (they had already been moved to the appropriate device in the original code). I can also confirm that all variables have requires_grad = True.

Finally, sdeint() works on both CPU and GPU.

Experiments Code in "Neural SDEs AS Infinite-Dimensional GANs"

Could you please provide the reproducible experimental source code for experiment 4.2-4.4?
Or, I would like to see the code for the reproducible metric (Classification, Prediction, MMD) to check the results of running the experiment.

thanks for reading.

GPU support for `sde_gan.py`

I am trying the run the example sde_gan.py with GPU support.
I have run the code directly as it is in a Google Colab session (with GPU execution) as well as in local and in both cases, there is almost no improvement in the running time in comparison with CPU execution.
I have looked up the code but I did not find any problem in sde_gan.py; maybe the problem is in the source files for torchsde and torchcde.

Clarification on logqp_path

Hello,

Thanks for the work, it looks great. However, I have a question about logqp_path and its involvement in KL term in latent_sde.py example. I have read the paper "Neural SDE: Stabilizing Neural ODE ..." and went through your code, but it is still not clear why logqp_path is a part of KL term in addition to logqp0 and why we compute it as 1/2*((f-h)/g)^2. It looks like log of Gaussian pdf, but I did not make a connection. I would appreciate if you can clarify how to derive logqp_path.

Jurijs

[On dev branch] New BrownianInterval hangs when `dt` is very small

I think this relates to the issue @patrick-kidger mentioned that when running the scalar diagnostics, large amounts of memory is consumed. I'm seeing this also on my machine.

I've managed to pinpoint the issue: When tb-ta is very small, the dependency tree creation phase hangs, with _set_points being repeated indefinitely. More specifically, this line.

>0.2.0 feature list

Creating a list of potential features that may be nice to have, but aren't necessarily a priority right now.

Double backward w/ adjoint (#49)
Stratonovich adjoint support for additive/scalar/diagonal noise.
Continuous integration.
Fix speed performance regression (compared to v0.1.1, the CPU-only example is 3 times as slow as before for v0.2.1) (#86)
autogenerated documentation
Add Milstein for general noise, using Levy area approximation.
Allow an f_and_g function on SDEs, that computes drift+diffusion simultaneously. (#84)

Tests:

Write tests for full Levy area.
Write tests for sdeint actually approximating the analytical solution...
Write tests that logqp is computing the thing it's supposed to compute.
Fix BI doesn't pass tests when alpha=1e-2 issue. Very likely, something is wrong.

Maybe?

Derivative-using Stratonovich Milstein w/ scalar noise doesn't look like it's getting the right weak/strong convergence behaviour. (Curved line on log-log plot).
Use batch-vjp to compute dg_ga, if and when batch-vjp ever becomes available.
Add static tensors as an input to sdeint, sdeint_adjoint
C++ BInterval/BPath (to discuss) (If we do this, then #54 is relevant)

Python version

Can we pick a Python version to support? I'd suggest >=3.6.

Future statements: currently used are absolute_import, division and print_function. As far as I know these only affect Python 2 which is out of support anyway. Should we really include them?

Likewise, the setting of annotations to {} seems unnecessary to me. The minimum verison of Python now supported by PyTorch is 3.6, and type annotations were introduced in 3.5 I believe.

I think we're already demanding at least Python 3.3 for the @property@abstractmethod to work.

example/latent_sde_lorenz different from whats written in the paper?

Hi there,

first of all thank you for your work! This is great!

Disclaimer: i don't know much about SDE, but i would love to make use of neural sde, since i have some use cases that i think could work with it. But mostly i find it highly interesting! So i hope i didn't get it all wrong entirely.

I think there are quite some differences between your code example and whats decribed in your paper.

In the paper it says, that you have a 1 layer GRU to recover the dynamics and f_net and h_net both are 1 layer MLP, while f_net takes an additional context variable of size 1 . Then, there's a decoder to map back from latent space to feature space.
So if i understand correctly, this model would work just like shown in figure 4 of your paper:
The GRU consumes a (time reversed) sequence of inputs in observation space and outputs a sequence in a 4D latent space. The final output (at t0) then is your initial condition (z0) for the sde, which is integrated through time, producing another sequence in latent space. So the dynamics would happen in latent space. Then the decocer maps back to observation space. What's a little unclear to me, is what the context would be. Just one (the last) latent variable from each of the GRU outputs?
Anyway, this describes an actual latent sde, since all the dynamics are happening in latent space.

However, in your implementation in the example, all the dynamics happen in observation space directly. f_net and h_net both map from observation space to drifts in obeservation space (well f_net again sees an additional context, which is of size 64 here). So the GRU encoder "only " provides the context, but not the initial latent state. Meaning it does not recover the dynamics, if i understood it right.
So this is not an actual "Latent SDE" model, is it? More like a "Latent informed/controlled SDE"?

Similar thing for the latent_sde example. The dynamics are learned in obesrvation space as well, so there is no latent space involved. One could claim that the latent space is equal to the observation space here of course. Or do i simply have the wrong idea of what "Latent SDE/ODE" actually means?

In general, i think this package could greatly benefit from a more in depth documentation in regards to training models with it and/or better explained examples. One or two examples of standart use cases in jupyter notebooks with detailed explanations could help a lot to make this more accessible to people that don't have much background in SDE (like me).

Misunderstandig of default Brownian Motion/Wiener process

I have a set of coupled SDE with white noise. I would like to solve it using torchsde but I completely do not understand what is exactly W(t) from documentation.

Is it Wiener process with zero mean and unit volatility? How can I add Wiener process with sigma != 1 into each equation of my set?

Our naive guess was:

    def g(self, t, number_of_equations):
        noise = np.sqrt(2*A)* torch.ones_like(number_of_equations).unsqueeze(-1)
        return noise

where 2*A is the volatility of Wiener process.

However, it seems that it does not work.

Next, consider the object BrownianInterval. We have a hope that this can help. We initialize:

bm = BrownianInterval(t0=ts[0]., t1=ts[-1]., size=(1,1), device='cuda')

and next try to solve SDE by using sdeint.

What is exactly batch_size? What is exactly brownian_size? For brownian_size I can not find any definition or any other documentation.

Finally, from documentation:

size (tuple of int): The shape of each Brownian sample. If zero dimensional represents a scalar Brownian motion. If one dimensional represents a batch of scalar Brownian motions. If >two dimensional the last dimension represents the size of a a multidimensional Brownian motion, and all previous dimensions represent batch dimensions.

but if we set size=1, we see TypeError: int object is no iterable

Could you please clarify these points?

Convolution inside drift and diffusion term

Hi,

I am re-implementing Neural SDE paper.

When I tried using Neural SDE with bunch of linear layers, every thing seem to be good, the train accuracy about 0.92 on valid set.

After that, I tried with convolution layers. First of all, I realize that f and g take (B,N) shape of input. So in the forward step of my network, I used torch.view method as a way to reshape flattened input into the square image. And continue feed the data through bunch of Convolution layer, and as final step, one more time, I use the torch.view method to reshape in to (B,N) shape. (Take a look below).

class ConvolutionDrift(nn.Module):
    def __init__(self, in_channel, size=32, device="cpu"):
        super(ConvolutionDrift,self).__init__()
        self.size=size
        self.in_channel=in_channel
        self.conv1 = ConcatConv2d(in_channel, 64,ksize=3,padding=1)
        self.norm1 = Norm(64) 
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = ConcatConv2d(64, 64, ksize=3,padding=1)
        self.norm2 = Norm(in_channel) 
        
    def forward(self,t,x):
        bs = x.shape[0]
        out = x.view(bs, self.in_channel, self.size, self.size)
#        print(f"{out.shape}\n\n\n\n")
        out = self.conv1(t,out)
        out = self.norm1(out)
        out = self.relu(out)
        out = self.conv2(t,out)
        out = self.norm2(out)
        out = self.relu(out)
        out = out.view(bs,-1)
        return out
class ConvolutionDiffusion(nn.Module):
    def __init__(self, in_channel, size=32, brownian_size = 2, device="cpu"):
        super(ConvolutionDiffusion,self).__init__()
        self.size=size
        self.in_channel=in_channel
#        self.net = nn.Sequential(*[
#            nn.Conv2d(in_channel, 64,3,padding=1),
#            nn.GroupNorm(32,64),
#            nn.ReLU(),
#            nn.Conv2d(64, 64,3,padding=1),
#            nn.GroupNorm(32,64),
#            nn.ReLU(),
#            nn.Conv2d(64,in_channel * 2,3,padding=1),
#            nn.ReLU(),
#            
#        ]).to(device)
        self.relu = nn.ReLU()
        self.norm1 = Norm(64)
        self.conv1 = ConcatConv2d(in_channel, 64, ksize=3, padding = 1)
        self.conv2 = ConcatConv2d(64,64, ksize=3, padding = 1)
        self.norm2 = Norm(64)
        self.conv3 = ConcatConv2d(64, in_channel * brownian_size, ksize = 3, padding = 1)
        self.norm3 = Norm(in_channel * brownian_size)
    def forward(self,t,x):
        bs = x.shape[0]
        out = x.view(bs, self.in_channel, self.size, self.size)
        # out = self.net(out)
        out = self.conv1(t,out)
        out = self.norm1(out)
        out = self.relu(out)
        out = self.conv2(t,out)
        out = self.norm2(out)
        out = self.relu(out)
        out = self.conv3(t,out)
        out = self.norm3(out)
        out = self.relu(out)
        out = out.view(bs,-1)
        return out
        
class SDEBlock(nn.Module):
    noise_type="general"
    sde_type="ito"
    def __init__(self, state_size, brownian_size, batch_size, option = dict(), device="cpu", parallel=False,
        method="euler", noise_type="general", integral_type="ito", is_ode=False, input_conv_channel = 64,input_conv_size=6, layers="linear"):
        super(SDEBlock, self).__init__()
        self.noise_type=noise_type
        self.sde_type=integral_type
        self.state_size = state_size
        self.batch_size = batch_size
        self.brownian_size = brownian_size
        self.parallel = parallel
        self.device = device
        self.is_ode = is_ode
        if parallel:
            self.batch_size = int(self.batch_size / 2)
        
        if layers=="linear":
            self.drift = LinearDrift(state_size, state_size).to(device)
            self.diffusion = LinearDiffusion(state_size, state_size * brownian_size).to(device)

        elif layers=="conv":
            self.drift = ConvolutionDrift(input_conv_channel, input_conv_size).to(device)
            self.diffusion = ConvolutionDiffusion(input_conv_channel, input_conv_size, brownian_size = self.brownian_size).to(device)


    def f(self,t,x):  
        out = self.drift(t,x)
        return out
        #return self.f(x)
    def g(self,t,x):
        bs = x.shape[0]
        if self.is_ode:
            out =  torch.zeros_like((self.batch_size,self.state_size, self.brownian_size)).to(self.device)
            return out
        out = self.diffusion(t,x)
        
        out =  out.view(bs, self.state_size, self.brownian_size)
        return out

        


   

"""
SDEBlock: LinearDrift dx + LinearDiffusion dW
SDENet: fe -> SDEBlock -> fcc
"""
    
    
class SDENet(Model):
    def __init__(self, input_channel, input_size, state_size, brownian_size, batch_size, option = dict(), method="euler",
        noise_type="general", integral_type="ito", device = "cpu", is_ode=False,parallel = False):
        """"""""""""""""""""""""
        super(SDENet, self).__init__()
        self.batch_size = batch_size
        self.parallel = parallel
        self.input_size = input_size
        self.option = option
        self.input_channel = input_channel
        #state_size = 64 * 14 * 14
        self.device = device
        self.fe = nn.Sequential(*[
            nn.Conv2d(input_channel,16,3,padding=1),
            nn.GroupNorm(8,16),
            nn.ReLU(),
            nn.Conv2d(16,32,4,padding=2),
            nn.GroupNorm(16,32),
            nn.ReLU(),
            nn.Conv2d(32,64,3,2),
            nn.GroupNorm(32,64),
            nn.ReLU(),

        ]).to(device)
        state_size, input_conv_channel, input_conv_size = self.get_state_size()
        self.input_conv_channel = input_conv_channel
        self.input_conv_size = input_conv_size
        
#        print(state_size, input_conv_channel, input_conv_size, "ehehehehehe\n\n\n\n")
#        print(f"Init features extraction layer with device {self.device}")
        # Output shape from (B,3,32,32) -> (B,64,14,14)
        if parallel:
            self.batch_size = int(self.batch_size /  2)
        self.rm = SDEBlock(
                state_size=state_size,
                brownian_size = brownian_size,
                batch_size = batch_size,
                option=option,
                method=method,
                integral_type=integral_type,
                noise_type=noise_type,
                device=device,
                parallel=parallel,
                is_ode=is_ode,
                input_conv_channel=input_conv_channel,
                input_conv_size=input_conv_size,
                layers="conv"
            ).to(device)


        self.fcc = nn.Sequential(*[
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(input_conv_channel,10),
            nn.Softmax(dim = 1)
        ]).to(device)


        self.intergrated_time = torch.Tensor([0.0,1.0]).to(device)
        self.device = device
        self.method = method
    def get_state_size(self):
        out = torch.rand((1,self.input_channel,self.input_size, self.input_size)).to(self.device)
        shape = self.fe(out)
        return shape.view(1,-1).shape[-1], shape.shape[1], shape.shape[2]
    def forward(self,x):
        out = self.fe(x)
        bs = x.shape[0]
#        print(f"Shape after Feature Extraction Layer: {out.shape}")
        out = out.view(bs,-1)
#        print(f"After the feature extraction step, shape is: {out.shape}")
#        print(f"Device of out {out.device}")
#        print(f"Shape before the SDE Intergral: {out.shape}")
        out = sdeint(self.rm,out,self.intergrated_time, options=self.option,method="euler", atol=5e-2,rtol=5e-2, dt=0.1, dt_min=0.05,adaptive=True)[-1]
        out = out.view(bs,self.input_conv_channel, self.input_conv_size, self.input_conv_size)
        out = self.fcc(out)
        return out

The code run with no any mistake. But there is a thing that the results is not so good. It usually converges at second / thrid epoch.
And accuracy with convolution layers is not good at all.
So my question here is: "Will the convolution layer is available in torchsde by someway or the loss.backward() does not update ConcatConv2ds' parameters?"

Continuous integration

Examples for paper's toy problems

Hey,

Could you guys share the example code for the toy problems presented in the paper? Especially the one for Stochastic Lorenz Attractor?

I was looking at how to solve a system of stochastic differential equations using torchsde where there is inter-dependecy between some of the functions. Can you help me out?

Cheers,
Miguel

Missing attributes of `brownian_lib`

When I run the tests, it looks like BrownianTree and BrownianPath are missing in brownian_lib module.
After inspecting the test itself, it looks like BrownianTree and BrownianPath from brownian_lib and _brownian should work compatibly.

I think there are two ways to fix it:

add an if hasattr check in the testing code
add import statements from _brownian in brownian_lib/__init__.py if the imports in the try block fails.

Learned SDE is quite smooth

Hi,

I am trying to modify the latent_sde_lorenz example to learn an SDE from a simple 2-dimension noise data. The code works well but the only problem is that the learned SDE will always generate quite smooth samples.

I'd like the learned SDE to show the same amount of stochasticity as the data, any idea how to deal with this? Is there any parameter that needs to be tuned?

torchsde vs DiffEqFlux.jl

Hi,

First of all, thanks a lot for the work, this is excellent research!

I am aware of a Julia library that does roughly the same thing (https://github.com/SciML/DiffEqFlux.jl).
Could you please describe what the differences are in your opinion (either algorithmic differences or interface ones)? So far the only one I can see relates to the fact that it seems easier to request a batch of trajectories in torchsde than it is in DiffEqFlux (might be wrong though).

Thanks a lot

Adrien