wotan's Introduction

Wōtan...

...offers free and open source algorithms to automagically remove trends from time-series data.

In Germanic mythology, Odin (/ˈoːðinː/ Old High German: Wōtan) is a widely revered god. He gave one of his eyes to Mimir in return for wisdom. Thus, in order to achieve a goal, one sometimes has to turn a blind eye. In Richard Wagner's "Der Ring des Nibelungen", Wotan is the King of the Gods (god of light, air, and wind) and a bass-baritone. According to Wagner, he is the "pinnacle of intelligence".

Example usage

from wotan import flatten
flatten_lc, trend_lc = flatten(time, flux, window_length=0.5, method='biweight', return_trend=True)

For more details, have a look at the interactive playground and the documentation. Examples and tutorials are also available, such as 📑Example: Basic wotan functionality.
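To make the idea behind the call above concrete, here is a minimal sketch of a time-windowed slider in plain NumPy. It is not wotan's implementation: the function name `sliding_median_flatten` and the synthetic light curve are assumptions for illustration, and it assumes that the flattened flux is the flux divided by the trend.

```python
import numpy as np

def sliding_median_flatten(time, flux, window_length):
    """Illustrative stand-in for a time-windowed slider (not wotan's code)."""
    trend = np.empty_like(flux, dtype=float)
    half = window_length / 2.0
    for i, t in enumerate(time):
        in_window = np.abs(time - t) <= half   # select points by time, not by cadence
        trend[i] = np.median(flux[in_window])
    # Assumption: flattening means dividing the flux by the estimated trend
    return flux / trend, trend

# Synthetic light curve: slow sinusoidal trend plus one short box-shaped dip
time = np.linspace(0.0, 10.0, 1000)
flux = 1.0 + 0.01 * np.sin(2 * np.pi * time / 5.0)
flux[500:510] -= 0.005
flat, trend = sliding_median_flatten(time, flux, window_length=0.5)
```

Because the window is much longer than the dip, the median follows the slow trend and leaves the dip in the flattened curve.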

Available detrending algorithms

Time-windowed sliders with location estimates: (📑Example: Comparison of sliders)
- biweight Robust M-estimator using Tukey's biweight (📑Example)
- huber Robust M-estimator from Huber (1981) (iterative)
- huber_psi Robust M-estimator based on Huber's ψ (one-step)
- hampel Robust M-estimator based on Hampel (1972), 3-part descending, known as (a,b,c), 17A, 25A
- andrewsinewave Robust M-estimator using Andrews' sine wave
- welsch Robust M-estimator from Welsch-Leclerc
- ramsay Robust M-estimator from Ramsay (1977), known as Ramsay's E^a
- tau Robust τ estimator from Yohai & Zamar (1986)
- hodges Rank-based robust R-estimator Hodges-Lehmann-Sen
- median The most robust (but least efficient)
- medfilt A cadence-based median filter (not time-windowed) for comparison
- mean The least robust (but most efficient for white noise)
- trim_mean Trimmed mean (outliers are removed)
- winsorize Trimmed mean (outliers are winsorized to a specified percentile)
- hampelfilt Trimmed mean (outliers are replaced with the median)
Splines: (📑Example)
- rspline Spline with iterative sigma-clipping
- hspline Spline with a robust Huber estimator (Huber 1981)
- pspline Penalized spline to automatically select the knot distance (Eilers 1996), with iterative sigma-clipping
Polynomials and sines: (📑Example)
- cofiam Cosine Filtering with Autocorrelation Minimization (Kipping et al. 2013)
- cosine Sum of sines and cosines, with option for iterative sigma-clipping
- savgol Sliding segments are fit with polynomials (Savitzky & Golay 1964), cadence-based
Regressions: (📑Example)
- lowess Locally weighted scatterplot smoothing (Cleveland 1979)
- supersmoother Friedman's (1984) Super-Smoother, a local linear regression with adaptive bandwidth
Fitting a model that is a sum of Gaussian bases: (📑Example)
- ridge Ridge regression (L2 penalty, Tikhonov regularization)
- lasso LASSO regression (L1 penalty, Least Absolute Shrinkage and Selection Operator, Tibshirani 1996)
- elasticnet Linear regression model with 50% L1 and 50% L2 norm regularization
gp Gaussian Processes offering: (📑Example: GP Standard vs. robust)
- squared_exp Squared-exponential kernel, with option for iterative sigma-clipping
- matern Matern 3/2 kernel, with option for iterative sigma-clipping
- periodic Periodic kernel informed by a user-specified period (📑Example)
- periodic_auto Periodic kernel informed by a Lomb-Scargle periodogram pre-search
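As a rough illustration of what a robust location estimate such as the biweight does inside each window, the sketch below uses iterative reweighting with Tukey's biweight weights. It is an assumption-laden stand-in, not wotan's code: the function name, the tolerance, and the use of reweighting (rather than any particular solver) are choices made for this example, and `cval` here simply scales the median absolute deviation.

```python
import numpy as np

def biweight_location(data, cval=5.0, tol=1e-6, max_iter=50):
    """Iteratively reweighted Tukey biweight location (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    center = np.median(data)
    mad = np.median(np.abs(data - center))
    if mad == 0:
        return center
    for _ in range(max_iter):
        u = (data - center) / (cval * mad)
        # Points farther than cval * MAD from the center get zero weight
        weights = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        new_center = np.sum(weights * data) / np.sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center

# Gaussian noise around 1.0 with a block of strong negative outliers
rng = np.random.default_rng(0)
data = rng.normal(1.0, 0.01, 200)
data[:10] = 0.5
robust_center = biweight_location(data)
plain_mean = np.mean(data)
```

The outliers pull the plain mean well below 1.0, while the biweight estimate stays close to the bulk of the data.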

Available features

window_length The length of the filter window in units of time (usually days).
break_tolerance If there are large gaps in time, especially with corresponding flux level offsets, the detrending is much improved when splitting the data into several sub-lightcurves and applying the filter to each individually. Comes with an empirical default and is fully adjustable.
edge_cutoff Trends near edges are less robust. Depending on the data, it may be beneficial to remove edges.
cval Tuning parameter for the robust estimators (see documentation)
return_trend If True, the method will return a tuple of two elements (flattened_flux, trend_flux) where trend_flux is the removed trend. Otherwise, it will only return flattened_flux.
transit_mask Mask known transits during detrending (📑Example)
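The segmentation idea behind break_tolerance can be sketched as follows: split the time series wherever the gap between consecutive points exceeds the tolerance, then filter each segment separately. The helper name `split_at_gaps` and the example numbers are hypothetical, and this is only a sketch of the concept, not wotan's internal routine.

```python
import numpy as np

def split_at_gaps(time, break_tolerance):
    """Return index arrays, one per segment, split where gaps exceed break_tolerance."""
    gaps = np.diff(time)
    split_points = np.where(gaps > break_tolerance)[0] + 1
    return np.split(np.arange(len(time)), split_points)

# Two observing blocks separated by a 1.5 day gap
time = np.concatenate([
    np.linspace(0.0, 5.0, 250, endpoint=False),
    np.linspace(6.5, 12.0, 275, endpoint=False),
])
segments = split_at_gaps(time, break_tolerance=0.5)
for idx in segments:
    # Each segment would be detrended on its own, e.g. filter(time[idx], flux[idx])
    pass
```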

What method to choose?

It depends on your data and what you would like to achieve (relevant xkcd). If possible, try it out! Use wotan with a selection of methods, iterate over their parameter space, and choose what gives the best results for your research.

If that is too much effort, you should first examine your data.

Is it mostly white (Gaussian) noise? Use a time-windowed sliding mean. This is the most efficient method for white noise.
With prominent outliers (such as transits or flares), use a robust time-windowed method such as the biweight. This is usually superior to the median or trimmed methods.
Are there (semi-) periodic trends? In addition to a time-windowed biweight, try a spline-based method. Experimenting with periodic GPs is worthwhile.
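The trade-off between efficiency and robustness mentioned above can be checked with a small simulation using only the Python standard library. The window size, number of trials, and outlier value are arbitrary choices for this sketch.

```python
import random
import statistics

random.seed(42)

def simulate(estimator, n_trials=500, n_points=51, outlier=None):
    """Return (scatter, average) of a location estimator over simulated windows."""
    estimates = []
    for _ in range(n_trials):
        window = [random.gauss(0.0, 1.0) for _ in range(n_points)]
        if outlier is not None:
            window[0] = outlier          # inject one strong outlier per window
        estimates.append(estimator(window))
    return statistics.pstdev(estimates), statistics.fmean(estimates)

# Pure white noise: compare the scatter of the two estimators
mean_scatter, _ = simulate(statistics.fmean)
median_scatter, _ = simulate(statistics.median)

# One outlier per window: compare how far each estimator is pulled from zero
_, mean_bias = simulate(statistics.fmean, outlier=-50.0)
_, median_bias = simulate(statistics.median, outlier=-50.0)
```

On white noise the mean scatters less than the median, but a single strong outlier shifts the mean far more than the median. Robust estimators such as the biweight aim for a compromise between the two.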

Installation

To install the released version, type

$ pip install wotan

which automatically installs numpy, numba and scipy if not present. Depending on the algorithm, additional dependencies exist:

huber, ramsay, and hampel depend on statsmodels
hspline and gp depend on scikit-learn (imported as sklearn)
pspline depends on pygam
supersmoother depends on supersmoother

To install all additional dependencies, type $ pip install statsmodels scikit-learn supersmoother pygam.

Originality

Like all scientific work, wōtan stands on the shoulders of giants. In particular, many detrending methods are wrapped from existing packages. Original contributions include:

A time-windowed detrending master module with edge treatments and segmentation options
Robust location estimates using Newton-Raphson iteration to a precision threshold for Tukey's biweight, Andrews' sine wave, and the Welsch-Leclerc. This is probably a "first", which reduces jitter in the location estimate by ~10 ppm
Robustified (iterative sigma-clipping) penalized splines for automatic knot distance determination and outlier resistance
Robustified (iterative sigma-clipping) Gaussian processes
GP with a periodic kernel informed by a Lomb-Scargle periodogram pre-search
Bringing together many methods in one place in a common interface, with sensible defaults
Providing documentation, tutorials, and a paper which compares and benchmarks the methods
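The "robustified (iterative sigma-clipping)" idea in the list above can be sketched generically: fit a trend, reject points that deviate by more than a few standard deviations, and refit until the set of kept points is stable. The example uses a plain polynomial as a stand-in for the spline or GP, so the function name, the polynomial degree, and the clipping threshold are all assumptions for illustration rather than wotan's actual settings.

```python
import numpy as np

def sigma_clipped_polyfit(time, flux, degree=3, sigma=3.0, max_iter=10):
    """Fit a polynomial trend, iteratively rejecting points beyond sigma * std."""
    keep = np.ones(len(time), dtype=bool)
    for _ in range(max_iter):
        coeffs = np.polyfit(time[keep], flux[keep], degree)
        residuals = flux - np.polyval(coeffs, time)
        std = np.std(residuals[keep])
        new_keep = np.abs(residuals) <= sigma * std
        if np.array_equal(new_keep, keep):
            break                      # converged: the set of kept points is stable
        keep = new_keep
    return np.polyval(coeffs, time), keep

# Smooth quadratic trend with small noise and a short, deep dip
rng = np.random.default_rng(1)
time = np.linspace(0.0, 1.0, 200)
true_trend = 1.0 + 0.1 * time**2
flux = true_trend + rng.normal(0.0, 0.001, time.size)
flux[50:55] -= 0.05
trend, keep = sigma_clipped_polyfit(time, flux)
```

After clipping, the dip no longer influences the fit, so the recovered trend follows the underlying curve.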

Attribution

Please cite Hippke et al. (2019, AJ, 158, 143) if you find this code useful in your research. The BibTeX entry for the paper is:

@ARTICLE{2019AJ....158..143H,
       author = {{Hippke}, Michael and {David}, Trevor J. and {Mulders}, Gijs D. and
         {Heller}, Ren{\'e}},
        title = "{W{\={o}}tan: Comprehensive Time-series Detrending in Python}",
      journal = {\aj},
     keywords = {eclipses, methods: data analysis, methods: statistical, planetary systems, planets and satellites: detection, Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics},
         year = "2019",
        month = "Oct",
       volume = {158},
       number = {4},
          eid = {143},
        pages = {143},
          doi = {10.3847/1538-3881/ab3984},
archivePrefix = {arXiv},
       eprint = {1906.00966},
 primaryClass = {astro-ph.EP},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2019AJ....158..143H},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

wotan's Issues

Usage examples in documentation

One for every single method

Hampel filter

Replace trimmed values with median

https://books.google.de/books?id=f9-YCgAAQBAJ&pg=PA138&lpg=PA138&dq=hampel+estimator+python&source=bl&ots=iXRyNATiXn&sig=ACfU3U0k9NT7s5AGS-Z4LBDyQJCNCF2lSg&hl=de&sa=X&ved=2ahUKEwiH3rio-_zhAhV2VRUIHfAVDzUQ6AEwB3oECAkQAQ#v=onepage&q=hampel%20estimator%20python&f=false

Add documentation for ``transit_mask``

Currently only works with cosine

Robustify GPs

Huber slider cval=1.5 fails to converge

sometimes?

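import numpy as np
import scipy.sparse
import cvxpy as cp

# L1 trend filtering: y is the flux array, degree sets the regularization strength
n = y.size
e = np.ones((1, n))
D = scipy.sparse.spdiags(np.vstack((e, -2*e, e)), range(3), n-2, n)  # second-difference operator
x = cp.Variable(shape=n)
obj = cp.Minimize(0.5 * cp.sum_squares(y - x) + degree * cp.norm(D @ x, 1))
cp.Problem(obj).solve()
trend = x.value  # fitted piecewise-linear trend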

Large memory use

Great software! I am running into a memory issue, though, that I was hoping you could help with. I have a simple for loop with on the order of 50-500 iterations, and inside each one I run wotan.flatten on a Kepler light curve (long 30-minute cadence, so arrays of ~60,000 elements). The aim is to assess how good different detrending methods are.

According to the Activity Monitor, this takes up a very large amount of memory, on the order of tens of GB. It also makes it impossible to cancel the Python loop with CTRL-C.

Even if I just do single instances of wotan.flatten it takes about 100 MB of memory per run and that does not seem to get released until I restart python.

Have you encountered this before? Is there an easy way of clearing up the memory? I have tried Garbage Collection (gc.collect) without luck.

Thanks!

Add progress bar option

Make fast huber estimator

use numba instead of statsmodels
make running_segment numba again
remove MAXITER constant

Expand ``transit_mask`` to more methods

Easily possible:

Huber spline
GP
iterspline with second mask
pspline, ridge, lasso, elasticnet using new axis?

???

Sliders? If window becomes empty, use different interpolation?

Perhaps not useful

SuperSmoother easy (using weights), but no users?
Same for cofiam, savgol?

Butterworth

https://oceanpython.org/2013/03/11/signal-filtering-butterworth-filter/

Reduce dependencies (e.g., statsmodels)

Change estimators Hampel, Huber, Ramsay from RLS to estimate_location
lowess: Convert Cython to numba?

Supersmoother window

window if condition in code commented out. Check and test if working correctly.

T14

'MaskedArray' object not callable

I'm trying to perform a process as follows:

Detrend (WOTAN)--> search and find a signal (TLS) --> set a mask corresponding to the planetary signal in the original data and detrend again (WOTAN) --> search for a second signal (TLS) --> etc.

The point is that my pipeline explores several detrending options, searching for the best one (maximizing the SNR, SDE, etc.). Once a planet has been detected, I would like to remove it from the raw data and detrend again (exploring several options) before the next search. I would like to use the raw data in this second step instead of the already detrended data, to be sure that my next choice of best detrending model is not affected by the presence of the first planet. Unfortunately, I get this error:

flatten_lc, trend_lc = flatten(time_second_run, flatten_second_run, method='gp', kernel='matern', kernel_size=wl[i], return_trend=True, break_tolerance=0.5)
TypeError: 'MaskedArray' object is not callable

Does this mean I cannot set a mask in the data that is passed to flatten? If so, is there any way to solve this?

Thanks a lot!
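One possible way to implement the masking step of such a pipeline is sketched below. It removes in-transit points with a boolean index, so that plain NumPy arrays (not numpy.ma.MaskedArray objects) are passed on to the next detrending run. The helper name `in_transit_mask` and the period, epoch, and duration values are hypothetical, and whether this resolves the TypeError reported above is not established.

```python
import numpy as np

def in_transit_mask(time, period, t0, duration):
    """Boolean array that is True for points within duration/2 of a transit midpoint."""
    # Fold the time axis so that phase runs from -period/2 to +period/2 around t0
    phase = (time - t0 + 0.5 * period) % period - 0.5 * period
    return np.abs(phase) < 0.5 * duration

# Hypothetical values, e.g. taken from the first transit search
time = np.linspace(0.0, 27.0, 2000)
flux = np.ones_like(time)
mask = in_transit_mask(time, period=3.0, t0=1.5, duration=0.2)

# Boolean indexing returns ordinary ndarrays without the masked points
time_second_run = time[~mask]
flux_second_run = flux[~mask]
```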

Add option to mask transit for detrending

PyPI package

Add Bayesian Adaptive Regression Splines

BARS (DiMatteo, Genovese, and Kass 2001) uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors), as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise) while adapting to sudden changes (retaining high-frequency signal). However, BARS is computationally intensive.

https://gist.github.com/hippke/817fa533e21f0899026452d722aac44a

LS-GP

Produces warnings. Test more cases.

replace untrendy

Code coverage with numba

See https://github.com/HazyResearch/numbskull/blob/master/.travis.yml

Add Hodrick–Prescott filter

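import scipy.sparse
import scipy.sparse.linalg

def hp_filter(x, lamb=1600):
    """Hodrick-Prescott filter: return the smooth trend of x."""
    w = len(x)
    b = [[1]*w, [-2]*w, [1]*w]
    D = scipy.sparse.spdiags(b, [0, 1, 2], w-2, w)  # second-difference operator
    I = scipy.sparse.eye(w)
    B = (I + lamb * (D.transpose() @ D)).tocsc()  # CSC format for the sparse solver
    return scipy.sparse.linalg.spsolve(B, x)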

Tutorials

Add robust GP

Here's the model in PyMC3. We use a Gamma(2,1) prior over the lengthscale parameter, and weakly informative HalfCauchy(5) priors over the covariance function scale, and noise scale. A Gamma(2,0.1) prior is assigned to the degrees of freedom parameter of the noise. Finally, a GP prior is placed on the unknown function.

https://nbviewer.jupyter.org/github/fonnesbeck/Bios366/blob/master/notebooks/Section5_1-Gaussian-Processes.ipynb

Computation of the RMS of a given lightcurve

Hi,

I was wondering if there is any way to compute the RMS with WOTAN, say for TESS data.
If yes, should it be computed before or after detrending?

Thanks!
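For reference, the RMS scatter of a light curve can be computed with plain NumPy, independent of wotan. The sketch below is not a wotan function; the name `rms_ppm` is hypothetical, and it assumes the flux is normalized to a level near 1 so that the result can be read in parts per million.

```python
import numpy as np

def rms_ppm(flux):
    """Root-mean-square scatter of flux around its mean, in parts per million."""
    flux = np.asarray(flux, dtype=float)
    flux = flux[np.isfinite(flux)]          # ignore NaN gaps
    return 1e6 * np.sqrt(np.mean((flux - np.mean(flux)) ** 2))

# Toy example: flux alternating 0.1% above and below unity
example_flux = np.array([1.001, 0.999] * 50)
example_rms = rms_ppm(example_flux)
```

Applied to the raw flux, this number includes the stellar and instrumental trends. Applied to the flattened flux, it measures the residual scatter after detrending.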

Interactive demo

Add Kalman filter

Robust? Two-sided?
https://github.com/milsto/robust-kalman

GP vs. biweight

Hi,
this is not really an issue, just a usage question.
I have been exploring the different algorithms in WOTAN, and I noticed that the biweight algorithm yields almost the same result as a GP (whatever the kernel) but is much faster. So my question is: why use a GP model? Is it somehow better?

thanks in advance!

More things in paper

Re-test

untrendy
periodic GP
supersmoother (new params)

Add

winsorize
huber slider
penalized spline

Robust Kalman

https://github.com/milsto/robust-kalman

new cosine filter and cofiam

Make submodules

not one long file

Add piecewise polynomials

https://jekel.me/piecewise_linear_fit_py/examples.html#use-custom-optimization-routine

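import pwlf

segments = 15
degree = 3
# degree must be passed by keyword; the third positional argument is disp_res
my_pwlf = pwlf.PiecewiseLinFit(time, flux, degree=degree)
res = my_pwlf.fitfast(segments)  # breakpoint locations
trend = my_pwlf.predict(time)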

adding model with interpolated data

Hi, I wonder if it would be a good idea to add an interpolated model for light curves with large gaps. For example, TESS data always has a large gap midway through each sector. As shown in the figure below, it would be great if the best model prediction during the data gap (plus uncertainty) could be superposed after flattening the data, e.g. using the gp method.

PyPI update

p-splines

Make rspline more reliable

Kernel size

Hi!
What is the physical meaning of the kernel size that has to be chosen when using a GP model?

from wotan import flatten
flatten_lc1, trend_lc1 = flatten(time, flux, kernel_size=5, return_trend=True, method='gp', kernel='matern')

I am not sure how to use this GP model correctly, or how to make it as efficient as possible.

Thanks!

Test faster GP

https://nbviewer.jupyter.org/github/fonnesbeck/Bios366/blob/master/notebooks/Section5_1-Gaussian-Processes.ipynb

Add uncertainties

biweight, other sliders: See literature
lowess: Bootstrapping? https://stackoverflow.com/questions/31104565/confidence-interval-for-lowess-in-python

Student's t mean estimator

https://users.ece.cmu.edu/~vsaragad/files/misc/vishwa_robust.pdf

Can ground-based observations benefit from WOTAN?

Hi,
In my group we normally apply polynomial detrending that takes into account different aspects such as airmass, x-y position, FWHM, etc. for our ground-based observations. Could I use the GP detrending algorithm available in WOTAN instead of our typical polynomials, or are GPs (and the other algorithms available in WOTAN) only for space missions such as Kepler, TESS, etc.?

Thanks a lot!

hippke / wotan
