
wotan's Introduction


Wōtan...

...offers free and open source algorithms to automagically remove trends from time-series data.

In Germanic mythology, Odin (/ˈoːðinː/ Old High German: Wōtan) is a widely revered god. He gave one of his eyes to Mimir in return for wisdom. Thus, in order to achieve a goal, one sometimes has to turn a blind eye. In Richard Wagner's "Der Ring des Nibelungen", Wotan is the King of the Gods (god of light, air, and wind) and a bass-baritone. According to Wagner, he is the "pinnacle of intelligence".

Example usage

from wotan import flatten
flatten_lc, trend_lc = flatten(time, flux, window_length=0.5, method='biweight', return_trend=True)

For more details, have a look at the interactive playground and the documentation. We also have examples and tutorials available, such as the 📑 Example: Basic wotan functionality.

Available detrending algorithms

Available features

  • window_length The length of the filter window in units of time (usually days).
  • break_tolerance If there are large gaps in time, especially with corresponding flux level offsets, the detrending is much improved when splitting the data into several sub-lightcurves and applying the filter to each individually. Comes with an empirical default and is fully adjustable.
  • edge_cutoff Trends near edges are less robust. Depending on the data, it may be beneficial to remove edges.
  • cval Tuning parameter for the robust estimators (see documentation)
  • return_trend If True, the method will return a tuple of two elements (flattened_flux, trend_flux) where trend_flux is the removed trend. Otherwise, it will only return flattened_flux.
  • transit_mask Mask known transits during detrending (📑Example)
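The break_tolerance and edge_cutoff options above can be pictured with a minimal sketch in plain Python. This is an illustration of the idea only, not wotan's internal implementation; the helper names split_at_gaps and trim_edges and the sample time stamps are hypothetical.

```python
def split_at_gaps(time, break_tolerance):
    """Split sorted time stamps into segments wherever the gap
    between neighbours exceeds break_tolerance (same units as time).
    Returns a list of (start, stop) index pairs, stop exclusive."""
    if not time:
        return []
    segments = []
    start = 0
    for i in range(1, len(time)):
        if time[i] - time[i - 1] > break_tolerance:
            segments.append((start, i))
            start = i
    segments.append((start, len(time)))
    return segments


def trim_edges(time, start, stop, edge_cutoff):
    """Return indices in [start, stop) that lie at least edge_cutoff
    away from both ends of the segment."""
    t_first, t_last = time[start], time[stop - 1]
    return [i for i in range(start, stop)
            if time[i] - t_first >= edge_cutoff and t_last - time[i] >= edge_cutoff]


# Hypothetical time stamps (days) with one large gap between 1.5 and 5.0
time = [0.0, 0.5, 1.0, 1.5, 5.0, 5.5, 6.0, 6.5]
segments = split_at_gaps(time, break_tolerance=1.0)
kept = [trim_edges(time, a, b, edge_cutoff=0.5) for a, b in segments]
```

Each segment would then be detrended on its own, so a flux offset across the gap cannot leak into the trend of the neighbouring segment.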

What method to choose?

It depends on your data and what you would like to achieve (relevant xkcd). If possible, try it out! Use wotan with a selection of methods, iterate over their parameter space, and choose what gives the best results for your research.

If that is too much effort, you should first examine your data.

  • Is it mostly white (Gaussian) noise? Use a time-windowed sliding mean. This is the most efficient method for white noise.
  • With prominent outliers (such as transits or flares), use a robust time-windowed method such as the biweight. This is usually superior to the median or trimmed methods.
  • Are there (semi-) periodic trends? In addition to a time-windowed biweight, try a spline-based method. Experimenting with periodic GPs is worthwhile.
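To make the difference between a sliding mean and a robust estimator concrete, here is a minimal sketch of a time-windowed slider in plain Python. It is not wotan's code: the function sliding_filter and the toy light curve are hypothetical, and the median stands in for a robust estimator purely for brevity (the text above recommends the biweight).

```python
from statistics import mean, median


def sliding_filter(time, flux, window_length, estimator):
    """Time-windowed slider: for each point, apply `estimator` to all
    flux values within +/- window_length / 2 of its time stamp."""
    half = window_length / 2
    trend = []
    for t in time:
        in_window = [f for ti, f in zip(time, flux) if abs(ti - t) <= half]
        trend.append(estimator(in_window))
    return trend


# Hypothetical flat light curve with a single transit-like dip
time = [0.1 * i for i in range(11)]
flux = [1.0] * 11
flux[5] = 0.9

trend_mean = sliding_filter(time, flux, window_length=0.5, estimator=mean)
trend_median = sliding_filter(time, flux, window_length=0.5, estimator=median)
```

In this toy case the mean trend is pulled down around the dip, while the median trend stays flat. The loop is quadratic in the number of points and is meant for reading, not for speed.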

Installation

To install the released version, type

$ pip install wotan

which automatically installs numpy, numba and scipy if not present. Depending on the algorithm, additional dependencies exist:

  • huber, ramsay, and hampel depend on statsmodels
  • hspline and gp depend on sklearn
  • pspline depends on pygam
  • supersmoother depends on supersmoother

To install all additional dependencies, type $ pip install statsmodels scikit-learn supersmoother pygam (the sklearn module is distributed on PyPI as scikit-learn).
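Before running a batch of methods, it can help to check which optional modules are present. The following is a minimal sketch using only the standard library. The mapping is copied from the list above, while the helper name missing_dependencies is hypothetical and not part of wotan.

```python
from importlib.util import find_spec

# Optional dependencies per method, as listed above (import names)
OPTIONAL_DEPENDENCIES = {
    "huber": "statsmodels",
    "ramsay": "statsmodels",
    "hampel": "statsmodels",
    "hspline": "sklearn",
    "gp": "sklearn",
    "pspline": "pygam",
    "supersmoother": "supersmoother",
}


def missing_dependencies(methods):
    """Return {method: module} for requested methods whose optional
    module cannot be found in the current environment."""
    missing = {}
    for method in methods:
        module = OPTIONAL_DEPENDENCIES.get(method)
        if module is not None and find_spec(module) is None:
            missing[method] = module
    return missing


# Methods without an entry (here: biweight) need no extra package
report = missing_dependencies(["biweight", "gp", "pspline"])
```

An empty report means every requested method can be used with the packages already installed.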

Originality

Like all scientific work, wōtan stands on the shoulders of giants. In particular, many detrending methods are wrapped from existing packages. Original contributions include:

  • A time-windowed detrending master module with edge treatments and segmentation options
  • Robust location estimates using Newton-Raphson iteration to a precision threshold for Tukey's biweight, Andrews' sine wave, and the Welsch-Leclerc. This is probably a "first", and it reduces jitter in the location estimate by ~10 ppm
  • Robustified (iterative sigma-clipping) penalized splines for automatic knot distance determination and outlier resistance
  • Robustified (iterative sigma-clipping) Gaussian processes
  • GP with a periodic kernel informed by a Lomb-Scargle periodogram pre-search
  • Bringing together many methods in one place in a common interface, with sensible defaults
  • Providing documentation, tutorials, and a paper which compares and benchmarks the methods
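The idea of iterating a robust location estimate to a precision threshold can be sketched as follows. Note the swap: this sketch uses a simple fixed-point (reweighting) iteration of the standard Tukey biweight location formula rather than the Newton-Raphson scheme named above, and it is not wotan's implementation. The values of cval, tol and max_iter and the sample data are illustrative assumptions.

```python
from statistics import median


def biweight_location(data, cval=5.0, tol=1e-6, max_iter=50):
    """Iterated Tukey biweight location estimate.

    Starts from the median and repeats the weighted-mean update until
    the estimate changes by less than tol. cval scales the rejection
    radius in units of the median absolute deviation (MAD)."""
    center = median(data)
    mad = median(abs(x - center) for x in data)
    if mad == 0:
        return center  # at least half of the points equal the median
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for x in data:
            u = (x - center) / (cval * mad)
            if abs(u) < 1:  # points beyond cval * MAD get zero weight
                weight = (1 - u * u) ** 2
                num += (x - center) * weight
                den += weight
        if den == 0:
            break
        step = num / den
        center += step
        if abs(step) < tol:
            break
    return center


# Hypothetical window of flux values with one in-transit outlier
window = [1.001, 0.999, 1.002, 0.998, 1.000, 0.990]
location = biweight_location(window)
```

The outlier at 0.990 falls outside the rejection radius and receives zero weight, so the estimate stays close to the out-of-transit level instead of being dragged toward the plain mean.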

Attribution

Please cite Hippke et al. (2019, AJ, 158, 143) if you find this code useful in your research. The BibTeX entry for the paper is:

@ARTICLE{2019AJ....158..143H,
       author = {{Hippke}, Michael and {David}, Trevor J. and {Mulders}, Gijs D. and
         {Heller}, Ren{\'e}},
        title = "{W{\={o}}tan: Comprehensive Time-series Detrending in Python}",
      journal = {\aj},
     keywords = {eclipses, methods: data analysis, methods: statistical, planetary systems, planets and satellites: detection, Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics},
         year = "2019",
        month = "Oct",
       volume = {158},
       number = {4},
          eid = {143},
        pages = {143},
          doi = {10.3847/1538-3881/ab3984},
archivePrefix = {arXiv},
       eprint = {1906.00966},
 primaryClass = {astro-ph.EP},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2019AJ....158..143H},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}


wotan's People

Contributors

hippke

wotan's Issues

Add L1 (Koh, Kim, Boyd)

Recommended by: https://arxiv.org/pdf/1908.07151.pdf
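Example using cvxpy

import numpy as np
import scipy.sparse
import cvxpy as cp

# y: 1-D flux array; degree: regularization strength (larger gives a smoother trend)
n = y.size
e = np.ones((1, n))
D = scipy.sparse.spdiags(np.vstack((e, -2*e, e)), range(3), n-2, n)  # second-difference operator
x = cp.Variable(shape=n)
obj = cp.Minimize(0.5 * cp.sum_squares(y - x) + degree * cp.norm(D @ x, 1))
cp.Problem(obj).solve()
trend = x.value  # piecewise-linear L1 trend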

Large memory use

Great software! I am running into a memory issue, though, that I was hoping you could help with. I have a simple for loop with on the order of 50-500 iterations, and inside each one I run wotan.flatten on a Kepler light curve (long 30-minute cadence, so arrays of ~60,000 elements). The aim is to assess how good different detrending methods are.

According to the Activity Monitor, this takes up a very large amount of memory, on the order of tens of GB. It also makes it impossible to cancel the Python loop with CTRL-C.

Even if I just do single instances of wotan.flatten it takes about 100 MB of memory per run and that does not seem to get released until I restart python.

Have you encountered this before? Is there an easy way of clearing up the memory? I have tried Garbage Collection (gc.collect) without luck.

Thanks!

Expand ``transit_mask`` to more methods

Easily possible:

  • Huber spline
  • GP
  • iterspline with second mask
  • pspline, ridge, lasso, elasticnet using new axis?

???

  • Sliders? If window becomes empty, use different interpolation?

Perhaps not useful

  • SuperSmoother easy (using weights), but no users?
  • Same for cofiam, savgol?

Supersmoother window

The window if condition in the code is commented out. Check and test whether it works correctly.

'MaskedArray' object not callable

I'm trying to perform a process as follows:

Detrend (WOTAN)--> search and find a signal (TLS) --> set a mask corresponding to the planetary signal in the original data and detrend again (WOTAN) --> search for a second signal (TLS) --> etc.

The point is that my pipeline explores several detrending options, searching for the best one (maximizing the SNR, SDE, etc.). Once a planet has been detected, I would like to remove it from the raw data and detrend again (exploring several options) before the next search. I would like to use the raw data in this second step instead of the already detrended data, just to be sure that my next choice of best detrending model is not affected by the presence of the first planet. Unfortunately, I get this error:

flatten_lc, trend_lc = flatten(time_second_run, flatten_second_run, method='gp', kernel='matern', kernel_size=wl[i], return_trend=True, break_tolerance=0.5)
TypeError: 'MaskedArray' object is not callable

Does this mean I cannot set a mask on the data that is passed to the flatten command? If so, is there any way to solve this?

Thanks a lot!
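One way to exclude a detected signal from the raw data before detrending again is a boolean in-transit mask. Below is a minimal sketch in plain Python, assuming the period, mid-transit time and duration come from the search step. The helper name in_transit_mask and the example ephemeris are hypothetical; this is neither wotan's nor TLS's own implementation, and it is not a diagnosis of the error above. wotan's transit_mask option, listed among the features, is the built-in route for the methods that support it.

```python
def in_transit_mask(time, period, t0, duration):
    """Return a list of booleans, True where a time stamp falls within
    duration / 2 of any predicted mid-transit time t0 + n * period."""
    mask = []
    for t in time:
        # Offset from the nearest mid-transit, in [-period/2, period/2)
        offset = (t - t0 + 0.5 * period) % period - 0.5 * period
        mask.append(abs(offset) <= 0.5 * duration)
    return mask


# Hypothetical ephemeris: period 2.0 d, first mid-transit at 1.0 d, duration 0.5 d
time = [0.25 * i for i in range(17)]  # 0.0 ... 4.0 days
mask = in_transit_mask(time, period=2.0, t0=1.0, duration=0.5)

# Keep only out-of-transit points for the next detrending pass
time_out = [t for t, m in zip(time, mask) if not m]
```

The same mask would be applied to the flux array, so that the time and flux passed to the next detrending run are ordinary arrays of equal length.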

Add Bayesian Adaptive Regression Splines

BARS (DiMatteo, Genovese, and Kass 2001) uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors), as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise) while adapting to sudden changes (retaining high-frequency signal). However, BARS is computationally intensive.

https://gist.github.com/hippke/817fa533e21f0899026452d722aac44a

LS-GP

Produces warnings. Test more cases.

Add Hodrick–Prescott filter

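import scipy.sparse
import scipy.sparse.linalg

def hp_filter(x, lamb=1600):
    """Return the Hodrick-Prescott trend of x; lamb is the smoothing parameter."""
    w = len(x)
    b = [[1]*w, [-2]*w, [1]*w]
    D = scipy.sparse.spdiags(b, [0, 1, 2], w-2, w)  # second-difference operator
    I = scipy.sparse.eye(w)
    B = (I + lamb * (D.transpose() @ D)).tocsc()  # CSC format suits spsolve
    return scipy.sparse.linalg.spsolve(B, x)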

GP vs. biweight

Hi,
this is not really an issue, just a usage question.
I have been exploring the different algorithms in WOTAN, and I notice that the biweight algorithm yields almost the same result as a GP (whatever the kernel) but is much faster. So my question is: why use a GP model? Is it somehow better?

thanks in advance!

More things in paper

Re-test

  • untrendy
  • periodic GP
  • supersmoother (new params)

Add

  • winsorize
  • huber slider
  • penalized spline

new cosine filter and cofiam

  • Change max order to 100 (?)
  • Add to paper
  • Add to doc, tutorial, examples
  • Add tests for travis
  • Add GitHub feature list
  • Release on github and pypi

adding model with interpolated data

Hi, I wonder if it would be a good idea to add an interpolated model for data with large gaps. For example, TESS data always has a large gap midway through each sector. As shown in the figure below, it would be great if the best model prediction during the data gap (plus uncertainty) could be overplotted after flattening the data, e.g. using the gp method.

[Figure: gp]

p-splines

  • robustify
  • test
  • integrate from separate file into wotan
  • document / tutorial
  • Determine sensible PSPLINES_MAX_SPLINES

Kernel size

Hi!
What is the physical meaning of the kernel size that has to be chosen when using a GP model?

from wotan import flatten
flatten_lc1, trend_lc1 = flatten(time, flux, kernel_size=5, return_trend=True, method='gp', kernel='matern')

I am not sure how to use this GP model correctly, or how to make it as efficient as possible.

Thanks!

may ground based observations benefit from WOTAN?

Hi,
in my group we normally apply polynomial detrending to our ground-based observations, taking into account different aspects such as airmass, x-y position, FWHM, etc. May I use the GP detrending algorithm available in WOTAN instead of our typical polynomials, or are the GP and the other algorithms available in WOTAN only for space missions such as Kepler, TESS, etc.?

Thanks a lot!
