Giter Club home page Giter Club logo

saxpy's Introduction

Time series symbolic discretization with SAX

Latest PyPI version Latest Travis CI build status image image

This code is released under GPL v.2.0 and implements in Python:

  • Symbolic Aggregate approXimation (SAX) (with z-normalization and PAA) [1]
  • EMMA -- an algorithm for time series motif discovery [2]
  • HOT-SAX - a time series anomaly (discord) discovery algorithm [3]
  • SAX-ZSCORE - an extension to SAX for multi-dimensional time-series SAX that modifies the z-normalization to multi-dimensional sequences and PAA aggregation with the average along data-dimensions. [4]
  • SAX-REPEAT - an extension to SAX for multi-dimensional time-series that performs standard SAX on individual dimensions, then clusters to map multi-dimensional words into strings from the required alphabet size. [4]

Note that all of the library's functionality is also available in R and Java.

References

[1] Lin, J., Keogh, E., Patel, P., and Lonardi, S.,
Finding Motifs in Time Series, The 2nd Workshop on Temporal Data Mining, the 8th ACM Int'l Conference on KDD (2002)

[2] Patel, P., Keogh, E., Lin, J., Lonardi, S.,
Mining Motifs in Massive Time Series Databases, In Proc. ICDM (2002)

[3] Keogh, E., Lin, J., Fu, A.,
HOT SAX: Efficiently finding the most unusual time series subsequence, In Proc. ICDM (2005)

[4] Mohammad, Y., Nishida T.,
Robust learning from demonstrations using multidimensional SAX, 2014 14th International Conference on Control, Automation and Systems (ICCAS 2014)

Citing this work

If you are using this implementation for you academic work, please cite our Grammarviz 2.0 paper:

[Citation] Senin, P., Lin, J., Wang, X., Oates, T., Gandhi, S., Boedihardjo, A.P., Chen, C., Frankenstein, S., Lerner, M., GrammarViz 2.0: a tool for grammar-based pattern discovery in time series, ECML/PKDD Conference, 2014.

SAX in a nutshell

SAX is used to transform a sequence of rational numbers (i.e., a time series) into a sequence of letters (i.e., a string). An illustration of a time series of 128 points converted into the word of 8 letters:

SAX in a nutshell

As discretization is probably the most used transformation in data mining, SAX has been widely used throughout the field. Find more information about SAX at its authors pages: SAX overview by Jessica Lin, Eamonn Keogh's SAX page, or at sax-vsm wiki page.

Building

The code is written in Python and hosted on PyPi, so use pip to install it. This is what happens in my clean test environment:

$ pip install saxpy
Collecting saxpy
Downloading saxpy-1.0.0.dev154.tar.gz (180kB)
	100% |████████████████████████████████| 184kB 778kB/s 
Requirement already satisfied: numpy in /home/psenin/anaconda3/lib/python3.6/site-packages (from saxpy)
Requirement already satisfied: pytest in /home/psenin/anaconda3/lib/python3.6/site-packages (from saxpy)
...
Installing collected packages: coverage, pytest-cov, codecov, saxpy
Successfully installed codecov-2.0.15 coverage-4.5.1 pytest-cov-2.5.1 saxpy-1.0.0.dev154

Simple time series to SAX conversion

To convert a time series of an arbitrary length to SAX we need to define the alphabet cuts. Saxpy retrieves cuts for a normal alphabet (we use size 3 here) via cuts_for_asize function:

from saxpy.alphabet import cuts_for_asize
cuts_for_asize(3)

which yields an array:

array([      -inf, -0.4307273,  0.4307273])

To convert a time series to letters with SAX we use ts_to_string function but not forgetting to z-normalize the input time series (we use Normal alphabet):

import numpy as np
from saxpy.znorm import znorm
from saxpy.sax import ts_to_string
ts_to_string(znorm(np.array([-2, 0, 2, 0, -1])), cuts_for_asize(3))

this produces a string:

'abcba'

Time series to SAX conversion with PAA aggregation (by "chunking")

In order to reduce dimensionality further, the PAA (Piecewise Aggregate Approximation) is usually applied prior to SAX:

import numpy as np
from saxpy.znorm import znorm
from saxpy.paa import paa
from saxpy.sax import ts_to_string

dat = np.array([-2, 0, 2, 0, -1])
dat_znorm = znorm(dat)
dat_paa_3 = paa(dat_znorm, 3)

ts_to_string(dat_paa_3, cuts_for_asize(3))

and a string with three letters is produced:

'acb'

Time series to SAX conversion via sliding window

Typically, in order to investigate the input time series structure in order to discover anomalous (i.e., discords) and recurrent (i.e., motifs) patterns we employ time series to SAX conversion via sliding window. Saxpy implements this workflow:

import numpy as np
from saxpy.sax import sax_via_window

dat = np.array([0., 0., 0., 0., 0., -0.270340178359072, -0.367828308500142,
            0.666980581124872, 1.87088147328446, 2.14548907684624,
            -0.480859313143032, -0.72911654245842, -0.490308602315934,
            -0.66152028906509, -0.221049033806403, 0.367003418871239,
            0.631073992586373, 0.0487728723414486, 0.762655178750436,
            0.78574757843331, 0.338239686422963, 0.784206454089066,
            -2.14265084073625, 2.11325193044223, 0.186018356196443,
            0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.519132472499234,
            -2.604783141655, -0.244519550114012, -1.6570790528784,
            3.34184602886343, 2.10361226260999, 1.9796808733979,
            -0.822247322003058, 1.06850578033292, -0.678811824405992,
            0.804225748913681, 0.57363964388698, 0.437113583759113,
            0.437208643628268, 0.989892093383503, 1.76545983424176,
            0.119483882364649, -0.222311941138971, -0.74669456611669,
            -0.0663660879732063, 0., 0., 0., 0., 0.,])

sax_none = sax_via_window(dat, win_size=6, paa_size=3, alphabet_size=3, nr_strategy=None, znorm_threshold=0.01)

sax1

the result is represented as a data structure of resulting words and their respective positions on time series:

defaultdict(list,
        {'aac': [4, 10, 11, 30, 35],
         'abc': [12, 14, 36, 44],
         'acb': [5, 16, 21, 37, 43],
         'acc': [13, 52, 53],
         'bac': [3, 19, 34, 45, 51],
         'bba': [31],
         'bbb': [15, 18, 20, 22, 25, 26, 27, 28, 29, 41, 42, 46],
         'bbc': [2],
         'bca': [6, 17, 32, 38, 47, 48],
         'caa': [8, 23, 24, 40],
         'cab': [9, 50],
         'cba': [7, 39, 49],
         'cbb': [33],
         'cca': [0, 1]})

sax_via_window is parameterised with a sliding window size, desired PAA aggregation, alphabet size, a numerosity reduction strategy, z-normalization threshold, and a SAX type ('unidim' for unidimensional SAX (default), 'zscore' for SAX-ZSCORE, 'repeat' for SAX-REPEAT):

def sax_via_window(series, win_size, paa_size, alphabet_size=3,
               nr_strategy='exact', z_threshold=0.01, sax_type='unidim')

Time series discord discovery with HOT-SAX

Saxpy implements HOT-SAX discord discovery algorithm in find_discords_hotsax function which can be used as follows:

import numpy as np
from saxpy.hotsax import find_discords_hotsax
from numpy import genfromtxt
dd = genfromtxt("data/ecg0606_1.csv", delimiter=',')  
discords = find_discords_hotsax(dd)
discords

and discovers anomalies easily:

[(430, 5.2790800061718386), (318, 4.1757563573086953)]

The function has a similar parameterization: sliding window size, PAA and alphabet sizes, z-normalization threshold, and a parameter specifying how many discords are desired to be found:

def find_discords_hotsax(series, win_size=100, num_discords=2, a_size=3,
                     paa_size=3, z_threshold=0.01)

Saxpy also provides a brute-force implementation of the discord search if you'd like to verify discords or evaluate the speed-up:

find_discords_brute_force(series, win_size, num_discords=2,
                          z_threshold=0.01)

which can be called as follows:

discords = find_discords_brute_force(dd[100:500], 100, 4)
discords

[(73, 6.198555329625453), (219, 5.5636923991016136)]

Time series motif discovery with EMMA

ToDo...

Made with Aloha!

Made with Aloha!

saxpy's People

Contributors

ameya98 avatar psenin-sanofi avatar seninp avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

saxpy's Issues

Sliding window - adjust sliding size

I am trying to use a sliding window method and if I understand correctly I can only adjust window size, but the sliding size is always 1, am I correct?

Is there a way to change this?
In my example, I have hourly values, and I want to have daily windows ( win_size=24 and also sliding size 24).

Thanks

install problem due to pbr weirdness + possible solution

When I tried to install saxpy by cloning the repo and running python setup.py install with Python 3.6.3 and all of the requirements already installed, I observed the following exception:

ERROR:root:Error parsing                                                                                                                                                                           
Traceback (most recent call last):                                                                                                                                                                 
  File "/home/lebedov/saxpy/.eggs/pbr-3.1.1-py3.6.egg/pbr/core.py", line 111, in pbr                                                                                                     
    attrs = util.cfg_to_args(path, dist.script_args)                                                                                                                                               
  File "/home/lebedov/saxpy/.eggs/pbr-3.1.1-py3.6.egg/pbr/util.py", line 267, in cfg_to_args                                                                                             
    wrap_commands(kwargs)                                                                                                                                                                          
  File "/home/lebedov/saxpy/.eggs/pbr-3.1.1-py3.6.egg/pbr/util.py", line 569, in wrap_commands                                                                                           
    cmdclass = ep.resolve()                                                                                                                                                                        
  File "/home/lebedov/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2437, in resolve                                                                                    
    module = __import__(self.module_name, fromlist=['__name__'], level=0)                                                                                                                          
ModuleNotFoundError: No module named 'tasks'                                                                                                                                                       
error in setup command: Error parsing /home/lev/Work/projects/saxpy/setup.cfg: ModuleNotFoundError: No module named 'tasks'  

This seems to be due to some weirdness in pbr that appears to have affected other Python packages. Installing the cli_helpers package resolved the problem; perhaps it should be added to setup_requires in setup.py?

A stable version tag?

Hi, I'm trying to add this package to Conda Forge. But the only version on PyPI is a dev build and there isn't a tag here on GH that's a stable version. Can a stable tag be created?

Docs

I'm pretty sure that in the README that

sax_none = sax_via_window(dat, win_size=6, paa_size=3, alphabet_size=3, nr_strategy=None, znorm_threshold=0.01)

sax1

should be

sax_none = sax_via_window(dat, win_size=6, paa_size=3, alphabet_size=3, nr_strategy=None, z_threshold=0.01)

sax_none

Please optimize ts_to_string

I'd like to suggest an improvement once function ts_to_string takes a while to process.

def ts_to_string(series, cuts):

import pandas as pd
import string
from saxpy.alphabet import cuts_for_asize

import 
alphabet = 
cuts = cuts_for_asize(alphabet) # cuts_for_asize must include upper interval inf
lables = list(string.ascii_lowercase)[:alphabet]

ts_to_string = pd.cut(dat['znorm'], bins=cuts, labels=labels)


Missing indices in output using sliding window

When running the following code:

sax_via_window([0, 0, 1, 0, 0, 0, 0, 1, 0], 3, 3, 3)

I receive the following output:

defaultdict(list, {'aac': [0, 5], 'aca': [1], 'caa': [2], 'bbb': [3]})

It is clear to me why each index corresponds to a given discretisation, but I do not understand why index 4 does not have a value. Is this because it is the same as the previous value, namely bbb? Can I always assume that a missing value means that it is the same as the previous value?

Anomaly Scores

Hi,
I am wondering how I can get an anomaly score for each point of a time series using the HOT SAX algorithm. Somehow, I do not know how to use the num_discords parameter in the find_discords_brute_force function. It cannot be known in advance, how many anomalies I have in a certain time series. In such cases it would be more helpful to have a anomaly score for each point (or for segments) of a time series...

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.