
pyculiarity's People

Contributors

michael-erasmus, msouder, nicolasmiller, vmarkovtsev, zrnsm


pyculiarity's Issues

rpy2 doesn't support Python 2 any more

Collecting rpy2 (from pyculiarity)
  Downloading rpy2-2.9.1.tar.gz (192kB)
    100% |████████████████████████████████| 194kB 6.6MB/s 
    Complete output from command python setup.py egg_info:
    rpy2 is no longer supporting Python < 3. Consider using an older rpy2 release when using an older Python release.

This, combined with the pandas deprecation warnings (mentioned in another issue and pull request), means that unless pyculiarity can be bumped to python 3 pretty soon, it will, sadly, become unusable. It was a handy little library and we'll miss it.

Are there any plans to support Python 3, or should we be looking for a new tool to do this sort of thing?

Why does expected_value contain negative numbers?

I use my own data, and the code is as follows:

results = detect_ts(c,
                    max_anoms=0.2, longterm=False,
                    direction='neg', only_last='day', e_value=True)

My original data did not contain negative numbers, but some expected_value entries are negative. Why? What is the formula for calculating expected_value?

10 1980-09-27 18:00:00 0  0.128816
11 1980-09-27 19:00:00 0 -2.732376
12 1980-09-27 20:00:00 0 -3.102446
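The short answer, hedged: the expected value is reconstructed from the decomposition (trend plus seasonal components), not read off the raw observations, so nothing clamps it to the data's range. A toy illustration of that effect, using a simple rolling-mean decomposition as a stand-in for pyculiarity's actual STL fit (all names here are illustrative, not pyculiarity's code):

```python
import numpy as np
import pandas as pd

# Non-negative series: a growing seasonal cycle that bottoms out at zero.
n, period = 96, 24
t = np.arange(n)
y = (1 + t / n) * (1 + np.sin(2 * np.pi * t / period))  # every point >= 0

s = pd.Series(y)
# Crude decomposition: centred rolling mean as trend, per-phase mean as seasonal.
trend = s.rolling(period, center=True, min_periods=1).mean()
seasonal = (s - trend).groupby(t % period).transform("mean")
expected = trend + seasonal  # analogous to an STL fit's trend + seasonal

print(float(y.min()) >= 0, float(expected.min()) < 0)
```

Even though every observation is non-negative, the reconstructed trend-plus-seasonal series dips below zero at the troughs, because the global seasonal component overshoots where the local amplitude is small.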

ValueError: Anom detection needs at least 2 periods worth of data

I was trying detect_ts with the following parameters:

max_anoms=0.1, direction='both', alpha=0.02, longterm=True

It seems to call detect_anoms with less data than required. Also, my data has 15-minute resolution with a daily seasonality (found via Fourier transform), which might interact with the way the period is assigned in your program.


ValueError Traceback (most recent call last)
in ()
6
7 results = detect_ts(pd.DataFrame({'time':full_data.index.values, 'values':full_data.metric_value.values}),
----> 8 max_anoms=0.1, direction='both', alpha=0.02, longterm=True)

c:\users\inder\desktop\sonalake\pyculiarity\pyculiarity\detect_ts.py in detect_ts(df, max_anoms, direction, alpha, only_last, threshold, e_value, longterm, piecewise_median_period_weeks, plot, y_log, xlabel, ylabel, title, verbose)
223 one_tail=anomaly_direction.one_tail,
224 upper_tail=anomaly_direction.upper_tail,
--> 225 verbose=verbose)
226
227 # store decomposed components in local variable and overwrite

c:\users\inder\desktop\sonalake\pyculiarity\pyculiarity\detect_anoms.py in detect_anoms(data, k, alpha, num_obs_per_period, use_decomp, one_tail, upper_tail, verbose)
39 # Check to make sure we have at least two periods worth of data for anomaly context
40 if num_obs < num_obs_per_period * 2:
---> 41 raise ValueError("Anom detection needs at least 2 periods worth of data")
42
43 # Check if our timestamps are posix

ValueError: Anom detection needs at least 2 periods worth of data
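For reference, a quick back-of-the-envelope check of the guard that raises here, assuming (as the report suggests) that a daily period is assigned to 15-minute data:

```python
# The guard is: num_obs < num_obs_per_period * 2.
# At 15-minute resolution with a daily period, that demands two full days.
obs_per_period = 24 * 4          # 96 fifteen-minute samples per day
min_required = obs_per_period * 2
print(min_required)  # 192
```

So any slice passed to detect_anoms with fewer than 192 points trips the ValueError; with longterm=True the data is split into piecewise windows, which is how a long series can still produce an undersized slice.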

Error while setting e_value argument to true

When I set the arguments as follows:

results = detect_ts(data,
                    max_anoms=0.1,
                    direction='both', only_last=None, longterm=True, e_value=True)

I get the following error. It occurs only when I use e_value; if I remove that argument, the function works fine.


ValueError Traceback (most recent call last)
in
12 results = detect_ts(data,
13 max_anoms=0.1,
---> 14 direction='both', only_last=None, longterm=True, e_value=True)
15

C:\Program Files\Anaconda\lib\site-packages\pyculiarity\detect_ts.py in detect_ts(df, max_anoms, direction, alpha, only_last, threshold, e_value, longterm, piecewise_median_period_weeks, plot, y_log, xlabel, ylabel, title, verbose)
325 'anoms': all_anoms.value
326 }
--> 327 anoms = DataFrame(d, index=d['timestamp'].index)
328
329 return {

C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
390 dtype=dtype, copy=copy)
391 elif isinstance(data, dict):
--> 392 mgr = init_dict(data, index, columns, dtype=dtype)
393 elif isinstance(data, ma.MaskedArray):
394 import numpy.ma.mrecords as mrecords

C:\Program Files\Anaconda\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
210 arrays = [data[k] for k in keys]
211
--> 212 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
213
214

C:\Program Files\Anaconda\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
54
55 # don't force copy because getting jammed in an ndarray anyway
---> 56 arrays = _homogenize(arrays, index, dtype)
57
58 # from BlockManager perspective

C:\Program Files\Anaconda\lib\site-packages\pandas\core\internals\construction.py in _homogenize(data, index, dtype)
263 # Forces alignment. No need to copy data since we
264 # are putting it into an ndarray later
--> 265 val = val.reindex(index, copy=False)
266 else:
267 if isinstance(val, dict):

C:\Program Files\Anaconda\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
3736 @Appender(generic.NDFrame.reindex.__doc__)
3737 def reindex(self, index=None, **kwargs):
-> 3738 return super(Series, self).reindex(index=index, **kwargs)
3739
3740 def drop(self, labels=None, axis=0, index=None, columns=None,

C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
4354 # perform the reindex on the axes
4355 return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4356 fill_value, copy).__finalize__(self)
4357
4358 def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
4372 obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
4373 fill_value=fill_value,
-> 4374 copy=copy, allow_dups=False)
4375
4376 return obj

C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
4488 fill_value=fill_value,
4489 allow_dups=allow_dups,
-> 4490 copy=copy)
4491
4492 if copy and new_data is self._data:

C:\Program Files\Anaconda\lib\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
1222 # some axes don't allow reindexing with dups
1223 if not allow_dups:
-> 1224 self.axes[axis]._can_reindex(indexer)
1225
1226 if axis >= self.ndim:

C:\Program Files\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
3085 # trying to reindex on an axis with duplicates
3086 if not self.is_unique and len(indexer):
-> 3087 raise ValueError("cannot reindex from a duplicate axis")
3088
3089 def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
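A hedged workaround rather than an official fix: "cannot reindex from a duplicate axis" usually means the input carries repeated timestamps, so de-duplicating before calling detect_ts sidesteps the reindex clash (column names assumed):

```python
import pandas as pd

# Input with a repeated timestamp, the usual trigger for the duplicate-axis error.
df = pd.DataFrame({
    "timestamp": ["2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 01:00"],
    "value": [1.0, 2.0, 3.0],
})
# Keep the first observation per timestamp and rebuild a clean index.
deduped = df.drop_duplicates(subset="timestamp", keep="first").reset_index(drop=True)
print(len(deduped))  # 2
```

If duplicates are meaningful in your data, aggregating instead (e.g. `df.groupby("timestamp", as_index=False).mean()`) preserves more information than dropping rows.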

Are there any ideas for improving performance?

On my machine, a run over about one week of data takes roughly 10 seconds, and I found the STL function takes most of the time. Is there any way to improve the performance, such as analyzing the data incrementally?

date conversion issue on Mac

Hi, I'm using your pyculiarity anomaly detection code on one of my datasets and got this error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

I used this code to fix some time format issue for 'timestamp' column in my dataset:
datetime.datetime.strptime(time,'%m/%d/%y %H:%M').strftime('%y-%m-%d %H:%M')

The error trace shows:
Traceback (most recent call last):
File "", line 1, in
File "PAT.py", line 54, in PAT
results = detect_ts(df, alpha=0.001, max_anoms=0.02, direction='both')
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyculiarity/detect_ts.py", line 142, in detect_ts
gran = get_gran(df)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyculiarity/date_utils.py", line 46, in get_gran
gran = int(round(np.timedelta64(largest - second_largest) / np.timedelta64(1, 's')))
TypeError: unsupported operand type(s) for -: 'str' and 'str'

I've tried to modify my time-format conversion line to:

return np.datetime64(str(datetime.datetime.strptime(time, '%m/%d/%y %H:%M').strftime('%y-%m-%d %H:%M')))

so that the timestamp is always a datetime64 type, which supports subtraction (-), but then the error says:
Traceback (most recent call last):
File "", line 1, in
File "PAT.py", line 54, in PAT
results = detect_ts(df, alpha=0.001, max_anoms=0.02, direction='both')
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyculiarity/detect_ts.py", line 84, in detect_ts
df = format_timestamp(df)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyculiarity/date_utils.py", line 22, in format_timestamp
column[0]):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 137, in match
return _compile(pattern, flags).match(string)
TypeError: buffer size mismatch

It seems to say that column[0] is somehow wrong here (correct me if I'm mistaken). Any suggestions for fixing this?

Thanks so much!
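A hedged sketch of a simpler route (column names assumed): let pandas.to_datetime parse the column directly, which yields datetime64[ns] values that subtract cleanly, instead of reformatting the strings by hand:

```python
import pandas as pd

# Parse the raw strings straight into datetime64[ns]; %y maps '80' to 1980.
df = pd.DataFrame({"timestamp": ["09/27/80 18:00", "09/27/80 19:00"],
                   "value": [1.0, 2.0]})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m/%d/%y %H:%M")

# Subtraction now works, so granularity detection has something to divide.
delta = df["timestamp"].iloc[1] - df["timestamp"].iloc[0]
print(delta / pd.Timedelta(seconds=1))  # 3600.0
```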

TypeError: 'numpy.float64' object cannot be interpreted as an integer

I am trying to run the detect_ts function from the pyculiarity package but get this error when passing a two-dimensional dataframe.

data=pd.read_csv('C:\Users\nikhil.chauhan\Desktop\Bosch_Frame\dataset1.csv',usecols=['A','B'])
from pyculiarity import detect_ts
results = detect_ts(data,max_anoms=0.02,alpha=0.001,direction = 'both',only_last=None)
Traceback (most recent call last):
File "", line 1, in
TypeError: detect_ts() got an unexpected keyword argument 'only_last'
results = detect_ts(data,max_anoms=0.02,alpha=0.001,direction = 'both')
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\nikhil.chauhan\Downloads\Compressed\pyculiar-0.0.5\pyculiarity\detect_ts.py", line 177, in detect_ts
verbose=verbose)
File "C:\Users\nikhil.chauhan\Downloads\Compressed\pyculiar-0.0.5\pyculiarity\detect_anoms.py", line 69, in detect_anoms
decomp = stl(data.value, np=num_obs_per_period)
File "C:\Users\nikhil.chauhan\Downloads\Compressed\pyculiar-0.0.5\pyculiarity\stl.py", line 35, in stl
res = sm.tsa.seasonal_decompose(data.values, model='additive', freq=np)
File "C:\Anaconda3\lib\site-packages\statsmodels\tsa\seasonal.py", line 88, in seasonal_decompose
trend = convolution_filter(x, filt)
File "C:\Anaconda3\lib\site-packages\statsmodels\tsa\filters\filtertools.py", line 303, in convolution_filter
result = _pad_nans(result, trim_head, trim_tail)
File "C:\Anaconda3\lib\site-packages\statsmodels\tsa\filters\filtertools.py", line 28, in _pad_nans
return np.r_[[np.nan] * head, x, [np.nan] * tail]
TypeError: 'numpy.float64' object cannot be interpreted as an integer
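For context, a minimal sketch of the likely cause, on the assumption that the period value reaches statsmodels as a NumPy float somewhere upstream: the convolution filter needs an integer period, so casting before handing it to seasonal_decompose avoids this TypeError:

```python
import numpy as np

# Hypothetical repro of the failure mode: a period that arrives as
# numpy.float64 (e.g. the result of a division) cannot be used where
# statsmodels expects a plain int, so cast it explicitly.
num_obs_per_period = np.float64(1440) / np.float64(60)  # float64, value 24.0
period = int(num_obs_per_period)                        # plain Python int
print(type(period).__name__, period)  # int 24
```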

Updating the PyPI package?

Hi @nicolasmiller!

I was curious whether the PyPI package could be updated to the latest version of the project. At the moment it seems the package hasn't been updated since July 23, 2015.

This would make it easier to just do pip install pyculiarity instead of having to install from GitHub.

FutureWarning for pandas.lib deprecation

On pyculiarity import, pandas displays the following warning:

detect_vec.py:6: FutureWarning: The pandas.lib module is deprecated and will be removed in a future version. These are private functions and can be accessed from pandas._libs.lib instead
  from pandas.lib import Timestamp

iget fails when longterm is set to true and granularity is in hr or day

File "example.py", line 18, in
direction='both', verbose=True, granularity='hr')
File "C:\ProgramData\Anaconda3\lib\site-packages\pyculiarity\detect_ts.py", line 138, in detect_ts
last_date = df.timestamp.iget(-1)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2970, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'iget'
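A hedged sketch of the fix (not an official patch): Series.iget was removed from modern pandas, and positional access is now spelled .iloc, so the call in detect_ts.py can be rewritten accordingly:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
last = s.iloc[-1]  # replaces the removed s.iget(-1)
print(last)  # 30
```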

Library is not working as expected

Hello, I'm trying to test this library on a simple sinusoidal signal with some anomalies, but it's not working as I expected.

This is the sinusoid:

[plot of the input signal]

And this is the script:

import matplotlib.pyplot as plt
import numpy as np
from pyculiarity import detect_ts
import pandas as pd
import random
from datetime import datetime, timedelta

def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta

# Creating the base signal
Fs = 8000
f = 5
sample = 8000
now = datetime.now()
dts = [now + timedelta(hours=index) for index in range(sample)]
x = np.arange(sample)
y = np.sin(2 * np.pi * f * x / Fs)

# Now let's add some anomalies
for x in range(7200, 7270):
    y[x] = random.random()

# We call the library for detecting the anoms
data = pd.DataFrame({'dates':dts, 'values':y})
results = detect_ts(data,
                    max_anoms=0.1,
                    alpha=0.1,
                    direction='both')
print(results)

plt.plot(dts, y)
plt.xlabel('date')
plt.ylabel('voltage(V)')
plt.show()

Do you know why it's not working, @nicolasmiller ?

No module named 'detect_vec'

Hi, forgive my naivety, but I'm getting an import error after installing the module and importing it into a test project. It seems to originate from the __init__ file.

The import error:

  6 from past.builtins import basestring
  7 from pandas import DataFrame, to_datetime
----> 8 from pandas.lib import Timestamp
  9 import numpy as np
 10

ImportError: cannot import name Timestamp
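A hedged one-line patch (assuming nothing else from pandas.lib is needed): Timestamp has long been exported at pandas' top level, so the failing import can be changed to:

```python
from pandas import Timestamp  # replaces the removed `from pandas.lib import Timestamp`

print(Timestamp("1980-10-05").year)  # 1980
```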

one_tail,up_tail

detect_anoms.py is inconsistent with detect_ts.py. Obviously one parameter, direction, is enough; why do you use two parameters (one_tail, upper_tail)?

AttributeError: module 'pandas' has no attribute 'Int64Index'

python 3.11
pandas 2.2.2

When I run the demo, I get this error:

Traceback (most recent call last):
File "D:\workspaces\python_pros\timeSeries1\src\edm\breakoutDetect1.py", line 12, in
results = detect_ts(df, max_anoms=0.007, direction='both')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\configs\pythonenv\Lib\site-packages\pyculiarity\detect_ts.py", line 220, in detect_ts
s_h_esd_timestamps = detect_anoms(all_data[i], k=max_anoms, alpha=alpha,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\configs\pythonenv\Lib\site-packages\pyculiarity\detect_anoms.py", line 58, in detect_anoms
if not isinstance(data.index, ps.Int64Index):
^^^^^^^^^^^^^
AttributeError: module 'pandas' has no attribute 'Int64Index'
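A hedged sketch of a forward-compatible check (an assumption on my part, not the maintainers' fix): pandas 2.x removed Int64Index, so the isinstance test in detect_anoms.py can be rewritten against the index dtype instead of the removed class:

```python
import pandas as pd

# Instead of isinstance(data.index, pd.Int64Index), which raises an
# AttributeError on pandas 2.x, test whether the index holds integers.
idx = pd.RangeIndex(5)
is_integer_index = pd.api.types.is_integer_dtype(idx.dtype)
print(is_integer_index)  # True
```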

rpy2

Hi!

Thanks for creating this Python port. I noticed that the readme states this package depends on rpy2, but it appears to me that it no longer does, since you are now using the rstl port of R's stl function. Or is rpy2 needed somewhere else that I didn't notice?

Thanks again!

The output of detect_ts function

I tried to run the example:

from pyculiarity import detect_ts
import pandas as pd
twitter_example_data = pd.read_csv('raw_data.csv',
                                    usecols=['timestamp', 'count'])
results = detect_ts(twitter_example_data,
                    max_anoms=0.02,
                    only_last='day')
print(results)

When I run the script as python test.py, the output cannot be parsed as JSON and is quite hard to use.

{'anoms':                         anoms           timestamp
timestamp                                        
1980-10-05 01:12:00   56.4691 1980-10-05 01:12:00
1980-10-05 01:13:00   54.9415 1980-10-05 01:13:00
1980-10-05 01:14:00   52.0359 1980-10-05 01:14:00
1980-10-05 01:15:00   47.7313 1980-10-05 01:15:00
1980-10-05 01:16:00   50.5876 1980-10-05 01:16:00
1980-10-05 01:17:00   48.2846 1980-10-05 01:17:00
1980-10-05 01:18:00   44.6438 1980-10-05 01:18:00
1980-10-05 01:19:00   42.3077 1980-10-05 01:19:00
1980-10-05 01:20:00   38.8363 1980-10-05 01:20:00
1980-10-05 01:21:00   41.0145 1980-10-05 01:21:00
1980-10-05 01:22:00   39.5523 1980-10-05 01:22:00
1980-10-05 01:23:00   38.9117 1980-10-05 01:23:00
1980-10-05 01:24:00   37.3052 1980-10-05 01:24:00
1980-10-05 01:25:00   36.1725 1980-10-05 01:25:00
1980-10-05 01:26:00   37.5150 1980-10-05 01:26:00
1980-10-05 01:27:00   38.1387 1980-10-05 01:27:00
1980-10-05 01:28:00   39.5351 1980-10-05 01:28:00
1980-10-05 01:29:00   38.1834 1980-10-05 01:29:00
1980-10-05 01:30:00   37.5988 1980-10-05 01:30:00
1980-10-05 01:31:00   43.6522 1980-10-05 01:31:00
1980-10-05 01:32:00   47.9571 1980-10-05 01:32:00
1980-10-05 13:08:00  210.0000 1980-10-05 13:08:00
1980-10-05 13:18:00   40.0000 1980-10-05 13:18:00
1980-10-05 13:28:00  250.0000 1980-10-05 13:28:00
1980-10-05 13:38:00   40.0000 1980-10-05 13:38:00, 'plot': None}

What am I missing?
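For what it's worth, a hedged sketch (column names taken from the printed output above): the 'anoms' entry is an ordinary DataFrame, so JSON comes from serialising it explicitly rather than printing the dict:

```python
import pandas as pd

# Stand-in for results['anoms']; detect_ts returns a dict whose 'anoms'
# value is a plain DataFrame, with 'plot' set to None.
anoms = pd.DataFrame({
    "timestamp": pd.to_datetime(["1980-10-05 13:08:00", "1980-10-05 13:28:00"]),
    "anoms": [210.0, 250.0],
})
payload = anoms.to_json(orient="records", date_format="iso")
print(payload)
```

`to_dict(orient="records")` works equally well if you want Python objects rather than a JSON string.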

Remaining Python3 test failure, next PyPI release

@michael-erasmus

Hey Michael (and Eric, if you're still around; it looks like your account is gone),

I've merged and things are looking good under Python 2, but I'm still seeing a single test failure under 3. I'd appreciate it if you could try to reproduce and debug.

I'm on an up-to-date Ubuntu box testing against the system Pythons, which are 2.7.15rc1 and 3.6.5. I've nuked all the dependencies and started from scratch via setup.py under both, so I should have all the latest PyPI versions in each case. I'm running the tests under 2 with:

nosetests .

and under 3 with

nosetests3 .

The following test is failing for me in the second case:

FAIL: test_both_directions_e_value_threshold_med_max (test_vec.TestVec)
    eq_(len(results['anoms'].iloc[:,1]), 6)
AssertionError: 48 != 6

Obviously it would be nicer to have a more isolated and repeatable way to deal with the tests and dependencies across different machines. I'm open to suggestions if you guys have ideas there.

more datetime conversion issues

In date_utils.py you have:

def format_timestamp(indf, index=0):
    if indf.dtypes[0].type is np.datetime64:
        return indf

Unfortunately my column is type:
datetime64[ns, UTC]

So it doesn't match, and we fall through to the regexes, which choke on the data type.

The problem, I think, is that I have timezone-aware datetime types. Simply converting the timestamp column to a string did not work.

So I stripped the timezones off before converting to a string, and that seemed to do the trick:

    df['mytimecolumn'] = pandas.to_datetime(df['mytimecolumn'])
    df['mytimecolumn'] = df['mytimecolumn'].dt.strftime('%Y-%m-%d %H:%M:%S')
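A runnable version of the workaround above, keeping the column name ('mytimecolumn') as the hypothetical example it is; rather than round-tripping through strings, dropping the timezone directly yields the plain datetime64[ns] dtype that format_timestamp checks for:

```python
import pandas as pd

# Timezone-aware values have dtype datetime64[ns, UTC], which fails the
# np.datetime64 check; stripping the tz leaves plain datetime64[ns].
df = pd.DataFrame({"mytimecolumn": pd.to_datetime(
    ["2020-01-01 00:00:00", "2020-01-01 01:00:00"], utc=True)})
df["mytimecolumn"] = df["mytimecolumn"].dt.tz_localize(None)
print(str(df["mytimecolumn"].dtype))  # datetime64[ns]
```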

Possible date conversion error?

Hello,

I am trying to run the example you have posted on the main page, i.e.:


from pyculiarity import detect_ts
import pandas as pd
twitter_example_data = pd.read_csv('raw_data.csv',
usecols=['timestamp', 'count'])
results = detect_ts(twitter_example_data,
max_anoms=0.02,
direction='both', only_last='day')


but unfortunately, I get an error (see below):


Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/user1/Current projects//test/test_pyculiarity_0.py", line 7, in
direction='both', only_last='day')
File "build\bdist.win32\egg\pyculiarity\detect_ts.py", line 142, in detect_ts
gran = get_gran(df)
File "build\bdist.win32\egg\pyculiarity\date_utils.py", line 46, in get_gran
gran = int(round((largest - second_largest) / np.timedelta64(1, 's')))
TypeError: ufunc divide cannot use operands with types dtype('O') and dtype('<m8[s]')


I have traced the issue as best I could, and it appears there is an error in the largest - second_largest step:


14395 1980-10-05 13:56:00
14396 1980-10-05 13:57:00
14397 1980-10-05 13:58:00
Name: timestamp, Length: 14398, dtype: object
nlargest(2, col): ['1980-10-05 13:58:00', '1980-10-05 13:57:00']
largest: 1980-10-05 13:58:00
second_largest: 1980-10-05 13:57:00
(largest - second_largest):
Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/user1/Current projects//test/test_pyculiarity_0.py", line 17, in
print "(largest - second_largest): ", (largest - second_largest)
TypeError: unsupported operand type(s) for -: 'str' and 'str'


When I ran the nosetests . command, all 13 tests failed with the same TypeError.

Any suggestions? Please advise; I appreciate your time in advance. Thank you.

PS: Sorry if my formatting is odd and non-standard; this is my first issue posting. I can edit in any clarifications you need.

plot from the result set of detect_ts

Hi,
I just wanted to know whether there is a built-in function for plotting the anoms in time-series data, as there is in the R anomaly detection package, or whether we just get the anoms from it and have to visualize them ourselves.

Thanks for your time,
Shreyak
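Since there is no built-in plot (detect_ts returns 'plot': None), here is a hedged matplotlib sketch of the do-it-yourself route; the column names and the hand-picked anomaly row are assumptions for illustration, standing in for a real results['anoms'] frame:

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend, safe for scripts without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy series with one obvious spike; in practice `anoms` would come from
# results['anoms'] returned by detect_ts.
series = pd.DataFrame({
    "timestamp": pd.date_range("1980-10-05", periods=5, freq="min"),
    "value": [1.0, 1.1, 9.0, 1.2, 1.0],
})
anoms = series.iloc[[2]]  # pretend detect_ts flagged the spike

fig, ax = plt.subplots()
ax.plot(series["timestamp"], series["value"], label="series")
ax.scatter(anoms["timestamp"], anoms["value"], color="red", label="anoms")
ax.legend()
fig.savefig("anoms.png")
print(os.path.exists("anoms.png"))
```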

Python 3.5

I am using Python 3.5, and on pip install pyculiarity I get the following error:

Command "python setup.py egg_info" failed with error code 1 in C:\Users\Archit\AppData\Local\Temp\pip-build-_8fu4cqd\rpy2\

detect_ts didn't return plot at all

And the detect_ts method has many unused parameters. This is really irresponsible behavior; the library doesn't deserve so much fame.

    return {
        'anoms': anoms,
        'plot': None
    }
