jarnorfb / epysurv



Epidemiological surveillance in Python

License: MIT License

Python 66.54% R 0.11% Shell 0.03% Jupyter Notebook 33.32%

epidemiology outbreak-detection surveillance

epysurv's People

aauss fagan2888 burgersmoke tibaredha

epysurv's Issues

FarringtonFlexible with weekly data results in RRuntimeError

I've found that FarringtonFlexible works without problems on daily case-count data, but with weekly or multi-week data I get a runtime error. I've tried several workarounds, including passing glmWarnings as False, since this looks more like a warning() than an actual failure.

I will keep trying to debug this on my own, but I'm not a great R programmer, and it's hard to tell whether this is an issue in the surveillance package, in how I am constructing the data, or something else.

While I keep troubleshooting, I have a few questions in case anyone has thoughts:

Would it help if I tried to reproduce this in R?
Can you think of any other workarounds?
Specifically, is there any way, in rpy2 or otherwise, to ignore an error like this, given that it looks more like a warning() than a fatal exception?

RRuntimeError (a fuller callstack is below):

E       rpy2.rinterface.RRuntimeError: Error in algo.farrington.data.glm(dayToConsider = dayToConsider, b = control$b,  : 
E         Some reference values did not exist (index<1).

Code to reproduce (as a pytest test):

import pandas as pd
import pytest

from epysurv.models.timepoint import FarringtonFlexible

def test_farrington_weekly_example():

    model = FarringtonFlexible()

    total_periods = 100
    test_size = 20
    case_count = 10

    # set up some weekly data
    dates = pd.date_range('2017-07-09', periods=total_periods, freq='7D')
    # just make this constant (but this also fails with random or real case count values)
    case_counts = [case_count] * total_periods

    df = pd.DataFrame({'n_cases': case_counts}, index = dates)

    train_data = df[:-1 * test_size]
    test_data = df[-1 * test_size:]

    # make sure we can fit and predict
    model.fit(train_data)
    _ = model.predict(test_data)

Fuller callstack:

def test_farrington_weekly_example():
    
        model = FarringtonFlexible()
    
        total_periods = 100
        test_size = 20
        case_count = 10
    
        # set up some weekly data
        dates = pd.date_range('2017-07-09', periods=total_periods, freq='7D')
        # just make this constant (but this also fails with random or real case count values)
        case_counts = [case_count] * total_periods
    
        df = pd.DataFrame({'n_cases': case_counts}, index = dates)
    
        train_data = df[:-1 * test_size]
        test_data = df[-1 * test_size:]
    
        # make sure we can fit and predict
        model.fit(train_data)
>       _ = model.predict(test_data)

test_farrington_specific_example.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\epysurv\models\timepoint\_base.py:131: in predict
    surveillance_result = self._call_surveillance_algo(r_instance, detection_range)
..\epysurv\models\timepoint\farrington.py:166: in _call_surveillance_algo
    surv = surveillance.farringtonFlexible(sts, control=control)
C:\anaconda3\envs\epysurv-dev\lib\site-packages\rpy2\robjects\functions.py:178: in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = R object with classes: ('function',) mapped to:
<DocumentedSTFunction - Python:0x000001C714FD1608 / R:0x000001C70F26E4A8>
args = (R object with classes: ('sts',) mapped to:
<RS4 - Python:0x000001C7151F5548 / R:0x000001C70FF061F0>,)
kwargs = {'control': R object with classes: ('list',) mapped to:
<ListVector - Python:0x000001C7152E92C8 / R:0x000001C71400440...ect with classes: ('character',) mapped to:
<StrVector - Python:0x000001C715468C88 / R:0x000001C7149A7E00>
['delta']}
new_args = [R object with classes: ('sts',) mapped to:
<RS4 - Python:0x000001C7151F5548 / R:0x000001C70FF061F0>]
new_kwargs = {'control': R object with classes: ('list',) mapped to:
<ListVector - Python:0x000001C7152E92C8 / R:0x000001C71400440...ect with classes: ('character',) mapped to:
<StrVector - Python:0x000001C715471C88 / R:0x000001C7149A7E00>
['delta']}
k = 'control'
v = R object with classes: ('list',) mapped to:
<ListVector - Python:0x000001C7152E92C8 / R:0x000001C714004408>
[IntVect...ject with classes: ('character',) mapped to:
<StrVector - Python:0x000001C715478F48 / R:0x000001C7149A7E00>
['delta']

    def __call__(self, *args, **kwargs):
        new_args = [conversion.py2ri(a) for a in args]
        new_kwargs = {}
        for k, v in kwargs.items():
            new_kwargs[k] = conversion.py2ri(v)
>       res = super(Function, self).__call__(*new_args, **new_kwargs)
E       rpy2.rinterface.RRuntimeError: Error in algo.farrington.data.glm(dayToConsider = dayToConsider, b = control$b,  : 
E         Some reference values did not exist (index<1).
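The R error says that some reference values had an index below 1. A back-of-the-envelope check, assuming the standard Farrington scheme (reference values for time point t are taken at t - j*freq + k for j = 1..b and k = -w..w) and assuming epysurv ends up using the surveillance defaults b = 3 and w = 3 with a weekly frequency of 52. Neither assumption is confirmed in this thread, and the helper names below are hypothetical:

```python
def earliest_reference_index(t: int, b: int = 3, w: int = 3, freq: int = 52) -> int:
    """Earliest 1-based index used as a reference value for time point t.

    Assumes the standard Farrington window: t - j*freq + k for j in 1..b, k in -w..w,
    so the smallest index is reached at j = b, k = -w.
    """
    return t - b * freq - w


def min_history_index(b: int = 3, w: int = 3, freq: int = 52) -> int:
    """Smallest 1-based t whose whole reference window stays inside the series (index >= 1)."""
    return b * freq + w + 1


# Numbers taken from the reproducer: 100 weekly points, the last 20 used for prediction.
total_periods, test_size = 100, 20
first_test_index = total_periods - test_size + 1  # 1-based index of the first predicted week
```

Under these assumptions, the first predicted week (index 81) would reach back to index 81 - 156 - 3 = -78, which is consistent with the "index<1" message, and a weekly series would need at least 159 weeks before the first predicted week (fewer if b is reduced).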

Conda-forge install: cannot import RRuntimeError when using rpy2 3.x

After running
conda install -c conda-forge epysurv

I tried to use epysurv, but the import on this line failed:

from rpy2.rinterface import RRuntimeError

I checked the installed rpy2 version, and it was 3.4.5.

In rpy2 3.x the import path is different; this one worked for me:

from rpy2.rinterface_lib.embedded import RRuntimeError

I see a couple of ways to fix this:

Update the conda-forge dependency requirements
Check the version number of rpy2 and use the correct import between the two above

I'm not sure the second option would work, since some dependencies may genuinely require rpy2 2.x.
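A minimal sketch of the second option: instead of parsing the rpy2 version number, try the rpy2 3.x location first and fall back to the location epysurv currently imports from. `import_first` and `RRUNTIMEERROR_LOCATIONS` are hypothetical names, not part of epysurv or rpy2:

```python
import importlib


def import_first(candidates):
    """Return the attribute from the first (module, attribute) pair that imports.

    Hypothetical helper: lets the same code run against rpy2 3.x and 2.x
    without comparing version strings.
    """
    failures = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            failures.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(failures))


# rpy2 3.x location first, then the 2.x location used by the current import.
RRUNTIMEERROR_LOCATIONS = [
    ("rpy2.rinterface_lib.embedded", "RRuntimeError"),
    ("rpy2.rinterface", "RRuntimeError"),
]
```

In epysurv's modules this would read `RRuntimeError = import_first(RRUNTIMEERROR_LOCATIONS)`. A plain `try:`/`except ImportError:` around the two import statements does the same job; the helper only adds a combined error message when neither location works.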

Fix broken conda feedstock pipeline

Feedstock (https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=350396&view=logs&jobId=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=841356e0-85bb-57d8-dbbc-852e683d1642) breaks on pickle.load:

epysurv/tests/conftest.py

Line 46 in d031511

cases = pickle.load(handle)

As per https://stackoverflow.com/a/68342039/6256888, the recommended workaround is to replace pickle.load with pandas.read_pickle for pandas v1.3.0.
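A minimal sketch of that workaround. Only `cases = pickle.load(handle)` from line 46 is quoted above, so the surrounding `open(..., "rb")` call and the wrapper `load_cases` are assumptions. `pandas.read_pickle` accepts an already-open binary file object, so the change can stay inside the existing `with` block:

```python
import pandas as pd


def load_cases(handle):
    """Hypothetical wrapper illustrating the one-line change in conftest.py.

    `handle` is assumed to be a binary file object, e.g. from open(path, "rb").
    """
    # Before: cases = pickle.load(handle)
    # After: pandas.read_pickle, which includes compatibility handling for
    # pickles written by older pandas versions.
    cases = pd.read_pickle(handle)
    return cases
```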

conda.exceptions.UnsatisfiableError: The following specifications were found to be in conflict:
  - python=3.7 -> vc[version='>=14,<15.0a0']
  - vc=9
Use "conda search <package> --info" to see the dependencies for each package.
During handling of the above exception, another exception occurred:
`conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform win-64: {"vc[version='>=14,<15.0a0']", 'vc=9'}`

Since I will be working more with simulations in the coming weeks, I thought about adding the simulation algorithms here. For example, I am currently copying the simulation used in "An improved algorithm for outbreak detection in multiple surveillance systems". Should I write more simulations in the same style as the other algorithms and push them to epysurv?

csv files are not being packaged

Remove provenance column from prediction frame
