romain-jacob / triscale
TriScale software
License: GNU General Public License v3.0
This might be an issue in TriScale, or me misunderstanding a use-case.
TL;DR: analysis_kpi() returns a valid value when too few data points are supplied, if the "unintuitive" bound is selected (upper for percentile < 50, and vice versa).
Background: The intuitive way to calculate a KPI is to specify the bound that gives us the "worst case" (upper when percentile > 50, and vice versa). This allows us to make "performance is at least X" statements. However, I was thinking there is information in the other bound as well: it would show the width of the CI, and we could learn whether the given metric varies a lot between runs. The first example that comes to mind is industrial scenarios, where not only the maximum latency is interesting, but also its variability.
With this background, I was routinely calling analysis_kpi() twice, once with bound set to upper and once with lower. Doing this, I noticed I would get a valid value when the "unintuitive" bound was selected (upper for percentile < 50, and vice versa), even if I had too little data.
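As an aside, the width-of-the-CI idea can be illustrated without TriScale at all. This is a minimal sketch using a plain bootstrap (not TriScale's thresholds-based method; the data and parameters are made up) to put two-sided bounds on a high percentile and read off the interval width as a variability indicator:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "latency" measurements from 50 runs (illustrative only).
data = rng.normal(loc=10.0, scale=2.0, size=50)

# Bootstrap both tails of a CI on the 95th percentile; the interval's
# width is a simple indicator of run-to-run variability.
boot = np.array([np.percentile(rng.choice(data, size=data.size, replace=True), 95)
                 for _ in range(2000)])
lower, upper = np.percentile(boot, [2.5, 97.5])
print(f"95th-percentile CI: [{lower:.2f}, {upper:.2f}], width {upper - lower:.2f}")
```

The wider the interval, the less stable the metric across runs, which is exactly the information the second bound would add.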
Example with too little data:

```python
import numpy as np
import triscale

data = np.random.randint(0, 10, size=5)
settings = {"bound": "lower",
            "percentile": 99,
            "confidence": 95,
            "bounds": [min(data), max(data)]}
independent, kpi = triscale.analysis_kpi(
    data,
    settings,
    verbose=False)
print("KPI: " + str(kpi))
```
With bound set to "upper", the KPI correctly returns NaN. With bound set to "lower", a number is returned.
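For what it's worth, this asymmetry seems to follow directly from the distribution-free (order-statistics/binomial) argument that, to my understanding, underlies TriScale's KPI computation. A lower bound on a high percentile is satisfied by almost any sample, while an upper bound needs many samples. A self-contained sketch of the standard reasoning (my own code, not TriScale's actual implementation):

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def one_sided_bound(sorted_data, p, conf, bound):
    """Distribution-free one-sided confidence bound on the p-th quantile Q_p.

    bound='upper': smallest order statistic x_(m) with
        P(x_(m) >= Q_p) = P(Binomial(n, p) <= m - 1) >= conf.
    bound='lower': largest order statistic x_(m) with
        P(x_(m) <= Q_p) = P(Binomial(n, p) >= m) >= conf.
    Returns None when no order statistic reaches the confidence level
    (the case where TriScale returns NaN).
    """
    n = len(sorted_data)
    if bound == 'upper':
        for m in range(1, n + 1):
            if binom_cdf(m - 1, n, p) >= conf:
                return sorted_data[m - 1]
        return None
    for m in range(n, 0, -1):
        if 1 - binom_cdf(m - 1, n, p) >= conf:
            return sorted_data[m - 1]
    return None

data = sorted([3, 1, 4, 1, 5])
# Upper bound on the 99th percentile at 95% confidence: even the sample
# maximum only achieves P = 1 - 0.99**5 ~ 0.049, so no valid bound exists.
print(one_sided_bound(data, 0.99, 0.95, 'upper'))  # None
# Lower bound: the sample maximum already satisfies P = 0.99**5 ~ 0.951,
# so a "valid" (but very weak) value comes out of only 5 data points.
print(one_sided_bound(data, 0.99, 0.95, 'lower'))  # 5
```

With percentile 99 and 95% confidence, the smallest n for which the upper bound exists is the smallest n with 1 - 0.99**n >= 0.95, i.e. n = 299. So NaN for the upper bound with 5 samples is expected, while the lower bound degenerates to the sample maximum, which may be what makes the returned value look valid despite the tiny sample.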
In analysis_metric(), if all elements in the y-axis data are the same (i.e., min and max are identical), it leads to a division by zero in the convergence test; see https://github.com/romain-jacob/triscale/blob/master/helpers.py#L66 and two lines above.
The issue can easily be reproduced:
```python
import numpy as np
import pandas as pd
import triscale

x = np.arange(0, 100)
y_same_value = np.full(len(x), 100)
df = pd.DataFrame({'x': x, 'y': y_same_value})
triscale.analysis_metric(
    df,
    metric={'measure': 50},
    convergence={'expected': True})
```

> triscale/helpers.py:66: RuntimeWarning: invalid value encountered in true_divide
An intuitive solution is to simply state that the data is converged in such cases with identical elements, but perhaps I am missing something about the statistics, so I'll leave the PR to someone else.
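If that reasoning is sound, the fix could be a short-circuit before the min-max normalization. A sketch of the idea (the function name and shape are mine, not the actual helpers.py code):

```python
import numpy as np

def scale_for_convergence(y):
    """Min-max scale a metric series for the convergence test, treating a
    constant series as trivially converged instead of dividing by zero.
    (Sketch only; the real scaling lives in triscale/helpers.py.)
    """
    y = np.asarray(y, dtype=float)
    y_range = y.max() - y.min()
    if y_range == 0:
        # All samples identical: zero variability, declare convergence.
        return np.zeros_like(y), True
    return (y - y.min()) / y_range, False

_, converged = scale_for_convergence(np.full(100, 100))
print(converged)  # True: constant data short-circuits the test
```

Non-constant data takes the normal path, so the existing convergence logic would be unaffected.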
Change one (probably bound) to something else (side?) to make it less confusing in the doc.
The comments for the default tolerance value consistently say 1%, while the code uses both 5% and 1%; see e.g. https://github.com/TriScale-Anon/triscale/blob/9627caa7a5e6ec32551c38335fde881fe9530134/triscale.py#L516
The ordering of traces in the CC dataset for the increasing runtime is a mess. To be fixed: right now the plots don't match their titles...