This might be an issue in TriScale, or me misunderstanding a use-case.

KPI calculated even if too little data supplied about triscale HOT 4 OPEN

arurke commented on July 22, 2024

KPI calculated even if too little data supplied

from triscale.

Comments (4)

romain-jacob commented on July 22, 2024

Sorry for the delay! I have been quite busy recently.

TL;DR: Everything works as expected AFAIU

The number of data points you need depends on which bound (upper or lower) you pick for a given percentile (except for the median, of course). To understand why, let's keep your example: P=99, C=95.

When we compute the "upper" CI for that percentile, what we are actually doing is checking whether we have one data point that has at least a 95% probability of being larger than the 99th percentile. This requires more than 5 samples, so the method returns NaN.
If one computes the "lower" CI for the same percentile, we check whether we have one data point that has at least a 95% probability of being smaller than the 99th percentile. That is easy, because most samples (99% of them) are expected to be smaller than the 99th percentile, so one needs very few samples.
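The asymmetry described above can be checked with a few lines of arithmetic. The following is a minimal sketch of the standard order-statistics argument, assuming independent samples from a continuous distribution; it is not TriScale's code, and `min_samples` is a hypothetical helper name. For an upper bound, at least one sample must exceed the percentile, which fails with probability p^N; for a lower bound, at least one sample must fall below it, which fails with probability (1-p)^N.

```python
import math

def min_samples(percentile: float, confidence: float, bound: str) -> int:
    """Smallest N for which a one-sided bound on the percentile can exist.

    Hypothetical helper (not part of TriScale). Assumes i.i.d. samples
    from a continuous distribution.
    """
    if bound not in ("upper", "lower"):
        raise ValueError("bound must be 'upper' or 'lower'")
    p = percentile / 100
    alpha = 1 - confidence / 100
    # Probability that a single sample lands on the "wrong" side of the
    # percentile: below it for an upper bound, above it for a lower bound.
    miss = p if bound == "upper" else 1 - p
    # Need miss**N <= alpha, i.e. N >= log(alpha) / log(miss).
    return max(1, math.ceil(math.log(alpha) / math.log(miss)))

print(min_samples(99, 95, "upper"))  # samples needed for the upper bound
print(min_samples(99, 95, "lower"))  # samples needed for the lower bound
print(min_samples(50, 95, "upper"), min_samples(50, 95, "lower"))
```

Under these assumptions, the upper bound for P=99, C=95 needs 299 samples while the lower bound needs only one, and the median needs the same number (5) in both directions.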

Side-note
If you are interested in the variability of a given KPI, you might want to look at the two-sided option. In short, it spares you calling the method twice (plus, you can be sure that the percentile lies between the two returned bounds with 95% confidence).


arurke commented on July 22, 2024

Thanks a lot for the very detailed and enlightening explanation. It makes a lot of sense. I had tunnel vision and assumed both bounds had the same requirements. Regarding the side-note: do you mean in analysis_kpi()? It forces one-sided on master as of now. But I do see that ThompsonCI() seems to support it. Is this ready to be used?

> Sorry for the delay! I have been quite busy recently.

No need to apologize, I am grateful for you taking the time!


romain-jacob commented on July 22, 2024

Ah yes, you're right. You'll need to go back to the ThompsonCI() function to get access to the two-sided option (or you just overwrite the TriScale function to allow that option).

The two-sided option is reliable. JSYK, I opened a PR ages ago to include this ThompsonCI() function in scipy but never got around to finishing it... which is a shame but you know... life. :-/


arurke commented on July 22, 2024

I looked at and played a bit with the two-sided option. I modified analysis_kpi() to basically call ThompsonCI() directly and give me the lower and upper bounds it calculates. I then call it with 1000 data points, varying the class and percentile, for example:

import numpy as np

# 1000 random integers in [1, 10)
data = np.random.randint(1, 10, size=1000)
settings = {"bound": "lower", "percentile": 90,
            "confidence": 95, "bounds": [min(data), max(data)],
            "class": "two-sided"}

The lower and upper bounds I get are as follows:

"one-sided":
- 90p: 883 - 915. # With 95c, the true 90p is between index 883 and 915.
- 10p: 84 - 116. # With 95c, the true 10p is between index 84 and 116.
"two-sided":
- 90p: 84 - 915. # With 95c, 90 % of the data is between index 84 and 915
- 10p: 84 - 915. # With 95c, 10 % of the data is between index 84 and 915?!

I am struggling to reconcile my understanding of CIs and "bounds", the terms used in TriScale, and the data I am seeing. I was conflicted, so I added some statements behind the bounds. Could I ask you to comment, clarify, or confirm?
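For comparison, here is a minimal sketch of the textbook binomial argument for a confidence interval on a single percentile. It is not TriScale's ThompsonCI() and makes no claim about what its two-sided option returns; `binom_cdf_table`, `ci_ranks`, and `coverage` are hypothetical names, and the sketch assumes independent samples from a continuous distribution (no ties). The idea: the number B of samples below the true percentile follows Binomial(N, p), so the interval between the sorted samples at ranks lo and hi (1-based) contains the percentile with probability P(lo <= B <= hi-1).

```python
import math

def binom_cdf_table(n: int, p: float) -> list:
    """cdf[k] = P(B <= k) for B ~ Binomial(n, p), with pmf computed in log space."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_pmf)
        cdf.append(min(total, 1.0))
    return cdf

def ci_ranks(n: int, percentile: float, confidence: float, two_sided: bool):
    """Return 1-based ranks (lo, hi) into the sorted sample, or None where
    there are too few samples for that bound (hypothetical helper)."""
    alpha = 1 - confidence / 100
    # Two-sided: split the error budget evenly between both tails.
    tail = alpha / 2 if two_sided else alpha
    cdf = binom_cdf_table(n, percentile / 100)
    # Lower rank: largest r with P(sample r is above the percentile) <= tail.
    lo = max((r for r in range(1, n + 1) if cdf[r - 1] <= tail), default=None)
    # Upper rank: smallest r with P(sample r is below the percentile) <= tail.
    hi = min((r for r in range(1, n + 1) if cdf[r - 1] >= 1 - tail), default=None)
    return lo, hi

def coverage(n: int, percentile: float, lo: int, hi: int) -> float:
    """P(sample at rank lo < percentile < sample at rank hi) = P(lo <= B <= hi-1)."""
    cdf = binom_cdf_table(n, percentile / 100)
    return cdf[hi - 1] - cdf[lo - 1]

one = ci_ranks(1000, 90, 95, two_sided=False)
two = ci_ranks(1000, 90, 95, two_sided=True)
print("one-sided ranks:", one, "joint coverage:", coverage(1000, 90, *one))
print("two-sided ranks:", two, "joint coverage:", coverage(1000, 90, *two))
```

Under these assumptions, each one-sided rank holds with 95% confidence on its own, but the pair taken together is only guaranteed to cover the percentile with at least 90% confidence. The two-sided pair is slightly wider and covers it with at least 95%.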

