Comments (4)
Sorry for the delay! I have been quite busy recently.
TL;DR: Everything works as expected AFAIU
The number of data points you need depends on which bound (upper or lower) you pick for a given percentile (except for the median, of course, where both bounds need the same number). To understand why, let's stick with your example: P=99, C=95.
- When we compute the "upper" CI for that percentile, what we are actually doing is checking whether we have a data point that has at least a 95% probability of being larger than the 99th percentile. This requires more than 5 samples, so the method returns NaN.
- If one computes the "lower" CI for the same percentile, we check whether we have a data point that has at least a 95% probability of being smaller than the 99th percentile. That's easy, because most samples (99% of them) are expected to be smaller than the 99th percentile, so one needs very few samples.
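To put rough numbers on the two bullets above, here is a minimal sketch of the underlying probability argument. It assumes i.i.d. samples from a continuous distribution; the helper name `min_samples` is hypothetical and is not part of TriScale, whose `ThompsonCI()` may use slightly different conventions.

```python
def min_samples(percentile, confidence, bound):
    """Smallest n such that one extreme sample bounds the percentile.

    Hypothetical helper (not TriScale's API). Assumes i.i.d. samples
    from a continuous distribution.
    """
    p = percentile / 100.0
    c = confidence / 100.0
    if not (0.0 < p < 1.0 and 0.0 < c < 1.0):
        raise ValueError("percentile and confidence must be in (0, 100)")
    if bound == "upper":
        # The largest sample is above the percentile unless all n samples
        # fall below it, which happens with probability p**n.
        q = p
    elif bound == "lower":
        # The smallest sample is below the percentile unless all n samples
        # fall above it, which happens with probability (1 - p)**n.
        q = 1.0 - p
    else:
        raise ValueError("bound must be 'upper' or 'lower'")
    n = 1
    while 1.0 - q ** n < c:
        n += 1
    return n


print(min_samples(99, 95, "upper"))  # samples needed for the upper bound
print(min_samples(99, 95, "lower"))  # samples needed for the lower bound
```

Under these assumptions the upper bound for P=99, C=95 needs a few hundred samples, while the lower bound is satisfied by a single one, which is the asymmetry described above.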
Side-note
If you are interested in the variability of a given KPI, you might want to look at the two-sided option. In short, it spares you from calling the method twice (plus, you can be sure that you have 95% confidence that the percentile lies between the two bounds returned).
from triscale.
Thanks a lot for the very detailed and enlightening explanation. It makes a lot of sense. I had tunnel vision and assumed both bounds had the same requirements. Regarding the side-note: do you mean in analysis_kpi()? It forces one-sided as of the current master. But I do see that there seems to be support for it in ThompsonCI(). Is this ready to be used?
"Sorry for the delay! I have been quite busy recently."
No need to apologize, I am grateful for you taking the time!
from triscale.
Ah yes, you're right. You'll need to go back to the ThompsonCI() function to get access to the two-sided option (or you can just override the TriScale function to allow that option).
The two-sided option is reliable. JSYK, I opened a PR ages ago to include this ThompsonCI() function in scipy but never got around to finishing it... which is a shame but you know... life. :-/
from triscale.
I looked at and played a bit with the two-sided option. I modified analysis_kpi() to basically call ThompsonCI() directly and give me the lower and upper bounds it calculates. I then called it with 1000 data points, varying the class and percentile. For example:
import numpy as np

data = np.random.randint(1, 10, size=1000)  # 1000 random integers in [1, 9]
settings = {"bound": "lower", "percentile": 90,
            "confidence": 95, "bounds": [min(data), max(data)],
            "class": "two-sided"}
The lower and upper bounds I get are as follows:
- "one-sided":
  - 90p: 883 - 915. # With 95c, the true 90p is between index 883 and 915.
  - 10p: 84 - 116. # With 95c, the true 10p is between index 84 and 116.
- "two-sided":
  - 90p: 84 - 915. # With 95c, 90 % of the data is between index 84 and 915
  - 10p: 84 - 915. # With 95c, 10 % of the data is between index 84 and 915?!
I am struggling to reconcile my understanding of CIs and "bounds", the terms used in TriScale, and the data I am seeing. I was conflicted, so I added some statements behind the bounds. Could I ask you to comment, clarify, or confirm?
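For reference, the one-sided figures above can be approximated with a textbook order-statistic argument. The sketch below is not TriScale's `ThompsonCI()`; the names `binom_cdf_table` and `one_sided_ranks` are hypothetical, samples are assumed i.i.d. from a continuous distribution, and ranks are 1-based, so they may be off by one from the indices quoted. Note that each one-sided bound holds at the stated confidence on its own; taken together as an interval, the pair is only guaranteed a lower joint confidence.

```python
import math


def binom_cdf_table(n, p):
    """Return [P(B <= k) for k in 0..n] with B ~ Binomial(n, p)."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf)
        cdf.append(min(total, 1.0))
    return cdf


def one_sided_ranks(n, percentile, confidence):
    """1-based ranks (lower, upper) of the sorted samples that bound the
    percentile, each with at least the given confidence; None if no
    sample qualifies. Hypothetical helper, not TriScale's API."""
    p, c = percentile / 100.0, confidence / 100.0
    if not (0.0 < p < 1.0 and 0.0 < c < 1.0):
        raise ValueError("percentile and confidence must be in (0, 100)")
    cdf = binom_cdf_table(n, p)
    lower = upper = None
    for m in range(1, n + 1):
        # x_(m) is below the percentile iff at least m samples are below it.
        if 1.0 - cdf[m - 1] >= c:
            lower = m
        # x_(m) is above the percentile iff at most m - 1 samples are below it.
        if upper is None and cdf[m - 1] >= c:
            upper = m
    return lower, upper


print(one_sided_ranks(1000, 90, 95))
print(one_sided_ranks(1000, 10, 95))
```

With 1000 samples, the ranks for the 90th and 10th percentiles should land close to the one-sided indices listed above, which supports reading each of those numbers as a separate one-sided bound.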
from triscale.