Comments (4)

romain-jacob avatar romain-jacob commented on June 19, 2024

Sorry for the delay! I have been quite busy recently.


TL;DR: Everything works as expected AFAIU

For a given percentile (other than the median), the number of data points you need depends on which bound (upper or lower) you pick. To understand why, let's keep your example: P=99, C=95.

  • When we compute the "upper" CI for that percentile, what we are actually doing is checking whether we have one data point that has at least a 95% probability of being larger than the 99th percentile. This requires far more than 5 samples (with independent samples, 1 - 0.99^N >= 0.95 only holds from N = 299 onwards), so the method returns NaN.
  • If one computes the "lower" CI for the same percentile, we check whether we have one data point that has at least a 95% probability of being smaller than the 99th percentile. That's easy, because most samples (99% of them) are expected to be smaller than the 99th percentile. So one needs very few samples.
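
The asymmetry between the two bounds can be checked with a few lines of Python. This is a minimal sketch, not TriScale code: `min_samples` is a hypothetical helper, and it assumes independent, identically distributed samples, so that all N samples fall below the P-th percentile with probability (P/100)^N (and above it with probability (1 - P/100)^N).

```python
def min_samples(percentile, confidence, bound):
    """Smallest N for which a one-sided bound on the percentile exists.

    Hypothetical helper for illustration only; assumes i.i.d. samples.
    """
    if not (0 < percentile < 100 and 0 < confidence < 100):
        raise ValueError("percentile and confidence must be in (0, 100)")
    if bound not in ("upper", "lower"):
        raise ValueError("bound must be 'upper' or 'lower'")
    p = percentile / 100.0
    c = confidence / 100.0
    # Upper bound: all n samples fall below the percentile with probability p**n.
    # Lower bound: all n samples fall above it with probability (1 - p)**n.
    miss = p if bound == "upper" else 1.0 - p
    n = 1
    while 1.0 - miss ** n < c:
        n += 1
    return n


print(min_samples(99, 95, "upper"))  # 299
print(min_samples(99, 95, "lower"))  # 1
print(min_samples(50, 95, "upper"))  # 5
```

For the median the two bounds need the same number of samples, which is why it is the one exception.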

Side-note
If you are interested in the variability of a given KPI, you might want to look at the two-sided option. In short, it spares you calling the method twice, and it guarantees that the percentile lies between the two returned bounds with 95% confidence.
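
To make the idea of a two-sided interval concrete, here is a minimal sketch of one standard construction: an equal-tailed interval on the ranks of the sorted samples, based on the binomial distribution. It is not the code of ThompsonCI(), and its results need not match that function; `binom_cdf_table` and `two_sided_ranks` are hypothetical names, and the sketch assumes i.i.d. samples from a continuous distribution.

```python
import math


def binom_cdf_table(n, p):
    """Return [P(B <= k) for k in 0..n] with B ~ Binomial(n, p), 0 < p < 1."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        # Work in log space so large n does not overflow or underflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf)
        cdf.append(min(total, 1.0))
    return cdf


def two_sided_ranks(n, percentile, confidence):
    """1-based ranks (lo, hi) of the sorted samples that bracket the
    percentile with at least the requested confidence, or None if n
    is too small for such a pair to exist."""
    p = percentile / 100.0
    alpha = 1.0 - confidence / 100.0
    cdf = binom_cdf_table(n, p)
    # lo: largest rank such that P(B <= lo - 1) <= alpha / 2.
    lo = sum(1 for v in cdf if v <= alpha / 2)
    # hi: smallest rank such that P(B <= hi - 1) >= 1 - alpha / 2.
    hi = next(k for k, v in enumerate(cdf) if v >= 1.0 - alpha / 2) + 1
    if lo < 1 or hi > n:
        return None  # not enough samples, comparable to a NaN result
    return lo, hi


print(two_sided_ranks(1000, 90, 95))  # a pair of ranks around 900
print(two_sided_ranks(5, 99, 95))     # None: 5 samples are not enough
```

In this construction each tail is allowed at most half of the remaining 5%, so both ranks sit close to the percentile's expected rank rather than at opposite ends of the data.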

from triscale.

arurke avatar arurke commented on June 19, 2024

Thanks a lot for a very detailed and enlightening explanation. It makes a lot of sense. I had tunnel vision, assuming both bounds had the same requirements. Regarding the side-note: do you mean in analysis_kpi()? On master it currently forces one-sided. But I do see there seems to be support for two-sided in ThompsonCI(). Is that ready to be used?

> Sorry for the delay! I have been quite busy recently.

No need to apologize, I am grateful for you taking the time!


romain-jacob avatar romain-jacob commented on June 19, 2024

Ah yes, you're right. You'll need to go back to the ThompsonCI() function to get access to the two-sided option (or you just overwrite the TriScale function to allow that option).

The two-sided option is reliable. JSYK, I opened a PR ages ago to include this ThompsonCI() function in scipy but never got around to finishing it... which is a shame but you know... life. :-/


arurke avatar arurke commented on June 19, 2024

I looked at and played a bit with the two-sided option. I modified analysis_kpi() to basically call ThompsonCI() directly and give me the lower and upper bounds it calculates. I then called it with 1000 data points, varying the class and percentile, for example:

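import numpy as np

# 1000 random integers in [1, 9] (the upper limit of randint is exclusive)
data = np.random.randint(1, 10, size=1000)
settings = {"bound": "lower", "percentile": 90,
            "confidence": 95, "bounds": [min(data), max(data)],
            "class": "two-sided"}  # "class" and "percentile" are the values I varied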

The lower and upper bounds I get are as follows:

  • "one-sided":
    • 90p: 883 - 915. # With 95c, the true 90p is between index 883 and 915.
    • 10p: 84 - 116. # With 95c, the true 10p is between index 84 and 116.
  • "two-sided":
    • 90p: 84 - 915. # With 95c, 90 % of the data is between index 84 and 915
    • 10p: 84 - 915. # With 95c, 10 % of the data is between index 84 and 915?!

I am struggling to reconcile my understanding of CIs and "bounds", the terms used in TriScale, and the data I am seeing. I was conflicted, so I added my interpretation as a comment after each pair of bounds. Could I ask you to comment, clarify, or confirm?

