Comments (9)

sethaxen commented on June 19, 2024

It seems the docstring for invlogcdf fails to be explicit about what it means by "inverse". Specifically, it does not say what invlogcdf(d, logcdf(d, x)) should return when x is not in the support of the distribution or, for continuous distributions, when x is the endpoint of an open interval that is not in the support. In these cases logcdf is not strictly monotonic (it has horizontal line segments), but one can still define a notion of inverse as the minimum point of any such segment.

I think Distributions uses the definition invlogcdf: logp → infimum{x ∈ support(d) : logcdf(d, x) ≥ logp}. As noted in https://en.wikipedia.org/wiki/Quantile_function, this is an "almost sure left inverse" of logcdf, i.e. the set of all points where invlogcdf(d, logcdf(d, x)) != x has a total probability mass of 0.

I suggest we resolve this by adding a note to the docstring that invlogcdf is the unique function that satisfies invlogcdf(d, logp) ≤ x iff logp ≤ logcdf(d, x) for all real x, although to me the description in terms of the infimum is more intuitive. quantile would need a similar note.
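To make the infimum definition and the "almost sure left inverse" property concrete, here is a minimal self-contained sketch for a fair six-sided die, whose logcdf is flat between support points. The names `die_logcdf` and `die_invlogcdf` are hypothetical stand-ins, not the Distributions.jl API; `die_invlogcdf` implements the support-restricted infimum described above with a linear scan. The grid check at the end tests the characterization `invlogcdf(d, logp) ≤ x iff logp ≤ logcdf(d, x)`. In this sketch it is only checked for `logp > -Inf`: at `logp == -Inf` the support-restricted infimum returns the smallest support point, while `logp ≤ logcdf(d, x)` holds for every x.

```julia
# Hypothetical stand-ins for logcdf/invlogcdf of a fair six-sided die
# (support 1:6); these are not Distributions.jl functions.
const DIE_SUPPORT = 1:6

# log P(X ≤ x): -Inf below the support, flat between support points.
die_logcdf(x::Real) = x < 1 ? -Inf : log(min(floor(x), 6) / 6)

# Support-restricted infimum: smallest k in the support with logcdf(k) ≥ logp.
function die_invlogcdf(logp::Real)
    for k in DIE_SUPPORT
        die_logcdf(k) >= logp && return float(k)
    end
    return Inf  # infimum of the empty set, reached only when logp > 0
end

# Points in the support round-trip exactly ...
@show [die_invlogcdf(die_logcdf(k)) for k in DIE_SUPPORT]
# ... while points outside it (probability zero) snap to a support point.
@show die_invlogcdf(die_logcdf(3.5)) die_invlogcdf(die_logcdf(-5))

# Grid check of invlogcdf(logp) ≤ x  ⇔  logp ≤ logcdf(x) for finite logp,
# including one "impossible" value logp = 0.5 > 0.
logps = [log.((1:12) ./ 12); 0.5]
xs = -2:0.25:8
galois_ok = all((die_invlogcdf(lp) <= x) == (lp <= die_logcdf(x)) for lp in logps, x in xs)
@show galois_ok
```

Here 3.5 and -5 carry zero probability under the die, which is exactly the set on which the round trip `invlogcdf(logcdf(x)) == x` is allowed to fail.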

from distributions.jl.

sethaxen commented on June 19, 2024

I proposed changes to the docs in #1814

andreasnoack commented on June 19, 2024

Please try to outline the behavior that you are proposing. Are you proposing that the inv methods error out for all values where the cdf isn't one-to-one? Avoiding exceptions is often convenient, so I'm not sure the current behavior is undesirable. We could expand the documentation to explain the behavior at the boundary of the support.

jaksle commented on June 19, 2024

I have trouble understanding this example and the problem in general. The CDF of $\mathcal U(0,1)$ is the straight line $x\mapsto x$ on $[0,1]$, so logcdf is $x\mapsto \ln(x)$ there. The point $x=-5$ lies outside the support of $\mathcal U(0,1)$, so logcdf(-5) == -Inf is the most reasonable choice. Note it has nothing to do with invlogcdf, which is just $y\mapsto \exp(y)$ both analytically and numerically.

aplavin commented on June 19, 2024

> Note it has nothing to do with invlogcdf

It's the other way around: cdf and logcdf are totally correct, no issues there. The problem is with invlogcdf, as it claims to be an inverse, but actually isn't.

julia> using Distributions

julia> d = Uniform(0, 1)

julia> f(x) = logcdf(d, x)

julia> invf(x) = invlogcdf(d, x)

# isn't an inverse in either direction:
julia> f(invf(5))
0.0

julia> invf(f(5))
1.0

> Please try to outline the behavior that you are proposing

I'm not proposing any specific solution, just pointing out an issue with these functions, which are called "inverse" but actually aren't. I'll leave the decision on what the fix should be to the Distributions.jl devs.

jaksle commented on June 19, 2024

The function invlogcdf(Uniform(0,1), y) is just $y\mapsto \exp(y)$. You can check which methods are called when it is used: indeed, exp(y) is called. It is defined for any $y\in \mathbb R$.
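The round trip from the earlier REPL example can be reproduced without the package. This sketch assumes, as stated in this thread, that for Uniform(0, 1) logcdf is $\ln(x)$ on the support (and -Inf or 0 outside it) and that invlogcdf reduces to quantile(d, exp(y)), which is exp(y) for this distribution. `unif_logcdf` and `unif_invlogcdf` are hypothetical names, not Distributions.jl functions.

```julia
# Hypothetical stand-ins mirroring Uniform(0, 1) as described in this thread.
unif_logcdf(x::Real) = x <= 0 ? -Inf : x >= 1 ? 0.0 : log(x)
unif_invlogcdf(y::Real) = exp(y)   # quantile(Uniform(0, 1), p) == p, with p = exp(y)

# exp(5) ≈ 148.4 lies above the support, where logcdf is flat at 0.0.
@show unif_logcdf(unif_invlogcdf(5))
# logcdf(5) == 0.0 and exp(0.0) == 1.0: the right endpoint of the support.
@show unif_invlogcdf(unif_logcdf(5))
# Inside the support the two functions do invert each other (up to rounding).
@show unif_invlogcdf(unif_logcdf(0.25))
```

Both "failures" come from inputs outside the support or outside the range of logcdf, not from the formula exp(y) itself.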

The value $x = 5$ is also outside the support of $\mathcal U(0,1)$. The CDF is not a bijection there, so its inverse (the quantile function) is mathematically defined as a "generalized inverse". This is standard, and the Julia behaviour is consistent with this mathematical definition.

EDIT. To rephrase it differently: the quantile function acts on probabilities. Asking what corresponds to a probability of -500% or 500% makes no sense, although such inputs can of course arise numerically. In that case, returning the lowest or highest value the distribution can generate, i.e. $x = 0$ or $x = 1$, is arguably the best choice, and this is what happens.
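The policy described in the edit, mapping out-of-range "probabilities" to the lowest or highest value the distribution can generate, amounts to clamping p to [0, 1] before applying the quantile. A minimal sketch for Uniform(a, b) follows; `clamped_quantile_uniform` is a hypothetical helper illustrating that policy, not a Distributions.jl function (a later comment in this thread reports that quantile for Uniform does not clamp).

```julia
# Hypothetical helper: quantile of Uniform(a, b) that clamps p to [0, 1],
# so out-of-range inputs land on the support endpoints a and b.
function clamped_quantile_uniform(p::Real; a::Real = 0.0, b::Real = 1.0)
    q = clamp(p, 0, 1)       # -5.0 -> 0, 5.0 -> 1, values in [0, 1] unchanged
    return a + q * (b - a)   # linear map from [0, 1] onto [a, b]
end

@show clamped_quantile_uniform(-5.0)                     # left endpoint a
@show clamped_quantile_uniform(5.0)                      # right endpoint b
@show clamped_quantile_uniform(0.25; a = 2.0, b = 6.0)   # interior point
```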

jaksle commented on June 19, 2024

@sethaxen I checked this before: Distributions.jl is not consistent in its behaviour. E.g. quantile for TriangularDist throws an error for $x > 1$, whereas quantile for Uniform returns $x$ for $x>1$. Uniform is not internally consistent here either, as logcdf returns 0 for $x>1$, not $\ln(x)$. However, as I mentioned above, in this case it leads to reasonable behaviour when the inverse is applied to the original function.

As the Wikipedia article you mention states, the quantile function is defined for $0\le x \le 1$ and is a generalized inverse of the CDF there. There is no standard agreement on what to do on a larger domain.

sethaxen commented on June 19, 2024

I don't see any good reason to extend the domain of quantile beyond [0, 1]. The whole point of the function is to provide, in some sense, an inverse to cdf, and cdf cannot produce outputs outside of that range.

> quantile for Uniform returns x for x>1.

One could define quantile in such a way that this is the right behavior, but I think that would be a very strange way to define it. The outputs of cdf are probabilities, and accepting a negative probability as input to quantile is very strange.
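If quantile were restricted to [0, 1] as suggested, an out-of-range input would be rejected rather than extrapolated. A minimal sketch of that choice for Uniform(a, b), using Julia's built-in `DomainError`; `strict_quantile_uniform` is a hypothetical name, and according to this thread it is not how Distributions.jl currently behaves for Uniform.

```julia
# Hypothetical helper: quantile of Uniform(a, b) that only accepts p in [0, 1].
function strict_quantile_uniform(p::Real; a::Real = 0.0, b::Real = 1.0)
    # NaN also fails this check, since every comparison with NaN is false.
    0 <= p <= 1 || throw(DomainError(p, "probability must lie in [0, 1]"))
    return a + p * (b - a)
end

@show strict_quantile_uniform(0.5)

# A "probability" of 1.5 is rejected instead of being mapped to 1.5.
threw = try
    strict_quantile_uniform(1.5)
    false
catch err
    err isa DomainError
end
@show threw
```

The trade-off is the one raised earlier in the thread: an exception makes misuse visible, at the cost of callers having to guard against rounding that pushes p slightly outside [0, 1].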

jaksle commented on June 19, 2024

This is speculation, but the reason quantile for Uniform is $x\mapsto x$ everywhere was probably just code simplicity and speed. The documentation of quantile mentions it is defined on $0 \le x \le 1$, so I guess the philosophy of Distributions.jl is to leave the behaviour outside that range unspecified. The same goes for invlogcdf, which is internally just quantile(d, exp(y)). I doubt this will be changed, though documenting that invlogcdf has domain $y\le 0$ seems like a good idea.
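Since invlogcdf is described here as quantile(d, exp(y)), documenting its domain as $y \le 0$ follows directly: exp maps $(-\infty, 0]$ onto $[0, 1]$, the domain of quantile. A minimal sketch for Uniform(a, b) that enforces this; `strict_invlogcdf_uniform` is a hypothetical helper, not a Distributions.jl function.

```julia
# Hypothetical helper: invlogcdf for Uniform(a, b), written as quantile(exp(y))
# with the domain y ≤ 0 enforced explicitly.
function strict_invlogcdf_uniform(y::Real; a::Real = 0.0, b::Real = 1.0)
    y <= 0 || throw(DomainError(y, "log-probability must be ≤ 0"))
    p = exp(y)               # exp maps (-Inf, 0] onto [0, 1]
    return a + p * (b - a)   # quantile of Uniform(a, b) at p
end

@show strict_invlogcdf_uniform(0.0)    # exp(0) == 1, right endpoint
@show strict_invlogcdf_uniform(-Inf)   # exp(-Inf) == 0, left endpoint
```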
