Giter Club home page Giter Club logo

ggdensity's People

Contributors

dkahle avatar jamesotto852 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

ggdensity's Issues

Change `draw_key_polygon()` to `draw_key_rect()` in `geom_hdr_rug()`

Currently, geom_hdr_rug() uses draw_key_polygon() to create glyphs for its guides. It should probably use draw_key_rect() instead, for several reasons:

  • The resulting guides have no spacing between the glyphs
  • draw_key_rect() needs fewer aesthetics to draw the glyphs, only looking at fill

Because draw_key_rect() only looks at the fill aesthetic, that helps us avoid exposing/requiring aesthetics like color, linetype, and linewidth. While these aesthetics can be useful when using grid::rectGrob(), I can't imagine a scenario in which they would be specified with geom_hdr_rug(). Instead, we should set these in GeomHdrRug, changing

grid::gpar(
    col = alpha(data_y$colour, data_y$alpha),
    fill = alpha(data_y$fill, data_y$alpha),
    lty = data_y$linetype,
    lwd = data_y$size * .pt
)

to

grid::gpar(
    col = alpha(data_x$fill, data_x$alpha),
    fill = alpha(data_x$fill, data_x$alpha),
    lwd = 0
)

Scale fail?

Any ideas what's going on here? Seems like the scale of the numbers matters.

library("tidyverse"); theme_set(theme_minimal())
  library("ggdensity")
  
  N <- 1e3
  df <- tibble(x = rnorm(N, 0, 200), y = rnorm(N, 0, 200)) 
    
  ggplot(df, aes(x, y)) + geom_point()

  ggplot(df, aes(x, y)) + geom_hdr()

  
  df <- tibble(x = rnorm(N, 0, 1), y = rnorm(N, 0, 1)) 
  
  ggplot(df, aes(x, y)) + geom_point()

  ggplot(df, aes(x, y)) + geom_hdr()

Created on 2021-09-25 by the reprex package (v2.0.1)

Move `install.pacakges("ggdensity")` to a code block

This is a small detail, but in the readme the instructions to use install.packages("ggdensity") is inline so it's hard to find it when skimming the file. Since this would be the main way of installing it, I think it should be on its own code block, which is the usual convention in R package

`geom_hdr_fun()` not producing correct HDRs with axes transformations

geom_hdr_fun() is not drawing HDRs correctly when there are transformations of the coordinate axes. I believe this is because of the Riemann approximation in approx_prob(). If this is the case, we should be able to fix this by accounting for the Jacobian of the transformation of the axes -- possibly by computing the area of the transformed rectangles partitioning the plotting range for the Riemann sum?

library("ggdensity")
#> Loading required package: ggplot2

set.seed(1)
df <- data.frame(
  x = rexp(1000, 1),
  y = rnorm(1000)
)

ggplot(df, aes(x, y)) +
  geom_hdr_fun(fun = \(x, y) dexp(x, 1) * dnorm(y)) +
  geom_point()

ggplot(df, aes(x, y)) +
  geom_hdr_fun(fun = \(x, y) dexp(x, 1) * dnorm(y)) +
  geom_point() +
  scale_x_continuous(trans = "log10")

Created on 2022-01-18 by the reprex package (v2.0.1)

`uniroot` failing when `method = "histogram"`

For particular datasets, method = "histogram" results in an error (most likely from uniroot). This does not happen for any other method choices.

library("ggdensity"); theme_set(theme_bw())
#> Loading required package: ggplot2
library("patchwork")

set.seed(1) 
df <- data.frame(
  x = rexp(1000, 1),
  y = rexp(1000, 1)
)


p1 <- ggplot(df, aes(x^2, y^2)) + 
  geom_hdr(method = "histogram", smooth = TRUE)

p2 <- ggplot(df, aes(x^2, y^2)) + 
  geom_hdr(method = "kde")

p3 <- ggplot(df, aes(x^2, y^2)) + 
  geom_hdr(method = "mvnorm")

p4 <- ggplot(df, aes(x^2, y^2)) + 
  geom_hdr(method = "freqpoly")

p1 
#> Warning: Computation failed in `stat_hdr()`:
#> f() values at end points not of opposite sign

p2 + p3 + p4 & coord_fixed() & theme(legend.position = "none")

Created on 2021-11-18 by the reprex package (v2.0.1)

Wrong legend when using probs in ascending order

Hi @jamesotto852, thanks for this wonderful package!

I just realized that the legend is wrong (labels are in reversed order) when passing custom probs. Simple example:

library(ggplot2)

## correct order with default probs
ggplot(mpg, aes(x = hwy, y = displ)) +
  ggdensity::geom_hdr_lines()

## wrong order with modified probs
ggplot(mpg, aes(x = hwy, y = displ)) +
  ggdensity::geom_hdr_lines(probs = c(.25, .5, .75, .95))

Created on 2023-11-10 with reprex v2.0.2

I am using package version 0.1.0 as it is available on CRAN currently.

Order of scale for plots using both geom_hdr() and geom_hdr_points()

geom_hdr_points() plots points outside of the largest specified HDR with a probs value of 100% (as they are contained in the 100% HDR). This adds another value to the fill/color scale. When geom_hdr_points() is used in the same plot as geom_hdr() or geom_hdr_lines(), the order of the layers impacts the order of the scale. In order to get the correct order, geom_hdr_points() needs to be first.

Ideally, the order of the scale would be independent of the order of the layers. For example, for the default value of probs, we would have 100% > 99% > 95% > 80% > 50%.

library("ggdensity")
#> Loading required package: ggplot2

set.seed(1)
df <- data.frame(
  x = rnorm(5000),
  y = rnorm(5000)
) 

ggplot(df, aes(x, y)) + 
  geom_hdr_lines(aes(color = after_stat(probs)), alpha = 1, xlim = c(-4, 4), ylim = c(-4, 4)) +
  geom_hdr_points(xlim = c(-4, 4), ylim = c(-4, 4))

ggplot(df, aes(x, y)) + 
  geom_hdr_points(xlim = c(-4, 4), ylim = c(-4, 4)) + 
  geom_hdr_lines(aes(color = after_stat(probs)), alpha = 1, xlim = c(-4, 4), ylim = c(-4, 4))

Created on 2022-03-29 by the reprex package (v2.0.1)

Preserve state of random number generator in .onAttach()

Currently .onAttach() will change the state of a users random number generator:

ggdensity/R/attach.R

Lines 1 to 4 in 56de580

.onAttach <- function(...) {
if(!interactive() || stats::runif(1) > 0.1) return()
packageStartupMessage(' Please cite ggdensity! See citation("ggdensity") for details.')
}

Not a bug per se but would you consider a change to preserve it?

For reference I believe {ggplot2} was updated to use withr::with_preserve_seed() for this reason.

Motivate by this thread https://fosstodon.org/@[email protected]/112079393660379008

`boundary_x` and `boundary_y`

It'd be neat to have boundary_x and boundary_y arguments that you could pass into geom_hdr() and geom_hdr_lines() when method = "histogram", see my examples in the documentation to remember how it works for geom_histogram(). It'd be nice to have those work for all the methods, in fact. Have you come across any theory that addresses correcting density estimators for restricted support? The naive way would simply be to cut it to 0 and multiplicatively redistribute to the rest of the density (a la the truncated normal distribution), but I'd imagine others have thought about it more.

Update package to the new `linewidth` aesthetic

The latest ggplot2 update added the linewidth aesthetic to control well... line width instead of size. While the old size parameter still works (for now), it would be nice for the package not to throw warnings to the user. Here's a guide on how to update: https://www.tidyverse.org/blog/2022/08/ggplot2-3-4-0-size-to-linewidth/

Note:

Hi! I was asked to review your paper for the R Journal.

I think this is a great package with clear use cases and motivations. I cannot wait to find a reason to use it and blow all my colleagues minds ๐Ÿ˜„ .

As part of the review, I'm looking at the code of the package and i thought that in the spirit of open source, it would be much more productive if I voiced some of my comments as issues. This would allow us to have a conversation to clear up any misunderstanding instead of you having to rely solely on my potentially confusing review comments. This is in the spirit of rOpenSci package peer review system, which I think it's great. The editor told me that it was fine that I do this, although it would compromise my anonymity. Well, consider it compromised, I guess ๐Ÿ˜.

Issues with next version of ggplot2

Hi

We preparing the next release of ggplot2 and our reverse dependency checks show that your package is failing with the new version. Looking into it we see that your package somehow includes references to ggplot2 code from when it was build, which causes the check to emit the "Import not declared from..." error. Please see https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/ for some more info about why this can cause issues.

You can install the release candidate of ggplot2 using devtools::install_github('tidyverse/[email protected]') to test this out.

We plan to submit ggplot2 by the end of October and hope you can have a fix ready before then

Kind regards
Thomas

Fix extra guide generated by `geom_hdr_rug()` when `alpha` is overriden

When setting alpha = 1 or alpha = NA in geom_hdr_rug() (typically when mapping the computed value probs to fill), an extra guide is drawn. This is illustrated in an example in the README:

ggplot(cars, aes(speed, dist)) +
  geom_hdr() +
  geom_point(color = "red") +
  geom_hdr_rug(aes(fill = after_stat(probs)), length = unit(.2, "cm"), alpha = 1) + 
  scale_fill_viridis_d(option = "magma", begin = .8, end = 0)

Right now, the "fix" is to specify guides(alpha = "none"):

library("ggplot2"); theme_set(theme_bw())
library("ggdensity")

ggplot(cars, aes(speed, dist)) +
  geom_hdr() +
  geom_point(color = "red") +
  geom_hdr_rug(aes(fill = after_stat(probs)), length = unit(.2, "cm"), alpha = 1) + 
  scale_fill_viridis_d(option = "magma", begin = .8, end = 0) +
  guides(alpha = "none")

Created on 2022-06-06 by the reprex package (v2.0.1)


My understanding of the issue is that geom_hdr_rug() is calling draw_key_rect() 2 times for each group, for both fill and alpha. I don't know why this is happening, note that it doesn't happen with geom_hdr() when fill is used instead of alpha.

Remove duplicated code.

There is quite a bit of code duplication. For instance, the code to define the histogram mesh is copied 6 times (in hdr.R, hdr_lines.R and hdr_point.R, once for method == "histogram" and once for method == "freqpoly" in each file). The code to switch() between methods is also repeated 5 times.

These duplications will make it very hard to maintain the package and to add new density estimators. So I recommend refactoring the code. This includes 2 related recommendations:

  1. Move the code that computes HDR into a single stat (or two: one for 1D and one for 2D, if needed). Then, each geom can represent that stat with different geometries. (This is more aligned with the difference between the statistical transformations and the geometries used to represent them).

  2. Implement the stat to be agnostic towards which method is used to estimate the density and then implement the different density estimators separately. This might make it much easier for the developers to maintain and implement new methods as well as for users to use their own. (It's also related to #28).

    To illustrate how this might work, take the example of fitting an bivariate exponential distribution that's in the article. This looks like a natural extension of method = "mvnorm", which fits a bivariate normal. This suggests that it could be desirable for the user to use their own methods to estimate the density distribution. So the bivariate exponential could be simplified to something along these lines:

    # define my own density estimation function
    biexp <- function(data) {
      # estimate parameters governing joint pdf
      lambda_hat <- apply(data, 2, mean)
    
      # output a density function
      function(x, y) dexp(x, lambda[1]) * dexp(y, lambda[2])
    }
    
    ggplot(df, aes(x, y)) +
      geom_hdr(method = biexp) +
      geom_point(fill = "lightgreen", shape = 21) +
      coord_fixed() +
      scale_x_continuous(limits = c(0, 7)) +
      scale_y_continuous(limits = c(0, 7))

    Most of the work for this change seems to be already done in the fun_iso() function. This would make geom_hdr_fun() redundant (less maintenance overhead and less code duplication) and potentially make it much easier to implement new methods.

Plot theoretical HDRs corresponding to specified pdf

Implement geom_hdr_f and geom_hdr_lines_f which accept function f specifying arbitrary pdf. Calculate exact HDRs corresponding to specified pdf, plotting as contours. Similar to geom_function.

library("tidyverse")
library("ggdensity")

pdf <- \(x, y) dexp(x) * dexp(y)

df <- expand_grid(
  x = seq(0, 1.5, length.out = 101),
  y = seq(0, 1.5, length.out = 101)
) |>
  mutate(density = pdf(x, y))

# Possible implementation:
# ggplot() +
#   geom_hdr_f(f = pdf, xlim = c(0, 1.5), ylim = c(0, 1.5))

ggplot(df, aes(x, y, z = density)) +
  geom_contour_filled() +
  coord_equal(xlim = c(0, 1.5), ylim = c(0, 1.5), expand = FALSE)

Created on 2021-11-18 by the reprex package (v2.0.1)

method = "histogram" is slow?

We've discussed this in person, but I'm still surprised by how slow this thing is going. Here's a simple test case:

library("tidyverse"); theme_set(theme_minimal())
df <- tibble("x" = rnorm(1e5), "y" = rnorm(1e5))
ggplot(df, aes(x^2, y^2)) + geom_hdr(method = "histogram")

Plotting bug on geom_hdr_lines_fun?

I tried plotting the following mixture density using geom_hdr_lines_fun:

mixture_density <- function(x, y) {
  (1/2*dnorm(x, -2) + 1/2*dnorm(x, 2)) * dnorm(y, sd=2)
}

ggplot() + ggdensity::geom_hdr_lines_fun(fun=mixture_density, xlim=c(-8, 8), ylim=c(-8, 8)) + theme_bw()

I got this weird line cutting across the two modes of the contours.

Screen Shot 2022-08-23 at 7 10 02 PM

`geom_hdr_lines()` broken?

library("ggdensity")
#> Loading required package: ggplot2

df <- data.frame(x = rnorm(1000), y = rnorm(1000))

ggplot(df, aes(x, y)) + geom_hdr_lines()
#> Warning: Computation failed in `stat_hdr_lines()`:
#> object 'res' not found

Created on 2021-11-16 by the reprex package (v2.0.1)

Remove `library(tidyverse)` from README

Currently README.md calls library(tidyverse). This means that one needs to have the whole tidyverse installed to render the readme. However, it only uses functions from ggplot2. It would be cleaner to just use library(ggplot2).

Passing probs in not descending order returns the wrong legend

I may have a bug in geom_hdr(). I accidentally passed a vector of probabilities in a random order and I got a reasonable plot but with the wrong legend. It seems that the legend's keys get ordered using the probs argument but there is no warning or mention in the documentation about passing the probabilities in a specific order to match the plot with the legend.

Here is a reprex using examples similar to the documentation. I hope it helps.

library(ggdensity)
#> Loading required package: ggplot2

df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Defaout options for probs
ggplot(df, aes(x, y)) + 
  geom_hdr(probs = c(0.99, 0.95, 0.8, 0.5))

# Same probs but in reverse order
ggplot(df, aes(x, y)) + 
  geom_hdr(probs = rev(c(0.99, 0.95, 0.8, 0.5)))

# Same probs with no order
ggplot(df, aes(x, y)) + 
  geom_hdr(probs = c(0.95, 0.99, 0.5, 0.8))

Created on 2023-11-27 with reprex v2.0.2

Align function names to ggplot2's conventions

It might be too late for such a drastic change, but how about changing the names of the functions to follow more closely ggplot2's naming conventions?

For example, renaming geom_hdr_lines() to geom_density_2d_hdr() will make the function more discoverable through autocompletion (as the user can start typing geom_density and then get a list of the available density-related geometries).

Same with geom_hdr_rug() to geom_rug_hdr() and geom_hdr_point() to geom_point_hdr(). (This will also solve the namespace conflict with gghdr::geom_hdr_rug().)

Add tests

Currently there are no test in the package and many examples are not run. covr::package_coverage() reports 0.15% coverage. I think that adding tests to increase coverage to at least 60% is necessary before publication (ideally it should be higher, but testing ggplot2 extensions can be hard since it involves a lot of visual testing; even ggplot2 barely reaches 80%).

Check the vdiffr package for a guide on how to use visual regression testing.

Provide additional computed variable `level`

Now that the computed variable for the HDR probability is called probs (4d80857), it might be good to provide the value of the density cutoffs corresponding to the each HDR via level. This would be consistent with ggplot2::geom_density2d(). This mostly seems relevant for pedagogical purposes -- being able to identify HDRs with the height of the density estimator may be useful in explaining how they are calculated and how the graphics are created.

`geom_hdr()` fails when `probs` results in equal HDRs

When the any of the HDRs corresponding to probs are the same, geom_hdr() doesn't work. This is especially problematic for method = "histogram".

library("ggdensity")
#> Loading required package: ggplot2

df <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000)
)

ggplot(df, aes(x, y)) +
  geom_hdr(method = "histogram", probs = c(.99, .991))
#> Warning: Computation failed in `stat_hdr()`:
#> factor level [2] is duplicated

Created on 2022-01-13 by the reprex package (v2.0.1)

`scale_x_continuous` incompatibility

When setting limits via scale_x_continuous, the following error is thrown:

Screen Shot 2021-11-18 at 10 55 18 AM


Interestingly, when rendering a reprex the error is not encountered:

library("ggdensity")
#> Loading required package: ggplot2

set.seed(1) 
df <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000)
)

ggplot(df, aes(x^2, y^2)) + 
  geom_hdr(method = "mvnorm") +
  scale_x_continuous(limits = c(0, 9))
#> Warning: Removed 3 rows containing non-finite values (stat_hdr).

Created on 2021-11-18 by the reprex package (v2.0.1)

Add link to Github in pkgdown site

Hi,

to add the github link to your pkgdown site, you could change your description

In DESCRIPTION

URL: https://jamesotto852.github.io/ggdensity/

to

URL: https://jamesotto852.github.io/ggdensity, 
       https://github.com/jamesotto852/ggdensity
BugReports: https://github.com/jamesotto852/ggdensity/issues/

Export the algorithm in its own function

There's quite a lot of novel (to me, at least) implementation in this package. It would be nice for it to be accessible outside ggplot2. Perhaps consider exporting the logic behind StatHdr$compute_group() into its own function so that non-ggplot2 user can also benefit from this algorithm.

`geom_hdr_lines` with `method = "histogram"` doesn't draw lines on boundary

When using method = "histogram", xlim and ylim are not expanded. This is to respect hard cutoffs in the density's support. However, as geom_hdr_lines only draws in the region defined by xlim and ylim this results in plots where the contour lines are not drawn correctly on hard boundaries.

library("ggdensity"); theme_set(theme_bw())
#> Loading required package: ggplot2

set.seed(1) 
df <- data.frame(
  x = rexp(1000, 1),
  y = rexp(1000, 1)
)

ggplot(df, aes(x, y)) + 
  geom_hdr_lines(method = "histogram")

Created on 2021-11-18 by the reprex package (v2.0.1)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.