Comments (16)

IndrajeetPatil commented on May 18, 2024

@ibecav You can raise the ideas you had for 0.0.8 release here and we can discuss them in this thread.

from ggstatsplot.

ibecav commented on May 18, 2024

Okay, I have 3 things initially:

  1. The movies_long dataset needs a revamp. Right now the genre categories overlap, so most analyses violate the assumption of independence because a single movie may appear under two different genres. I've already made the necessary changes to data-raw in my development environment; I just need to chase down all the downstream effects. movies_wide is fine. I'll send a PR today.

  2. There's a small bug somewhere in ggbetweenstats (see reprex below). The output tibble seems to be breaking the MPAA rating "PG-13" into two different groups, "PG" and "13".

  3. If you look at the output plot from the reprex you'll see that outliers are actually plotted twice: once as part of the normal jitter of all the data, and once as an outlier. So if you look closely at the top of the PG-13 category you'll see 7 dots, 4 orangeish and 3 black. Those 7 dots actually all portray the same 4 movies: the 3 Lord of the Rings films plus "Cera un volta...". Is it possible to suppress the regular dot when a point is identified as an outlier? It can be confusing.

library(ggstatsplot)
str(movies_long)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    2433 obs. of  8 variables:
#>  $ title : Factor w/ 1599 levels "'Til There Was You",..: 1270 871 870 872 1139 1231 1360 1363 246 290 ...
#>  $ year  : int  1994 2003 2001 2002 1994 1993 1977 1980 1968 2002 ...
#>  $ length: int  142 251 208 223 168 195 125 129 158 135 ...
#>  $ budget: num  25 94 93 94 8 25 11 18 5 3.3 ...
#>  $ rating: num  9.1 9 8.8 8.8 8.8 8.8 8.8 8.8 8.7 8.7 ...
#>  $ votes : int  149494 103631 157608 114797 132745 97667 134640 103706 17241 25964 ...
#>  $ mpaa  : Factor w/ 3 levels "PG","PG-13","R": 3 2 2 2 3 3 1 1 2 3 ...
#>  $ genre : Factor w/ 6 levels "Action","Animation",..: 5 1 1 1 5 5 1 1 5 5 ...
##### Testing
ggstatsplot::ggbetweenstats(movies_long,
  mpaa,
  rating,
  type = "parametric",
  effsize.type = "biased",
  partial = FALSE,
  var.equal = TRUE,
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  p.adjust.method = "holm",
  outlier.tagging = TRUE,
  outlier.label = title
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
#> 
#> Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [1, 3].
#> # tibble [5 × 8]
#>   group1 group2 mean.difference conf.low conf.high  p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>    <dbl> <chr>       
#> 1 PG     13              0.0513  -0.135      0.237 NA       <NA>        
#> 2 R      PG              0.256    0.0797     0.431  1.33e-3 **          
#> 3 R      PG              0.204    0.0781     0.331  1.33e-3 **          
#> 4 PG-13  PG             NA       NA         NA      5.18e-1 ns          
#> 5 R      PG-13          NA       NA         NA      4.52e-4 ***         
#> # ... with 1 more variable: p.value.label <chr>
#> Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
#> 
#> Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = < 0.001
#> 

[Screenshot: ggbetweenstats plot from the reprex, 2018-12-10]

Created on 2018-12-10 by the reprex package (v0.2.1)
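
A possible cause of the split, offered as a guess rather than a confirmed diagnosis: the warning "Expected 2 pieces. Additional pieces discarded" is the message tidyr::separate() emits, and pairwise comparison labels are commonly built by joining two level names with a hyphen. If such labels are then split on "-", a level that itself contains a hyphen breaks apart. The base R sketch below reproduces the group1/group2 pattern seen in the first three rows of the tibble. The label strings are assumed for illustration, and pairs_from_levels is not part of ggstatsplot.

```r
# Comparison labels as they would look if level names were joined with "-"
# (assumed for illustration; the real labels come from the package internals).
comparisons <- c("PG-13-PG", "R-PG", "R-PG-13")

# Naive approach: split on every hyphen and keep only the first two pieces.
pieces <- strsplit(comparisons, "-", fixed = TRUE)
naive <- t(vapply(pieces, function(p) p[1:2], character(2)))
colnames(naive) <- c("group1", "group2")
print(naive)

# Safer approach: build the pairs from the factor levels themselves,
# so no label ever has to be split.
mpaa_levels <- c("PG", "PG-13", "R")
pairs_from_levels <- t(utils::combn(mpaa_levels, 2))[, 2:1]
colnames(pairs_from_levels) <- c("group1", "group2")
print(pairs_from_levels)
```

In the naive result, rows 1 and 3 lose a piece, which matches the rows named in the warning.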

IndrajeetPatil commented on May 18, 2024
  1. Just commented on that PR.

  2. Yeah, that shouldn't be happening. I will look into what's going on.

  3. Yeah, unfortunately, I've made peace with this issue. The problem is that the ggrepel labels need to be jittered in the same way as the underlying raw data points and I couldn't figure out a way to do that. So instead I am using non-jittered data points, which are flagged by geom_boxplot(). I'll be happy to accept a PR that circumvents this issue.
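
One way the jitter mismatch could be sidestepped, sketched under assumptions: compute the jittered x positions once, store them as a column, and let both the point layer and the label layer read that same column. The example uses the built-in iris data rather than movies_long, and is_outlier_iqr() is a hypothetical helper based on the usual 1.5 * IQR rule, not ggstatsplot's own check_outlier().

```r
# Hypothetical helper: flag values more than coef * IQR beyond the quartiles.
is_outlier_iqr <- function(x, coef = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- coef * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

set.seed(123)  # makes the jitter reproducible
df <- iris

# Jitter once, around the integer position of each factor level.
df$x_jit <- as.numeric(df$Species) + stats::runif(nrow(df), -0.2, 0.2)

# Flag outliers within each group (ave() returns 0/1, so convert back).
df$is_outlier <- as.logical(
  ave(df$Sepal.Width, df$Species, FUN = is_outlier_iqr)
)

# Rows that would receive a label, with the coordinates their points use.
head(df[df$is_outlier, c("Species", "Sepal.Width", "x_jit")])
```

With ggplot2, the points could then be drawn with x_jit mapped to x, and the labels added with ggrepel::geom_label_repel() on the subset where is_outlier is TRUE, so each label starts from the position of its own point.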

ibecav commented on May 18, 2024

#1. I'm done with the PR.

#3. Outlier labelling. I think I may have done a poor job explaining my issue. I'm content with the ggrepel labels; I understand there is only so much you can do. My concern is that every outlier is being plotted twice: once by geom_point (this happens first), then again by geom_boxplot. That means there are extra points on the plot.

I'm not totally sure, but I think we can just use some of your existing condition code in front of the geom_point to set a filter to only plot points which are NOT outliers, and then let the boxplot handle plotting the outliers. Here's a very simple reprex to show you.

library(ggstatsplot)
library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
theme_set(theme_bw())
xxx <- movies_long %>% 
  group_by(mpaa) %>% 
  mutate(
    isanoutlier = ifelse(
      ggstatsplot:::check_outlier(rating,coef = 1.5),
      yes = TRUE, 
      no = FALSE)
    )
ggplot(data = xxx, aes(x=mpaa, y=rating, label = title)) + 
  geom_point(data = filter(xxx, !isanoutlier), aes(color=mpaa), position = position_jitterdodge( jitter.width = 1)) + 
  stat_boxplot(alpha=.2, outlier.color = "black", outlier.alpha = 1.0) + 
  ggrepel::geom_label_repel(data = filter(xxx, isanoutlier)) 

Created on 2018-12-11 by the reprex package (v0.2.1)

Notice that outliers no longer have both a color point and a black point.

[Screenshot: plot produced by the reprex, 2018-12-11]

ibecav commented on May 18, 2024

Okay, I think I have a path to a proper PR sorted out for the outliers. I may break it into two separate PRs just to keep things manageable. More tomorrow.

IndrajeetPatil commented on May 18, 2024

@ibecav Since the upstream dependency has broken the CRAN version of ggstatsplot, I will have to submit a newer version soon. I will be submitting it on the 5th of January.

If you get a chance, can you take a look at refactoring the purrr hack in grouped_ggpiestats, similar to what you did for grouped_ggscatterstats?

ibecav commented on May 18, 2024

Work complete. Managed to not break any tests or change any doco. PR with reprex in a minute

IndrajeetPatil commented on May 18, 2024

Awesome! Thanks, Chuck. Everything looked good and I've merged the PR.

IndrajeetPatil commented on May 18, 2024

@ibecav Do you think you will have time to refactor the grouped_ggcorrmat function?
This is the only grouped_ function remaining that still relies on the purrr hack. I can push back submission date to 8th if you think you will have time to do this before then.

ibecav commented on May 18, 2024

I'll take a look today. First issue is that I have to visit the base command ggcorrmat because this doesn't work...

# setting output = "ci" will return the confidence intervals for unique correlation pairs

ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = "sleep_total:bodywt",
  p.adjust.method = "BH",
  output = "ci"
)

IndrajeetPatil commented on May 18, 2024

Shouldn't that be "sleep_total":"bodywt", and not "sleep_total:bodywt", as the latter will basically mean that there is a single variable named "sleep_total:bodywt" in the dataframe?

At any rate, this function seems to work across a wide array of entry methods for this argument:

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = "sleep_total":"bodywt",
  p.adjust.method = "BH",
  output = "ci"
)

#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = c("sleep_total":"bodywt"),
  p.adjust.method = "BH",
  output = "ci"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = sleep_total:bodywt,
  p.adjust.method = "BH",
  output = "ci"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = c(sleep_total:bodywt),
  p.adjust.method = "BH",
  output = "ci"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

Created on 2019-01-04 by the reprex package (v0.2.1)

ibecav commented on May 18, 2024

Right, not complaining, just saying I need to start at the base, see what works, and then work my way up. It all depends on how kind you want to be to the user's input.

IndrajeetPatil commented on May 18, 2024

No, I am not taking that as a complaint. I am just trying to understand if you think the users might enter a range of variables in the form of "x:y", which is clearly used for a single variable both inside and outside of tidyeval domain, and not "x":"y", or x:y, or c("x":"y"), or c(x:y).
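
To make the distinction concrete, here is a minimal base R sketch of how a single string such as "x:y" could be expanded into a run of column names, if one wanted to be lenient with user input. expand_var_range() is a hypothetical helper, not something ggcorrmat provides, and mtcars stands in for msleep so the example needs no extra packages.

```r
# Hypothetical helper: turn "first:last" into the column names in between.
expand_var_range <- function(data, spec) {
  ends <- strsplit(spec, ":", fixed = TRUE)[[1]]
  idx <- match(ends, names(data))
  if (length(ends) != 2L || anyNA(idx)) {
    stop("`spec` must look like 'first:last' with both columns present in `data`.")
  }
  names(data)[idx[1]:idx[2]]
}

# A string range is resolved against the column order of the data frame.
expand_var_range(mtcars, "mpg:hp")

# Positional ranges need no parsing at all.
names(mtcars)[1:4]
```

Whether this kind of leniency is desirable is a design choice: it treats the colon inside a string as a range operator, so a column whose name really contains a colon could no longer be selected this way.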

ibecav commented on May 18, 2024

Lol, they're users, they could do just about anything! Including the old-fashioned 6:10 notation (which works). I'll sort it out. Anything is possible; I just need to check the possibilities.

ibecav commented on May 18, 2024

Prepping the PR now. Went a slightly different route on this one.
