Planned date: early Feb 2019 (Goal for release date: last week of De

goals for `0.0.8` release about ggstatsplot HOT 16 CLOSED

indrajeetpatil commented on May 18, 2024 1

goals for `0.0.8` release

from ggstatsplot.

Comments (16)

IndrajeetPatil commented on May 18, 2024

@ibecav You can raise the ideas you had for 0.0.8 release here and we can discuss them in this thread.

ibecav commented on May 18, 2024

Okay, I have 3 things initially:

1. The movies_long dataset needs a revamp. Right now the categories overlap, so when you run most analyses you violate the assumption of independence, because a single movie may appear in two different places for genre. I've already made the necessary changes in my development environment in data-raw; I just need to chase down all the downstream effects. movies_wide is fine. I'll send a PR today.
2. There's a small bug somewhere in ggbetweenstats (see reprex below). The output tibble seems to be breaking the mpaa rating "PG-13" into two different groups, "PG" and "13".
3. If you look at the output plot from the reprex you'll see that outliers are actually plotted twice: once as part of the normal jitter of all the data, and again as an outlier. So if you look closely at the top of the PG-13 category you'll see 7 dots, 4 orangeish and 3 black. Those 7 dots all portray the same 4 movies: the 3 Lord of the Rings + "Cera un volta...". Is it possible to suppress the regular dot when a point is identified as an outlier? It can be confusing.

library(ggstatsplot)
str(movies_long)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    2433 obs. of  8 variables:
#>  $ title : Factor w/ 1599 levels "'Til There Was You",..: 1270 871 870 872 1139 1231 1360 1363 246 290 ...
#>  $ year  : int  1994 2003 2001 2002 1994 1993 1977 1980 1968 2002 ...
#>  $ length: int  142 251 208 223 168 195 125 129 158 135 ...
#>  $ budget: num  25 94 93 94 8 25 11 18 5 3.3 ...
#>  $ rating: num  9.1 9 8.8 8.8 8.8 8.8 8.8 8.8 8.7 8.7 ...
#>  $ votes : int  149494 103631 157608 114797 132745 97667 134640 103706 17241 25964 ...
#>  $ mpaa  : Factor w/ 3 levels "PG","PG-13","R": 3 2 2 2 3 3 1 1 2 3 ...
#>  $ genre : Factor w/ 6 levels "Action","Animation",..: 5 1 1 1 5 5 1 1 5 5 ...
##### Testing
ggstatsplot::ggbetweenstats(movies_long,
  mpaa,
  rating,
  type = "parametric",
  effsize.type = "biased",
  partial = FALSE,
  var.equal = TRUE,
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  p.adjust.method = "holm",
  outlier.tagging = TRUE,
  outlier.label = title
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
#> 
#> Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [1, 3].
#> # tibble [5 × 8]
#>   group1 group2 mean.difference conf.low conf.high  p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>    <dbl> <chr>       
#> 1 PG     13              0.0513  -0.135      0.237 NA       <NA>        
#> 2 R      PG              0.256    0.0797     0.431  1.33e-3 **          
#> 3 R      PG              0.204    0.0781     0.331  1.33e-3 **          
#> 4 PG-13  PG             NA       NA         NA      5.18e-1 ns          
#> 5 R      PG-13          NA       NA         NA      4.52e-4 ***         
#> # ... with 1 more variable: p.value.label <chr>
#> Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
#> 
#> Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = < 0.001
#>

^{Created on 2018-12-10 by the reprex package (v0.2.1)}
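For illustration only: the "Expected 2 pieces" warning and the `PG` / `13` rows above are what you would get if comparison labels such as "PG-13-PG" were split on every non-alphanumeric character, which is the default separator pattern of `tidyr::separate()`. The thread does not confirm this as the cause. The sketch below uses base R only; the label format and all variable names are assumptions.

```r
# Hypothetical comparison labels of the form "<group1>-<group2>"
labels <- c("PG-13-PG", "R-PG", "R-PG-13")

# Split on every run of non-alphanumeric characters and keep only the
# first two pieces, so the hyphen inside "PG-13" is also a split point
pieces <- strsplit(labels, "[^[:alnum:]]+")
naive <- t(vapply(pieces, function(p) p[1:2], character(2)))
colnames(naive) <- c("group1", "group2")
naive

# One way to avoid parsing labels at all: build the pairs directly
# from the factor levels, so "PG-13" is never split
lv <- c("PG", "PG-13", "R")
safe <- t(utils::combn(lv, 2))
colnames(safe) <- c("group1", "group2")
safe
```

The naive split reproduces the first three rows of the tibble (`PG`/`13`, `R`/`PG`, `R`/`PG`), while the level-based pairs keep "PG-13" intact.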

IndrajeetPatil commented on May 18, 2024

1. Just commented on that PR.
2. Yeah, that shouldn't be happening. I will look into what's going on.
3. Yeah, unfortunately, I've made peace with this issue. The problem is that the ggrepel labels need to be jittered in the same way as the underlying raw data points and I couldn't figure out a way to do that. So instead I am using non-jittered data points, which are flagged by geom_boxplot(). I'll be happy to accept a PR that circumvents this issue.
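A minimal sketch of one possible way around the jitter problem, not what ggstatsplot does: compute the horizontal jitter once, store it in the data, and let both the points and the labels read the same column. The toy data, the `is_outlier()` helper, and the `x_jit` column are all hypothetical; the plotting part only runs if ggplot2 and ggrepel are installed.

```r
set.seed(123)

# Hypothetical toy data standing in for movies_long
df <- data.frame(
  mpaa   = factor(rep(c("PG", "PG-13", "R"), each = 30)),
  rating = c(rnorm(29, 6, 0.5), 9.5, rnorm(29, 6, 0.5), 9.2, rnorm(30, 6, 0.5)),
  title  = paste("movie", 1:90),
  stringsAsFactors = FALSE
)

# Hypothetical helper: boxplot-style rule (beyond coef * IQR from the quartiles)
is_outlier <- function(x, coef = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- coef * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

# Flag outliers within each mpaa group
df$outlier <- as.logical(ave(df$rating, df$mpaa, FUN = is_outlier))

# Jitter is drawn once and stored, so every layer sees the same x position
df$x_jit <- as.numeric(df$mpaa) + runif(nrow(df), -0.2, 0.2)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x_jit, y = rating)) +
    # boxplot drawn at the group centre, with its own outlier dots suppressed
    geom_boxplot(aes(x = as.numeric(mpaa), group = mpaa),
                 outlier.shape = NA, alpha = 0.2) +
    geom_point(aes(color = mpaa)) +
    # labels attach to the already-jittered points
    ggrepel::geom_label_repel(data = df[df$outlier, ], aes(label = title)) +
    scale_x_continuous(breaks = seq_along(levels(df$mpaa)),
                       labels = levels(df$mpaa), name = "mpaa")
  # print(p) to draw the plot
}
```

Because the jitter lives in the data rather than in a position adjustment, each outlier is drawn once and its label points at the dot that is actually on the plot.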

ibecav commented on May 18, 2024

#1. I'm done with the PR.

#3. Outlier labelling. I think I may have done a poor job explaining my issue. I'm content with the ggrepel labels; I understand there is only so much you can do. My concern is that every outlier is being plotted twice: once by geom_point (this happens first), then again by geom_boxplot. That means there are extra points on the plot.

I'm not totally sure, but I think we can just use some of your existing condition code in front of the geom_point to set a filter that only plots points which are NOT outliers, and then let the boxplot handle plotting the outliers. Here's a very simple reprex to show you.

library(ggstatsplot)
library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
theme_set(theme_bw())
xxx <- movies_long %>% 
  group_by(mpaa) %>% 
  mutate(
    isanoutlier = ifelse(
      ggstatsplot:::check_outlier(rating,coef = 1.5),
      yes = TRUE, 
      no = FALSE)
    )
ggplot(data = xxx, aes(x=mpaa, y=rating, label = title)) + 
  geom_point(data = filter(xxx, !isanoutlier), aes(color=mpaa), position = position_jitterdodge( jitter.width = 1)) + 
  stat_boxplot(alpha=.2, outlier.color = "black", outlier.alpha = 1.0) + 
  ggrepel::geom_label_repel(data = filter(xxx, isanoutlier))

^{Created on 2018-12-11 by the reprex package (v0.2.1)}

Notice that outliers no longer have both a color point and a black point.

ibecav commented on May 18, 2024

Okay, I think I have a path to a proper PR sorted out for the outliers. I may break it into two separate PRs just to keep things manageable. More tomorrow.

IndrajeetPatil commented on May 18, 2024

@ibecav Since the upstream dependency has broken the CRAN version of ggstatsplot, I will have to submit a newer version soon. I will be submitting it on the 5th of January.

If you get a chance, can you take a look at refactoring the purrr hack in grouped_ggpiestats, similar to what you did for grouped_ggscatterstats?

ibecav commented on May 18, 2024

Okay, it will likely be Monday.

On Dec 27, 2018, at 09:10, Indrajeet Patil wrote: @ibecav Since the upstream dependency has broken CRAN version of ggstatsplot, I will have to submit a newer version soon. I will be submitting it on the 5th of January. If you get a chance, can you take a look at refactoring the purrr hack in grouped_ggpiestats, similar to what you did for grouped_ggscatterstats?

ibecav commented on May 18, 2024

Work complete. Managed to not break any tests or change any doco. PR with reprex in a minute

IndrajeetPatil commented on May 18, 2024

Awesome! Thanks, Chuck. Everything looked good and I've merged the PR.

IndrajeetPatil commented on May 18, 2024

@ibecav Do you think you will have time to refactor the grouped_ggcorrmat function?
This is the only grouped_ function remaining that still relies on the purrr hack. I can push back submission date to 8th if you think you will have time to do this before then.

ibecav commented on May 18, 2024

I'll take a look today. First issue is that I have to visit the base command ggcorrmat because this doesn't work...

# setting output = "ci" will return the confidence intervals for unique correlation pairs

ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = "sleep_total:bodywt",
  p.adjust.method = "BH",
  output = "ci"
)

IndrajeetPatil commented on May 18, 2024

Shouldn't that be "sleep_total":"bodywt", and not "sleep_total:bodywt", as the latter will basically mean that there is a single variable named "sleep_total:bodywt" in the dataframe?

At any rate, this function seems to work across a wide array of entry methods for this argument:

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = "sleep_total":"bodywt",
  p.adjust.method = "BH",
  output = "ci"
)

#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = c("sleep_total":"bodywt"),
  p.adjust.method = "BH",
  output = "ci"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = sleep_total:bodywt,
  p.adjust.method = "BH",
  output = "ci"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = c(sleep_total:bodywt),
  p.adjust.method = "BH",
  output = "ci"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> 
#> # A tibble: 15 x 7
#>    pair       r   lower   upper         p lower.adj upper.adj
#>    <chr>  <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1 1      0.752  0.617   0.844  2.92e- 12   0.531     0.877  
#>  2 2     -0.474 -0.706  -0.150  6.17e-  3  -0.786     0.0302 
#>  3 3     -1.000 -1.000  -1.000  2.42e-226  -1.000    -1.000  
#>  4 4     -0.360 -0.569  -0.108  6.35e-  3  -0.653     0.0257 
#>  5 5     -0.312 -0.494  -0.103  4.09e-  3  -0.572     0.00539
#>  6 6     -0.338 -0.614   0.0120 5.84e-  2  -0.715     0.191  
#>  7 7     -0.752 -0.844  -0.617  2.91e- 12  -0.877    -0.531  
#>  8 8     -0.221 -0.476   0.0670 1.31e-  1  -0.580     0.209  
#>  9 9     -0.328 -0.535  -0.0826 9.95e-  3  -0.620     0.0452 
#> 10 10     0.474  0.150   0.706  6.17e-  3  -0.0302    0.786  
#> 11 11     0.852  0.709   0.927  2.42e-  9   0.603     0.950  
#> 12 12     0.418  0.0809  0.669  1.73e-  2  -0.0997    0.757  
#> 13 13     0.360  0.108   0.569  6.35e-  3  -0.0257    0.653  
#> 14 14     0.312  0.103   0.494  4.09e-  3  -0.00543   0.572  
#> 15 15     0.934  0.889   0.961  9.16e- 26   0.858     0.970

^{Created on 2019-01-04 by the reprex package (v0.2.1)}

ibecav commented on May 18, 2024

Right, not complaining, just saying I need to start at the base, see what works, and then work my way up. It all depends on how kind you want to be to the user's input.

IndrajeetPatil commented on May 18, 2024

No, I am not taking that as a complaint. I am just trying to understand if you think the users might enter a range of variables in the form of "x:y", which is clearly used for a single variable both inside and outside of tidyeval domain, and not "x":"y", or x:y, or c("x":"y"), or c(x:y).
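To make the "how kind to be to the user's input" question concrete, here is a minimal base R sketch of a lenient parser that expands a single "x:y" string into the run of column names between x and y. `expand_range()` and the `toy` data frame are hypothetical (the columns only mimic part of msleep); this is not how ggcorrmat handles `cor.vars`.

```r
# Hypothetical one-row data frame with msleep-like column names
toy <- data.frame(
  name = "a", sleep_total = 1, sleep_rem = 1, sleep_cycle = 1,
  awake = 1, brainwt = 1, bodywt = 1
)

# Hypothetical helper: turn "first:last" into the column names it spans
expand_range <- function(data, spec) {
  ends <- strsplit(spec, ":", fixed = TRUE)[[1]]
  if (length(ends) != 2 || !all(ends %in% names(data))) {
    stop("`spec` must look like \"first:last\" with both names present in `data`")
  }
  idx <- match(ends, names(data))
  names(data)[idx[1]:idx[2]]
}

vars <- expand_range(toy, "sleep_total:bodywt")
vars
```

The trade-off is the one raised above: accepting "x:y" as a range means a column that is literally named "x:y" can no longer be selected through the same argument.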

ibecav commented on May 18, 2024

Lol, they're users, they could do just about anything! Including the old-fashioned 6:10 notation (which works). I'll sort it out; anything is possible, I just need to check the possibilities.

ibecav commented on May 18, 2024

Prepping the PR now. Went a slightly different route on this one.
