@ibecav You can raise the ideas you had for the 0.0.8 release here, and we can discuss them in this thread.
Okay, I have three things to start with:
- The movies_long dataset needs a revamp. Right now the genre categories overlap, so most analyses violate the assumption of independence: a single movie may appear under two different genres. I've already made the necessary changes in data-raw in my development environment; I just need to chase down all the downstream effects. movies_wide is fine. I'll send a PR today.
- There's a small bug somewhere in ggbetweenstats (see the reprex below). The output tibble seems to be splitting the MPAA rating "PG-13" into two different groups, "PG" and "13".
- If you look at the output plot from the reprex, you'll see that outliers are actually plotted twice: once as part of the normal jitter of all the data, and again as an outlier. If you look closely at the top of the PG-13 category, you'll see 7 dots, 4 orangeish and 3 black. Those 7 dots all portray the same 4 movies: the 3 Lord of the Rings films + "Cera un volta...". Is it possible to suppress the regular dot when a point is identified as an outlier? It can be confusing.
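A minimal sketch of how the overlap in the first point can be detected, assuming a long-format table with one row per title-genre pair. The toy data below is made up and is not the real movies_long; any title that occupies more than one row contributes the same rating to several groups.

```r
# Toy long-format data (hypothetical, not the real movies_long):
# "Movie B" is listed under two genres, so its rating enters the analysis twice.
toy_long <- data.frame(
  title  = c("Movie A", "Movie B", "Movie B", "Movie C"),
  genre  = c("Drama", "Action", "Drama", "Comedy"),
  rating = c(7.1, 8.0, 8.0, 6.5),
  stringsAsFactors = FALSE
)

# Count how many rows (genres) each title occupies
genres_per_title <- table(toy_long$title)

# Titles appearing more than once break the independence assumption
overlapping_titles <- names(genres_per_title)[genres_per_title > 1]
overlapping_titles
#> [1] "Movie B"
```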
library(ggstatsplot)
str(movies_long)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 2433 obs. of 8 variables:
#> $ title : Factor w/ 1599 levels "'Til There Was You",..: 1270 871 870 872 1139 1231 1360 1363 246 290 ...
#> $ year : int 1994 2003 2001 2002 1994 1993 1977 1980 1968 2002 ...
#> $ length: int 142 251 208 223 168 195 125 129 158 135 ...
#> $ budget: num 25 94 93 94 8 25 11 18 5 3.3 ...
#> $ rating: num 9.1 9 8.8 8.8 8.8 8.8 8.8 8.8 8.7 8.7 ...
#> $ votes : int 149494 103631 157608 114797 132745 97667 134640 103706 17241 25964 ...
#> $ mpaa : Factor w/ 3 levels "PG","PG-13","R": 3 2 2 2 3 3 1 1 2 3 ...
#> $ genre : Factor w/ 6 levels "Action","Animation",..: 5 1 1 1 5 5 1 1 5 5 ...
##### Testing
ggstatsplot::ggbetweenstats(
  movies_long,
  mpaa,
  rating,
  type = "parametric",
  effsize.type = "biased",
  partial = FALSE,
  var.equal = TRUE,
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  p.adjust.method = "holm",
  outlier.tagging = TRUE,
  outlier.label = title
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
#>
#> Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [1, 3].
#> # tibble [5 × 8]
#> group1 group2 mean.difference conf.low conf.high p.value significance
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 PG 13 0.0513 -0.135 0.237 NA <NA>
#> 2 R PG 0.256 0.0797 0.431 1.33e-3 **
#> 3 R PG 0.204 0.0781 0.331 1.33e-3 **
#> 4 PG-13 PG NA NA NA 5.18e-1 ns
#> 5 R PG-13 NA NA NA 4.52e-4 ***
#> # ... with 1 more variable: p.value.label <chr>
#> Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
#>
#> Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = < 0.001
#>
Created on 2018-12-10 by the reprex package (v0.2.1)
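The warning in the output ("Expected 2 pieces. Additional pieces discarded") is the kind tidyr::separate() emits, which suggests the comparison labels are pasted together with "-" and later split on "-". That is an assumption about ggbetweenstats' internals, not something confirmed here. The base-R sketch below does not reproduce the package code; it only shows why such a round trip fails for a level like "PG-13" and one way to avoid it.

```r
# Hypothetical reconstruction: pairwise comparison labels pasted with "-"
levels_mpaa <- c("PG", "PG-13", "R")
pairs <- combn(levels_mpaa, 2)  # 2 x 3 character matrix, one column per pair
labels <- paste(pairs[1, ], pairs[2, ], sep = "-")

# Splitting on "-" breaks any level that itself contains "-":
# "PG-PG-13" and "PG-13-R" come back as three pieces instead of two
pieces <- strsplit(labels, "-", fixed = TRUE)
lengths(pieces)
#> [1] 3 2 3

# Safer: never round-trip through a pasted string; take the two
# group columns directly from the pair matrix
comparisons <- data.frame(
  group1 = pairs[1, ],
  group2 = pairs[2, ],
  stringsAsFactors = FALSE
)
comparisons
```

Keeping group1 and group2 as separate columns from the start means a hyphen inside a level name can never be mistaken for the separator.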
- Just commented on that PR.
- Yeah, that shouldn't be happening. I will look into what's going on.
- Yeah, unfortunately, I've made peace with this issue. The problem is that the ggrepel labels need to be jittered in the same way as the underlying raw data points, and I couldn't figure out a way to do that. So instead I am using non-jittered data points, which are flagged by geom_boxplot(). I'll be happy to accept a PR that circumvents this issue.
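One way around the jitter mismatch, sketched under the assumption that a continuous x-axis is acceptable: jitter the x positions once, store them as a column, and map that same column in every layer that needs it. The data and the column name x_jit are hypothetical.

```r
set.seed(123)  # fixed seed so the jitter is reproducible

# Hypothetical toy data: three groups on a discrete x-axis
toy_pts <- data.frame(
  group  = factor(c("PG", "PG", "PG-13", "PG-13", "R")),
  rating = c(6.1, 7.4, 8.8, 5.9, 7.0)
)

# Jitter once, up front, and store the result as a column.
# Any layer that maps this column (points, outlier points, repelled labels)
# then draws at exactly the same horizontal position.
toy_pts$x_jit <- as.numeric(toy_pts$group) +
  runif(nrow(toy_pts), min = -0.2, max = 0.2)

# Every jittered position stays within its own group's band
stopifnot(all(abs(toy_pts$x_jit - as.numeric(toy_pts$group)) <= 0.2))
toy_pts
```

In a ggplot2 call, both geom_point() and ggrepel::geom_label_repel() would then use aes(x = x_jit, y = rating), and the group names would have to be restored on the axis, for example with scale_x_continuous(breaks = 1:3, labels = levels(toy_pts$group)). Recent ggplot2 versions also give position_jitter() a seed argument, which may be a simpler route if the installed ggrepel honours the same position.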
#1. I'm done with the PR.
#3, outlier labelling: I think I may have done a poor job explaining my issue. I'm content with the ggrepel labels; I understand there is only so much you can do. My concern is that every outlier is plotted twice, once by geom_point (this happens first) and then again by geom_boxplot, so there are extra points on the plot.
I'm not totally sure, but I think we can reuse some of your existing condition code in front of geom_point to filter the data so that it only plots points that are NOT outliers, and then let the boxplot handle plotting the outliers. Here's a very simple reprex to show you.
library(ggstatsplot)
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
theme_set(theme_bw())
# flag outliers within each MPAA group (coef = 1.5)
xxx <- movies_long %>%
  group_by(mpaa) %>%
  mutate(
    isanoutlier = ifelse(
      ggstatsplot:::check_outlier(rating, coef = 1.5),
      yes = TRUE,
      no = FALSE
    )
  )
# jittered points for non-outliers only; stat_boxplot draws the outliers
ggplot(data = xxx, aes(x = mpaa, y = rating, label = title)) +
  geom_point(
    data = filter(xxx, !isanoutlier),
    aes(color = mpaa),
    position = position_jitterdodge(jitter.width = 1)
  ) +
  stat_boxplot(alpha = .2, outlier.color = "black", outlier.alpha = 1.0) +
  # label only the outliers
  ggrepel::geom_label_repel(data = filter(xxx, isanoutlier))
Created on 2018-12-11 by the reprex package (v0.2.1)
Notice that outliers are no longer drawn as both a colored point and a black point.
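The filtering idea in the reprex above can be sketched in base R. The helper is_tukey_outlier() is hypothetical: it applies the 1.5 × IQR rule that geom_boxplot() uses for its outlier points, and whether ggstatsplot:::check_outlier() computes its quartiles in exactly the same way is an assumption.

```r
# Hypothetical helper: flag values beyond coef * IQR from the lower/upper
# hinges (Tukey's rule), the criterion geom_boxplot() uses by default.
is_tukey_outlier <- function(x, coef = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

ratings <- c(5.0, 5.5, 6.0, 6.2, 6.4, 6.8, 9.9)
flags <- is_tukey_outlier(ratings)

# Split once: the jittered layer gets only the non-outliers, and the boxplot
# (or a labelled layer) handles the outliers, so nothing is drawn twice.
regular_points <- ratings[!flags]
outlier_points <- ratings[flags]
outlier_points
#> [1] 9.9
```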
Okay, I think I have a path to a proper PR sorted out for the outliers. I may break it into two separate PRs just to keep things manageable. More tomorrow.
@ibecav Since an upstream dependency has broken the CRAN version of ggstatsplot, I will have to submit a newer version soon. I will be submitting it on the 5th of January.
If you get a chance, can you take a look at refactoring the purrr hack in grouped_ggpiestats, similar to what you did for grouped_ggscatterstats?
Work complete. I managed not to break any tests or change any docs. PR with reprex in a minute.
Awesome! Thanks, Chuck. Everything looked good and I've merged the PR.
@ibecav Do you think you will have time to refactor the grouped_ggcorrmat function?
This is the only grouped_ function remaining that still relies on the purrr hack. I can push the submission date back to the 8th if you think you will have time to do this before then.
I'll take a look today. The first issue is that I have to revisit the base function ggcorrmat, because this doesn't work:
# setting `output = "ci"` will return the confidence intervals for unique correlation pairs
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = "sleep_total:bodywt",
  p.adjust.method = "BH",
  output = "ci"
)
Shouldn't that be "sleep_total":"bodywt", and not "sleep_total:bodywt"? The latter basically means that there is a single variable named "sleep_total:bodywt" in the data frame.
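Outside a tidyselect context, these spellings behave quite differently in base R, which is worth keeping in mind when deciding what to accept. A short illustration; that ggcorrmat accepts "sleep_total":"bodywt" is presumably because the argument is handed to tidyselect-style selection code rather than evaluated by base R.

```r
# A plain string containing a colon is just one (odd) column name
one_name <- "sleep_total:bodywt"
length(one_name)
#> [1] 1

# In base R, `:` coerces its operands to numbers; "sleep_total" is not
# numeric, so "sleep_total":"bodywt" fails outside a tidyselect context
res <- tryCatch(
  suppressWarnings("sleep_total":"bodywt"),
  error = function(e) "error"
)
res
#> [1] "error"

# Numeric strings coerce fine, just like the old-fashioned 6:10 form
"6":"10"
#> [1]  6  7  8  9 10
```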
At any rate, the function seems to work with a wide array of input forms for this argument:
# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = "sleep_total":"bodywt",
  p.adjust.method = "BH",
  output = "ci"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#>
#> # A tibble: 15 x 7
#> pair r lower upper p lower.adj upper.adj
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.752 0.617 0.844 2.92e- 12 0.531 0.877
#> 2 2 -0.474 -0.706 -0.150 6.17e- 3 -0.786 0.0302
#> 3 3 -1.000 -1.000 -1.000 2.42e-226 -1.000 -1.000
#> 4 4 -0.360 -0.569 -0.108 6.35e- 3 -0.653 0.0257
#> 5 5 -0.312 -0.494 -0.103 4.09e- 3 -0.572 0.00539
#> 6 6 -0.338 -0.614 0.0120 5.84e- 2 -0.715 0.191
#> 7 7 -0.752 -0.844 -0.617 2.91e- 12 -0.877 -0.531
#> 8 8 -0.221 -0.476 0.0670 1.31e- 1 -0.580 0.209
#> 9 9 -0.328 -0.535 -0.0826 9.95e- 3 -0.620 0.0452
#> 10 10 0.474 0.150 0.706 6.17e- 3 -0.0302 0.786
#> 11 11 0.852 0.709 0.927 2.42e- 9 0.603 0.950
#> 12 12 0.418 0.0809 0.669 1.73e- 2 -0.0997 0.757
#> 13 13 0.360 0.108 0.569 6.35e- 3 -0.0257 0.653
#> 14 14 0.312 0.103 0.494 4.09e- 3 -0.00543 0.572
#> 15 15 0.934 0.889 0.961 9.16e- 26 0.858 0.970
# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = c("sleep_total":"bodywt"),
  p.adjust.method = "BH",
  output = "ci"
)
# (output identical to the first call above)
# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = sleep_total:bodywt,
  p.adjust.method = "BH",
  output = "ci"
)
# (output identical to the first call above)
# works
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = c(sleep_total:bodywt),
  p.adjust.method = "BH",
  output = "ci"
)
# (output identical to the first call above)
Created on 2019-01-04 by the reprex package (v0.2.1)
Right, not complaining, just saying I need to start at the base, see what works, and then work my way up. It all depends on how forgiving you want to be of the user's input.
No, I am not taking that as a complaint. I am just trying to understand whether you think users might enter a range of variables in the form "x:y", which clearly denotes a single variable name both inside and outside the tidyeval domain, rather than "x":"y", x:y, c("x":"y"), or c(x:y).
Lol, they're users; they could do just about anything! Including the old-fashioned 6:10 notation (which works). I'll sort it out. Anything is possible; I just need to check the possibilities.
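Since users may supply almost anything, one option is to normalise the string and positional forms before selecting columns. A minimal sketch of a hypothetical helper, resolve_cor_vars(), which is not part of ggstatsplot: it handles a single "x:y" string, numeric positions such as 6:10, and plain vectors of names. Bare, unquoted forms like x:y or c(x:y) need tidy evaluation and are deliberately left to tidyselect here.

```r
# Hypothetical helper (not part of ggstatsplot): normalise the ways a user
# might name a range of columns into a character vector of column names.
resolve_cor_vars <- function(data, vars) {
  nms <- names(data)
  if (is.numeric(vars)) {
    # old-fashioned positional input such as 6:10
    return(nms[vars])
  }
  if (is.character(vars) && length(vars) == 1 && !(vars %in% nms) &&
      grepl(":", vars, fixed = TRUE)) {
    # a single string such as "sleep_total:bodywt" that is not itself a column
    ends <- strsplit(vars, ":", fixed = TRUE)[[1]]
    idx <- match(ends, nms)
    if (length(ends) != 2 || anyNA(idx)) {
      stop("Could not find both ends of the range '", vars, "'.", call. = FALSE)
    }
    return(nms[idx[1]:idx[2]])
  }
  if (is.character(vars) && all(vars %in% nms)) {
    # already a vector of existing column names
    return(vars)
  }
  stop("Unrecognised `cor.vars` specification.", call. = FALSE)
}

# Toy data standing in for ggplot2::msleep
toy <- data.frame(name = "a", sleep_total = 1, sleep_rem = 2, bodywt = 3)

resolve_cor_vars(toy, "sleep_total:bodywt")
resolve_cor_vars(toy, 2:4)
```

Both calls return the same three column names, so downstream code only ever sees one normalised form.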
Prepping the PR now. Went a slightly different route on this one.