Comments (4)
New recursion-based functions `.omit_function_calls` and `.simplify_formula` solve this issue. In the process I cleaned up a few of the other error checks.
from contrastable.
Quick note that the `.is_dominated_by_identity` function was updated so that it can keep track of repeated operators more effectively. Consider two cases:
1) varName ~ sum_code + 1 + 2
2) varName ~ sum_code + 1 * 2 + 2
If you only check whether a node is directly dominated by an identical node (a *Identity constraint), then case 1 is flagged as invalid but case 2 is not, even though it also repeats `+`. To fix this, we can keep track of the operators we've encountered while recursing through the tree.
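A minimal sketch of the "keep track of operators while recursing" idea. This is not the package's `.is_dominated_by_identity`; `collect_operators`, `has_repeated_operator`, and the `formula_ops` set are hypothetical names, and the sketch assumes plain arithmetic-style formulas (no empty arguments such as `x[, 1]`).

```r
# Operators that should appear at most once (assumed set, for illustration)
formula_ops <- c("+", "-", "*", "|")

# Hypothetical helper: gather every operator found anywhere in the tree
collect_operators <- function(node) {
  # Symbols and constants are leaves, so they contribute no operators
  if (!is.call(node)) return(character())

  head <- node[[1]]
  op <- if (is.symbol(head)) as.character(head) else character()

  # Recurse into every argument of the call and pool the results
  child_ops <- unlist(lapply(as.list(node)[-1], collect_operators))
  c(op[op %in% formula_ops], child_ops)
}

# Hypothetical helper: TRUE if any tracked operator occurs more than once
has_repeated_operator <- function(f) {
  anyDuplicated(collect_operators(f[[3]])) > 0
}

has_repeated_operator(varName ~ sum_code + 1 + 2)
has_repeated_operator(varName ~ sum_code + 1 * 2 + 2)
has_repeated_operator(varName ~ sum_code + 1 * 2)
```

Because the operators are pooled over the whole right-hand side rather than compared parent to child, both example cases are treated the same way regardless of what sits between the two `+` nodes.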
Another note on this: given that `.simplify_formula` already recurses through the structure, it perhaps doesn't make sense to recurse through it a second time with `.is_dominated_by_identity`. It's a bit faster to just use a regular expression on the simplified formula. With this, we never need to deparse the raw formula, so we can remove that function parameter and the associated code entirely.
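As a small illustration of the regex check on its own, here is a sketch using formulas with no function calls as hypothetical stand-ins for simplified output, plus one raw formula to show why simplification has to happen first. The pattern is the one used in the benchmark; `repeated_op` is just an illustrative name.

```r
# Capture one of | + * - and look for the same character again later on
repeated_op <- "([|+*-]).+(\\1)"

# Formulas without function calls, standing in for simplified formulas
grepl(repeated_op, deparse1(gear ~ x + 4 + 4))  # `+` occurs twice
grepl(repeated_op, deparse1(gear ~ x + 4 * 2))  # no operator repeats

# On a raw formula the minus signs inside c() count as a repeated `-`,
# so the pattern only makes sense once function calls have been removed
grepl(repeated_op, deparse1(gear ~ matrix(c(-1, -1), nrow = 2) + 4))
```

The first and third calls return `TRUE` and the second returns `FALSE`; the third is a false positive caused purely by the contents of the `matrix()` call.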
## Matrix call is included to give .simplify_formula something to work with
# Case where the repeated operator appears later
tst <- gear ~ matrix(c(0.75, -0.25, -0.25,
-0.25/4, -0.25*2, 0.75^2.3,
-0.25, -0.25, -0.25%in%c(1,2,3),
-0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 * 2 + - 2 + 2
# Case where the repeated operator appears immediately after the first occurrence
tst2 <- gear ~ matrix(c(0.75, -0.25, -0.25,
-0.25/4, -0.25*2, 0.75^2.3,
-0.25, -0.25, -0.25%in%c(1,2,3),
-0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 + 4
# Case where there is no repeated operator
tst3 <- gear ~ matrix(c(0.75, -0.25, -0.25,
-0.25/4, -0.25*2, 0.75^2.3,
-0.25, -0.25, -0.25%in%c(1,2,3),
-0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 *2
# Case where the repeated operator appears MUCH later
tst4 <- gear ~ matrix(c(0.75, -0.25, -0.25,
-0.25/4, -0.25*2, 0.75^2.3,
-0.25, -0.25, -0.25%in%c(1,2,3),
-0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 * 2 - 1 | 3 / 3 & 2 ^ 5 + 4
test_1 <- function(f) {
  s_f <- .simplify_formula(f)
  .any_dominated_by_identity(s_f[[3]])
}
test_2 <- function(f) {
  s_f <- .simplify_formula(f)
  c_f <- deparse1(s_f)
  grepl("([|+*-]).+(\\1)", c_f)
}
library(microbenchmark)
set.seed(111)
mb1 <- microbenchmark(test_1(tst), test_2(tst), times = 3000)
mb2 <- microbenchmark(test_1(tst2), test_2(tst2), times = 3000)
mb3 <- microbenchmark(test_1(tst3), test_2(tst3), times = 3000)
mb4 <- microbenchmark(test_1(tst4), test_2(tst4), times = 3000)
mb1;mb2;mb3;mb4
Unit: microseconds
        expr   min    lq    mean median     uq     max neval cld
 test_1(tst) 358.2 371.4 536.008  379.7 445.15 20494.9  3000   b
 test_2(tst) 319.0 330.4 474.503  337.8 402.70 29234.3  3000   a
Unit: microseconds
         expr   min     lq     mean median     uq     max neval cld
 test_1(tst2) 230.3 243.45 343.1459  248.4 271.65 11962.1  3000   b
 test_2(tst2) 191.4 202.50 283.7148  206.4 227.00 13673.7  3000   a
Unit: microseconds
         expr   min     lq     mean median    uq     max neval cld
 test_1(tst3) 284.9 298.10 406.0027  303.7 335.8 10351.7  3000   b
 test_2(tst3) 196.1 206.25 285.4946  210.7 233.0 14352.8  3000   a
Unit: microseconds
         expr   min    lq     mean median    uq     max neval cld
 test_1(tst4) 503.3 520.7 778.1386  532.7 663.6 20459.4  3000   b
 test_2(tst4) 316.6 331.7 512.9367  340.0 406.3 13944.8  3000   a
The overall run time is under a millisecond in either case, but the regex method is a bit faster (line = where the time for both methods is equal). So `.is_dominated_by_identity` doesn't actually seem to be needed, which makes things simpler. A bit sad to see it not used in the end, but it might be useful elsewhere at a later point, we'll see.
Last note on this, and I'm surprised I didn't consider it earlier: just adding the error handling to `.omit_function_calls` directly and allowing it to throw an error early prevents it from needing to recurse through the whole structure 🤦‍♂️ This yields a substantial improvement on the deeply nested case (tst4) above and an improvement on the simple case (tst3), since the final formula doesn't need to be checked.
Same tests as above (done on a better computer though), with `tryCatch` added in for consistency across the three:
test_1 <- function(f) {
  s_f <- .simplify_formula(f)
  .any_dominated_by_identity(s_f[[3]])
  tryCatch(stop(1), error = \(e) TRUE)
}
test_2 <- function(f) {
  s_f <- .simplify_formula(f)
  c_f <- deparse1(s_f)
  grepl("([|+*-]).+(\\1)", c_f)
  tryCatch(stop(1), error = \(e) TRUE)
}
test_3 <- function(f) {
  tryCatch(.simplify_formula2(f), error = \(e) TRUE)
}
library(microbenchmark)
set.seed(111)
mb1 <- microbenchmark(test_1(tst), test_2(tst), test_3(tst), times = 3000)
mb2 <- microbenchmark(test_1(tst2), test_2(tst2), test_3(tst2), times = 3000)
mb3 <- microbenchmark(test_1(tst3), test_2(tst3), test_3(tst3), times = 3000)
mb4 <- microbenchmark(test_1(tst4), test_2(tst4), test_3(tst4), times = 3000)
mb1;mb2;mb3;mb4
Unit: microseconds
        expr   min    lq     mean median     uq     max neval
 test_1(tst) 142.9 150.3 196.2080  155.2 163.05 87013.7  3000
 test_2(tst) 147.9 154.7 180.3039  159.5 167.70  4260.9  3000
 test_3(tst)  31.8  34.9  41.5482   36.5  38.40  5217.0  3000
Unit: microseconds
         expr  min   lq      mean median    uq    max neval
 test_1(tst2) 90.7 97.0 111.18980   99.1 101.9 4042.5  3000
 test_2(tst2) 92.9 97.9 108.21883   99.9 102.8 3876.9  3000
 test_3(tst2) 31.9 34.7  37.92533   36.0  37.3 3828.6  3000
Unit: microseconds
         expr   min    lq     mean median     uq    max neval
 test_1(tst3) 114.3 121.2 136.9549  123.7 127.45 3937.6  3000
 test_2(tst3)  92.9  98.3 109.9073  100.4 103.40 4052.2  3000
 test_3(tst3)  32.2  34.8  37.7552   36.0  37.30 3680.9  3000
Unit: microseconds
         expr   min    lq     mean median    uq    max neval
 test_1(tst4) 209.5 220.5 254.6805  224.5 230.4 5924.3  3000
 test_2(tst4) 147.6 154.8 168.0258  157.4 161.5 4122.5  3000
 test_3(tst4)  31.7  35.1  38.2686   36.5  37.8 3835.9  3000
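For illustration, the early-exit idea might look roughly like the sketch below. It is not the actual `.omit_function_calls` or `.simplify_formula2`; `walk_and_check`, `is_invalid`, and the `formula_ops` set are hypothetical. Anything that is not one of the tracked operators (including parentheses) is treated as a function call and skipped rather than descended into.

```r
# Operators that may appear at most once (assumed set, for illustration)
formula_ops <- c("+", "-", "*", "|")

# Hypothetical walker: skip function calls, stop at the first repeated operator
walk_and_check <- function(node, seen = new.env()) {
  # Symbols and constants are leaves
  if (!is.call(node)) return(invisible(seen))

  head <- node[[1]]
  # Anything other than a tracked operator is a function call: do not descend
  if (!is.symbol(head) || !(as.character(head) %in% formula_ops)) {
    return(invisible(seen))
  }

  op <- as.character(head)
  # Throw right away instead of finishing the traversal
  if (!is.null(seen[[op]])) {
    stop("operator `", op, "` is used more than once", call. = FALSE)
  }
  seen[[op]] <- TRUE

  for (arg in as.list(node)[-1]) walk_and_check(arg, seen)
  invisible(seen)
}

# Hypothetical wrapper mirroring the tryCatch pattern used in the benchmark
is_invalid <- function(f) {
  tryCatch({
    walk_and_check(f[[3]])
    FALSE
  }, error = function(e) TRUE)
}

is_invalid(gear ~ matrix(c(-1, -1), nrow = 2) + 4 + 4)
is_invalid(gear ~ matrix(c(-1, -1), nrow = 2) + 4 * 2)
```

The first call errors as soon as the second `+` is reached, and the `matrix()` call is never inspected in either case, which is the kind of saving that would be consistent with `test_3` taking about the same time on all four test formulas.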