Comments (4)

tsostarics commented on June 3, 2024

New recursion-based functions .omit_function_calls and .simplify_formula solve this issue. In the process I cleaned up a few of the other error checks.

from contrastable.

tsostarics commented on June 3, 2024

Quick note that the .is_dominated_by_identity function was updated so that it can keep track of repeated operators more effectively. Consider two cases:

1) varName ~ sum_code + 1 + 2
2) varName ~ sum_code + 1 * 2 + 2 

If you only check whether a node is directly dominated by itself (= the *Identity constraint), then case 1 is invalid but case 2 is not. To fix this, we can keep track of the operators we've encountered while recursing through the tree.
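
A minimal sketch of that bookkeeping in base R, for illustration only. `has_repeated_operator` is a hypothetical name, not the package's `.is_dominated_by_identity`, and the sketch assumes that "encountered" means the operators on the path from the root down to the current node:

```r
# Hypothetical helper: TRUE if some operator reappears below itself in the tree.
# Every call head counts as an operator here, including ordinary functions.
has_repeated_operator <- function(node, seen = character()) {
  # Symbols and constants are leaves, so there is nothing to repeat.
  if (!is.call(node)) return(FALSE)
  op <- as.character(node[[1]])[1]
  if (op %in% seen) return(TRUE)
  # Remember this operator for everything nested underneath it.
  seen <- c(seen, op)
  for (i in seq_along(node)[-1]) {
    if (has_repeated_operator(node[[i]], seen)) return(TRUE)
  }
  FALSE
}

# Formulas are not evaluated, so the variables do not need to exist.
f1 <- varName ~ sum_code + 1 + 2
f2 <- varName ~ sum_code + 1 * 2 + 2
f3 <- varName ~ sum_code + 1 * 2

has_repeated_operator(f1[[3]])  # TRUE
has_repeated_operator(f2[[3]])  # TRUE
has_repeated_operator(f3[[3]])  # FALSE
```

Passing `seen` down as an argument, rather than updating a shared variable, means each branch only sees its own ancestors. Tracking operators across the whole tree instead would need the accumulated set to be returned as well.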

tsostarics commented on June 3, 2024

Another note on this. Given that .simplify_formula already recurses through the structure, it perhaps doesn't make sense to recurse through it a second time with .is_dominated_by_identity. It's a bit faster to just use a regular expression on the simplified formula. With this, we never need to deparse the raw formula, so we can remove that function parameter and its code entirely.

## Matrix call is included to give .simplify_formula something to work with
# Case where the repeated operator appears later
tst <- gear ~ matrix(c(0.75, -0.25, -0.25,
                                 -0.25/4, -0.25*2, 0.75^2.3,
                                 -0.25, -0.25, -0.25%in%c(1,2,3),
                                 -0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 * 2 + - 2 + 2

# Case where the repeated operator appears immediately after the first occurrence
tst2 <- gear ~ matrix(c(0.75, -0.25, -0.25,
                        -0.25/4, -0.25*2, 0.75^2.3,
                        -0.25, -0.25, -0.25%in%c(1,2,3),
                        -0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 + 4

# Case where there is no repeated operator
tst3 <- gear ~ matrix(c(0.75, -0.25, -0.25,
                        -0.25/4, -0.25*2, 0.75^2.3,
                        -0.25, -0.25, -0.25%in%c(1,2,3),
                        -0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 *2

# Case where the repeated operator appears MUCH later
tst4 <- gear ~ matrix(c(0.75, -0.25, -0.25,
                        -0.25/4, -0.25*2, 0.75^2.3,
                        -0.25, -0.25, -0.25%in%c(1,2,3),
                        -0.25, 0.75, -0.25) %>% abs(), nrow = 4) + 4 * 2 - 1 | 3 / 3 & 2 ^ 5 + 4

test_1 <- function(f) {
  s_f <- .simplify_formula(f)

  .any_dominated_by_identity(s_f[[3]])
}

test_2 <- function(f) {
  s_f <- .simplify_formula(f)
  c_f <- deparse1(s_f)
  
  grepl("([|+*-]).+(\\1)", c_f)
}

library(microbenchmark)
set.seed(111)
mb1 <- microbenchmark(test_1(tst), test_2(tst), times = 3000)
mb2 <- microbenchmark(test_1(tst2), test_2(tst2), times = 3000)
mb3 <- microbenchmark(test_1(tst3), test_2(tst3), times = 3000)
mb4 <- microbenchmark(test_1(tst4), test_2(tst4), times = 3000)

mb1;mb2;mb3;mb4
Unit: microseconds
        expr   min    lq    mean median     uq     max neval cld
 test_1(tst) 358.2 371.4 536.008  379.7 445.15 20494.9  3000   b
 test_2(tst) 319.0 330.4 474.503  337.8 402.70 29234.3  3000  a 
Unit: microseconds
         expr   min     lq     mean median     uq     max neval cld
 test_1(tst2) 230.3 243.45 343.1459  248.4 271.65 11962.1  3000   b
 test_2(tst2) 191.4 202.50 283.7148  206.4 227.00 13673.7  3000  a 
Unit: microseconds
         expr   min     lq     mean median    uq     max neval cld
 test_1(tst3) 284.9 298.10 406.0027  303.7 335.8 10351.7  3000   b
 test_2(tst3) 196.1 206.25 285.4946  210.7 233.0 14352.8  3000  a 
Unit: microseconds
         expr   min    lq     mean median    uq     max neval cld
 test_1(tst4) 503.3 520.7 778.1386  532.7 663.6 20459.4  3000   b
 test_2(tst4) 316.6 331.7 512.9367  340.0 406.3 13944.8  3000  a 

The overall time is less than a millisecond in either case, but the regex method is a bit faster (the line in the plot marks where the two methods take equal time). So .is_dominated_by_identity doesn't actually seem to be needed, which makes things simpler. A bit sad to see it go unused in the end, but it might be useful elsewhere at a later point, we'll see.
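
To see why the regex is applied to the simplified formula rather than the raw one, here is a small base R sketch. `has_repeat_regex` is a hypothetical wrapper around the same pattern used in test_2, and `simplified` is a hand-written stand-in with the matrix call swapped for a placeholder symbol; what .simplify_formula actually returns is not shown in this thread:

```r
# Hypothetical wrapper around the pattern from test_2 (deparse1 needs R >= 4.0).
has_repeat_regex <- function(f) {
  # Capture one operator, then look for the same character again later on.
  grepl("([|+*-]).+(\\1)", deparse1(f))
}

# Raw formula: the `*` inside matrix(c(...)) appears twice.
raw <- gear ~ matrix(c(1, 2 * 3, 4 * 5, 6), nrow = 2) + 4 * 2

# Hand-simplified stand-in (assumption): the function call is replaced by `x`.
simplified <- gear ~ x + 4 * 2

has_repeat_regex(raw)         # TRUE, triggered by the operators inside c(...)
has_repeat_regex(simplified)  # FALSE
```

The pattern works on characters rather than on the parse tree, so a unary minus such as `-2` also counts as an occurrence of `-`.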

tsostarics commented on June 3, 2024

Last note on this, and I'm surprised I didn't consider it earlier: adding the error handling to .omit_function_calls directly, so that it throws an error as soon as it finds a problem, saves it from recursing through the whole structure 🤦‍♂️ This yields a substantial improvement on the deeply nested case (tst4) above and an improvement on the simple case (tst3), since the final formula doesn't need to be checked.

Same tests as above (run on a better computer, though), with tryCatch added to all three for consistency.

test_1 <- function(f) {
  s_f <- .simplify_formula(f)
  
  .any_dominated_by_identity(s_f[[3]])
  tryCatch(stop(1), error = \(e) TRUE)
}

test_2 <- function(f) {
  s_f <- .simplify_formula(f)
  c_f <- deparse1(s_f)
  
  grepl("([|+*-]).+(\\1)", c_f)
  tryCatch(stop(1), error = \(e) TRUE)
}

test_3 <- function(f) {
  tryCatch(.simplify_formula2(f),error = \(e) TRUE)
}

library(microbenchmark)
set.seed(111)
mb1 <- microbenchmark(test_1(tst), test_2(tst), test_3(tst), times = 3000)
mb2 <- microbenchmark(test_1(tst2), test_2(tst2),test_3(tst2), times = 3000)
mb3 <- microbenchmark(test_1(tst3), test_2(tst3),test_3(tst3), times = 3000)
mb4 <- microbenchmark(test_1(tst4), test_2(tst4),test_3(tst4), times = 3000)

mb1;mb2;mb3;mb4
Unit: microseconds
        expr   min    lq     mean median     uq     max neval
 test_1(tst) 142.9 150.3 196.2080  155.2 163.05 87013.7  3000
 test_2(tst) 147.9 154.7 180.3039  159.5 167.70  4260.9  3000
 test_3(tst)  31.8  34.9  41.5482   36.5  38.40  5217.0  3000
Unit: microseconds
         expr  min   lq      mean median    uq    max neval
 test_1(tst2) 90.7 97.0 111.18980   99.1 101.9 4042.5  3000
 test_2(tst2) 92.9 97.9 108.21883   99.9 102.8 3876.9  3000
 test_3(tst2) 31.9 34.7  37.92533   36.0  37.3 3828.6  3000
Unit: microseconds
         expr   min    lq     mean median     uq    max neval
 test_1(tst3) 114.3 121.2 136.9549  123.7 127.45 3937.6  3000
 test_2(tst3)  92.9  98.3 109.9073  100.4 103.40 4052.2  3000
 test_3(tst3)  32.2  34.8  37.7552   36.0  37.30 3680.9  3000
Unit: microseconds
         expr   min    lq     mean median    uq    max neval
 test_1(tst4) 209.5 220.5 254.6805  224.5 230.4 5924.3  3000
 test_2(tst4) 147.6 154.8 168.0258  157.4 161.5 4122.5  3000
 test_3(tst4)  31.7  35.1  38.2686   36.5  37.8 3835.9  3000
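
The early-exit idea can be sketched in base R as below. This is not the package's .omit_function_calls or .simplify_formula2: `simplify_or_stop`, the `formula_ops` set, and the `.call` placeholder are all hypothetical, and repeated operators are only tracked along the path from the root to the current node.

```r
# Assumption: the operators this sketch treats as formula operators.
formula_ops <- c("~", "+", "-", "*", "/", "|", "^", "&", ":")

# Hypothetical: simplify and validate in one pass, stopping at the first problem.
simplify_or_stop <- function(node, seen = character()) {
  if (!is.call(node)) return(node)
  op <- as.character(node[[1]])[1]
  # Ordinary function calls such as matrix(...) are replaced, not descended into.
  if (!op %in% formula_ops) return(quote(.call))
  # Throwing here abandons the rest of the recursion immediately.
  if (op %in% seen) stop("repeated operator: ", op, call. = FALSE)
  seen <- c(seen, op)
  for (i in seq_along(node)[-1]) {
    node[[i]] <- simplify_or_stop(node[[i]], seen)
  }
  node
}

# TRUE if the formula is rejected, FALSE if it passes.
is_rejected <- function(f) {
  tryCatch({
    simplify_or_stop(f)
    FALSE
  }, error = function(e) TRUE)
}

ok  <- gear ~ matrix(c(1, 2 * 3, 4 * 5, 6), nrow = 2) + 4 * 2
bad <- gear ~ matrix(c(1, 2 * 3, 4 * 5, 6), nrow = 2) + 4 + 4

is_rejected(ok)   # FALSE
is_rejected(bad)  # TRUE
```

Because the check happens during simplification, the simplified formula never needs a second pass, and an invalid formula is rejected before its remaining branches are visited.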
