yjunechoe / ggtrace

Programmatically inspect, debug, and manipulate ggplot internals

Home Page: https://yjunechoe.github.io/ggtrace

License: Other

R 99.51% CSS 0.49%

ggtrace's Introduction

I am a Ph.D. candidate in Linguistics at the University of Pennsylvania, studying psycholinguistics and language acquisition.

I'm also an R enthusiast and data visualization hobbyist. Outside of linguistics research, I develop open source software for statistical computing and graphics, data quality assurance, and (interfaces to) data APIs.

ggtrace's Issues

ggedit

The new function ggedit() should work similarly to ggtrace(), except that it only takes the method/obj and calls trace() internally with edit = TRUE.

ggtrace_aes wrapper

takes an aesthetic and gets its value at stage, after_stat, after_scale

stage would be especially useful when the layer inherits its data but the data arg in the layer is a function, since the value of the aes at stage is then not transparent

#19 can be motivation

early returns mean a trace at a particular step may not get triggered

Step 2 returns empty df if data is null, so breaks early here:

> ggtrace(
  ggplot2:::Layer$map_statistic,
  seq_len(length(ggbody(ggplot2:::Layer$map_statistic))),
  quote(1 + 1)
)
> ggplot()
Triggering trace on ggplot2:::Layer$map_statistic

[Step 1]> 1 + 1
[1] 2

[Step 2]> 1 + 1
[1] 2

Call `last_ggtrace()` to get the trace dump.
Untracing ggplot2:::Layer$map_statistic on exit.

Step 8 returns data if there are no calculated or staged aesthetics, so it returns early for StatIdentity as well

> ggtrace(
  ggplot2:::Layer$map_statistic,
  seq_len(length(ggbody(ggplot2:::Layer$map_statistic))),
  quote(1 + 1)
)
> ggplot(mtcars, aes(mpg, hp)) + geom_point()
Triggering trace on ggplot2:::Layer$map_statistic

[Step 1]> 1 + 1
[1] 2

[Step 2]> 1 + 1
[1] 2

[Step 3]> 1 + 1
[1] 2

[Step 4]> 1 + 1
[1] 2

[Step 5]> 1 + 1
[1] 2

[Step 6]> 1 + 1
[1] 2

[Step 7]> 1 + 1
[1] 2

[Step 8]> 1 + 1
[1] 2

Call `last_ggtrace()` to get the trace dump.
Untracing ggplot2:::Layer$map_statistic on exit.
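
A rough way to spot such steps ahead of time is to scan the body for return() calls. This is only a sketch on a toy function (f is made up, not a ggplot2 method), and it counts `{` as step 1 the same way the step numbers above do:

```r
# Toy function with an early return at step 2 (step 1 is the opening `{`)
f <- function(data) {
  if (is.null(data)) return(data.frame())
  data$y <- data$x * 2
  data
}

# Split the body into top-level steps and flag any step containing return()
steps <- as.list(body(f))
may_exit_early <- vapply(
  steps,
  function(step) "return" %in% all.names(step),
  logical(1)
)

# Steps after the first flagged one may never be reached by a trace
which(may_exit_early)
```

A flagged step is only a hint: whether the return actually fires depends on the data at run time.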

Document replacing `self` with a modified ggproto copy

Originally posted by @yjunechoe in #37 (comment)

remove all instances of ~line where it's not standalone

head(~line) appears in a couple places and should instead be passed in as ~line with .print = FALSE

Should also change the wording of ggtrace() documentation as well to say that only ~line alone will be substituted for the expression at the current step:

To simply run a step (or reference the expression at a step), you can use the ~line keyword. All instances of ~line will get substituted by the expression inside the debugging environment.

rename ~line to ~step

Also, think about changing ~line to ~step for consistent terminology

Originally posted by @yjunechoe in #11 (comment)

ggbody should check if method is already being traced

Currently throws an uninformative error that the trace steps go out of range, since tracing collapses the method body where the trace is inserted

Maybe add an arg like untrace_first = TRUE?

requires #32

error in ggbody docs example

The example with custom ggproto says object not found

https://yjunechoe.github.io/ggtrace/reference/ggbody.html

Works in console so unsure if this was just a problem with build

add a test that cleans up temp vars from inject via other means

ggtrace/tests/testthat/test-ggtrace-inject.R

Line 127 in 9a642e9

test_that("injections can clean up locally defined variables", {

rlang::env_unbind() - https://rlang.r-lib.org/reference/env_unbind.html
var <- local({...})

Better string conversion for ggproto objects

ggproto objects with long names get truncated. Full names (+ not enclosed in <>) would be nice for messages (ex: returning the corresponding gguntrace() code as a message when using ggtrace(once = FALSE))

The offending line from ggtrace(): obj_name <- rlang::as_label(obj)

Reprex:

rlang::as_label(ggplot2::StatBoxplot)
[1] "<SttBxplt>"
rlang::expr_deparse(ggplot2::StatBoxplot, width = Inf)
[1] "<SttBxplt>"

rlang::as_label(ggplot2::StatBin)
[1] "<StatBin>"
rlang::expr_deparse(ggplot2::StatBin, width = Inf)
[1] "<StatBin>"

The solution probably exists somewhere in ggplot2 docs(?)
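
One possible workaround, assuming the first class of a ggproto object is always its own full name: use class() instead of deparsing. The object below is a toy stand-in built by hand so the sketch runs without ggplot2, and ggproto_label() is a hypothetical helper name:

```r
# Toy stand-in for a ggproto object: an environment carrying the same class vector
obj <- structure(
  new.env(),
  class = c("StatBoxplot", "Stat", "ggproto", "gg")
)

# Hypothetical helper: full, untruncated name without the surrounding <>
ggproto_label <- function(obj) {
  stopifnot(inherits(obj, "ggproto"))
  class(obj)[[1]]
}

ggproto_label(obj)
```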

quosures should be passed around instead of expressions

util functions need to be able to see the env in which the ggproto method was defined, in case it becomes inaccessible from the util function's local scope

.store overrides ggplot2::.store

should rename it to something like .ggtrace_storage

Finish first draft of ggtrace tests

Should minimally cover the documented usecases

General (naming, print & message, persistent trace)
Untracing (gguntrace())
Tracedumps (last_ggtrace()/global_ggtrace())
Inspect (expressions return values)
Capture (expressions return environments)
Inject (expressions modify the runtime environment)
Error handling (all explicit rlang::abort() cases)
~~Options (ggtrace.as_tibble, ggtrace.suppressMessages --- use {withr})~~

negative index support

trace_steps = -1 references the last step of the body
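
A minimal sketch of how negative steps could be resolved against the body length. resolve_steps() is a hypothetical helper, not existing ggtrace code, and it assumes `{` counts as step 1:

```r
# Hypothetical helper: map negative steps onto positive positions in the body
resolve_steps <- function(trace_steps, n_steps) {
  trace_steps <- as.integer(trace_steps)
  if (any(trace_steps == 0L) || any(abs(trace_steps) > n_steps)) {
    stop("`trace_steps` must be within 1 to ", n_steps, " (or the negatives).",
         call. = FALSE)
  }
  negative <- trace_steps < 0L
  # -1 is the last step, -2 the second to last, and so on
  trace_steps[negative] <- n_steps + trace_steps[negative] + 1L
  trace_steps
}

aaa <- function() {
  a <- 1
  b <- 1
  c <- 1
  a + b + c
}

n_steps <- length(body(aaa))  # `{` plus four expressions
resolve_steps(-1, n_steps)
resolve_steps(c(1, -2), n_steps)
```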

stress test as_label conversion of user-supplied trace_exprs

As a side note, as_label() seems too strict when trace_exprs can be arbitrary - maybe we need to use the same expr_deparse() logic here as well for deparsing user-supplied expressions

Originally posted by @yjunechoe in #16 (comment)

document injection + bang-bang combo

If new_data is a modified form of data retrieved from the same location, you can inject it by overriding data with an assignment in the next trace

So this works:

ggtrace(
    method = PositionJitter$compute_layer,
    trace_steps = 12,
    trace_exprs = rlang::expr(data <- !!new_data),
    .print = FALSE
)

Remove purrr dependency

Factor out purrr::map() and purrr::map2_chr() inside ggtrace() - they aren't necessary

Document "filtering" on a condition with persistent tracing

Derive an example that looks something like the test at https://github.com/yjunechoe/ggtrace/blob/master/tests/testthat/test-ggtrace-persistent.R

add test for conditional injection

Can you ensure that the injection expression is only evaluated when a condition is met?

For example, in this example from the tests, if the order of the two layers is switched and you want to change how staged aes are handled via Layer, you need once = FALSE to reach the second layer, where this is relevant. But can you do this without modifying self$stat of the first layer, by making the injection conditional?

ggtrace/tests/testthat/test-ggtrace-inject.R

Lines 264 to 267 in 94db757

p <- ggplot(data.frame(value = 16)) +
  geom_point(aes(stage(value, after_stat = x), 0), colour = "black", size = 10) +
  geom_point(aes(value, 0), colour = "red", size = 10) +
  scale_x_sqrt(limits = c(0, 16), breaks = c(0, 4, 16))

(This should also be a separate test, but it is worth considering while resolving this one: can you uniquely identify a self/layer by its position in the plot (code) without relying on its content? Also check whether the injection in the linked test is actually ephemeral by checking the state of p$layers[[2]]$stat$retransform or something like that, however you access layers from a ggplot object. If it should be truly ephemeral, might as well copy the geom_point layer environment, change its stat property, and assign that whole thing to self.)

wrap common trace workflows

Something like this?:

powertrace(template = "<<name>>", ...)
register_powertrace(trace_fn = ... )

Ex1: track down how aes gets resolved in stage(), after_stat(), after_scale() (#19)

Ex2: wraps this workflow - returns the data every time data changes inside ggplot_build.ggplot:

library(ggtrace)
library(ggplot2)
library(rlang)

# Bar plot using computed/"mapped" aesthetics with `after_stat()` and `after_scale()`
barplot_plot <- ggplot(data = palmerpenguins::penguins) +
  geom_bar(
    mapping = aes(
      x = species,                           # Discrete x-axis representing species
      y = after_stat(count / sum(count)),    # Bars represent count of species as proportions
      color = species,                       # The outline of the bars are colored by species
      fill = after_scale(alpha(color, 0.5))  # The fill of the bars are lighter than the outline color
    ),
    size = 3
  )
barplot_plot

ggbody(ggplot2:::ggplot_build.ggplot)

data_assigns <- vapply(ggbody(ggplot2:::ggplot_build.ggplot), function(x) {
  is_call(x) && !is.null(call_name(x)) && call_name(x) == "<-" && call_args(x)[[1]] == "data"
}, logical(1))

which(data_assigns)
inspection_exprs <- lapply(ggbody(ggplot2:::ggplot_build.ggplot)[data_assigns], function(x) { call_args(x)[[2]] })

ggtrace(
  method = ggplot2:::ggplot_build.ggplot,
  trace_steps = which(data_assigns),
  trace_exprs = inspection_exprs,
  use_names = FALSE,
  print_output = FALSE
)
is_traced(ggplot2:::ggplot_build.ggplot)

barplot_plot

tracedump <- last_ggtrace()
tracedump_layer1 <- lapply(tracedump, `[[`, 1)

names(tracedump)
ggbody(ggplot2:::ggplot_build.ggplot)[data_assigns]

Return only 1-length string with rlang::expr_deparse()

More of a TIL for myself but this is possible with width = Inf 74b0b57

add examples and tests for ggtrace state

examples in doc
tests for output and messages in test-last_ggtrace.R

turning printing on causes expr to be evaluated twice

library(ggtrace) # v0.4.1

aaa <- function() {
  a <- 1
  b <- 1
  c <- 1
  a + b + c
}
original <- aaa()

ggtrace(aaa, -1, quote(a <- a + 10), verbose = FALSE)
#> aaa now being traced.
no_print <- aaa()
#> Triggering trace on aaa
#> Untracing aaa on exit.

ggtrace(aaa, -1, quote(a <- a + 10))
#> aaa now being traced.
yes_print <- aaa()
#> Triggering trace on aaa
#> 
#> [Step 5]> a <- a + 10
#> [1] 11
#> 
#> Call `last_ggtrace()` to get the trace dump.
#> Untracing aaa on exit.

original
#> [1] 3
no_print
#> [1] 13
yes_print
#> [1] 23
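
A sketch of one way to avoid the double evaluation: evaluate the expression once, keep the result, and print the stored value. Where exactly ggtrace evaluates twice is not shown in this issue, so run_step() is only a hypothetical illustration of the pattern:

```r
# Stand-in for the runtime environment of the traced function
env <- new.env()
assign("a", 1, envir = env)
expr <- quote(a <- a + 10)

# Hypothetical helper: evaluate once, then print the stored value if asked
run_step <- function(expr, env, print_output = TRUE) {
  res <- withVisible(eval(expr, env))
  if (print_output) print(res$value)
  invisible(res$value)
}

run_step(expr, env, print_output = TRUE)
get("a", envir = env)  # `a` was incremented once, regardless of printing
```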

ggtrace_generic for tracing s3/s4 methods

ggplot2:::ggplot_add.Layer
get("ggplot_add.Layer", envir = asNamespace("ggplot2"))
trace("ggplot_add.Layer", where = asNamespace("ggplot2"))

Some indirect heuristics

:: or ::: present
LHS of ^ passes rlang::is_installed()
$ absent
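
The heuristics above could be sketched on the unevaluated expression like this. looks_like_namespaced_fn() is a hypothetical name, and base requireNamespace() stands in for rlang::is_installed() to keep the sketch dependency-free:

```r
# Hypothetical check: is the expression of the form pkg::fn or pkg:::fn?
looks_like_namespaced_fn <- function(expr) {
  if (!is.call(expr)) return(FALSE)
  head <- expr[[1]]
  # A top-level `::` or `:::` call also means `$` is absent at the top level
  is.symbol(head) &&
    as.character(head) %in% c("::", ":::") &&
    requireNamespace(as.character(expr[[2]]), quietly = TRUE)
}

looks_like_namespaced_fn(quote(stats::median))                    # namespaced function
looks_like_namespaced_fn(quote(ggplot2:::Layer$map_statistic))    # top-level call is `$`
looks_like_namespaced_fn(quote(median))                           # bare symbol
```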

add ggbody tests with ggforce back in since it's been added to imports now

https://github.com/yjunechoe/ggtrace/blob/d05fddcecadb25ad95be752efe0a37fd52de638d/tests/testthat/test-ggbody.R

step_expr evaluating to NULL are removed or fail to be named

Removed when last element evaluates to NULL

> ggtrace(Stat$compute_layer, c(1, 1), list(hi = quote(1), bye = quote(NULL)))
> boxplot_plot

[Step 1]> 1
[1] 1

[Step 1]> NULL
NULL

> last_ggtrace()
$hi
[1] 1

if NULL is in middle, gets ignored and names are shifted up

> ggtrace(Stat$compute_layer, c(1, 2, 3), list(hi = quote(1), bye = quote(NULL), byebye = quote(2)), verbose = FALSE)
> boxplot_plot
> last_ggtrace()
$hi
[1] 1

$byebye
NULL

[[3]]
[1] 2

Problematic for conditional statements if you only care about the if case and you're returning NULL silently in else
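
This looks like the usual R behavior where assigning NULL to a list element deletes it. That is a guess at the cause, since the ggtrace internals are not shown here. A sketch of the gotcha and the list(NULL) workaround:

```r
# Assigning NULL with [[<- drops the element instead of storing NULL
dump <- list()
dump[["hi"]] <- 1
dump[["bye"]] <- NULL
names(dump)

# Assigning list(NULL) with [<- keeps the slot and its name
dump_fixed <- list()
dump_fixed["hi"] <- list(1)
dump_fixed["bye"] <- list(NULL)
dump_fixed["byebye"] <- list(2)
names(dump_fixed)
```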

add a warning about making assignments to environments/closures

For the Inject workflow, assigning to self$... while tracing will make modifications to that layer object, for example.

Interacting with self should be reserved for Inspect, like retrieving the name of the geom/stat/position that called the method (ex: ggplot2:::snakeize(class(self$geom)[[1]])) or an (inherited) property (ex: self$stat$retransform)

As an aside, if you really want to modify self$..., you could make a deep copy of the ggproto object (which is essentially an environment) and give it the same classes. Then make changes to the methods/properties of the copy and assign the copy to self

> rlang::env_label(geom_point()$stat)
[1] "0000019D1EC29B68"
> rlang::env_label(geom_text()$stat)
[1] "0000019D1EC29B68"
> identical(geom_point()$stat, geom_text()$stat)
[1] TRUE

> StatIdentity2 <- rlang::env_clone(geom_point()$stat)
> StatIdentity2
<environment: 0x0000019d2609cd90>

> class(geom_point()$stat)
[1] "StatIdentity" "Stat"         "ggproto"      "gg"          
> class(StatIdentity2) <- class(geom_point()$stat)

> StatIdentity2
<ggproto object: Class StatIdentity, Stat, gg>
    aesthetics: function
    compute_group: function
    compute_layer: function
    compute_panel: function
    default_aes: uneval
    extra_params: na.rm
    finish_layer: function
    non_missing_aes: 
    optional_aes: 
    parameters: function
    required_aes: 
    retransform: FALSE
    setup_data: function
    setup_params: function
    super:  <ggproto object: Class Stat, gg>

> StatIdentity
<ggproto object: Class StatIdentity, Stat, gg>
    aesthetics: function
    compute_group: function
    compute_layer: function
    compute_panel: function
    default_aes: uneval
    extra_params: na.rm
    finish_layer: function
    non_missing_aes: 
    optional_aes: 
    parameters: function
    required_aes: 
    retransform: FALSE
    setup_data: function
    setup_params: function
    super:  <ggproto object: Class Stat, gg>

> identical(StatIdentity, StatIdentity2)
[1] FALSE

Evaluate tracer function conditionally

An additional argument to ggtrace() that takes an expression that evaluates to TRUE/FALSE.

This expression should just get evaluated at top in an if clause and just cause the tracer function to break if it fails, so as to not change the scope where the rest of the function gets evaluated
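
One way this could look, as a sketch: wrap the trace expression in an if clause built with bquote(), so nothing new is introduced into the evaluation scope. make_conditional() is a hypothetical helper, and it wraps a single expression rather than the whole tracer function:

```r
# Hypothetical helper: only run `trace_expr` when `condition` is TRUE
make_conditional <- function(trace_expr, condition) {
  bquote(if (isTRUE(.(condition))) .(trace_expr))
}

step <- make_conditional(quote(x <- x * 100), quote(x > 5))
step

# Condition met: the injection runs
env_hit <- list2env(list(x = 10))
eval(step, env_hit)
env_hit$x

# Condition not met: the environment is left untouched
env_miss <- list2env(list(x = 1))
eval(step, env_miss)
env_miss$x
```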

Refactor the fallback case when only method is provided

Is it possible to refactor this part of the code?

if (rlang::is_missing(obj)) {
  method_expr <- rlang::enexpr(method)
  split <- eval(rlang::expr(split_ggproto_method(!!method_expr)))
  method <- split[[1]]
  obj <- split[[2]]
}

To be something like this?

if (rlang::is_missing(obj)) {
  split <- some_function(method)
  method <- split[[1]]
  obj <- split[[2]]
}

Or is that too much metaprogramming enexpr-ception? Especially since it needs to wrap around the split_ggproto_method() helper

persistent tracing support

untracing on exit is safe and a good default but it'd be nice to have an option to not untrace on exit.

would benefit from a mechanism like gguntrace(method = , obj = ) and gguntrace_all(), as well as something like ggcurtrace() to keep track.

this perhaps also calls for a bulkier last_ggtrace() if there's gonna be multiple trace dumps happening in same ggproto object/plot.

Minor readme edits

example 3 step 4 should showcase use_names = TRUE by actually using the names to subset the tracedump
example 4 step 4 should use ggplotGrob() to capture the output plot and store it into a variable, and demonstrate ability to render modified plot later

function to rebuild source code from callstack

To make reprex code from experimenting with ggedit().

This is close enough, could be slightly better:

cat(paste0(unlist(lapply(ggbody(StatSmooth$compute_group)[-1], rlang::expr_deparse, width = Inf)), collapse = "\n"))

data <- flip_data(data, flipped_aes)
if (length(unique(data$x)) < 2) {
  return(new_data_frame())
}
if (is.null(data$weight)) data$weight <- 1
if (is.null(xseq)) {
  if (is.integer(data$x)) {
    if (fullrange) {
      xseq <- scales$x$dimension()
    } else {
      xseq <- sort(unique(data$x))
    }
  } else {
    if (fullrange) {
      range <- scales$x$dimension()
    } else {
      range <- range(data$x, na.rm = TRUE)
    }
    xseq <- seq(range[1], range[2], length.out = n)
  }
}
if (identical(method, "loess")) {
  method.args$span <- span
}
if (is.character(method)) {
  if (identical(method, "gam")) {
    method <- mgcv::gam
  } else {
    method <- match.fun(method)
  }
}
if (identical(method, mgcv::gam) && is.null(method.args$method)) {
  method.args$method <- "REML"
}
base.args <- list(quote(formula), data = quote(data), weights = quote(weight))
model <- do.call(method, c(base.args, method.args))
prediction <- predictdf(model, xseq, se, level)
prediction$flipped_aes <- flipped_aes
flip_data(prediction, flipped_aes)

Document behavior of invisible()

invisible(ggplot-object) doesn't trigger trace.

Maybe there should be a formal option to suppress printing anything when trace is triggered? Although idk how comfortable I am with that --- tracing is dangerous and untracing is informative, so I'm sorta fine with messages being forced on people

named list of trace_exprs

Only substitute the name with the expression for unnamed elements.

when trace is triggered, send it as a message

For ggtrace: Triggering trace on PositionJitter$compute_layer

setup testthat

error if length of trace_steps and trace_exprs mismatches

currently just fails silently
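
A sketch of the check. It uses base stop() to stay self-contained where the package would presumably use rlang::abort(), and check_trace_lengths() is a hypothetical name:

```r
# Hypothetical validation: fail loudly when steps and expressions don't line up
check_trace_lengths <- function(trace_steps, trace_exprs) {
  if (length(trace_steps) != length(trace_exprs)) {
    stop(
      sprintf(
        "Length of `trace_steps` (%d) does not match length of `trace_exprs` (%d).",
        length(trace_steps), length(trace_exprs)
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

check_trace_lengths(1:2, list(quote(data), quote(head(data))))
```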

option for ggbody to fetch inherited method

ggbody(..., inherit = FALSE) by default
Recurse through class(obj) and trycatch ggbody() until it returns something
Also return the corresponding ggbody() code like ggbody(Stat$compute_layer)
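
The shape of that lookup, sketched on toy objects. Parent and Child are plain environments with a made-up `super` field, not real ggproto objects (which reach their parent differently), so this only illustrates the loop and the returned code string:

```r
# Toy stand-ins: the method lives on Parent, Child only inherits it
Parent <- new.env()
Parent$name <- "Parent"
Parent$compute_layer <- function(data) data

Child <- new.env()
Child$name <- "Child"
Child$super <- Parent

# Hypothetical helper: walk up until the method is defined directly on an object
find_method_code <- function(obj, method) {
  while (!is.null(obj)) {
    if (exists(method, envir = obj, inherits = FALSE)) {
      return(paste0("ggbody(", obj$name, "$", method, ")"))
    }
    has_super <- exists("super", envir = obj, inherits = FALSE)
    obj <- if (has_super) obj$super else NULL
  }
  stop("Method `", method, "` not found in any parent.", call. = FALSE)
}

find_method_code(Child, "compute_layer")
```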

option in ggtrace() to return clean names if step_exprs is a named list

i.e., use the same names for tracedump

should also pass through same check for uniqueness

improve error handling for gguntrace

differentiate between error from not being able to find method and attempting to untrace a method not currently being traced

refactor the check for a method already being traced out of gguntrace

might want to use this for tests

~step keyword only substituted when it's by itself

Need new issue for the actual code part of it so that only the ~step keyword by itself is targeted for substitution
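
A minimal sketch of the standalone-only rule: substitute when the whole trace expression is exactly ~step, and leave anything else alone. substitute_step() is a hypothetical helper name:

```r
# Hypothetical helper: replace the trace expression only if it is exactly `~step`
substitute_step <- function(trace_expr, step_expr) {
  if (identical(trace_expr, quote(~step))) step_expr else trace_expr
}

step_expr <- quote(data <- flip_data(data, flipped_aes))

substitute_step(quote(~step), step_expr)        # becomes the expression at the step
substitute_step(quote(head(~step)), step_expr)  # not standalone, returned unchanged
```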

Originally posted by @yjunechoe in #11 (comment)

make trace_exprs argument optional

If only trace_steps is provided, just run the specified steps

test for incomplete trace

#44

function to translate ggtrace() code to base::trace() code

Might be useful for a final step of making a reprex once you've isolated the problem

Related to #8

More robust test of the ~line keyword

Should it be evaluated differently if the line involves assignment?

Does it break when you try to do assignment? (trace_exprs = { temp <- ~line })

A generalized trace function with all features of ggtrace

possible names:

anytrace()

Improved error handling for inherited methods

should be another internal checking function called on the enexpr (same logic as inherit = TRUE but break and suggest instead of fetching)

(re)move the obj argument

Nowhere in the docs do we ever use the form ggtrace(method = , obj = ) where both are specified. I'm always just showing ggtrace(method = , trace_steps = , trace_exprs = )

Is it cumbersome that obj gets in the way between the trace_* arguments? ggtrace() is the only function that has more than method and obj as args, so it won't break much if I move obj to the end, maybe?

ggtrace(method, trace_steps, trace_exprs, obj, once, .print)?

This would allow really short code like ggtrace(StatBoxplot$compute_group, 2:4), which would just run steps 2-4 (requires #13 )

Makes more sense to me from convenience pov so should decide on this ASAP

write aes_eval usage vignette

best place to showcase ggtrace imo

use ggplot2 examples - https://ggplot2.tidyverse.org/reference/aes_eval.html

p <- ggplot(data.frame(value = 16)) +
  geom_point(aes(stage(value, after_stat = x), 0), colour = "black", size = 10) +
  geom_point(aes(value, 0), colour = "red", size = 10) +
  scale_x_sqrt(limits = c(0, 16), breaks = c(0, 4, 16))