jimjam-slam / collateral

Map, find and isolate captured side effects

Home Page: https://collateral.jamesgoldie.dev

License: Other

R 95.00% CSS 5.00%

r tidyverse purrr side-effects

collateral's Issues

Helpers for extracting side effects

Extracting the results from a collateral list-column isn't too hard: map(my_safe_column, 'result').

Other side effects are a little trickier, though: the messages from errors are wrapped inside an object along with other text, and there can be several warnings, messages or other outputs for each element. Should collateral provide helpers to quickly provide companion columns with collapsed versions of these?
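As a rough sketch of what such helpers might do, the following uses only purrr (not collateral itself) to build companion vectors from captured side effects. It assumes each element has the shape returned by `purrr::safely()` (`result`, `error`) or `purrr::quietly()` (`result`, `output`, `warnings`, `messages`); the variable names are made up for illustration.

```r
library(purrr)

# Run log() safely over inputs where the second one fails
safe_results <- map(list(1, "a", 10), safely(log))

# Results: NULL wherever the call failed
results <- map(safe_results, "result")

# Error messages: NA wherever there was no error
error_messages <- map_chr(safe_results, function(el) {
  if (is.null(el$error)) NA_character_ else conditionMessage(el$error)
})

# Warnings can occur several times per element, so collapse them
# into a single string ("" when there were none)
quiet_results <- map(list(-1, 1), quietly(log))
warning_text <- map_chr(quiet_results, function(el) {
  paste(el$warnings, collapse = "; ")
})
```

The collapsed vectors have one entry per element, so they could sit next to the list-column as ordinary character columns.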

Knitted output

At the moment, it's possible to get collateral to print in knitted documents by using the native terminal output. But paged HTML tables of tibbles simply show the list-column class name. I'd like to fix this!

EDIT: I'm going to refocus this issue on coloured terminal output for knitr/rmarkdown for now and return to HTML table output later!

`pmap` variants not handling `.l` properly

It seems like the map variants are mishandling .l somehow. The following error suggests that purrr::pmap() is trying to iterate over .l, rather than iterate down the elements of .l in parallel:

Column newcol must be length 32 (the number of rows) or one, not 3

library(tidyverse)
library(collateral)

test <- mtcars %>% as_tibble()
test
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows

# native pmap works fine
test %>% mutate(
  newcol = pmap(
    select(., vs, am, gear),
    function(vs, am , gear) { vs + am + gear } ))
#> # A tibble: 32 x 12
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb newcol
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4 <dbl ~
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4 <dbl ~
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1 <dbl ~
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1 <dbl ~
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 <dbl ~
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1 <dbl ~
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 <dbl ~
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2 <dbl ~
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 <dbl ~
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 <dbl ~
#> # ... with 22 more rows
test %>% mutate(
  newcol = pmap_dbl(
    select(., vs, am, gear),
    function(vs, am , gear) { vs + am + gear } ))
#> # A tibble: 32 x 12
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb newcol
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4      5
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4      5
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1      6
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1      4
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2      3
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1      4
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4      3
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2      5
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2      5
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4      5
#> # ... with 22 more rows

# collateral variants do not
test %>% mutate(
  newcol = pmap_safely(
    select(., vs, am, gear),
    function(vs, am , gear) { vs + am + gear } ))
#> Error: Column `newcol` must be length 32 (the number of rows) or one, not 3
test %>% mutate(
  newcol = pmap_quietly(
    select(., vs, am, gear),
    function(vs, am , gear) { vs + am + gear } ))
#> Error in .f(...): argument "am" is missing, with no default
test %>% mutate(
  newcol = pmap_peacefully(
    select(., vs, am, gear),
    function(vs, am , gear) { vs + am + gear } ))
#> Error: Column `newcol` must be length 32 (the number of rows) or one, not 3

Created on 2019-07-08 by the reprex package (v0.3.0)
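For comparison, here is a minimal sketch of the expected behaviour, built from purrr alone rather than from collateral's internals: when the data frame is passed to `pmap()` as `.l` and `.f` is wrapped in `safely()`, the result has one element per row. The small `toy` data frame is invented for the example.

```r
library(purrr)

# Three columns, two rows: pmap() should walk down the rows in parallel
toy <- data.frame(vs = c(0, 1), am = c(1, 1), gear = c(4, 4))

# Wrap the function, then let pmap() handle the iteration over .l
safe_sum <- safely(function(vs, am, gear) vs + am + gear)
res <- pmap(toy, safe_sum)

length(res)      # one element per row, not one per column
res[[1]]$result  # vs + am + gear for the first row
```

If a wrapper instead produced three elements for this input, it would be iterating over the columns of `.l`, which matches the "not 3" in the error above.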

Coloured output in README not being converted to HTML

Related to issue #2! Probably need to add fansi to the README/home page.

Integration with furrr

Not sure if this is possible (I haven't investigated it yet), but it would be interesting to see if collateral can supply safe/quiet mappers that work with multi-core or distributed environments via furrr.

Column classes don't survive dplyr::bind_rows (or joins?)

Some workflows I'm testing that involve using dplyr::bind_rows() on data frames that have collateral columns cause them to lose their formatting:

library(tidyverse)
library(collateral)

test =
  # tidy up and trim down for the example
  mtcars %>%
  rownames_to_column(var = "car") %>%
  as_tibble() %>%
  select(car, cyl, disp, wt) %>%
  # spike some rows in cyl == 4 to make them fail
  mutate(wt = dplyr::case_when(
    wt < 2 ~ -wt,
    TRUE ~ wt)) %>%
  # nest and do some operations quietly()
  nest(-cyl) %>%
  mutate(qlog = map_quietly(data, ~ log(.$wt)))

# now slice them up...
test1 = test %>% slice(1)
test2 = test %>% slice(2)
test3 = test %>% slice(3)

# ... and combine them. problem!
test_rejoined = bind_rows(test1, test2, test3)
#> Warning in bind_rows_(x, .id): Vectorizing 'quietly_mapped' elements may
#> not preserve their attributes

#> Warning in bind_rows_(x, .id): Vectorizing 'quietly_mapped' elements may
#> not preserve their attributes

#> Warning in bind_rows_(x, .id): Vectorizing 'quietly_mapped' elements may
#> not preserve their attributes
test_rejoined
#> # A tibble: 3 x 3
#>     cyl data              qlog      
#>   <dbl> <list>            <list>    
#> 1     6 <tibble [7 x 3]>  <list [4]>
#> 2     4 <tibble [11 x 3]> <list [4]>
#> 3     8 <tibble [14 x 3]> <list [4]>
class(test_rejoined$qlog)
#> [1] "list"

Created on 2019-07-09 by the reprex package (v0.3.0)

This took me by surprise a little, as my understanding is that classes are supposed to be exempt from attributes getting wiped when a vector is modified. So I'm not 100% sure what's happening, but it may be that I need to handle collateral lists getting concatenated or subsetted a little more explicitly.

R/format.r has some extra S3 methods that I mostly added because I was trying to follow along with the pillar documentation, but I wonder if—for example—these two might deserve some more attention:

#' @rdname collateral_extras
#' @export
c.safely_mapped <- function(x, ...) {
  as_safely_mapped(NextMethod())
}

#' @rdname collateral_extras
#' @export
`[.safely_mapped` <- function(x, i) {
  as_safely_mapped(NextMethod())
}

Originally posted by @Rensa in #9 (comment)
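To illustrate the idea behind those two methods, here is a self-contained base R sketch using a hypothetical class, `my_mapped`, with a hypothetical constructor `new_my_mapped()` (neither is part of collateral). Each method lets the default behaviour run via `NextMethod()` and then restores the class on the result.

```r
# Hypothetical constructor: a plain list with an extra class
new_my_mapped <- function(x = list()) {
  stopifnot(is.list(x))
  structure(x, class = c("my_mapped", "list"))
}

# Subsetting: do the default subset, then restore the class
`[.my_mapped` <- function(x, i) {
  new_my_mapped(NextMethod())
}

# Concatenation: do the default c(), then restore the class
c.my_mapped <- function(x, ...) {
  new_my_mapped(NextMethod())
}

x <- new_my_mapped(list(1, "a", 3))
class(x[2:3])
class(c(x, x))
```

These methods only cover base R's `[` and `c()`. Functions that combine vectors by other means, as `dplyr::bind_rows()` appears to do in the reprex above, may bypass them and need separate handling.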

Add safely() arguments

purrr::safely() has two other arguments: otherwise, which replaces instances of result that would otherwise be NULL with the supplied value, and quiet (default TRUE), which suppresses the default output of errors. These should definitely be passed along by map_safely() and map_quietly()!
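For reference, this is how the two arguments behave when calling `purrr::safely()` directly; the sketch does not show how collateral would forward them.

```r
library(purrr)

# otherwise: value used for `result` when the call fails
# quiet: when TRUE (the default), errors are not printed as they occur
safe_log <- safely(log, otherwise = NA_real_, quiet = TRUE)

out <- map(list(1, "a"), safe_log)

# Because failures now yield NA_real_ instead of NULL,
# the results can be simplified to a double vector
vals <- map_dbl(out, "result")
```

With the default `otherwise = NULL`, the `map_dbl()` call would fail on the second element, which is one practical reason to expose the argument.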

collateral not on CRAN anymore?

I just noticed collateral is not on CRAN anymore. I am also facing issues using has_errors, such as: Error: Input must be a vector, not a safely_mapped object. Are these issues linked? Maybe the latter is linked to developments in the vctrs package? What is the current status of the package?

Thanks!

feature request: handle errors and warnings together peacefully

Using safely or quietly, one faces a weird trade-off: do I want to allow for errors or warnings? This raises two issues:

One uses safely and has warnings: cannot catch them
One uses quietly and has an error: function fails

Ideally, one could just run a single function, let's call it peacefully, that handles all of these, i.e. always returns the same slots as quietly plus error. I don't think this is available currently? I'm not sure whether this should be implemented in purrr or in collateral, though. Maybe start within collateral, and safely push to purrr later on? ;-)
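One way to approximate this with purrr alone is to nest the two adverbs. The sketch below is only an illustration of the idea, not how collateral implements its `*_peacefully` functions; note that the quietly slots end up nested one level down, inside `result`.

```r
library(purrr)

# quietly() captures output, warnings and messages;
# safely() around it additionally captures errors
peaceful_log <- safely(quietly(log))

warned <- peaceful_log(-1)   # warning, no error
failed <- peaceful_log("a")  # error

warned$result$warnings  # captured warning text
warned$error            # NULL: the call succeeded
failed$result           # NULL: the call failed
failed$error            # the captured error condition
```

A dedicated `peacefully` function could flatten this into a single list with `result`, `output`, `warnings`, `messages` and `error` slots.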

Add example datasets

Although the current docs use mtcars as an example dataset, they have to manually alter some rows in order to get things to fail, which makes things somewhat unclear. I think it would be better to provide an example dataset or two that I can include with the package and that readers can jump into straight away.

Refresh and solicit feedback on vignettes

As well as scanning the docs to ensure they're still current for 0.5.1, I need to rewrite the vignettes and see how readable they are. The current one could use some work, and I'd like to split up vignette topics (or sections) around these ideas:

Intro to list columns and associated workflows
Collateral basics:
- What am I looking at?
- Extracting side effects
More advanced workflows
- Filtering on and summarising side effects
- Successive operations with collateral

I'd really like to work out coloured output first, but I probably shouldn't make it a must.

Of course, if someone wants to take a swing at this, I'd be happy to accept a PR!

Integration with reticulate

I've tested collateral by mapping R functions that in turn call python functions, and that's fine to some extent (I haven't tested mapping on a python function directly—I suspect I would need to modify the source to make it work, but I could be wrong).

But side effect management leaves something to be desired at the moment:

Python output streams aren't captured at all unless the python function call is wrapped in reticulate::py_capture_output().
Output streams captured in the above way come out as one long stream—that is to say, individual warnings or outputs aren't separated into a character vector.
Although reticulate::py_capture_output() allows the user to specify either stdout or stderr (or both), types of side effects aren't otherwise separated. In fact, it's not entirely clear to me yet (as I learn more python) whether python's and R's side effects map to each other 1:1 at all.

Ideally collateral would get to a place where python functions "just work" in a predictable and fairly reasonable way.

how to combine `quietly_mapped` and `safely_mapped` outputs?

Is there a way to combine output from quietly_mapped and safely_mapped? Maybe a c() method would be useful?

Currently, calling rbind() on two data frames that each contain a *_mapped column leads to an inconsistency: the error/warning marks of the second data frame disappear.

library(collateral)
library(tidyverse)

df_warn <- data_frame(value = list(-1, 0, 1)) %>% 
  mutate(output = map_quietly(value, log ))
         
df_err <- data_frame(value = list("1", 4)) %>% 
  mutate(output = map_safely(value, log ))

df_err
#> # A tibble: 2 x 2
#>   value     output  
#>   <list>    <collat>
#> 1 <chr [1]> _ E     
#> 2 <dbl [1]> R _
df_warn
#> # A tibble: 3 x 2
#>   value     output  
#>   <list>    <collat>
#> 1 <dbl [1]> R O _ W 
#> 2 <dbl [1]> R O _ _ 
#> 3 <dbl [1]> R O _ _

# here the W mark disappears:
rbind(df_err, 
      df_warn)
#> # A tibble: 5 x 2
#>   value     output  
#>   <list>    <collat>
#> 1 <chr [1]> _ E     
#> 2 <dbl [1]> R _     
#> 3 <dbl [1]> R _     
#> 4 <dbl [1]> R _     
#> 5 <dbl [1]> R _

# reverse, here the E mark disappears:
rbind(df_warn, 
      df_err)
#> # A tibble: 5 x 2
#>   value     output  
#>   <list>    <collat>
#> 1 <dbl [1]> R O _ W 
#> 2 <dbl [1]> R O _ _ 
#> 3 <dbl [1]> R O _ _ 
#> 4 <chr [1]> _ _ _ _ 
#> 5 <dbl [1]> R _ _ _

Created on 2018-11-25 by the reprex package (v0.2.1)

Extract the "output" as a new column from a `collateral` class column [question]

Hello,

I have created a function that tries to convert text in dd/mm/yyyy format to dates. I have also included a test tibble, test_data, with some bad inputs.

library("dplyr")
#> Warning: package 'dplyr' was built under R version 4.2.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library("magrittr")
#> Warning: package 'magrittr' was built under R version 4.2.2
library("purrr")
#> Warning: package 'purrr' was built under R version 4.2.2
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names
library("collateral")
#> Warning: package 'collateral' was built under R version 4.2.2
library("lubridate")
#> Warning: package 'lubridate' was built under R version 4.2.2
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

convert_dmy_text_to_date <- function(input) {
  if(length(class(input)) == 1) {
    if(class(input) == "character") {
      return(as.Date.character(lubridate::dmy(input)))
    } else if(class(input) == "logical") {
      return(NA)
    }
  }
  return(as.Date.character(lubridate::ymd(input)))
}

test_data <- tibble::tibble(
  test_date = list(
    "25/10/2022",
    "620/12/2022",
    as.POSIXct(x= "2022-10-07", tz = "UTC"),
    NA,
    0,
    "28/22/2022"
  )
)

#test <- hk_data$`Date of scan`
test <- test_data %>%
  dplyr::mutate(
    converted_date_log = collateral::map_peacefully(
      .x = .data[["test_date"]],
      .f = convert_dmy_text_to_date
    )
  )

test
#> # A tibble: 6 × 2
#>   test_date  converted_date_log
#>   <list>     <collat>          
#> 1 <chr [1]>  R _ _ _ _         
#> 2 <chr [1]>  R _ _ W _         
#> 3 <dttm [1]> R _ _ _ _         
#> 4 <lgl [1]>  R _ _ _ _         
#> 5 <dbl [1]>  R _ _ W _         
#> 6 <chr [1]>  R _ _ W _

Created on 2023-03-13 with reprex v2.0.2

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Singapore.utf8
#>  ctype    English_Singapore.utf8
#>  tz       Asia/Kuala_Lumpur
#>  date     2023-03-13
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  collateral  * 0.5.2   2021-10-25 [1] CRAN (R 4.2.2)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr       * 1.1.0   2023-01-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
#>  fs            1.6.1   2023-02-06 [1] CRAN (R 4.2.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.2.2)
#>  magrittr    * 2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr       * 1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
#>  styler        1.9.1   2023-03-04 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
#>  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.2.2)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2   2023-01-23 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.37    2023-01-31 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
#> 
#>  [1] D:/Jeremy/PortableR/RPortableLibraries/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I have managed to create a column called converted_date_log to see the collateral output.
May I ask if there is a way to "unnest" the output component of converted_date_log?

For example

test <- test_data %>%
  dplyr::mutate(
    converted_date = collateral::extract_output(.data[["converted_date_log"]])
  )

to give

I understand that it is possible to do it this way

test <- test_data %>%
  dplyr::mutate(
    converted_date = purrr::map_vec(
      .x = .data[["test_date"]],
      .f = convert_dmy_text_to_date
    ),
    converted_date_log = collateral::map_peacefully(
      .x = .data[["test_date"]],
      .f = convert_dmy_text_to_date
    )
  )

but it feels like I am using purrr twice.
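A possible way to avoid mapping the function twice is to pull the component out of the existing column with `purrr::map()` and a name, as the first issue on this page does with `map(my_safe_column, 'result')`. The sketch below uses a hand-built stand-in list instead of a real collateral column, and the element names (`result`, `output`, `messages`, `warnings`, `error`) are an assumption about its structure.

```r
library(purrr)

# Hypothetical stand-in for two elements of a peacefully-mapped column;
# the element names are assumed, not taken from collateral's source
converted_date_log <- list(
  list(result = as.Date("2022-10-25"), output = "",
       messages = character(0), warnings = character(0), error = NULL),
  list(result = as.Date(NA), output = "",
       messages = character(0), warnings = "failed to parse", error = NULL)
)

# Extract the `result` component of every element by name
converted_date <- map(converted_date_log, "result")
```

Inside `dplyr::mutate()`, the same call could create `converted_date` from `converted_date_log` in a single extra line, without running `convert_dmy_text_to_date` again.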