
progressr: An Inclusive, Unifying API for Progress Updates

The progressr package provides a minimal API for reporting progress updates in R. The design is to separate the representation of progress updates from how they are presented. What type of progress to signal is controlled by the developer. How these progress updates are rendered is controlled by the end user. For instance, some users may prefer visual feedback such as a horizontal progress bar in the terminal, whereas others may prefer auditory feedback.

[Logo: three strokes writing the number three in Chinese]

Design motto:

The developer is responsible for providing progress updates but it's only the end user who decides if, when, and how progress should be presented. No exceptions will be allowed.

Two Minimal APIs - One For Developers and One For End-Users

Developer's API

1. Set up a progressor with a certain number of steps:

p <- progressor(nsteps)
p <- progressor(along = x)

2. Signal progress:

p()               # one-step progress
p(amount = 0)     # "still alive"
p("loading ...")  # pass on a message
    
End-user's API

1a. Subscribe to progress updates from everywhere:

handlers(global = TRUE)

y <- slow_sum(1:5)
y <- slow_sum(6:10)

1b. Subscribe to a specific expression:

with_progress({
  y <- slow_sum(1:5)
  y <- slow_sum(6:10)
})

2. Configure how progress is presented:

handlers("progress")
handlers("txtprogressbar", "beepr")
handlers(handler_pbcol(enable_after = 3.0))
handlers(handler_progress(complete = "#"))

A simple example

Assume that we have a function slow_sum() for adding up the values in a vector. It is so slow that we would like to provide progress updates to whoever might be interested. With the progressr package, this can be done as:

slow_sum <- function(x) {
  p <- progressr::progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}

Note how there are no arguments in the code that specify how progress is presented. The only task for the developer is to decide where in the code it makes sense to signal that progress has been made. As we will see next, it is up to the end user of this code to decide whether they want to receive progress updates or not, and, if so, in what format.

Without reporting on progress

When calling this function as in:

> y <- slow_sum(1:10)
> y
[1] 55
>

it behaves like any other function and no progress updates are displayed.

Reporting on progress

If we are only interested in progress for a particular call, we can do:

> library(progressr)
> with_progress(y <- slow_sum(1:10))
  |====================                               |  40%

However, if we want to report on progress from every call, wrapping the calls in with_progress() might become too cumbersome. If so, we can enable the global progress handler:

> library(progressr)
> handlers(global = TRUE)

so that progress updates are reported on wherever signaled, e.g.

> y <- slow_sum(1:10)
  |====================                               |  40%
> y <- slow_sum(10:1)
  |========================================           |  80%

This requires R 4.0.0 or newer. To disable this again, do:

> handlers(global = FALSE)

In the below examples, we will assume handlers(global = TRUE) is already set.

Customizing how progress is reported

Terminal-based progress bars

The default is to present progress via utils::txtProgressBar(), which is available on all R installations. It presents itself as an ASCII-based horizontal progress bar in the R terminal. This is rendered as:

SVG animation of the default "txtprogressbar" progress handler

We can tweak this "txtprogressbar" handler to use red hearts for the bar, e.g.

handlers(handler_txtprogressbar(char = cli::col_red(cli::symbol$heart)))

which results in:

SVG animation of the "txtprogressbar" progress handler with red hearts

Another example is:

handlers(handler_pbcol(
      adjust = 1.0,
    complete = function(s) cli::bg_red(cli::col_black(s)),
  incomplete = function(s) cli::bg_cyan(cli::col_black(s))
))

which results in:

SVG animation of the "pbcol" progress handler with text aligned to the right

To change the default to, say, cli_progress_bar() of the cli package, set:

handlers("cli")

This progress handler will present itself as:

SVG animation of the default "cli" progress handler

To instead use progress_bar() by the progress package, set:

handlers("progress")

This progress handler will present itself as:

SVG animation of the default "progress" progress handler

To set the default progress handler, or handlers, in all your R sessions, call progressr::handlers(...) in your ~/.Rprofile startup file.
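For example, a minimal sketch of what such an ~/.Rprofile entry could look like (the handler choice here is just an illustration):

## In ~/.Rprofile - pick whichever handlers you prefer
if (requireNamespace("progressr", quietly = TRUE)) {
  progressr::handlers("txtprogressbar", "beepr")
}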

Auditory progress updates

Progress updates do not have to be presented visually. They can equally well be communicated via audio. For example, using:

handlers("beepr")

will present itself as sounds played at the beginning, while progressing, and at the end (using different beepr sounds). No output is written to the terminal:

> y <- slow_sum(1:10)
> y
[1] 55
>

Concurrent auditory and visual progress updates

It is possible to have multiple progress handlers presenting progress updates at the same time. For example, to get both visual and auditory updates, use:

handlers("txtprogressbar", "beepr")

Silence all progress

To silence all progress updates, use:

handlers("void")

Further configuration of progress handlers

Above we have seen examples where handlers() takes one or more strings as input, e.g. handlers(c("progress", "beepr")). This is short for a more flexible specification where we can pass a list of handler functions, e.g.

handlers(list(
  handler_progress(),
  handler_beepr()
))

With this construct, we can make adjustments to the default behavior of these progress handlers. For example, we can configure the format, width, and complete arguments of progress::progress_bar$new(), and tell beepr to use a different finish sound and generate sounds at most every two seconds by setting:

handlers(list(
  handler_progress(
    format   = ":spin :current/:total (:message) [:bar] :percent in :elapsed ETA: :eta",
    width    = 60,
    complete = "+"
  ),
  handler_beepr(
    finish   = "wilhelm",
    interval = 2.0
  )
))

Sticky messages

As seen above, some progress handlers present the progress message as part of their output, e.g. the "progress" handler displays the message as part of the progress bar. It is also possible to "push" the message up together with other terminal output. This can be done by adding the class attribute "sticky" to the progression signaled. This works for several progress handlers that output to the terminal. For example, with:

slow_sum <- function(x) {
  p <- progressr::progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    p(sprintf("Step %d", kk), class = if (kk %% 5 == 0) "sticky", amount = 0)
    p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}

we get

> handlers("txtprogressbar")
> y <- slow_sum(1:30)
Step 5
Step 10
  |====================                               |  43%

and

> handlers("progress")
> y <- slow_sum(1:30)
Step 5
Step 10
/ [===============>-------------------------]  43% Adding 13

Use regular output as usual alongside progress updates

In contrast to other progress-bar frameworks, output from message(), cat(), print(), and so on will not interfere with progress reported via progressr. For example, say we have:

slow_sqrt <- function(xs) {
  p <- progressor(along = xs)
  lapply(xs, function(x) {
    message("Calculating the square root of ", x)
    Sys.sleep(2)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

we will get:

> library(progressr)
> handlers(global = TRUE)
> handlers("progress")
> y <- slow_sqrt(1:8)
Calculating the square root of 1
Calculating the square root of 2
- [===========>-----------------------------------]  25% x=2

This works because progressr briefly buffers any output internally and only releases it when the next progress update is received, just before the progress is re-rendered in the terminal. This is why you see a two-second delay when running the above example. Note that, if we use progress handlers that do not output to the terminal, such as handlers("beepr"), then output does not have to be buffered and appears immediately.

Comment: When signaling a warning using warning(msg, immediate. = TRUE), the message is immediately output to the standard-error stream. However, this is not possible to emulate when warnings are intercepted using calling handlers, which are used by with_progress(). This is a limitation of R that cannot be worked around. Because of this, such a call behaves the same as warning(msg) - that is, all warnings are buffered by R internally and released only when all computations are done.
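A minimal sketch of this limitation, reusing slow-running code inside with_progress(); the warning is shown only after with_progress() has completed, not when it is signaled:

with_progress({
  p <- progressor(2)
  p()
  warning("not shown until the very end", immediate. = TRUE)
  Sys.sleep(1)
  p()
})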

Support for progressr elsewhere

Note that progress updates signaled via progressr are designed to work out of the box with any iterator framework in R. Below is a set of examples for the most common ones.

Base R Apply Functions

library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  lapply(xs, function(x) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
#  |====================                               |  40%

The foreach package

library(foreach)
library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  foreach(x = xs) %do% {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}

my_fcn(1:5)
#  |====================                               |  40%

The purrr package

library(purrr)
library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  map(xs, function(x) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
#  |====================                               |  40%

The plyr package

library(plyr)
library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  llply(xs, function(x, ...) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
#  |====================                               |  40%

Note how this solution does not make use of plyr's .progress argument, because the above solution is more powerful and more flexible, e.g. we have more control over progress updates and their messages. However, if you prefer the traditional plyr approach, you can use .progress = "progressr", e.g. y <- llply(..., .progress = "progressr").

The knitr package

When compiling ("knitting") a knitr-based vignette, for instance via knitr::knit(), knitr shows the progress of code chunks processed thus far using a progress bar. In knitr (>= 1.42), we can use progressr for this progress reporting. To do this, set the R option knitr.progress.fun as:

options(knitr.progress.fun = function(total, labels) {
  p <- progressr::progressor(total, on_exit = FALSE)
  list(
    update = function(i) p(sprintf("chunk: %s", labels[i])),
    done = function() p(type = "finish")
  )
})

This configures knitr to signal progress via the progressr framework. To report on these, use:

progressr::handlers(global = TRUE)

Replace any cli progress bars with progressr updates

The cli package is used for progress reporting by many packages, notably those in the tidyverse. For instance, in purrr, you can do:

y <- purrr::map(1:100, \(x) Sys.sleep(0.1), .progress = TRUE)

to report on progress via the cli package as map() is iterating over the elements. Now, instead of using the default, built-in cli progress bar, we can customize cli to report on progress via progressr instead. To do this, set R option cli.progress_handlers as:

options(cli.progress_handlers = "progressr")

With this option set, cli will now report on progress according to your progressr::handlers() settings. For example, with:

progressr::handlers(c("beepr", "rstudio"))

progress is reported using beepr sounds and the RStudio Console progress panel.

To make cli report via progressr in all your R sessions, set the above R option in your ~/.Rprofile startup file.

Note: A cli progress bar can have a "name", which can be specified in purrr functions via the .progress argument, e.g. .progress = "processing". This name is then displayed in front of the progress bar. However, because the progressr framework does not have a concept of a progress "name", such names are silently ignored when using options(cli.progress_handlers = "progressr").
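As a minimal sketch of the above, with options(cli.progress_handlers = "progressr") already set, the name "processing" is silently dropped while progress is still reported via your progressr handlers:

y <- purrr::map(1:100, \(x) Sys.sleep(0.1), .progress = "processing")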

Parallel processing and progress updates

The future framework, which provides a unified API for parallel and distributed processing in R, has built-in support for the kind of progression updates produced by the progressr package. This means that you can use it with, for instance, future.apply, furrr, foreach with doFuture, and plyr or BiocParallel with doFuture. In contrast, non-future parallelization methods such as parallel's mclapply(), parallel::parLapply(), and foreach adapters like doParallel do not support progress reports via progressr.

future_lapply() - parallel lapply()

Here is an example that uses future_lapply() of the future.apply package to parallelize on the local machine while at the same time signaling progression updates:

library(future.apply)
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  future_lapply(xs, function(x, ...) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

foreach() with doFuture

Here is an example that uses foreach() of the foreach package together with %dofuture% of the doFuture package to parallelize while reporting on progress. This example parallelizes on the local machine, but it works also for remote machines:

library(doFuture)    ## %dofuture%
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  foreach(x = xs) %dofuture% {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

For existing code using the traditional %dopar% operator of the foreach package, we can register the doFuture adaptor and use progressr in the same way as above to report on progress updates:

library(doFuture)
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  foreach(x = xs) %dopar% {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

future_map() - parallel purrr::map()

Here is an example that uses future_map() of the furrr package to parallelize on the local machine while at the same time signaling progression updates:

library(furrr)
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  future_map(xs, function(x) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

Note: This solution does not involve the .progress = TRUE argument that furrr implements. Because progressr is more generic, and because .progress = TRUE only supports certain future backends and produces errors on non-supported backends, I recommend using the progressr package instead of .progress = TRUE.

BiocParallel::bplapply() - parallel lapply()

Here is an example that uses bplapply() of the BiocParallel package to parallelize on the local machine while at the same time signaling progression updates:

library(BiocParallel)
library(doFuture)
register(DoparParam())  ## BiocParallel parallelizes via %dopar%
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  bplapply(xs, function(x) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

plyr::llply(..., .parallel = TRUE) with doFuture

Here is an example that uses llply() of the plyr package to parallelize on the local machine while at the same time signaling progression updates:

library(plyr)
library(doFuture)
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  llply(xs, function(x, ...) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }, .parallel = TRUE)
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

Note: As an alternative to the recommended approach above, one can use .progress = "progressr" together with .parallel = TRUE, as shown in the sketch below. This requires plyr (>= 1.8.7).
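A minimal sketch of that alternative, reusing the setup from the previous example (xs is the input vector); here plyr creates the progressor itself, so no explicit progressor() call is needed:

my_fcn <- function(xs) {
  llply(xs, function(x, ...) {
    Sys.sleep(6.0 - x)
    sqrt(x)
  }, .parallel = TRUE, .progress = "progressr")
}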

Near-live versus buffered progress updates with futures

As of November 2020, there are four types of future backends that are known(*) to provide near-live progress updates:

  1. sequential,
  2. multicore,
  3. multisession, and
  4. cluster (local and remote)

Here "near-live" means that the progress handlers will report on progress almost immediately when the progress is signaled on the worker. For all other future backends, the progress updates are only relayed back to the main machine and reported together with the results of the futures. For instance, if future_lapply(X, FUN) chunks up the processing of, say, 100 elements in X into eight futures, we will see progress from each of the 100 elements as they are done when using a future backend supporting "near-live" updates, whereas we will only see those updated to be flushed eight times when using any other types of future backends.

(*) Other future backends may gain support for "near-live" progress updating later. Adding support for those is independent of the progressr package. Feature requests for adding that support should go to those future-backend packages.

Note of caution - sending progress updates too frequently

Signaling progress updates comes with some overhead. In situations where we use progress updates, this overhead is typically much smaller than the task we are processing in each step. However, if the task we iterate over is quick, then the extra time induced by the progress updates might end up dominating the overall processing time. If that is the case, a simple solution is to only signal progress updates every n:th step. Here is a version of slow_sum() that signals progress every 10:th iteration:

slow_sum <- function(x) {
  p <- progressr::progressor(length(x) / 10)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    if (kk %% 10 == 0) p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}

The overhead of progress signaling may depend on context. For example, in parallel processing with near-live progress updates via 'multisession' futures, each progress update is communicated via a socket connection back to the main R session. These connections might become clogged up if progress updates are too frequent.

Progress updates in non-interactive mode ("batch mode")

When running R from the command line, R runs in non-interactive mode (interactive() returns FALSE). The default behavior of progressr is to not report on progress in non-interactive mode. To report on progress also then, set the R option progressr.enable, or the environment variable R_PROGRESSR_ENABLE, to TRUE. For example,

$ Rscript -e "library(progressr)" -e "with_progress(y <- slow_sum(1:10))"

will not report on progress, whereas

$ export R_PROGRESSR_ENABLE=TRUE
$ Rscript -e "library(progressr)" -e "with_progress(y <- slow_sum(1:10))"

will.
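Alternatively, the option can be set from within the R code itself; a minimal sketch:

options(progressr.enable = TRUE)
library(progressr)
with_progress(y <- slow_sum(1:10))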

Roadmap

Because this project is under active development, the progressr API is currently kept to a minimum. This allows the framework and the API to evolve while minimizing the risk of breaking code that depends on it. The roadmap for developing the API is roughly:

  • Provide minimal API for producing progress updates, i.e. progressor(), with_progress(), handlers()

  • Add support for global progress handlers, removing the need for the user to specify with_progress(), i.e. handlers(global = TRUE) and handlers(global = FALSE)

  • Make it possible to create a progressor also in the global environment (see 'Known issues' below)

  • Add support for nested progress updates

  • Add API to allow users and package developers to design additional progression handlers

For a more up-to-date view on what features might be added, see https://github.com/HenrikBengtsson/progressr/issues.

Appendix

Known issues

A progressor cannot be created in the global environment

It is not possible to create a progressor in the global environment, e.g. at the top level of a script. It has to be created inside a function, within with_progress({ ... }), local({ ... }), or a similar construct. For example, the following:

library(progressr)
handlers(global = TRUE)

xs <- 1:5
p <- progressor(along = xs)
y <- lapply(xs, function(x) {
  Sys.sleep(0.1)
  p(sprintf("x=%g", x))
  sqrt(x)
})

results in an error if tried:

Error in progressor(along = xs) : 
  A progressor must not be created in the global environment unless wrapped in a
  with_progress() or without_progress() call. Alternatively, create it inside a
  function or in a local() environment to make sure there is a finite life span
  of the progressor

The solution is to wrap it in a local({ ... }) call, or more explicitly, in a with_progress({ ... }) call:

library(progressr)
handlers(global = TRUE)

xs <- 1:5
with_progress({
  p <- progressor(along = xs)
  y <- lapply(xs, function(x) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
})
#  |====================                               |  40%

The main reason for this is to limit the life span of each progressor. If we created it in the global environment, there would be a significant risk that it never finishes and blocks all subsequent progressors.

The global progress handler cannot be set everywhere

It is not possible to call handlers(global = TRUE) in all circumstances. For example, it cannot be called within tryCatch() or withCallingHandlers():

> tryCatch(handlers(global = TRUE), error = identity)
Error in globalCallingHandlers(NULL) : 
  should not be called with handlers on the stack

This is not a bug - neither in progressr nor in R itself. It's due to a conservative design of how global calling handlers should work in R. If it were allowed, there would be a risk of weird and unpredictable behavior when messages, warnings, errors, and other types of conditions are signaled.

Because tryCatch() and withCallingHandlers() are used in many places throughout base R, this means that we also cannot call handlers(global = TRUE) as part of a package's startup process, e.g. in .onLoad() or .onAttach().

Another example of this error is if handlers(global = TRUE) is used inside package vignettes and dynamic documents such as R Markdown. In such cases, the global progress handler has to be enabled prior to processing the document, e.g.

> progressr::handlers(global = TRUE)
> rmarkdown::render("input.Rmd")

Under the hood

When using the progressr package, progression updates are communicated via R's condition framework, which provides methods for creating, signaling, capturing, muffling, and relaying conditions. Progression updates are of classes progression and immediateCondition(*). The figure below gives an example of how progression conditions are created, signaled, and rendered.

(*) The immediateCondition class of conditions is relayed as soon as possible by the future framework, which means that progression updates produced in parallel workers are reported to the end user as soon as the main R session has received them.

Figure: Sequence diagram illustrating how signaled progression conditions are captured by with_progress(), or the global progression handler, and relayed to the two progression handlers 'progress' (a progress bar in the terminal) and 'beepr' (auditory) that the end user has chosen.
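Because progression updates are ordinary R conditions of class "progression", they can also be captured with standard condition handlers. Below is a minimal sketch (not part of the progressr API) that counts the progression conditions signaled by an expression:

library(progressr)

count_progressions <- function(expr) {
  n <- 0
  withCallingHandlers(expr, progression = function(cond) {
    ## every progress update signaled via a progressor ends up here
    n <<- n + 1
  })
  n
}

count_progressions(local({
  p <- progressor(steps = 3)
  for (i in 1:3) p()
}))
## returns the number of progression conditions signaled; a progressor may
## signal additional lifecycle updates besides the three p() calls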

Debugging

To debug progress updates, use:

> handlers("debug")
> with_progress(y <- slow_sum(1:3))
[23:19:52.738] (0.000s => +0.002s) initiate: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:52.739] (0.001s => +0.000s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:52.942] (0.203s => +0.002s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.145] (0.407s => +0.001s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.348] (0.610s => +0.002s) update: 1/3 (+1) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
M: Adding value 1
[23:19:53.555] (0.817s => +0.004s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.758] (1.020s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.961] (1.223s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.165] (1.426s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.368] (1.630s => +0.001s) update: 2/3 (+1) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
M: Adding value 2
[23:19:54.574] (1.835s => +0.003s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.777] (2.039s => +0.001s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.980] (2.242s => +0.001s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:55.183] (2.445s => +0.001s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:55.387] (2.649s => +0.001s) update: 3/3 (+1) 'P: Adding 3' {clear=TRUE, enabled=TRUE, status=}
[23:19:55.388] (2.650s => +0.003s) update: 3/3 (+0) 'P: Adding 3' {clear=TRUE, enabled=TRUE, status=}
M: Adding value 3
[23:19:55.795] (3.057s => +0.000s) shutdown: 3/3 (+0) 'P: Adding 3' {clear=TRUE, enabled=TRUE, status=ok}

Installation

R package progressr is available on CRAN and can be installed in R as:

install.packages("progressr")

Pre-release version

To install the pre-release version that is available in Git branch develop on GitHub, use:

remotes::install_github("HenrikBengtsson/progressr", ref="develop")

This will install the package from source.

Contributing

To contribute to this package, please see CONTRIBUTING.md.


progressr's Issues

Origin info

Add info on origin, e.g.

  • unique progress id,
  • unique session id,
  • pid,
  • hostname,
  • user name, ...

CAUTION: What will happen with with_progress() when there is forked processing inside

> handler <- function(c) {
    conds <<- c(conds, c)
    str(list(pid = Sys.getpid(), msg = conditionMessage(c)))
  }

> conds <- NULL
> withCallingHandlers(y <- lapply(1:2, message), message=handler)
List of 2
 $ pid: int 15267
 $ msg: chr "1\n"
1
List of 2
 $ pid: int 15267
 $ msg: chr "2\n"
2
> str(conds)
List of 4
 $ message: chr "1\n"
 $ call   : language FUN(X[[i]], ...)
 $ message: chr "2\n"
 $ call   : language FUN(X[[i]], ...)


> library(parallel)
> options(mc.cores = 2)
> conds <- NULL
> withCallingHandlers(y <- mclapply(1:2, message), message=handler)
List of 4
 $ message: chr "1\n"
 $ call   : language FUN(X[[i]], ...)
 $ message: chr "2\n"
 $ call   : language FUN(X[[i]], ...)List of 2
 $ pid: int 16522
 $ msg:List of 2
 $ pid: chr "1\n"
1
 int 16523
 $ msg: chr "2\n"
2

> str(conds)
NULL


> cl <- makeCluster(2)
> conds <- NULL
> withCallingHandlers(y <- parLapply(cl, 1:2, message), message=handler)
> str(conds)
NULL

Add argument and option to control update frequency

options(progressr.times = 10L) ## 10 updates per progress sequence
options(progressr.times = 1L) ## 1 update per progress sequence, i.e. when it finishes
options(progressr.times = 2L) ## 2 updates per progress sequence, i.e. when it starts and finishes
options(progressr.frequency = 1.0) ## 100% all updates
options(progressr.frequency = 0.1) ## 10% of the updates
options(progressr.interval = 5) ## At most one update per 5 seconds

add_progress({ ... })

Example:

with_progress(add_progress({
  x <- 1:100
  y <- slow_sum(x)
  z <- slow_sum(x^2)
  mu <- mean(z - y^2)
}), clear = FALSE)

could produce progress output that looks something like:

[==>---------]  25%: x <- 1:100         
[=====>------]  50%: y <- slow_sum(x)   
[========>---]  75%: z <- slow_sum(x^2) 
[============] 100%: mu <- mean(z - y^2)

or "cleverly" truncated, e.g.

[==>---------]  25%: x <- 1:1...
[=====>------]  50%: y <- slo...
[========>---]  75%: z <- slo...
[============] 100%: mu <- me...

BTW, we could also have a "step" progress reporter;

1/4: x <- 1:100         
2/4: y <- slow_sum(x)   
3/4: z <- slow_sum(x^2) 
4/4: mu <- mean(z - y^2)

or just

1: x <- 1:100         
2: y <- slow_sum(x)   
3: z <- slow_sum(x^2) 
4: mu <- mean(z - y^2)

BTW2, we might want to show what expression is currently being evaluated, e.g.

[>-----------]   0%: x <- 1:100 ...
[===>--------]  25%: y <- slow_sum(x) ...
[======>-----]  50%: z <- slow_sum(x^2) ...
[========>---]  75%: mu <- mean(z - y^2) ...
[============] 100%: DONE

First release: when?

What is a minimal, stable API that can be released? An API that allows us to add features later without breaking backward compatibility.

Should probably start prototyping in future.apply to see how well the existing API works and to figure out what's missing.

DESIGN: Value of with_progress()?

Currently with_progress(expr) returns the value of expr, which allows us to do either:

with_progress(y <- slow_sum(x))

or

y <- with_progress(slow_sum(x))

However, if we restrict ourselves to the first case, then we can have with_progress() return other things, e.g.

progress_summary <- with_progress(y <- slow_sum(x))

The question is if that is useful beyond debugging/troubleshooting.

Auto-finish

Instead of:

  progress <- progressor(steps = 4L)
  relay_progress <- progress_aggregator(progress)
  progress()
  relay_progress(slow_sum(1:3))
  relay_progress(slow_sum(1:10))
  progress()
  progress(type = "done")

One could skip the last step via:

  progress <- progressor(steps = 4L, auto_finish = TRUE)
  relay_progress <- progress_aggregator(progress)
  progress()
  relay_progress(slow_sum(1:3))
  relay_progress(slow_sum(1:10))
  progress()

DESIGN: Pros and cons of different solutions

Progress updates via '.progress' argument

Example:

y <- future_lapply(..., .progress = function(...) { ... })

Pros:

  • Clear that function produces progress info

Cons:

  • Requires specifying progress argument
  • It is not possible to listen to progress info from nested "internal" functions

Progress updates via centralized "progress" subscription

Example:

progress_subscribe(function(name, step, ...) {
  # called each time there is a progress update
})

y <- future_lapply(...)

Pros:

  • Anyone can subscribe
  • No "progress" argument

Cons:

  • Not clear how to handle nested progress updates
  • Not clear who can/should subscribe

Progress updates via condition signaling

Example:

progress_relay({
  y <- future_lapply(...)
})

Pros:

  • Anyone can listen from anywhere upstream
  • No "progress" argument
  • No need to update code when a downstream function introduces progress info (correct?)
  • Progress conditions can be captured, filtered, summarized locally in functions and then resignaled in different forms and shapes.

Cons:

  • Requires wrapping whole expressions, e.g. relay_progress(y <- foo())
  • End user needs to wrap calls, but we might be able to use a task callback handler to handle progress conditions that bubble up to the top level.

ROBUSTNESS: Future proof / stray progression updates from unknown sources

What if another_fun() all of a sudden starts signaling progression after a package update? We need to handle that and prevent it from confusing the handlers.

with_progress({
  progress <- progressor(steps = 4L)
  relay_progress <- progress_aggregator(progress)
  progress()
  x <- another_fun(x)
  relay_progress(slow_sum(1:3))
  relay_progress(slow_sum(1:10))
  progress()
})

progressor(steps, props)

Add support for relative sizes of progression steps, e.g.

progress <- progressor(steps = 6L, props = c(1/3,3/3,2/3))
progress <- progressor(steps = 4L, props = c(1,3,2))  ## automatically rescaled

Alternatively,

progress <- progressor(step_sizes = c(1L, 3L, 2L))

where steps = length(step_sizes) by default.

It would be neat if props could be modified along the way, e.g. if one of the steps ends up finding a shortcut (because of input data) and only needs, say, 50% of its processing "time".

Make it easy to only get a progress update when done

Make it easy to only get a progress update when done. There are many examples where people / code / packages have a long running process produce a "ping" at the very end.

Currently, the best option is to use times = 2L. When used, we will create the progress reporter, which may or may not present itself when created, and then only report on updates when the last step is progress():ed.

We could allow for times = 1L, which now becomes times = 2L, and, where supported, skip the setup step, and only report on the last (="100%") progress update.
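A hypothetical sketch of the current workaround, assuming the handler accepts a 'times' argument as discussed above:

handlers(handler_txtprogressbar(times = 2L))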

REPORTERS: List of possible backends

Auditory

Email

  • blastula: Easily Send HTML Email Messages
  • blatr: Send Emails Using 'Blat' for Windows
  • gmailr: Access the Gmail RESTful API
  • IMmailgun: Send Emails using 'Mailgun'
  • mail: Sending Email Notifications from R
  • mailR: A Utility to Send Emails from R
  • sendmailR: send email using R

Notification frameworks

  • notifier: Cross Platform Desktop Notifications
  • notifyme: Send Alerts to your Cellphone and Phillips Hue Lights
  • notifyR: Send push notifications to your smartphone via pushover.net (ACCOUNT REQUIRED) [package archived on CRAN on 2022-05-09]
  • pushoverr: Send Push Notifications using Pushover (ACCOUNT REQUIRED)
  • RPushbullet: R Interface to the Pushbullet Messaging Service (ACCOUNT REQUIRED)
  • ntfy (R package): ntfy (pronounce: notify) is a simple HTTP-based pub-sub notification service. It allows you to send notifications to your phone or desktop via scripts from any computer, entirely without signup, cost or setup. It's also open source if you want to run your own.
  • keep - Alerting. By developers, for developers.

Messages / Channels

  • slackr: Send Messages, Images, R Objects and Files to 'Slack' Channels/Users
  • streamR: Access to Twitter Streaming API via R
  • telegram: R Wrapper Around the Telegram Bot API
  • twitteR: R Based Twitter Client

Miscellaneous

  • txtq: A Small Message Queue for Parallel Processes
  • PingMe - supports many of the above message services (Discord, Email, Gotify, Line, Mastodon, Mattermost, Microsoft Teams, Pushbullet, Pushover, RocketChat, Slack, Telegram, Textmagic, Twillio, Zulip, and Wechat)
  • UnifiedPush: a set of specifications and tools that lets the user choose how push notifications* are delivered. All in a free and open source way.

Generalize to checkbox-ish progress states

Generalize to checkbox-ish progress states, i.e. instead of incremental updates, make it possible to do here-and-there updates. A textual representation of both kinds:

48%▕▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░▏

and

48%▕░▓▓░░▓░░▓░░░▓▓▓▓░▓▓▓▏

The former is a special case of the latter.

Time-based information

Time-based information can be inferred if we track the timestamp of each step (see the sketch after this list):

  • duration
  • duration per step
  • duration of last step
  • steps per second
  • ETA
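As an illustration only (not part of the progressr API), such statistics could be derived from recorded timestamps along these lines:

## 'timestamps': POSIXct times recorded at each completed step;
## 'total_steps': total number of steps. Both are hypothetical inputs.
eta <- function(timestamps, total_steps) {
  done <- length(timestamps)
  elapsed <- as.numeric(difftime(timestamps[done], timestamps[1], units = "secs"))
  per_step <- elapsed / max(done - 1, 1)   ## average duration per step
  per_step * (total_steps - done)          ## estimated time remaining
}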

without_progress()

Is there a need to muffle progress? A simple without_progress() function?
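For reference, current versions of progressr do provide without_progress(); a minimal sketch:

without_progress(y <- slow_sum(1:10))  ## progress updates are muffled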

PROOF OF CONCEPT: future.apply and relaying of conditions

Just a proof-of-concept that we can relay progression conditions.

p_future_sapply <- function(x, FUN, ...) {
  progress <- progressor(length(x))
  future.apply::future_sapply(x, FUN = function(x) { 
    progress()
    FUN(x)
  }, future.conditions = "condition")
}
library(future.apply)
plan(multisession, workers = 4)

library(progressr)
options(progressr.handlers = txtprogressbar_handler(clear = FALSE))

y <- p_future_sapply(1:10, FUN = function(x) { Sys.sleep(0.1); x })
print(y)
## [1]  1  2  3  4  5  6  7  8  9 10

with_progress(y <- p_future_sapply(1:10, FUN = function(x) { Sys.sleep(0.1); x }))
## |==============================================| 100%
print(y)
## [1]  1  2  3  4  5  6  7  8  9 10

Note that the progression conditions are only relayed after all the values have been collected, i.e. it's not very useful in this form. Somewhat better would be to relay the progression:s as soon as they arrive, cf. HenrikBengtsson/future#270

ROBUSTNESS: Too few steps - how to automatically finish up?

Moved from #40 (comment):

Likewise, if we take too few steps, a progressor may never signal "finish".

Can a progressor register an on.exit() for itself in the parent environment? Second best is a finalizer that the garbage collector will trigger, but that is likely to occur much later and possibly out of order compared to other progressors.

Can/should with_progress() auto close/finish progression handlers upon exit?

with_progress: Add arguments `delay_stdout` and `delay_conditions`

With with_progress(..., delay_stdout=FALSE, delay_conditions=character(0L)):

[=====>---------]  40% (x=4)O: x=5
M: x=5
O: x=4
M: x=4
[========>------]  60% (x=3)O: x=3
M: x=3
[===============] 100% (x=2)
O: x=2
M: x=2
O: x=1
M: x=1

With with_progress(..., delay_stdout=TRUE, delay_conditions="condition"):

[=====>---------]  40% (x=4)
[========>------]  60% (x=3)
[===============] 100% (x=2)
[===============] 100%
O: x=5    ## below is only outputted when everything is done
M: x=5
O: x=4
M: x=4
O: x=3
M: x=3
O: x=2
M: x=2
O: x=1
M: x=1

PERFORMANCE: Avoid progress updates if no one is listening

A quick thought: I wonder if it's possible for the progress publisher to request that subscribers confirm whether they're interested in the progress updates or not. If not, the publisher won't have to waste time creating and signaling progression conditions that will not be listened to.

ROBUSTNESS: lost conditions

If we ever get to the point where we signal progression conditions between hosts/processes via some network or file channel, it might be that some signaled conditions are lost.

In such cases, we should set up the protocol such that the progression conditions that arrive later will be enough to recover the progress state.

This should be easy with (total, step) in each condition.

Make 'close' a first class option

Although it is not supported by, or does not make sense for, all progress reporters, pass argument 'close' to all of them.

If TRUE, then when progress is completed, the reporter should close/hide the progress indicator.

Support clear = c("never", "complete", "always")

For filesize_handler() we don't want the file to be removed (clear = TRUE) if there is an error. Right now with_progress(..., cleanup = TRUE) will cause the progress file to be removed upon exit, including when it exits due to an error.

Progression handler classes

Progression handlers should extend the progression_handler class with their own names. This will make it easier to identify what they are in a list of handlers. It'll probably come in handy in other ways too.

FILTER: Progression update transformers

Make it possible to transform progression conditions, e.g. merge every 3 updates into one, or merge until an interval (in seconds) has passed and then resignal a new one.

If we can get this to work, there's no need for each handler to deal with 'times' and 'intervals'.

REPORTER: Debug/record/log/save handler

E.g. Save to File. Replay later

  • debug_handler
    • add support for debug_handler(..., record=env) where all received progression_conditions are recorded in environment env etc.
  • log_handler

Gather multiple progress sources and relay

foo <- function(x) {
  ## Four steps each with relative different amount of processing needed
  progress <- progressor(steps = 4L, props = c(1, 3*length(x), sqrt(length(x)), 1))
  relay_progress <- progress_relayer(progress)

  x <- transform(x)
  progress()

  relay_progress({
    y <- future_lapply(x, FUN = slow_something)
  })

  relay_progress({
    y <- future_lapply(y, FUN = slow_again)
  })

  z <- summarize(y)
  progress()

  progress(type = "done")

  z
}

ROBUSTNESS: Taking too many (or too few) steps

It's not hard to imagine bugs sneaking in over time such that too many progress updates are sent out, e.g.

foo <- function(x) {
  progress <- progressor(max_steps = 2)
  progress()
  y <- my_calc(x)
  if (y < 0) {
    y <- adjust(y)
    progress()  ## wasn't there in the first code iteration
  }
  progress()
  y
}

A progressor should probably be forgiving when it comes to "stepping" beyond max_steps. Producing an error would be too disruptive. A warning would be more appropriate.

Noise: a warning

> example("progress_aggregator", package = "progressr")

prgrs_> library(progressr)

prgrs_> message("progress_aggregator() ...")
progress_aggregator() ...

prgrs_> with_progress({
prgrs_+   progress <- progressor(steps = 4L)
prgrs_+   relay_progress <- progress_aggregator(progress)
prgrs_+   progress()
prgrs_+   relay_progress(slow_sum(1:3))
prgrs_+   relay_progress(slow_sum(1:10))
prgrs_+   progress()
prgrs_+   progress(type = "done")
prgrs_+ })
                                                                               
                                                                               
prgrs_> message("progress_aggregator() ... done")
progress_aggregator() ... done
Warning message:
In file.remove(pb_env$file) :
  cannot remove file '/tmp/hb/Rtmp1Zbpm9/file5ee415230c4b', reason 'No such file or directory'

REPORTER: filesize_handler

Report progress via the size of a file on the file system, e.g. in units of 0-100 bytes.

$ ls -l progressr/
-rw-rw-r--  1 hb   hb    100 Jan 17 18:00  task_001.progress
-rw-rw-r--  1 hb   hb   3021 Jan 17 18:00  task_001.progress.log
-rw-rw-r--  1 hb   hb     69 Jan 17 18:21  task_002.progress
-rw-rw-r--  1 hb   hb   1292 Jan 17 18:21  task_002.progress.log
-rw-rw-r--  1 hb   hb      0 Jan 17 18:21  task_003.progress
-rw-rw-r--  1 hb   hb   2010 Jan 17 18:21  task_003.progress.log
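For what it's worth, a handler along these lines now ships with the package as handler_filesize(); a hypothetical usage sketch (the file name is just an example):

progressr::handlers(progressr::handler_filesize(file = "slow_sum.progress"))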

API: harmonize names with progress, dplyr, ...

Right now most names are what they are because they were quickly made up while coding/thinking. To minimize friction for others, try to harmonize them where it makes sense, e.g. plyr uses 'init' while here I use 'startup'.

DESIGN: Progress session setup

A function needs to communicate the total amount of progress it will go through in its progress "session". This will require a well-established protocol to set information such as:

  1. Set title
  2. Set max steps
  3. Set ETA
  4. Set (step, label, params)
  5. Done

These pieces of information can be used when rendering the progress in the UI.

Handlers should be reset upon start

> options(progressr.handlers = txtprogressbar_handler(clear = FALSE))
> library(progressr); with_progress(slow_sum(1:10))
  |======================================================================| 100%
> library(progressr); with_progress(slow_sum(1:10))
> library(progressr); with_progress(slow_sum(1:10))
> 

Nested progression info

In order to support refined progression info at the top level, aggregators should/could relay nested info that provide info on the current step and any nested steps, e.g. step = c(2, 23). Same for messages and max_steps.
