In a multicore plan, if the expression fails (e.g., due to timeout here) then the retu

FYI, this was started initially in <a class="issue-link js-issue-link" data-error-text

Thank you for the prompt and detailed response! <code

Actually, I'm having a difficult time meeting my expectation of <code class="notransla

setTimeLimit(): varying results in multicore with failing job about future HOT 7 OPEN

r2evans commented on June 7, 2024

setTimeLimit(): varying results in multicore with failing job

from future.

Comments (7)

r2evans commented on June 7, 2024

FYI, this was started initially in rstudio/promises#86 in a related effort; the double-reporting issue there I think has been resolved in dev, but I think the variable failing is future-specific.

from future.

HenrikBengtsson commented on June 7, 2024

Thanks for this. Some quick initial comments. I think I understand what's going on. The time-out limit is not reset on the worker after the future expression has been finished. This means it'll also apply to the code wrapping up the results and sending it back to the parent R process.

Depending on exactly when this timeout kicks in, you'll get different types of errors. Your different examples show this and makes sense to me. For example, the "unexpected result" error message is from an internal sanity check that makes sure that the proper data structure has been received by the parent R process, but in your example it received nothing, indicating that the connection was interrupted. That error is actually of class UnexpectedFutureResultError, which is a special instance of class FutureError. To distinguish them from regular run-time evaluation errors, the FutureError class indicates an error in the "orchestration" of futures. When those happen, they can be quite disruptive, and depending on parallel backend, you might have to reset the backend, or even restart R.

I think that if you add setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE) to the end of your future expression, e.g.

thisfut <- future({
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))  ## TESTING ONLY!
  setTimeLimit(elapsed = 2, transient = TRUE)
  Sys.sleep(3)
  1L
})

then this problem should go away? Is that the case? If so, I'll look into possibly adding a setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE) automatically.

The reason why I say TESTING ONLY!, is that I have yet not decided if it is a good idea to use on.exit() in future expressions. It works because of how it's evaluated internally, but I'm not yet ready to promise that behavior, but it's good enough for our testing purposes here.

BTW, Sys.sleep() is very special when it comes to timeouts, and its behavior varies with OS, cf. HenrikBengtsson/Wishlist-for-R#47.

from future.

r2evans commented on June 7, 2024

Thank you for the prompt and detailed response!

Sys.sleep() is just a placeholder for longer-running expressions that I need to be bounded, ergo setTimeLimit. Note taken. (And I learned something, thanks!)
Especially with the explicit transient=TRUE, I'm surprised that the on.exit(..) changes any behavior, much less fixes this problem. My interpretation of transient=TRUE is that as soon as the current computation is complete (i.e., everything within the future({...}), it should automatically return to the default Inf limits. Do think think this could be a bug in setTimeLimit, or am I misinterpreting it?
Edit: on re-reading your comment, the transient= appears to be insufficient due to the extra code you're running for futures-management, I get it, so some of your code is also not being allowed to continue operation. Would it change any behavior to run setTimeLimit(.) and my real code within a local(.) or other further-constricting environment?

I see that this (on.exit(..)) trick is what you do in R.utils::withTimeout, which suggests to me that it's just a "feature" (perhaps not quite "known deficiency") of setTimeLimit.

I understand that you are concerned with imposing this as default behavior. For now I'm going to have to use it in production (so that failed futures don't leak workers, a problem I've been dealing with). Do you think the use of reg.finalizer is any more or less risky than on.exit?

from future.

HenrikBengtsson commented on June 7, 2024

I understand that you are concerned with imposing this as default behavior. For now I'm going to have to use it in production (so that failed futures don't leak workers, a problem I've been dealing with). Do you think the use of reg.finalizer is any more or less risky than on.exit?

I'm only hesitant because I haven't had a deep focus of thinking about what it would entail to allow on.exit() within a future expression. Quite simplified, you can think of the current implementation for evaluating futures to looks something like:

setup_future_code()

result <- tryCatch({
  local(<future expression here>)
}, error = function(ex) {
  ...
})

teardown_future_code()
send_results_back(result)

Since it's evaluated within a local() already, then on.exit() works already now. I doubt that will change, but I do not want to make a full commitment to it right now. An advantage of recommending/requiring an explicit local(), is that you need that in vanilla sequential code, e.g.

res <- local({
  on.exit(...)
  42L
})

You cannot just do:

res <- {
  on.exit(...)
  42L
}

So, for future() analogue to regular evaluation, requiring local() makes more sense, especially so when replacing <- with future's %<-% parallel assignment operator.

Would it change any behavior to run setTimeLimit(.) and my real code within a local(.) or other further-constricting environment?

Yes, using an explicit local() would guarantee that on.exit() will always work, with or without futures. So,

thisfut <- future(local{
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)) 
  setTimeLimit(elapsed = 2, transient = TRUE)
  Sys.sleep(3)
  1L
}))

would be the most future-proof solution (pun intended), and it explicitly communicates the scope and intent.

That said, I will consider adding something like on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)) internally, to protect against hidden timeouts, which might be a left-over from some deep-down dependency. Doing so would be inline with recent attempts to undo R options and environment set, so they do not spill over future (pun 2) futures. To clarify, with the current behavior, you can create a long-term timeout in one future, which will remain if run on a persistent worker (e.g. multisession), and might trigger a timeout, say, in one hour from now.

from future.

r2evans commented on June 7, 2024

Actually, I'm having a difficult time meeting my expectation of setTimeLimit even outside of future.

 print({
   setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
   e <- environment()
   reg.finalizer(e,
                 function(ign) {cat('fin1\n');setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE);},
                 onexit = TRUE)
   on.exit({cat('fin2\n');setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE);})
   setTimeLimit(cpu = 1, elapsed = 1, transient = TRUE)
   system("sleep 3")
   # Sys.sleep(3)
   message("hey")
   setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
   1L
 })

Also done with print(local({...})). Also done using Sys.sleep instead of system. Clearly overkill.

Some of the time I get Error: reached elapsed time limit, but usually it happily messages "hey" and returns 1L. I can't seem to find a way to get any expression to reliably fail on a time limit, either in future or not ... off-topic for this issue but I think I'd like to shore-up this issue so that it behaves as expected both inside and outside of a future. Do you have any insight?

My expectation is that this should always: fail, show the error message, not display "hey", and not return 1L. Clearly I'm confused about something :-/

from future.

HenrikBengtsson commented on June 7, 2024

Do you have any insight?

I only skimmed your comment here, but my first instinct is that also system() might not be interruptable, or at least have a delayed interruption. Try with for (kk in 1:300) Sys.sleep(0.01) instead and see if that makes a difference.

Also, it doesn't look like that finalizer makes a difference in this example, since it'll only kick in if you delete e, and then it's unpredictable when the GC will clean it up.

from future.

r2evans commented on June 7, 2024

Okay, I think it's finally sinking in: neither Sys.sleep nor system in this usage are good tests for setTimeLimit, locally or in a future. Replacing system("sleep 3") and Sys.sleep(3) with something like (function() replicate(300, Sys.sleep(0.1)))() (I was too lazy to define it as a named func) seems to remedy the symptoms. Incidentally, so does processx::run("sleep", "3") (I really don't like system/system2 :-/)

For reg.finalizer, I had assumed that once we extracted the value from a multicore worker, the forked process and therefore the environment would be harvested, so it should have been finalized. Perhaps, as with many things gc-related, it just isn't happening as quickly as I assumed.

I should have listened more closely to your first cautions: on.exit(..) does not seem necessary when using either the looped-Sys.sleep or the processx variant.

Thank you for the dialogue!

from future.

setTimeLimit(): varying results in multicore with failing job about future HOT 7 OPEN

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent