Comments (3)

eddiebergman commented on August 26, 2024

You could look at this answer using sched: https://stackoverflow.com/a/474543/5332072

From there, dask has a way to essentially shutdown() the Client and then close() it.
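
A minimal sketch of the sched idea, assuming the deadline callback only needs to flip a flag. The names start_deadline and stop are made up for illustration; in DEHB the callback would presumably be whatever shuts down and closes the dask Client.

```python
import sched
import threading
import time


def start_deadline(seconds, on_deadline):
    """Run on_deadline() once, `seconds` from now, on a background thread."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    scheduler.enter(seconds, 1, on_deadline)
    # scheduler.run() blocks until the event has fired, so keep it off the main thread.
    thread = threading.Thread(target=scheduler.run, daemon=True)
    thread.start()
    return thread


# Hypothetical stand-in for "shut down the Client": just set a flag.
stop = threading.Event()
timer = start_deadline(0.1, stop.set)
timer.join()  # only so this demo can inspect the flag afterwards
print(stop.is_set())
```

The optimizer loop could then check the flag (or have the callback act directly) instead of relying only on its own budget check.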

from dehb.

eddiebergman commented on August 26, 2024

From a quick look through the code, the "hot loop" is here, with the break condition here:

if self._is_run_budget_exhausted(fevals, brackets, total_cost):
    break


To return on time

I would probably do something along these lines for the dask case. This should basically kill all jobs running in dask and wait for all of them to return. The wait isn't strictly necessary, but in principle it should be fine.

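# Cancel the outstanding jobs first, then wait for them, then close the client.
for future in self.futures:
    future.cancel()

# Optional wait. return_when must be passed by keyword: the second positional
# argument of concurrent.futures.wait() is the timeout.
concurrent.futures.wait(self.futures, return_when=concurrent.futures.ALL_COMPLETED)
self.client.close()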

Dask has the property that you can cancel running jobs, but in the non-dask case (here), where you're just calling the function directly, you can't cancel it because it runs in the same process. Killing it would mean killing the whole thing.

else:
    # skipping scheduling to Dask worker to avoid added overheads in the synchronous case
    self.futures.append(self._f_objective(job_info))

To circumvent this, you would need to run it in a subprocess of some kind and use psutil to effectively kill the process.
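
A rough sketch of the subprocess route, using only the standard library's subprocess module instead of psutil to keep it self-contained. The helper name run_until_deadline and the sleeping child are placeholders for illustration, not anything in DEHB.

```python
import subprocess
import sys


def run_until_deadline(code, timeout):
    """Run `code` in a child Python process; terminate it if it overruns.

    Returns True if the child finished on its own, False if it was terminated.
    """
    proc = subprocess.Popen([sys.executable, "-c", code])
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        proc.wait()       # reap the child so it does not linger
        return False


# The child stands in for a long-running target function.
finished = run_until_deadline("import time; time.sleep(30)", timeout=0.5)
print(finished)
```

Because the evaluation lives in its own process, terminating it leaves the optimizer itself running.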


To inform the process so you can save

This is much harder, especially when you don't control the target function. The first thing you need is a handle to the process that is running the target function. Then you can send a SIGTERM to that process with .terminate().

import psutil
process = psutil.Process(pid)  # pid: process ID of the thing to signal
process.terminate()

By OS convention, the correct response to SIGTERM is for the program to clean up and exit soon. The way to do this is with Python's signal module, specifically signal.signal:

import signal

def callback(signal_num, framestack) -> None:
    # ... cleanup, save a model, whatever
    pass

signal.signal(signal.SIGTERM, callback)

The tricky part is that users have to set this up themselves: their target function is what gets called, and the callback has to be registered once inside the process that runs it. I do not know how you'd like to do that. I think your best approach is simply to give an example and move on. Trying to handle this automatically would be a nightmare to build and maintain.

P.s.

This won't work with a custom remote dask server, as you have no way to send a signal to the other machine running the process (or maybe dask does?). It only works if things are done with local processes. Perhaps dask has some unified way of handling this.


Neeratyoy commented on August 26, 2024

@eddiebergman What do we do with the interrupted evaluation?
Assuming the evaluation is a deep learning model training run, is it okay to still exceed the runtime in order to trigger saving the current state?
@Bronzila Feel free to share your thoughts too.

