Comments (3)
You could look at this answer using sched: https://stackoverflow.com/a/474543/5332072
From there, dask has a way to essentially shutdown() the Client and then close() it.
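A minimal sketch of the sched idea, assuming the optimizer loop can poll a flag between evaluations. The names `start_deadline` and `optimize` are made up for illustration and are not part of DEHB; the `time.sleep` call stands in for one evaluation.

```python
import sched
import threading
import time


def start_deadline(seconds: float) -> threading.Event:
    """Return an Event that is set once `seconds` have elapsed."""
    expired = threading.Event()
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    scheduler.enter(seconds, 1, expired.set)
    # scheduler.run() blocks, so run it in a daemon thread.
    threading.Thread(target=scheduler.run, daemon=True).start()
    return expired


def optimize(budget_s: float) -> int:
    """Hypothetical hot loop that stops when the deadline fires."""
    expired = start_deadline(budget_s)
    iterations = 0
    while not expired.is_set():
        time.sleep(0.01)  # stand-in for one function evaluation
        iterations += 1
    return iterations
```

In the dask case, the point where the loop exits is where the Client would be shut down and closed. Note that this only checks the deadline between evaluations; it does not interrupt one that is already running.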
from dehb.
Based on a lookover, the "hot-loop" is here, with the break condition here:
Lines 750 to 751 in 54ce41c
To return on time
I would probably do something along these lines for the dask case. This should basically kill all jobs running in dask and wait for all of them to return. The wait isn't strictly necessary, but in principle it should be fine.
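    import concurrent.futures

    # Sketch only; dask futures have their own distributed.wait().
    self.client.close()
    for future in self.futures:
        future.cancel()
    # The second positional argument of wait() is `timeout`, so pass return_when by keyword.
    concurrent.futures.wait(self.futures, return_when=concurrent.futures.ALL_COMPLETED)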
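To make the cancel-then-wait pattern concrete without needing a dask cluster, here is a rough sketch using the standard library's `ThreadPoolExecutor` as a stand-in for the dask Client. `slow_job`, `cancel_and_wait` and `demo` are illustrative names only. One difference to keep in mind: a standard-library future can only be cancelled while it is still pending, whereas dask can also cancel running jobs.

```python
import concurrent.futures
import time


def slow_job(seconds: float) -> float:
    time.sleep(seconds)  # stand-in for one function evaluation
    return seconds


def cancel_and_wait(futures):
    """Cancel what can still be cancelled, then block until every future has settled."""
    for future in futures:
        future.cancel()  # returns False for a job that is already running
    return concurrent.futures.wait(
        futures, return_when=concurrent.futures.ALL_COMPLETED
    )


def demo():
    # One worker, so at most one job is running and the rest are pending.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = [executor.submit(slow_job, 0.2) for _ in range(4)]
        done, not_done = cancel_and_wait(futures)
    cancelled = [f for f in futures if f.cancelled()]
    return len(done), len(not_done), len(cancelled)
```

The job that had already started runs to completion, while the queued jobs are cancelled and still count as done for `wait()`.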
Dask lets you cancel running jobs, but in the non-dask case (here), where you're just calling the function directly, you can't cancel it because it runs in the same process. Killing it would mean killing the whole thing.
Lines 572 to 574 in 54ce41c
To circumvent this, you would need to run it in a subprocess of some kind and use psutil to effectively kill the process.
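As a rough illustration of the subprocess route, the sketch below uses only the standard library's `subprocess` module rather than psutil (with psutil, `psutil.Process(pid).terminate()` would play the same role). The helper name `run_with_timeout` and the idea of passing the work as a code string are assumptions made for the example, not how DEHB runs target functions.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_s: float) -> bool:
    """Run `code` in a child Python process; return True if it finished in time."""
    proc = subprocess.Popen([sys.executable, "-c", code])
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX; the parent process keeps running
        proc.wait()       # reap the child so it does not linger as a zombie
        return False
```

Because the evaluation lives in its own process, killing it leaves the optimizer itself untouched. The cost is that arguments and results have to cross a process boundary.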
To inform the process so you can save
This is much harder, especially when you don't control the target function. The first thing you need is a handle to the process that is running the target function. Then you can send a SIGTERM to the process (with .terminate()).
    import psutil

    pid = ...  # process id of the thing to signal
    process = psutil.Process(pid)
    process.terminate()
By OS convention, a process that receives SIGTERM should clean up and exit soon. The way to do this is to use Python's signal module, specifically signal.signal():
    import signal

    def callback(signal_num, framestack) -> None:
        # ... cleanup, save a model, whatever
        ...

    # Register the handler inside the process that runs the target function.
    signal.signal(signal.SIGTERM, callback)
The tricky part is that users have to specify this themselves, i.e. their target function is going to be called, and this callback has to be registered once inside the process that is running the target function. I do not know how you'd like to do that. I think your best approach is simply to give an example and move on. Trying to handle this automatically would be a nightmare to do and maintain.
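One possible shape for such an example, as a POSIX-only sketch: the child registers a SIGTERM handler that writes a checkpoint, and the parent terminates it and allows a short grace period before falling back to a hard kill. The checkpoint file, the `CHILD` script and the `terminate_and_collect` helper are all made up for illustration; on Windows `terminate()` does not deliver a catchable SIGTERM, so the handler would not run there.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical target: registers the handler inside its own process, then "trains".
CHILD = textwrap.dedent("""
    import signal, sys, time

    checkpoint_path = sys.argv[1]

    def on_sigterm(signum, frame):
        with open(checkpoint_path, "w") as f:
            f.write("saved")  # stand-in for saving a model
        sys.exit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    print("ready", flush=True)
    while True:
        time.sleep(0.1)
""")


def terminate_and_collect(grace_s: float = 5.0) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.txt")
        with subprocess.Popen(
            [sys.executable, "-c", CHILD, path], stdout=subprocess.PIPE, text=True
        ) as proc:
            proc.stdout.readline()  # block until the handler is registered
            proc.terminate()        # SIGTERM: ask the child to clean up
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                proc.kill()         # grace period over, no more chances to save
        with open(path) as f:
            return f.read()
```

The grace period is the design choice here: it bounds how far the run may overshoot its time limit while the child saves its state.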
P.s.
This won't work with a custom remote dask server, as you have no way to send a signal to the other machine running the process (or maybe dask does?); it only works when things run as local processes. Perhaps dask has some unified way of handling this.
from dehb.
@eddiebergman What do we do with the interrupted evaluation?
Assuming the evaluation is a deep learning model training, is it okay to still exceed the runtime in order to trigger saving the current state?
@Bronzila feel free to share your thoughts too
from dehb.