
Comments (3)

miekkasarki commented on July 30, 2024

I support this feature. We already have a "worker" thread that monitors the simulation and whose only job right now is to print the progress file. In principle, we could signal that thread with "please stop my simulation ASAP", and the worker thread would then set the MAX_CPU_TIME end condition for all markers that are currently being simulated or whose simulation has not yet started. The whole particle queue would then be flushed within a minute or so, and you would get your intermediate results stored in the HDF5 file.

As for the signal, what you suggest would probably work in your case, where the progress meter can be trusted. However, I would prefer that the job be terminated gracefully in two cases:

  1. SLURM signals that the job is approaching its time limit, at which point it would be forcefully terminated
  2. The user wants to terminate the run earlier and sends the signal themselves

These would work for you, right? The signal could be something as simple as creating a file called "stop" in the same folder where the job was launched. However, there are two open questions:

  1. How can we make SLURM generate such a file, or could we pass the time limit from SLURM to ASCOT somehow at the beginning of the simulation?
  2. What if the user is running multiple simulations in the same folder? In this case the file should be something like "stop_" and again we would have to communicate JOBID from SLURM to ASCOT somehow.
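
The stop-file check described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the plain "stop" file, and the "stop_" plus job ID naming are hypothetical and not part of ASCOT5. The one external fact it relies on is that SLURM exports the job ID to the job's environment as `SLURM_JOB_ID`, which would cover the second open question without any extra communication.

```python
import os
from pathlib import Path


def stop_file_candidates(run_dir, job_id=None):
    """Return the stop-file paths to look for in run_dir.

    Assumption for illustration: a plain "stop" file stops every run in
    the folder, while "stop_<job id>" stops only the matching job.
    """
    run_dir = Path(run_dir)
    candidates = [run_dir / "stop"]
    if job_id:
        candidates.append(run_dir / f"stop_{job_id}")
    return candidates


def stop_requested(run_dir=".", job_id=None):
    """Return True if a stop file that applies to this job exists."""
    if job_id is None:
        # SLURM sets SLURM_JOB_ID in the environment of a running job;
        # outside SLURM the variable is absent and only "stop" is checked.
        job_id = os.environ.get("SLURM_JOB_ID")
    return any(p.exists() for p in stop_file_candidates(run_dir, job_id))
```

A monitoring thread could call `stop_requested()` at each progress update. For the first open question, one option that may work is the `--signal` option of `sbatch`, which asks SLURM to send a signal to the job a chosen number of seconds before the time limit; a trap in the batch script could then create the stop file.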

from ascot5.

rui-coelho commented on July 30, 2024

I was thinking of something much more basic. Imagine we have an initial value code to simulate the time evolution of an instability. We know the run will exceed the maximum runtime, which on MARCONI typically means 24h. I first need a rough estimate of how many time steps this translates to, and then I can set the number of time steps accordingly. I then set the first run to do 1M time steps (1-1,000,000), the second run to do from 1,000,000 to 2,000,000, and so on and so forth.
Now, if ASCOT runs the markers "sequentially", i.e. dispatching, let's say, 1000 markers until the end condition is met, then the next 1000, and so on, one could "trivially" instruct the code to dispatch only the "first" 1,000,000 markers and then store the result in the HDF5 file. The next call to ASCOT, however, would have to know which set of 1,000,000 markers was already dealt with and then dispatch the next set of 1,000,000 markers. Very likely, for this to work, one should have an extra OPTIONS key to specify which "sequence number" of the multi-stage run we are running, so that the stupid code knows which set of 1,000,000 markers to launch in the run. And of course the number of markers to "push" in each "sequence" should also be an OPTIONS key in the dictionary.
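
As a rough sketch of the bookkeeping this would need, the function below maps a "sequence number" and a batch size to the range of marker indices for that stage. The function name and both parameters are hypothetical stand-ins for the proposed OPTIONS keys; nothing here is an existing ASCOT5 option.

```python
def marker_batch(n_markers, batch_size, sequence):
    """Return the half-open index range (start, stop) for one stage.

    n_markers  -- total number of markers in the input
    batch_size -- markers to push per stage (hypothetical option)
    sequence   -- 1-based stage number (hypothetical option)
    """
    if n_markers < 0 or batch_size <= 0 or sequence < 1:
        raise ValueError("need n_markers >= 0, batch_size > 0, sequence >= 1")
    start = (sequence - 1) * batch_size
    if start >= n_markers:
        # All markers were covered by earlier stages: empty range.
        return (n_markers, n_markers)
    # The last stage may be shorter than batch_size.
    stop = min(start + batch_size, n_markers)
    return (start, stop)
```

With 10M markers and 1M per stage, stage 1 covers indices 0 up to (but not including) 1,000,000 and stage 10 covers the final 1,000,000; a stage 11 would get an empty range and could exit immediately.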


rui-coelho commented on July 30, 2024

...since ASCOT does not do beam-beam reactions, this should be doable to implement: in reality, once a given marker meets its end (poor guy...) it rests in peace, right?

So we could potentially break up a run that has 10M markers into 10 runs of 1M each, or 20 runs of 500k each (the number of markers I have been using most frequently), in sequence, and update the HDF5 file as the sequences evolve. And since the markers are all "tagged" with metadata, we could even check which ones have met their fate and which ones are waiting to go to the slaughter... (too much jambon hanging and eating while at Salamanca, apologies for the analogies...)
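
The metadata check could be sketched as below, assuming each marker carries an ID and an end-condition value in which 0 means "not simulated yet". That encoding, the dictionary layout, and the function names are assumptions made for illustration, not the actual ASCOT5 data format.

```python
def pending_markers(endcond_by_id):
    """Return the sorted IDs of markers that have no end condition yet.

    endcond_by_id maps marker ID -> end-condition value, where 0 is
    assumed to mean the marker has not been simulated.
    """
    return sorted(mid for mid, endcond in endcond_by_id.items() if endcond == 0)


def next_batch(endcond_by_id, batch_size):
    """Return the IDs of up to batch_size pending markers to run next."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return pending_markers(endcond_by_id)[:batch_size]
```

Selecting the next batch from the stored state rather than from a sequence number would also cover runs that were stopped early, since any marker without an end condition is simply picked up again.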

