geos-chem-schedule's People

Contributors

bennewsome, kilicomu, tsherwen

geos-chem-schedule's Issues

Rename repository to reflect job submission is no longer linked to months?

Currently, the functionality is to split runs by various numbers of days, weeks, and months. Originally this code was only used to split up runs by months. The name of the repository no longer reflects the main functionality/use so a new name may be better.

An example could be "GC_job_split".

Restore ability to call from a terminal command

It used to be possible to call the scheduler script from the command line (e.g. with the command below).

python geos-chem-schedule.py --job-name=hello_world --step=day --queue-name=test --submit=yes

However, this now results in the error message below:

Invalid argument --submit=yes
                     Try --help for more info.

Overhaul command line argument parsing

geos-chem-schedule/core.py

Lines 80 to 145 in f543a09

def get_arguments(inputs, debug=False):
    """
    Get the arguments supplied from command line
    Parameters
    -------
    inputs (GC_Job class): Class containing various inputs like a dictionary
    debug (bool): Print debugging output to the screen
    Returns
    -------
    (GC_Job class)
    """
    # If there are no arguments then run the GUI
    if len(sys.argv) > 1:
        for arg in sys.argv:
            if "geos-chem-schedule" in arg:
                continue
            if arg.startswith("--setup"):
                setup_script()
            elif arg.startswith("--job-name="):
                inputs.job_name = (arg[11:].strip())[:9]
            elif arg.startswith("--step="):
                inputs.step = arg[7:].strip()
            elif arg.startswith("--queue-name="):
                inputs.queue_name = arg[13:].strip()
            elif arg.startswith("--queue-priority="):
                inputs.queue_priority = arg[17:].strip()
            elif arg.startswith("--submit="):
                inputs.run_script_string = arg[9:].strip()
            elif arg.startswith("--out-of-hours="):
                inputs.out_of_hours_string = arg[15:].strip()
            elif arg.startswith("--wall-time="):
                inputs.wall_time = arg[12:].strip()
            elif arg.startswith("--cpus-need="):
                inputs.cpus_need = arg[12:].strip()
            elif arg.startswith("--submit_jobs_together="):
                inputs.cpus_need = arg[23:].strip()
            elif arg.startswith("--memory-need="):
                inputs.memory_need = arg[14:].strip()
            elif arg.startswith("--help"):
                print("""
            geos-chem-schedule.py
            For UI run without arguments
            Arguments are:
            --job-name=
            --step=
            --queue-name=
            --queue-priority=
            --submit=
            --out-of-hours=
            --wall-time=
            --submit_jobs_together=
            --memory-need=
            --cpus-need=
            e.g. to set the queue name to 'bob' write --queue-name=bob
            """)
            else:
                print("""Invalid argument {arg}
                     Try --help for more info.""".format(arg=arg)
                      )
                sys.exit(2)
    else:
        inputs = get_variables_from_cli(inputs)
    return inputs

Python provides a nice way to do this via argparse. Custom sys.argv handling is messy and will inevitably result in unforeseen errors!
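
A minimal sketch of what an argparse-based replacement could look like, assuming the same option names as the help text in the quoted code. The attribute names `run_script_string` and `out_of_hours_string` are taken from that code; `build_parser`, `apply_arguments`, and a separate `submit_jobs_together` attribute are hypothetical (the quoted code stores `--submit_jobs_together` in `cpus_need`, which looks unintended).

```python
import argparse
from types import SimpleNamespace


def job_name(value):
    """Strip whitespace and keep the first 9 characters, as the quoted parser does."""
    return value.strip()[:9]


def build_parser():
    """Build a parser for the options listed in the current --help text."""
    parser = argparse.ArgumentParser(prog="geos-chem-schedule", allow_abbrev=False)
    parser.add_argument("--setup", action="store_true",
                        help="print the setup commands and exit")
    parser.add_argument("--job-name", type=job_name)
    # dest= keeps the attribute names used by the quoted code
    parser.add_argument("--submit", dest="run_script_string", type=str.strip)
    parser.add_argument("--out-of-hours", dest="out_of_hours_string", type=str.strip)
    # Assumption: this option gets its own attribute rather than cpus_need
    parser.add_argument("--submit_jobs_together", type=str.strip)
    for flag in ("--step", "--queue-name", "--queue-priority",
                 "--wall-time", "--cpus-need", "--memory-need"):
        parser.add_argument(flag, type=str.strip)
    return parser


def apply_arguments(inputs, argv):
    """Copy every option given on the command line onto the inputs object."""
    args = build_parser().parse_args(argv)
    for name, value in vars(args).items():
        if name != "setup" and value is not None:
            setattr(inputs, name, value)
    return inputs


# SimpleNamespace is a stand-in for the GC_Job class, for illustration only
inputs = apply_arguments(
    SimpleNamespace(),
    ["--job-name=hello_world", "--step=day", "--queue-name=test", "--submit=yes"],
)
print(inputs)
```

With this approach `--help` and the "invalid argument" handling come from argparse itself, so the hand-written help text and the string slicing by prefix length would no longer be needed.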

Update README to reflect current state of project

The README appears to be written for running this on earth0, so it is hard to follow when working out how to use the tool on Viking. For example:

The script has a UI to chose job name, queue name, priority, if you want to start the jobs outside of work hours, and if you want to have the script submit the job to the queue.

Refactor repo (e.g. split off testing suite)

Refactor the code from the current setup, where all functionality is in a single file, into separate files with appropriate names (e.g. testing, core, scripts etc.).

Ensure split jobs stop if a preceding job fails

Currently, split jobs can be submitted with a dependency so that each one proceeds only if the preceding job completed successfully. However, when a job exits with a model crash, the remaining queued jobs that were submitted together still proceed.

This is handled by [create_SLURM_run_script2submit_together](https://github.com/wacl-york/geos-chem-schedule/blob/main/core.py#L1033-L1065), which uses the SLURM option --dependency=afterok.

TODO: work out how to capture all the job/model fail codes via SLURM and then abort the following model runs in the queue.

import os
import stat


def create_SLURM_run_script2submit_together(times):
    """
    Create the script that can set the 1st scheduled job running
    Parameters
    -------
    times (sequence of str): times to run job scripts for, each in the format YYYYMMDD
    Returns
    -------
    (None)
    """
    print(times)
    FileName = 'run_geos_SLURM_queue_all_jobs.sh'
    Line0 = "#!/bin/bash \n"
    Line1 = """job_num_{time}=$(sbatch --parsable SLURM_queue_files/{time}.sbatch) \n"""
    Line2 = """echo "$job_num_{time}" \n"""
    Line3 = """job_num_{time2}=$(sbatch --parsable --dependency=afterok:"$job_num_{time1}" SLURM_queue_files/{time2}.sbatch) \n"""
    # Use a context manager so the file is closed even if a write fails
    with open(FileName, 'w') as run_script:
        # NOTE: the final entry in times is not submitted by this loop
        for n_time, time in enumerate(times[:-1]):
            if time == times[0]:
                # First job: submitted without a dependency
                run_script.write(Line0)
                run_script.write(Line1.format(time=time))
                run_script.write(Line2.format(time=time))
            else:
                # Later jobs: start only if the previous job exited with code 0
                run_script.write(Line3.format(time1=times[n_time-1], time2=time))
                run_script.write(Line2.format(time=time))
    # Change the permissions so it is executable
    st = os.stat(FileName)
    os.chmod(FileName, st.st_mode | stat.S_IEXEC)
    return

An example of the emailed job status is shown in (1), and the output from an aborted model run in (2).

(1) Slurm Job_id=17908287 Name=Iso.UnlimAll.2 Ended, Run time 1-09:03:48, COMPLETED, ExitCode 0

(2)

---> DATE: 2018/06/05  UTC: 09:30  X-HRS:   3729.500000
===============================================================================
WETDEP: ERROR at   42  23  71 for species  128 in area RESUSPENSION in middle levels
 LS          :  T
 PDOWN       :   0.000000000000000E+000
 QQ          :   0.000000000000000E+000
 ALPHA       :   0.000000000000000E+000
 ALPHA2      :   0.000000000000000E+000
 RAINFRAC    :   0.000000000000000E+000
 WASHFRAC    :   0.000000000000000E+000
 MASS_WASH   :   0.000000000000000E+000
 MASS_NOWASH :   0.000000000000000E+000
 WETLOSS     :   0.000000000000000E+000
 GAINED      :   0.000000000000000E+000
 LOST        :   0.000000000000000E+000
 DSpc(NW,:)  :   0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000
 Spc(I,J,:N) :   0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000 -1.198463418495579E-013
 -3.378856226737475E-014 -2.314320910996734E-015 -3.019816125666191E-017
 -2.520870547180549E-019 -2.822882195877689E-020 -1.445442763521664E-020
 -6.900126364332953E-022 -8.767709715535021E-024 -4.400381547465612E-025
 -1.720033035554861E-026 -4.865030418189842E-028 -1.209928241506788E-029
 -2.093866091759672E-031 -5.970450354644719E-033 -5.162779587092464E-035
 -3.748736158226715E-037 -6.734895733485428E-040 -1.499747125810236E-039
 -1.842168830893617E-038 -1.148225302626263E-037 -8.781446960823125E-037
 -5.560642728382409E-034 -3.794567594400997E-031 -1.821084089043769E-029
 -4.488053629617312E-028 -1.680343276761365E-026 -1.155873919999522E-024
 -3.060986958260714E-022 -1.313308483959904E-021 -1.574449846171001E-021
 -1.338100164273038E-021 -6.956426779262325E-022 -3.506180178581743E-022
 -6.266069886329827E-022 -1.353555184532673E-021 -4.106777266211684E-021
 -1.014640714706731E-020 -1.134631269064776E-020 -9.833174509975247E-021
 -7.707150917708346E-021
===============================================================================
===============================================================================
GEOS-Chem ERROR: Error encountered in wet deposition!
 -> at SAFETY (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-Chem ERROR: Error encountered in "Safety"!
 -> at Do_Complete_Reevap (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-Chem ERROR:
 -> at WetDep (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-Chem ERROR: Error encountered in "Wetdep"!
 -> at Do_WetDep (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-CHEM ERROR: Error encountered in "Do_WetDep"!
STOP at  -> at GEOS-Chem (in GeosCore/main.F90)
===============================================================================
srun: error: node112: task 0: Exited with exit code 159
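
One possible approach to the TODO, as a sketch rather than a tested fix: `--dependency=afterok` only holds back later jobs when the batch script itself exits non-zero, so the job script could check the model log as its final step and exit with a failure code. The function names below are hypothetical. The markers are taken from the output quoted in this page, and the sketch assumes that a successful run always prints the "E N D   O F   G E O S -- C H E M" banner.

```python
import re

# Markers taken from the log excerpts quoted in these issues
ERROR_PATTERN = re.compile(r"GEOS-Chem ERROR", re.IGNORECASE)
SRUN_EXIT_PATTERN = re.compile(r"Exited with exit code (\d+)")
# Assumption: this banner is printed whenever the model finishes normally
END_BANNER = "E N D   O F   G E O S -- C H E M"


def model_failed(log_text):
    """Return True if the log text suggests the model run did not finish cleanly."""
    if ERROR_PATTERN.search(log_text):
        return True
    match = SRUN_EXIT_PATTERN.search(log_text)
    if match and int(match.group(1)) != 0:
        return True
    # No end banner: treat the run as incomplete
    return END_BANNER not in log_text


def exit_code_for(log_text):
    """Exit code the batch script could finish with (0 = success, 1 = failure)."""
    return 1 if model_failed(log_text) else 0


crashed = "srun: error: node112: task 0: Exited with exit code 159\n"
finished = "**************   E N D   O F   G E O S -- C H E M   **************\n"
print(exit_code_for(crashed), exit_code_for(finished))
```

If the batch script ended by exiting with this code, a crashed run would leave the `afterok` dependency of the next job unsatisfied, so the following jobs would not start.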

Improve installation process

def setup_script():
    """
    Creates a symbolic link to allow running "geos-chem-schedule" from any directory
    """
    print("\n",
          "geos-chem-schedule setup complete. Change your default settings in settings.json\n",
          "To run the script from anywhere with the geos-chem-schedule command,",
          "copy the following code into your terminal. \n")
    script_location = os.path.realpath(__file__)
    # make sure the script is executable
    print("chmod 755 {script}".format(script=script_location))
    # Make sure there is a ~/bin file
    print("mkdir -p $HOME/bin")
    # Create a symlink from the file to the bin
    print("ln -s {script} $HOME/bin/geos-chem-schedule".format(script=script_location))
    # Make sure the ~/bin is in the bashrc
    # with open('$HOME/.bashrc','a') as bashrc:
    #     bashrc.write('## Written by geos-chem-schedule')
    #     bashrc.write('export PATH=$PATH:$HOME/bin')
    print('echo "## Written by geos-chem-schedule " >> $HOME/.bashrc')
    print('echo "export PATH=\$PATH:\$HOME/bin" >> $HOME/.bashrc')
    # Source the bashrc
    print("source $HOME/.bashrc")
    print("\n")
    sys.exit()

The current installation process for this package is quite invasive (see above). A rework to follow a simple package structure (see Packaging Python Projects) would go a long way to smoothing this out.

Remove lines of scheduler info at end of geos.log file?

An example of the lines currently written to the end of geos.log is pasted below. To make the GEOS-Chem information easier to read, could these lines be written to a separate file?

Also, the job submission template should be updated so that the "command not found" messages are no longer produced when the Bash variable strings are read.

**************   E N D   O F   G E O S -- C H E M   **************
/var/spool/slurmdspool/job8817675/slurm_script: line 123: last_line: command not found
/var/spool/slurmdspool/job8817675/slurm_script: line 124: complete_last_line: command not found
Submitted batch job 8856961

============================
 Job utilisation efficiency
============================

Job ID: 8817675
Cluster: viking
User/Group: ts551/clusterusers
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 20
CPU Utilized: 27-00:30:01
CPU Efficiency: 98.91% of 27-07:40:00 core-walltime
Job Wall-clock time: 1-08:47:00
Memory Utilized: 8.82 GB
Memory Efficiency: 22.04% of 40.00 GB
 Requested wall clock time: 2-00:00:00
    Actual wall clock time: 1-08:47:00
Wall clock time efficiency: 68.3%
           Job queued time: 00:00:01
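
As a rough sketch of the separation suggested here, the log could be split at the GEOS-Chem end banner, with everything after it written to a separate file. The function name is hypothetical, and the sketch assumes the banner text shown in the excerpt marks the last line of model output.

```python
# Banner text copied from the excerpt; assumed to end the model output
END_BANNER = "E N D   O F   G E O S -- C H E M"


def split_log(text):
    """Split log text into (model output, scheduler lines) at the end banner."""
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if END_BANNER in line:
            return "".join(lines[:index + 1]), "".join(lines[index + 1:])
    # No banner found: leave everything with the model output
    return text, ""


sample = (
    "model output line\n"
    "**************   E N D   O F   G E O S -- C H E M   **************\n"
    "Submitted batch job 8856961\n"
    "Job ID: 8817675\n"
)
model_part, scheduler_part = split_log(sample)
print(scheduler_part)
```

The same effect could also be had in the job template by redirecting the scheduler commands to a different file, which would avoid post-processing the log at all.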

Add option to loop a single year a number of times

For the 1-year benchmark, GCST runs a single year twice, with the output of the first run used as the starting input for the second. This would be useful functionality for comparisons with benchmarks. It may also be a better approach in future, because using a single year rather than contiguous years for spin-up/analysis avoids the "is the atmosphere in equilibrium" question.
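
To make the idea concrete, here is a small sketch of how the job list for a looped year might be built. Everything in it is hypothetical (the function name, the dictionary keys, and the job naming scheme); only the YYYYMMDD date format follows the existing code.

```python
def looped_year_jobs(year, n_loops):
    """Return one job description per pass over the same year."""
    jobs = []
    for loop in range(1, n_loops + 1):
        jobs.append({
            "name": "{}_loop{}".format(year, loop),
            "start": "{}0101".format(year),
            "end": "{}0101".format(year + 1),
            # The first pass has no earlier pass to restart from
            "restart_from": None if loop == 1 else "{}_loop{}".format(year, loop - 1),
        })
    return jobs


for job in looped_year_jobs(2018, 2):
    print(job)
```

Each pass covers the same dates, and every pass after the first points back at the previous one for its starting state, which mirrors the two-pass benchmark setup described above.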
