
storage_aware-slurm-plugin

How-to-use instructions

This guide describes how to install and use our plugin in a cluster with multi-tiered storage. We consider two storage tiers: low-performance storage (LPS) and high-performance storage (HPS).

Installation guide

To install the provided plugin, follow the steps in the "Adding a Plugin to Slurm" section of https://slurm.schedmd.com/add.html. The plugin has been tested with Slurm version 20.02.5. For our virtual testbed, the provided Vagrantfile automates this process.

User guide

When submitting jobs with the sbatch command, users specify a job's storage requirements with the --bb argument. The plugin expects two key=value parameters, passed together in quotation marks:

  • capacity: the high-performance storage capacity the job requires
  • io: the amount of intermediate data the job reads from and writes to the high-performance storage during its runtime
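Before submitting, the --bb string can be checked for the two keys listed above. The following is a minimal sketch: validate_bb_spec is a hypothetical helper that is not part of the plugin, and it assumes only that both values are non-negative integers (this guide does not state their units).

```shell
#!/bin/bash
# Hypothetical helper (not part of the plugin): accept a --bb string only if
# it sets both keys the plugin expects, capacity and io, each to a
# non-negative integer, and contains no other keys. Units are not assumed.
validate_bb_spec() {
  local spec="$1" pair key value capacity="" io=""
  local -a pairs
  # Split on whitespace; read does not perform glob expansion.
  read -r -a pairs <<< "$spec"
  for pair in "${pairs[@]}"; do
    key="${pair%%=*}"
    value="${pair#*=}"
    # Reject tokens without "=" and values that are not plain integers.
    if [[ "$pair" != *=* || ! "$value" =~ ^[0-9]+$ ]]; then
      echo "invalid token: $pair" >&2
      return 1
    fi
    case "$key" in
      capacity) capacity="$value" ;;
      io)       io="$value" ;;
      *)        echo "unknown key: $key" >&2; return 1 ;;
    esac
  done
  if [[ -z "$capacity" || -z "$io" ]]; then
    echo "both capacity and io are required" >&2
    return 1
  fi
  # Print the spec in a fixed key order, ready to pass to --bb.
  printf 'capacity=%s io=%s\n' "$capacity" "$io"
}
```

For example, with the sample job script from this guide: spec=$(validate_bb_spec "capacity=1024 io=8192") && sbatch --bb="$spec" sample_job.sh. Because of the &&, sbatch only runs if the check passes.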

The current version of the plugin implements the scheduling mechanism published in our paper [1].

We provide a simple job script, sample_job.sh:

#!/bin/bash
#
#SBATCH --job-name=sample_job
#SBATCH --output=sample_job_%j.out
#SBATCH --ntasks=1
#SBATCH --time=05:00
#SBATCH --mem-per-cpu=512

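# SLURM_STORAGE_TIER points to the storage tier selected by the plugin;
# abort early if it is missing instead of writing to /.
: "${SLURM_STORAGE_TIER:?SLURM_STORAGE_TIER is not set}"

# Write a 1 GiB file (1024 blocks of 1 MiB) to the selected tier, then remove it.
dd if=/dev/zero of="$SLURM_STORAGE_TIER/testfile_$SLURM_JOBID.dat" bs=1M count=1024
rm "$SLURM_STORAGE_TIER/testfile_$SLURM_JOBID.dat"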

#end of job script

This job creates and removes a 1.0 GiB test file called testfile_<Slurm_Job_ID>.dat on the storage tier selected by the plugin. Submit the job using sbatch as follows:

sbatch --bb="capacity=1024 io=8192" sample_job.sh
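To run the example end to end from the submit host, the submission can be wrapped in a small function. This is a sketch, not part of the plugin: submit_and_show is a hypothetical name, and the function relies only on the standard sbatch options --parsable (print just the job ID) and --wait (return when the job ends, with the job's exit status). It prints the output file named by the #SBATCH --output=sample_job_%j.out directive, assuming the job runs in the submit directory (Slurm's default).

```shell
#!/bin/bash
# Hypothetical helper: submit sample_job.sh with the given --bb spec,
# block until the job terminates, then print the job's output file.
submit_and_show() {
  local spec="$1" out jobid rc
  # --parsable: stdout is "jobid" or "jobid;cluster".
  # --wait: sbatch returns only after the job ends, with its exit status.
  out=$(sbatch --parsable --wait --bb="$spec" sample_job.sh)
  rc=$?
  # Keep only the numeric job ID.
  jobid="${out%%;*}"
  if [[ -z "$jobid" ]]; then
    echo "submission failed" >&2
    return 1
  fi
  # Matches "#SBATCH --output=sample_job_%j.out" in the job script.
  cat "sample_job_${jobid}.out"
  return "$rc"
}
```

For example: submit_and_show "capacity=1024 io=8192". Since the job script does not redirect dd's stderr, its transfer summary should appear in the printed output file.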

Set up the virtual testbed

We used a virtual cluster for development and testing. To make this environment easy for everybody to use, we automated its setup with Vagrant. To set up the virtual cluster and test the plugin, follow these steps:

  1. Install VirtualBox and Vagrant on a Linux system.
  2. Clone the project repository, which provides the Vagrantfile used to set up the virtual cluster.
  3. Clone Slurm (https://github.com/SchedMD/slurm.git) into the same folder.
  4. Check out the Slurm version slurm-20-02-5-1.
  5. Run vagrant up.
  6. Run vagrant ssh controller.
  7. Run start_slurm.
  8. In a separate session, run vagrant ssh server1.
  9. Run start_slurm.
  10. Repeat steps 8 and 9 for the server2 VM.
  11. On the controller VM, run sinfo and make sure that server1 and server2 are in the idle state and ready to accept jobs.
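Steps 3 to 5 can be scripted on the host. This is a minimal sketch under two assumptions: it runs from the root of the project repository cloned in step 2 (the repository URL is not reproduced here), and "the same folder" in step 3 means that root, so Slurm ends up in the slurm subdirectory that git clone creates by default. setup_testbed is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper automating steps 3-5 of the testbed setup.
# Assumption: run from the root of the cloned project repository (step 2).
setup_testbed() {
  local slurm_dir="slurm"
  local slurm_tag="slurm-20-02-5-1"
  if [[ ! -f Vagrantfile ]]; then
    echo "run this from the project root (no Vagrantfile found)" >&2
    return 1
  fi
  # Step 3: clone Slurm, skipping the clone if a previous run already did it.
  if [[ ! -d "$slurm_dir" ]]; then
    git clone https://github.com/SchedMD/slurm.git "$slurm_dir" || return 1
  fi
  # Step 4: check out the Slurm version the plugin was tested with.
  git -C "$slurm_dir" checkout "$slurm_tag" || return 1
  # Step 5: create and provision the VMs defined in the Vagrantfile.
  vagrant up
}
```

Steps 6 to 11 are interactive (they open SSH sessions into the VMs), so this sketch stops after vagrant up.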

With this setup, Vagrant creates three machines (two compute nodes, server1 and server2, and one controller that runs the Slurm control daemon) and two 5 GiB shared storage tiers (LPS and HPS). Jobs are submitted with the sbatch command. When a job's storage requirements are passed with --bb (e.g. sbatch --bb="capacity=1024 io=8192" sample_job.sh), as described in the User guide, Slurm decides which compute and storage resources are assigned to the job.
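Before submitting jobs, the check from step 11 can be scripted on the controller VM. The sketch below relies on the standard sinfo options -h (no header), -N (one line per node) and -o "%N %t" (node name and compact state); all_nodes_idle is a hypothetical helper that reads those lines from stdin and succeeds only if at least one node is listed and every listed node is idle.

```shell
#!/bin/bash
# Hypothetical helper: read "<node> <state>" lines, e.g. from
#   sinfo -h -N -o "%N %t"
# and succeed only if at least one node is listed and all are idle.
all_nodes_idle() {
  local node state seen=0
  while read -r node state; do
    # Skip blank lines.
    if [[ -z "$node" ]]; then
      continue
    fi
    seen=$((seen + 1))
    if [[ "$state" != "idle" ]]; then
      echo "$node is $state" >&2
      return 1
    fi
  done
  # Fail if no nodes were reported at all.
  (( seen > 0 ))
}
```

For example, on the controller: sinfo -h -N -o "%N %t" | all_nodes_idle && echo "cluster ready". Reading from stdin rather than calling sinfo inside the function keeps the check testable without a running cluster.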

References

[1] Leah E. Lackner, Hamid Mohammadi Fard, Felix Wolf: Efficient Job Scheduling for Clusters with Shared Tiered Storage. In Proc. of the 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Larnaca, Cyprus, pages 321–330, IEEE, May 2019.

Contributors

hamidfard, oezden
