
transitioning_to_batch's Introduction

Scaling Up Work with Batch (Non-interactive) Jobs

August 2021

RCS Presents... seminar on how to transition to batch jobs (from interactive sessions) on HPC systems

Working with code through GUI tools such as NoMachine is the default choice for many users. However, as datasets grow larger, tasks grow more complex, or the number of necessary repetitions increases, this approach no longer scales. Running batch (or background) jobs, where one does not interact with the running program, allows users to scale up their work. On the HBSGrid cluster, for example, one can run hundreds of scripts at once, analyzing numerous data files simultaneously or performing other parallelizable or automatable jobs.

This transition to background-only, non-GUI work may seem daunting: how does the program know what files to work on or write out? How do I monitor its progress? How do I even start the program, let alone hundreds of them? This session will demystify the process of transitioning to running batch jobs, give you several approaches to make this transition, and highlight a few useful tools.

Narrative

Setup: You have a folder with a number of data files (5? 20?) and a script file. Or you have a parameter sweep that will run over hundreds of combinations of values. Doing this via a GUI program is cumbersome or slow. I'd like to streamline my approach for running this code repeatedly on different files.

Code:

My script file process_data.py takes an input file of numbers and outputs a running sum after 10 values, printing a status line showing how many numbers have been seen so far for every 1000 lines summed.
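The seminar's actual script is not reproduced here, so the following is only a minimal sketch of one possible reading of that description. It assumes one number per line, writes the running sum after every 10 values, and prints a status line every 1000 values. The function name `process`, the output-path argument, and the parameters `sum_every` and `status_every` are hypothetical.

```python
import sys


def process(in_path, out_path, sum_every=10, status_every=1000):
    """Sum the numbers in in_path (assumed: one number per line).

    Writes "<count><TAB><running sum>" to out_path every `sum_every`
    values and prints a status line every `status_every` values.
    All names and defaults here are illustrative assumptions.
    """
    total = 0.0
    seen = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            total += float(line)
            seen += 1
            if seen % sum_every == 0:
                dst.write(f"{seen}\t{total}\n")
            if seen % status_every == 0:
                # flush so the line shows up promptly in a batch log file
                print(f"{seen} numbers seen so far", flush=True)
    return seen, total


# Only runs when invoked as: python process_data.py INPUT OUTPUT
if __name__ == "__main__" and len(sys.argv) == 3:
    process(sys.argv[1], sys.argv[2])
```

Taking the input and output paths from the command line, rather than choosing them in a GUI, is what lets the same script be launched many times without anyone at the keyboard.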

Questions for attendees

  • What do you like about using the GUI for running your script file?
  • What do you (or would you) find limiting?

Objectives:

  • Know how to launch batch jobs from the terminal
  • Know how to set up inputs and outputs
  • Monitor the progress of your job
  • Scale up to larger numbers of files
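As a rough illustration of how these objectives fit together, the sketch below builds one LSF `bsub` command per data file without submitting anything. It is not the seminar's own material. The helper name `build_bsub_commands`, the `data` and `logs` directory names, the `*.txt` pattern, and the assumption that process_data.py accepts an input path followed by an output path are all hypothetical. The `-J`, `-o`, and `-e` options (job name, output log, error log) and the `%J` job-ID placeholder are standard LSF features.

```python
import shlex
from pathlib import Path


def build_bsub_commands(data_dir, script="process_data.py",
                        pattern="*.txt", log_dir="logs"):
    """Return one bsub argument list per matching file in data_dir.

    Nothing is submitted; the caller decides whether to print or run
    the commands. Script arguments and file naming are assumptions.
    """
    commands = []
    for data_file in sorted(Path(data_dir).glob(pattern)):
        stem = data_file.stem
        commands.append([
            "bsub",
            "-J", f"proc_{stem}",               # job name
            "-o", f"{log_dir}/{stem}.%J.out",   # stdout log; %J = job ID
            "-e", f"{log_dir}/{stem}.%J.err",   # stderr log
            "python", script, str(data_file), f"{stem}_sums.txt",
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for review instead of submitting them.
    for cmd in build_bsub_commands("data"):
        print(shlex.join(cmd))
```

Once the printed commands look right, each list could be passed to `subprocess.run(cmd, check=True)` on a node where `bsub` is available, with the log directory created beforehand. Submitted jobs can then be listed with `bjobs`, and the status lines printed by the script end up in the per-job log files.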

Steps:

For further investigation

  • LSF documentation
    • Job control
    • Notifications
    • Log files
    • Job arrays
    • Writing submission scripts
  • FASRC write-up on "Submitting Large Numbers of Files" (somewhat outdated)
