

condor-examples's Issues

[2015-09-30] Installing JAGS on the cluster

This bash script works well for automating the install with parallel-ssh:

# Install JAGS 3.4.0 from source if it is not already on the PATH
if ! type jags > /dev/null 2>&1; then
    wget http://downloads.sourceforge.net/project/mcmc-jags/JAGS/3.x/Source/JAGS-3.4.0.tar.gz
    tar xzvf JAGS-3.4.0.tar.gz
    cd JAGS-3.4.0
    ./configure
    make
    make install
fi
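To push this to every node at once, something along these lines should work (a sketch, not copied from my notes: it assumes the script above is saved locally as install-jags.sh, hosts.txt is the host list described in the parallel-ssh post below, and pssh's -I flag feeds the local file to each remote shell over stdin; the timeout needs to be generous enough for the compile).

parallel-ssh -h hosts.txt -l root -i -A -t 1800 -I < install-jags.sh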

[2015-09-07] Managing multiple Linux worker nodes

Background

One of the challenges I've struggled with in many Condor pools is keeping multiple computers consistent. At a mid-sized institution, you might have 5, 10, or 50 individual HTCondor worker nodes. That is often too few to justify an enterprise configuration management system, but too many to administer easily one at a time.

I've implemented plenty of crappy solutions. The last one was a shell script that iterated over a host list and SSH-ed into each machine to run various commands. I knew there had to be a better way.

Install and define

After digging into it a bit (and having about 20 new nodes to configure), I found pssh (parallel-ssh). This is exactly what I was looking for. You define a list of hosts in a text file.
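The hosts file is nothing fancy, just one machine per line (these hostnames are made up; pssh also accepts user@host and host:port forms):

node01.example.org
node02.example.org
node03.example.org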

On Ubuntu, it's as easy to install as apt-get install pssh, which then shows up on your path as parallel-ssh. (On CentOS, yum install pssh.)

Example Commands

Commands look like this.

parallel-ssh -h hosts.txt -l root -i -A -t 120 yum install netcdf-devel
parallel-ssh -h hosts.txt -l root -i -A -t 120 uptime

and so on. I'm using -l to set the user, -A to have it prompt for the password, -i to print each host's output inline, and -t to increase the timeout to 2 minutes.

Note

I had some trouble with SSH at first. To get past the "accept this new SSH server identity" prompt, I used -O along with the info I found here. There's further info on this blog post and in the original project itself.
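If memory serves, the fix was passing SSH options through pssh's -O flag, roughly like this (the exact options here are a reconstruction, not copied from the original notes):

parallel-ssh -h hosts.txt -l root -i -A -t 120 -O StrictHostKeyChecking=no -O UserKnownHostsFile=/dev/null uptime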

-Luke

Some experimentation with Rmpi and Slurm

Some notes, to be written up more fully later.

# in the shell, load the MPI module first
module load mpi/openmpi-1.5.5-gcc

# then, inside R, build Rmpi against that OpenMPI install
install.packages('Rmpi', repos='https://cran.rstudio.com',
                 configure.args='--with-mpi=/cxfs/projects/root/opt/mpi/gcc/openmpi-1.5.5')

suppressMessages(library(Rmpi))
suppressMessages(library(snow))

mpirank <- mpi.comm.rank(0)    # rank of this process in MPI_COMM_WORLD
ndsvpid <- Sys.getenv("OMPI_MCA_ns_nds_vpid")

if (mpirank == 0) {                     # are we master ?
   cat("Launching master (OMPI_MCA_ns_nds_vpid=", ndsvpid, " mpi rank=", mpirank, ")\n")
   makeMPIcluster()
} else {                                # or are we a slave ?
   cat("Launching slave with (OMPI_MCA_ns_nds_vpid=", ndsvpid, " mpi rank=", mpirank, ")\n")
   sink(file="/dev/null")
   slaveLoop(makeMPImaster())
   q()
}

## a trivial main body, but note how getMPIcluster() learns from the
## launched cluster how many nodes are available
cl <- getMPIcluster()
clusterEvalQ(cl, options("digits.secs"=3))      ## use millisecond granularity
res <- clusterCall(cl, function() {
    Sys.sleep(10)
    paste(Sys.info()["nodename"], format(Sys.time()))
})
print(do.call(rbind,res))
stopCluster(cl)

This seems to be working fairly well when launched like this:

salloc -t 10:00 -p exper -A cida -n 20 Rscript mpi.r

I think all of the OMPI_MCA_ns_nds_vpid handling can go; it didn't seem to be working here, though using mpirank did.
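A minimal sketch of what the trimmed-down launcher might look like, keying only on the MPI rank (same logic as above, just with the environment variable dropped; untested as written):

suppressMessages(library(Rmpi))
suppressMessages(library(snow))

if (mpi.comm.rank(0) == 0) {            # master sets up the snow cluster
   makeMPIcluster()
} else {                                # workers sit in the slave loop
   sink(file="/dev/null")
   slaveLoop(makeMPImaster())
   q()
}

cl <- getMPIcluster()
res <- clusterCall(cl, function() Sys.info()["nodename"])
print(do.call(rbind, res))
stopCluster(cl)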

[2015-09-14] Creating temporary ramdisks to accelerate IO-bound jobs

If you have a job that needs relatively little RAM but can benefit from faster disk IO, it may help to create a ramdisk mount for temporary files. The space has to be managed carefully, since it will never be very large, but it can be helpful in certain circumstances.

Using parallel-ssh as discussed in the previous post #3, we can set this up very simply:

parallel-ssh -h hosts.txt -l root -i -A -t 120 mkdir /mnt/ramdisk
parallel-ssh -h hosts.txt -l root -i -A -t 120 mount -t tmpfs -o size=16g tmpfs /mnt/ramdisk

I have a bit of extra RAM, so I set them to 16 GB. This is a per-machine temp directory, so you need to make sure different jobs won't trample each other.
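To confirm the mounts came up, something as plain as df over parallel-ssh does the job:

parallel-ssh -h hosts.txt -l root -i -A -t 120 df -h /mnt/ramdisk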

Then, in my R code, I do something like this so the run works whether or not the ramdisk exists.

  fastdir <- tempdir()
  if (file.exists('/mnt/ramdisk')) {
    # use a unique subdirectory on the ramdisk so jobs don't collide
    fastdir <- paste0('/mnt/ramdisk/', sample(1:1e6, 1))
    dir.create(fastdir)
  }

Then, somewhere toward the end of the sim (maybe in a tryCatch finally block), I put an unlink(fastdir, recursive=TRUE) to get rid of the temporary files before moving on to the next job.
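A sketch of that cleanup, with run_sim() standing in as a hypothetical name for the actual simulation body:

  result <- tryCatch(
    run_sim(workdir = fastdir),                   # hypothetical simulation call
    finally = unlink(fastdir, recursive = TRUE)   # always clean up the ramdisk files
  )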
