Comments (18)

fabothch commented on July 19, 2024

@debruinb
What I changed from version 1.27 to 2.0 is using multiprocessing.Process instead of threading.Thread. In my opinion, this should not affect whether the program terminates correctly.
A quick and dirty solution is to write sys.exit(0) at the end of the script:

if __name__ == "__main__":
    main(argv=None)
    sys.exit(0)

I am currently working on a rewrite of enso and will update this soon.
Please reply if this fixed the issue and I will add it to the current version.

best,

Fabian

from enso.

debruinb commented on July 19, 2024

Thanks for the suggestion. Unfortunately, adding sys.exit(0) did not help.
Actually, I seem to have the same problem on the command line (enso.py -run > enso.out 2> enso.error). After enso has finished (checked with top), the command line remains unresponsive until I type Ctrl+C. The exception is running in the background with "enso.py -run > enso.out 2> enso.error &", but I cannot use that in my Slurm scripts, because then other commands get executed before enso is finished and the files don't get copied back anyway.

debruinb commented on July 19, 2024

I should maybe add the information that I'm using enso with turbomole 7.5 (with which enso version 1.2.7 seems to work fine).

fabothch commented on July 19, 2024

Ok, that was worth a try!
Just to clarify: is the output in the file enso.out complete, with no line missing?
Are you using export PYTHONUNBUFFERED=1?
Do you see any python processes still running in top? I am looking into it and will try to reproduce it.

Can you use something like this in slurm?

   enso.py -run > enso.out &
   pid=$!
   wait $pid
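
If the job is launched from a Python wrapper instead of directly from the shell, a similar wait can be sketched as below. This is only an illustration: run_and_wait is a hypothetical helper that is not part of enso, and the timeout is an assumed safety net so that a batch script can continue even if the program never returns.

```python
import subprocess
import sys


def run_and_wait(cmd, stdout_path, timeout_s):
    """Run cmd, redirect its stdout to stdout_path, and wait for it to exit.

    Returns the exit code, or None if the process was still running after
    timeout_s seconds (in which case it has been killed).
    """
    with open(stdout_path, "w") as out:
        try:
            completed = subprocess.run(cmd, stdout=out, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child and waits for it
            # before re-raising the timeout.
            return None
    return completed.returncode
```

For example, run_and_wait(["enso.py", "-run"], "enso.out", timeout_s=3600) would return None after one hour instead of blocking forever; the one-hour limit is an arbitrary value chosen for illustration.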

fabothch commented on July 19, 2024

> I should maybe add the information that I'm using enso with turbomole 7.5 (with which enso version 1.2.7 seems to work fine).

Ok, I have not tested TM 7.5 so far. I will give it a try later on.
Do you see any turbomole related processes running in the background?

debruinb commented on July 19, 2024

No, the funny thing is that enso steers the calculations correctly (as it seems). Turbomole calculates the shifts and coupling constants correctly. If I log in on the node (or run enso standalone from the command line) and copy the directory back to my home folder, all expected files are generated and anmr works fine. There are no remaining ghost jobs of Turbomole or anything else if I log in to the node and check with top (while the calculation is running they are there, of course). The only problem seems to be that enso somehow does not terminate, and hence the files are not copied back when using Slurm.

debruinb commented on July 19, 2024

The slurm file to submit is this one:

#!/bin/bash
#SBATCH --mem=MaxMemPerNode
#SBATCH --export=ALL
#SBATCH --cpus-per-task=16
#SBATCH -p short
#SBATCH --time=00:05:00

wait
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_STACKSIZE=1000m
ulimit -s unlimited
export PARA_ARCH=SMP
source $TURBODIR/Config_turbo_env
export PARNODES=$SLURM_CPUS_PER_TASK

wait
WORKDIR=/scratch/$USER/ethane_enso_only_standalone-${SLURM_JOBID%%.*}
mkdir -p $WORKDIR
wait
cd $WORKDIR
wait
cp -rf $SLURM_SUBMIT_DIR/* .
wait
crest -nmr -g chcl3 -chrg 0
enso.py
export PYTHONUNBUFFERED=1
enso.py -run > enso.out 2> enso.error
wait
sleep 30
cp -rf * $SLURM_SUBMIT_DIR/
wait
rm -rf $WORKDIR
wait
cd $SLURM_SUBMIT_DIR
act_tag=`date|sed "s/ / 0/g"|cut -d" " -f2,3,6 --output-delimiter="_"`
echo $SLURM_SUBMIT_DIR>> /home/`whoami`/Jobs_finished.$act_tag
wait

debruinb commented on July 19, 2024

The above script doesn't copy back results to my home folder.

Running the above script in steps, the following does work:
crest -nmr -g chcl3 -chrg 0
enso.py
(removing enso.py -run > enso.out 2> enso.error)

It goes wrong in a subsequent step with:
export PYTHONUNBUFFERED=1
enso.py -run > enso.out 2> enso.error

If the enso line is included no data are copied back anymore.

fabothch commented on July 19, 2024

To be honest, I have never worked with slurm and can only guess if it is enso or slurm related.
I cannot reproduce the 'missing' term signal after execution of
enso.py -run > enso.out
with either TM version 7.4.1 or version 7.5 (I only checked part 1).
My terminal responds instantly.
Which python version are you using?

debruinb commented on July 19, 2024

Hmm, that's strange. I'm using python 3.6.6.
I can confirm that on the command line (no Slurm) there is no problem with only part 1 (the terminal is responsive after the job is finished).
But with parts 1 to 4 it's different:
After the job finishes, top shows no running jobs, but the terminal remains unresponsive.
"ps -ef | grep bdebruin" gives me:
bdebruin 30795 11776 0 14:13 pts/49 00:00:00 python3 /home/bdebruin/software/XTB_633/enso.py -run
So enso is still running in the background, although the calculations are done. After Ctrl+C the ghost job disappears (checked with ps -ef | grep bdebruin).
I will test part 2-4 separately.
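
The symptom described here (work finished, interpreter still alive until Ctrl+C) matches what Python does when a non-daemon worker is still blocked at exit, which would also explain why sys.exit(0) made no difference. Whether this is what happens inside enso is an assumption; the sketch below only reproduces the general behaviour with a thread, and CHILD_SCRIPT and exits_within are hypothetical names.

```python
import subprocess
import sys

# A tiny stand-in script: one worker blocks forever on an empty queue,
# then the main thread calls sys.exit(0).
CHILD_SCRIPT = """
import queue, sys, threading
q = queue.Queue()
worker = threading.Thread(target=q.get, daemon={daemon})
worker.start()
sys.exit(0)
"""


def exits_within(daemon, timeout_s=5):
    """Return True if the stand-in script terminates within timeout_s seconds."""
    code = CHILD_SCRIPT.format(daemon=daemon)
    try:
        subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The interpreter was still waiting for the blocked worker.
        return False
    return True
```

With daemon=False the stand-in script keeps running after sys.exit(0), because the interpreter waits for the blocked worker; with daemon=True it exits right away.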

debruinb commented on July 19, 2024

Parts 1, 2 and 3 work fine. The problem seems to occur in part 4.

fabothch commented on July 19, 2024

ok, that narrows it down! I am looking at this now!

fabothch commented on July 19, 2024

The escf.out output has changed from TM 7.4.1 to TM 7.5, and this affects the reading of the coupling constants. The readout is done to create the nmrprop.dat files, which are written to the NMR folders and contain only the shielding constants and coupling constants. Can you check whether these nmrprop.dat files are written?

debruinb commented on July 19, 2024

Looks like you found the problem. I can't find nmrprop.dat in my NMR folders.

fabothch commented on July 19, 2024

Perfect! I separated the calculation from the readout, since a change in the printout can easily break the readout routine. This explains why your calculations run smoothly and the printout is there, but enso doesn't terminate.

This is easy to fix! Thanks for your patience and reporting the bug!
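
One common way for a failed readout to block termination is a worker that stops before marking its task as done, so that a later join() never returns. This is an assumption for illustration only, since the thread does not show enso's code. A minimal sketch of the defensive pattern follows, using threads, the hypothetical names parse_coupling, readout_worker and read_all, and an invented one-line input format:

```python
import queue
import threading


def parse_coupling(line):
    """Hypothetical readout: expects 'J <value>' and raises on anything else."""
    label, value = line.split()
    if label != "J":
        raise ValueError(f"unexpected line: {line!r}")
    return float(value)


def readout_worker(tasks, results, errors):
    """Parse lines until a None sentinel arrives; never skip task_done()."""
    while True:
        line = tasks.get()
        try:
            if line is None:
                return
            results.append(parse_coupling(line))
        except ValueError as exc:
            # Record the failure instead of letting the worker die silently.
            errors.append(str(exc))
        finally:
            # Always mark the item as done; a missing task_done()
            # makes tasks.join() block forever.
            tasks.task_done()


def read_all(lines):
    """Parse all lines in a worker thread and return (results, errors)."""
    tasks = queue.Queue()
    results, errors = [], []
    worker = threading.Thread(target=readout_worker, args=(tasks, results, errors))
    worker.start()
    for line in lines:
        tasks.put(line)
    tasks.put(None)  # sentinel: tells the worker to stop
    tasks.join()     # returns once every item has been marked done
    worker.join()
    return results, errors
```

A line in an unexpected format then ends up in errors, and the program still terminates. The same pattern applies to multiprocessing.JoinableQueue, which also provides task_done() and join().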

debruinb commented on July 19, 2024

Great! Looking forward to testing further once it is fixed (no hurry).

fabothch commented on July 19, 2024

I updated the master branch (not the release).

debruinb commented on July 19, 2024

Great! This solved everything! Version 2.0.3 works fine. Thanks a lot for the fix.
