Comments (7)
Hi,
oh yeah, the `slurm_init` call in `pyslurm/__init__.py` only exists on the most recent commit on the main branch (or the 23.2.x branch), which I recommend using, as it already includes a bit of API rework (there will be a new release soon, though).
Removing `slurm_init` and then making an API call that potentially segfaults is indeed expected. I just wanted to make sure that the `slurm_init` call is actually the point where the lookup error is raised.
I will also try to reproduce it on my test cluster and do some tests.
from pyslurm.
Just jumping in to say that the branch you linked worked great, thanks a lot for the quick fix!
from pyslurm.
Hi,
mh interesting...
If you do `nm -D /usr/lib64/slurm/cli_filter_lua.so | grep data_init`, it's really not showing up, right?
Could you check whether the error is gone when you manually remove the `slurm_init` call from `pyslurm/__init__.py` and reinstall?
from pyslurm.
Thanks for the help.
Running `nm` does find it:
➜ nm -D /usr/lib64/slurm/cli_filter_lua.so | grep data_init
U data_init
I couldn't find a `slurm_init` call in `pyslurm/__init__.py`, but there was one on the last line of `pyslurm/pyslurm.pyx`. Removing that seems to let me load the library, but then a call to `pyslurm.slurmdb_jobs()` causes a segfault (maybe not unexpected?).
Cheers,
Kris
from pyslurm.
Hi again,
As I found out, that error was introduced with Slurm 23.02.
Basically, in 23.02, `slurm_init` now explicitly loads any client plugins, such as `cli_filter`, that may be required to interact with the API. The problem, however, as the error indicates, is that a symbol called `data_init` is expected to be provided by some shared library (the `U` in the `nm` output means undefined: the symbol isn't defined in `cli_filter_lua.so` itself).
This symbol is in `libslurmfull.so`, which basically contains the public API plus all internal functions, and every Slurm tool like squeue, sbatch, slurmctld, slurmdbd, ... links to that one. That's why no error appears when using the Slurm tools.
It is, however, not in `libslurm.so`, which is usually the recommended library to link against to interact with Slurm. Because of that, basically any client application that links with `libslurm.so` in 23.02, like pyslurm, and calls `slurm_init` (which is mandatory when doing API calls) is broken. If you have some of the tools from the slurm-contribs package installed, like `seff`, they should also yield the same error.
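The symbol check described above can be sketched in a few lines of Python. This is only a rough illustration: `symbol_status` and `check_library` are hypothetical helper names, and the `libslurm.so` / `libslurmfull.so` paths are assumptions that will differ between installations (only the `cli_filter_lua.so` path comes from this thread). It relies on `nm -D` printing no address column for undefined symbols.

```python
import subprocess


def symbol_status(nm_output: str, symbol: str) -> str:
    """Classify `symbol` in `nm -D` output as 'defined', 'undefined' or 'absent'."""
    status = "absent"
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Drop a symbol version suffix (e.g. "name@@VERSION") if present.
        name = fields[-1].split("@")[0]
        if name != symbol:
            continue
        if len(fields) == 2:
            # Only "<type> <name>": no address column, so the symbol is
            # referenced here but must be provided by another library.
            status = "undefined"
        else:
            # "<address> <type> <name>": this library defines the symbol.
            return "defined"
    return status


def check_library(path: str, symbol: str) -> str:
    """Run `nm -D` on `path` and classify `symbol` (hypothetical helper)."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return symbol_status(result.stdout, symbol)


if __name__ == "__main__":
    # Assumed locations; adjust to wherever your Slurm libraries are installed.
    candidates = [
        "/usr/lib64/slurm/cli_filter_lua.so",
        "/usr/lib64/libslurm.so",
        "/usr/lib64/slurm/libslurmfull.so",
    ]
    for path in candidates:
        try:
            print(path, "->", check_library(path, "data_init"))
        except (FileNotFoundError, subprocess.CalledProcessError) as exc:
            print(path, "-> could not inspect:", exc)
```

If the explanation above holds, the plugin should report `undefined`, and `libslurmfull.so` should be the library that reports `defined`.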
The bug however has already been reported: https://bugs.schedmd.com/show_bug.cgi?id=16503
(Not sure if it's already fixed in 23.02.2, but I don't think so.)
But I have actually been thinking about switching back to `libslurmfull` for pyslurm anyway, as it might make certain things a bit easier to implement in the future.
from pyslurm.
You can build from this branch for now if you want; it links with `libslurmfull`, and the error should go away.
from pyslurm.
Hi @KrisDavie,
just wanted to let you know that the issue with the missing `data_init` symbol should be fixed in Slurm 23.02.2 (by this commit).
If your cluster has already been updated to this version, you can continue to use the normal pyslurm releases instead of the branch I made that links to `libslurmfull`.
Also a note on that: I had actually planned to merge the change that links back to `libslurmfull` into the main branch, but I noticed a specific test was failing.
The issue can be triggered with this, for example:
python -c "import pyslurm; gg = pyslurm.utils.nodelist_from_range_str('node[001:002]'); print(gg)";
You should probably see some weird unknown error if you are still using the branch and 23.02.1. I have absolutely no idea why it's happening with `libslurmfull` and not `libslurm`. It also only happens in a Python context (I can't reproduce it with a simple C program that does the same).
So just a heads up: the version I provided via the branch might not be 100% stable in some cases, and Slurm 23.02.2 is the minimum requirement for using the normal pyslurm 23.2.x releases if the cluster uses the `cli_filter` functionality.
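For anyone wanting to check a cluster against that requirement, here is a minimal sketch. It assumes that `sinfo --version` prints something like `slurm 23.02.1`, and the helper names are made up for illustration. The affected range (23.02.0 up to, but not including, 23.02.2) is taken from the comments in this thread, not from the Slurm release notes.

```python
import re
import subprocess

FIRST_AFFECTED = (23, 2, 0)  # error introduced with Slurm 23.02 (per this thread)
FIRST_FIXED = (23, 2, 2)     # fix reported for Slurm 23.02.2 (per this thread)


def parse_slurm_version(text: str) -> tuple:
    """Extract (major, minor, micro) from a string such as 'slurm 23.02.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def affected_by_data_init_bug(text: str) -> bool:
    """True if the version lies in the range this thread describes as broken."""
    return FIRST_AFFECTED <= parse_slurm_version(text) < FIRST_FIXED


if __name__ == "__main__":
    # Assumes the Slurm client tools are on PATH.
    out = subprocess.run(
        ["sinfo", "--version"], capture_output=True, text=True, check=True
    ).stdout.strip()
    if affected_by_data_init_bug(out):
        print(out, "-> may hit the data_init error if cli_filter is in use")
    else:
        print(out, "-> outside the affected range")
```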
from pyslurm.