Comments (11)
Yep, this has been raised by Jonathon Anderson on the Slurm mailing list and I'm currently working on it. How many nodes are in your cluster? I ask so that I can recreate the issue.
from pyslurm.
I'll have to check, but we're above 40 nodes at that point.
Aurélien
On Dec 9, 2015, at 2:28 PM, Gingergeek wrote:
Yep, this has been raised by Jonathon Anderson on the Slurm mailing list and I'm currently working on it. How many nodes are in your cluster? I ask so that I can recreate the issue.
Thanks. Some time ago I changed the code (in various classes) so that it could track and manage changes in the data returned by the Slurm API, which allows looped monitoring programs to be used. Clearly there is a memory issue here, but it did not show up on my small test VM.
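As a rough illustration of the change tracking described above, here is a minimal Python sketch; it is not PySlurm's actual implementation. It assumes each poll of the Slurm API yields a plain dict mapping an identifier (for example a node name) to a dict of fields, and `diff_snapshots` and `monitor` are hypothetical helpers. Retaining only the previous snapshot between iterations is one way to keep a long-running loop's memory bounded.

```python
from typing import Any, Callable, Dict, Iterator, Set, Tuple

# Hypothetical shape of one poll: {identifier: {field: value}}
Snapshot = Dict[str, Dict[str, Any]]
Changes = Dict[str, Dict[str, Tuple[Any, Any]]]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Tuple[Set[str], Set[str], Changes]:
    """Return (added ids, removed ids, {id: {field: (old, new)}})."""
    added = set(current) - set(previous)
    removed = set(previous) - set(current)
    changed: Changes = {}
    for key in set(previous) & set(current):
        old, new = previous[key], current[key]
        # Collect only the fields whose value differs between the two polls.
        fields = {
            field: (old.get(field), new.get(field))
            for field in set(old) | set(new)
            if old.get(field) != new.get(field)
        }
        if fields:
            changed[key] = fields
    return added, removed, changed


def monitor(fetch: Callable[[], Snapshot], iterations: int) -> Iterator[Tuple[Set[str], Set[str], Changes]]:
    """Poll `fetch` repeatedly and yield the changes since the previous poll."""
    previous: Snapshot = {}
    for _ in range(iterations):
        current = fetch()
        yield diff_snapshots(previous, current)
        # Keep only the latest snapshot so memory does not grow with each loop.
        previous = current
```

For example, feeding `monitor` two polls in which node01 goes from IDLE to MIXED, node02 disappears and node03 appears reports node03 as added, node02 as removed, and node01's state change as `("IDLE", "MIXED")`.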
Aurélien,
I have traced a possible cause to a node that is defined in the config but does not actually exist. The Slurm API returns a record for it, but the record contains no data. A quick check for a NULL node name in the record was all that was necessary. Does this match your configuration?
Mark
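The fix described above amounts to skipping node records whose name is empty. In PySlurm the real check sits in the low-level loop over the node records returned by the Slurm API (testing for a NULL name); the Python sketch below only mirrors that logic, assuming each record arrives as a dict in which a missing name shows up as `None` or `""`. The `name` key and `index_nodes_by_name` are hypothetical.

```python
from typing import Any, Dict, Iterable

NodeRecord = Dict[str, Any]


def index_nodes_by_name(records: Iterable[NodeRecord]) -> Dict[str, NodeRecord]:
    """Build {node name: record}, skipping records that have no name."""
    nodes: Dict[str, NodeRecord] = {}
    for record in records:
        name = record.get("name")
        # A node defined in the config but not actually present may come back
        # as a record with no name and no data; ignore it instead of using it.
        if not name:
            continue
        nodes[name] = record
    return nodes
```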
Mark,
I'll have to get back to you on this. Are you asking whether one of the nodes has the name 'NULL'?
Aurélien
On Dec 9, 2015, at 4:17 PM, Gingergeek wrote:
Aurélien,
I have traced a possible cause to a defined node in the config which does not truly exist. The Slurm API returns a record for it but with no data content. A quick check for a node name of NULL in the record was all that was necessary, does this match your configuration?
Mark
Aurélien,
It was a statement: we have traced the problem to a bug in PySlurm, which does not handle a node record with an empty node name. I think this happens when nodes have been entered in the Slurm config but do not actually exist in the hosts file or DNS. My question was whether it is possible that you have nodes in your config that do not actually exist.
Mark
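One rough way to answer that question on a given cluster is to check which configured node names fail to resolve. The Python sketch below is a hypothetical helper, not part of PySlurm or Slurm. It assumes you already have the list of node names (for instance, expanded from the `NodeName` entries in slurm.conf with `scontrol show hostnames`) and uses `socket.gethostbyname`, which consults the hosts file and DNS according to the system's resolver configuration. The `resolve` parameter exists only so the check can be exercised without real lookups.

```python
import socket
from typing import Callable, Iterable, List


def unresolvable_nodes(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the node names that the resolver cannot look up."""
    missing: List[str] = []
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            missing.append(name)
    return missing
```

Any name this returns is a candidate for the kind of configured-but-nonexistent node discussed here, though a name can also fail to resolve for unrelated reasons (for example a temporary DNS outage).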
Aurélien,
The patch discussed previously has been committed, so it is included if you pull the latest 14.11.5. Please let me know whether this now works, and close the ticket if you are happy.
Mark
Maybe... I do not have a complete understanding of how the cluster is managed, unfortunately.
Aurélien
On Dec 10, 2015, at 1:17 AM, Gingergeek wrote:
Aurélien,
It was a statement that we have traced the problem to a bug in PySlurm and not handling a node record with an empty nodename entry. I think this is where nodes have been entered in the slurm config but do not actually exist in the hosts file or DNS. My question was if it is possible that you have nodes in your config that do not actually exist?
Mark
Aurélien,
This should now be resolved with the latest pyslurm-14.11.5 commits, so please test it and close the ticket if it is fixed.
Mark
Aurélien,
Have you had a chance to test it and confirm that it is now resolved?
Mark
Closing this as resolved.