
auks's People

Contributors

3xx0, arno, fdiakh, fihuer, hautreux, kenshin33, pllopis, rezib, robberteggermont

auks's Issues

Auks API request failed : krb5 cred : unable to read credential cache

This is the same as hautreux/auks#61.

I've just hit this or a similar problem (on 0.5.3).
When starting any job I get:
Auks API request failed : krb5 cred : unable to read credential cache

and the auks -R loop fails, i.e. no cache renewal; the rest works just fine.

AFAICS, it stems from the fact that krb5_cc_default_name (which is called by krb5_cc_default) calls secure_getenv(KRB5_ENV_CCNAME), and at the point where auks is started from slurmstepd, euid (the uid of the user starting the job) != uid (root); the same goes for gid/egid.

Either adding -C $auks_credcache to the argument list or dropping privileges to the normal user (setgid(), setuid()) inside this if block fixes the problem.
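
For anyone else debugging this, here is a minimal standalone sketch (not the actual auks code) of the underlying behaviour: secure_getenv() returns NULL whenever euid != uid or egid != gid, so krb5_cc_default_name() ignores KRB5CCNAME and falls back to the compiled-in default.

    /* build: gcc cctest.c $(krb5-config --cflags --libs krb5) -o cctest */
    #define _GNU_SOURCE            /* for secure_getenv() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <krb5.h>

    int main(void)
    {
        krb5_context ctx;
        const char *env;

        if (krb5_init_context(&ctx) != 0)
            return 1;

        /* Run this with euid != uid (e.g. from a setuid helper) and env is
         * NULL even though KRB5CCNAME is set in the environment.          */
        env = secure_getenv("KRB5CCNAME");
        printf("secure_getenv(KRB5CCNAME) = %s\n", env ? env : "(null)");

        /* Workaround ideas from above: drop to the job user's identity
         * (setgid()/setuid() with the job's gid/uid) before this call,
         * or pass the cache explicitly with -C.                           */
        printf("krb5_cc_default_name()    = %s\n", krb5_cc_default_name(ctx));

        krb5_free_context(ctx);
        return 0;
    }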

File based ticket caches on RHEL8 with gssproxy running

We have not been able to get this combination to work. With 0.4.4 and some patches, we have had it working on RHEL7 for several years by using the AUKS_PRIV_CCACHE_APPEND environment variable to force auks to use a different cache than root's default (/tmp/krb5cc_0). The problem with auks using root's cache is that gssproxy also uses it and overwrites the file with something incompatible. So having auks use /tmp/krb5cc_0_auks avoids that issue.
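
For context, the mechanism we depend on is simple: aukspriv reads a suffix from the environment and appends it to root's default cache path. A rough sketch of the idea (this is not the real aukspriv code, just an illustration of the behaviour we would like restored):

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Append AUKS_PRIV_CCACHE_APPEND to root's default file cache so that
     * auks and gssproxy stop fighting over /tmp/krb5cc_0.                 */
    int main(void)
    {
        const char *base   = "/tmp/krb5cc_0";
        const char *suffix = getenv("AUKS_PRIV_CCACHE_APPEND");
        char ccache[PATH_MAX];

        snprintf(ccache, sizeof(ccache), "%s%s", base, suffix ? suffix : "");
        setenv("KRB5CCNAME", ccache, 1);     /* e.g. /tmp/krb5cc_0_auks */
        printf("using credential cache %s\n", ccache);
        return 0;
    }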

0.4.4 won't compile on RHEL8, or at least not easily. It looks like gcc has jumped so many versions since RHEL7 that there are compatibility problems.

With 0.5.0, 0.5.3, and 0.5.4 it seems like AUKS_PRIV_CCACHE_APPEND was removed (though it is still in the man page for aukspriv).

Has anyone gotten this combination working? We could really use some advice. I think our next step might be to try to patch the code ourselves to restore AUKS_PRIV_CCACHE_APPEND, but we are really hoping someone has already done that work.

"Stale file handle" accessing a kerberized NFS drive

Hello,
first of all, thanks for writing AUKS!

I have installed the AUKS plugin in Slurm, and apparently all is going well: I submit a job to Slurm with "--auks=yes", and I can verify that AUKS adds the ticket to the cache and that, when the job runs, the "get request" for the ticket succeeds. The problem is that in our environment our HOME directories are NFS drives accessed via Kerberos. When connecting via "ssh" to a machine, these drives are automounted. When running with Slurm + AUKS, despite having a valid TGT, the NFS drive is not automounted and I get a "Stale file handle" error if I try to list its contents.

I suspect this is not a problem really with AUKS, but perhaps somebody has come across something similar and has pointers on where I should look to solve this?

auks cred: input buffer is too large

We ran into an issue similar to the one reported a few years back:
https://github.com/hautreux/auks/issues/3

We are using an Active Directory setup, and there are users that have a lot of groups.
The user with the issue had a cached ticket of 8308 bytes.

We created a custom build that changed AUKS_CRED_DATA_MAX_LENGTH to 16384, and that solved the issue for us.
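
For anyone hitting the same thing, the change amounted to bumping a single compile-time limit before rebuilding (a sketch; check where the macro is defined and what its stock value is in your auks version):

    /* Enlarge the maximum serialized credential size so large Active
     * Directory tickets (users with many groups) fit; 16384 worked for us. */
    #define AUKS_CRED_DATA_MAX_LENGTH 16384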

Build RPM for Slurm built from source

First of all, I need to admit that I have very limited knowledge of autoconf etc., so it might be that I am just doing it wrong...

My understanding of the build process for auks is that I would create an RPM using make rpm and deploy it on my cluster. I can successfully run autoreconf -fvi && ./configure && make to build the binaries, but unfortunately I cannot build the RPMs.

As I need to support scheduling of GPUs in Slurm, I build Slurm from source against NVIDIA's current libraries. If I do so and run make rpm instead of make, I get the error message:

error: Failed build dependencies:
slurm-devel >= 20.11.0 is needed by auks-UNKNOWN-UNKNOWN.el7.x86_64

I assume that is because I do not have slurm-devel installed via yum, since I build Slurm from source. Is there any way to work around this?

Service file for auksd should be "After=network-online.target"

When we build RPMs for RHEL8, we get a service file with "After=network.target". I assume this would be true for any distro but haven't checked. That is not late enough in the boot process and we kept having to manually restart auksd after a reboot. Changing to "After=network-online.target" seems to have fully resolved that problem. I'm not sure exactly why (maybe because slurmctld and munge come later?), but it is 100% reproducible in our environment.

Anyway, we would suggest changing from "After=network.target" to "After=network-online.target" for the auksd service file.
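
Until the packaged unit is changed, a standard systemd drop-in works as a local fix (this is plain systemd, nothing auks-specific; adjust the path to taste):

    # /etc/systemd/system/auksd.service.d/override.conf
    [Unit]
    # Wait for the network to actually be online, not just configured.
    Wants=network-online.target
    After=network-online.target

followed by systemctl daemon-reload and a restart of auksd.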

Problems with RHEL8 and gssproxy

The issue asking about file-based caches in this scenario hasn't garnered any responses, so we have been trying to use KCM caches. Unfortunately, a similar (maybe the same) problem exists.

We have been using auks (0.4.4-1 with some patches) for a couple of years on RHEL7 with Kerberized NFS homes. We are having lots of trouble with auks 0.5.4 on RHEL8. The issue is that when the gssproxy service is running, it creates a cache in root's collection for the principal HOSTNAME$@realm in addition to the one for host/hostname.domain@REALM that auks uses. Here is an example.

[root@dcompute ~]# klist -A
Ticket cache: KCM:0
Default principal: host/compute01.dartmouth.edu@REALM

Valid starting Expires Service principal
09/06/2023 14:39:49 09/07/2023 00:39:48 krbtgt/REALM@REALM
renew until 10/06/2023 14:39:48

Ticket cache: KCM:0:57660
Default principal: COMPUTE01$@realm

Valid starting Expires Service principal
12/31/1969 19:00:00 12/31/1969 19:00:00 Encrypted/Credentials/v1@X-GSSPROXY:

When that gssproxy cache exists on a compute node, the node cannot retrieve a credential using auks. If we delete the gssproxy cache (with kdestroy -c as root), compute nodes can retrieve credentials again; e.g. jobs can be submitted with that cache deleted and seem to run normally until gssproxy puts the cache back. It's not clear exactly when gssproxy recreates this cache (which makes testing a little frustrating), but eventually (within less than an hour) it always puts it back.
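
For reference, the manual workaround is just destroying the gssproxy cache while leaving auks's host ticket in KCM:0 alone, using the cache name from the klist output above (yours will differ):

    kdestroy -c KCM:0:57660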

We have tried disabling gssproxy, but it seems to start regardless of the enabled/disabled status of the gssproxy service, the GSS_USE_PROXY setting in /etc/sysconfig/nfs, or the existence of /etc/gssproxy/99-nfs-client.conf. Uninstalling gssproxy would also uninstall nfs-utils, but we definitely need NFS, so that is a no-go.

This also causes problems on the system running auksd and auksdrenewer. When the gssproxy cache is there, "auks -D" fails. If we delete the cache, "auks -D" starts working again. It also seems like auksdrenewer is unable to dump credentials when the gssproxy cache is in place.

When the gssproxy ticket is deleted from the compute nodes and the auksd node, auks otherwise seems to be working correctly: submitting a job successfully adds the TGT to auksd, auksdrenewer renews credentials (verified by looking at the tickets stashed in /var/cache/auks), and jobs run with access to Kerberized NFS home directories.

We would really love to hear from people using auks on RHEL8 and understand how you got it to work. We can post configurations etc. if that would be useful but it does seem like we have zeroed in on the core issue here.
