qdel lets users delete other users' jobs

Hi Admins,
I recently started working with PBS version 19.1.1 and everything is running well. I have multiple users, e.g. user1 and user2.
The issue is that user1 can qdel user2's jobs and vice versa. I don't understand why this is happening. Please help me resolve it.

By default, standard users cannot delete other users' jobs; only the "root" user can.
If user1 and user2 have been added as managers or operators, then this is possible.
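
A quick way to check the manager/operator possibility is to filter the server configuration for those two attributes. This is only a sketch: `list_privileged` is a helper name made up for illustration, and it assumes the `set server managers ...` / `set server operators ...` line format that `qmgr -c 'print server'` normally prints.

```shell
# Reads `qmgr -c 'print server'` output on stdin and prints only the lines
# that grant manager or operator privilege. Prints nothing if there are none.
# (list_privileged is a hypothetical helper, not part of PBS.)
list_privileged() {
    grep -E '^set server (managers|operators) \+?= ' || true
}

# Typical use on the PBS server host:
#   qmgr -c 'print server' | list_privileged
```

If user1 or user2 shows up in that output, removing the entry with qmgr's `-=` operator on the same attribute should restore the default behaviour. Empty output means neither attribute is set.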

Could you share the output of the following commands?

  • qmgr -c 'p s'
  • cat /etc/sudoers | grep user
  • qstat -Bf

I have checked /etc/sudoers; there is no entry for any user.
1. qmgr -c "p s"

# Create resources and set their properties.

# Create and define resource ngpus

create resource ngpus
set resource ngpus type = long
set resource ngpus flag = hn

# Create and define resource gpu_id

create resource gpu_id
set resource gpu_id type = string
set resource gpu_id flag = h

# Create and define queue workq

create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True

# Create and define queue gpu

create queue gpu
set queue gpu queue_type = Execution
set queue gpu enabled = True
set queue gpu started = True

# Set server attributes.

set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server eligible_time_enable = False
set server max_concurrent_provision = 5

2. qstat -Bf
Server: master
server_state = Active
server_host = master.hpc
scheduling = True
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
default_queue = workq
log_events = 511
mail_from = adm
query_other_jobs = True
resources_default.ncpus = 1
default_chunk.ncpus = 1
resources_assigned.mem = 0gb
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
scheduler_iteration = 600
FLicenses = 20000000
resv_enable = True
node_fail_requeue = 310
max_array_size = 10000
pbs_license_min = 0
pbs_license_max = 2147483647
pbs_license_linger_time = 31536000
license_count = Avail_Global:10000000 Avail_Local:10000000 Used:0 High_Use:0
pbs_version = 18.1.4
eligible_time_enable = False
max_concurrent_provision = 5
power_provisioning = False

Thank you for sharing the output. The configuration is correct, and you are running 18.1.4.
If users can delete each other's jobs by default, that looks like a bug to me. Please increase the log level, submit jobs as user1 and user2, delete jobs as user1 and user2, collect and upload the PBS logs, and create a ticket.
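
The reproduction steps above could be scripted roughly as follows. Treat it as a sketch under assumptions: it is meant to be run as root on the PBS server host, `user1`/`user2` are placeholder accounts, the `/bin/sleep 300` test job and the `log_events` value of 2047 (more verbose than the 511 shown in the configuration) are illustrative choices, and `reproduce_cross_user_qdel` is a made-up helper name.

```shell
# Sketch only: raises server logging, submits a job as one user and tries
# to delete it as another. Arguments: <job owner> <other user>.
reproduce_cross_user_qdel() {
    owner=$1
    other=$2

    # Raise server log verbosity before reproducing (2047 is an assumed value).
    qmgr -c "set server log_events = 2047" || return 1

    # Submit a short sleep job as the owner and keep the job ID qsub prints.
    jobid=$(sudo -u "$owner" qsub -- /bin/sleep 300) || return 1
    echo "submitted $jobid as $owner"

    # Attempt the delete as the other user; normally this should be refused.
    if sudo -u "$other" qdel "$jobid"; then
        echo "PROBLEM: $other deleted $jobid owned by $owner"
    else
        echo "OK: $other could not delete $jobid"
        # Clean up the test job as its owner.
        sudo -u "$owner" qdel "$jobid"
    fi
}

# Typical use:
#   reproduce_cross_user_qdel user1 user2
```

After a run, the server log for the day (under the `server_logs` directory of PBS_HOME) is the file to attach to the ticket. Setting `log_events` back to 511 afterwards restores the value from the posted configuration.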

It is worth performing an ruserok() test to see what the OS says about the users/systems involved. When checking to see if remote_user1@remote_host can delete local_user2’s job, PBS will run the ruserok() system function (which consults hosts.equiv/rhosts, etc.) to allow/deny this. It can be tested with the following program run on the PBS server host:

/*
   Two use cases:
    1) User submitting job from remote host to server getting unexpected
        "Bad UID" message. That is, user doesn't have access when he thinks
        he should.
    2) User(s) can delete, etc other user(s) jobs. That is, one user is able
        to act as what he thinks is a different user, server sees them as
        being equivalent.

Build with "cc ruserok.c -o ruserok"

Usage (run on the PBS server system):

ruserok remote_host remote_user1 local_user2

where:

remote_host:  the host from which the job is being submitted, or where the PBS client command is issued

remote_user1: the username of the user submitting the job, or issuing the client command

local_user2: the username of the user remote_user1 is trying to submit the job as, or owner of the job that remote_user1 is trying to act on with the client command

*/


#include <errno.h>
#include <netdb.h>      /* declares ruserok() on glibc; some systems use <unistd.h> */
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
        int rc;
        char hn[257];

        if (argc != 4) {
                fprintf(stderr, "Usage: %s remote_host remote_user1 local_user2\n", argv[0]);
                return 1;
        }
        if (gethostname(hn, 256) < 0) {
                perror("unable to get hostname");
                return 2;
        }
        hn[256] = '\0';

        printf("on local host %s, from remote host %s\n", hn, argv[1]);
        rc = ruserok(argv[1], 0, argv[2], argv[3]);
        if (rc == 0)
                printf("remote user %s is allowed access as local user %s\n", argv[2], argv[3]);
        else
                printf("remote user %s is denied access as local user %s\n", argv[2], argv[3]);

        return 0;
}
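
Since ruserok() consults hosts.equiv and .rhosts, it can also help to look at those files directly for wildcard entries. The helper below is a rough sketch, not a PBS tool: `find_wildcard_trust` is a made-up name, and it simply flags non-comment lines whose host or user field is a bare `+`, which is the usual wildcard in those files.

```shell
# Prints "file:line: content" for every line in the given trust file whose
# first or second field is a bare "+". Silently skips unreadable files.
# (find_wildcard_trust is a hypothetical helper.)
find_wildcard_trust() {
    file=$1
    [ -r "$file" ] || return 0
    awk -v f="$file" '
        /^[[:space:]]*(#|$)/ { next }                       # skip comments and blank lines
        $1 == "+" || $2 == "+" { print f ":" NR ": " $0 }   # wildcard host or user
    ' "$file"
}

# Typical use on the PBS server host, together with the C program above
# (master.hpc, user1 and user2 are the names used in this thread):
#   find_wildcard_trust /etc/hosts.equiv
#   find_wildcard_trust ~user2/.rhosts
#   ./ruserok master.hpc user1 user2
```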

Yes, I just rolled back to the old version to test.
The issue is resolved. Every user comes from a central LDAP, and there was a problem in nsswitch. After removing the sss entry, everything now works fine.
Thanks!
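
For anyone hitting something similar, one thing worth checking is how the accounts resolve through NSS. The thread does not say exactly how the sss entry caused the problem, so the check below rests on an assumption: that one possible symptom is two login names resolving to the same UID. `find_shared_uids` is a made-up helper name.

```shell
# Reads passwd-format lines (as printed by `getent passwd`) on stdin and
# reports every UID that is shared by more than one login name.
# (find_shared_uids is a hypothetical helper.)
find_shared_uids() {
    awk -F: '
        {
            names[$3] = (names[$3] == "" ? $1 : names[$3] "," $1)
            count[$3]++
        }
        END {
            for (uid in count)
                if (count[uid] > 1)
                    print uid ": " names[uid]
        }
    '
}

# Typical use on the PBS server host (user1 and user2 are placeholders):
#   getent passwd user1 user2 | find_shared_uids
#   grep -E '^(passwd|group):' /etc/nsswitch.conf
```

Empty output from the first command means the two accounts have distinct UIDs. The second command shows which sources (files, sss, ldap, ...) are consulted for user and group lookups.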


I tried this as well and it also worked. I have authenticated every user from the master to the node, and it works fine.
