I’m experimenting with OpenPBS at nearly top-of-tree (commit 072689ab). The scheduler would not start jobs, reporting “No available resources on nodes”. Digging into the scheduler, I traced this to commit b5459cbc (Pull Request #2248 in openpbs/openpbs on GitHub, “Node buckets does not check for unlicensed nodes”, by arungrover). All of my nodes end up with lic_lock == 0, and since that commit the scheduler ignores such nodes.
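To make the symptom concrete, here is a toy model of the behaviour described above. It is not the OpenPBS scheduler code (which is C/C++); the `Node` class, `eligible_nodes`, and the `check_license` switch are hypothetical names invented for illustration, and only `lic_lock` comes from the discussion. It just shows how adding a license check empties the candidate list when every node has lic_lock == 0.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    state: str
    lic_lock: int  # stand-in for the scheduler's per-node license flag


def eligible_nodes(nodes, check_license):
    """Return names of free nodes; optionally skip nodes with lic_lock == 0."""
    result = []
    for node in nodes:
        if node.state != "free":
            continue  # down or otherwise unavailable nodes never qualify
        if check_license and node.lic_lock == 0:
            continue  # the extra check: unlicensed nodes are ignored
        result.append(node.name)
    return result


# Mirrors the situation in this thread: every node has lic_lock == 0.
nodes = [
    Node("node3", "free", 0),
    Node("node4", "state-unknown,down", 0),
]

before = eligible_nodes(nodes, check_license=False)
after = eligible_nodes(nodes, check_license=True)
print("without license check:", before)
print("with license check:", after)
```

With the check enabled, nothing is left for the scheduler to place jobs on, which would be consistent with the “No available resources on nodes” message.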
My question: what mechanism sets lic_lock to a non-zero value on an OSS build from scratch? I may have something configured incorrectly, but I cannot figure out what. There are also local mods involved, so I could have broken something myself.
qstat -Bf | grep -i licen
pbs_license_min = 0
pbs_license_max = 2147483647
pbs_license_linger_time = 31536000
license_count = Avail_Global:1000000 Avail_Local:1000000 Used:0 High_Use:0
I have a hack work-around, so this is not a show-stopper.
Thanks.
Could you please share the output of the following commands:
pbsnodes -av
qstat -Bf
$ pbsnodes -av
node3
Mom = node3.local
Port = 15002
pbs_version = 20.0.0_nas_6bcbb
ntype = PBS
state = free
pcpus = 2
resv = R11.server2
resources_available.arch = linux
resources_available.bigmem = False
resources_available.host = node3
resources_available.mem = 1014600kb
resources_available.ncpus = 2
resources_available.vnode = node3
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Sat May 15 11:20:24 2021
last_used_time = Sat May 15 11:20:24 2021
server_instance_id = server2.local:15001
node4
Mom = node4.local
Port = 15002
pbs_version = unavailable
ntype = PBS
state = state-unknown,down
resources_available.host = node4
resources_available.vnode = node4
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
server_instance_id = server2.local:15001
$ qstat -Bf
Server: server2
server_state = Active
server_host = server2.local
scheduling = True
total_jobs = 29
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
managers = dtalcott@*
default_queue = workq
log_events = 511
mailer = /usr/sbin/sendmail
mail_from = adm
query_other_jobs = True
resources_default.ncpus = 1
resources_default.walltime = 01:00:00
default_chunk.ncpus = 1
resources_assigned.mem = 0mb
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
scheduler_iteration = 600
flatuid = True
resv_enable = True
node_fail_requeue = 310
max_array_size = 10000
pbs_license_min = 0
pbs_license_max = 2147483647
pbs_license_linger_time = 31536000
license_count = Avail_Global:1000000 Avail_Local:1000000 Used:0 High_Use:0
pbs_version = 20.0.0_nas_6bcbb
eligible_time_enable = True
job_history_enable = True
job_history_duration = 672:00:00
max_concurrent_provision = 5
power_provisioning = False
max_job_sequence_id = 9999999
This is on CentOS-7.
Thank you for sharing the information. I might be on a different page here, but looking at your pbsnodes -av output, one node (node3) is part of a reservation (resv = R11.server2) and node4 is down. This might be the reason for the scheduler message.
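For anyone who wants to run the same check on a larger cluster, here is a minimal sketch that parses `pbsnodes -av` text and reports each node's state and reservation. It assumes the output format shown in this thread (a node name on its own line, followed by `attribute = value` lines) and does not handle wrapped continuation lines; `parse_pbsnodes` is a hypothetical helper, not part of PBS.

```python
def parse_pbsnodes(text):
    """Parse `pbsnodes -av` text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            if current is None:
                continue  # attribute line before any node name; ignore it
            key, value = line.split(" = ", 1)
            nodes[current][key] = value
        else:
            # A line without " = " is assumed to start a new node record.
            current = line
            nodes[current] = {}
    return nodes


# Abbreviated sample taken from the output posted in this thread.
sample = """\
node3
     Mom = node3.local
     state = free
     resv = R11.server2
     resources_available.ncpus = 2
node4
     Mom = node4.local
     state = state-unknown,down
"""

nodes = parse_pbsnodes(sample)
for name, attrs in nodes.items():
    states = attrs.get("state", "").split(",")
    is_free = states == ["free"]
    print(name, "free" if is_free else "not free", "resv:", attrs.get("resv", "none"))
```

To use it on a live system, capture the command output as text and pass it to `parse_pbsnodes` in place of `sample`.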
Good thinking, but that does not apply in this case. The reservation starts way out in July, and my test jobs ask for only 2000 seconds. With a hack that sets lic_lock to 1, the jobs run okay.
My problem is that I cannot figure out how lic_lock is supposed to get set for non-commercial builds of OpenPBS. I tried using qmgr to set the license type, but that is rejected:
/opt/pbs/bin/qmgr -c 'set node node3 license=l'
qmgr obj=node3 svr=default: Undefined attribute
qmgr: Error (15002) returned from server
I’m not finding anything relevant in the Licensing Guide. Should I be running a license server, even for OpenPBS?
Thanks.
I’ve only seen this issue when installing commercial PBS and forgetting to point it to the right license server, but I was not using the buckets code path. Looking at the code, I agree with you: this looks like a bug in OpenPBS. You certainly don’t need to set up a license server for OpenPBS. Do you mind filing a GitHub issue for this (on the Issues page of openpbs/openpbs)?