I have been using SGE until now, but am now trying to use PBS Pro with an OpenHPC cluster. Out of the box I get a default queue:
# qmgr
Max open servers: 49
Qmgr: print server
#
# Create queues and set their attributes.
#
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
#
# Set server attributes.
#
set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server resources_default.place = scatter
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server default_qsub_arguments = -V
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server license_count = Avail_Global:1000000 Avail_Local:1000000 Used:0 High_Use:0 Avail_Sockets:1000000 Unused_Sockets:1000000
set server eligible_time_enable = False
set server job_history_enable = True
set server max_concurrent_provision = 5
I have a few questions:
a) The compute nodes are numbered c01 to c20 and all are part of workq. How can I allocate only c01 to c10 to workq and c11 to c20 to a new queue called cfdq?
b) Are there any PBS Pro setup guides for ABAQUS and ANSYS Fluent?
c) Is there any quick howto/cheat sheet for PBS Pro administration?
Hello, this is something I have been using recently. Very simple and crude, but it works:
set node compute-1-0-3 queue = training
Users then submit with -W group_list like this:
qsub -I -l select=1 -l walltime=00:01:00 -q training -W group_list=itea_lille-kurs
# Create and define queue training
create queue training
set queue training queue_type = Execution
set queue training resources_max.walltime = 10:00:00
set queue training acl_group_enable = True
set queue training acl_groups = imf_lille-tma4280
set queue training acl_groups += itea_lille-kurs
set queue training enabled = True
set queue training started = True
Qmgr: print node compute-1-0-3
# Create nodes and set their properties.
# Create and define node compute-1-0-3
create node compute-1-0-3 Mom=compute-1-0-3
set node compute-1-0-3 state = free
set node compute-1-0-3 resources_available.arch = linux
set node compute-1-0-3 resources_available.host = compute-1-0-3
set node compute-1-0-3 resources_available.mem = 131746108kb
set node compute-1-0-3 resources_available.ncpus = 20
set node compute-1-0-3 resources_available.vnode = compute-1-0-3
set node compute-1-0-3 queue = training
set node compute-1-0-3 resv_enable = True
set node compute-1-0-3 sharing = default_shared
Soln: The setup below ensures that any job submitted to workq is scheduled on nodes c01-c10, and any job submitted to cfdq is scheduled on nodes c11-c20.
for i in {01..10}; do qmgr -c "set node c$i resources_available.queue_name+=workq"; done
for i in {11..20}; do qmgr -c "set node c$i resources_available.queue_name+=cfdq"; done
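Note that the loops above assume the cfdq queue and a custom host-level resource named queue_name already exist, and that each queue pins it via default_chunk. A rough sketch of those prerequisite steps, assuming a PBS Pro version where resources can be created from qmgr:
qmgr -c "create queue cfdq queue_type=execution, enabled=true, started=true"
qmgr -c "create resource queue_name type=string_array, flag=h"
Then add queue_name to the resources: line of $PBS_HOME/sched_priv/sched_config, HUP the scheduler, and tie each queue to the resource:
qmgr -c "set queue workq default_chunk.queue_name=workq"
qmgr -c "set queue cfdq default_chunk.queue_name=cfdq"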
Soln: The GUI environment should be integrated with PBS; the configuration files within these applications have to be updated to point to the PBS binaries. If you have the batch command line and the parameters that go with these applications, they can be run from a batch script together with PBS directives.
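As a rough sketch (the queue name, journal file, solver version and core counts below are placeholders, not from the original answer), a Fluent batch job under PBS might look like:
#!/bin/bash
#PBS -N fluent_test
#PBS -q cfdq
#PBS -l select=2:ncpus=20:mpiprocs=20
#PBS -l walltime=04:00:00
cd $PBS_O_WORKDIR
# load Fluent however it is made available on your cluster, e.g. via a module
NPROCS=$(wc -l < $PBS_NODEFILE)
fluent 3ddp -g -t$NPROCS -cnf=$PBS_NODEFILE -i run.jou > fluent.log 2>&1
An ABAQUS run is similar, with something like abaqus job=myjob input=myjob.inp cpus=$NPROCS interactive in place of the fluent line.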
Soln 1: This continues from the solution above.
qmgr -c "c q gpuq queue_type=e, enabled=t, started=t"
qmgr -c "s q gpuq default_chunk.queue_name=gpuq"
for i in {09..10}; do qmgr -c "set node c$i resources_available.queue_name+=gpuq"; done
Submit jobs to gpuq and they will go to c09 and c10.
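For example, a plain submission like
qsub -q gpuq -l select=1:ncpus=1 -- /bin/sleep 100
should land on one of those two nodes, since the chunk inherits queue_name=gpuq from the queue's default_chunk.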
Soln 2:
Create a custom resource:
qmgr -c "c r enablegpu type=boolean, flag=h"
source /etc/pbs.conf; then edit $PBS_HOME/sched_priv/sched_config
add enablegpu to the resources: line, e.g. resources: "…, enablegpu"
kill -HUP <PID of the scheduler>
For all the GPU nodes, set the following:
qmgr -c 's n c09 resources_available.enablegpu=true'
qmgr -c 's n c10 resources_available.enablegpu=true'
For the rest of the nodes, which do not have GPUs:
qmgr -c 's n NODENAME resources_available.enablegpu=false'
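Assuming the c01-c20 naming from the original question, a loop like this covers the non-GPU nodes in one pass:
for i in {01..08} {11..20}; do qmgr -c "set node c$i resources_available.enablegpu=false"; done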
Submit a job to run on the GPU node as below:
qsub -l select=1:ncpus=1:enablegpu=true -- /bin/sleep 100
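You can check where the job actually landed with something like qstat -f <jobid> and looking at its exec_vnode attribute.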
Thanks, I went with Soln 2. The nodes c09 and c10 have 1 GPU each. How can I make the GPU a consumable resource, so that if a job is already using the GPU on c09 or c10, a new job doesn't start there but waits in the queue?
Also, I noticed a couple of typos in your instructions: it should be qsub -l select=1:ncpus=1:enablegpu=true -- /bin/sleep 100, and only double periods are needed in for i in {09..10};
add "ngpus" to the resources: line of the sched_config file
kill -HUP (PID of the scheduler)
For all the nodes that have GPUs:
qmgr -c "set node GPUNODE resources_available.ngpus=1"
For all the nodes that have no GPUs:
qmgr -c "set node COMPUTENODE resources_available.ngpus=0"
Submit a job as below:
qsub -l select=1:ncpus=1:ngpus=1:enablegpu=true -- /bin/sleep 100
When it runs on the GPU node, pbsnodes for that node should then show both resources_available.ngpus and resources_assigned.ngpus.
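For example, pbsnodes c09 | grep -i ngpus should report resources_available.ngpus = 1, and resources_assigned.ngpus should go to 1 while a job is running there.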
I have followed the excellent steps described earlier by Adarsh to make two queues. However, whenever I submit my multi-node job to one queue it will get assigned nodes from the other queue…
Here are the steps I followed to make the two queues: