Hi All,
We are facing an intermittent issue with our PBS installation.
We have the following queues configured:
Normal (default)
Large
Xlarge
XXlarge
Each queue has separate nodes configured to receive jobs only from that queue. This is achieved with the following command:
qmgr -c "set node <node_name> queue=<queue_name>"
Problem: Sometimes, when a job is submitted to the large queue, PBS places the job on a machine configured to accept jobs only from the normal (default) queue.
The command qstat -a shows that the job is running in the large queue. However, pbsnodes -a and qstat -f <job_id> show that the machine running the job has its queue parameter set to Normal.
PBS works properly most of the time. The issue appears once in a while and then clears up on its own, and we have not been able to reproduce it.
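In case it helps anyone trying to catch this while it is happening, here is a minimal sketch of the cross-check described above: it compares each running job's queue with the queue attribute of the node the job is on. It assumes the usual indented `key = value` layout of `pbsnodes -a` and `qstat -f` output. The sample text, host names, job IDs, and function names are made up for illustration, and real `qstat -f` output may wrap long lines, which this sketch does not handle.

```python
def parse_node_queues(pbsnodes_text):
    """Map node name -> {"queue": str or None, "jobs": [job ids]}.

    Assumes `pbsnodes -a`-style text: an unindented node name followed by
    indented "key = value" lines.
    """
    nodes = {}
    current = None
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = {"queue": None, "jobs": []}
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            if key == "queue":
                nodes[current]["queue"] = value.strip()
            elif key == "jobs":
                # Entries look like "101.server/0"; drop the "/<index>" part.
                ids = {j.strip().split("/")[0] for j in value.split(",") if j.strip()}
                nodes[current]["jobs"] = sorted(ids)
    return nodes


def parse_job_queues(qstat_text):
    """Map job id -> queue name from `qstat -f`-style text."""
    jobs = {}
    current = None
    for line in qstat_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Job Id:"):
            current = stripped.split(":", 1)[1].strip()
        elif current is not None and stripped.startswith("queue = "):
            jobs[current] = stripped.split(" = ", 1)[1].strip()
    return jobs


def find_mismatches(nodes, jobs):
    """Return (job, job_queue, node, node_queue) where the two queues differ."""
    mismatches = []
    for node, info in nodes.items():
        if info["queue"] is None:
            continue  # node is not tied to a queue
        for job in info["jobs"]:
            job_queue = jobs.get(job)
            if job_queue is not None and job_queue != info["queue"]:
                mismatches.append((job, job_queue, node, info["queue"]))
    return mismatches


# Hypothetical sample output, shaped like the situation described above.
SAMPLE_PBSNODES = """\
node01
     state = job-busy
     jobs = 101.pbsserver/0, 101.pbsserver/1
     queue = normal

node02
     state = free
     jobs = 102.pbsserver/0
     queue = large
"""

SAMPLE_QSTAT = """\
Job Id: 101.pbsserver
    Job_Name = test
    queue = large
Job Id: 102.pbsserver
    Job_Name = test2
    queue = large
"""

if __name__ == "__main__":
    found = find_mismatches(parse_node_queues(SAMPLE_PBSNODES),
                            parse_job_queues(SAMPLE_QSTAT))
    for job, job_queue, node, node_queue in found:
        print(f"{job} (queue {job_queue}) is on {node} (queue {node_queue})")
```

On a live system the two strings would come from the actual `pbsnodes -a` and `qstat -f` output, and running the check periodically would at least record when the mismatch occurs and on which nodes.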
Any idea what may be causing this? What can be done to prevent this from happening?
Here is the server configuration:
# Create resources and set their properties.
# Create and define resource slot_type
create resource slot_type
set resource slot_type type = string
set resource slot_type flag = h
# Create and define resource cpuf
create resource cpuf
set resource cpuf type = string
set resource cpuf flag = h
# Create and define resource ndisks
create resource ndisks
set resource ndisks type = string
set resource ndisks flag = h
# Create and define resource SPEED
create resource SPEED
set resource SPEED type = string
set resource SPEED flag = h
# Create and define resource physical_srv
create resource physical_srv
set resource physical_srv type = string
set resource physical_srv flag = h
# Create and define resource OSNAME
create resource OSNAME
set resource OSNAME type = string
set resource OSNAME flag = h
# Create and define resource model
create resource model
set resource model type = string
set resource model flag = h
# Create queues and set their attributes.
# Create and define queue xxlarge
create queue xxlarge
set queue xxlarge queue_type = Execution
set queue xxlarge resources_default.slot_type = xxlarge
set queue xxlarge enabled = True
set queue xxlarge started = True
# Create and define queue xlarge
create queue xlarge
set queue xlarge queue_type = Execution
set queue xlarge resources_default.slot_type = xlarge
set queue xlarge enabled = True
set queue xlarge started = True
# Create and define queue normal
create queue normal
set queue normal queue_type = Execution
set queue normal resources_default.slot_type = execute
set queue normal enabled = True
set queue normal started = True
# Create and define queue large
create queue large
set queue large queue_type = Execution
set queue large resources_default.slot_type = large
set queue large enabled = True
set queue large started = True
# Set server attributes.
set server scheduling = True
set server managers = root@*
set server default_queue = normal
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server resources_default.slot_type = execute
set server default_chunk.ncpus = 1
set server scheduler_iteration = 15
set server flatuid = True
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server eligible_time_enable = False
set server job_history_enable = True
set server max_concurrent_provision = 5
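To make the queue settings in the dump above easier to compare at a glance, here is a small sketch that extracts each queue's `resources_default.slot_type` and the server-wide default from qmgr-style text. The function name and the embedded excerpt are only for illustration; the excerpt repeats lines from the configuration above. It only summarizes what the dump says and makes no claim about how the scheduler uses these values.

```python
import re

# Matches lines such as:
#   set queue large resources_default.slot_type = large
#   set server resources_default.slot_type = execute
QUEUE_RE = re.compile(r"^set queue (\S+) resources_default\.(\S+) = (.+)$")
SERVER_RE = re.compile(r"^set server resources_default\.(\S+) = (.+)$")


def resource_defaults(qmgr_dump, resource="slot_type"):
    """Return (per-queue defaults, server default) for one resource."""
    per_queue = {}
    server_default = None
    for line in qmgr_dump.splitlines():
        line = line.strip()
        q = QUEUE_RE.match(line)
        if q and q.group(2) == resource:
            per_queue[q.group(1)] = q.group(3).strip()
            continue
        s = SERVER_RE.match(line)
        if s and s.group(1) == resource:
            server_default = s.group(2).strip()
    return per_queue, server_default


# Excerpt copied from the configuration above.
DUMP_EXCERPT = """\
set queue xxlarge resources_default.slot_type = xxlarge
set queue xlarge resources_default.slot_type = xlarge
set queue normal resources_default.slot_type = execute
set queue large resources_default.slot_type = large
set server resources_default.ncpus = 1
set server resources_default.slot_type = execute
"""

if __name__ == "__main__":
    queues, server = resource_defaults(DUMP_EXCERPT)
    for name, value in queues.items():
        print(f"queue {name}: slot_type default = {value}")
    print(f"server: slot_type default = {server}")
```

For this configuration the normal queue and the server default both use `execute`, while the other three queues each use their own name as the slot_type.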