Hi,
I wrote a Python PBS hook to manage the scratch disk space on each host (i.e., each compute node). Each host has a given maximum allowed space on its scratch filesystem, and each job can reserve a portion of that space. If no host has the requested disk space at submission time, the job stays queued.
I created the scratch custom resource as follows:
qmgr -c "create resource scratch type=size, flag=h"
and added it to the sched_config file.
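For reference, the scheduler only considers the new resource once it is listed in the resources line of sched_config; a minimal sketch (the other entries are the stock defaults and may differ on your installation):

```
resources: "ncpus, mem, arch, host, vnode, scratch"
```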
The scratch custom resource is properly updated on the EXECHOST_PERIODIC event.
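For context, the periodic update can be sketched as below. The free-space helper is plain Python and runnable anywhere; the mount point /scratch and the reserved-space bookkeeping value are assumptions for illustration, and the pbs-specific part is shown only in comments because the pbs module exists solely inside the MoM's hook interpreter:

```python
import os

def scratch_free_bytes(path="/scratch", reserved=0):
    """Free bytes on the scratch filesystem, minus the space already
    reserved by running jobs (reserved is a hypothetical bookkeeping
    value maintained elsewhere). Clamped at zero."""
    st = os.statvfs(path)
    return max(st.f_bavail * st.f_frsize - reserved, 0)

# Inside the EXECHOST_PERIODIC hook this value would feed the
# host-level custom resource, roughly:
#
#   import pbs
#   e = pbs.event()
#   for v in e.vnode_list.values():
#       v.resources_available["scratch"] = pbs.size(
#           "%db" % scratch_free_bytes("/scratch"))
#   e.accept()
```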
Everything works well with the exception of a very specific test case (in my simulation environment):
Suppose there are two jobs, each reserving some scratch space, such that their combined reservation exceeds the maximum scratch space available on a given host (while a single host could satisfy all the other requested compute resources). Also suppose that the two jobs are submitted almost simultaneously.
Sometimes the two jobs are wrongly executed on the same host: when scheduling the later-submitted job, the scheduler reads a scratch value that the EXECHOST_PERIODIC event has not yet refreshed.
For hosts with only one vnode I found the following workaround (which seems fine to me): the scratch resource is also updated on the EXECJOB_BEGIN event. Note that pbs.event().vnode_list contains the current vnode, which is the only vnode on the host. If the later-submitted job is wrongly scheduled anyway, it is rejected and considered for execution in the next scheduling cycle.
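The workaround above can be sketched as follows. The size-parsing helper is plain Python and runnable; its suffix handling is a simplification (PBS accepts further size forms), and the EXECJOB_BEGIN part is shown in comments since it only runs inside the hook interpreter:

```python
def parse_size_to_bytes(s):
    """Convert a PBS-style size string like '10gb' or '512mb' to bytes.
    Only the common binary suffixes are handled here (assumption);
    a bare number is taken as bytes."""
    units = {"b": 1, "kb": 2**10, "mb": 2**20, "gb": 2**30, "tb": 2**40}
    s = str(s).strip().lower()
    # Check multi-letter suffixes before the plain 'b' suffix.
    for suf in ("kb", "mb", "gb", "tb", "b"):
        if s.endswith(suf):
            return int(s[:-len(suf)]) * units[suf]
    return int(s)

# Inside the EXECJOB_BEGIN hook (single-vnode host), roughly:
#
#   import pbs
#   e = pbs.event()
#   req = parse_size_to_bytes(str(e.job.Resource_List["scratch"]))
#   for v in e.vnode_list.values():     # the only vnode == the host
#       avail = parse_size_to_bytes(str(v.resources_available["scratch"]))
#       v.resources_available["scratch"] = pbs.size(
#           "%db" % max(avail - req, 0))
#   e.accept()
```

This makes the reservation visible to the scheduler immediately at job start, instead of waiting for the next periodic update.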
For hosts with more than one vnode, pbs.event().vnode_list contains only the current vnode, which does not correspond to the host, while the scratch custom resource is host level. Therefore I am not able to update that resource properly.
Do you have any hints?
Thanks in advance for helping.