Static server level custom boolean resource

We want to be able to flag when a file system goes down and avoid running jobs that need that file system. The plan was to use a server boolean custom resource. It would be true most of the time and we would set it to false when the file system is down. Here is what I tried to do:

#
# Create and define resource home_fs
#
create resource home_fs
set resource home_fs type = boolean
set resource home_fs flag = h
qmgr -c "set server resources_available.home_fs=true"

I edited /var/spool/pbs/sched_priv/sched_config and added home_fs to the resources line and then HUPed the scheduler.
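
For reference, the resources line now looks roughly like this (home_fs appended to the existing entries; the stock defaults are shown here and may differ by site):

resources: "ncpus, mem, arch, host, vnode, aoe, eoe, home_fs"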

I then ran:
qsub -l home_fs=true -l walltime=05:00 -- /usr/bin/hostname

qstat -f showed:
Resource_List.select = 1:home_fs=True:ncpus=1
Submit_arguments = -l home_fs=true -l walltime=05:00 -- /usr/bin/hostname
comment = Can Never Run: Insufficient amount of resource: home_fs (True !=False)

Some questions:

  1. Why is home_fs listed in Resource_List.select? I did not do a select, so that should have been a job wide resource, should it not? I assume that is why I am getting the True != False because it isn't defined on the vnode. We don't want to have to define it on every node as changing it would then be extremely inconvenient.

  2. The examples in the manual for custom resources, both static and dynamic, were consumable (counts of licenses) and either PBS keeps track of the count (static) or you need a script (dynamic). Can I use a boolean at all? If so, can I just use qmgr to change the value? If not, do I have to treat this as a dynamic resource and write a script that returns true or false?

  3. Is there a better way to do this?

Thanks,

Bill

Can you try setting the flag to q:
set resource home_fs flag = q

and try it again.

Thanks for the suggestion. Some additional questions:

  1. Your suggestion made me look at table 5-9, Resource Accumulation Flags, on page AG-261. In this case, do I want to set the flag to q or do I want no flag at all? This is a boolean resource. The description of q says it is going to be incremented by one and must be consumable or time-based. No flag means it is not consumable, which seems to be a better match?
  2. If I want no flag, how do I clear it? Will qmgr -c "set resource home_fs flag = ''" work?
  3. I tried to make the change to a q and this is what I got:
(base) [allcock@edtb-01 20220216-16:57:04]> qmgr -c "set resource home_fs flag = q"
qmgr obj=home_fs svr=default: Resource busy on job

There is no job running or queued and I restarted PBS on the thought that maybe the resource was "stuck". Same result. Any idea how I "unstick" the resource?

Thanks,

Bill

Apologies, my suggestion was not correct here.

  1. qmgr -c "unset resource home_fs"  # if this does not work, then:
  2. a. source /etc/pbs.conf ; qterm -t quick
     b.  update or delete the home_fs line in $PBS_HOME/server_priv/resourcedef (see the example line below)
     c.  $PBS_EXEC/sbin/pbs_server
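
For step 2b, the home_fs line in resourcedef will look something like this (the flag portion may differ or be absent, depending on what is currently set):

home_fs type=boolean flag=h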
    

Hope this works out for you.

I think it is a good idea to use a node health check script (mom periodic hook) to periodically check the file system on the nodes and, if there is a node health issue (file system not accessible, full, or mount not available), set the node offline.

The reason we don't want to go that route is because we have multiple file systems. If one is down, we want to avoid running jobs that need that one, but can run jobs that don't.

Thanks for your help with this. I did get the resource "unstuck", though I would love to understand why that happened in the first place. However, it is not working as I had hoped. I thought it was: the value was True, I ran a job and it worked; I set the value to False and it didn't. But now I have set it back to True and the queued jobs didn't run, and neither are newly submitted jobs. I will continue to poke at this.

Thanks again for your help.

I believe you are right, you want no flags on the resource. The reason it showed up in the select is because of flag=h and the fact that you didn't submit a select. When you don't submit a select, the server will create one for you based on all of the flag=h resources you have requested. If you do submit a select, you can't submit any flag=h resources as job wide resources.
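
To illustrate (hypothetical requests with trimmed qstat -f output; the exact defaults may differ on your system):

# with flag=h, a job wide request gets folded into the server-built select:
qsub -l home_fs=true -- /usr/bin/hostname
    Resource_List.select = 1:home_fs=True:ncpus=1

# with no flag, the request stays job wide:
qsub -l home_fs=true -- /usr/bin/hostname
    Resource_List.home_fs = True
    Resource_List.select = 1:ncpus=1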

The reason you got the 'resource is busy on job' message is that you can change very little about resource definitions when they are requested by a job in the system (or even in history). This means any submitted or history job, not just a running one.

You were right when you said that the job didn't run because the nodes didn't have it set. A node is considered to have 0 of a resource, or will not match (as is the case here), if it isn't set.
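
(For reference, if you ever did want to set it per vnode despite the management hassle, it would be along these lines, with <vnode_name> filled in for each node:)

qmgr -c "set node <vnode_name> resources_available.home_fs = True"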

What Adarsh said to do by shutting down the server and changing the resourcedef file will work. It also can have some undesired effects because the server/scheduler isn't going to expect things like a no-flag resource in a select.

Now onto the current problem at hand. Requesting a boolean resource at the job wide level should work fine. If the job doesn't run because of it, you'll see a comment like "Insufficient resources at server level XXX (True != False)". If you don't see "server level" or don't see the "(True != False)" then there is some other reason the job is not running. What is the comment on the jobs that aren't running?

Once we have the current issue ironed out, something you could consider doing is writing a server_dyn_res script for each file system that does a health check on it. This means you won't have to manually set the filesystem resource to True/False; the server_dyn_res script will do it itself.
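
A rough, untested sketch of what that could look like for one file system (the script name, location, and mount point are placeholders). In sched_config, something like:

server_dyn_res: "home_fs !/var/spool/pbs/sched_priv/check_home_fs.sh"

and a script along these lines:

#!/bin/bash
# check_home_fs.sh: print True if /home is mounted and writable, False otherwise
if mountpoint -q /home && touch /home/.pbs_fs_check 2>/dev/null; then
    rm -f /home/.pbs_fs_check
    echo True
else
    echo False
fi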

Bhroam

FWIW, where I used to work, we had a similar requirement, but for scratch file systems. This was with a much older version of PBS, and my unreliable memory is that I tried first using server boolean resources, but could not get the behavior we wanted. I switched to consumable resources, where a queuejob hook looked up the default scratch file system for the user and added a '-l scratchX=1' to the job. (If the job already specified some scratchX=Y values, the hook did nothing.) To enable use of file system scratchX, we set the server resources_available.scratchX to a large number (5000). To block starting jobs that requested the file system, we set resources_available.scratchX=0.
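
In current qmgr terms the setup would be roughly this (scratch1 is a placeholder name):

qmgr -c "create resource scratch1 type=long, flag=q"
qmgr -c "set server resources_available.scratch1 = 5000"
# to block new jobs that request it:
qmgr -c "set server resources_available.scratch1 = 0"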

This had a few minor advantages over a simple boolean. First, some jobs did not need scratch space, so they could specify scratchX=0 and the job would not be blocked no matter what state scratchX was in. (Specifying a boolean scratchX=false for a job blocks the job until scratchX is down.) Second, the resources_assigned.scratchX values told you how many jobs were (probably) using a given file system. Third, you could set resources_available.scratchX to a small number as a coarse limit on the load to allow on the file system. Fourth, you could create a reservation to start at a specific time that requested all 5000 of the scratchX resources, thus scheduling a dedtime for just that file system. (I'm not sure we ever used this; it needs testing.)

Interesting idea. My first thought was "Why would anyone set it to False", but then it occurred to me they might be thinking they were explicitly saying "I don't need this file system" which, as you say, would not work as intended.

Knowing how many jobs are using the file system might come in handy, though I don't recall us ever needing to know that. I doubt we would ever use it as a throttle. The reservation idea might be useful for benchmarking, but I could also see a user setting it to 5000 on their own and blocking other users when we didn't want them to. If we wrote a hook I guess we could overwrite any value greater than zero to be one unless it was a manager, or something along those lines. I would also have to think about how that figures into the prioritization calculation.

Thanks for the input!

It is sort of working, but not really in a practical way. I thought adding an explicit select, so that it would consider home_fs as a job wide resource, would get this to work, but no such luck:

(base) [allcock@edtb-01 20220222-22:39:49]> qsub -l home_fs=true -l walltime=05:00 -l select=ncpus=8 -- /usr/bin/hostname
2943.edtb-01.mcp.alcf.anl.gov
(base) [allcock@edtb-01 20220222-22:40:34]> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2943.edtb-01      STDIN            allcock                  0 Q workq
(base) [allcock@edtb-01 20220222-22:40:36]> qstat -f 2943
Job Id: 2943.edtb-01.mcp.alcf.anl.gov
    Job_Name = STDIN
    Job_Owner = allcock@edtb-01.mcp.alcf.anl.gov
    job_state = Q
    queue = workq
    server = edtb-01.mcp.alcf.anl.gov
    Checkpoint = u
    ctime = Tue Feb 22 22:40:34 2022
    Error_Path = edtb-01.mcp.alcf.anl.gov:/home/allcock/STDIN.e2943
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Tue Feb 22 22:40:34 2022
    Output_Path = edtb-01.mcp.alcf.anl.gov:/home/allcock/STDIN.o2943
    Priority = 0
    qtime = Tue Feb 22 22:40:34 2022
    Rerunable = True
    Resource_List.home_fs = True
    Resource_List.ncpus = 8
    Resource_List.nodect = 1
    Resource_List.place = free
    Resource_List.preempt_targets = Queue=preemptable
    Resource_List.select = ncpus=8
    Resource_List.walltime = 00:05:00
    schedselect = 1:ncpus=8
    substate = 10
    Variable_List = PBS_O_HOME=/home/allcock,PBS_O_LANG=en_US.UTF-8,
	PBS_O_LOGNAME=allcock,
	PBS_O_PATH=/opt/miniconda3/bin/:/home/allcock/miniconda3/bin:/home/all
	cock/miniconda3/condabin:/opt/miniconda3/bin:/home/allcock/bin:/usr/loc
	al/bin:/usr/bin:/bin:/opt/pbs/bin:/home/allcock/bin,
	PBS_O_MAIL=/var/spool/mail/allcock,PBS_O_SHELL=/bin/bash,
	PBS_O_WORKDIR=/home/allcock,PBS_O_SYSTEM=Linux,PBS_O_QUEUE=workq,
	PBS_O_HOST=edtb-01.mcp.alcf.anl.gov
    euser = allcock
    egroup = users
    queue_rank = 1645569634105414
    queue_type = E
    comment = Can Never Run: Insufficient amount of queue resource: home_fs (Tr
	ue != False)
    etime = Tue Feb 22 22:40:34 2022
    Submit_arguments = -l home_fs=true -l walltime=05:00 -l select=ncpus=8 -- /
	usr/bin/hostname
    executable = <jsdl-hpcpa:Executable>/usr/bin/hostname</jsdl-hpcpa:Executabl
	e>
    project = _pbs_project_default
    Submit_Host = edtb-01.mcp.alcf.anl.gov
    server_instance_id = edtb-01.mcp.alcf.anl.gov:15001

(base) [allcock@edtb-01 20220222-22:40:50]> qmgr -c "list home_fs"
qmgr: Illegal object type: home_fs.
(base) [allcock@edtb-01 20220222-22:42:29]> qmgr -c "list resource home_fs"
Resource home_fs
    type = boolean

(base) [allcock@edtb-01 20220222-22:42:38]> qmgr -c "print server" | grep home_fs
# Create and define resource home_fs
create resource home_fs
set resource home_fs type = boolean
set server resources_available.home_fs = True

I assume the reason it is saying True != False is because it is looking for the resource on the node, rather than the server?

And then things got interesting. I decided maybe putting the flag on and taking it off was the problem, so I created a new resource called test_fs:

(base) [allcock@edtb-01 20220222-22:44:33]> qmgr -c "create resource test_fs type=boolean"
(base) [allcock@edtb-01 20220222-22:44:41]> qmgr -c "set server resources_available.test_fs=True"
(base) [allcock@edtb-01 20220222-22:48:25]> sudo emacs /var/spool/pbs/sched_priv/sched_config
(base) [allcock@edtb-01 20220222-22:55:49]> sudo kill -HUP 102603

Then I started testing (2943 is left from before and is using home_fs rather than test_fs). Here is what I did:

  • Set test_fs to false
  • submitted job 2946 depending on test_fs=True; it did not start and had the test_fs (True != False) comment
  • I set test_fs=True and checked to see if it started; it did not
  • I submitted job 2947, also requiring test_fs=True, in an attempt to force a scheduling cycle; none of the jobs started, all saying True != False
  • Then I restarted the PBS server to see if that would make things work. It did not immediately, but I came back 30 minutes later and all the jobs had run, including the one depending on home_fs

Thoughts?

(base) [allcock@edtb-01 20220222-22:55:54]> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2943.edtb-01      STDIN            allcock                  0 Q workq
(base) [allcock@edtb-01 20220222-22:56:02]> qmgr -c "print server" | grep test_fs
# Create and define resource test_fs
create resource test_fs
set resource test_fs type = boolean
set server resources_available.test_fs = False
(base) [allcock@edtb-01 20220222-22:56:41]> qsub -l test_fs=true -l walltime=05:00 -l select=ncpus=8 -- /usr/bin/hostname
2946.edtb-01.mcp.alcf.anl.gov
(base) [allcock@edtb-01 20220222-22:56:51]> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2943.edtb-01      STDIN            allcock                  0 Q workq
2946.edtb-01      STDIN            allcock                  0 Q workq
(base) [allcock@edtb-01 20220222-22:56:54]> qstat -f 2946 | grep comment
    comment = Can Never Run: Insufficient amount of queue resource: test_fs (Tr
(base) [allcock@edtb-01 20220222-22:57:15]> qmgr -c "set server resources_available.test_fs=True"
(base) [allcock@edtb-01 20220222-22:57:35]> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2943.edtb-01      STDIN            allcock                  0 Q workq
2946.edtb-01      STDIN            allcock                  0 Q workq
(base) [allcock@edtb-01 20220222-22:57:39]> qsub -l test_fs=true -l walltime=05:00 -l select=ncpus=8 -- /usr/bin/hostname
2947.edtb-01.mcp.alcf.anl.gov
(base) [allcock@edtb-01 20220222-22:57:55]> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2943.edtb-01      STDIN            allcock                  0 Q workq
2946.edtb-01      STDIN            allcock                  0 Q workq
2947.edtb-01      STDIN            allcock                  0 Q workq
(base) [allcock@edtb-01 20220222-22:57:58]> qstat -f 2947 | grep comment
    comment = Can Never Run: Insufficient amount of queue resource: test_fs (Tr
(base) [allcock@edtb-01 20220222-22:58:11]> sudo systemctl restart pbs
(base) [allcock@edtb-01 20220222-22:58:41]> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2943.edtb-01      STDIN            allcock                  0 Q workq
2946.edtb-01      STDIN            allcock                  0 Q workq
2947.edtb-01      STDIN            allcock                  0 Q workq
(base) [allcock@edtb-01 20220222-22:58:44]> stat
stat: missing operand
Try 'stat --help' for more information.
(base) [allcock@edtb-01 20220222-22:59:25]> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2943.edtb-01      STDIN            allcock                  0 Q workq
2946.edtb-01      STDIN            allcock                  0 Q workq
2947.edtb-01      STDIN            allcock                  0 Q workq
(base) [allcock@edtb-01 20220222-22:59:30]> qstat
(base) [allcock@edtb-01 20220222-23:26:18]> qstat -f 2947 | grep comment
qstat: 2947.edtb-01.mcp.alcf.anl.gov Job has finished, use -x or -H to obtain historical job information
(base) [allcock@edtb-01 20220222-23:26:51]> qstat -xf 2947 | grep comment
    comment = Job run at Tue Feb 22 at 23:08 on (edtb-01[0]:ncpus=8) and finish
(base) [allcock@edtb-01 20220222-23:26:57]> qstat -xf 2946 | grep comment
    comment = Job run at Tue Feb 22 at 23:08 on (edtb-01[0]:ncpus=8) and finish
(base) [allcock@edtb-01 20220222-23:27:13]> qstat -xf 2943 | grep comment
    comment = Job run at Tue Feb 22 at 23:08 on (edtb-01[0]:ncpus=8) and finish

I just caught that it said insufficient queue resource. Why was it looking for a queue resource rather than a server resource?

You found a bug. Nice catch. For server and queue level resources, if a value is unset, the resource should be ignored. This is currently the case for all resource types other than booleans; it should be true for booleans as well.

The PBS scheduler is getting a C++ facelift. In the past couple of years we compiled our C code with a C++ compiler. Ever since then, we've been updating the scheduler to use C++ constructs. Recently, the resource comparison code was refactored. That's where this bug slipped in.

Please make the following change in the function find_check_resource() (check.cpp):
Existing code:

		if (resreq->type.is_boolean)
			res = fres;

Change it to:

		if (resreq->type.is_boolean && (flags & UNSET_RES_ZERO))
			res = fres;

This should fix your problem.

Either that or use a consumable resource like @dtalcott suggested. It doesn't run into the bug.

In any case, I'll file the bug and see about getting it fixed.

Actually, another option is to just set the resource at the queue level as well. That's probably unmanageable since you want one boolean per fileserver.
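
For reference, that would mean a line like the following for every queue and every file system resource, which is where it becomes unmanageable:

qmgr -c "set queue workq resources_available.home_fs = True"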

Bhroam

Got it. We will patch locally until it is in master. I really appreciate your help on this and your timely responses on the forum in general.

On another note, why did the jobs run 30 minutes after I bounced the server?

I don't understand this part. Restarting the server should have no effect on the bug in question; the behavior should be the same either way. The scheduler will think that an unset queue resource is False. If the job didn't run once, it should never run (as the comment said).

Bhroam

I've finished the fix and it has been checked into master. It is slightly different than the one I gave you (but not by much). Here's the commit:

The commit has the code fix and a PTL test you can run to verify after you patch.

Bhroam