I’m running some diagnostic jobs to gather runtime information from different hardware that we have deployed with PBS Pro. In short, we have 4 different architectures that we identify with a resources_available.plist tag. What I’m trying to do is create a base PBS script and then pass only the plist parameter to qsub on the command line. For example:
qsub -l plist=broadwell job.pbs
And then in my job.pbs file, I have:
qsub -l select=1:ncpus=1:mem=2gb
However, I’m receiving this error when submitting:
qsub: "-lresource=" cannot be used with "select" or "place", resource is: plist
It seems I cannot split the select statement up – I either have to specify all of the select parameters on the command line or all of them in the script. Is there a way around this? For example, can environment variables be interpolated within PBS comments inside the script, such as:
#PBS -l select=1:ncpus=1:mem=2gb:plist=$VARIABLE
Admittedly, this isn’t a major issue, just an inconvenience. Any tips about how to accomplish what I’m after more elegantly would be greatly appreciated.
I think it should work just fine the way you are specifying the jobs. PBS does not allow mixing the "-l <resource>=<value>" syntax with the "-lselect=<num_of_chunks>:" syntax.
Can you please check whether any default_qsub_arguments are set on the server object that might be causing this problem? To check, run: qmgr -c "print server"
There are no default_qsub_arguments specified on the server. But in any case, your response doesn’t really answer my question. I’m trying to find a way to split up the resource specification so that I can specify some resource parameters (i.e. “-l” parameters) on the command line and some in the job.
Are you saying if I use the word “select” at all in my specification, I can’t specify any other separate resources? They all have to be on the select line?
If this is the case, is there another syntax I can use that will allow me to split up parameters the way I want? It seems that the “select” syntax is overly restrictive and I don’t understand what the practical reason for this is.
Sorry for not being clear in my response. I was just trying to say that you can run qsub inside the job script of another job, because the two jobs are treated independently. I now assume you are getting the error because you specify the select statement in a #PBS directive and plist on the command line.
PBS does not interpolate environment variables into PBS directives inside the job script. However, there are other ways to deal with this situation.
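Since PBS will not expand variables in directives, one workaround is to do the substitution yourself before the script reaches qsub. The following is only a sketch under some assumptions: the template file name job.pbs.in, the @PLIST@ placeholder, and the render_job function are all made up for illustration, and it relies on qsub reading the job script from standard input when no script file is given.

```shell
#!/bin/sh
# Hypothetical template job.pbs.in would contain a line such as:
#   #PBS -l select=1:ncpus=1:mem=2gb:plist=@PLIST@

# Print the template ($2) with every @PLIST@ replaced by the value in $1.
render_job() {
    # Only accept simple words so the value is safe inside the sed expression.
    case "$1" in
        ''|*[!A-Za-z0-9_-]*)
            echo "invalid plist value: $1" >&2
            return 1
            ;;
    esac
    sed "s/@PLIST@/$1/g" "$2"
}

# Usage (assumes qsub reads the script from stdin):
#   render_job broadwell job.pbs.in | qsub
```

This keeps the whole select statement in one place (the template) while still letting you choose plist at submission time.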
The "-lselect" syntax is used to specify all host-level resources that a job needs. All other job-wide resources (like walltime and job placement) are specified outside of the "-lselect" syntax.
For example, "qsub -lselect=1:ncpus=2:mem=2gb -lwalltime=300 <job script>" is a legitimate way of submitting a job.
In your case, since plist is a host-level resource, you could submit a job like this:
qsub -lselect=1:ncpus=2:mem=2gb:plist=Broadwell <job script>
To make the value of plist variable, you can write a submission wrapper around qsub and pass the value of plist to it as an argument.
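A minimal sketch of such a wrapper might look like the following. The function names and the BASE_SELECT variable are hypothetical; only qsub and the -l select syntax come from this thread. It assumes the job script itself carries no select directive of its own, so the select string built here is the only one.

```shell
#!/bin/sh
# Fixed part of every chunk; only plist changes between submissions.
BASE_SELECT="1:ncpus=1:mem=2gb"

# Build the full select string for a given plist value.
build_select() {
    printf '%s:plist=%s\n' "$BASE_SELECT" "$1"
}

# Submit a job script on the requested architecture.
# Usage: submit_with_plist broadwell job.pbs
submit_with_plist() {
    if [ "$#" -ne 2 ]; then
        echo "usage: submit_with_plist <plist> <job script>" >&2
        return 1
    fi
    qsub -l select="$(build_select "$1")" "$2"
}
```

Calling submit_with_plist broadwell job.pbs would then run qsub -l select=1:ncpus=1:mem=2gb:plist=broadwell job.pbs, so the fixed resources live in the wrapper and only the architecture is given at submission time.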
You get that message because of how you defined your resource. The resource has flag=h. When a resource has flag=h, it either needs to appear in the select statement, or the job can’t be submitted with a select statement. When a job is submitted without a select statement, one will be created for it with one chunk of all the flag=h resources (e.g. ncpus or mem).
If you want to match plist on your nodes, then you need to include plist in your select statement. You can do something like -l select=4:ncpus=2:plist=broadwell. This means 4 chunks of 2 cpus on broadwell nodes. As a note, multiple chunks can be placed on a single node. PBS will try and pack your chunks on nodes if possible.
Thank you for the insights. If I unset flag=h for the plist resource, then can I specify it separately from the select statement? Or is flag=h a feature of node-level resources in general?
I guess what I’m trying to accomplish is to be able to have a standard job template that uses the same number of chunks, ncpus, and mem each time, but a different CPU architecture (plist). To me, the ideal way to do this would be to put everything except for plist in the job script, since those things do not change, and then submit the jobs with plist specified as a command line argument.
Now, I realize that this is somewhat trivial, because I can always specify every parameter on the command line if I want. I’m just trying to understand if there is any way to decouple certain parameters from the select statement, and if so, find the settings that determine which parameters can be separated and which cannot. It sounds like what I’m looking for then is flag=h, is that correct? In other words, if I create a resource that does not have flag=h, it won’t need to be tied to a select statement?
You will need flag=h if it is a node based resource. If you remove flag=h, it will be considered at the queue and server level only. This is not what you want.
I do have a solution for you, but it is not pretty. You can create a queue and set several default_chunk resources.
So let’s say you want 2:ncpus=4:mem=2gb as the fixed part, and you want to add plist yourself at submission time.
You do the following to your queue:
set queue <queue_name> default_chunk.nchunk=2
set queue <queue_name> default_chunk.ncpus=4
set queue <queue_name> default_chunk.mem=2gb
Now you submit a job as qsub -l select=plist=broadwell. This will turn into a select of 2:ncpus=4:mem=2gb:plist=broadwell.
The ugly part is this requires you to create one queue per template you want.
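If you do go the one-queue-per-template route, a small helper can at least generate the qmgr directives for each queue. This is only a sketch: the function name and the queue name tmpl_2x4 are made up, the create/enabled/started lines are my assumption about what a new execution queue needs, and the default_chunk settings are the ones listed above.

```shell
#!/bin/sh
# Print qmgr directives for one "template" queue.
# Arguments: queue name, number of chunks, ncpus per chunk, mem per chunk.
queue_template_directives() {
    printf 'create queue %s queue_type=execution\n' "$1"
    printf 'set queue %s default_chunk.nchunk=%s\n' "$1" "$2"
    printf 'set queue %s default_chunk.ncpus=%s\n' "$1" "$3"
    printf 'set queue %s default_chunk.mem=%s\n' "$1" "$4"
    printf 'set queue %s enabled=true\n' "$1"
    printf 'set queue %s started=true\n' "$1"
}

# Usage (needs PBS manager privileges; qmgr reads directives from stdin):
#   queue_template_directives tmpl_2x4 2 4 2gb | qmgr
# Then submit against that queue:
#   qsub -q tmpl_2x4 -l select=plist=broadwell job.pbs
```

Reviewing the printed directives before piping them to qmgr is a cheap sanity check, since they change the server configuration.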
The only other way I can think of doing this is to create N different job scripts, one per value of plist with the plist embedded in the job script. This is pretty much the opposite of what you are asking for though.