Hi, please excuse my simple questions - I’m struggling to understand PBSPro, its terminology and how its documentation is structured.
I would like to reboot a handful of nodes. I would like to mark them offline so that no new jobs are submitted to them. I don’t want to do anything to the running jobs - I’m happy to wait until those jobs finish naturally before rebooting the machines.
I’m struggling to find out how to do a couple of what I consider relatively simple tasks.
First, I’d really like to get a one-line status of every node along with its node name. pbsnodes -a | grep ' state ' feels clumsy and lacks the node name.
As per a comment on my previous topic, I guess I’ll need to install jq and start writing a bunch of one-liners to put into /usr/local/bin.
Second, I’d like to take a select few offline as mentioned. On page AG-546 (section 14.4 “Administration, Managing machines” of the big guide pdf) we find references to how to take a vhost offline but nothing for hosts, so I’ll presume that they’re the same. There are references to machine states, but those aren’t defined anywhere in the section “managing machines”, nor is the definition linked, so it’s hard to parse the difference between offline and down.
The docs do explain how to take a machine offline and suspend the jobs using it, but not how to stop the scheduler from sending more jobs to the node and letting the running jobs finish. Then those docs finish.
There are references to the hooks section of the docs, so I followed that link to page 898, Section 5.2.12 “Offlining Bad Vnodes”, which is not exactly what I want, and sure enough none of those options seem appropriate. They do mention setting comments per node, which I thought was interesting. When I look into how to read the comments set on any particular node, or all the comments on all the nodes, I can’t find that documentation either.
Any tips on the following would be appreciated:
how to mark a node as “draining”
how to list the states of all nodes with the node name
where the machine states are defined
how to list all the comments against any/all nodes
Please try this command to list all the nodes and their status:
pbsnodes -aSj
To get details about individual node or job:
pbsnodes -v nodename
qstat -fx jobid
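If you want one line per node for a single attribute (state, comment, and so on), a small shell function over the plain pbsnodes -a output may be enough. This is only a sketch: node_attr is a made-up helper name, and it assumes the usual pbsnodes -a layout, where each vnode name starts an unindented line and its attributes follow as indented "name = value" lines.

```shell
# node_attr ATTR: read `pbsnodes -a`-style text on stdin and print
# "<vnode name><TAB><value of ATTR>" for every vnode that has ATTR set.
# (node_attr is a hypothetical helper, not part of PBS Pro.)
node_attr() {
  awk -v key="$1" '
    # An unindented, non-empty line is taken to be a vnode name.
    /^[^[:space:]]/ { name = $1; next }
    # An indented "key = value" line: keep everything after "= ".
    $1 == key && $2 == "=" {
      value = substr($0, index($0, "=") + 2)
      print name "\t" value
    }
  '
}

# Intended use (commented out because it needs a live PBS server):
#   pbsnodes -a | node_attr state
#   pbsnodes -a | node_attr comment
```

With that in place, pbsnodes -a | node_attr state should give the name and state of each node, and pbsnodes -a | node_attr comment the comments that are currently set. Nodes without the attribute are simply skipped.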
You can offline or clear all the nodes, or a set of nodes, with these commands.
To offline the nodes:
pbsnodes -o node1 node2 node3
Alternatively, with qmgr:
for i in node1 node2 node3; do qmgr -c "set node $i state=offline"; done
Note: if you offline a node that is running a job, PBS Pro will allow the running job to run to completion and will make sure no new jobs are scheduled onto the offlined node.
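To wait for an offlined node to drain before rebooting it, something along these lines could work. It is a sketch, not a tested procedure: node_has_jobs is a hypothetical helper, cnode1 is a placeholder node name, and it assumes that pbsnodes -v shows a "jobs = ..." line only while jobs are still running on the node.

```shell
# Succeed (exit status 0) if the pbsnodes text on stdin contains a
# "jobs = ..." attribute line, fail (exit status 1) otherwise.
# (node_has_jobs is a hypothetical helper, not part of PBS Pro.)
node_has_jobs() {
  awk '$1 == "jobs" && $2 == "=" { found = 1 } END { exit !found }'
}

# Intended use (commented out because it needs a live PBS server):
#   pbsnodes -o cnode1                       # stop new jobs landing on the node
#   while pbsnodes -v cnode1 | node_has_jobs; do
#     sleep 60                               # running jobs are left alone
#   done
#   echo "cnode1 has no running jobs; safe to reboot"
```

The polling interval of 60 seconds is arbitrary. Once the node is back up, it still has to be cleared before it will accept jobs again.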
To clear their status:
pbsnodes -r node1 node2 node3
Alternatively, with qmgr:
for i in node1 node2 node3; do qmgr -c "set node $i state=free"; done
Going by the pbsnodes man page, are these not exactly the same?
pbsnodes -o    // for a non-Cray host with multiple vnodes, this offlines all vnodes
qmgr -c "set node state=offline"    // offlines just a single vnode
Questions:
1. Are most hosts non-Cray?
2. What is the difference between a vnode and a natural node?
Thank you!
In a multi-vnode host (created using a version 2 configuration file), for example: cnode1, cnode1[0], cnode1[1]
cnode1: the natural node, or parent vnode
cnode1[0] and cnode1[1]: child vnodes