Hi,
I deployed OpenPBS from the git master branch and enabled the pbs_cgroups hook in order to get vnodes.
At some point during the day, I could see a natural node with 2 vnodes (it is a 2-socket node), but now I can’t get the vnodes back (I tried sending kill -HUP to pbs_mom, and also restarting it).
On the mom side, everything looks OK, so I guess something is wrong on the server.
On the natural node, resources_available.mem has been set (resources_available.mem = 196694128kb), but there are still no vnodes.
I looked at my server logs and found these entries:
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120.cm.nrw;set_all_state;txt=(null) mi_modtime=1637337960
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;vnode.state=0x0 vnode_o.state=0x0 vnode.last_state_change_time=1637338007 vnode_o.last_state_change_time=1637337963 state_bits=0xfffffffffffffe7f state_bit_op_type_str=Nd_State_And state_bit_op_type_enum=2
11/19/2021 11:06:47;0002;Server@nwpbs10;Node;nwpn1120.cm.nrw;update2 state:0 ncpus:24
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120.cm.nrw;set_all_state;txt=(null) mi_modtime=1637337960
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;vnode.state=0x20 vnode_o.state=0x0 vnode.last_state_change_time=1637338007 vnode_o.last_state_change_time=1637337963 state_bits=0x20 state_bit_op_type_str=Nd_State_Or state_bit_op_type_enum=1
11/19/2021 11:06:47;0002;Server@nwpbs10;Node;nwpn1120.cm.nrw;Mom reporting 1 vnodes as of Fri Nov 19 11:06:00 2021
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;vnode.state=0x0 vnode_o.state=0x20 vnode.last_state_change_time=1637338007 vnode_o.last_state_change_time=1637338007 state_bits=0xfffffffffffffedd state_bit_op_type_str=Nd_State_And state_bit_op_type_enum=2
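A minimal sketch of how the set_vnode_state entries above could be decoded. It assumes that state_bits is a 64-bit mask combined with the previous state (vnode_o.state) using the operation named in state_bit_op_type_str; this reading is consistent with the three entries above, but it is an assumption, not something taken from the OpenPBS documentation. The field names are copied from the log lines; the function name decode_set_vnode_state is hypothetical. The meaning of the individual bits is defined in the OpenPBS source and is not assumed here.

```python
import re

MASK64 = 0xFFFFFFFFFFFFFFFF  # the masks in the log are printed as 64-bit values


def decode_set_vnode_state(line):
    """Recompute the state change described by one set_vnode_state log line.

    Returns (old_state, computed_state, logged_state, touched_bits), where
    touched_bits are the bits set by an Or or cleared by an And.
    """
    # Collect every key=value pair on the line (e.g. vnode.state=0x20).
    fields = dict(re.findall(r"([\w.]+)=(\S+)", line))
    old = int(fields["vnode_o.state"], 16)
    logged = int(fields["vnode.state"], 16)
    bits = int(fields["state_bits"], 16)
    op = fields["state_bit_op_type_str"]

    if op == "Nd_State_Or":
        computed = old | bits
        touched = bits                # bits that the Or switches on
    elif op == "Nd_State_And":
        computed = old & bits
        touched = ~bits & MASK64      # bits that the And switches off
    else:
        raise ValueError("unhandled operation: " + op)
    return old, computed, logged, touched


sample = (
    "11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;"
    "vnode.state=0x0 vnode_o.state=0x20 state_bits=0xfffffffffffffedd "
    "state_bit_op_type_str=Nd_State_And state_bit_op_type_enum=2"
)
old, computed, logged, touched = decode_set_vnode_state(sample)
print(hex(old), "->", hex(computed), "logged:", hex(logged), "cleared:", hex(touched))
```

Read this way, the second set_vnode_state entry sets bit 0x20 and the third one clears it again (its mask clears bits 0x122), so the node ends up back at state 0x0 within the same second.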
To me, it looks like the mom is sending information about the vnodes, but the server isn’t doing anything with it.
Could you explain to me how to interpret these entries? (Maybe this topic belongs in the developers forum.)
Thank you!