Vnodes creation

I deployed OpenPBS from the git master branch and enabled the pbs_cgroups hook in order to get vnodes.
At some point during the day, I could see a natural node with 2 vnodes (it is a 2-socket node), but now I can’t get the vnodes back (I tried kill -HUP on pbs_mom and restarting it).
On the mom side, everything looks OK, so I guess something is wrong on the server.

On the natural node, resources_available.mem has been set (resources_available.mem = 196694128kb), but still no vnodes…

I looked at my server logs and found these entries:
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;;set_all_state;txt=(null) mi_modtime=1637337960
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;vnode.state=0x0 vnode_o.state=0x0 vnode.last_state_change_time=1637338007 vnode_o.last_state_change_time=1637337963 state_bits=0xfffffffffffffe7f state_bit_op_type_str=Nd_State_And state_bit_op_type_enum=2
11/19/2021 11:06:47;0002;Server@nwpbs10;Node;;update2 state:0 ncpus:24
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;;set_all_state;txt=(null) mi_modtime=1637337960
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;vnode.state=0x20 vnode_o.state=0x0 vnode.last_state_change_time=1637338007 vnode_o.last_state_change_time=1637337963 state_bits=0x20 state_bit_op_type_str=Nd_State_Or state_bit_op_type_enum=1
11/19/2021 11:06:47;0002;Server@nwpbs10;Node;;Mom reporting 1 vnodes as of Fri Nov 19 11:06:00 2021
11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;vnode.state=0x0 vnode_o.state=0x20 vnode.last_state_change_time=1637338007 vnode_o.last_state_change_time=1637338007 state_bits=0xfffffffffffffedd state_bit_op_type_str=Nd_State_And state_bit_op_type_enum=2
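One way to read the set_vnode_state entries above is as a bitwise update: the server takes the previous state (vnode_o.state), applies state_bits with the logged operation, and logs the result as vnode.state. The sketch below replays that arithmetic in Python under this assumption. The helper names are hypothetical, only the two operations that appear in these entries (Nd_State_Or and Nd_State_And) are handled, and no attempt is made to name the individual state bits.

```python
import re

# Matches key=value pairs such as "vnode_o.state=0x20" in the log message.
FIELD_RE = re.compile(r"([\w.]+)=(\S+)")
MASK64 = (1 << 64) - 1  # the logged masks are printed as 64-bit values


def parse_fields(line):
    """Return the key=value pairs from the last ';'-separated field of a log line."""
    message = line.rsplit(";", 1)[-1]
    return dict(FIELD_RE.findall(message))


def replay(line):
    """Recompute the new state from the old state, the mask and the operation.

    Assumption: Nd_State_Or sets the masked bits and Nd_State_And keeps only
    the masked bits. Other operation types are not handled in this sketch.
    """
    fields = parse_fields(line)
    old = int(fields["vnode_o.state"], 16)
    bits = int(fields["state_bits"], 16)
    op = fields["state_bit_op_type_str"]
    if op == "Nd_State_Or":
        return (old | bits) & MASK64
    if op == "Nd_State_And":
        return old & bits & MASK64
    raise ValueError(f"unhandled operation: {op}")


def cleared_bits(mask):
    """Return the bits that an AND with this mask would clear."""
    return ~mask & MASK64


# Two entries from the server log above (timestamp fields omitted for brevity).
or_line = (
    "11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;"
    "vnode.state=0x20 vnode_o.state=0x0 state_bits=0x20 "
    "state_bit_op_type_str=Nd_State_Or state_bit_op_type_enum=1"
)
and_line = (
    "11/19/2021 11:06:47;0100;Server@nwpbs10;Node;nwpn1120;set_vnode_state;"
    "vnode.state=0x0 vnode_o.state=0x20 state_bits=0xfffffffffffffedd "
    "state_bit_op_type_str=Nd_State_And state_bit_op_type_enum=2"
)

if __name__ == "__main__":
    print(hex(replay(or_line)))
    print(hex(replay(and_line)))
    print(hex(cleared_bits(0xFFFFFFFFFFFFFEDD)))
```

Under this reading, the Or entry sets bit 0x20 on the node and the last And entry clears it again in the same second, so the node ends at state 0x0. Both replayed results match the vnode.state values the server logged.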

To me, it looks like the mom is sending information about the vnode, but then the server isn’t doing anything with it…

Could you explain to me how to interpret these entries? (Maybe this topic belongs in the developers forum…)

Thank you!

I would check the mom_logs on the node to see if there are messages from pbs_cgroups. Perhaps it cannot run for some reason, so the vnodes don’t get created?
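A quick way to do that check is to filter the mom log for entries whose object type is Hook. The sketch below is only an illustration: it assumes each log line has six ';'-separated fields (timestamp, event code, daemon, object type, object name, message), which is how the entries quoted in this thread are laid out, and the names LogEntry, parse_entry and hook_entries are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    event_code: str
    daemon: str
    object_type: str
    object_name: str
    message: str


def parse_entry(line: str) -> Optional[LogEntry]:
    """Split one log line into its six fields, or return None if it does not fit."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogEntry(*parts)


def hook_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only the entries whose object type is 'Hook'."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e is not None and e.object_type == "Hook"]


# Sample input: one hook entry and one node entry, both quoted in this thread.
SAMPLE = [
    "11/19/2021 19:06:30;0800;pbs_python;Hook;hook_perf_stat;"
    " action=run_code walltime=0.174910 cputime=0.170000",
    "11/19/2021 11:06:47;0002;Server@nwpbs10;Node;;update2 state:0 ncpus:24",
]

if __name__ == "__main__":
    for entry in hook_entries(SAMPLE):
        print(entry.timestamp, entry.object_name, entry.message.strip())
```

To run it against a real log, pass an open file object for the relevant mom_logs file instead of SAMPLE.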

Judging by the mom logs, pbs_cgroups does seem to run:

11/19/2021 19:06:30;0800;pbs_python;Hook;pbs_python;_discover_numa_nodes: {0: {'devices': [], 'cpus': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'MemTotal': '97630800k', 'HugePages_Total': 0, 'hpmem': 0, 'mem': 98363326464, 'vmem': 98363326464}, 1: {'devices': [], 'cpus': [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], 'MemTotal': '99063320k', 'HugePages_Total': 0, 'hpmem': 0, 'mem': 99830226944, 'vmem': 99830226944}}
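The _discover_numa_nodes payload is printed as a Python dictionary, so it can be parsed to see what the hook found per NUMA node. The sketch below is a rough illustration with hypothetical helper names. It assumes the payload is a valid Python literal once any typographic quotes introduced by the forum are replaced with straight ones.

```python
import ast


def parse_numa_nodes(line):
    """Extract the dictionary printed after '_discover_numa_nodes:' in a mom log line."""
    payload = line.split("_discover_numa_nodes:", 1)[1].strip()
    # Forum software may have turned straight quotes into curly ones.
    for curly in ("\u2018", "\u2019"):
        payload = payload.replace(curly, "'")
    return ast.literal_eval(payload)


def summarize(nodes):
    """Return the CPU count and MemTotal (in kb) for each NUMA node."""
    return {
        index: {
            "ncpus": len(info["cpus"]),
            "memtotal_kb": int(info["MemTotal"].rstrip("k")),
        }
        for index, info in nodes.items()
    }


# The entry quoted above, with straight quotes.
LINE = (
    "11/19/2021 19:06:30;0800;pbs_python;Hook;pbs_python;_discover_numa_nodes: "
    "{0: {'devices': [], 'cpus': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "
    "'MemTotal': '97630800k', 'HugePages_Total': 0, 'hpmem': 0, "
    "'mem': 98363326464, 'vmem': 98363326464}, "
    "1: {'devices': [], 'cpus': [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "
    "'MemTotal': '99063320k', 'HugePages_Total': 0, 'hpmem': 0, "
    "'mem': 99830226944, 'vmem': 99830226944}}"
)

if __name__ == "__main__":
    summary = summarize(parse_numa_nodes(LINE))
    for index, info in summary.items():
        print(index, info)
    print("total ncpus:", sum(info["ncpus"] for info in summary.values()))
```

For this entry the hook found two NUMA nodes with 12 CPUs each, which adds up to the 24 CPUs reported for the natural node, so the discovery step itself appears to see both sockets.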

and at the end of hook execution:
11/19/2021 19:06:30;0800;pbs_python;Hook;hook_perf_stat; action=run_code walltime=0.174910 cputime=0.170000
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list; al_resc=null al_value=nwpn1120 al_flags=0
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;al_name=nwpn1120.pbs_version al_resc=null al_value=20.0.2 al_flags=0
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;al_name=nwpn1120.pcpus al_resc=null al_value=24 al_flags=0
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;al_name=nwpn1120.resources_available al_resc=mem,size al_value=196694128kb al_flags=0
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;al_name=nwpn1120.resources_available al_resc=ncpus,long al_value=24 al_flags=0
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;al_name=nwpn1120.resources_available al_resc=arch,string al_value=linux al_flags=0
11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
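To see which vnodes the hook is actually reporting, the al_name values in the print_svrattrl_list entries can be grouped by vnode name. This is a minimal sketch with a hypothetical helper, and it assumes that al_name has the form "<vnode>.<attribute>" with no dot in the attribute part, as in the entries above.

```python
import re

AL_NAME_RE = re.compile(r"al_name=(\S+)")


def reported_vnodes(lines):
    """Collect the vnode names that appear in al_name=<vnode>.<attribute> fields."""
    names = set()
    for line in lines:
        match = AL_NAME_RE.search(line)
        if not match:
            continue
        # Split on the last dot: everything before it is taken as the vnode name.
        vnode, _, _attribute = match.group(1).rpartition(".")
        if vnode:
            names.add(vnode)
    return names


# Entries copied from the mom log above.
SAMPLE = [
    "11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;"
    "al_name=nwpn1120.pbs_version al_resc=null al_value=20.0.2 al_flags=0",
    "11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;"
    "al_name=nwpn1120.pcpus al_resc=null al_value=24 al_flags=0",
    "11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;"
    "al_name=nwpn1120.resources_available al_resc=mem,size "
    "al_value=196694128kb al_flags=0",
    "11/19/2021 19:06:30;0400;pbs_python;Hook;print_svrattrl_list;"
    "al_name=nwpn1120.resources_available al_resc=ncpus,long "
    "al_value=24 al_flags=0",
]

if __name__ == "__main__":
    print(sorted(reported_vnodes(SAMPLE)))
```

With these entries only the natural node nwpn1120 shows up and no per-socket vnode names appear, which is consistent with the server log line "Mom reporting 1 vnodes".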

This is from an exechost_periodic event. The vnodes should be created by an exechost_startup event. I tried restarting the mom and sending it a HUP, but still no vnodes… Perhaps something’s wrong in the exechost_startup handling…

So I tried the latest stable OpenPBS release (20.0.1), and there I get the vnodes automatically, with memory set correctly!

I guess there’s something wrong with the master branch (or I did something wrong with it).