How to install PBS on a compute node and configure the server and compute node?

Hi guys,
I am new to HPC and to PBS/Torque. I was able to install PBS Pro from source on my head node, but I am not sure how to install and configure the compute node. I didn’t see any documentation on GitHub either. Can anyone give me some help? Thanks

Install is pretty similar on the compute nodes - however, you do not need the “server” parts.
There are OK docs on the Altair “pro” site, see answer to previous question “documentation-is-missing/81”.

In short, use the Altair docs for v13 and/or the procedure in the INSTALL file (or install from pre-built binaries).
Actual method will depend on your system type etc.

I prefer to install using pre-compiled RPMs (CentOS 7.2 systems), which presently means that I build them myself from the tarball plus a slightly modified spec file.

Hope this helps.
/Bjarne

@Joey thanks for joining the pbspro forum.

You can find the documentation about pbspro here: https://pbspro.atlassian.net/wiki/display/PBSPro/User+Documentation

Kindly do not hesitate to post questions about any specific issues you are facing.

Thanks,
Subhasis

Thanks for your reply.

I rebuilt the CentOS 7.2 RPMs from the source in Centos7.zip, then:
installed pbspro-server-14.1.0-13.1.x86_64.rpm on my head node
installed pbspro-execution-14.1.0-13.1.x86_64.rpm on my compute node.
On the head node,
I created /var/spool/pbs/server_priv/nodes with the following:

computenode1 np=1

/etc/pbs.conf:
PBS_SERVER=headnode
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

On the compute node,

/var/spool/pbs/mom_priv/config is as follows:

$logevent 0x1ff
$clienthost headnode
$restrict_user_maxsysid 999

/etc/pbs.conf:
PBS_SERVER=headnode
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

After that I started PBS on the head node and the compute node without error:
# /etc/init.d/pbs start
But when I run pbsnodes -a, it tells me:
pbsnodes: Server has no node list
If I submit a script, the job just stays queued.

firewalld is turned off on both machines, and they can ping each other.

Can anyone give me some help? Thanks

Hi @Joey,

Unlike torque, pbspro uses a real relational database underneath to store information about nodes, queues, jobs etc. Thus creating a nodes file is not supported under pbspro.

To add a node to pbs cluster, use the qmgr command as follows:

qmgr -c "create node hostname"
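
For the setup described in the question, this could be sketched as a small shell function that registers one host and then lists what the server knows. add_node is a hypothetical wrapper name used only for illustration, and computenode1 is the host name from the question; only qmgr and pbsnodes themselves come from PBS.

```shell
# add_node is a hypothetical wrapper, not a PBS command.
# Run it as root (or as a PBS manager) on the head node.
add_node() {
    node="$1"
    # Register the execution host with the server.
    qmgr -c "create node $node" || return 1
    # List the nodes the server now knows about.
    pbsnodes -a
}

# Example, using the compute node name from the question:
#   add_node computenode1
```

If the node was created, pbsnodes -a should no longer answer with "Server has no node list".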

HTH
regards,
Subhasis

Thanks for your reply. I thought PBS and Torque were the same, except that one is open source and one is commercial.

Hi @Joey

They might feel similar since Torque was based on the OpenPBS codebase. OpenPBS was a version of PBS released as opensource many years back.

After that, Altair Engineering put a huge amount of effort into PBS Professional over decades, adding many features and improvements in scalability, robustness and ease of use, which resulted in it becoming the number one workload manager in the HPC world. Altair has now open-sourced PBS Professional.

So pbspro is very different from Torque in terms of capability and performance, and is in effect a completely different product.

Let us know if you need further information in switching to pbspro.

Thanks and Regards,
Subhasis

Hi Subhasis,

To add a node to pbs cluster, use the qmgr command as follows:

qmgr -c "create node hostname"

If a site has a few hundred compute nodes, the above method is very tedious.
Is there an easy or quick way to register compute nodes with the PBS server, like the nodes file in Torque?

Thanks,

Sue

This is one way to accomplish it…

while read line; do [ -n "$line" ] && qmgr -c "create node $line"; done <nodefile

where nodefile contains the list of nodes, one per line.
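
A slightly more defensive variant of the same loop, as a sketch only: register_nodes is a hypothetical function name, and skipping "#" comments and keeping just the first field of each line (so a Torque-style "computenode1 np=1" entry is reduced to its host name) are my own assumptions about what a node list might contain.

```shell
# register_nodes FILE: run "qmgr -c 'create node HOST'" for every host in FILE.
# Hypothetical helper; run it as root (or as a PBS manager) on the head node.
register_nodes() {
    nodefile="$1"
    failed=0
    # Drop comments and blank lines, keep only the first field (the host name).
    hosts=$(awk '{ sub(/#.*/, "") } NF { print $1 }' "$nodefile") || return 1
    for host in $hosts; do
        if ! qmgr -c "create node $host"; then
            echo "could not create node: $host" >&2
            failed=1
        fi
    done
    # Non-zero if at least one node could not be created.
    return "$failed"
}

# Example:
#   register_nodes nodefile
```

It keeps going after a failure, so one bad host name does not stop the rest of the list from being registered.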

I have two NICs (network adapters) and my OS is CentOS 7.
I would like to run PBS on two nodes, at 192.168.1.1 and 192.168.1.2 (two separate machines).
I have installed Torque and PBS along with all of their dependencies,
but when I run “pbsnodes -a”, only one of my machines is in state free; my head node (master) is down.
If I disable one of the network adapters on the head node (master), everything is OK.
My question is: is there any way to solve this problem without disabling an adapter?
Thanks

Please take a look at PBS_LEAF_NAME and PBS_SERVER_HOST_NAME in the admin guide.
https://www.altair.com/pbs-works-documentation/
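
Both parameters are set in /etc/pbs.conf, so before changing anything it may help to see what each host currently has there. A minimal sketch, assuming pbs.conf consists of plain KEY=VALUE lines like the examples earlier in this thread; pbs_conf_get is a hypothetical helper, not something shipped with PBS.

```shell
# pbs_conf_get KEY [FILE]: print the value of KEY from a pbs.conf-style file.
# Hypothetical helper; FILE defaults to /etc/pbs.conf.
pbs_conf_get() {
    key="$1"
    conf="${2:-/etc/pbs.conf}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    # Print the last value assigned to KEY; exit non-zero if KEY is not set.
    awk -F= -v k="$key" '
        $1 == k { v = substr($0, length(k) + 2); found = 1 }
        END { if (!found) exit 1; print v }
    ' "$conf"
}

# Examples:
#   pbs_conf_get PBS_SERVER
#   pbs_conf_get PBS_LEAF_NAME || echo "PBS_LEAF_NAME is not set"
```

Running it on the head node and on the compute node shows whether they agree on the server name, and whether either of the two parameters is set at all.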

Unfortunately, I don’t have a file named pbs.conf in /etc on the head node. Has something gone wrong?

That generally means that something went wrong with the installation. Are you installing from RPMs or from source code?

Yep! I installed it from source code…

Please refer to the installation instructions that came with your source code in the INSTALL file in the top level directory. These should be similar to those in the master branch located here: https://github.com/PBSPro/pbspro/blob/master/INSTALL

Refer to step 10, running pbs_postinstall.