How to submit jobs from a remote host to a cluster

I am running a cluster called Archer. It's running fine. I have a remote machine called Topgun. I installed the PBS software on Topgun and pointed it to Archer as the server. I can run "qstat" and see jobs running on Archer. I would like to submit jobs from Topgun to be run on Archer, but a submitted job just sits there.

Please share the output of the below commands:

**Archer**:
1. qstat -Bf
2. pbsnodes -aSjv
3. cat /etc/pbs.conf
4. qstat -answ1

**Topgun**:
1. qstat -Bf
2. pbsnodes -av
3. cat /etc/pbs.conf
4. qstat -answ1

It is hard to provide that info since I am in a classified environment and the command output includes fully qualified domain names.

If I try to submit an interactive job on topgun I get this:

-bash-4.2$ qsub -I

qsub: waiting for job 194734.archer.xxx.xxx to start

Qstat on both machines sees the jobs running on Archer. pbsnodes -a shows only the nodes on Archer, which is what I want: I don't want jobs to run on Topgun, I only want to run jobs on the Archer cluster.

08/02/2022 17:51:53;0080;Server@archer;Req;req_reject;Reject reply code=15139, aux=0, type=8, from ramos@topgun.xxx.xxx.xx

Archer sees the request, but just refuses it.

PBS_SERVER=archer.xx.xx.xx
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

pbs.conf is identical on both machines.

I am finally able to submit a job after setting up the SSH keys, but interactive jobs still fail.
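For reference, a minimal sketch of that kind of SSH key setup, assuming PBS is staging files with scp as in the pbs.conf above (the key filename and the remote user@host below are illustrative placeholders, not anything from this environment):

```shell
# Sketch: one-time SSH key setup so PBS can copy job files with scp
# (pbs.conf has PBS_SCP=/bin/scp). Key path and remote host are placeholders.
keyfile="$HOME/.ssh/id_rsa_pbs_demo"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# Generate a key pair only if this one does not exist yet:
[ -f "$keyfile" ] || ssh-keygen -q -t rsa -N "" -f "$keyfile"
# Install the public key on the cluster, then verify scp no longer prompts:
#   ssh-copy-id -i "$keyfile.pub" ramos@archer.xx.xx.xx
#   scp "$keyfile.pub" ramos@archer.xx.xx.xx:/tmp/   # must not ask for a password
```

If scp still prompts for a password after this, job output delivery will hang the same way the submission did.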

Thank you for all the above details and tests.

  1. The interactive job might be failing due to a firewall or blocked ports
    Please refer:
    Interactive Job errors out with 'apparently deleted' - #13 by scc
    Interactive Job errors out with 'apparently deleted' - #16 by adarsh
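One way to narrow down a firewall issue: an interactive qsub listens on an ephemeral TCP port on the submission host, and the MoM on the execution node connects back to it, so that reverse connection must be allowed in addition to the standard daemon ports (by default 15001 for pbs_server, 15002 for pbs_mom, 17001 for pbs_comm). A bash sketch of a simple reachability probe (hostnames are placeholders):

```shell
# Returns 0 if a TCP connection to host $1, port $2 succeeds.
# Uses bash's /dev/tcp redirection, so it needs no extra tools.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example checks (placeholder hostnames):
#   port_open archer.xx.xx.xx 15001 && echo "pbs_server reachable"
#   # and from an Archer compute node back toward the submit host:
#   port_open topgun.xxx.xxx.xx 1024 && echo "callback path open"
```

If the forward checks pass but the callback from the compute node to the submit host is blocked, batch jobs will work while `qsub -I` hangs, which matches the symptom described here.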

Please refer: Where to submit a job instead of pbs server - #10 by scc

Thank you

Then the /etc/pbs.conf file on Topgun should be as follows. Since Topgun is only used to submit jobs to Archer, it acts like a client node or login node: it does not run any of the PBS services/daemons and has only the command-line tools for job management.

PBS_SERVER=archer.xx.xx.xx
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
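As a sanity check, here is a small sketch (the function and its checks are my own, not a PBS utility) that verifies a pbs.conf looks like a submit-only client configuration:

```shell
# Sketch: warn if any PBS daemon is enabled in a pbs.conf that should be
# client-only, and confirm PBS_SERVER is set. Pass the path to check.
check_client_conf() {
  conf="$1"
  ok=0
  for key in PBS_START_SERVER PBS_START_SCHED PBS_START_COMM PBS_START_MOM; do
    val=$(grep "^${key}=" "$conf" | cut -d= -f2)
    [ "$val" = "0" ] || { echo "WARN: $key=$val (expected 0 on a submit-only host)"; ok=1; }
  done
  grep -q '^PBS_SERVER=' "$conf" || { echo "ERROR: PBS_SERVER missing"; ok=1; }
  return $ok
}
# Usage: check_client_conf /etc/pbs.conf   # silent when the file is client-only
```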

Please see the PBS Professional Installation and Upgrade Guide, especially chapter 1, “PBS Architecture”, chapter 2, “Pre-installation Steps”, and chapter 3, “Installation”.

I have it working, to a degree. I have another cluster called Maury. It is a conventional cluster with a head node and two login nodes, Maury1 and Maury. We have another machine that doesn't show up in "pbsnodes -a", but can submit jobs. That machine can also "qsub -I" and land on one of the compute nodes. The user is satisfied now that he can submit jobs, but he can't run an interactive job.

Please note that
pbsnodes -a # shows only the compute node(s) added to the pbs server using the command
qmgr -c "create node node-hostname"

If you have login nodes (say, none of the PBS services are running on them, only the commands are deployed), these are not part of the compute resources and will not show in the pbsnodes -a output.
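To illustrate (the node name is a placeholder, and these commands run on the PBS server host as an admin): only hosts explicitly created as nodes on the server appear in the pbsnodes output.

```shell
# "compute01" is a placeholder hostname.
qmgr -c "create node compute01"   # registers a MoM host as a compute node
pbsnodes -a                       # now lists compute01; a submit-only login
                                  # host is never created, so it never appears
```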

I understand all of that; I set it all up and installed PBS on it myself. I know I installed PBS on the machine that can submit jobs remotely, because the PBS software is in my home directory. It was originally based on PBS 13-something. I upgraded that cluster a couple of years ago. It is a paid-for version with support. I just don't understand why I can run interactive jobs on that cluster from a machine that doesn't show up in "pbsnodes -a".