Torque PBS transferring files

AlexG · March 20, 2018, 12:26am

I’ve configured a Torque PBS cluster with 2 machines: ip1 and ip2. ip1 acts as the server, with torque-server, torque-mom and torque-scheduler installed; ip2 is just a node with torque-mom. The configuration looks OK; pbsnodes on both machines returns:

cuda
     state = free
     np = 16
     ntype = cluster
     status = rectime=1519887342,varattr=,jobs=,state=free,netload=2829068930,gres=cuda:,loadave=0.50,ncpus=16,physmem=132036652kb,availmem=134818552kb,totmem=135943208kb,idletime=2822,nusers=2,nsessions=2,sessions=1363 4658,uname=Linux cuda 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64,opsys=linux

cuda2
     state = free
     np = 4
     ntype = cluster
     status = rectime=1519887335,varattr=,jobs=,state=free,netload=71522585,gres=,loadave=0.00,ncpus=4,physmem=16432464kb,availmem=18032520kb,totmem=18384204kb,idletime=2880,nusers=3,nsessions=15,sessions=1575 1584 1604 1646 1647 1648 1649 1650 1651 1653 1655 1703 1726 18189 18257,uname=Linux IU6-2 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64,opsys=linux
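One detail in the pbsnodes output above may be worth checking: the second node is registered with the server as cuda2, but the uname field of its status line reports the host name IU6-2. The function below is a minimal sketch (the name compare_node_names is hypothetical) that reads pbsnodes output on stdin and prints, for each node, the name the server uses next to the host name found in uname=, flagging any difference. It assumes the status-line layout shown above. A fully qualified name on one side and a short name on the other will also be reported as a mismatch, so a flag is a hint to look closer, not proof of a misconfiguration.

```shell
#!/usr/bin/env bash
# compare_node_names (hypothetical helper): read pbsnodes output on stdin.
# For every node, print "<server-side name> <uname host name> ok|MISMATCH".
# Assumes the layout shown above: a bare node-name line followed by
# "key = value" lines, one of which is the "status = ..." line.
compare_node_names() {
  awk '
    # A line that does not start with whitespace and has no "=" is a node name.
    /^[^ \t=][^=]*$/ { node = $1; next }
    /status = / {
      # The status line contains "uname=Linux <hostname> <kernel> ...".
      if (match($0, /uname=[^ ,]+ [^ ,]+/)) {
        split(substr($0, RSTART, RLENGTH), parts, " ")
        host = parts[2]
        flag = (host == node) ? "ok" : "MISMATCH"
        printf "%s %s %s\n", node, host, flag
      }
    }
  '
}
```

For the output in the question, pbsnodes -a | compare_node_names should print "cuda cuda ok" and "cuda2 IU6-2 MISMATCH".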

Only ip1 will be used by multiple users to run jobs, so to keep Torque from using scp when transferring files, I’ve also configured an NFS server on ip1, mounted the /home folder of ip1 at /mnt/home on ip2 and, following section 13.9.2.1 “Configuring the $usecp MoM Parameter” of https://pbsworks.com/documentation/support/PBSProAdminGuide12.pdf, added

$usecp ip1:/home/ /mnt/home/

to the file /var/spool/torque/mom_priv/config on ip2. Then I tried to run a simple script with qsub on both nodes:

#!/bin/bash
#PBS -l nodes=2
#PBS -k o
#PBS -j oe
$PBS_O_WORKDIR/test
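For reference, $usecp tells MoM to deliver a file with a local cp instead of a remote copy when the destination host:path matches the rule, replacing the matched path prefix with the local directory. The function below (usecp_map, hypothetical) models only that prefix rewrite, assuming an exact host match and no wildcards. It can be used to check whether a destination such as ip1:/home/alex/job.o274 (user and file name made up) would be covered by the $usecp ip1:/home/ /mnt/home/ rule added on ip2. It is a simplified model, not Torque's actual matching code.

```shell
#!/usr/bin/env bash
# usecp_map DEST RULE_SRC RULE_DST (hypothetical helper)
#   DEST      delivery destination as "host:/path"
#   RULE_SRC  left-hand side of a $usecp line, e.g. "ip1:/home/"
#   RULE_DST  right-hand side of a $usecp line, e.g. "/mnt/home/"
# Prints the local path if the rule matches; otherwise prints nothing and
# returns 1 (in this model, MoM would then fall back to a remote copy).
usecp_map() {
  local dest=$1 rule_src=$2 rule_dst=$3
  local dest_host=${dest%%:*} dest_path=${dest#*:}
  local rule_host=${rule_src%%:*} rule_path=${rule_src#*:}

  # Host part must match exactly (wildcards are not modelled here).
  [[ $dest_host == "$rule_host" ]] || return 1
  # Destination path must start with the rule's directory.
  [[ $dest_path == "$rule_path"* ]] || return 1

  # Swap the matched prefix for the local directory.
  printf '%s\n' "${rule_dst}${dest_path#"$rule_path"}"
}
```

With the rule from the question, usecp_map ip1:/home/alex/job.o274 ip1:/home/ /mnt/home/ prints /mnt/home/alex/job.o274, while a destination recorded under a different host name, e.g. cuda:/home/alex/job.o274, does not match and the function returns 1. In this model, the host written in the rule has to be the same name the job's output path uses.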

In the qstat -f output I see:

//…
job_state = C
//…
exit_status = -1
//…

But there are no output files, and the MoM log on ip1 shows:

pbs_mom;Job;274.localhost;ERROR: received request ‘ABORT_JOB’ from ip2:1023 for job ‘274.localhost’ (job does not exist locally)

What am I doing wrong?

Thanks in advance.

billnitzberg · March 20, 2018, 12:57am

Hi,

(Apologies – this post got stuck in the moderator queue for many days… sorry for the delay.)

Note that this forum is dedicated to “PBS Pro” software, not TORQUE. I’d encourage you to try PBS Pro (as a replacement for TORQUE). PBS Pro comes with a full-featured scheduler, has hundreds of person-years of hardening, and is running on some of the largest supercomputers in the world. If you want to stick with TORQUE, I suggest posting your question to one of the forums devoted to the TORQUE software.

Again, sorry for the delayed post.

adarsh · March 20, 2018, 11:04am

The PBS Pro documentation, available at this link, may be useful:
https://pbsworks.com/SupportGT.aspx?d=PBS-Professional,-Documentation

Suggestion:
Please refer to section 2.1.3.4, “Required Name Resolution”, of the PBS Professional Administrator’s Guide.
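A quick way to start on the name-resolution checks is to look up each host name on every machine. The job ID 274.localhost suggests the server may know itself as localhost, and a common cause on Ubuntu is the default /etc/hosts line that maps the machine's own host name to 127.0.1.1. The sketch below (check_host and is_loopback are hypothetical helpers) resolves a name through getent, which consults the same NSS sources (/etc/hosts, DNS, ...) as most system tools, and reports names that do not resolve or that point at a loopback address. It only does a forward lookup; whether reverse lookups agree is not checked here.

```shell
#!/usr/bin/env bash
# is_loopback ADDR: succeed if ADDR is an IPv4 (127.x.x.x) or IPv6 (::1)
# loopback address.
is_loopback() {
  case $1 in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# check_host NAME: resolve NAME via getent and report a problem on stderr
# if it does not resolve or resolves to a loopback address.
check_host() {
  local name=$1 addr
  # Take the address column of the first result only.
  addr=$(getent hosts "$name" | awk '{ print $1; exit }')
  if [[ -z $addr ]]; then
    echo "$name: does not resolve" >&2
    return 1
  fi
  if is_loopback "$addr"; then
    echo "$name: resolves to loopback address $addr" >&2
    return 1
  fi
  echo "$name -> $addr"
}
```

Running something like for h in "$(hostname)" cuda cuda2; do check_host "$h"; done on both ip1 and ip2 should show every name resolving to the same non-loopback address on both machines; any message on stderr points at an /etc/hosts or DNS entry worth fixing.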
