Unable to view output files when we submit jobs to a compute node from master

Hi Team,
Please help me resolve this issue. I am using PBS Pro 19.1.1 CE and have two nodes (master: PBS server, node01: PBS compute node).
I am able to run jobs on master and get the output files.
When I submit a job to node01 as the pbsdata user from master, qstat shows the job in 'E' (exiting) state.

[pbsdata@master ~]$ qsub -l select=1:ncpus=1:mem=100mb:host=master -- /usr/bin/echo helloworld
17.master
You have new mail in /var/spool/mail/pbsdata
[pbsdata@master ~]$ qstat -ans
[pbsdata@master ~]$ ls
STDIN.e0  STDIN.e10  STDIN.e16  STDIN.e17  STDIN.o0  STDIN.o10  STDIN.o16  STDIN.o17
[pbsdata@master ~]$ qsub -l select=1:ncpus=1:mem=100mb:host=node01 -- /usr/bin/echo helloworld
18.master
[pbsdata@master ~]$ qstat -ans

master:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
18.master       pbsdata  workq    STDIN        4738   1   1  100mb   --  E 00:00
   node01/0
   Job run at Sat Jul 18 at 17:49 on (node01:ncpus=1:mem=102400kb)

My node01 MoM log file is as follows:
less /var/spool/pbs/mom_logs/20200718

07/18/2020 17:49:30;0080;pbs_mom;Job;18.master;copy file request received
07/18/2020 17:49:30;0080;pbs_mom;Fil;sys_copy;command: /usr/bin/scp -Brvp /var/spool/pbs/spool/18.master.OU pbsdata@master.calligotech.com:/home/pbsdata/STDIN.o18 status=1, try=1
07/18/2020 17:49:30;0080;pbs_mom;Fil;sys_copy;command: /bin/false -rp /var/spool/pbs/spool/18.master.OU pbsdata@master.calligotech.com:/home/pbsdata/STDIN.o18 status=1, try=2
07/18/2020 17:49:40;0800;pbs_mom;n/a;mom_get_sample;nprocs:  233, cantstat:  0, nomem:  0, skipped:  0, cached:  0
07/18/2020 17:49:41;0080;pbs_mom;Fil;sys_copy;command: /usr/bin/scp -Brvp /var/spool/pbs/spool/18.master.OU pbsdata@master.calligotech.com:/home/pbsdata/STDIN.o18 status=1, try=3
07/18/2020 17:49:41;0080;pbs_mom;Fil;sys_copy;command: /bin/false -rp /var/spool/pbs/spool/18.master.OU pbsdata@master.calligotech.com:/home/pbsdata/STDIN.o18 status=1, try=4
07/18/2020 17:49:56;0800;pbs_mom;n/a;mom_get_sample;nprocs:  233, cantstat:  0, nomem:  0, skipped:  0, cached:  0
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;Unable to copy file /var/spool/pbs/spool/18.master.OU to master.calligotech.com:/home/pbsdata/STDIN.o18
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;Executing: program /usr/bin/ssh host master.calligotech.com, user pbsdata, command scp -v -r -p -t /home/pbsdata/STDIN.o18
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Reading configuration data /etc/ssh/ssh_config
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: /etc/ssh/ssh_config line 58: Applying options for *
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Connecting to master.calligotech.com [192.168.43.41] port 22.
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Connection established.
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_rsa type 1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: key_load_public: No such file or directory
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_rsa-cert type -1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: key_load_public: No such file or directory
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_dsa type -1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: key_load_public: No such file or directory
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_dsa-cert type -1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: key_load_public: No such file or directory
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_ecdsa type -1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: key_load_public: No such file or directory
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_ecdsa-cert type -1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: key_load_public: No such file or directory
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_ed25519 type -1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: key_load_public: No such file or directory
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: identity file /home/pbsdata/.ssh/id_ed25519-cert type -1
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Enabling compatibility mode for protocol 2.0
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Local version string SSH-2.0-OpenSSH_7.4
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: match: OpenSSH_7.4 pat OpenSSH* compat 0x04000000
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Authenticating to master.calligotech.com:22 as 'pbsdata'
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: SSH2_MSG_KEXINIT sent
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: SSH2_MSG_KEXINIT received
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: kex: algorithm: curve25519-sha256
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: kex: host key algorithm: ecdsa-sha2-nistp256
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: kex: curve25519-sha256 need=64 dh_need=64
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: kex: curve25519-sha256 need=64 dh_need=64
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;debug1: Server host key: ecdsa-sha2-nistp256 SHA256:nTXjdF+ZPTMGJ8vSTCtwAgukc5mz9n0lt/BwF3yvqCg
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;Host key verification failed.
07/18/2020 17:50:02;0004;pbs_mom;Fil;18.master.OU;lost connection
07/18/2020 17:50:02;0080;

From the logs, I am getting "Host key verification failed." How can I resolve this?

Regards,
Zain
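For reference, a common cause of "Host key verification failed." in this situation: the MoM calls scp with -B (batch mode), so ssh on node01 cannot ask whether to trust master's host key and gives up when pbsdata has no matching entry in its known_hosts file. The log shows the connection is made as pbsdata to the fully qualified name master.calligotech.com, so an entry recorded only for plain "master" may not match. The sketch below assumes that cause; it is meant to be run on node01 as pbsdata, and the function names add_master_host_key and check_batch_ssh are hypothetical helpers, not part of PBS.

```shell
#!/bin/bash
# Sketch (assumption: the host key for master is missing from pbsdata's known_hosts).
# Run on node01 as the pbsdata user, the account the MoM copies output files as.
# MASTER_FQDN is the name from the MoM log; KNOWN_HOSTS is ssh's default per-user file.
MASTER_FQDN="${MASTER_FQDN:-master.calligotech.com}"
KNOWN_HOSTS="${KNOWN_HOSTS:-$HOME/.ssh/known_hosts}"

# Hypothetical helper: replace any old entry for master with its current ECDSA key.
add_master_host_key() {
    local dir
    dir="$(dirname "$KNOWN_HOSTS")"
    mkdir -p "$dir"
    chmod 700 "$dir"
    if [ -f "$KNOWN_HOSTS" ]; then
        # Remove a stale entry, if there is one (ssh-keygen keeps a .old backup)
        ssh-keygen -R "$MASTER_FQDN" -f "$KNOWN_HOSTS"
    fi
    # Append the key master offers now; verify its fingerprint before trusting it
    ssh-keyscan -t ecdsa "$MASTER_FQDN" >> "$KNOWN_HOSTS"
}

# Hypothetical helper: non-interactive login test in the same batch mode scp -B uses,
# so it fails instead of prompting if the host key or passwordless login is missing.
check_batch_ssh() {
    ssh -o BatchMode=yes "pbsdata@$MASTER_FQDN" true
}
```

After calling add_master_host_key, compare the output of `ssh-keygen -lf ~/.ssh/known_hosts` with the SHA256 fingerprint in the log above (nTXjdF...) before relying on the entry, then run check_batch_ssh; it should exit with status 0 without printing a prompt.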

Dear Zain,

It seems that the -t option in the scp command causes this problem. Provided that you can ssh between master and node01 without a password, you could also try adding the line

PBS_SCP=/bin/scp

in /etc/pbs.conf on the master host.

Hope this solves your problem

br
Günter

Dear Gunter,

I have already added PBS_SCP=/bin/scp on both my master node and my compute node.
Below are the details:

Master:

cat /etc/pbs.conf

PBS_EXEC=/share/apps/platform/pbs
PBS_HOME=/var/spool/pbs
PBS_SERVER=master
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
PBS_RCP=/bin/false
PBS_SCP=/usr/bin/scp
PBS_RSHCOMMAND=/usr/bin/ssh

Node01:

cat /etc/pbs.conf

PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_SERVER=master
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
PBS_RCP=/bin/false
PBS_SCP=/usr/bin/scp
PBS_RSHCOMMAND=/usr/bin/ssh

Please let me know if any changes are needed so that we can resolve this problem.

Regards,
Zain
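Both pbs.conf files above set PBS_SCP twice (first /bin/scp, then /usr/bin/scp). On CentOS/RHEL 7, which the OpenSSH 7.4p1 banner in the MoM log suggests, /bin is normally a symlink to /usr/bin, so both lines probably name the same binary, but keeping a single entry removes any doubt about which one PBS reads. A minimal sketch, assuming GNU sed and root access; set_pbs_conf_var is a hypothetical helper, not a PBS command:

```shell
#!/bin/bash
# Hypothetical helper: make KEY appear exactly once in a pbs.conf-style file.
# Usage: set_pbs_conf_var /etc/pbs.conf PBS_SCP /usr/bin/scp
set_pbs_conf_var() {
    local conf="$1" key="$2" value="$3"
    # Keep a backup of the original file next to it
    cp -p "$conf" "$conf.bak" || return 1
    # Delete every existing KEY=... line (GNU sed in-place edit)
    sed -i "/^${key}=/d" "$conf" || return 1
    # Append the single desired setting at the end of the file
    printf '%s=%s\n' "$key" "$value" >> "$conf"
}
```

Run it on master and on node01, then restart PBS on each host (for example with `systemctl restart pbs` or `/etc/init.d/pbs restart`, depending on how PBS was installed) so the daemons reread /etc/pbs.conf.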

Dear Zain,

I still think that the -t option somehow causes the problem; the corresponding command is given in your MoM log file, i.e.

scp -v -r -p -t /home/pbsdata/STDIN.o18

To verify this, I ran the following in an arbitrary directory on a PBS server:

touch test.txt

scp -v -r -p -t test.txt /home/bachlerg

scp: ambiguous target

scp -v -r -p test.txt /home/bachlerg

Executing: cp -r -p -- test.txt /home/bachlerg

The question arises: why is PBS using the -t option to return the .o file to the job directory?

Hope you find somebody who can explain that

Br

Günter
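One way to check where the -t comes from, as a sketch: in the MoM log, the step that fails is ssh running `scp -v -r -p -t /home/pbsdata/STDIN.o18` on master. With OpenSSH's scp, that is the remote half of a copy: the local scp starts a second scp on the target host over ssh and passes it -t to put it in receiving mode. Copying any small file to master with `scp -v` prints that remote command, so it can be compared with the MoM log. show_remote_scp_command is a hypothetical helper, and it really does copy the file to /tmp on the target host.

```shell
#!/bin/bash
# Hypothetical helper: copy FILE to TARGET:/tmp/ verbosely and keep only the
# "Executing: program ..." line, i.e. the ssh call and the remote scp command.
# Usage: show_remote_scp_command test.txt pbsdata@master.calligotech.com
show_remote_scp_command() {
    local file="$1" target="$2"
    # scp -v writes its debug output to stderr, so merge it into the pipe
    scp -v "$file" "$target:/tmp/" 2>&1 | grep 'Executing: program'
}
```

Run as pbsdata on node01, this should print a line in the same form as the "Executing: program /usr/bin/ssh host ..." entry in the MoM log.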


Hi,

Is this specific to PBS Pro CE 19.1.1, or is some scp/ssh version causing this "ambiguous target" error? Can't we remove the "-t" option in the relevant config file?

Or is this caused by the passwordless connection between pbsdata@master and pbsdata@node01? (I have created the pbsdata user on both the master and node01 nodes.)

Please guide me on this.

Regards,
Zain

Zain, could you please create a wrapper script /usr/bin/scp.sh, make it executable (chmod 755 /usr/bin/scp.sh), set PBS_SCP=/usr/bin/scp.sh in /etc/pbs.conf, and restart the PBS services?

scp.sh filters out the arguments that are not required and passes only your customised arguments on to /usr/bin/scp.

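#!/bin/bash
# This scp.sh file is saved as /usr/bin/scp.sh
# In /etc/pbs.conf on the PBS server and on the PBS MoMs, set PBS_SCP=/usr/bin/scp.sh
# Then restart the PBS services on the server node and the compute nodes

args=()
for i in "$@"
do
    # Drop any standalone "-t" argument; keep everything else unchanged
    if [ "$i" != "-t" ]
    then
        args+=("$i")
    fi
done
# Show the command that will actually be run
echo "/usr/bin/scp ${args[*]}"
# Quote each argument so paths with spaces survive; exec keeps scp's exit status
exec /usr/bin/scp "${args[@]}"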

Hope this works for you
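Before switching PBS_SCP, it may help to dry-run the filtering logic. The copy command in the MoM log is `/usr/bin/scp -Brvp /var/spool/pbs/spool/18.master.OU pbsdata@master.calligotech.com:/home/pbsdata/STDIN.o18`, which contains no standalone -t, so it is worth confirming which arguments the wrapper would actually forward. The sketch reproduces only the loop from scp.sh as a function (filter_scp_args is a hypothetical name) and prints the forwarded arguments one per line instead of running scp.

```shell
#!/bin/bash
# Dry run of the scp.sh filter: print each argument that would be passed on
# to /usr/bin/scp, one per line, dropping only arguments that are exactly "-t".
filter_scp_args() {
    local arg
    for arg in "$@"; do
        if [ "$arg" != "-t" ]; then
            printf '%s\n' "$arg"
        fi
    done
}

# Example: the arguments the MoM used in the log above
filter_scp_args -Brvp /var/spool/pbs/spool/18.master.OU \
    pbsdata@master.calligotech.com:/home/pbsdata/STDIN.o18
```

Bundled flags such as -Brvp pass through unchanged; only an argument that is exactly -t is removed.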