Dear friends,
I have 7 nodes: node 1 is the master node, and nodes 2-7 are execution nodes.
I can submit jobs to node 1 and get the results I want.
However, when I submit a job and designate any of nodes 2-7 to run it (i.e. any node other than the master), the job enters the running state but does not produce any results.
For example, I designated node 2 to run a simple task, and it did not write the results to the working folder. I looked at the MoM log on node 2:
[user1@node02 user1]$ vi /var/spool/pbs/mom_logs/20221214
and found these entries:
12/14/2022 13:51:09;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp /var/spool/pbs/spool/23.node.OU user1@node:/home2/user1/testing.o23 status=1, try=4
12/14/2022 13:51:30;0001;pbs_mom;Fil;copy_file;Job 23.node: sys_copy failed, return value=1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;Unable to copy file /var/spool/pbs/spool/23.node.OU to node:/home2/user1/testing.o23
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;node: Connection refused
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;ssh host node, user user1, command scp -v -r -p -t /home2/user1/testing.o23
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;OpenSSH_8.0p1, OpenSSL 1.1.1k FIPS 25 Mar 2021
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Reading configuration data /etc/ssh/ssh_config
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: configuration requests final Match pass
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: re-parsing configuration
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Reading configuration data /etc/ssh/ssh_config
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Connecting to node [10.2.208.101] port 22.
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Connection established.
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_rsa type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_rsa-cert type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_dsa type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_dsa-cert type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_ecdsa type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_ecdsa-cert type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_ed25519 type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_ed25519-cert type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_xmss type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: identity file /home/user1/.ssh/id_xmss-cert type -1
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Local version string SSH-2.0-OpenSSH_8.0
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Remote protocol version 2.0, remote software version OpenSSH_8.0
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: match: OpenSSH_8.0 pat OpenSSH* compat 0x04000000
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Authenticating to node:22 as 'user1'
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: SSH2_MSG_KEXINIT sent
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: SSH2_MSG_KEXINIT received
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: kex: algorithm: curve25519-sha256
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: kex: host key algorithm: ecdsa-sha2-nistp256
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: kex: server->client cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: kex: client->server cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: kex: curve25519-sha256 need=32 dh_need=32
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: kex: curve25519-sha256 need=32 dh_need=32
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;debug1: Server host key: ecdsa-sha2-nistp256 SHA256:iTVQj0N976KNZrShSECPYEnKggchsu0ZBoNOCuul1L8
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;Host key verification failed.
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;lost connection
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;ication failed.
12/14/2022 13:51:30;0004;pbs_mom;Fil;23.node.OU;lost connection
12/14/2022 13:51:30;0001;pbs_mom;Svr;pbs_mom;No such file or directory (2) in is_child_path, Failed to allocate memory
12/14/2022 13:51:30;0001;pbs_mom;Fil;stage_file;Job 23.node: no wildcards:remote stageout failed for user1 from /var/spool/pbs/spool/23.node.OU to node:/home2/user1/testing.o23
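To cut through the SSH debug noise, the failure lines can be pulled out of the log with a small shell helper. This is only a sketch: the function name `summarize_mom_errors` is my own and not part of PBS, and the patterns are simply the messages that appear in the log above.

```shell
# Hypothetical helper (not a PBS tool): print only the copy/SSH failure
# lines from a MoM log file and skip the "debug1:" noise.
summarize_mom_errors() {
    # $1: path to a MoM log file, e.g. /var/spool/pbs/mom_logs/20221214
    grep -E 'sys_copy failed|Unable to copy file|Host key verification failed|stageout failed' "$1"
}

# Usage:
#   summarize_mom_errors /var/spool/pbs/mom_logs/20221214
```

On the log above, this leaves the `sys_copy failed`, `Unable to copy file`, `Host key verification failed.` and `stageout failed` lines.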
When I trace job 23, it says:
[user1@node02 user1]$ tracejob 23
Job: 23.mybay
12/14/2022 13:49:55 M Started, pid = 12357
12/14/2022 13:49:55 M task 00000001 terminated
12/14/2022 13:49:55 M Terminated
12/14/2022 13:49:55 M task 00000001 cput=00:00:00
12/14/2022 13:49:55 M kill_job
12/14/2022 13:49:55 M node02 cput=00:00:00 mem=0kb
12/14/2022 13:49:55 M Obit sent
12/14/2022 13:49:56 M copy file request received
12/14/2022 13:51:30 M Unable to copy file /var/spool/pbs/spool/23.mybay.OU to mybay:/home2/user1/testing.o23
12/14/2022 13:51:30 M mybay: Connection refused
12/14/2022 13:51:30 M ssh host mybay, user user1, command scp -v -r -p -t /home2/user1/testing.o23
12/14/2022 13:51:30 M OpenSSH_8.0p1, OpenSSL 1.1.1k FIPS 25 Mar 2021
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
12/14/2022 13:51:30 M debug1: configuration requests final Match pass
12/14/2022 13:51:30 M debug1: re-parsing configuration
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
12/14/2022 13:51:30 M debug1: Connecting to mybay [10.2.208.101] port 22.
12/14/2022 13:51:30 M debug1: Connection established.
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_rsa type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_rsa-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_dsa type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_dsa-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ecdsa type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ecdsa-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ed25519 type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ed25519-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_xmss type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_xmss-cert type -1
12/14/2022 13:51:30 M debug1: Local version string SSH-2.0-OpenSSH_8.0
12/14/2022 13:51:30 M debug1: Remote protocol version 2.0, remote software version OpenSSH_8.0
12/14/2022 13:51:30 M debug1: match: OpenSSH_8.0 pat OpenSSH* compat 0x04000000
12/14/2022 13:51:30 M debug1: Authenticating to mybay:22 as 'user1'
12/14/2022 13:51:30 M debug1: SSH2_MSG_KEXINIT sent
12/14/2022 13:51:30 M debug1: SSH2_MSG_KEXINIT received
12/14/2022 13:51:30 M debug1: kex: algorithm: curve25519-sha256
12/14/2022 13:51:30 M debug1: kex: host key algorithm: ecdsa-sha2-nistp256
12/14/2022 13:51:30 M debug1: kex: server->client cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
12/14/2022 13:51:30 M debug1: kex: client->server cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
12/14/2022 13:51:30 M debug1: kex: curve25519-sha256 need=32 dh_need=32
12/14/2022 13:51:30 M debug1: kex: curve25519-sha256 need=32 dh_need=32
12/14/2022 13:51:30 M debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
12/14/2022 13:51:30 M debug1: Server host key: ecdsa-sha2-nistp256 SHA256:iTVQj0N976KNZrShSECPYEnKggchsu0ZBoNOCuul1L8
12/14/2022 13:51:30 M Host key verification failed.
12/14/2022 13:51:30 M lost connection
12/14/2022 13:51:30 M ication failed.
12/14/2022 13:51:30 M lost connection
12/14/2022 13:51:30 M Job files not copied:---->>>>
12/14/2022 13:51:30 M Unable to copy file /var/spool/pbs/spool/23.mybay.OU to mybay:/home2/user1/testing.o23
12/14/2022 13:51:30 M >>> error from copy
12/14/2022 13:51:30 M mybay: Connection refused
12/14/2022 13:51:30 M ssh host mybay, user user1, command scp -v -r -p -t /home2/user1/testing.o23
12/14/2022 13:51:30 M OpenSSH_8.0p1, OpenSSL 1.1.1k FIPS 25 Mar 2021
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
12/14/2022 13:51:30 M debug1: configuration requests final Match pass
12/14/2022 13:51:30 M debug1: re-parsing configuration
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
12/14/2022 13:51:30 M debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
12/14/2022 13:51:30 M debug1: Connecting to mybay [10.2.208.101] port 22.
12/14/2022 13:51:30 M debug1: Connection established.
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_rsa type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_rsa-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_dsa type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_dsa-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ecdsa type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ecdsa-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ed25519 type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_ed25519-cert type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_xmss type -1
12/14/2022 13:51:30 M debug1: identity file /home/user1/.ssh/id_xmss-cert type -1
12/14/2022 13:51:30 M debug1: Local version string SSH-2.0-OpenSSH_8.0
12/14/2022 13:51:30 M debug1: Remote protocol version 2.0, remote software version OpenSSH_8.0
12/14/2022 13:51:30 M debug1: match: OpenSSH_8.0 pat OpenSSH* compat 0x04000000
12/14/2022 13:51:30 M debug1: Authenticating to mybay:22 as 'user1'
12/14/2022 13:51:30 M debug1: SSH2_MSG_KEXINIT sent
12/14/2022 13:51:30 M debug1: SSH2_MSG_KEXINIT received
12/14/2022 13:51:30 M debug1: kex: algorithm: curve25519-sha256
12/14/2022 13:51:30 M debug1: kex: host key algorithm: ecdsa-sha2-nistp256
12/14/2022 13:51:30 M debug1: kex: server->client cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
12/14/2022 13:51:30 M debug1: kex: client->server cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
12/14/2022 13:51:30 M debug1: kex: curve25519-sha256 need=32 dh_need=32
12/14/2022 13:51:30 M debug1: kex: curve25519-sha256 need=32 dh_need=32
12/14/2022 13:51:30 M debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
12/14/2022 13:51:30 M debug1: Server host key: ecdsa-sha2-nistp256 SHA256:iTVQj0N976KNZrShSECPYEnKggchsu0ZBoNOCuul1L8
12/14/2022 13:51:30 M Host key verification failed.
12/14/2022 13:51:30 M lost connection
12/14/2022 13:51:30 M ication failed.
12/14/2022 13:51:30 M lost connection
12/14/2022 13:51:30 M >>> end error output
12/14/2022 13:51:30 M Output retained on that host in: /var/spool/pbs/undelivered/23.mybay.OU
12/14/2022 13:51:30 M ---->>>>
12/14/2022 13:51:30 M Staged 0/1 items out over 0:01:34
12/14/2022 13:51:30 M no active tasks
12/14/2022 13:51:30 M Obit sent
12/14/2022 13:51:30 M delete job request received
12/14/2022 13:51:30 M kill_job
12/14/2022 13:51:30 M delete job request received
May I know what the problem might be and how to fix it?
Thanks
Best
Austin