I currently have 7 nodes in the cluster: the first (bc1) is used as both server and compute node, while the other 6 (bc2 to bc7) are used as compute nodes only.
All of the nodes are on the same DHCP-configured network and can ping each other by hostname.
When I create a regular user with a home directory on the shared /home, that user doesn't have proper access to the cluster: all of their jobs get stuck in the E state. As far as I know, this can be worked around by setting up SSH keys from the compute nodes to the server node, but I am curious whether there is an elegant solution that doesn't require me to log in to each user's account to set up the SSH connections.
I am using Ansible to create the accounts (the user module):
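tasks:
  - name: Ensure user exists
    ansible.builtin.user:
      name: "{{ username }}"
      # On Linux the user module expects an already-crypted hash,
      # so hash the plaintext variable here.
      password: "{{ password | password_hash('sha512') }}"
      create_home: true
      home: "/home/{{ username }}"
      shell: /bin/bash
    register: result
    until: result is succeeded
    retries: 5
    delay: 2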
I am quite new to all of this, so sorry if the question is a bit basic. I tried to find the solution in the docs, but the only relevant thing I came across was the creation of the pbsdata user, which I tried, but it didn't help.
All machines run Ubuntu 22.04 LTS.
OpenPBS was installed following the installation README on GitHub.
- Host-based passwordless SSH is configured.
- /etc/hosts lists the IP address and hostname of every node and is identical on the head node and all compute nodes.
- The firewall allows ports 15001 to 15007 between all hosts in the PBS cluster.
- SELinux is disabled and the systems have been rebooted.
Please refer to this guide:
This happens because passwordless scp is not working for that user. As a result, stage-out (copying the result files from the compute node back to the PBS server node) fails, and the job stays in the E state.
You can check the MoM logs on the compute node where the job ran:
source /etc/pbs.conf
cd $PBS_HOME/mom_logs
less YYYYMMDD   # one log file per day; it will show the reason for the E state, most likely a failed copy back
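A minimal sketch for pulling a single job's entries out of one day's MoM log, assuming PBS_HOME is defined in /etc/pbs.conf and falling back to /var/spool/pbs (the spool path that appears in the log further down this thread). The function name job_mom_log is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print every pbs_mom log line that mentions a given job ID.
# Assumption: PBS_HOME comes from /etc/pbs.conf, else defaults to /var/spool/pbs.
job_mom_log() {
    local job_id="$1"
    local day="${2:-$(date +%Y%m%d)}"   # daily log files are named YYYYMMDD

    # Load PBS_HOME from the PBS config only if it is not already set.
    if [ -z "${PBS_HOME:-}" ] && [ -r /etc/pbs.conf ]; then
        # shellcheck disable=SC1091
        . /etc/pbs.conf
    fi

    local log_file="${PBS_HOME:-/var/spool/pbs}/mom_logs/${day}"
    if [ ! -r "$log_file" ]; then
        echo "no readable MoM log at $log_file" >&2
        return 1
    fi

    # -F treats the job ID as a literal string, so the "." does not match any character.
    grep -F -- "$job_id" "$log_file"
}
```

For the job in this thread, running job_mom_log 1026.bc1 20240411 on the compute node should show the sys_copy attempts and the ssh debug output.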
I tried replicating the steps in that document, but I couldn't get the copy to work when the job is executed on the other nodes; the scp commands seem to fail. I know I can work around this by creating an SSH key on any node (since /home is shared) and adding it to authorized_keys, but since we are planning to hook up LDAP for further use of the cluster, repeating these steps for every LDAP user is cumbersome and seems like bad practice.
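To reproduce the failing copy without submitting a job, one option is to make the same kind of non-interactive login the stage-out needs, as the job's owner, from a compute node. The sketch below (probe_hostbased is a hypothetical helper name) restricts ssh to host-based authentication, so it should succeed or fail for the same reason as the hostbased attempts in the MoM log.

```shell
#!/usr/bin/env bash
# Sketch: check whether the current user can reach HOST non-interactively
# using host-based authentication only.
# Run it as the affected user (e.g. user01) on a compute node such as bc2.
probe_hostbased() {
    local host="$1"
    # BatchMode=yes: never prompt for a password, fail instead (no terminal under pbs_mom).
    # PreferredAuthentications=hostbased: test host-based auth in isolation.
    if ssh -o BatchMode=yes \
           -o ConnectTimeout=5 \
           -o PreferredAuthentications=hostbased \
           "$host" true </dev/null; then
        echo "OK $host"
    else
        echo "FAIL $host"
        return 1
    fi
}
```

For example, run probe_hostbased bc1 as user01 on bc2. Adding -v to the ssh options prints the same kind of debug1 lines that appear in the MoM log.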
I've also tried adding a custom wrapper that runs scp as the pbsdata account (which has passwordless SSH access to every node) and pointed PBS_SCP at it:
#!/bin/bash
sudo -u pbsdata /usr/bin/scp "$@"
but it didn't work.
I've also tried setting PBS_RCP to false, with no change.
Is there any way to configure the system so that all users can run jobs on all of the other nodes? Currently we have one server+compute node that we are trying to use as a temporary login node. Jobs that run on that node work fine, but jobs executed on the other nodes fail to copy their results back to the user's directory on the "login" node.
Any tips would be of great help to me!
The log is:
04/11/2024 14:21:13;0100;pbs_mom;Req;;Type 54 request received from root@192.168.1.x:15001, sock=0
04/11/2024 14:21:13;0080;pbs_mom;Job;1026.bc1;copy file request received
04/11/2024 14:21:13;0080;pbs_mom;Fil;sys_copy;command: /bin/scp -Brvp /var/spool/pbs/spool/1026.bc1.OU user01@bc1:/home/user01/STDIN.o1026 status=1, try=1
04/11/2024 14:21:44;0080;pbs_mom;Fil;sys_copy;command: /network-raid/software/pbs/sbin/pbs_rcp -rp /var/spool/pbs/spool/1026.bc1.OU user01@bc1:/home/user01/STDIN.o1026 status=1, try=2
04/11/2024 14:21:55;0080;pbs_mom;Fil;sys_copy;command: /bin/scp -Brvp /var/spool/pbs/spool/1026.bc1.OU user01@bc1:/home/user01/STDIN.o1026 status=1, try=3
04/11/2024 14:22:26;0080;pbs_mom;Fil;sys_copy;command: /network-raid/software/pbs/sbin/pbs_rcp -rp /var/spool/pbs/spool/1026.bc1.OU user01@bc1:/home/user01/STDIN.o1026 status=1, try=4
04/11/2024 14:22:47;0001;pbs_mom;Fil;copy_file;Job 1026.bc1: sys_copy failed, return value=1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;Unable to copy file /var/spool/pbs/spool/1026.bc1.OU to bc1:/home/user01/STDIN.o1026
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;bc1: Connection refused
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;bin/ssh host bc1, user user01, command scp -v -r -p -t /home/user01/STDIN.o1026
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;OpenSSH_8.9p1 Ubuntu-3ubuntu0.6, OpenSSL 3.0.2 15 Mar 2022
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Reading configuration data /etc/ssh/ssh_config
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: /etc/ssh/ssh_config line 21: Applying options for *
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Connecting to bc1 [192.168.1.x] port 22.
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Connection established.
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_rsa type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_rsa-cert type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ecdsa type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ecdsa-cert type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ecdsa_sk type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ecdsa_sk-cert type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ed25519 type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ed25519-cert type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ed25519_sk type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_ed25519_sk-cert type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_xmss type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_xmss-cert type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_dsa type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: identity file /home/user01/.ssh/id_dsa-cert type -1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Local version string SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Remote protocol version 2.0, remote software version OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: compat_banner: match: OpenSSH_8.9p1 Ubuntu-3ubuntu0.6 pat OpenSSH* compat 0x04000000
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Authenticating to bc1:22 as 'user01'
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: SSH2_MSG_KEXINIT sent
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: SSH2_MSG_KEXINIT received
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: kex: algorithm: curve25519-sha256
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: kex: host key algorithm: ecdsa-sha2-nistp256
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: SSH2_MSG_KEX_ECDH_REPLY received
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Server host key: ecdsa-sha2-nistp256 SHA256:zkUIBIPQfiPPXB0Ke3Oz3ewOttrAkD/u6HHyHAj1CaE
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Host 'bc1' is known and matches the ECDSA host key.
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Found key in /etc/ssh/ssh_known_hosts:1
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: ssh_packet_send2_wrapped: resetting send seqnr 3
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: rekey out after 134217728 blocks
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: SSH2_MSG_NEWKEYS sent
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: expecting SSH2_MSG_NEWKEYS
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: ssh_packet_read_poll2: resetting read seqnr 3
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: SSH2_MSG_NEWKEYS received
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: rekey in after 134217728 blocks
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Will attempt key: /home/user01/.ssh/id_rsa
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Will attempt key: /home/user01/.ssh/id_ecdsa
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Will attempt key: /home/user01/.ssh/id_ecdsa_sk
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Will attempt key: /home/user01/.ssh/id_ed25519
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Will attempt key: /home/user01/.ssh/id_ed25519_sk
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Will attempt key: /home/user01/.ssh/id_xmss
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Will attempt key: /home/user01/.ssh/id_dsa
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: SSH2_MSG_EXT_INFO received
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com>
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: kex_input_ext_info: publickey-hostbound@openssh.com=<0>
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: SSH2_MSG_SERVICE_ACCEPT received
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Next authentication method: hostbased
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: userauth_hostbased: trying hostkey ssh-ed25519 SHA256:cFfGK9RDRoROg+q165GygeY3JKyAErB8N/r2jvWvgCk using sigalg ssh-ed25519
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: userauth_hostbased: trying hostkey ecdsa-sha2-nistp256 SHA256:RekNqKgep3rM2V9bQP0Dl0TkJvP4ed6vTj7tVCvJwCY using sigalg ecdsa-sha2-nistp256
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: userauth_hostbased: trying hostkey ssh-rsa SHA256:ZAZVG6Ojo+Ny1HgevebzsHUNxWe1e/cn2C2Fy/sHFv4 using sigalg rsa-sha2-512
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: No more client hostkeys for hostbased authentication.
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Next authentication method: publickey
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Trying private key: /home/user01/.ssh/id_rsa
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Trying private key: /home/user01/.ssh/id_ecdsa
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Trying private key: /home/user01/.ssh/id_ecdsa_sk
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Trying private key: /home/user01/.ssh/id_ed25519
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Trying private key: /home/user01/.ssh/id_ed25519_sk
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Trying private key: /home/user01/.ssh/id_xmss
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: Trying private key: /home/user01/.ssh/id_dsa
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;debug1: No more authentication methods to try.
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;user01@bc1: Permission denied (publickey,password,hostbased).
04/11/2024 14:22:47;0004;pbs_mom;Fil;1026.bc1.OU;lost connection
04/11/2024 14:22:47;0001;pbs_mom;Svr;pbs_mom;No such file or directory (2) in is_child_path, Failed to allocate memory
04/11/2024 14:22:47;0001;pbs_mom;Fil;stage_file;Job 1026.bc1: no wildcards:remote stageout failed for user01 from /var/spool/pbs/spool/1026.bc1.OU to bc1:/home/user01/STDIN.o1026
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;Job files not copied:---->>>>
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;Unable to copy file /var/spool/pbs/spool/1026.bc1.OU to bc1:/home/user01/STDIN.o1026
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;>>> error from copy
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;bc1: Connection refused
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;bin/ssh host bc1, user user01, command scp -v -r -p -t /home/user01/STDIN.o1026
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;OpenSSH_8.9p1 Ubuntu-3ubuntu0.6, OpenSSL 3.0.2 15 Mar 2022
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Reading configuration data /etc/ssh/ssh_config
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: /etc/ssh/ssh_config line 21: Applying options for *
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Connecting to bc1 [192.168.1.x] port 22.
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Connection established.
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_rsa type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_rsa-cert type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ecdsa type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ecdsa-cert type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ecdsa_sk type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ecdsa_sk-cert type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ed25519 type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ed25519-cert type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ed25519_sk type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_ed25519_sk-cert type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_xmss type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_xmss-cert type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_dsa type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: identity file /home/user01/.ssh/id_dsa-cert type -1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Local version string SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Remote protocol version 2.0, remote software version OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: compat_banner: match: OpenSSH_8.9p1 Ubuntu-3ubuntu0.6 pat OpenSSH* compat 0x04000000
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Authenticating to bc1:22 as 'user01'
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: SSH2_MSG_KEXINIT sent
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: SSH2_MSG_KEXINIT received
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: kex: algorithm: curve25519-sha256
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: kex: host key algorithm: ecdsa-sha2-nistp256
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: SSH2_MSG_KEX_ECDH_REPLY received
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Server host key: ecdsa-sha2-nistp256 SHA256:zkUIBIPQfiPPXB0Ke3Oz3ewOttrAkD/u6HHyHAj1CaE
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Host 'bc1' is known and matches the ECDSA host key.
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Found key in /etc/ssh/ssh_known_hosts:1
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: ssh_packet_send2_wrapped: resetting send seqnr 3
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: rekey out after 134217728 blocks
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: SSH2_MSG_NEWKEYS sent
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: expecting SSH2_MSG_NEWKEYS
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: ssh_packet_read_poll2: resetting read seqnr 3
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: SSH2_MSG_NEWKEYS received
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: rekey in after 134217728 blocks
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Will attempt key: /home/user01/.ssh/id_rsa
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Will attempt key: /home/user01/.ssh/id_ecdsa
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Will attempt key: /home/user01/.ssh/id_ecdsa_sk
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Will attempt key: /home/user01/.ssh/id_ed25519
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Will attempt key: /home/user01/.ssh/id_ed25519_sk
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Will attempt key: /home/user01/.ssh/id_xmss
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Will attempt key: /home/user01/.ssh/id_dsa
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: SSH2_MSG_EXT_INFO received
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com>
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: kex_input_ext_info: publickey-hostbound@openssh.com=<0>
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: SSH2_MSG_SERVICE_ACCEPT received
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Next authentication method: hostbased
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: userauth_hostbased: trying hostkey ssh-ed25519 SHA256:cFfGK9RDRoROg+q165GygeY3JKyAErB8N/r2jvWvgCk using sigalg ssh-ed25519
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: userauth_hostbased: trying hostkey ecdsa-sha2-nistp256 SHA256:RekNqKgep3rM2V9bQP0Dl0TkJvP4ed6vTj7tVCvJwCY using sigalg ecdsa-sha2-nistp256
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: userauth_hostbased: trying hostkey ssh-rsa SHA256:ZAZVG6Ojo+Ny1HgevebzsHUNxWe1e/cn2C2Fy/sHFv4 using sigalg rsa-sha2-512
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Authentications that can continue: publickey,password,hostbased
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: No more client hostkeys for hostbased authentication.
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Next authentication method: publickey
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Trying private key: /home/user01/.ssh/id_rsa
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Trying private key: /home/user01/.ssh/id_ecdsa
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Trying private key: /home/user01/.ssh/id_ecdsa_sk
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Trying private key: /home/user01/.ssh/id_ed25519
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Trying private key: /home/user01/.ssh/id_ed25519_sk
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Trying private key: /home/user01/.ssh/id_xmss
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: Trying private key: /home/user01/.ssh/id_dsa
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;debug1: No more authentication methods to try.
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;user01@bc1: Permission denied (publickey,password,hostbased).
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;lost connection
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;>>> end error output
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;Output retained on that host in: /var/spool/pbs/undelivered/1026.bc1.OU
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;---->>>>
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;Staged 0/2 items out over 0:01:34
04/11/2024 14:22:47;0008;pbs_mom;Job;1026.bc1;no active tasks
04/11/2024 14:22:47;0080;pbs_mom;Req;req_reject;Reject reply code=15051, aux=0, type=54, from root@192.168.1.x:15001
04/11/2024 14:22:47;0100;pbs_mom;Job;1026.bc1;Obit sent
04/11/2024 14:22:47;0100;pbs_mom;Req;;Type 6 request received from root@192.168.1.x:15001, sock=0
04/11/2024 14:22:47;0080;pbs_mom;Job;1026.bc1;delete job request received
04/11/2024 14:22:47;0008;pbs_mom;Job;1026.bc1;kill_job
04/11/2024 14:22:47;0100;pbs_mom;Req;;Type 6 request received from root@192.168.1.x:15001, sock=0
04/11/2024 14:22:47;0080;pbs_mom;Job;1026.bc1;delete job request received
04/11/2024 14:22:47;0080;pbs_mom;Req;req_reject;Reject reply code=15001, aux=0, type=6, from root@192.168.1.x:15001
You could implement host-based passwordless SSH authentication. It works at the host level, so you do not need to create keys for individual users and distribute them.
Passwordless SSH should then work seamlessly for all users (make sure StrictHostKeyChecking is set to no in ssh_config) in these directions:
- server to compute node(s)
- compute node(s) to server
- compute node(s) to compute node(s), if you are running MPI jobs
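The host-based setup suggested above can be sketched as a small shell function. This is a hedged sketch, not an official OpenPBS procedure: write_hostbased_config and the drop-in file name 50-hostbased.conf are hypothetical, and it assumes Ubuntu 22.04's OpenSSH layout, where ssh_config includes /etc/ssh/ssh_config.d/*.conf (visible in the log) and sshd_config includes /etc/ssh/sshd_config.d/*.conf. In the log, the client already offers its ed25519, ECDSA and RSA host keys and bc1 rejects all three, which suggests the server side (shosts.equiv plus the client host keys in ssh_known_hosts on bc1) is the missing piece.

```shell
#!/usr/bin/env bash
# Sketch only: writes the OpenSSH files that host-based authentication uses.
# Assumptions: Ubuntu 22.04 drop-in directories, node names resolved through
# /etc/hosts, hypothetical drop-in name 50-hostbased.conf. Run as root on EVERY
# node, since each node acts as both SSH client and SSH server.
write_hostbased_config() {
    local ssh_dir="$1"
    shift
    local nodes=("$@")
    mkdir -p "$ssh_dir/ssh_config.d" "$ssh_dir/sshd_config.d"

    # Server side: trusted client hosts. A user coming from one of these hosts
    # is accepted under the same user name without a personal key. The names
    # must match what the server gets from a reverse lookup of the client's IP
    # (short name vs. FQDN, depending on /etc/hosts).
    printf '%s\n' "${nodes[@]}" > "$ssh_dir/shosts.equiv"

    # Server side: let sshd accept host-based logins.
    printf '%s\n' 'HostbasedAuthentication yes' \
        > "$ssh_dir/sshd_config.d/50-hostbased.conf"

    # Client side, outside any Host block: offer host-based auth and let the
    # setuid ssh-keysign helper sign with this node's host key.
    printf '%s\n' 'EnableSSHKeysign yes' 'HostbasedAuthentication yes' \
        > "$ssh_dir/ssh_config.d/50-hostbased.conf"
}
```

Possible usage on each node: write_hostbased_config /etc/ssh bc1 bc2 bc3 bc4 bc5 bc6 bc7, then put every node's host key into /etc/ssh/ssh_known_hosts (for example with ssh-keyscan -t ed25519,ecdsa,rsa bc1 bc2 bc3 bc4 bc5 bc6 bc7, using the same names as in shosts.equiv) and restart the daemon with systemctl restart ssh.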