It looks like your job ran, and when it finished, PBS MoM tried to stage out the files (output and error files). However, the error messages indicate that passwordless scp did not work: it tried various authentication methods but eventually gave up.
The IFL request that failed is 54 (libpbs.h: #define PBS_BATCH_CopyFiles 54). This means that staging out the stdout, stderr, or similar files failed. Your /etc/pbs.conf mentions /bin/scp, so that is the tool that was used for the copy, and you may need to check the scp configuration.
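To confirm which copy tool MoM is configured to use, it can help to read the setting straight out of the configuration file. This is a minimal sketch, assuming the path is stored under the PBS_SCP key (check your own file, the key name is an assumption here); pbs_scp_path is just an illustrative helper name.

```shell
# Print the copy tool configured in a pbs.conf-style file (KEY=value lines).
# Assumption: the tool is stored under the PBS_SCP key.
# Usage: pbs_scp_path [conf_file]   (defaults to /etc/pbs.conf)
pbs_scp_path() {
    conf="${1:-/etc/pbs.conf}"
    # Keep only the value of the last PBS_SCP= line, if any.
    sed -n 's/^PBS_SCP=//p' "$conf" | tail -n 1
}
```

Running pbs_scp_path on an execution host and checking that the printed binary exists and is executable rules out a wrong path before you start looking at SSH keys.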
Hmm, I don’t get it. Communication between xyz@pbs_head (which submits the job) and the xyz users on all the other nodes works without any problem. Between which users does the copy take place: between the regular users, or between the root accounts? I don’t know for which users I should set up the passwordless connection: xyz@nodes or root@nodes?
Stageout copies the files as the user (i.e. the “euser” attribute of the job), to the host specified in the Output_Path/Error_Path attribute of the job.
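One way to approximate what MoM does is to copy a scratch file to the host named in Output_Path with password prompts disabled, running as the job's user on the execution host. This is a rough sketch: the helper names and the example value pbs_head:/home/xyz/job.o1 are hypothetical, and BatchMode=yes is the standard OpenSSH option that makes scp fail instead of prompting.

```shell
# Split an Output_Path value of the form host:/path (as shown by qstat -f).
output_host() { printf '%s\n' "${1%%:*}"; }
output_file() { printf '%s\n' "${1#*:}"; }

# Copy an empty scratch file to the stageout destination without prompting.
# Run this on the execution host as the job's user (the euser), not as root.
# $1 = Output_Path value, e.g. pbs_head:/home/xyz/job.o1 (hypothetical)
try_stageout() {
    src=$(mktemp) || return 1
    scp -o BatchMode=yes "$src" "$(output_host "$1"):$(output_file "$1").stageout-test"
    status=$?
    rm -f "$src"
    return "$status"
}
```

A non-zero exit status from try_stageout means the copy could not be completed without a prompt. On success it leaves a small .stageout-test file at the destination, which can be deleted afterwards.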
If you submit from a location on your cluster with shared storage, please add $usecp lines in the MoM config files to tell MoM to use “plain” cp instead of scp.
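A $usecp line maps a host and path prefix to a local path, so that MoM copies locally whenever the destination matches. Here is a sketch of adding one, assuming /home is the shared filesystem, that any submission host should match (the * wildcard), and that the MoM config lives at /var/spool/pbs/mom_priv/config (a common default; yours may differ). add_usecp is a hypothetical helper.

```shell
# Append a "$usecp <host>:<source_prefix> <local_prefix>" rule to a MoM
# config file unless that exact line is already present.
# $1 = config file, $2 = host:source_prefix, $3 = local prefix
add_usecp() {
    rule="\$usecp $2 $3"
    grep -qxF -- "$rule" "$1" 2>/dev/null || printf '%s\n' "$rule" >> "$1"
}

# Example (as root on each execution host; paths are assumptions):
#   add_usecp /var/spool/pbs/mom_priv/config '*:/home/' '/home/'
#   pkill -HUP pbs_mom    # ask MoM to reread its config
```

The check for an existing line keeps the helper safe to run repeatedly, for instance from a provisioning script.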
I am not using shared storage (maybe I will try it in the future; I would like to start with something a bit easier).
What is quite interesting is that the same command that fails during authorisation when PBS MoM runs it:
works without any problems when I run it myself later:
The error is: key_parse_private2: missing begin marker
Have you experienced such problems? I know it may be a problem with access permissions, but I don’t see any good reason why that would be the case. As always, thank you very much in advance for your help.
Hi guys,
I have a similar situation regarding HPC user accounts. I found that if I don’t add $usecp lines to the MoM config while sharing the /data folder through NFS, the job fails. Do we need to create individual user accounts and configure passwordless SSH access between the head node and the compute nodes for every HPC user? Thanks.
The short answer is yes: you need to create accounts and set up password-free ssh access between the submission machines and the execution hosts. Please refer to section 12.8 of the PBS Pro Administrator’s Guide located here: http://www.pbsworks.com/SupportGT.aspx?d=PBS-Professional,-Documentation
The first thing you want to check is the qmgr command you used. It should be “create node” rather than “create nodes”. The second thing is to determine whether the host “pbs-slave” resolves on your system. Try using the command “host pbs-slave” to see if the lookup is happening correctly. If not, you’ll need to address the hostname resolution issue on your network.
The command “host pbs-slave” returned the error below:
[linux@pbs-master ~]$ host pbs-slave
Host pbs-slave not found: 3(NXDOMAIN)
I am using virtual machines created by OpenStack. I asked my colleague, and the lookup fails because no DNS server is available in our test environment. However, I have configured /etc/hosts, so ping works.
I’m assuming you also have a pbs-master host? Entries for both must exist in the /etc/hosts file and you must be able to ping each host from the other. At that point, you should be able to add the pbs-slave node to your complex.
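Note that host queries DNS directly and does not read /etc/hosts, so an NXDOMAIN from host does not by itself mean the name is unusable. By contrast, getent hosts pbs-slave goes through the system resolver, which normally includes /etc/hosts. To check the file itself on both machines, here is a small sketch (hosts_lookup is a hypothetical helper, and the addresses in the comment are made up):

```shell
# Print the address that a hosts-format file maps a name to (first match).
# $1 = hostname, $2 = file (defaults to /etc/hosts)
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                  # drop comments
        { for (i = 2; i <= NF; i++)         # field 1 is the address
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Example /etc/hosts entries (addresses are made up):
#   192.168.0.10  pbs-master
#   192.168.0.11  pbs-slave
```

Once hosts_lookup pbs-master and hosts_lookup pbs-slave each print an address on both machines, and ping works in both directions, the node can be added on the server with qmgr -c "create node pbs-slave".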
We have an HPC system, and users have already been created on the head node and synced to the login node. Do we need to sync them to the compute nodes as well before setting up passwordless SSH?