It looks like your job started running, and when it finished, PBS MoM tried to stage out the files (output and error files). However, the error messages indicate that passwordless scp did not work: it tried various authentication methods and eventually gave up.
The IFL request that failed is 54 (libpbs.h: #define PBS_BATCH_CopyFiles 54). This means that the stageout of the stdout, stderr, or similar files failed. Since your /etc/pbs.conf specifies /bin/scp, that is the tool that was used for the copy, so you may need to check the scp configuration.
Hmm, I don’t get it. Communication between xyz@pbs_head (which submits the job) and the xyz users on all the other nodes works without any problem. Which users does the communication happen between: the regular users or root? I don’t know which users I should set up the passwordless connection for: xyz@nodes or root@nodes?
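To confirm which copy tool MoM is configured to use, you can read it back from pbs.conf. This is a minimal sketch that assumes the tool is set through the PBS_SCP entry; the helper name pbs_scp_path is made up for illustration.

```shell
# Hypothetical helper: print the copy tool configured via PBS_SCP
# in a pbs.conf-style file (defaults to /etc/pbs.conf).
pbs_scp_path() {
    sed -n 's/^PBS_SCP=//p' "${1:-/etc/pbs.conf}"
}

# Example (not run here): confirm the tool exists and is executable.
#   scp_bin=$(pbs_scp_path) && [ -x "$scp_bin" ] && echo "MoM copies with $scp_bin"
```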
Stageout copies the files as the user (i.e. the “euser” attribute of the job), to the host specified in the Output_Path/Error_Path attribute of the job.
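To see which user and destination host apply to a particular job, you can filter the full job status. A small sketch follows; the filter name stageout_info and the job ID are placeholders, not anything from your setup.

```shell
# Hypothetical filter: keep only the attributes that determine who
# copies the files (euser) and where they go (Output_Path/Error_Path).
stageout_info() {
    grep -E '^[[:space:]]*(euser|Output_Path|Error_Path) = '
}

# Example (not run here; 123.pbs_head is a placeholder job ID):
#   qstat -f 123.pbs_head | stageout_info
```

The host part of Output_Path is the machine that the job's user must be able to reach without a password from the execution host.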
If you submit from a location on your cluster with shared storage, please add $usecp lines in the MoM config files to tell MoM to use “plain” cp instead of scp.
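For reference, a $usecp line maps a host and path prefix to a local path prefix. The sketch below only prints such a line; the host pattern *.example.com and the /home prefix are assumptions, so substitute your own shared file system.

```shell
# Hypothetical helper: print a $usecp directive for MoM's config file
# ($PBS_HOME/mom_priv/config). Arguments: host pattern, source prefix,
# destination prefix.
usecp_line() {
    printf '$usecp %s:%s %s\n' "$1" "$2" "$3"
}

# Assumed example: /home is shared between submission and execution hosts.
usecp_line '*.example.com' /home /home
# prints: $usecp *.example.com:/home /home
```

After adding the line to the config file on each execution host, MoM has to re-read its configuration (for example by restarting it) before the change takes effect.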
I am not using shared storage (maybe I will try it in the future; I would like to start with something a bit easier).
What is quite interesting is that the same command that fails during authorisation when run by PBS MoM:
works without any problems when I run it later myself:
The error is: key_parse_private2: missing begin marker
Have you experienced such problems? I know it may be a problem with access permissions, but I don’t see any good reason why that would be the case. As always, thank you very much in advance for your help.
I have a similar situation regarding HPC user accounts. I found that if I don’t add $usecp lines to the MoM config while sharing the /data folder through NFS, the job fails. Do we need to create individual user accounts and configure passwordless ssh access between the head node and the compute nodes for every HPC user? Thanks
The short answer is yes: you need to create accounts and set up password-free ssh access between the submission machines and the execution hosts. Please refer to section 12.8 of the PBS Pro Administrator’s Guide located here: http://www.pbsworks.com/SupportGT.aspx?d=PBS-Professional,-Documentation
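As a rough sketch of the per-user setup: sshd typically ignores keys when the .ssh directory or authorized_keys file is too permissive, so it helps to set the permissions explicitly. The helper name secure_ssh_dir, the key type, and the host name node01 are assumptions for illustration.

```shell
# Hypothetical helper: create an SSH directory and authorized_keys file
# with the restrictive permissions sshd usually expects.
secure_ssh_dir() {
    dir=${1:-$HOME/.ssh}
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$dir/authorized_keys" && chmod 600 "$dir/authorized_keys"
}

# Typical use as the job owner (not run here; node01 is a placeholder):
#   secure_ssh_dir
#   ssh-keygen -t ed25519 -N '' -f "$HOME/.ssh/id_ed25519"
#   ssh-copy-id xyz@node01
```

Because stageout copies from the execution host back to the submission host, the key exchange needs to cover that direction as well.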
And does anyone have any idea why the authorisation may not work in my case as described in my previous post?
Thank you for your help
You need to check whether password-less ssh access is working between the two hosts in question for that particular user who submitted the job.
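One way to run that check non-interactively is ssh's BatchMode option, which makes the client fail instead of prompting for a password. This is only a sketch: the function name and the SSH_CMD override are assumptions added for illustration.

```shell
# Hypothetical helper: succeed only if login works without any prompt.
# SSH_CMD is an illustrative override; it defaults to the ssh client.
check_passwordless() {
    user=$1
    host=$2
    "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true
}

# Example (not run here), as the job owner on the execution host:
#   check_passwordless xyz pbs_head && echo "passwordless ssh OK" \
#       || echo "stageout via scp will likely fail"
```

Run it as the user who submitted the job, from the execution host towards the host named in the job's output path.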
I ran into the same issue, but when I tried to add a node as root, it returned an error like this:
[root@pbs-master linux]# qmgr -c "create nodes pbs-slave"
qmgr: Error (15066) returned from server
I am trying to install PBS in a virtual environment; the OS is CentOS 7.2. Please help me.
The first thing you want to check is the qmgr command you used. It should be “create node” rather than “create nodes”. The second thing is to determine whether the host “pbs-slave” resolves on your system. Try using the command “host pbs-slave” to see if the lookup is happening correctly. If not, you’ll need to address the hostname resolution issue on your network.
The command “host pbs-slave” returned the error below:
[linux@pbs-master ~]$ host pbs-slave
Host pbs-slave not found: 3(NXDOMAIN)
I am using a virtual machine created by OpenStack. I asked my colleague, who said this is because the DNS server in our testing environment is not available. However, I have configured /etc/hosts, so ping works.
So, do you have any ideas?
I’m assuming you also have a pbs-master host? Entries for both must exist in the /etc/hosts file and you must be able to ping each host from the other. At that point, you should be able to add the pbs-slave node to your complex.
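Note that the host command queries DNS directly and does not read /etc/hosts, so NXDOMAIN is expected when only /etc/hosts is configured. The sketch below checks a hosts-format file instead; the function name in_hosts_file is hypothetical.

```shell
# Hypothetical helper: succeed if a name appears as a host name or
# alias in a hosts-format file (defaults to /etc/hosts).
in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                                    # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # skip the address field
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example (not run here), on each host:
#   in_hosts_file pbs-master && in_hosts_file pbs-slave
#   getent hosts pbs-slave              # resolves the same way most programs do
#   qmgr -c "create node pbs-slave"     # once both names resolve
```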
I would like to know: if I install PBS Pro as root, is there any way to submit jobs as root?
To submit jobs as root, please run the below command:
qmgr -c "set server acl_roots=root"
By default, job submission by the root user is disabled.
qmgr -c “set server acl_roots=root”
When you say “server”, do you mean the hostname? I got the error below:
qmgr: cannot connect to server acl_roots=root”
“server” here is a literal keyword, not the hostname of the server, so qmgr -c "set server acl_roots=root" is the correct command.
Please check whether you can run any qmgr commands as the root user.
- Check that your PBS server services are running.
qmgr -c "set server acl_roots+=root"
The “+” symbol was missing; now it works well.
Thank you very much, adarsh.
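A side note for anyone copying commands from this thread: typographic (curly) quotes pasted from a web page are not treated as quotes by the shell. That may explain a "cannot connect to server" error in which the server name ends in a curly quote, although this is an assumption rather than a confirmed diagnosis. The sketch uses a stand-in function, count_args, instead of qmgr itself.

```shell
# Stand-in for qmgr: print how many arguments the shell passes on.
count_args() { echo "$#"; }

# Curly quotes are ordinary characters, so the command is split on spaces.
count_args -c “set server acl_roots=root”    # prints 4

# Straight quotes keep the whole command as a single argument to -c.
count_args -c "set server acl_roots=root"    # prints 2
```

In the first form, the words after the first one would reach qmgr as separate arguments, which it is likely to treat as server names to connect to.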
We have an HPC system, and users have already been created on the head node and synced to the login node. Do we need to sync them to the compute nodes as well before setting up passwordless ssh?
User accounts should exist on both the login nodes and the compute nodes.
For passwordless ssh to work, user accounts must exist on both sides.
Thanks everyone for the help. I finally got PBS to run the jobs. The process I used can be found here