I have a problem accessing my nodes via ssh. The nodes are visible on the network (they reply to ping), and pbsnodes -a shows their state as “free”. When I submit a job, it is queued and its status changes to R, but nothing happens with it.
When I try to log on to the nodes via ssh, either I can’t log in at all, or I log in, see the last-login message, and then can’t do anything: I never get a command prompt.
The headnode is working correctly. When I submitted a test job and then stopped PBS and started it again, pbsnodes -a showed “state-unknown, down” for the node on which the job had been running before the restart. But one day later pbsnodes -a showed the state of all nodes as “free”.
It happened on all my nodes simultaneously. What can I do? Do I need to reinstall all my nodes? How can I diagnose this problem?
If your nodes aren’t functioning properly when you ssh into them, there is an underlying problem unrelated to PBS Pro. Please try checking your login scripts (.bashrc, .cshrc, /etc/bashrc, etc.) to make sure they are not the cause. Once you can reliably log in, we can diagnose any issues with PBS Pro.
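One way to narrow this down is to run a single trivial remote command instead of opening an interactive session. The sketch below is only an illustration: `check_node` and `SSH_BIN` are hypothetical names, and the host names are placeholders. It relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout`. Note that bash may still read ~/.bashrc for a remote command, so a hang here does not fully rule out that file.

```shell
#!/bin/sh
# Hypothetical helper, not part of PBS Pro.
# SSH_BIN can be overridden; it defaults to the ssh client found in PATH.
SSH_BIN="${SSH_BIN:-ssh}"

# check_node NODE: run a trivial remote command with a 10-second connect
# timeout and no password prompts, then report whether it succeeded.
check_node() {
    node="$1"
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=10 "$node" true; then
        echo "$node: ok"
    else
        echo "$node: FAILED"
        return 1
    fi
}

# Example (placeholder host names):
# for n in node01 node02; do check_node "$n"; done
```

`ConnectTimeout` only bounds connection setup. If the command connects and then stalls, or fails on every node at once, the network path or sshd is probably worth checking before the login scripts. Running `ssh -v` against one node shows at which stage the session stops.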
I suspect that sendmail could be overloading the systems on the nodes.
Do the nodes send any mail, or does only the headnode do this?
I set the “-m abe -M user@domain” option in “default_qsub_arguments” via qmgr.
Some mails couldn’t be sent (I think the antispam filter caught them) and then they came back to the sender. But I had also set “email@example.com” via qmgr, so they could not be sent and were deferred.
I cleaned out root’s mailbox on the headnode (~3.5 GB), but the problem on the nodes didn’t go away. I am wondering: do the nodes send any mail or not?
Only the PBS Pro server sends mail, not the MoM nodes. I suggest you verify the configuration of your sendmail client outside of PBS Pro to ensure it’s functioning properly.
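A rough way to test mail delivery outside PBS Pro is to hand a minimal message straight to the sendmail command on the headnode and then look at the mail queue. This is a sketch under assumptions: `send_test_mail` and `SENDMAIL_BIN` are hypothetical names, the address is a placeholder, and `-v` requests verbose output in Sendmail but may behave differently in other MTAs that ship a sendmail-compatible command.

```shell
#!/bin/sh
# Hypothetical helper for checking mail delivery outside PBS Pro.
# SENDMAIL_BIN can be overridden; otherwise "sendmail" must be in PATH
# (on many systems it lives in /usr/sbin or /usr/lib).
SENDMAIL_BIN="${SENDMAIL_BIN:-sendmail}"

# send_test_mail RECIPIENT: pipe a minimal message to sendmail in verbose mode.
send_test_mail() {
    rcpt="$1"
    printf 'To: %s\nSubject: mail test from %s\n\nTest message sent outside PBS Pro.\n' \
        "$rcpt" "$(uname -n)" | "$SENDMAIL_BIN" -v "$rcpt"
}

# Example (placeholder address), followed by a look at the mail queue:
# send_test_mail user@example.com
# mailq
```

If the test message is deferred or bounces here, the same will happen to job mail sent by the PBS Pro server.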
It turned out that this was a network problem with the switch port. Now everything is OK.
I deleted the PBS_MAIL_HOST_NAME attribute from pbs.conf because it is not necessary: when someone wants to get information about their job, they must pass the -m and -M switches to the qsub command and give the user and host as user@host. This matters especially when the PBS users and the mail domain users are different, because a configuration with the PBS_MAIL_HOST_NAME attribute will then fail: mail goes to userPBS@domain.mail and will never be delivered, because no such user exists in the domain. The mail will come back and be sent to the user named in the PBS server’s “mail_from” attribute. If that attribute is wrong, the mail will be sent to the PBS root user’s mailbox and take up a lot of space on the /var partition. So be careful when setting those arguments.
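For anyone who finds this later, here is a minimal sketch of the two things described in this thread: inspecting the mail-related server attributes, and requesting mail per job with an explicit address. The wrapper names and the `QMGR_BIN` / `QSUB_BIN` variables are hypothetical, and the address and script name are placeholders. Only `qmgr -c "print server"` and the `qsub` options `-m` and `-M` come from PBS Pro itself.

```shell
#!/bin/sh
# Hypothetical wrappers; QMGR_BIN and QSUB_BIN default to the PBS Pro commands.
QMGR_BIN="${QMGR_BIN:-qmgr}"
QSUB_BIN="${QSUB_BIN:-qsub}"

# Show only the mail-related server attributes discussed in this thread.
show_mail_settings() {
    "$QMGR_BIN" -c "print server" | grep -E 'mail_from|default_qsub_arguments'
}

# submit_with_mail ADDRESS SCRIPT: request mail on abort, begin, and end
# for one job, instead of relying on server-wide defaults.
submit_with_mail() {
    "$QSUB_BIN" -m abe -M "$1" "$2"
}

# Example (placeholder address and script):
# show_mail_settings
# submit_with_mail user@host job.sh
```

A server-wide default that turns out to be a mistake can be removed with `qmgr -c "unset server default_qsub_arguments"`.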