I need to up the ulimit for open files from 1024 to maybe 10000.
I have added that limit for a user in /etc/security/limits.conf on all nodes.
I have added “session required pam_limits.so” to /etc/pam.d/login
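To double-check the PAM side on each node, a small sketch like the one below can be used. The function name `has_pam_limits` is hypothetical. It only looks at the named file itself and does not follow `include`/`substack` lines, so a pam_limits entry pulled in from another stack (e.g. a shared system-auth file) would be missed.

```shell
# Hypothetical helper: succeed if a PAM file contains an active
# "session required pam_limits.so" line (commented-out lines are ignored).
has_pam_limits() {
  grep -Eq '^[[:space:]]*session[[:space:]]+required[[:space:]]+pam_limits\.so' "$1" 2>/dev/null
}

# Example usage on a node (path from the post above):
if has_pam_limits /etc/pam.d/login; then
  echo "pam_limits is enabled in /etc/pam.d/login"
else
  echo "pam_limits not found in /etc/pam.d/login"
fi
```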
With a normal login the larger limit now works, but not for PBS interactive logins or PBS jobs.
I have seen this post where they needed to raise the stack size.
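A plausible explanation for the difference: pam_limits applies limits.conf when a PAM session is opened (console login, ssh, su), whereas PBS jobs are started by pbs_mom and presumably just inherit pbs_mom's own limits. On Linux, the limit a running pbs_mom actually has can be read from /proc. The helper name `proc_nofile` is hypothetical and the sketch is Linux-specific.

```shell
# Print the soft "Max open files" limit of a running process (Linux /proc).
# Fields on that line of /proc/PID/limits: "Max" "open" "files" <soft> <hard> <units>.
proc_nofile() {
  awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Example: check the oldest pbs_mom process on this node, if there is one.
mom=$(pgrep -o pbs_mom 2>/dev/null) && echo "pbs_mom soft nofile: $(proc_nofile "$mom")"
```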
They needed to edit PBS_EXEC/lib/init.d/limits.pbs_mom on each mom.
I have therefore edited /usr/pbs/lib/init.d/limits.pbs_mom (on one not busy node) and added at the end of the file:
if [ -f /etc/sgi-release -o -f /etc/sgi-compute-node-release ] ; then
    MEMLOCKLIM=`ulimit -l`
    NOFILESLIM=`ulimit -n`
    STACKLIM=`ulimit -s`
    ulimit -l unlimited
    ulimit -n 16384
    ulimit -s unlimited
fi
NOFILESLIM=10000
Then I ran: /etc/init.d/pbs restart
Using a PBS interactive job to log in to that node, I still get the old limit:
mynode~$ ulimit -Sn
1024
If I log in directly, without going through PBS, I get the larger limit set in limits.conf, as expected.
So there is something else I need to do to get PBS to set a ulimit. (Note: I have not changed limits.pbs_mom on the head node.)
I’m using PBSPro 14.2. There is no mention of ulimit or NOFILESLIM in the Admin or Reference Guides.
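Besides an interactive session, a short batch job can confirm what limit jobs themselves receive on a node. This is a sketch: the job name, resource requests and the file name `nofile_check.sh` are assumptions, and the `#PBS` lines are ordinary comments to the shell, so the script also runs outside PBS.

```shell
#!/bin/sh
#PBS -N nofile_check
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:01:00
# Hypothetical job script: report the open-files limits this job was given.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "$(uname -n): soft nofile=$soft, hard nofile=$hard"
```

Submit it with `qsub nofile_check.sh`. To pin it to the edited node, a selection such as `-l select=1:host=mynode` should work in PBS Pro, though resource names can differ per site.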
I added those 4 ulimit commands to limits.pbs_mom on just one non-busy node, then ran /etc/init.d/pbs restart. Now an interactive PBS job does show the higher ulimit.
So I should not have used “NOFILESLIM=10000” (it looked to me like NOFILESLIM was a PBS environment variable); instead I should put the actual ulimit commands in limits.pbs_mom.
Do I need to do an “init.d/pbs restart”, or would a HUP of the mom process be sufficient? i.e.
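For reference, a sketch of what could be appended to limits.pbs_mom instead of `NOFILESLIM=10000`. An assignment like that only sets a shell variable; in the SGI block, NOFILESLIM appears merely to save the previous value. The sketch assumes, consistent with the result above, that the PBS init script sources limits.pbs_mom in the shell that then starts pbs_mom, so pbs_mom and its jobs inherit the limit. The variable name `PBS_NOFILE_TARGET` is hypothetical, and the unprivileged fallback is an addition only so the snippet can be tried safely as an ordinary user.

```shell
# Sketch of lines to append to PBS_EXEC/lib/init.d/limits.pbs_mom
# (assumed to be sourced by the init script before pbs_mom starts).
PBS_NOFILE_TARGET=10000

# As root, "ulimit -n" sets both the soft and hard limits. If that fails
# (e.g. when testing as an ordinary user whose hard limit is lower),
# fall back to raising the soft limit as far as the hard limit allows.
if ! ulimit -n "$PBS_NOFILE_TARGET" 2>/dev/null; then
  ulimit -Sn "$(ulimit -Hn)"
fi
```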
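The underlying Unix behaviour bears on the restart-versus-HUP question: a process keeps the resource limits it was created with, and changing its parent's limits later does not reach it. Since limits.pbs_mom only changes the limits of the shell that starts pbs_mom, a HUP to an already-running mom would presumably not pick up a new limit, while a restart would. A minimal Linux-only demonstration, assuming a numeric nofile limit (as on Linux):

```shell
# A child keeps the limits it had when it was started.
before=$(ulimit -Sn)
sleep 30 &
child=$!

# Lower this shell's soft limit after the child already exists
# (lowering a soft limit never needs root).
ulimit -Sn $((before / 2))

child_soft=$(awk '/^Max open files/ { print $4 }' "/proc/$child/limits")
kill "$child"
echo "shell now: $(ulimit -Sn); child still: $child_soft (started with $before)"
```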
I’ve been having similar issues; my solution is the following:
In /etc/systemd/system.conf append “DefaultTasksMax=65536”
Set the hard nofile limit in /etc/security/limits.conf: “* hard nofile 131072”
Thanks for this help. I have rolled out those changes to several nodes that had no jobs on them, installing the new limits.pbs_mom and doing a full restart of PBS mom. From my testing, it looks like that has worked.
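Rolling these two edits out across many nodes is easier to do idempotently. Below is a sketch with a hypothetical `ensure_line` helper that appends a line only if it is not already present; the real calls are commented out because they modify system files and need root. As far as I know, changes to system.conf only take effect after `systemctl daemon-reexec` or a reboot, and limits.conf changes only apply to new PAM sessions.

```shell
# Hypothetical helper: append LINE to FILE only if an identical line is not
# already present, so the change can be re-run safely on every node.
# (Assumes FILE ends with a newline; otherwise the new text joins the last line.)
ensure_line() {
  file=$1
  line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# The two changes from this reply (run as root on each node):
# ensure_line /etc/systemd/system.conf 'DefaultTasksMax=65536'
# ensure_line /etc/security/limits.conf '* hard nofile 131072'
```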
Hi Vincent
Check if you already have a file /opt/pbs/lib/init.d/limits.pbs_mom on your execution nodes. If so make a backup of it.
Create your new version and copy it into place.
On each execution node, restart PBS: /etc/init.d/pbs restart
You can do this without affecting running jobs.
It is still best to do this on a test node first, though.
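The per-node steps in this reply can be scripted. The sketch below is a dry run: it only prints the commands so they can be reviewed first. The node names, the local file `./limits.pbs_mom` and the `.bak` backup suffix are all assumptions; the path is the /opt/pbs one from this reply (PBS_EXEC may be /usr/pbs on other installs).

```shell
# Hypothetical dry-run helper: print, rather than run, the commands to back up,
# replace and activate limits.pbs_mom on one execution node.
LIMITS=/opt/pbs/lib/init.d/limits.pbs_mom
plan_node() {
  node=$1
  echo "ssh $node 'if [ -f $LIMITS ]; then cp -p $LIMITS $LIMITS.bak; fi'"
  echo "scp ./limits.pbs_mom $node:$LIMITS"
  echo "ssh $node /etc/init.d/pbs restart"
}

plan=$(for n in node01 node02; do plan_node "$n"; done)
printf '%s\n' "$plan"
```

After checking the output for a single test node, the printed lines could be run with `sh`.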
Mike