Dear all,
I have run into a strange problem with PBS Pro (though it may not be PBS Pro's fault).
I have the following job script (simplified):
#PBS -l select=4:ncpus=1
#PBS -l place=scatter
cd "$PBS_O_WORKDIR"
echo "ulimit -a"
ulimit -a
echo "hostname:"
hostname
echo -e "you are acquiring the resoures:\n$(cat "$PBS_NODEFILE")"
source [intel2018 script]
mpiexec [my app]
Here is my output file:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514557
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 16384
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 514557
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
hostname:
cn001
you are acquiring the resoures:
cn001
cn002
cn003
cn006
And here is the error output file:
[1] DAPL startup: RLIMIT_MEMLOCK too small
[3] DAPL startup: RLIMIT_MEMLOCK too small
[2] DAPL startup: RLIMIT_MEMLOCK too small
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805)…: fail failed
MPID_Init(1859)…: channel initialization failed
MPIDI_CH3_Init(147)…: fail failed
dapl_rc_setup_all_connections_20(1394): generic failure with errno = 872598799
getConnInfoKVS(956)…: PMI_KVS_Get failed
I am aware of the stack size issue, but the output file shows that locked memory is already unlimited:
max locked memory (kbytes, -l) unlimited
The problem only occurs when the job spans multiple hosts. Placing multiple chunks on the same host does not trigger it.
Has anyone run into the same thing?
Any comments would be greatly appreciated.
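The `ulimit -a` output above comes only from the shell running the job script on cn001, whereas the RLIMIT_MEMLOCK errors are reported by ranks [1], [2] and [3], which with place=scatter presumably run on the other hosts. A minimal sketch for printing the limit as seen on every host follows. It assumes bash on every node and an mpiexec (e.g. Intel MPI's) that picks up the PBS host list and can launch a plain command such as bash. The function name `memlock_line` is hypothetical.

```shell
#!/bin/bash
# Sketch: print the max-locked-memory limit (ulimit -l, in kbytes or
# "unlimited") seen on each host of the job.
# Assumptions: bash on every node; mpiexec reads the PBS allocation and
# can start a non-MPI command. memlock_line is a hypothetical helper name.

# Print "<host>: ulimit -l = <value>" for the current process.
memlock_line() {
    echo "$(hostname): ulimit -l = $(ulimit -l)"
}

if [[ -n "${PBS_NODEFILE:-}" ]] && command -v mpiexec >/dev/null 2>&1; then
    # Inside a PBS job: start one bash per nodefile entry, launched through
    # mpiexec much like the MPI ranks are, so remote limits are reported.
    nprocs=$(wc -l < "$PBS_NODEFILE")
    mpiexec -n "$nprocs" bash -c 'echo "$(hostname): ulimit -l = $(ulimit -l)"'
else
    # Outside a job (or without mpiexec): only the local limit is shown.
    memlock_line
fi
```

Placed in the job script right after sourcing the Intel environment, a line other than "unlimited" for cn002, cn003 or cn006 would suggest the remote processes get a different limit from the one the job shell reports.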
Thanks,
Chris
P.S.
I have also used TORQUE as my PBS, and the same job (sourcing the same Intel compiler script and running the same application) runs successfully under TORQUE.
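Because processes started on a remote host generally inherit resource limits from whatever launches them there (presumably the pbs_mom daemon on each node, for both PBS Pro and TORQUE), one thing worth comparing between the two setups is that daemon's own locked-memory limit. A minimal Linux-only sketch, assuming the daemon's process name is `pbs_mom` and that `pgrep` is available; `show_memlock` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch (Linux only): print the "Max locked memory" soft/hard limits of a
# process from /proc/<pid>/limits.
# Assumptions: the MOM daemon's process name is "pbs_mom"; show_memlock is
# a hypothetical helper name.

# Usage: show_memlock <pid>
show_memlock() {
    local pid=$1
    if [[ ! -r "/proc/$pid/limits" ]]; then
        echo "cannot read /proc/$pid/limits" >&2
        return 1
    fi
    # Keep the header line (column names) and the locked-memory row.
    grep -E '^(Limit|Max locked memory)' "/proc/$pid/limits"
}

# Inspect the oldest pbs_mom process if one is found, else this shell.
target=$(pgrep -o pbs_mom 2>/dev/null || true)
show_memlock "${target:-$$}"
```

Running it on each compute node (outside the job) shows whether the daemon itself was started with a small memlock limit.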