I am a system administrator, and one of my users wants to submit a job that makes the most effective use of the machine. He is running on a single node with 64 cores:
```
ramos@maury:~$ pbsnodes compute-0-3
compute-0-3
Mom = compute-0-3.local
ntype = PBS
state = free
pcpus = 64
resources_available.arch = linux
resources_available.host = compute-0-3
resources_available.mem = 529419740kb
resources_available.ncpus = 64
resources_available.vnode = compute-0-3
resources_assigned.accelerator_memory = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
ramos@maury:~$
```
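To check how many cores are still free on the node, the `key = value` lines from `pbsnodes` can be parsed and `resources_assigned.ncpus` subtracted from `resources_available.ncpus`. This is a minimal sketch that assumes the plain-text layout shown above; `parse_pbsnodes` and `free_cpus` are hypothetical helper names, not part of PBS:

```python
def parse_pbsnodes(text: str) -> dict[str, str]:
    """Parse 'key = value' lines from `pbsnodes <node>` output into a dict.

    Lines without ' = ' (the prompt, the bare node name) are skipped.
    """
    attrs: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep:  # keep only lines that actually contain ' = '
            attrs[key] = value
    return attrs


def free_cpus(attrs: dict[str, str]) -> int:
    """Cores available on the node minus cores already assigned to jobs."""
    available = int(attrs["resources_available.ncpus"])
    assigned = int(attrs.get("resources_assigned.ncpus", "0"))
    return available - assigned


# Excerpt of the output above.
sample = """compute-0-3
Mom = compute-0-3.local
pcpus = 64
resources_available.ncpus = 64
resources_assigned.ncpus = 0
"""
attrs = parse_pbsnodes(sample)
print(free_cpus(attrs))  # 64
```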
The user has been running jobs with the parameters below. Are they appropriate?
| Control | Location | Notes |
|---|---|---|
| `OMP_NUM_THREADS` | Environment | The executable reads this value at runtime to set the number of OpenMP threads |
| `np` | `mpirun` | `mpirun` uses this value to set the number of MPI processes (ranks); for NAAPS, it must equal the number of species in the namelist |
| `select` | `qsub` | What does this do? |
| `ncpus` | `qsub` | What does this do? |
| `mpiprocs` | `qsub` | What does this do? |
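In PBS Pro, the number at the start of a `select` value is a count of *chunks*, and resources such as `ncpus` and `mpiprocs` that follow it apply to each chunk, so the job-wide totals are the per-chunk values multiplied by the chunk count. The sketch below does that arithmetic; the helper names are hypothetical, it only handles a single-part spec with a leading count, and the default of one MPI process per chunk when `mpiprocs` is omitted is an assumption based on PBS Pro's documented behavior:

```python
def parse_select(spec: str) -> tuple[int, dict[str, str]]:
    """Split a single-part select value such as '15:ncpus=4:host=compute-0-3'
    into (chunk_count, per-chunk resources). Expects a leading chunk count;
    multi-part specs joined with '+' are not handled."""
    count, *resources = spec.split(":")
    return int(count), dict(item.split("=", 1) for item in resources)


def select_totals(spec: str) -> dict[str, int]:
    """Multiply the per-chunk ncpus and mpiprocs by the number of chunks."""
    chunks, res = parse_select(spec)
    ncpus = int(res["ncpus"])
    # Assumption: if mpiprocs is omitted, PBS Pro places one MPI process
    # in each chunk that has ncpus > 0.
    mpiprocs = int(res.get("mpiprocs", "1" if ncpus > 0 else "0"))
    return {"chunks": chunks, "ncpus": chunks * ncpus, "mpiprocs": chunks * mpiprocs}


print(select_totals("15:ncpus=4:host=compute-0-3"))
# {'chunks': 15, 'ncpus': 60, 'mpiprocs': 15}
print(select_totals("1:ncpus=64:mpiprocs=16"))  # hypothetical one-chunk layout
# {'chunks': 1, 'ncpus': 64, 'mpiprocs': 16}
```

The 60 cores computed for the first spec match `resources_used.ncpus = 60` in the `qstat` output for this job.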
As an example, I have been running a job like this:
- `qsub` options: `select=15:ncpus=4:host=compute-0-3`
- `mpirun` options: `-genv OMP_NUM_THREADS 16 -np 15`
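These two lines can be compared with simple arithmetic. Assuming every MPI rank opens its OpenMP regions with the default team size set by `OMP_NUM_THREADS`, the job runs ranks × threads-per-rank threads on the cores PBS reserved. A minimal sketch with this job's numbers (`oversubscription` is a hypothetical helper):

```python
def oversubscription(np_ranks: int, omp_threads: int, cores: int) -> float:
    """Ratio of runnable threads (MPI ranks x OpenMP threads per rank) to cores.

    A value above 1.0 means threads have to time-share cores.
    """
    return (np_ranks * omp_threads) / cores


# mpirun -np 15 with OMP_NUM_THREADS=16, against the 60 cores
# requested by select=15:ncpus=4 and the 64 cores on the node.
print(15 * 16)                       # 240 threads in total
print(oversubscription(15, 16, 60))  # 4.0
print(oversubscription(15, 16, 64))  # 3.75
```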
The `qstat` output for this job shows:
```
resources_used.cpupercent = 2907
resources_used.cput = 266:02:54
resources_used.mem = 198040192kb
resources_used.ncpus = 60
resources_used.vmem = 291175884kb   # This is why the job has to run on MAURY
resources_used.walltime = 10:15:55
```
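Dividing the accumulated CPU time (`cput`) by the elapsed `walltime` gives the average number of cores the job actually kept busy, which can then be compared with the 60 it requested. A minimal sketch; the helper names are hypothetical and the `HHH:MM:SS` format is taken from the output above:

```python
def hms_to_seconds(hms: str) -> int:
    """Convert a PBS 'HHH:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_busy_cores(cput: str, walltime: str) -> float:
    """Total CPU time divided by elapsed time = mean number of busy cores."""
    return hms_to_seconds(cput) / hms_to_seconds(walltime)


busy = average_busy_cores("266:02:54", "10:15:55")
print(f"{busy:.1f} cores busy on average")           # 25.9 cores busy on average
print(f"{busy / 60:.0%} of the 60 requested cores")  # 43% of the 60 requested cores
# cpupercent is expressed in percent of a single CPU,
# so 2907 corresponds to roughly 29 cores' worth of CPU.
```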