I run a lot of numerical simulations and usually use a managed cluster; however, I’m looking to have my own ‘server’ at home to run small jobs. Each simulation requires a few cores, so something like the TR 3990X from AMD (64 cores) seems to fit the bill (I would go with a cluster, but space is a limiting factor).
My question is: can something like PBS be used on a single node/CPU to schedule jobs onto individual cores, so that I could use cores 1-10 for job 1, 11-20 for job 2, etc., with each core being 100% dedicated to its task? I assume at least one core would need to be reserved for the actual scheduling?
You can set resources_available.ncpus=63 on your all-in-one PBS Pro complex (server/sched/comm/mom).
You can then submit jobs as below:
qsub -l select=1:ncpus=10:mem=10gb:mpiprocs=10 -- /application/executable parameters
qsub -l select=1:ncpus=10:mem=10gb:mpiprocs=10 -- /application/executable parameters
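If you submit many such jobs, it may be convenient to generate the command lines from a script. The following is only a sketch: build_qsub_command is a hypothetical helper name, it uses only the qsub options shown above, and it prints the commands rather than running them.

```python
import shlex


def build_qsub_command(ncpus, mem_gb, executable, args=()):
    """Return the qsub argument list for one single-chunk job.

    Hypothetical helper: it only assembles the options used in this thread
    (-l select=... and the "--" separator before the executable).
    """
    select = f"select=1:ncpus={ncpus}:mem={mem_gb}gb:mpiprocs={ncpus}"
    return ["qsub", "-l", select, "--", executable, *args]


if __name__ == "__main__":
    # Dry run: print one command line per job instead of submitting it.
    for _ in range(2):
        cmd = build_qsub_command(10, 10, "/application/executable", ["parameters"])
        print(shlex.join(cmd))
```

To actually submit, the list could be passed to subprocess.run on a machine where qsub is installed.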
Note:
This assumes your application / MPI is intelligent enough to map or pin its processes to the cores.
PBS Pro schedules the jobs onto the compute nodes and runs the application; it is the running application that needs to utilise the cores at 100%. PBS Pro cannot make the application use 100% of each core.
Otherwise, 3 + cgroups would do the job, and cgroups would also take care of OOM (out-of-memory) conditions efficiently.
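To illustrate what "the application pins the cores" can look like, here is a minimal Linux-only sketch using Python's os.sched_getaffinity and os.sched_setaffinity. The function name pin_to_cores is hypothetical, and this is not something PBS Pro does for you; MPI launchers usually have their own binding options instead.

```python
import os


def pin_to_cores(requested):
    """Restrict the calling process to the requested cores it is allowed to use.

    Linux only. If a cpuset (for example from cgroups) already limits the
    process, only cores inside that limit can be selected.
    """
    allowed = os.sched_getaffinity(0)  # cores this process may currently run on
    usable = set(requested) & allowed
    if not usable:
        raise ValueError(f"none of {sorted(requested)} is in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, usable)  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)
```

A job that should stay on the first ten cores could call pin_to_cores(range(0, 10)) at start-up; child processes started afterwards inherit the same affinity.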
Some things to take care of:
Reserve some cores and memory for the operating system and its services (including the PBS Pro server/scheduler):
qmgr -c "set node nodename resources_available.ncpus=60" (if the node has 64 cores)
qmgr -c "set node nodename resources_available.mem=60gb" (if the node has 64gb of memory)
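As a quick check of what these reservations mean for 10-core jobs, the arithmetic can be sketched as follows. The function name job_capacity is made up for illustration; the numbers are the ones used in this thread.

```python
def job_capacity(total_cores, reserved_cores, cores_per_job):
    """Return (schedulable cores, concurrent jobs, cores left idle)."""
    available = total_cores - reserved_cores
    jobs, idle = divmod(available, cores_per_job)
    return available, jobs, idle


print(job_capacity(64, 4, 10))  # ncpus=60 -> (60, 6, 0)
print(job_capacity(64, 1, 10))  # ncpus=63 -> (63, 6, 3)
```

Either setting allows six 10-core jobs at once; with ncpus=63 the three leftover cores are only usable by smaller jobs.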
Many thanks for the reply; I really appreciate you taking the time to answer my questions. Would using a GUI cause any issues with this, or can you tell a GUI in Linux to run on only a few specific cores?
There would not be any issues using the GUI, provided the GUI uses a single core.
You can submit an interactive job with qsub -I -X -l select=1:ncpus=1:mem=10gb and, once you get a console, launch the GUI from there. If you want more control, then cgroups would help for sure.
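Outside of PBS, a program can also be started with its CPU affinity limited to specific cores. The sketch below is Linux-only and is an assumption about how you might do it, not part of PBS Pro: launch_on_cores is a hypothetical helper built on subprocess.Popen and os.sched_setaffinity, and the GUI command and core numbers in the comment are placeholders.

```python
import os
import subprocess
import sys


def launch_on_cores(command, cores, **popen_kwargs):
    """Start `command` with its CPU affinity limited to `cores` (Linux only)."""
    usable = set(cores) & os.sched_getaffinity(0)
    if not usable:
        raise ValueError("none of the requested cores is available to this process")

    def _pin():
        # Runs in the child between fork and exec, so only the child is pinned.
        os.sched_setaffinity(0, usable)

    return subprocess.Popen(command, preexec_fn=_pin, **popen_kwargs)


# Hypothetical usage: keep a GUI program on the four cores not given to PBS.
# launch_on_cores(["some-gui-program"], range(60, 64))
```

This only restricts where the launched program runs; it does not stop other processes from using the same cores, which is where cgroups give more control.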