Hello everyone.
I’m about a year and a half into an HPC role administering a 35-node Ubuntu cluster: 1 head node, 1 viz node, 1 admin node, and 32 compute nodes. My previous background was in developing machine learning and deep learning models and in general data science. The cluster currently uses a proprietary scheduler and extensive scripting that creates chroot jails and an overlayfs populated with the contents of squashfs containers on the compute nodes.
I have a completely new set of hardware to set up soon (yay!), on which we will be using a PBS and Singularity workflow. After upgrading to a recent version of Singularity, I have been able to run the squashfs containers successfully in Singularity. Now I’m trying to understand how to run PBS commands in the containers.
Do I need to use PBS hooks to launch the Singularity containers? Or create a bootstrapped script? Any advice or conversation is welcome.
A bit more detail:
I need to use some additional features. Specifically:
- Array jobs
- Bind directories/mounts (Singularity feature)
- Persistent overlay (Singularity feature)
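
To make the question concrete, here is a rough sketch of the kind of job script I think I'm after, generated from Python since that is what I know best. It is only an illustration under several assumptions: the `#PBS -J` directive and `$PBS_ARRAY_INDEX` are the PBS Professional/OpenPBS array syntax (other PBS variants differ), every path and name below is hypothetical, and the overlay images are assumed to already exist. As far as I understand, only one container can write to a given overlay image at a time, so the sketch uses one overlay per subjob.

```python
def singularity_command(image, command, binds=(), overlay=None):
    """Build a `singularity exec` argument list with bind mounts and an overlay."""
    args = ["singularity", "exec"]
    for src, dest in binds:
        # --bind src:dest mounts a host directory inside the container
        args += ["--bind", f"{src}:{dest}"]
    if overlay is not None:
        # --overlay attaches a persistent writable layer on top of the image
        args += ["--overlay", overlay]
    args.append(image)
    args += list(command)
    return args


def dq(arg):
    """Double-quote an argument for bash, leaving `$` alone so that
    PBS variables such as ${PBS_ARRAY_INDEX} still expand at run time."""
    escaped = arg.replace("\\", "\\\\").replace('"', '\\"').replace("`", "\\`")
    return f'"{escaped}"'


def pbs_array_script(job_name, first, last, exec_args):
    """Render a PBS array job script (assumes PBS Pro/OpenPBS `-J` syntax)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -J {first}-{last}",   # array job: one subjob per index
        'cd "$PBS_O_WORKDIR"',       # directory the job was submitted from
        " ".join(dq(a) for a in exec_args),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All paths and file names here are hypothetical placeholders.
    args = singularity_command(
        image="/containers/app.sqsh",
        command=["python3", "run.py", "--task", "${PBS_ARRAY_INDEX}"],
        binds=[("/scratch/data", "/data")],
        overlay="/scratch/overlays/task_${PBS_ARRAY_INDEX}.img",
    )
    print(pbs_array_script("demo", 1, 10, args))
```

The printed script could then be saved and submitted with `qsub`. In this version PBS stays entirely outside the container and each subjob just wraps one `singularity exec` call, which is the simplest arrangement I can think of; whether hooks would be a better fit is exactly what I'm unsure about.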