I have written a hook to set up a scratch file system using beeond (BeeGFS on Demand). In the execjob_begin part of the hook I make sure that the directory beeond will use for storage is set up on the local SSD. Then, in execjob_prologue, I start beeond. If I run an interactive job, the beeond directory is mounted and works as expected. If I run the same request as a batch job, the job terminates before it runs my script.
In both cases this is the command I am running to start beeond.
12/06/2018 18:23:03;0800;pbs_python;Hook;pbs_python;cmd: beeond start -n /tmp/1273.ip-0A0C1004.beeond -r -d /mnt/pbs_ramdisk -c /mnt/beeond -f /etc/beeond
The 1273.ip* file is one I create in the hook; it lists each node associated with the job, one per line.
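For illustration, here is a simplified sketch of that step, not the hook itself. It assumes Python 3 and leaves out everything PBS-specific. The helper names write_nodefile and build_beeond_cmd and the host names node01/node02 are made up, and dropping duplicate hosts is an assumption. The paths and beeond flags are copied from the log line above.

```python
import os
import tempfile


def write_nodefile(path, hosts):
    """Write one host name per line to path and return the hosts written.

    Duplicates are dropped while keeping first-seen order (an assumption;
    the real hook may build its list differently).
    """
    unique_hosts = []
    for host in hosts:
        if host not in unique_hosts:
            unique_hosts.append(host)
    with open(path, "w") as f:
        f.write("\n".join(unique_hosts) + "\n")
    return unique_hosts


def build_beeond_cmd(nodefile,
                     data_dir="/mnt/pbs_ramdisk",
                     mount_dir="/mnt/beeond",
                     conf_dir="/etc/beeond"):
    """Return the beeond start command as an argument list.

    The flags and default paths mirror the logged command; nothing is added.
    """
    return ["beeond", "start", "-n", nodefile, "-r",
            "-d", data_dir, "-c", mount_dir, "-f", conf_dir]


if __name__ == "__main__":
    # Hypothetical host names; a temporary directory stands in for /tmp.
    workdir = tempfile.mkdtemp()
    nodefile = os.path.join(workdir, "1273.ip-0A0C1004.beeond")
    write_nodefile(nodefile, ["node01", "node02", "node01"])
    print(" ".join(build_beeond_cmd(nodefile)))
```

In a hook, the host list would come from the job's node assignment and the argument list would be handed to something like subprocess; both are omitted here so the sketch runs outside of PBS.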
This is what I see in the logs for a batch job.
12/06/2018 17:44:03;0400;pbs_python;Svr;pbs_python;--> Stopping Python interpreter <--
12/06/2018 17:44:03;0400;pbs_mom;Hook;beeond;finished
12/06/2018 17:44:03;0800;pbs_mom;n/a;mom_get_sample;nprocs: 317, cantstat: 2, nomem: 0, skipped: 0, cached: 0
12/06/2018 17:44:03;0008;pbs_mom;Job;1268.ip-0A0C1004;Started, pid = 85978
12/06/2018 17:44:03;0800;pbs_mom;n/a;mom_get_sample;nprocs: 315, cantstat: 0, nomem: 0, skipped: 0, cached: 0
12/06/2018 17:44:03;0080;pbs_mom;Job;1268.ip-0A0C1004;task 00000001 terminated
12/06/2018 17:44:03;0800;pbs_mom;n/a;mom_get_sample;nprocs: 314, cantstat: 0, nomem: 0, skipped: 0, cached: 0
12/06/2018 17:44:03;0008;pbs_mom;Job;1268.ip-0A0C1004;Terminated
12/06/2018 17:44:03;0100;pbs_mom;Job;1268.ip-0A0C1004;task 00000001 cput= 0:00:00
12/06/2018 17:44:03;0008;pbs_mom;Job;1268.ip-0A0C1004;kill_job
When I run the same request in interactive mode I see this in the logs.
12/06/2018 18:23:34;0400;pbs_python;Svr;pbs_python;--> Stopping Python interpreter <--
12/06/2018 18:23:34;0400;pbs_mom;Hook;beeond;finished
12/06/2018 18:23:34;0800;pbs_mom;n/a;mom_get_sample;nprocs: 328, cantstat: 0, nomem: 0, skipped: 0, cached: 0
12/06/2018 18:23:34;0008;pbs_mom;Job;1273.ip-0A0C1004;Started, pid = 93525
If I comment out the beeond start command, batch jobs start as expected, but without the beeond file system.
Any idea why PBS would terminate the job in batch mode but work as expected in interactive mode?
Jon