Trying to move a job crashes server

I'm using an old version of OpenPBS (20.0.1) for reasons, so maybe this is fixed in a newer version…

I submit a job, life is good.
I submit another job that depends on the previously mentioned job, life is good.
A qsub hook moves this job into another queue while it's waiting on the dependency to finish.
The first job goes about its business, runs, and finishes.
The 2nd job (held by PBS because it depended on something) gets its hold released.
life is still good.
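
For anyone trying to reproduce the setup, the two submissions can be sketched roughly like this. It is only a sketch: the `afterok` dependency type and the `sleep_job.sh` script name are assumptions (I have not said which dependency type I use), and `qsub_argv` / `submit` are made-up helper names.

```python
import subprocess


def qsub_argv(script, depends_on=None, queue=None):
    """Build a qsub command line as a list of arguments.

    depends_on is the job id of the parent job, if any.
    """
    argv = ["qsub"]
    if queue:
        argv += ["-q", queue]
    if depends_on:
        # assumption: afterok, i.e. release the hold only if the parent
        # job finishes successfully
        argv += ["-W", "depend=afterok:{}".format(depends_on)]
    argv.append(script)
    return argv


def submit(script, **kwargs):
    """Run qsub and return the job id it prints on stdout."""
    result = subprocess.run(qsub_argv(script, **kwargs), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

Calling `first = submit("sleep_job.sh")` and then `submit("sleep_job.sh", depends_on=first)` gives the pair of jobs described above; the second one stays held until the first finishes.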
At this point I want to move the just-released job to another queue, so I have a daemon running that watches for such jobs and does just that:

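    # extend='t' asks the server to include array subjobs in the status output
    jobs = server.status(ptl_testlib.MGR_OBJ_JOB, extend='t')

    for j in jobs:
        # only look at jobs parked in the dependency queue
        if depend_queue in j.get('queue', ''):
            logger.info("Found job {} that is waiting on another".format(j['id']))
            # state Q means the dependency hold has been released
            if 'Q' in j.get('job_state', "nope"):
                server.movejob(destination=holding_queue, jobid=j['id'])
                logger.info("Job {} dependency has cleared".format(j['id']))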
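
The selection part of that loop can be pulled out into a plain function, which makes it easy to check without a live server. This is a sketch, not what the daemon actually runs: `jobs_to_move` is a made-up name, it assumes each job is a dict with `id`, `queue` and `job_state` keys as in the loop above, and it uses exact comparisons instead of the substring checks.

```python
def jobs_to_move(jobs, depend_queue):
    """Return the ids of jobs in depend_queue whose state is exactly 'Q'."""
    ready = []
    for j in jobs:
        # exact match, so a queue called "depend2" is not picked up
        # when depend_queue is "depend"
        if j.get('queue') != depend_queue:
            continue
        # a job still waiting on its dependency is held ('H'), not queued
        if j.get('job_state') == 'Q':
            ready.append(j['id'])
    return ready


# made-up sample data in the same shape as the status output used above
sample = [
    {'id': '1001.fullhost', 'queue': 'workq', 'job_state': 'R'},
    {'id': '1002.fullhost', 'queue': 'depend', 'job_state': 'Q'},
    {'id': '1003.fullhost', 'queue': 'depend', 'job_state': 'H'},
]
```

With the sample data, only the released job in the dependency queue is selected; each returned id would then be passed to `server.movejob(...)`.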

the server log shows the move:

date;0100;Server@hostname;Job;jobid.hostname;enqueuing into holding, state 1 hop 1

then nothing more, and the server has crashed.
PBS comes back OK after a restart, but the 2nd job is not in a good state to continue, so it must be qdel’ed.

thoughts/suggestions?

thanks
s

No clues to start with, but I will ask for a couple of clarifications…

Does this happen even with a silly hello world job with a sleep in it, dependent on another silly hello world job?

Is there anything “else” in the logs, either on the head server or in the node logs (assuming the logging detail was set to something more verbose, so that maybe there is more than one line to be had)?

At this point both jobs were silly jobs with just a sleep in them.
I'm not sure what the logging level is set to; I will crank it up and run more tests on Tuesday,
as well as looking in the node logs, which I hadn't done before.

thanks
s

debug had been set to 511, but for kicks I cranked it up to 4095 and tried again.
A few more log lines this time:
date;0100;Server@hostname;Job;1002.fullhost;enqueuing into holding, state 1 hop 1
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: account_entity_limit_usages: entered, INCR on queue holding, op_flag f, alt_res_ptr (nil)
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: set_entity_ct_sum_max: exiting, ret 0 [max_queued limit not set for holding]
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: set_entity_ct_sum_queued: exiting, ret 0 [queued_jobs_threshold limit not set for holding]
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: set_entity_resc_sum_max: entered [alt_res (nil)]
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: set_entity_resc_sum_max: exiting, ret 0 [max_queued_res limit not set for holding]
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: set_entity_resc_sum_queued: entered [alt_res (nil)]
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: set_entity_resc_sum_queued: exiting, ret 0 [queued_jobs_threshold_res limit not set for holding]
date;0800;Server@hostname;Job;1002.fullhost;ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0

then the server died.
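
As a side note on why the extra lines only showed up at 4095: the sketch below assumes the second `;`-separated field of a log line is a hexadecimal event-class bit, and that a line is written only when that bit is set in the logging mask. Both are my reading of the log format rather than something confirmed here, and `event_class` / `is_logged` are made-up helper names.

```python
def event_class(log_line):
    """Return the second ';'-separated field of a log line, parsed as hex."""
    return int(log_line.split(';')[1], 16)


def is_logged(log_line, mask):
    """True if the line's event-class bit is set in the logging mask."""
    return bool(event_class(log_line) & mask)


# two of the lines from the log excerpt above
enqueue_line = ("date;0100;Server@hostname;Job;1002.fullhost;"
                "enqueuing into holding, state 1 hop 1")
limit_line = ("date;0800;Server@hostname;Job;1002.fullhost;"
              "ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0")
```

Under those assumptions the `0100` line passes a mask of 511 (0x1FF), while the `0800` lines need bit 0x800, which only 4095 (0xFFF) includes.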

thanks
s

As a follow-up to this: I had to go through the exercise of updating the PBS version I'm using, and now with the latest OpenPBS (23.6.6) this is working as expected.

thanks
s