Sometimes when PBS is restarted, it starts up with:
Cannot enable queue, incomplete definition (15030) in decode_attr_db, Action function failed for enabled attr, errn 15030
for a number of queues, which are then not recovered.
Going into qmgr to try to rebuild them:
qmgr
Qmgr: delete queue blah
qmgr obj=blah svr=default: Unknown queue
qmgr: Error (15018) returned from server
Qmgr: create queue blah
qmgr obj=blah svr=default: End of File
At this point the server has panicked with:
que_save_db, que_save failed Execution of Prepared statement insert_que failed: ERROR: duplicate key value violates unique constraint "queue_pk"
DETAIL: Key (qu_name)=(blah) already exists.
panic_stop_db, Panic shutdown of Server on database error. Please check PBS_HOME file system for no space condition.
Stopping PBS dataservice
df shows plenty of free space on all filesystems.
At this point the only option is to delete the database directory and rebuild everything. After the third time this happened I scripted the rebuild, so it is not difficult, but it does mean that any time PBS gets restarted the system may be broken until I rebuild.
thoughts/suggestions?
thanks
steve
The warning about disk space is a generic one and is clearly not the cause of the error here. It appears that an earlier installation process errored out and left partial data in the database (apparently something is missing from the definition of queue blah). When PBS starts again, it fails to load queue blah because of the missing information (hence error 15030), so the server does not know about the queue and your delete fails with "Unknown queue". When you then try to create a queue with the same name, the insert fails because an entry for blah still exists in the database (hence the duplicate key error).
The simplest way to proceed would be to stop the PBS services, delete /var/spool/pbs ($PBS_HOME), and restart the PBS services.
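A rough sketch of those steps as a script, in case it helps. The init script path (/etc/init.d/pbs), the PBS_INIT and DRY_RUN variables, and the run and rebuild_pbs_home helpers are assumptions made for illustration, not part of PBS. The sketch also moves the old directory aside instead of deleting it, so the broken database is still around if an issue gets logged. With DRY_RUN=1 (the default here) it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: stop PBS, move PBS_HOME out of the way, start PBS again.
# Assumed locations; adjust for the local install.
PBS_HOME="${PBS_HOME:-/var/spool/pbs}"
PBS_INIT="${PBS_INIT:-/etc/init.d/pbs}"   # hypothetical path to the init script
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands, anything else = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_pbs_home() {
    stamp=$(date +%Y%m%d%H%M%S)
    run "$PBS_INIT" stop
    # Keep the old tree under a timestamped name rather than deleting it.
    run mv "$PBS_HOME" "$PBS_HOME.broken.$stamp"
    run "$PBS_INIT" start
}
```

Calling rebuild_pbs_home with DRY_RUN=1 lists the three commands; setting DRY_RUN=0 runs them as root. While the server is still healthy, saving the definitions with qmgr -c "print server" > some_file gives something to replay afterwards with qmgr < some_file.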
If you can reproduce this issue with a set of steps, please do log an issue, so that it can be fixed by the community.
Regards
Subhasis
Before I log an issue: it happened again, so here is the qmgr output for one of the queues that got corrupted. Is there anything missing or ill-defined? Thanks
# Create and define queue frontend
create queue frontend
set queue frontend queue_type = Route
set queue frontend Priority = 101
set queue frontend acl_user_enable = True
set queue frontend acl_users = …
set queue frontend from_route_only = False
set queue frontend resources_max.nodect = 1
set queue frontend resources_default.walltime = 01:00:00
set queue frontend acl_groups = …
set queue frontend route_retry_time = 60
set queue frontend enabled = True
set queue frontend started = True
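One thing that can be checked mechanically in a dump like this: the queue is of type Route and is enabled, but no route_destinations line appears. The sketch below assumes (this is not confirmed anywhere in this thread) that the server treats an enabled Route queue without route_destinations as an incomplete definition. The function name check_route_queues is made up for illustration; it reads qmgr print output on stdin and lists any such queues.

```shell
#!/bin/sh
# Sketch: list Route queues that are enabled but have no route_destinations.
# Input is "set queue NAME ATTR = VALUE" lines as printed by qmgr.
check_route_queues() {
    awk '
        $1 == "set" && $2 == "queue" {
            q = $3
            if ($4 == "queue_type" && $6 == "Route") route[q] = 1
            if ($4 == "enabled" && $6 == "True") enabled[q] = 1
            # Matches both "route_destinations = x" and "route_destinations += x".
            if ($4 == "route_destinations") dest[q] = 1
        }
        END {
            for (q in route)
                if (enabled[q] && !dest[q])
                    print q
        }
    '
}
```

It could be run as qmgr -c "print server" | check_route_queues; an empty result means every enabled Route queue in the dump has at least one destination.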