Design for refactoring PBS database code

Here is the design page for refactoring PBS database code. I am proposing to make databse code to a dynamic library and PBS will talk to this library over bunch of APIs.

https://pbspro.atlassian.net/wiki/spaces/PD/pages/1524563969/DB+Refactor+Design.

Thanks.

Thanks for posting this Ashwath. The doc looks good! Some comments:

  • “PBS_EXEC/sbin/pbs_dataservice <start|stop|status>”: Can you please mention here what “status” will return?
  • “PBS_EXEC/sbin/pbs_ds_password -r | -C <user_name>”:
    • Can you please explain what the options -r and -C do? Are there any other options?
    • Does this script configure the underlying db? If yes, how will it work for different kinds of db? Will one have to modify this file to add support for a new db?
    • Does “user_name” need to be an existing user or can the script actually create a new user?
  • Just a thought, right now it seems like if one wanted to replace the db service from postgres to, say, redis, then they’d need their own version of pbs_dataservice, pbs_dataservice and pbs_db_utility, along with the libdb.so. Why not instead move the functionality of these additional scripts also to libdb.so and convert these scripts into generic programs instead which link with libdb? That way, one will just have to replace postgres libdb.so with redis libdb.so in order to completely replace postgres with redis.
  • PBS_EXEC/include/pbs_db.h - can you please provide details about all the members of these structs?
  • pbs_db_connect:
    • how about renaming pbs_data_service_host and pbs_data_service_port to pbs_ds_host and pbs_ds_port instead?
    • Should this interface take a ‘timeout’ argument?
  • pbs_db_disconnect:
    • You’ve mentioned that this function will also stop the db. Since there’s a separate function for stopping the db, i think you should remove that from here.
    • this function takes a “char **db_msg” while connect takes a “char *err_msg”, can we make them consistent? If this is a return pointer then I think it will need to be a double pointer anyways which points to a single string.
    • “db_msg[out] : Connection logs/messages generated by libdb”: I think any log messages generated by libdb should be logged separately. How about a new “db_logs” directory similar to other logs in pbs?
  • get_db_errmsg: It seems like most interfaces return an error message directly anyways. So, is this really needed? If the interfaces returned an error code instead of an error message then this interface would make more sense.
  • pbs_start_db, pbs_stop_db and pbs_status_db: Now that I think about it, pbs_dataservice is going to be the way one would start, status and stop the db right? I don’t think we need these functions, what do you think?

Thanks for the comments.

  • “PBS_EXEC/sbin/pbs_dataservice <start|stop|status>”: Can you please mention here what “status” will return?

The status option will return the status of the database instance.

  • “PBS_EXEC/sbin/pbs_ds_password -r | -C <user_name>”:
  • Can you please explain what the options -r and -C do? Are there any other options?
  • Does this script configure the underlying db? If yes, how will it work for different kinds of db? Will one have to modify this file to add support for a new db?
  • Does “user_name” need to be an existing user or can the script actually create a new user?

I updated the page with info on the options. -r is to set a random password to the current database user. -C can create or change the database user to a new one.

  • Just a thought, right now it seems like if one wanted to replace the db service from postgres to, say, redis, then they’d need their own version of pbs_dataservice , pbs_dataservice and pbs_db_utility , along with the libdb.so. Why not instead move the functionality of these additional scripts also to libdb.so and convert these scripts into generic programs instead which link with libdb? That way, one will just have to replace postgres libdb.so with redis libdb.so in order to completely replace postgres with redis.

pbs_ds_password is already a binary executable. Since pbs_dataservice and utility scripts will be mostly used by pre-configure scripts of initial pbs startup, it makes sense to retain them as scripts. What do you think?

  • PBS_EXEC/include/pbs_db.h - can you please provide details about all the members of these structs?

Added structure info to the page.

  • pbs_db_connect :
  • how about renaming pbs_data_service_host and pbs_data_service_port to pbs_ds_host and pbs_ds_port instead?

Updated.

  • Should this interface take a ‘timeout’ argument?

Yes, this will make sure PBS server won’t wait indefinitely trying to connect to database.

  • pbs_db_disconnect :
  • You’ve mentioned that this function will also stop the db. Since there’s a separate function for stopping the db, i think you should remove that from here.
  • this function takes a “char **db_msg” while connect takes a “char *err_msg”, can we make them consistent? If this is a return pointer then I think it will need to be a double pointer anyways which points to a single string.

Yes, corrected it now. I removed failcode. I guess we can rely on err_msg to capture both system and db error messages.

  • “db_msg[out] : Connection logs/messages generated by libdb”: I think any log messages generated by libdb should be logged separately. How about a new “db_logs” directory similar to other logs in pbs?

PBS server is the one who is talking to libdb so it only makes sense we log the messages to server log. No?

  • get_db_errmsg : It seems like most interfaces return an error message directly anyways. So, is this really needed? If the interfaces returned an error code instead of an error message then this interface would make more sense.
  • pbs_start_db, pbs_stop_db and pbs_status_db : Now that I think about it, pbs_dataservice is going to be the way one would start, status and stop the db right? I don’t think we need these functions, what do you think?

After reading your comment, yes it makes sense. The server internally uses data service script to start/stop the database.

Let me know your opinion on the changes I have made. Thanks.

Hey sorry for the late reply.

I meant to say can you list down exactly what it can return? “Up”/“Down”/“Running”/“Unknown” etc., and what these states will mean?

“dynamic library will have the functionality for the PBS server to access the database”

Just checking, is server the only daemon that write to the db? mom also has job_save() calls, so i thought ill ask.

“PBS_DB_CONNECT_STATE_NOT_CONNECTED” that’s quite long, how about replacing “CONNECT_STATE” with just “CONN” ? so this one will become PBS_DB_CONN_NOT_CONNECTED?

“PBS_DB_DOWN”, “PBS_DB_STARTING”, “PBS_DB_STARTED”: “down” and “started” don’t go together, please make it “up” and “down”, or “started” and “ended”

“pbs_db_mominfo_time_t *pbs_db_mominfo_tm”: how about making this a pbs_db_mom_info_t instead? that’ll make it consistent with the other object types.

pbs_db_job_info:
“ji_svrflags;”: what is this?
“ji_un_type”: please rename this to be more intuitive
ji_momaddr; host addr of Server: the var name says momaddr but comment says addr of server? if it’s mom addr, what will this be if the job runs on multiple moms?
ji_rteretry;: what is this?
"char ji_4jid[8]; /* extended job save data */
** char ji_4ash[8]; /* extended job save data */

Can these be combined into one?

pbs_db_svr_info:
sv_jobidnumber: will this be useful after multi-server?

pbs_db_sched_info:
Would a ’ partition_name field make sense here? now that scheds can only serve one partition, it might speed up db queries to find the sched associated with a partition if we add a field for it here. What do you think? Similarly, it might be useful to add a parititon_name field to queue and node info structures.

pbs_db_query_options_t: Will this make sense for a no SQL db ?

" query_cb_t: Function pointer for call back function to process the data returned by the database."
That seems like a weird name for a function pointer, how about just calling it “query_cb” ? Can you also mention what kind of “processing” is this function intended for? (as opposed to the kind of processing that the Server might do after querying data from db)

PBS_DB_MOMINFO_TIME”: Again, this sticks out, I think we should just call it PBS_DB_MOM

under pbs_db_save_obj():
1 - Execution of prepared statement failed.
0 - Success and > 0 rows were affected.
1 - Execution succeeded but the statement did not affect any rows.

did you mean -1 for one of them? Same comment for other interfaces as well.

int pbs_db_load_obj(pbs_db_conn_t *conn, pbs_db_obj_info_t *obj)
int pbs_db_find_obj(pbs_db_conn_t *conn, pbs_db_obj_info_t *obj, pbs_db_query_options_t *opts, query_cb_t query_cb)

shouldn’t ‘obj’ be a double pointer in these functions?

Under pbs_db_del_attr_obj:
" Returns: Error code
0 - Success
1 - On Failure"

Will it return an error code or 0/1? I don’t think it can return both, it’ll be confusing to the callers.

pbs_db_start, pbs_shutdown_db and pbs_status_db: Don’t we need char *pbs_ds_host and int pbs_ds_port here? or a connection handler for shutdown and status?

Updated now

Yes, the server is the only one who is talking to the DB.

I updated the design to not to manage these states anymore. Figured its the application layer which is concerned and nothing to do with libdb itself.

But the structure itself is used for a specific thing, which in this context mominfo time used to map host to vnodes. So I feel the name makes sense.

Thanks for bringing this up. We have some redundant variables in our core component structures which is increasing the technical debt. I can update the design to remove them to some extent but I strongly feel (since its independent of libdb work) we need to have a separate design and PR to refactor these core structures. What do you think?

We might still need it till multi-server is implemented right?

Again, I feel this can be done independently than db refactoring work. But of-course we can brainstorm more on this.

We will still run queries in NoSQL DB as well. This can help to pass clauses to queries. For example, with mutli-server where we would like to query based on timestamp to get delta changes, we can send timestamps using this struct.

Its a typedef to function pointer hence the name. I added some more description.

Wouldnt it make it very generic when we are only worried about mominfo_time.

The page is updated now.

obj is not carrying out any output, so no.

Updated now.

Updated now.

Thanks for your comments. Please check the updates and share more feedback.

I’m ok with that

Ok, ya this can be removed in the design that talks about multiple servers

sure, it was just a thought. @arungrover and @suresht what do you think about this particular field?

I remember us discussing this and realizing that the find and load functions should probably be combined into 1 as they seem to do similar things, one finds multiple objects while the other finds one specific object. What do you think about that?

I think it will be useful to have partition name saved in the sched/queue/node/reservation info structures. Now I don’t know the internals of multi-server but we do try to find server/queue on the basis of partition name so it might be useful to add that.

I have another comment about qrank job attribute. Please change it to BIGINT because we have seen integer overflows as qrank is now measured in milliseconds.

We do have a partition attribute on queue, and node. The design does not explicitly talks about them as they are handled in a similar way. Most of the object attributes are saves as hstore attribute which is separate from the fixed area.

Thanks Nithin! I don’t understand hstore schema that well, but can one query db where primary key is one of the hstore attribute (partition in this case)? If so, then whatever we have is fine.

Having partition_name as a field really helps in making the query faster. Having said that we can also achieve the same by doing the following.
We can actually create an index on a specific hstore key i.e. partition attribute in this case and use it while querying. This way the query runs faster.
For example
CREATE UNIQUE INDEX “partition_idx” ON “pbs.scheduler” (
(attributes → ‘partition’)
);