PP-702: Design document review for RPM based install under CLE 5.2

This message is to inform the community that the design document covering RPM based installations and upgrades on Cray CLE 5.2 systems is now available. You may review the document here:

https://pbspro.atlassian.net/wiki/display/PD/PP-702%3A+Installation+and+upgrades+on+Cray+XC+CLE+5.2+systems

New location after title change:
https://pbspro.atlassian.net/wiki/display/PD/PP-702%3A+Installation+and+upgrades+on+Cray+X-series+CLE+5.2+systems

Please provide comments in response to this post. Thank you!

Hi @mkaro,

Overall the document looks good. I have only one question: is the path to apbasil always the same on all installations of CLE 5.2? I ask because we are hard-coding the path as per Interface 6.

Also, could you please add a link from the design page to the community discussion.

Thanks,
Prakash

Thanks @prakashcv13, I have added the link you requested.

To answer your question… yes, the apbasil command may be found in the same location for CLE5.2 and earlier systems that we currently support.

Hi @mkaro ,
I feel the new instructions for providing the dataservice user are complex and not very neat. On one hand we are trying to automate everything to improve the user experience, but at the same time asking the admin to create a directory, subdirectories, and config files for the dataservice user looks very unappealing (especially where the PBS_HOME directory could be a lengthy path).
I remember that we agreed on asking the admin to export the PBS_DATA_SERVICE_USER environment variable before starting the PBS services for the first time. Is there any reason why we are not going ahead with that?

Also, the current implementation of pbs_habitat and postinstall throws an error if the PBS_HOME directory already exists but doesn’t have all the subdirectories and necessary files. The rest of the EDD looks good to me.

Hi @mkaro,

Regarding the installation instructions:

  • To be more generic I suggest mentioning installing the “appropriate PBS Pro license” or some similar wording without mentioning file based licensing which is not currently supported. Keep the example that shows setting pbs_license_info and which applies to both floating and socket licensing.
  • Are there instructions for installing client commands-only?
  • Using a plain Linux system with the Cray cluster is currently supported. Can we use the same instructions that apply to plain Linux when installing PBS on a plain Linux box in a Cray cluster?
  • Do we need separate instructions for the failover setup when upgrading?
  • Could we specify the data service user via some environment variable?

Hi @mkaro,

Thanks for writing this up. I have a few comments:

  1. There’s a typo in interface 6, “$vnode_additive” should be “$vnodedef_additive”

For the install instructions I have some questions/comments:

  2. The install instructions for an older PBS release (for example the 13.1 Install Guides) say to create a “pbs-install” directory (I believe this was suggested by Cray). Is there a need to change the directory to “pbspro”? I’d rather not change to “pbspro” if it isn’t necessary.

  3. As I followed the steps I got the following to stdout, and I believe it needs to be documented as a new interface:
    *** =======
    *** NOTICE:
    *** =======
    *** PBS Pro commands have moved.
    *** Old location: /opt/pbs/default
    *** New location: /opt/pbs
    *** Users will need to ensure their PATH and MANPATH are set correctly.
    *** In most cases, users must simply logout and log back in to source
    *** the new files in /etc/profile.d.

  4. Are these stdout messages new to this feature (or part of the earlier Linux RPM install changes)? If these messages are being introduced with this feature, they also need to be documented in this design. I have separated the various messages in question with “AND”:

*** Existing configuration file found: /etc/pbs.conf
*** Saving /etc/pbs.conf as /etc/pbs.conf.pre.17.2.0.20170307180403

AND
*** PBS_HOME is /var/spool/PBS
*** Existing environment file left unmodified: /var/spool/PBS/pbs_environment

AND
*** The PBS Pro server has been installed in /opt/pbs/sbin.
*** The PBS Pro scheduler has been installed in /opt/pbs/sbin.

AND
*** The PBS Pro communication agent has been installed in /opt/pbs/sbin.

*** The PBS Pro MOM has been installed in /opt/pbs/sbin.

*** The PBS commands have been installed in /opt/pbs/bin.

  5. I’m pretty sure this message is new and unique to installing on a Cray, therefore it should be documented as a new interface:
    *** Found vntype in sched_config resources

5a) Where is it “finding” vntype in the sched_config resources? At this point PBS_HOME has not been created yet…

  6. It looks like PBS_SCP is listed twice in /etc/pbs.conf:
    boot-p1:/rr/current/software/pbspro # xtopview -e "cat /etc/pbs.conf"
    PBS_EXEC=/opt/pbs
    PBS_HOME=/var/spool/PBS
    PBS_START_SERVER=0
    PBS_START_MOM=0
    PBS_START_SCHED=0
    PBS_START_COMM=0
    PBS_SERVER=sdb
    PBS_CORE_LIMIT=unlimited
    PBS_SCP=/usr/bin/scp
    PBS_SCP=/usr/bin/scp

  7. For the dataservice account…the 14.2 Linux install instructions say that I have to set the PBS_DATA_SERVICE_USER environment variable BEFORE installing PBS. The steps seem to imply that this is not the case for the Cray. If so, that should be documented as an interface change in the design.

  8. The db_user file is not a public interface, so we should not have the admin write to it/create it.

  9. Is there a reason that we build the PBS_HOME area by hand on the sdb? Can we have pbs_habitat build us the PBS_HOME area on the sdb?

  10. There is an extra space after /opt/pbs in the following message to stdout:
    *** Postinstall script called as follows:
    *** /opt/pbs/libexec/pbs_postinstall server 17.2.0.20170307180403 /opt/pbs /var/spool/PBS sameconf

10a) What does “sameconf” mean? Why is it there?

  11. When I started up the mom (only) on a separate node, the following messages were printed AND the server_* and sched_* directories were also created. In prior releases there were no such messages, and the directories for PBS daemons that were not started on that machine were not created.
    Here’s what stdout had:
    *** The PBS Pro server has been installed in /opt/pbs/sbin.
    *** The PBS Pro scheduler has been installed in /opt/pbs/sbin.
    *** Added vntype to sched_config resources
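
For the duplicated PBS_SCP entry in the /etc/pbs.conf listing above, a check along the following lines could flag repeated keys after an install. This is only a sketch: the function name find_duplicate_keys and the temporary demo file are hypothetical, and it assumes the file holds plain KEY=VALUE lines.

```shell
#!/bin/sh
# Print every KEY that is assigned more than once in a KEY=VALUE file.
# Comment lines and blank lines are skipped.
find_duplicate_keys() {
    awk -F= '
        /^[ \t]*#/ { next }            # skip comments
        /^[ \t]*$/ { next }            # skip blank lines
        { count[$1]++ }                # tally each key
        END {
            for (key in count)
                if (count[key] > 1)
                    print key
        }
    ' "$1" | sort
}

# Demo on a temporary file holding part of the listing quoted above.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/PBS
PBS_SCP=/usr/bin/scp
PBS_SCP=/usr/bin/scp
EOF
find_duplicate_keys "$demo_conf"
rm -f "$demo_conf"
```

On the demo file this reports PBS_SCP. On a Cray shared root the same function would presumably have to run inside the relevant xtopview view to see that view's /etc/pbs.conf.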

One thing I only noticed now…the title on the design page says “Cray XC”. This should be changed to “Cray X-series” to be more accurate.

Mike, there is one upgrade procedure posted. Could you make clear whether this is a migration or an overlay upgrade?

Also, can you confirm that this RFE then does not provide support for the other method? If that is the intent, the EDD should specify that.

Do the installation and upgrade procedures automatically

  • add ‘vntype’ to the resources line of sched_config?
  • enable the PBS_translate_mpp.HK hook?

Older installations using the INSTALL script would automatically add “vntype” and enable the PBS_translate_mpp hook.
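
To verify the first of these two points after an install or upgrade, something like the sketch below could be used. It assumes the resources line has the usual form resources: "ncpus, mem, arch, host, vnode"; the function name has_sched_resource and the demo file are hypothetical. It does not cover the hook check.

```shell
#!/bin/sh
# Succeed (exit 0) if the named resource appears on the uncommented
# "resources:" line of a sched_config-style file.
# Assumed line format: resources: "ncpus, mem, arch, host, vnode"
has_sched_resource() {
    grep '^[[:space:]]*resources:' "$1" |
        tr -d '" \t' |          # drop quotes and whitespace
        cut -d: -f2 |           # keep the part after "resources:"
        tr ',' '\n' |           # one resource name per line
        grep -qxF -- "$2"       # exact, literal match on a whole line
}

# Demo on a temporary file rather than a live sched_config.
demo_sched=$(mktemp)
printf '%s\n' 'resources: "ncpus, mem, arch, host, vnode, vntype"' > "$demo_sched"
if has_sched_resource "$demo_sched" vntype; then
    echo "vntype is listed"
else
    echo "vntype is missing"
fi
rm -f "$demo_sched"
```

On a real system the file to check would normally be sched_priv/sched_config under PBS_HOME, which is an assumption about the layout rather than something stated in the design.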

First, thank you for your feedback. Addressing comments up to this point…

@dilip-krishnan:

  • I agree with you. I have updated the instructions and the script to utilize PBS_DATA_SERVICE_USER as suggested.
  • I believe I have also addressed your concern regarding directory creation. Please confirm this is the case.

@vccardenas:

  • I adjusted the instructions for configuring the PBS Pro license.
  • There are no instructions for installing the client commands only because with CLE 5.2 PBS Pro needs access to the daemons from the PBS server node and the login nodes. We’re not doing anything different than we had done in the past with the INSTALL script.
  • You should use the Cray installation instructions when installing on CLE based nodes and generic instructions for installing on generic Linux nodes (standard RPM installation with no xtopview commands). On these nodes you may be able to install just the pbspro-execution or pbspro-client RPM.
  • The current Cray installation instructions do not cover failover on Cray systems.
  • The data service user is now specified during initial startup of the server by prefixing the command with PBS_DATA_SERVICE_USER=. Instructions have been updated.
  • The installation procedures automatically add vntype to the sched_config resources line and enable the PBS_translate_mpp hook.

@lisa-altair:

  1. Fixed.
  2. The use of the string “pbs” or “PBS” is ambiguous in terms of the product you are referring to. The strings “PBSPro” and “pbspro” are unambiguous because they reference PBS Professional specifically. We should be specific where possible. By convention, I see directory names like “mysql” in /rr/current/software as opposed to mysql-install.
  3. Added interface #8.
  4. The messages you cited are not new to this feature. That’s also true for your question on #3.
  5. I have removed this message. I was using it mostly for debugging, and an administrator might find it confusing.
    5a. It is finding vntype in the resources line of sched_config. PBS_HOME has been created because the execution of pbs_postinstall is deferred to initial startup as opposed to being run as part of the RPM installation.
  6. Fixed.
  7. Instructions have been updated to pass PBS_DATA_SERVICE_USER at initial startup.
  8. We are no longer asking the administrator to create/write/modify this file.
  9. The only reason the instructions included manual steps to create PBS_HOME was to later create the db_user file. This is no longer the case and PBS_HOME is created automatically.
  10. That space is due to an empty argument being passed. I didn’t do the work to introduce “sameconf” into pbs_habitat and pbs_postinstall. That was done here: https://github.com/PBSPro/pbspro/pull/157
  11. The pbs_postinstall script is checking for the presence of the server and scheduler binaries. You see the messages because we are performing a server install across the shared root.
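
As an illustration of item 10, the sketch below shows how an empty positional argument leaves two separators side by side when an argument list is printed. The function show_call and the argument values are hypothetical, and this is not taken from pbs_postinstall itself.

```shell
#!/bin/sh
# Join all arguments with single spaces and print the result.
show_call() {
    printf '%s\n' "$*"
}

empty_arg=""
# The third argument is empty, so the joined line contains a double space.
show_call script server "$empty_arg" sameconf
```

Quoting the empty variable is what preserves it as an argument. Left unquoted, the shell would drop it and the extra space would disappear, along with the argument position.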

Other notes:

  • Updated title to Cray X-series
  • The installation automatically enables the PBS_translate_mpp hook and adds vntype to the sched_config resources.

@iestockdale:

  • The procedure documents an overlay upgrade (only one server running). The RFE does not require documentation of migration upgrades. The prior installation instructions for Cray CLE systems also do not cover this per section 3.6 of the PBS Pro Installation Guide for v13.0. I added a note to the document.

A pull request has been filed here: https://github.com/PBSPro/pbspro/pull/279

Comments and feedback are welcome.

Due to the changes made to the title, it looks like the external design can now be found here: https://pbspro.atlassian.net/wiki/display/PD/PP-702%3A+Installation+and+upgrades+on+Cray+X-series+CLE+5.2+systems

I edited the initial post in this series to contain the correct link.

@mkaro and @smgoosen, regarding upgrades, is this the correct summary:
  • This EDD is only intended to document new install and overlay upgrade steps.
  • Migration upgrade to PBS 17.x is not supported on Cray.

@smgoosen, the upgrade user story needs to be clarified or a new one for upgrade filed, as the support status of the two different types of upgrade is not covered and thus not clear.

Thank you for your detailed responses. I have one comment back.

Since the message I mention in #3 is not new to this feature, there is no need to add interface #8.

I will review your new steps and get back to you.

@lisa-altair: Interface 8 has been removed, per your comment.

@mkaro, I used the updated instructions and here are some observations:

  1. Clean install has two options. The first option, which uses pbsdata, does not
    show the command to use to start PBS unlike the second option.

  2. Overlay upgrade from 13.0.401

  • All the contents of /etc/pbs.conf on MoM node remained the same but on the
    server node (sdb) only the PBS_HOME value remained the same. It seems I had to:
    boot # xtopview -n 30 -e "xtunspec /etc/pbs.conf" instead of
    boot # xtopview -c login -e "xtunspec /etc/pbs.conf"
    in order to unspec the /etc/pbs.conf on the mom.

  • There were errors starting MoM when I used xtopview -c login -e "xtunspec /etc/pbs.conf"

# /etc/init.d/pbs start
Starting PBS
/etc/init.d/pbs: line 315: /opt/pbs/default/bin/qstat: No such file or directory


*** /opt/pbs/default/libexec/pbs_habitat is missing.


  • Since I had used crayadm as the data service user when I installed 13.0.401, I
    started the upgraded PBS this way on the sdb:
    sdb:~ # PBS_DATA_SERVICE_USER=crayadm /etc/init.d/pbs start
    Starting PBS
    PBS Home directory /var/spool/PBS needs updating.
    Running /opt/pbs/libexec/pbs_habitat to update it.

*** End of /opt/pbs/libexec/pbs_habitat
Home directory /var/spool/PBS updated.
/opt/pbs/sbin/pbs_comm ready (pid=14054), Proxy Name:sdb:17001, Threads:4
PBS comm
PBS sched
Connecting to PBS dataservice…connected to PBS dataservice@sdb
Server@sdb: recov_attr_db, unknown attribute “res_released_on_susp” discarded
Using license server at 6200@licenseserver
PBS server

–> note the recov_attr_db, unknown attribute message - is it harmful or benign?

  3. There seems to be a licensing issue.
  • When I installed PBS from mike0042’s branch, I could not get a license from the license server.

  • Then I installed 13.0.401 and got this from qstat -Bf:
    pbs_license_info = 6200@licenseserver
    license_count = Avail_Global:99999 Avail_Local:1 Used:0 High_Use:0 Avail_So
    ckets:0 Unused_Sockets:0
    pbs_version = PBSPro_13.0.401.160285

  • After upgrading to PBS from mike0042’s branch, I can no longer get a license:
    pbs_license_info = 6200@licenseserver
    license_count = Avail_Global:0 Avail_Local:0 Used:0 High_Use:0 Avail_Socket
    s:0 Unused_Sockets:0
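
One plausible reading of the MoM startup errors above is that the node's /etc/pbs.conf still named the old /opt/pbs/default location. A quick check along these lines could confirm or rule that out before starting PBS. It is a sketch only: conf_value and check_pbs_exec are hypothetical helper names, and a plain KEY=VALUE file is assumed.

```shell
#!/bin/sh
# Read the value of KEY ($2) from a KEY=VALUE file ($1); last assignment wins.
conf_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Report whether the PBS_EXEC named in a conf file holds an executable qstat.
check_pbs_exec() {
    exec_dir=$(conf_value "$1" PBS_EXEC)
    if [ -x "$exec_dir/bin/qstat" ]; then
        echo "ok: $exec_dir"
    else
        echo "stale: $exec_dir"
        return 1
    fi
}
```

Running check_pbs_exec /etc/pbs.conf on the affected node would report "stale" if PBS_EXEC points at a directory that no longer contains bin/qstat.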

  • I’ve updated the acceptance criteria to exclude migration upgrades from PP-702
  • I also created PP-733 to cover migration upgrades on a Cray system

@mkaro, I have some more questions:

  1. Older installations using the INSTALL script had the option of installing client commands only as well as communication only (in addition to “all” and execution only choices).
  • If communication only is desired I understand that one can modify the /etc/pbs.conf file to start just pbs_comm.
  • If client commands only is desired does one keep all the PBS_START_[daemon] = 0 in /etc/pbs.conf?
  2. Will there be installation and upgrade instructions for failover, since I think failover is supported on Cray?
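
For the first question, a small helper like the one below can show which daemons a given pbs.conf would start. It only inspects the PBS_START_* flags and does not settle whether leaving them all at 0 is the supported way to get a commands-only host. The function name enabled_daemons and the demo file are hypothetical.

```shell
#!/bin/sh
# Print the PBS_START_* keys whose value is 1 in a pbs.conf-style file.
# No output means no daemon is configured to start on that host.
enabled_daemons() {
    sed -n 's/^\(PBS_START_[A-Z]*\)=1$/\1/p' "$1"
}

# Demo on a temporary file modelled on an execution-only host.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
PBS_START_SERVER=0
PBS_START_MOM=1
PBS_START_SCHED=0
PBS_START_COMM=0
EOF
enabled_daemons "$demo_conf"
rm -f "$demo_conf"
```

On the demo file this prints PBS_START_MOM only.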

@mkaro, you might not need the steps below in the overlay upgrade from 13.0.40x instructions since they are part of the steps for fresh installation.

  • Log in to the boot node and create the /rr/current/software/pbspro
    directory if it is not present.

  • Copy the new PBS Pro server RPM to the /rr/current/software/pbspro
    directory on the boot node.