This message is to inform the community that the design document covering RPM-based installations and upgrades on Cray CLE 5.2 systems is now available. You may review the document here:
Overall the document looks good. I have only one question: is the path to apbasil always the same on all installations of CLE 5.2? I ask because we are hard-coding the path as per Interface 6.
Also, could you please add a link from the design page to the community discussion?
Hi @mkaro,
I feel the new instructions for providing the dataservice user are complex and not very neat. On one hand we are trying to automate everything to improve the user experience, but at the same time asking the admin to create the directory, subdirectories, and config files for the dataservice user looks very unappealing (especially where the PBS_HOME directory is a lengthy path).
I remember that we agreed to ask the admin to export the PBS_DATA_SERVICE_USER environment variable before starting the PBS services for the first time. Is there any reason why we are not going ahead with that?
Also, the current implementation of pbs_habitat and postinstall throws an error if the PBS_HOME directory already exists but doesn’t have all the subdirectories and necessary files. The rest of the EDD looks good to me.
To be more generic, I suggest saying to install the “appropriate PBS Pro license” or some similar wording, without mentioning file-based licensing, which is not currently supported. Keep the example that shows setting pbs_license_info, which applies to both floating and socket licensing.
Are there instructions for installing client commands-only?
Using a plain Linux system with the Cray cluster is currently supported. Can we use the same instructions that apply to plain Linux when installing PBS on a plain Linux box in a Cray cluster?
Do we need separate instructions for the failover setup when upgrading?
Could we specify the data service user via some environment variable?
Thanks for writing this up. I have a few comments:
There’s a typo in interface 6: “$vnode_additive” should be “$vnodedef_additive”.
For the install instructions I have some questions/comments:
The install instructions for an older PBS release (for example the 13.1 Install Guides) say to create a “pbs-install” directory (I believe this was suggested by Cray). Is there a need to change the directory to “pbspro”? I’d rather not change to “pbspro” if it isn’t necessary.
As I followed the steps I got the following to stdout, and I believe it needs to be documented as a new interface:
*** =======
*** NOTICE:
*** =======
*** PBS Pro commands have moved.
*** Old location: /opt/pbs/default
*** New location: /opt/pbs
*** Users will need to ensure their PATH and MANPATH are set correctly.
*** In most cases, users must simply logout and log back in to source
*** the new files in /etc/profile.d.
Are these stdout messages new to this feature (or part of the earlier Linux RPM install changes)? If these messages are being introduced with this feature, they also need to be documented in this design. I have separated the various messages in question with “AND”:
AND
*** PBS_HOME is /var/spool/PBS
*** Existing environment file left unmodified: /var/spool/PBS/pbs_environment
AND
*** The PBS Pro server has been installed in /opt/pbs/sbin.
*** The PBS Pro scheduler has been installed in /opt/pbs/sbin.
AND
*** The PBS Pro communication agent has been installed in /opt/pbs/sbin.
*** The PBS Pro MOM has been installed in /opt/pbs/sbin.
*** The PBS commands have been installed in /opt/pbs/bin.
I’m pretty sure this message is new and unique to installing on a Cray, therefore it should be documented as a new interface:
*** Found vntype in sched_config resources
5a) Where is it “finding” vntype in the sched_config resources? At this point PBS_HOME has not been created yet…
It looks like PBS_SCP is getting listed twice in /etc/pbs.conf:
boot-p1:/rr/current/software/pbspro # xtopview -e "cat /etc/pbs.conf"
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=0
PBS_START_MOM=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_SERVER=sdb
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp
PBS_SCP=/usr/bin/scp
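A quick way to spot repeated keys like this is to count them. The following is a minimal sketch, not part of PBS Pro: `find_duplicate_keys` is a hypothetical helper name, and it assumes the file consists of plain KEY=VALUE lines.

```shell
#!/bin/sh
# find_duplicate_keys FILE
# Hypothetical helper (not shipped with PBS Pro): prints every key that
# appears more than once in a KEY=VALUE file, one key per line.
find_duplicate_keys() {
    awk -F= '
        /^[ \t]*(#|$)/ { next }        # skip comments and blank lines
        { count[$1]++ }                # $1 is the text before the first "="
        END { for (k in count) if (count[k] > 1) print k }
    ' "$1" | sort
}
```

Run against the file contents shown above, it prints PBS_SCP.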
For the dataservice account: the 14.2 Linux install instructions say that I have to set the PBS_DATA_SERVICE_USER environment variable BEFORE installing PBS. The steps seem to imply that this is not the case for the Cray. If so, that should be documented as an interface change in the design.
The db_user file is not a public interface, so we should not have the admin write to it/create it.
Is there a reason that we build the PBS_HOME area by hand on the sdb? Can we have pbs_habitat build us the PBS_HOME area on the sdb?
There is an extra space after /opt/pbs in the following message to stdout:
*** Postinstall script called as follows:
*** /opt/pbs/libexec/pbs_postinstall server 17.2.0.20170307180403 /opt/pbs /var/spool/PBS sameconf
10a) What does “sameconf” mean? Why is it there?
When I started up the mom (only) on a separate node the following message was printed AND the server_* and sched_* directories were also created. But in prior releases there were no such messages, and the directories for PBS daemons that were not started on that machine were not previously created.
Here’s what stdout had:
*** The PBS Pro server has been installed in /opt/pbs/sbin.
*** The PBS Pro scheduler has been installed in /opt/pbs/sbin.
*** Added vntype to sched_config resources
I adjusted the instructions for configuring the PBS Pro license.
There are no instructions for installing the client commands only because with CLE 5.2 PBS Pro needs access to the daemons from the PBS server node and the login nodes. We’re not doing anything different than we had done in the past with the INSTALL script.
You should use the Cray installation instructions when installing on CLE based nodes and generic instructions for installing on generic Linux nodes (standard RPM installation with no xtopview commands). On these nodes you may be able to install just the pbspro-execution or pbspro-client RPM.
The current Cray installation instructions do not cover failover on Cray systems.
The data service user is now specified during initial startup of the server by prefixing the command with PBS_DATA_SERVICE_USER=. Instructions have been updated.
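For readers unfamiliar with the prefix form: the shell places a `VAR=value` prefix only in the environment of that one command. Here is a small sketch of that behavior, using `sh -c` as a stand-in for the PBS start script and `pbsdata` purely as an example account name (both are assumptions for illustration).

```shell
#!/bin/sh
# Start from a clean state so the demonstration is deterministic.
unset PBS_DATA_SERVICE_USER

# The prefixed command sees the variable...
during=$(PBS_DATA_SERVICE_USER=pbsdata sh -c 'echo "${PBS_DATA_SERVICE_USER:-unset}"')

# ...but a later command in the same shell does not.
after=$(sh -c 'echo "${PBS_DATA_SERVICE_USER:-unset}"')

echo "during: $during"
echo "after:  $after"
```

The practical consequence is that the variable does not need to be exported or left in the administrator's login environment.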
The installation procedures automatically add vntype to the sched_config resources line and enable the PBS_translate_mpp hook.
@lisa-altair:
Fixed.
The use of the string “pbs” or “PBS” is ambiguous in terms of the product you are referring to. The strings “PBSPro” and “pbspro” are unambiguous because they reference PBS Professional specifically. We should be specific where possible. By convention, I see directory names like “mysql” in /rr/current/software as opposed to mysql-install.
Added interface #8.
The messages you cited are not new to this feature. That’s also true for your question on #3.
I have removed this message. I was using it mostly for debugging, and an administrator might find it confusing.
5a. It is finding vntype in the resources line of sched_config. PBS_HOME has been created because the execution of pbs_postinstall is deferred to initial startup as opposed to being run as part of the RPM installation.
Fixed.
Instructions have been updated to pass PBS_DATA_SERVICE_USER at initial startup.
We are no longer asking the administrator to create/write/modify this file.
The only reason the instructions included manual steps to create PBS_HOME was to later create the db_user file. This is no longer the case and PBS_HOME is created automatically.
That space is due to an empty argument being passed. I didn’t do the work to introduce “sameconf” into pbs_habitat and pbs_postinstall. That was done here: https://github.com/PBSPro/pbspro/pull/157
The pbs_postinstall script is checking for the presence of the server and scheduler binaries. You see the messages because we are performing a server install across the shared root.
Other notes:
Updated title to Cray X-series
The installation automatically enables the PBS_translate_mpp hook and adds vntype to the sched_config resources.
@iestockdale:
The procedure documents an overlay upgrade (only one server running). The RFE does not require documentation of migration upgrades. The prior installation instructions for Cray CLE systems also do not cover this per section 3.6 of the PBS Pro Installation Guide for v13.0. I added a note to the document.
@mkaro and @smgoosen, regarding upgrades, is this the correct summary:
. This EDD is only intended to document new install and overlay upgrade steps.
. Migration upgrade to PBS 17.x is not supported on Cray.
@smgoosen, the upgrade user story needs to be clarified or a new one for upgrade filed, as the support status of the two different types of upgrade is not covered and thus not clear.
@mkaro, I used the updated instructions and here are some observations:
Clean install has two options. The first option, which uses pbsdata, does not
show the command to use to start PBS unlike the second option.
Overlay upgrade from 13.0.401
All the contents of /etc/pbs.conf on MoM node remained the same but on the
server node (sdb) only the PBS_HOME value remained the same. It seems I had to:
boot # xtopview -n 30 -e "xtunspec /etc/pbs.conf" instead of
boot # xtopview -c login -e "xtunspec /etc/pbs.conf"
in order to unspec the /etc/pbs.conf on the mom.
There were errors starting MoM when I used xtopview -c login -e "xtunspec /etc/pbs.conf":
# /etc/init.d/pbs start
Starting PBS
/etc/init.d/pbs: line 315: /opt/pbs/default/bin/qstat: No such file or directory
*** /opt/pbs/default/libexec/pbs_habitat is missing.
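Both paths in these errors are under /opt/pbs/default, the old location, so one thing worth checking is whether the PBS_EXEC entry that the node sees still points there. That is only a guess at the cause. A minimal sketch of such a check follows: `pbs_exec_from_conf` and `check_pbs_exec` are hypothetical helper names, and the sketch assumes qstat lives in bin under PBS_EXEC.

```shell
#!/bin/sh
# pbs_exec_from_conf FILE
# Hypothetical helper: print the last PBS_EXEC value found in FILE.
pbs_exec_from_conf() {
    sed -n 's/^PBS_EXEC=//p' "$1" | tail -n 1
}

# check_pbs_exec FILE
# Hypothetical helper: succeed only if PBS_EXEC/bin/qstat exists and is
# executable; otherwise report the value and return non-zero.
check_pbs_exec() {
    exec_dir=$(pbs_exec_from_conf "$1")
    if [ -n "$exec_dir" ] && [ -x "$exec_dir/bin/qstat" ]; then
        echo "ok: $exec_dir"
    else
        echo "stale or missing PBS_EXEC: ${exec_dir:-<unset>}"
        return 1
    fi
}
```

On the MoM node this would be called as `check_pbs_exec /etc/pbs.conf`.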
Since I had used crayadm as the data service user when I installed 13.0.401, I
started the upgraded PBS this way on the sdb:
sdb:~ # PBS_DATA_SERVICE_USER=crayadm /etc/init.d/pbs start
Starting PBS
PBS Home directory /var/spool/PBS needs updating.
Running /opt/pbs/libexec/pbs_habitat to update it.
*** End of /opt/pbs/libexec/pbs_habitat
Home directory /var/spool/PBS updated.
/opt/pbs/sbin/pbs_comm ready (pid=14054), Proxy Name:sdb:17001, Threads:4
PBS comm
PBS sched
Connecting to PBS dataservice…connected to PBS dataservice@sdb
Server@sdb: recov_attr_db, unknown attribute “res_released_on_susp” discarded
Using license server at 6200@licenseserver
PBS server
--> Note the recov_attr_db, unknown attribute message: is it harmful or benign?
There seems to be a licensing issue.
When I installed PBS from mike0042’s branch, I could not get a license from the license server.
Then I installed 13.0.401 and got this from qstat -Bf:
pbs_license_info = 6200@licenseserver
license_count = Avail_Global:99999 Avail_Local:1 Used:0 High_Use:0 Avail_So
ckets:0 Unused_Sockets:0
pbs_version = PBSPro_13.0.401.160285
After upgrading to PBS from mike0042’s branch, I can no longer get a license:
pbs_license_info = 6200@licenseserver
license_count = Avail_Global:0 Avail_Local:0 Used:0 High_Use:0 Avail_Socket
s:0 Unused_Sockets:0
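To compare the before and after values without reading the wrapped output by eye, the Avail_Global figure can be pulled out of the `qstat -Bf` text. This is a minimal sketch: `avail_global` is a hypothetical helper name, and it assumes Avail_Global sits on the first line of license_count, as it does in the output above.

```shell
#!/bin/sh
# avail_global
# Hypothetical helper: reads `qstat -Bf` style text on stdin and prints
# the number that follows "Avail_Global:".
avail_global() {
    sed -n 's/.*Avail_Global:\([0-9][0-9]*\).*/\1/p'
}
```

Typical use would be `qstat -Bf | avail_global`, run once before and once after the upgrade.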
Older installations using the INSTALL script had the option of installing client commands only as well as communication only (in addition to “all” and execution only choices).
If communication only is desired I understand that one can modify the /etc/pbs.conf file to start just pbs_comm.
If client commands only is desired does one keep all the PBS_START_[daemon] = 0 in /etc/pbs.conf?
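To see at a glance which daemons a given pbs.conf would start, something like the following can help. It is a minimal sketch that only reports the PBS_START_* settings; it does not answer whether leaving them all at 0 is the supported way to get a client-only node. `enabled_daemons` is a hypothetical helper name, and the sketch assumes entries are written as PBS_START_NAME=1 with no surrounding spaces.

```shell
#!/bin/sh
# enabled_daemons FILE
# Hypothetical helper: print the NAME part of every PBS_START_NAME=1
# line in FILE, one per line. Prints nothing if all are set to 0.
enabled_daemons() {
    sed -n 's/^PBS_START_\([A-Z]*\)=1$/\1/p' "$1"
}
```

For a communication-only node, the expected report would be the single line COMM.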
Will there be installation and upgrade instructions for failover? I believe failover is supported on Cray.