PP-289: unique job ids up to 1 trillion

varunsonkar · March 15, 2017, 9:14am

Hi All,

Following is the link to the design document for extending unique job IDs up to one trillion:
https://pbspro.atlassian.net/wiki/display/PD/PP-289%3A+unique+job+ids+up+to+1+trillion

Please review the proposed design and provide your comments/feedback.

Regards,
Varun

mkaro · March 15, 2017, 7:13pm

Hi @varunsonkar,

So the purpose of setting the limit is to restrict field width, as opposed to using the maximum number a 64bit integer could actually hold? Please note that the old maximum could fit within a 32bit integer, while the new one requires a 64bit integer. This is actually a big concern for 32bit systems. Do we need to explore this in the design?

Thanks,

Mike

bhroam · March 15, 2017, 7:17pm

Hi Varun,
I usually like the idea of not hard coding values into PBS and providing a knob. We usually get ourselves into trouble when just hard coding a value. I’m not sure this is one of those cases. What value does the admin get from turning this knob? Why wouldn’t we just set it to something huge (like maybe the max of an unsigned long)? Is there a case when they’ll want to turn this down? Why not always have unique job ids if we can?

Bhroam

vinodchitrali · March 16, 2017, 10:32am

I think @bhroam is right.
Also, exposing this attribute leads to more inconsistency in job IDs, for example: “What if the value is set lower than the current job ID?”

varunsonkar · March 17, 2017, 8:02am

Hi @bhroam,
Thanks for the suggestion/feedback.
We no longer allow setting or unsetting the job’s maximum ID. This is the same behavior as before; we are only increasing the range from 10 million to 1 trillion.
We believe this limit is high enough for sites: a simple calculation suggests that a site submitting around 10k jobs per second continuously would take over 3 years to consume 1 trillion job IDs.
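The estimate above can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the helper name `years_to_exhaust` and the 365-day year are assumptions for illustration, not PBS code.

```c
#include <stdint.h>

/* Seconds in a non-leap year: 365 days * 24 h * 3600 s. */
#define SECONDS_PER_YEAR (365LL * 24 * 3600)

/*
 * Whole years needed to consume `id_space` job IDs at a steady
 * `jobs_per_sec` submission rate (hypothetical helper, not PBS code).
 * Returns -1 if the rate is not positive.
 */
int64_t years_to_exhaust(int64_t id_space, int64_t jobs_per_sec)
{
    if (jobs_per_sec <= 0)
        return -1;
    return id_space / jobs_per_sec / SECONDS_PER_YEAR;
}
```

With 1 trillion IDs and 10,000 jobs per second this gives 3 whole years (10^8 seconds is roughly 3.17 years), which matches the "over 3 years" figure.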
We have now updated the design document accordingly. Please review it.
Regards,
Varun

varunsonkar · March 17, 2017, 8:06am

Hi @mkaro,
Thanks for your reply.
We could not fully understand your point about “This is actually a big concern for 32bit systems”.
Could you please elaborate a little more on the possible use case you are referring to?
Regards,
Varun

mkaro · March 17, 2017, 4:46pm

Are there cases in the PBS Pro source code where the sequence number of a job is stored within an int or a long? If so, any sequence number larger than 2147483647 (2^31 - 1) could cause a problem for 32bit processors or systems using 32bit compilers. We’ll need to make sure that we don’t overflow in these cases. As a reference:
https://en.wikibooks.org/wiki/C_Programming/C_Reference/limits.h
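A minimal sketch of the kind of range check this implies, using the limits from limits.h. The function name `seq_fits_in_int` is hypothetical; it only illustrates that a 12-digit sequence number does not fit in an int on platforms where int is 32 bits.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true when `seq` can be stored in a plain int without overflow.
 * Hypothetical helper for illustration; not taken from the PBS Pro source.
 */
bool seq_fits_in_int(long long seq)
{
    return seq >= INT_MIN && seq <= INT_MAX;
}
```

Where int is 32 bits, the old maximum (9999999) passes this check and the proposed maximum (999999999999) does not.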

jon · March 23, 2017, 3:54am

Is this only a problem on a 32bit system or compiler if the system grows to more than 2B jobs? Also, do we want to continue to support 32bit systems?

subhasisb · March 23, 2017, 4:42am

Most places inside PBS treat the jobid as a char array, e.g. the database column is a TEXT field, the ji_jobid member in the job_qs structure is a char array as well, and so are the database queries around it.

CREATE TABLE pbs.job (
ji_jobid TEXT NOT NULL,
ji_sv_name TEXT NOT NULL,

char ji_jobid[PBS_MAXSVRJOBID+1]; /* job identifier */ --> ji_qs

struct pbs_db_job_info {
char ji_jobid[PBS_MAXSVRJOBID + 1]; /* job identifier */
char ji_sv_name[PBS_MAXSERVERNAME + 1]; /* server id */

So the (possibly only) part we have to be careful about is how we increment the id to get the next one in sequence, that is, what type of variable we use for it. One idea was to simply generate it from a sequence in the database, which can handle huge values anyway (even on 32 bit systems).
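A sketch of what an in-memory increment could look like with a 64-bit type. The name `next_seq`, the `MAX_SEQ` constant, and the wrap-to-zero behavior are assumptions for illustration; the actual server logic may differ.

```c
/* Largest 12-digit sequence number, one below 1 trillion (assumed limit). */
#define MAX_SEQ 999999999999LL

/*
 * Return the sequence number that follows `cur`, wrapping to 0 after
 * MAX_SEQ. Out-of-range input also restarts at 0.
 * Hypothetical helper; not taken from the PBS Pro source.
 */
long long next_seq(long long cur)
{
    if (cur < 0 || cur >= MAX_SEQ)
        return 0;
    return cur + 1;
}
```

Because long long is at least 64 bits, the step from 2147483647 to 2147483648 is safe here even when int is 32 bits.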

Also as Jon pointed out, we probably don’t need to support 32 bit systems anyway.

mkaro · March 23, 2017, 5:00am

In src/include/server.h the variable sv_jobidnumber is defined as an int. If this is a 32 bit compiler (which is likely for Windows) this value could overflow. We would be safe to use a “long long” or “long long int” to represent this number even for 32 bit hardware and compilers.

varunsonkar · March 23, 2017, 5:00am

Hi @subhasisb,
Thanks for the comment.
I think “PBS_MAXSVRJOBID” and the variables declared using it are not a problem. As you can see, the following is how “PBS_MAXSVRJOBID” is defined:
#define PBS_MAXSVRJOBID (PBS_MAXSEQNUM - 1 + PBS_MAXSERVERNAME + PBS_MAXPORTNUM + 2) /* server job id size, -1 to keep same length when made SEQ 7 */
Which has worked fine for 32 bit systems.
Having said that, there are some places where variables like “sv_jobidnumber” have been declared as “int”, in the database as well, which can cause issues on 32 bit.
We are analyzing the code for the possible effects and a probable solution.
Regards,
Varun

varunsonkar · March 23, 2017, 5:05am

Thanks @mkaro,
Yes we are also moving ahead with our analysis with the same approach (long long).
But we want to be thorough and make sure we are not breaking anything, for example where sizeof(*pointer) is used or a comparison is done.
Up till now we have not found anything alarming when we use “long long”.
Once we are done with analysis we will update the same.
Regards,
Varun

bhroam · March 23, 2017, 9:43pm

Just a quick note: the scheduler queries the job id and shoves it into an int. Look at src/scheduler/data_types.h in the structure job_info
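A sketch of parsing the numeric part of a job ID into a 64-bit value instead of an int. It assumes the job ID string starts with the decimal sequence number (for example "123456789012.server"); the function name `parse_job_seq` is hypothetical and not the scheduler's actual code.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the leading numeric part of a job ID such as "123456789012.server".
 * Returns the sequence number, or -1 if no digits are found, the value is
 * negative, or it does not fit in a long long.
 */
long long parse_job_seq(const char *jobid)
{
    char *end;
    long long seq;

    if (jobid == NULL)
        return -1;
    errno = 0;
    seq = strtoll(jobid, &end, 10);
    if (end == jobid || errno == ERANGE || seq < 0)
        return -1;
    return seq;
}
```

Using strtoll rather than atoi keeps the full 12-digit range and reports overflow through errno instead of silently truncating.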

varunsonkar · March 24, 2017, 4:31am

Hi @bhroam,
Thanks for the input. We will analyze and handle it accordingly.

jon · March 30, 2017, 4:07am

After thinking about this more, I don’t feel that we need to support 32 bit processors. My vote is we move past the 32 bit era to 64 bit. However, if the community feels like we have to support 32 bit processors/compilers then it makes sense to set the max to 2147483647 (2^31 - 1). The other thought is that if it significantly reduces risk and development time to just increase the max job id to 2147483647 then maybe we should consider reducing the max job id to the max 32 bit int value.

ashwathraop · March 30, 2017, 4:48am

Changing PBS_MAXSEQNUM from 7 to 12 might affect how output/error files are named, as their name structure is “{JOB_NAME}.{JOBID}o” or “{JOB_NAME}.{JOBID}e”. Since the system limit for max_file_name_length is 255, we may have to limit the job name length to accommodate the new job id length.
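A rough sketch of the length budget described above, assuming the name layout “{JOB_NAME}.{JOBID}o” and the 255-character limit quoted in this post. The helper `max_job_name_len` and the `MAX_FILE_NAME_LEN` constant are hypothetical names.

```c
#include <stddef.h>

#define MAX_FILE_NAME_LEN 255 /* file name limit quoted in the thread */

/*
 * Characters left for the job name in "{JOB_NAME}.{JOBID}o" once the dot,
 * the job ID, and the one-letter suffix are accounted for.
 * Returns 0 if the job ID alone already uses up the limit.
 */
size_t max_job_name_len(size_t jobid_len)
{
    size_t overhead = jobid_len + 2; /* '.' separator plus 'o' or 'e' */

    if (overhead >= MAX_FILE_NAME_LEN)
        return 0;
    return MAX_FILE_NAME_LEN - overhead;
}
```

Counting only the sequence digits, going from 7 to 12 characters shrinks the job name budget from 246 to 241 characters; a full job ID that includes the server name would reduce it further.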

varunsonkar · March 30, 2017, 5:39am

Hi @ashwathraop,
Thanks for the input. We will take care of this as well.

mkaro · March 30, 2017, 3:59pm

The real issue here is identifying the underlying C data type we use for representing the sequence number. I believe that long long int or long long unsigned is the correct choice because they are the same size (64 bits) regardless of the underlying architecture or the compiler being used.

bhroam · March 30, 2017, 9:51pm

I agree with Jon that we should leave 32bit architectures behind. It has been quite a long time since chips/OSes became 64bit. While Mike’s point is well taken that we could use a long long, I suggest we use an unsigned long. I think long longs are 128bits these days. That’s a bit overkill. If we use an unsigned long and a long is 32bits, it’ll just cycle back around to 0 earlier than we intended all on its own.
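A small sketch for checking both points on a given compiler: how wide long long actually is, and what happens when a 32-bit unsigned counter overflows. The helper names are hypothetical. The C standard only guarantees that long long is at least 64 bits, so the first function reports whatever the compiler in use provides.

```c
#include <limits.h>
#include <stdint.h>

/* Width of long long in bits on the compiler building this code. */
unsigned int long_long_bits(void)
{
    return (unsigned int)(sizeof(long long) * CHAR_BIT);
}

/* Increment a 32-bit unsigned counter; unsigned arithmetic wraps to 0. */
uint32_t next_u32(uint32_t counter)
{
    return (uint32_t)(counter + 1u);
}
```

Unsigned overflow is well defined in C (the result is reduced modulo 2^32 here), whereas signed overflow is undefined behavior, which is one argument for an unsigned counter if wrap-around is intended.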

Bhroam

billnitzberg · March 30, 2017, 10:45pm

One question and one suggestion:

Question: Should we consider either (a) no upper bound, and/or (b) extending Job IDs to alphanumerics?

Suggestion: If the intention is to have 64 bits, it is best to be explicit:

#include <stdint.h>

int64_t job_id;

And, I believe the above is part of the C99 standard, so it should work most everywhere. (Of course, the C99 standard also mandates that long long int is at least 64 bits, so if it’s just a minimum we want, that would also work.)
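A sketch of how an explicit int64_t job ID could be formatted portably. The `PRId64` macro from inttypes.h expands to the right printf conversion for the platform; the function name `format_job_id` is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write `job_id` in decimal into `buf`. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if `buf` is too small.
 */
int format_job_id(char *buf, size_t buflen, int64_t job_id)
{
    int n = snprintf(buf, buflen, "%" PRId64, job_id);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

Using the fixed-width type together with its matching format macro avoids guessing between "%d", "%ld", and "%lld" across 32-bit and 64-bit builds.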
