Qsub from different location

Running qsub from inside the job folder (a mounted drive) or from the root directory gives different results:

1- [saf112092@ip-0A582E84 test14]$ qsub FeaJob.py (inside job folder)
2- [saf112092@ip-0A582E84 /]$ qsub /mnt/data/Public/pbs2/test14/FeaJob.py (outside job folder)

Case 1 works fine.
In case 2, stage-in and the run work fine, but stage-out doesn’t work.

In case 2, MOM LOG shows:
12/18/2024 14:26:55;0001;pbs_mom;Fil;copy_file;Job 26.ip-0A582E84: sys_copy failed, return value=1
12/18/2024 14:26:55;0004;pbs_mom;Fil;26.ip-0A582E84.OU;Unable to copy file 26.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o26
12/18/2024 14:26:55;0004;pbs_mom;Fil;26.ip-0A582E84.OU;ip-0A582E84: Connection refused
12/18/2024 14:26:55;0001;pbs_mom;Fil;stage_file;Job 26.ip-0A582E84: no wildcards:remote stageout failed for saf112092 from 26.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o26
12/18/2024 14:26:55;0100;pbs_mom;Job;26.ip-0A582E84;Job files not copied:---->>>>
12/18/2024 14:26:55;0100;pbs_mom;Job;26.ip-0A582E84;Unable to copy file 26.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o26

12/18/2024 14:26:55;0100;pbs_mom;Job;26.ip-0A582E84;>>> error from copy

12/18/2024 14:26:55;0100;pbs_mom;Job;26.ip-0A582E84;ip-0A582E84: Connection refused

12/18/2024 14:26:55;0100;pbs_mom;Job;26.ip-0A582E84;>>> end error output

Regards,

Please increase the MoM log level and check the detailed MoM log.
From the logs above, it looks like a firewall, blocked ports, or a network issue.
Staging out stdout/stderr to the location where qsub was run seems to be failing.
Also, sharing a snippet of your stagein and stageout attributes might help.
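
To see at a glance where MoM tried to deliver the files, the failed destinations can be pulled out of the log. This is only a minimal sketch, assuming the semicolon-separated MoM log format shown in the excerpt above; the function name `failed_copy_targets` is hypothetical and not part of PBS.

```python
import re

# Matches the message text "Unable to copy file <src> to <host>:<path>"
COPY_FAIL = re.compile(r"Unable to copy file (\S+) to ([^:\s]+):(\S+)")

def failed_copy_targets(log_text):
    """Return unique (source, host, path) tuples for failed copies, in log order."""
    seen = []
    for line in log_text.splitlines():
        # A MoM log record has six ';'-separated fields; the message is the last one.
        fields = line.split(";", 5)
        if len(fields) < 6:
            continue
        match = COPY_FAIL.search(fields[5])
        if match and match.groups() not in seen:
            seen.append(match.groups())
    return seen

# Two records copied from the MoM log of job 26
sample = (
    "12/18/2024 14:26:55;0004;pbs_mom;Fil;26.ip-0A582E84.OU;"
    "Unable to copy file 26.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o26\n"
    "12/18/2024 14:26:55;0100;pbs_mom;Job;26.ip-0A582E84;"
    "Unable to copy file 26.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o26\n"
)

for src, host, path in failed_copy_targets(sample):
    print(f"{src} -> host={host} path={path}")
```

For job 26 the extracted path is `//FeaJob.py.o26`, i.e. a file directly under the root directory, which is where qsub was run in case 2.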

I reran the two cases:
1- [saf112092@ip-0A582E84 test14]$ qsub FeaJob.py (job 27)
2- [saf112092@ip-0A582E84 /]$ qsub /mnt/data/Public/pbs2/test14/FeaJob.py (job 28)

In case 2 it looks like it is trying to copy:
/opt/pbs/sbin/pbs_rcp -rp 28.ip-0A582E84.OU
whereas it is supposed to use pbs_cp

In case 1 pbs_cp works fine

12/19/2024 12:00:05;0100;pbs_mom;Req;;Type 54 request received from root@10.88.46.132:15001, sock=0
12/19/2024 12:00:05;0080;pbs_mom;Job;27.ip-0A582E84;copy file request received
12/19/2024 12:00:05;0008;pbs_mom;Job;27.ip-0A582E84;created the job directory /home/saf112092/pbs.27.ip-0A582E84.x8z
12/19/2024 12:00:06;0100;pbs_mom;Job;27.ip-0A582E84;Staged 1/1 items in over 0:00:01
12/19/2024 12:00:07;0100;pbs_mom;Req;;Type 1 request received from root@10.88.46.132:15001, sock=0
12/19/2024 12:00:07;0100;pbs_mom;Req;;Type 3 request received from root@10.88.46.132:15001, sock=0
12/19/2024 12:00:07;0100;pbs_mom;Req;;Type 5 request received from root@10.88.46.132:15001, sock=0
12/19/2024 12:00:07;0008;pbs_mom;Job;27.ip-0A582E84;created the job directory /home/saf112092/pbs.27.ip-0A582E84.x8z
12/19/2024 12:00:07;0008;pbs_mom;Job;27.ip-0A582E84;Started, pid = 12666
12/19/2024 12:05:38;0080;pbs_mom;Job;27.ip-0A582E84;task 00000001 terminated
12/19/2024 12:05:38;0008;pbs_mom;Job;27.ip-0A582E84;Terminated
12/19/2024 12:05:38;0100;pbs_mom;Job;27.ip-0A582E84;task 00000001 cput=00:00:03
12/19/2024 12:05:38;0008;pbs_mom;Job;27.ip-0A582E84;kill_job
12/19/2024 12:05:38;0100;pbs_mom;Job;27.ip-0A582E84;ip-0A582E86 cput=00:00:03 mem=247480kb
12/19/2024 12:05:38;0100;pbs_mom;Job;27.ip-0A582E84;Obit sent
12/19/2024 12:05:39;0100;pbs_mom;Req;;Type 54 request received from root@10.88.46.132:15001, sock=0
12/19/2024 12:05:39;0080;pbs_mom;Job;27.ip-0A582E84;copy file request received
12/19/2024 12:05:39;0100;pbs_mom;Job;27.ip-0A582E84;Staged 3/3 items out over 0:00:00
12/19/2024 12:05:39;0008;pbs_mom;Job;27.ip-0A582E84;no active tasks
12/19/2024 12:05:39;0100;pbs_mom;Req;;Type 55 request received from root@10.88.46.132:15001, sock=0
12/19/2024 12:05:39;0080;pbs_mom;Job;27.ip-0A582E84;delete file request received
12/19/2024 12:05:39;0008;pbs_mom;Job;27.ip-0A582E84;no active tasks
12/19/2024 12:05:39;0100;pbs_mom;Req;;Type 6 request received from root@10.88.46.132:15001, sock=0
12/19/2024 12:05:39;0080;pbs_mom;Job;27.ip-0A582E84;delete job request received
12/19/2024 12:05:39;0008;pbs_mom;Job;27.ip-0A582E84;kill_job
12/19/2024 12:08:07;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 10.88.46.132:15001 on stream 0
12/19/2024 12:08:07;0002;pbs_mom;Svr;im_eof;Server closed connection.
12/19/2024 12:08:07;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at ip-0A582E84:15001, stream:1
12/19/2024 12:08:07;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 10.88.46.132:15001 on stream 1
12/19/2024 12:08:07;0002;pbs_mom;Svr;im_eof;Server closed connection.
12/19/2024 12:08:08;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Connection to pbs_comm ip-0A582E84:17001 down
12/19/2024 12:08:08;0001;pbs_mom;Svr;net_down_handler;net down handler called
12/19/2024 12:08:10;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Registering address 10.88.46.134:15003 to pbs_comm ip-0A582E84:17001
12/19/2024 12:08:10;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Connected to pbs_comm ip-0A582E84:17001
12/19/2024 12:08:10;0001;pbs_mom;Svr;net_restore_handler;net restore handler called
12/19/2024 12:08:14;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at ip-0A582E84:15001, stream:2
12/19/2024 12:08:14;0002;pbs_mom;Svr;pbs_mom;ReplyHello from server at 10.88.46.132:15001
12/19/2024 12:09:18;0100;pbs_mom;Req;;Type 54 request received from root@10.88.46.132:15001, sock=2
12/19/2024 12:09:18;0080;pbs_mom;Job;28.ip-0A582E84;copy file request received
12/19/2024 12:09:18;0008;pbs_mom;Job;28.ip-0A582E84;created the job directory /home/saf112092/pbs.28.ip-0A582E84.x8z
12/19/2024 12:09:18;0100;pbs_mom;Job;28.ip-0A582E84;Staged 1/1 items in over 0:00:00
12/19/2024 12:09:19;0100;pbs_mom;Req;;Type 1 request received from root@10.88.46.132:15001, sock=2
12/19/2024 12:09:19;0100;pbs_mom;Req;;Type 3 request received from root@10.88.46.132:15001, sock=2
12/19/2024 12:09:19;0100;pbs_mom;Req;;Type 5 request received from root@10.88.46.132:15001, sock=2
12/19/2024 12:09:19;0008;pbs_mom;Job;28.ip-0A582E84;created the job directory /home/saf112092/pbs.28.ip-0A582E84.x8z
12/19/2024 12:09:19;0008;pbs_mom;Job;28.ip-0A582E84;Started, pid = 12984
12/19/2024 12:12:32;0080;pbs_mom;Job;28.ip-0A582E84;task 00000001 terminated
12/19/2024 12:12:32;0008;pbs_mom;Job;28.ip-0A582E84;Terminated
12/19/2024 12:12:32;0100;pbs_mom;Job;28.ip-0A582E84;task 00000001 cput=00:00:01
12/19/2024 12:12:32;0008;pbs_mom;Job;28.ip-0A582E84;kill_job
12/19/2024 12:12:32;0100;pbs_mom;Job;28.ip-0A582E84;ip-0A582E86 cput=00:00:01 mem=81772kb
12/19/2024 12:12:32;0100;pbs_mom;Job;28.ip-0A582E84;Obit sent
12/19/2024 12:12:33;0100;pbs_mom;Req;;Type 54 request received from root@10.88.46.132:15001, sock=2
12/19/2024 12:12:33;0080;pbs_mom;Job;28.ip-0A582E84;copy file request received
12/19/2024 12:13:04;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 28.ip-0A582E84.OU saf112092@ip-0a582e84://FeaJob.py.o28 status=1, try=1
12/19/2024 12:13:35;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 28.ip-0A582E84.OU saf112092@ip-0a582e84://FeaJob.py.o28 status=1, try=2
12/19/2024 12:14:17;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 28.ip-0A582E84.OU saf112092@ip-0a582e84://FeaJob.py.o28 status=1, try=3
12/19/2024 12:14:48;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 28.ip-0A582E84.OU saf112092@ip-0a582e84://FeaJob.py.o28 status=1, try=4
12/19/2024 12:15:09;0001;pbs_mom;Fil;copy_file;Job 28.ip-0A582E84: sys_copy failed, return value=1
12/19/2024 12:15:09;0004;pbs_mom;Fil;28.ip-0A582E84.OU;Unable to copy file 28.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o28
12/19/2024 12:15:09;0004;pbs_mom;Fil;28.ip-0A582E84.OU;ip-0A582E84: Connection refused
12/19/2024 12:15:09;0001;pbs_mom;Fil;stage_file;Job 28.ip-0A582E84: no wildcards:remote stageout failed for saf112092 from 28.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o28
12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;Job files not copied:---->>>>
12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;Unable to copy file 28.ip-0A582E84.OU to ip-0a582e84://FeaJob.py.o28

12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;>>> error from copy

12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;ip-0A582E84: Connection refused

12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;>>> end error output

12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;---->>>>
12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;Staged 0/3 items out over 0:02:36
12/19/2024 12:15:09;0008;pbs_mom;Job;28.ip-0A582E84;no active tasks
12/19/2024 12:15:09;0080;pbs_mom;Req;req_reject;Reject reply code=15051, aux=0, type=54, from root@10.88.46.132:15001
12/19/2024 12:15:09;0100;pbs_mom;Req;;Type 55 request received from root@10.88.46.132:15001, sock=2
12/19/2024 12:15:09;0080;pbs_mom;Job;28.ip-0A582E84;delete file request received
12/19/2024 12:15:09;0008;pbs_mom;Job;28.ip-0A582E84;no active tasks
12/19/2024 12:15:09;0100;pbs_mom;Req;;Type 6 request received from root@10.88.46.132:15001, sock=2
12/19/2024 12:15:09;0080;pbs_mom;Job;28.ip-0A582E84;delete job request received
12/19/2024 12:15:09;0008;pbs_mom;Job;28.ip-0A582E84;kill_job
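
To compare the two jobs without reading the whole log, the stage-out result and the copy attempts can be summarized per job. A rough sketch, assuming the log format above and assuming that a `sys_copy` record belongs to the job named in the most recent `Job` record (the log itself does not state this); `summarize_stageout` is a hypothetical helper name.

```python
import re

STAGED_OUT = re.compile(r"Staged (\d+)/(\d+) items out")
SYS_COPY = re.compile(r"command: (\S+) .*try=(\d+)")

def summarize_stageout(log_text):
    """Per job id: (staged, total) for stage-out, copy command used, highest try."""
    summary = {}
    current_job = None
    for line in log_text.splitlines():
        fields = line.split(";", 5)
        if len(fields) < 6:
            continue
        obj_type, obj_name, msg = fields[3], fields[4], fields[5]
        if obj_type == "Job":
            current_job = obj_name
        if current_job is None:
            continue
        entry = summary.setdefault(
            current_job, {"staged_out": None, "copy_cmd": None, "copy_tries": 0}
        )
        staged = STAGED_OUT.search(msg)
        if obj_type == "Job" and staged:
            entry["staged_out"] = (int(staged.group(1)), int(staged.group(2)))
        attempt = SYS_COPY.match(msg)
        if obj_name == "sys_copy" and attempt:
            # Assumption: attribute the attempt to the last job seen in the log.
            entry["copy_cmd"] = attempt.group(1)
            entry["copy_tries"] = max(entry["copy_tries"], int(attempt.group(2)))
    return summary

# A few records copied from the log above (jobs 27 and 28)
sample = "\n".join([
    "12/19/2024 12:05:39;0080;pbs_mom;Job;27.ip-0A582E84;copy file request received",
    "12/19/2024 12:05:39;0100;pbs_mom;Job;27.ip-0A582E84;Staged 3/3 items out over 0:00:00",
    "12/19/2024 12:12:33;0080;pbs_mom;Job;28.ip-0A582E84;copy file request received",
    "12/19/2024 12:14:48;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp "
    "28.ip-0A582E84.OU saf112092@ip-0a582e84://FeaJob.py.o28 status=1, try=4",
    "12/19/2024 12:15:09;0100;pbs_mom;Job;28.ip-0A582E84;Staged 0/3 items out over 0:02:36",
])

for job, info in summarize_stageout(sample).items():
    print(job, info)
```

On these records, job 27 shows 3/3 items staged out with no `sys_copy` attempts logged, while job 28 shows 0/3 items after four `pbs_rcp` tries.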

PBS Script:
#!/shared/apps/Python3.12/bin/python3.12

# declare a private sandbox.
# all files will be copied to private directory in C:\Users\safxxxxx\Documents\PBS Pro\pbs.####.awesv0061.x8z
#PBS -W sandbox=private

# Specify which directory to copy for input and where to get the output

#PBS -W stagein=‘.@10.88.46.132:/mnt/data/Public/pbs2/test14/
#PBS -W stageout='
@10.88.46.132:/mnt/data/Public/pbs2/test14/’

# request Abaqus resources
#PBS -l select=1:ncpus=1:mem=2gb:vm_size=Standard_D2ds_v5
#PBS -l lic_abaqus_q=5

# Specify user account
#PBS -A saf112092

# Specify work queue
#PBS -q workq

Tracejob 28:
Job: 28.ip-0A582E84

12/19/2024 12:09:18 L Considering job to run
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_max: entered for workq
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_max: exiting, ret 0 [max_queued limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_queued: entered for workq
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_queued: exiting, ret 0 [queued_jobs_threshold limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_max: entered for server
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_max: exiting, ret 0 [max_queued limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_queued: entered for server
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_ct_limit_queued: exiting, ret 0 [queued_jobs_threshold limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_max: entered for workq, alt_res (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_max: exiting, ret 0 [max_queued_res limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_queued: entered for workq, alt_res (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_queued: exiting, ret 0 [queued_jobs_threshold_res limit not set for
workq]
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_max: entered for server, alt_res (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_max: exiting, ret 0 [max_queued_res limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_queued: entered for server, alt_res (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: check_entity_resc_limit_queued: exiting, ret 0 [queued_jobs_threshold_res limit not set for
server]
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: entered, INCR on server ip-0A582E84, op_flag f, alt_res_ptr (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_ct_sum_max: exiting, ret 0 [max_queued limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_ct_sum_queued: exiting, ret 0 [queued_jobs_threshold limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_max: entered [alt_res (nil)]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_max: exiting, ret 0 [max_queued_res limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: entered [alt_res (nil)]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: exiting, ret 0 [queued_jobs_threshold_res limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: entered, INCR on queue workq, op_flag f, alt_res_ptr (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_ct_sum_max: exiting, ret 0 [max_queued limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_ct_sum_queued: exiting, ret 0 [queued_jobs_threshold limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_max: entered [alt_res (nil)]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_max: exiting, ret 0 [max_queued_res limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: entered [alt_res (nil)]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: exiting, ret 0 [queued_jobs_threshold_res limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0
12/19/2024 12:09:18 S Job Queued at request of saf112092@ip-0a582e84, owner = saf112092@ip-0a582e84, job name = FeaJob.py, queue =
workq
12/19/2024 12:09:18 S Job Run at request of Scheduler@ip-0a582e84 on exec_vnode (ip-0A582E86:ncpus=1:mem=2097152kb)
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: entered, DECR on server ip-0A582E84, op_flag 7, alt_res_ptr (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_ct_sum_queued: exiting, ret 0 [queued_jobs_threshold limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: entered [alt_res (nil)]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: exiting, ret 0 [queued_jobs_threshold_res limit not set for server]
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: entered, DECR on queue workq, op_flag 7, alt_res_ptr (nil)
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_ct_sum_queued: exiting, ret 0 [queued_jobs_threshold limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: entered [alt_res (nil)]
12/19/2024 12:09:18 S ET_LIM_DBG: set_entity_resc_sum_queued: exiting, ret 0 [queued_jobs_threshold_res limit not set for workq]
12/19/2024 12:09:18 S ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0
12/19/2024 12:09:18 L Job run
12/19/2024 12:09:18 S Updated job state to 81 and substate to 11
12/19/2024 12:09:18 S enqueuing into workq, state Q hop 1
12/19/2024 12:09:18 S Updated job state to 82 and substate to 15
12/19/2024 12:09:19 S Updated job state to 82 and substate to 41
12/19/2024 12:09:21 S Received session ID for job: 12984
12/19/2024 12:09:21 S Updated job state to 82 and substate to 42
12/19/2024 12:09:31 S Received the same SID as before: 12984
12/19/2024 12:09:47 S Received the same SID as before: 12984
12/19/2024 12:10:10 S Received the same SID as before: 12984
12/19/2024 12:10:38 S Received the same SID as before: 12984
12/19/2024 12:11:12 S Received the same SID as before: 12984
12/19/2024 12:11:52 S Received the same SID as before: 12984
12/19/2024 12:12:33 S Obit received momhop:1 serverhop:1 state:R substate:42
12/19/2024 12:12:33 S Updated job state to 69 and substate to 50
12/19/2024 12:12:33 S Updated job state to 69 and substate to 51
12/19/2024 12:15:09 S Post job file processing error
12/19/2024 12:15:09 S Updated job state to 69 and substate to 52
12/19/2024 12:15:09 S Updated job state to 69 and substate to 53
12/19/2024 12:15:09 S Exit_status=0 resources_used.cpupercent=2 resources_used.cput=00:00:01 resources_used.mem=81772kb
resources_used.ncpus=1 resources_used.vmem=272868kb resources_used.walltime=00:03:13
12/19/2024 12:15:09 S ET_LIM_DBG: account_entity_limit_usages: entered, DECR on server ip-0A582E84, op_flag b, alt_res_ptr (nil)
12/19/2024 12:15:09 S ET_LIM_DBG: set_entity_ct_sum_max: exiting, ret 0 [max_queued limit not set for server]
12/19/2024 12:15:09 S ET_LIM_DBG: set_entity_resc_sum_max: entered [alt_res (nil)]
12/19/2024 12:15:09 S ET_LIM_DBG: set_entity_resc_sum_max: exiting, ret 0 [max_queued_res limit not set for server]
12/19/2024 12:15:09 S ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0
12/19/2024 12:15:09 S ET_LIM_DBG: account_entity_limit_usages: entered, DECR on queue workq, op_flag b, alt_res_ptr (nil)
12/19/2024 12:15:09 S ET_LIM_DBG: set_entity_ct_sum_max: exiting, ret 0 [max_queued limit not set for workq]
12/19/2024 12:15:09 S ET_LIM_DBG: set_entity_resc_sum_max: entered [alt_res (nil)]
12/19/2024 12:15:09 S ET_LIM_DBG: set_entity_resc_sum_max: exiting, ret 0 [max_queued_res limit not set for workq]
12/19/2024 12:15:09 S ET_LIM_DBG: account_entity_limit_usages: exiting, ret_error 0

Thank you for sharing the details and information above.
Please update /etc/pbs.conf on the server and on the compute nodes as below, in the same order:

PBS_RCP=/bin/false
PBS_SCP=/bin/scp
PBS_RSHCOMMAND=/bin/ssh

Then restart the PBS services and try again.

If you would like to use “cp”, then you would need to use the
$usecp attribute in $PBS_HOME/mom_priv/config.

I don’t want to use RCP, SCP or SSH. I configured pbs_cp as follows:

/etc/pbs.conf:
PBS_SERVER=ip-0A582E84
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited

PBS_SCP=/bin/scp

PBS_CP=/usr/bin/pbs_cp

/usr/bin/pbs_cp:
#!/bin/sh
/usr/bin/cp -r $2 $3

/var/spool/pbs/mom_priv/config:
$restrict_user_maxsysid 999
$usecp *:/mnt/data/ /mnt/data/
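
One possible reading of the logs is that the case 2 destination (`ip-0a582e84://FeaJob.py.o28`) does not start with the `/mnt/data/` prefix of the `$usecp` rule, so the rule never applies and MoM falls back to a remote copy. The sketch below is a simplified model of that prefix matching, not MoM's actual implementation; `parse_usecp` and `local_target` are hypothetical names, and the job 27 destination is an assumed example, since the log does not print it.

```python
from fnmatch import fnmatch

def parse_usecp(config_text):
    """Collect (host_pattern, source_prefix, local_prefix) from $usecp lines."""
    rules = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "$usecp":
            host_pattern, _, source_prefix = parts[1].partition(":")
            rules.append((host_pattern, source_prefix, parts[2]))
    return rules

def local_target(destination, rules):
    """Return the local path a matching rule maps to, or None if no rule matches."""
    host, _, path = destination.partition(":")
    for host_pattern, source_prefix, local_prefix in rules:
        if fnmatch(host, host_pattern) and path.startswith(source_prefix):
            return local_prefix + path[len(source_prefix):]
    return None

# mom_priv/config content as posted above
config = "$restrict_user_maxsysid 999\n$usecp *:/mnt/data/ /mnt/data/\n"
rules = parse_usecp(config)

# Assumed case 1 destination (qsub run inside the job folder)
print(local_target("ip-0a582e84:/mnt/data/Public/pbs2/test14/FeaJob.py.o27", rules))
# Case 2 destination as it appears in the MoM log
print(local_target("ip-0a582e84://FeaJob.py.o28", rules))
```

In this model the first destination maps to a local path under `/mnt/data/`, while the second one matches no rule and returns `None`.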

I found a workaround. Since I am using a Python script to automate the PBS requests with subprocess, specifying the cwd makes it work:

import subprocess
from pathlib import Path

request = 'qsub /mnt/data/Public/pbs2/test14/FeaJob.py'
path = Path(request.split(' ')[1])
cmd = request.split(' ')[0] + ' ' + path.name
# Run qsub from the job folder so the submission behaves like case 1
proc = subprocess.run(cmd,
                      cwd=path.parent,
                      stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                      universal_newlines=True,
                      shell=True)

This way, the submission always behaves like case 1.
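
The same idea can be written without `shell=True` by passing an argument list. This is just a sketch of a variant, assuming the request string always has exactly two tokens (`qsub` and the script path); `build_qsub_call` and `submit` are hypothetical helper names.

```python
import shlex
import subprocess
from pathlib import Path

def build_qsub_call(request):
    """Split 'qsub /path/to/script' into (argv, cwd), using the script's folder as cwd."""
    # Assumes exactly two tokens; anything else raises ValueError on unpacking.
    program, script = shlex.split(request)
    path = Path(script)
    return [program, path.name], path.parent

def submit(request):
    """Run qsub from the job folder and return the completed process."""
    argv, cwd = build_qsub_call(request)
    return subprocess.run(argv, cwd=cwd, capture_output=True, text=True)

argv, cwd = build_qsub_call("qsub /mnt/data/Public/pbs2/test14/FeaJob.py")
print(argv, cwd)
```

Calling `submit(request)` then runs `qsub FeaJob.py` with the job folder as the working directory, as in the snippet above.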

What would change in my case if I changed pbs.conf as follows?
PBS_RCP=/bin/false
PBS_SCP=/bin/scp
PBS_RSHCOMMAND=/bin/ssh

Bertrand.

Please use the lines in this order: we would like to avoid trying pbs_rcp

PBS_RCP=/bin/false
PBS_CP=/usr/bin/pbs_cp
PBS_RSHCOMMAND=/bin/ssh
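
Since the same settings are meant to be present on the server and on the compute nodes, it may help to print only the copy-related entries of each node's pbs.conf and compare them. A small sketch, assuming the plain KEY=VALUE layout shown earlier in this thread; `copy_settings` is a hypothetical helper name.

```python
def copy_settings(conf_text):
    """Return the copy-related KEY=VALUE entries found in pbs.conf text."""
    wanted = ("PBS_RCP", "PBS_SCP", "PBS_CP", "PBS_RSHCOMMAND")
    found = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in wanted:
            found[key] = value
    return found

# The pbs.conf posted earlier in this thread
conf = """PBS_SERVER=ip-0A582E84
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs

PBS_SCP=/bin/scp

PBS_CP=/usr/bin/pbs_cp
"""

print(copy_settings(conf))
# On a real node the text could be read with open("/etc/pbs.conf").read()
```

For the file posted earlier, only `PBS_SCP` and `PBS_CP` show up, so `PBS_RCP` and `PBS_RSHCOMMAND` were not set at that point.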

Thanks, I did it.
It is not working either:
12/19/2024 17:33:57;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 33.ip-0A582E84.OU saf112092@ip-0a582e84:/opt/job_serv/FeaJob.py.o33 status=1, try=1
12/19/2024 17:34:28;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 33.ip-0A582E84.OU saf112092@ip-0a582e84:/opt/job_serv/FeaJob.py.o33 status=1, try=2
12/19/2024 17:35:10;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 33.ip-0A582E84.OU saf112092@ip-0a582e84:/opt/job_serv/FeaJob.py.o33 status=1, try=3
12/19/2024 17:35:41;0080;pbs_mom;Fil;sys_copy;command: /opt/pbs/sbin/pbs_rcp -rp 33.ip-0A582E84.OU saf112092@ip-0a582e84:/opt/job_serv/FeaJob.py.o33 status=1, try=4

It looks like the stageout is pointing to the wrong folder. It is pointing to the folder from which I submitted the job:
[saf112092@ip-0A582E84 job_serv]$ qsub /mnt/data/Public/pbs2/test14/FeaJob.py

However the pbs script states:
#!/shared/apps/Python3.12/bin/python3.12
#PBS -W sandbox=private
#PBS -W stagein=‘.@10.88.46.132:/mnt/data/Public/pbs2/test14/
#PBS -W stageout='
@10.88.46.132:/mnt/data/Public/pbs2/test14/’

Did you change /etc/pbs.conf on the compute node as well and restart the PBS services?

#PBS -S /shared/apps/Python3.12/bin/python3.12

Could you please check whether you can use file names instead of wildcard characters for stagein?

#PBS -W stagein='file1.txt@10.88.46.132:/mnt/data/Public/pbs2/test14/file1.txt,inputfile.txt@10.88.46.132:/mnt/data/Public/pbs2/test14/inputfile.txt'
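
If the job scripts are generated from Python anyway, a directive with explicit file names can be built from a list. A minimal sketch, assuming the `local_file@host:remote_file` pair format used in the line above; `stage_directive` is a hypothetical helper, and the file names are the placeholders from that example.

```python
def stage_directive(kind, filenames, host, remote_dir):
    """Build a '#PBS -W stagein=...' or '#PBS -W stageout=...' line from file names."""
    if kind not in ("stagein", "stageout"):
        raise ValueError("kind must be 'stagein' or 'stageout'")
    remote_dir = remote_dir.rstrip("/")
    # One local_file@host:remote_file pair per name, comma-separated
    pairs = [f"{name}@{host}:{remote_dir}/{name}" for name in filenames]
    return f"#PBS -W {kind}='" + ",".join(pairs) + "'"

line = stage_directive(
    "stagein",
    ["file1.txt", "inputfile.txt"],
    "10.88.46.132",
    "/mnt/data/Public/pbs2/test14/",
)
print(line)
```

This reproduces the stagein line suggested above from the two example file names.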