Hi all,
After running the following job script, no output file or error file is created.
#!/bin/bash
### The job name
#PBS -N moose_test
### The number of nodes and processors per node
#PBS -l nodes=node03:ppn=56
### The maximum time for job running
#PBS -l walltime=48:00:00
### The standard output and error
#PBS -j oe
### The queue for job running
#conda activate moose
#
cd $PBS_O_WORKDIR
NSLOTS=`cat ${PBS_NODEFILE} | wc -l`
echo "This job is "$PBS_JOBID@$PBS_QUEUE
uniq -c $PBS_NODEFILE |awk '{print $2":"$1}'
#
date
time mpiexec -n 56 ./*opt -i ./test/tests/kernels/simple_diffusion/simple_diffusion.i
date
exit 0
Hi, here is the output of the commands you mentioned:
qstat -a
node01:
                                                                                  Req'd     Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory    Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
867.node01              DengChaoQun batch    moose_test       29802  1     56     --        48:00:00  C --
qstat -xf
<?xml version="1.0"?>
<Data><Job><Job_Id>867.node01</Job_Id><Job_Name>moose_test</Job_Name><Job_Owner>DengChaoQun@node01</Job_Owner><resources_used><cput>00:02:32</cput><vmem>0kb</vmem><walltime>00:00:08</walltime><mem>0kb</mem><energy_used>0</energy_used></resources_used><job_state>C</job_state><queue>batch</queue><server>node01</server><Checkpoint>u</Checkpoint><ctime>1707221317</ctime><Error_Path>node01:/home/DengChaoQun/projects/bees2024/moose_test.e867</Error_Path><exec_host>node03/0-55</exec_host><Hold_Types>n</Hold_Types><Join_Path>oe</Join_Path><Keep_Files>n</Keep_Files><Mail_Points>a</Mail_Points><mtime>1707221329</mtime><Output_Path>node01:/home/DengChaoQun/projects/bees2024/moose_test.o867</Output_Path><Priority>0</Priority><qtime>1707221317</qtime><Rerunable>True</Rerunable><Resource_List><nodes>node03:ppn=56</nodes><walltime>48:00:00</walltime><nodect>1</nodect></Resource_List><session_id>29802</session_id><Variable_List>PBS_O_QUEUE=batch,PBS_O_HOME=/home/DengChaoQun,PBS_O_LOGNAME=DengChaoQun,PBS_O_PATH=/home/DengChaoQun/mpich-4.0.2/install/bin:/home/DengChaoQun/gcc-13.1.0/gcc-install/bin:/opt/torque/bin:/opt/torque/sbin:/home/DengChaoQun/.vscode-server/bin/8b3775030ed1a69b13e4f4c628c612102e30a681/bin/remote-cli:/home/DengChaoQun/mpich-4.0.2/install/bin:/home/DengChaoQun/gcc-13.1.0/gcc-install/bin:/opt/rh/devtoolset-9/root/usr/bin:/opt/torque/bin:/opt/torque/sbin/opt/intel/oneapi/vtune/2022.2.0/bin64:/opt/intel/oneapi/vpl/2022.1.0/bin:/opt/intel/oneapi/mpi/2021.6.0/libfabric/bin:/opt/intel/oneapi/mpi/2021.6.0/bin:/opt/intel/oneapi/mkl/2022.1.0/bin/intel64:/opt/intel/oneapi/itac/2021.6.0/bin:/opt/intel/oneapi/inspector/2022.1.0/bin64:/opt/intel/oneapi/dpcpp-ct/2022.1.0/bin:/opt/intel/oneapi/dev-utilities/2021.6.0/bin:/opt/intel/oneapi/debugger/2021.6.0/gdb/intel64/bin:/opt/intel/oneapi/compiler/2022.1.0/linux/lib/oclfpga/bin:/opt/intel/oneapi/compiler/2022.1.0/linux/bin/intel64:/opt/intel/oneapi/compiler/2022.1.0/linux/bin:/opt/intel/oneapi/clck/2021.6.0/bin/intel64:/opt/intel/oneapi/advisor/2022.1.0/bin64:/opt/torque/bin:/opt/torque/sbin:/home/DengChaoQun/mpich-4.0.2/install/bin:/home/DengChaoQun/gcc-13.1.0/gcc-install/bin:/home/DengChaoQun/miniforge/envs/moose/bin:/home/DengChaoQun/miniforge/condabin:/opt/torque/bin:/opt/torque/sbin:/usr/lib64/qt-3.3/bin:/home/DengChaoQun/perl5/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/DengChaoQun/.local/bin:/home/DengChaoQun/bin:/home/DengChaoQun/miniforge/envs/moose/wasp/bin,PBS_O_MAIL=/var/spool/mail/DengChaoQun,PBS_O_SHELL=/bin/bash,PBS_O_LANG=en_US.UTF-8,PBS_O_WORKDIR=/home/DengChaoQun/projects/bees2024,PBS_O_HOST=node01,PBS_O_SERVER=node01</Variable_List><euser>DengChaoQun</euser><egroup>DengChaoQun</egroup><queue_type>E</queue_type><sched_hint>Unable to copy files back - please see the mother superior's log for exact details.</sched_hint><comment>Job started on Tue Feb 06 at 20:08</comment><etime>1707221317</etime><exit_status>0</exit_status><submit_args>run_moose.pbs</submit_args><start_time>1707221317</start_time><start_count>1</start_count><fault_tolerant>False</fault_tolerant><comp_time>1707221329</comp_time><job_radix>0</job_radix><total_runtime>12.143389</total_runtime><submit_host>node01</submit_host><init_work_dir>/home/DengChaoQun/projects/bees2024</init_work_dir><request_version>1</request_version></Job></Data>
tracejob 867
/var/spool/torque/server_priv/accounting/20240206: Permission denied
/var/spool/torque/mom_logs/20240206: No matching job records located
Job: 867.node01
02/06/2024 20:08:37.607 S enqueuing into batch, state 1 hop 1
02/06/2024 20:08:37.788 S Job Modified at request of root@node01
02/06/2024 20:08:37.821 L Job Run
02/06/2024 20:08:37.788 S Job Run at request of root@node01
02/06/2024 20:08:37.821 S Not sending email: User does not want mail of this type.
02/06/2024 20:08:49.931 S Not sending email: User does not want mail of this type.
02/06/2024 20:08:49.932 S Exit_status=0 resources_used.cput=152 resources_used.vmem=0kb resources_used.walltime=00:00:08 resources_used.mem=0kb resources_used.energy_used=0
It contains the output of my run, but nothing that helps solve my problem.
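The single-line XML from qstat -xf is hard to read by eye, and a few fields in it (sched_hint, Output_Path, Error_Path, exit_status) are the ones that matter for a missing output file. Below is a minimal sketch of a helper that pulls one field out of that XML. The function name xml_field is hypothetical, it relies only on sed and head, and it assumes the tag has plain text content with no nested elements.

```shell
# Hypothetical helper: print the text content of <TAG>...</TAG> read from stdin.
# Assumes the element holds plain text (no nested tags). If a tag occurs more
# than once on the same line (e.g. walltime), the last occurrence is printed.
xml_field() {
    tag="$1"
    sed -n "s:.*<${tag}>\([^<]*\)</${tag}>.*:\1:p" | head -n 1
}

# Possible usage on the cluster (assumes qstat accepts the job ID as an operand):
#   qstat -xf 867.node01 | xml_field sched_hint
#   qstat -xf 867.node01 | xml_field Output_Path
```

For job 867 above, the sched_hint field reads "Unable to copy files back - please see the mother superior's log for exact details.", which suggests the job itself ran (exit_status is 0) and the failure happened when the output was copied back to node01.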
Maybe I made a mistake here and caused a misunderstanding: on node03, when I run touch test_file.txt instead of touch $PBS_O_WORKDIR/test_file.txt, it does create a file named test_file.txt.
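To make this kind of check repeatable, here is a small sketch that tests whether a given directory is writable from the machine it runs on. The function name check_writable and the probe file name are hypothetical. Note that $PBS_O_WORKDIR is only set inside a job, so in a plain ssh session on node03 the directory has to be passed explicitly.

```shell
# Hypothetical helper: try to create and remove a probe file in DIR and
# report the result together with the host name.
check_writable() {
    dir="$1"
    probe="$dir/.write_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir (on $(uname -n))"
    else
        echo "NOT writable: $dir (on $(uname -n))"
        return 1
    fi
}

# Possible usage from a shell on node03:
#   check_writable /home/DengChaoQun/projects/bees2024
```

If the directory turns out to be writable on node03 but the job output still does not appear, the problem is more likely in the copy-back step than in the file system permissions.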
I created a directory for_out in /home/username/projects/bees2024/ and ran qsub -koe -o ./for_out/out.OU -e ./for_out/error.ER run_moose.pbs, but there are still no output files.
The job ID is 873, but this time I can’t find any files related to 873 in the directory /var/spool/pbs/undelivered/.
Here is the relevant part of the log in /var/spool/torque/mom_logs/:
02/07/2024 00:12:10.903;01; pbs_mom.3769;Job;TMomFinalizeJob3;job 873.node01 started, pid = 40049
02/07/2024 00:12:18.662;128; pbs_mom.3769;Job;873.node01;scan_for_terminated: job 873.node01 task 1 terminated, sid=40049
02/07/2024 00:12:18.662;08; pbs_mom.3769;Job;873.node01;job was terminated
02/07/2024 00:12:18.662;128; pbs_mom.3769;Svr;preobit_preparation;top
02/07/2024 00:12:27.052;128; pbs_mom.3769;Job;873.node01;obit sent to server
02/07/2024 00:12:27.055;128; pbs_mom.3825;Job;873.node01;removed job script
02/07/2024 00:14:35.506;02; pbs_mom.3769;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0
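The mom log uses semicolon-separated fields, so the events for one job can be filtered out with awk. This is only a sketch: the function name mom_events is hypothetical, and it assumes the layout seen in the lines above (timestamp first, message last).

```shell
# Hypothetical helper: read a pbs_mom log on stdin and print
# "timestamp  message" for every line that mentions JOBID.
mom_events() {
    awk -F';' -v id="$1" 'index($0, id) { print $1 "  " $NF }'
}

# Possible usage on the execution node (file name follows the YYYYMMDD
# pattern that tracejob printed earlier):
#   mom_events 873.node01 < /var/spool/torque/mom_logs/20240207
```

With loglevel = 0, as reported in the last log line, the mom records very little about the copy-back step, so raising the log level in the mom configuration may be needed before this filter shows anything more useful.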
Could you try using the full path rather than a relative one?
i.e. -o /home/DengChaoQun/projects/bees2024/out.OU -e /home/DengChaoQun/projects/bees2024/err.ER
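To avoid typing the full path by hand, the relative directory can be resolved before calling qsub. A minimal sketch, assuming the for_out directory from the earlier post already exists; the function name abs_dir is hypothetical.

```shell
# Hypothetical helper: print the absolute path of an existing directory.
# Returns non-zero (and prints nothing) if the directory does not exist.
abs_dir() {
    (cd "$1" 2>/dev/null && pwd)
}

# Possible usage from the submission directory:
#   out_dir=$(abs_dir ./for_out) || { echo "for_out is missing" >&2; exit 1; }
#   qsub -o "$out_dir/out.OU" -e "$out_dir/error.ER" run_moose.pbs
```

The script already sets #PBS -j oe, so stdout and stderr are joined and only the -o file would be expected to appear.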