Hi Adarsh, thanks for your reply.
Below is the error I get when I try to submit the job with those options:
qsub: "-lresource=" cannot be used with "select" or "place", resource is: mem
So I removed the memory parameter from the script below and submitted it again.
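From what I can tell from the docs, with the select syntax memory is requested per chunk rather than job-wide via -l mem, so something like the sketch below might be the equivalent (512mb per chunk is just an illustrative value, not what I actually need):

```shell
#!/bin/bash
# Hedged sketch: memory folded into the select statement instead of -l mem.
# 2 chunks, 1 CPU and 512mb each, spread across nodes.
#PBS -l select=2:ncpus=1:mem=512mb
#PBS -l place=scatter
```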
My script is -
#!/bin/bash
# Lines starting with # are comments, and lines starting with #PBS are PBS directives
#Give a name to your Job
#PBS -N mpiruns
#Give the output file name
#PBS -o mpiruns.o.txt
#Give the error file name
#PBS -e mpiruns.e.txt
#Give the Queue name.
#PBS -q all
#Request the nodes: this asks for 2 chunks with 1 CPU each
#PBS -l select=2:ncpus=1
#PBS -l place=scatter
#Load the default environment
#PBS -V
#Mention your email to get notified once the job is done.
#PBS -m abe
#PBS -M myemai@abc.com
#Memory needed for your code
#PBS -l mem=1024mb
#Specify the time required for your runs. The example below requests 3 minutes of CPU time. Please don't block a job for over 24 hours.
#PBS -l cput=00:03:00
#Run the program as below
mpirun $HOME/mpi
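One thing I was not sure about is whether mpirun picks up the PBS allocation on its own. From what I have read, if the MPI library is not built with PBS (tm) integration, the node file has to be passed explicitly; a sketch of that, assuming Open MPI's mpirun flags:

```shell
# Hedged sketch: launch one rank per allocated vnode by reading the
# machine file PBS writes for the job ($PBS_NODEFILE, one line per vnode).
NP=$(wc -l < "$PBS_NODEFILE")
mpirun -np "$NP" --hostfile "$PBS_NODEFILE" $HOME/mpi
```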
Tracejob output -
Considering job to run
05/20/2022 02:07:43 S Job Queued at request of vinay@pb0, owner = vinay@pb0, job name = mpiruns, queue = all
05/20/2022 02:07:43 S Job Run at request of Scheduler@pb0 on exec_vnode (pb1:ncpus=1)+(pb2:ncpus=1)
05/20/2022 02:07:43 L Job run
But it still ran only on pb1 (that is, on a single node).
I am running a basic MPI program that just prints the hostname.
MPI code -
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
int rank, size, h_len;
char hostname[MPI_MAX_PROCESSOR_NAME];
MPI_Init(&argc, &argv);
// get the rank of this process
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// get the total number of processes
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Get_processor_name(hostname, &h_len);
printf("Start! rank:%d size: %d at %s\n", rank, size, hostname);
// do something
printf("Done! rank:%d size: %d at %s\n", rank, size, hostname);
MPI_Finalize();
return 0;
}
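Since all the ranks landed on pb1, I also plan to check whether my MPI build even has PBS launch support compiled in (assuming Open MPI here, where ompi_info lists the built-in components):

```shell
# If no tm components show up, mpirun cannot read the PBS allocation
# and falls back to launching all ranks on the local node.
ompi_info | grep tm
```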
Output what I am getting is -
Start! rank:0 size: 8 at pb1
Done! rank:0 size: 8 at pb1
Start! rank:1 size: 8 at pb1
Done! rank:1 size: 8 at pb1
Start! rank:2 size: 8 at pb1
Done! rank:2 size: 8 at pb1
Start! rank:3 size: 8 at pb1
Done! rank:3 size: 8 at pb1
Start! rank:4 size: 8 at pb1
Done! rank:4 size: 8 at pb1
Start! rank:5 size: 8 at pb1
Done! rank:5 size: 8 at pb1
Start! rank:6 size: 8 at pb1
Done! rank:6 size: 8 at pb1
Start! rank:7 size: 8 at pb1
Done! rank:7 size: 8 at pb1
The output should also show pb2. Also, since ncpus=1 it should use only one core, yet it launched 8 ranks, as you can see in the output.
Is there any configuration on the head node that I am missing?
pbsnodes -a output -
pb1
Mom = pb1
ntype = PBS
state = free
pcpus = 8
resources_available.arch = linux
resources_available.host = pb1
resources_available.mem = 3970436kb
resources_available.ncpus = 8
resources_available.vnode = pb1
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Wed May 4 03:58:53 2022
last_used_time = Fri May 20 02:07:45 2022
pb2
Mom = pb2
ntype = PBS
state = free
pcpus = 8
resources_available.arch = linux
resources_available.host = pb2
resources_available.mem = 3970436kb
resources_available.ncpus = 8
resources_available.vnode = pb2
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Wed May 4 03:58:53 2022
last_used_time = Fri May 20 02:07:45 2022