Memory usage across nodes

I am new to PBS.

I have a 12-node cluster with 376 GB of memory on each node, but I want to run a parallel job that requires 1 TB of memory. Is there an argument I can pass to PBS so that the job uses memory from several nodes in parallel and gets the 1 TB it needs?

Please try this:
qsub -l select=3:ncpus=16:mem=350gb:mpiprocs=16 -l place=scatter -- /bin/sleep 1000

Thanks, Adarsh, for such a quick reply.

I tried qsub -l select=3:ncpus=16:mem=350gb:mpiprocs=16 -l place=scatter -- /bin/sleep 1000
The job goes into the queued state and is then terminated with a message about insufficient resources.
I want to run a job that requires 1 TB, but each of our nodes has only 376 GB of memory, so I am looking for a way to run a parallel job that uses memory from several nodes.

We have 12 nodes with 376 GB of memory each, so could you please suggest a solution?

Thanks in advance

Please note: if the resources requested for a job are not available, the job stays in the queued state until they become available; it is not deleted. If the job was deleted or terminated, there must be some other issue.

The parallel job would use 350 GB on each of the compute nodes (3 in this case), so the total memory requested for this job would be 3 × 350 GB = 1050 GB, just over 1 TB.
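
As a rough illustration of the arithmetic behind that request, here is a minimal shell sketch that works out how many chunks are needed to cover a memory target and prints the matching select statement. The variable names are made up for this example, 1 TB is taken as 1024 GB, and the 350 GB per chunk and 16 cores simply reuse the values from the command above; adjust them to your own nodes.

```shell
#!/bin/sh
# Sketch only: values below are assumptions copied from this thread.
total_gb=1024        # target: 1 TB expressed as 1024 GB
chunk_gb=350         # memory per chunk, kept below the 376 GB on each node
ncpus=16             # cores per chunk, as in the suggested qsub command

# Ceiling division: smallest chunk count whose combined memory covers the target.
chunks=$(( (total_gb + chunk_gb - 1) / chunk_gb ))
granted_gb=$(( chunks * chunk_gb ))

# Build the select statement; place=scatter puts each chunk on a different node.
select_spec="select=${chunks}:ncpus=${ncpus}:mem=${chunk_gb}gb:mpiprocs=${ncpus}"
echo "chunks=${chunks} total=${granted_gb}GB"
echo "qsub -l ${select_spec} -l place=scatter <your job script>"
```

The memory is still split per node: each MPI rank can only address the memory of the node it runs on, so the application itself has to distribute its data across the ranks.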

If your job has terminated, please share the output of:

  1. source /etc/pbs.conf ; $PBS_EXEC/unsupported/pbs_dtj <job id>
    e.g. source /etc/pbs.conf ; $PBS_EXEC/unsupported/pbs_dtj 111

Thanks, Adarsh, for your tremendous support.

I was requesting more resources than were available. Issue solved.
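
A quick sanity check along these lines can catch this kind of over-request before submitting. This is only a sketch: `fits_cluster` is a hypothetical helper, not a PBS command, it looks at memory and node count only, and it assumes place=scatter (one chunk per node). The memory PBS actually reports as available on a node may be somewhat lower than the physical 376 GB.

```shell
#!/bin/sh
# fits_cluster CHUNKS CHUNK_GB NODES NODE_GB
# Hypothetical helper: returns 0 if the request could fit, 1 otherwise.
fits_cluster() {
    chunks=$1; chunk_gb=$2; nodes=$3; node_gb=$4
    # With place=scatter every chunk needs its own node.
    if [ "$chunks" -gt "$nodes" ]; then
        echo "more chunks ($chunks) than nodes ($nodes)"
        return 1
    fi
    # A single chunk cannot ask for more memory than one node has.
    if [ "$chunk_gb" -gt "$node_gb" ]; then
        echo "chunk memory (${chunk_gb} GB) exceeds node memory (${node_gb} GB)"
        return 1
    fi
    echo "request fits"
    return 0
}

# Example with the numbers from this thread: 3 chunks of 350 GB on 12 nodes of 376 GB.
fits_cluster 3 350 12 376
```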

thank you
