Big Data Processing framework (Spark and Dask) on PBS

guillaumeeb · January 7, 2018, 9:31pm

Hi everyone,

A quick note to share some work we have done on using Spark and Dask with PBS at CNES (the French space agency). PBS scripts to launch a Spark- or Dask-based cluster are available in this repo: https://github.com/guillaumeeb/big-data-frameworks-on-pbs.
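To give an idea of what such a launch script can look like, here is a minimal sketch in Python that renders a PBS job script starting a Dask scheduler and workers. It is not taken from the repo above: the function name `render_pbs_dask_script`, the resource values and the scheduler file location are all assumptions for illustration, and it assumes a shared filesystem plus `dask-scheduler` and `dask-worker` being on the `PATH` of every node.

```python
def render_pbs_dask_script(job_name, nodes, cores_per_node, mem_gb, walltime,
                           scheduler_file="$PBS_O_WORKDIR/scheduler.json"):
    """Return the text of a PBS job script that starts a small Dask cluster.

    Hypothetical helper for illustration only. All parameters are
    placeholders to adapt to the local site configuration.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        # One chunk per node, each with the requested cores and memory.
        f"#PBS -l select={nodes}:ncpus={cores_per_node}:mem={mem_gb}gb",
        f"#PBS -l walltime={walltime}",
        "",
        "cd $PBS_O_WORKDIR",
        "# Start the scheduler in the background on the node running this script.",
        f"dask-scheduler --scheduler-file {scheduler_file} &",
        "# Start workers on the allocated nodes; they locate the scheduler",
        "# through the shared scheduler file (assumes a shared filesystem).",
        f"pbsdsh -- dask-worker --scheduler-file {scheduler_file} "
        f"--nthreads {cores_per_node}",
    ]
    return "\n".join(lines) + "\n"


# Build the script text for a 4-node job; nothing is submitted here.
script = render_pbs_dask_script("dask-cluster", nodes=4, cores_per_node=8,
                                mem_gb=32, walltime="01:00:00")
```

The resulting text could be written to a file and submitted with `qsub`. Depending on the site, the environment seen by `pbsdsh` may need extra setup (for example loading a module or activating a conda environment) before `dask-worker` can be found.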

Has anyone already done this? Do you have something to share?
Are there plans to add similar functionality to PBS? I’ve already discussed this a bit with @subhasisb some time ago, but I don’t know what the current situation is.
I would be happy to get any feedback on this, so feel free to answer or ask anything.

Cheers,
Guillaume.

subhasisb · January 8, 2018, 4:31am

Hi @guillaumeeb,

Thanks for the update on this work. This is going to benefit the entire PBS community.
We can certainly look at linking to your work from the PBS Professional GitHub pages, etc.

To start with, we will help by testing out these scripts in the short term.

Regards,
Subhasis

cherryppju · August 22, 2019, 1:58pm

Hi @subhasisb, what is the status of the testing of these scripts? Thanks.

guillaumeeb · September 20, 2019, 7:35pm

A quick update: we now use Dask exclusively on our cluster. I have therefore been contributing to a project that has been around for quite some time now:
https://jobqueue.dask.org/en/latest/

I encourage everyone to look at it; it can really simplify cluster usage, from job arrays to complex workflows to distributed science.
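As a rough illustration of what this looks like in practice, here is a minimal sketch using `PBSCluster` from dask-jobqueue. The helper names `pbs_cluster_options` and `start_cluster` are made up for this example, and the resource values and the queue name `workq` are placeholders rather than our actual configuration.

```python
def pbs_cluster_options(cores=24, memory_gb=100, queue="workq",
                        walltime="01:00:00"):
    """Build keyword arguments for dask_jobqueue.PBSCluster.

    Hypothetical helper: every default here is a placeholder to replace
    with values that match the local PBS setup.
    """
    return {
        "cores": cores,                 # cores per PBS job
        "memory": f"{memory_gb}GB",     # memory per PBS job
        "queue": queue,                 # destination queue
        "walltime": walltime,           # walltime per PBS job
    }


def start_cluster(options, jobs):
    """Submit `jobs` PBS jobs running Dask workers and return a client."""
    # Imported lazily so the options helper can be used without Dask installed.
    from dask.distributed import Client
    from dask_jobqueue import PBSCluster

    cluster = PBSCluster(**options)
    cluster.scale(jobs=jobs)  # each job is submitted to PBS on our behalf
    return Client(cluster)


# Only builds the options; no job is submitted at this point.
options = pbs_cluster_options(cores=8, memory_gb=32)
```

From a login node with dask-jobqueue installed, calling `start_cluster(options, jobs=4)` would submit four worker jobs and return a client connected to them; Dask computations then run on those workers once PBS starts the jobs.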

See more about Dask here:
