Limiting I/O to a filesystem

fnevgeny · August 15, 2023, 11:32am

Hello,

We need to be able to count and limit the I/O of jobs. Specifically, to/from a networked filesystem. I understand cgroup can be used to throttle I/O to a block device, but not a filesystem. Netfilter/TC, on the other hand, works at the level of the entire machine, not a job. Any suggestions will be appreciated.

adarsh · August 16, 2023, 6:42am

I’m not sure about open source tools. Altair Breeze and Mistral are tools for I/O profiling; they might be of some help.

fnevgeny · August 16, 2023, 1:53pm

Thank you. Well, I do hope to stay in the realm of open source :). In the meantime, I recalled trickle, a utility similar in spirit that shapes network bandwidth. If I don’t find anything ready-made, the simplest way would probably be to intercept read, write and similar calls through LD_PRELOAD (as trickle does for socket and friends).
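
For illustration, here is a minimal sketch of the pacing logic such an interposer could apply around each intercepted call. It is a plain token bucket, not trickle's actual algorithm, and it is written in Python only to show the arithmetic; a real LD_PRELOAD shim would have to be a C shared library. The class name `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Pace byte transfers to a target rate (hypothetical helper)."""

    def __init__(self, rate, burst, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)      # refill rate in bytes per second
        self.burst = float(burst)    # maximum stored credit in bytes
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock           # injectable for testing
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add credit for the time elapsed since the last call, capped at burst.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Charge nbytes; if the bucket goes into debt, sleep it off.

        Returns the number of seconds slept.
        """
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        delay = -self.tokens / self.rate
        self.sleep(delay)
        return delay
```

A wrapper would call `bucket.consume(len(buf))` before handing each buffer to the real `write`, so a job that exceeds its rate is slowed down rather than killed.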

speleolinux · August 17, 2023, 1:57am

Hi
A roundabout way to do it would be to use one of the command-line utilities that measure I/O and then take an action in PBS. For instance, iotop has the options --batch, --pid=PID and --user=USER. Get the USER from the PBS job, monitor that user’s I/O on the execution node, and when it gets above a certain amount, do something with the job. You could even have a custom resource called max_IO and, as I/O gets used, update a used_IO resource. This is all just guesswork and might not work.
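
As a rough sketch of the polling idea, the snippet below sums the `rchar` and `wchar` counters from `/proc/<pid>/io` over the processes of a job and compares the total against a limit. It reads the kernel counters directly instead of running iotop. The function names are hypothetical, and how to obtain the job's PIDs or update a PBS resource is deliberately left out. Note the assumptions: these counters cover all read/write system calls (local disk, pipes and terminals included), they disappear when a process exits, and reading another user's `/proc/<pid>/io` normally requires root.

```python
def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def job_io_bytes(pids, read_io=None):
    """Sum rchar + wchar over the given PIDs (hypothetical helper).

    read_io maps a PID to the text of its io file; it defaults to
    reading /proc/<pid>/io and can be replaced for testing.
    """
    if read_io is None:
        def read_io(pid):
            with open(f"/proc/{pid}/io") as fh:
                return fh.read()
    total = 0
    for pid in pids:
        try:
            counters = parse_proc_io(read_io(pid))
        except OSError:
            continue  # process already exited or file not readable
        total += counters.get("rchar", 0) + counters.get("wchar", 0)
    return total


def over_limit(used_io, max_io):
    """True when the job has used more I/O than its limit allows."""
    return used_io > max_io
```

A periodic hook on the execution node could call `job_io_bytes` for each job and act when `over_limit` returns true.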
Mike

fnevgeny · August 17, 2023, 7:45am

Hi,

The problem with iotop and similar utilities (like iostat, ionice) is that they either count total I/O or can only be restricted to a given block device. But I don’t care about, e.g., I/O to $TMPDIR, which is a fast local disk that copes just fine with a few jobs. A distributed networked file system (with no underlying block device on the client machine), on the other hand, is something users tend to abuse, and I want at least per-job real-time stats on that or, ideally, to throttle the usage.
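
To make the distinction concrete, here is a small sketch of how a counter or interposer might decide whether a given path lives on a networked filesystem, by matching it against the mount table from `/proc/self/mounts`. The helper names and the contents of `NETWORK_FSTYPES` are assumptions to be adjusted per site; the sketch does not resolve symlinks and ignores the octal escapes used for mount points containing spaces.

```python
import os

# Assumption: filesystem types to treat as "networked"; adjust per site.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "lustre", "ceph"}


def parse_mounts(text):
    """Return (mount_point, fstype) pairs from /proc/self/mounts content."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def fstype_of(path, mounts):
    """Filesystem type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_mp, best_type = "", None
    for mount_point, fstype in mounts:
        mp = mount_point.rstrip("/") or "/"
        if path == mp or mp == "/" or path.startswith(mp + "/"):
            # ">=" lets a later mount on the same point override an earlier one.
            if len(mp) >= len(best_mp):
                best_mp, best_type = mp, fstype
    return best_type


def is_networked(path, mounts):
    """True when path sits on one of the NETWORK_FSTYPES."""
    return fstype_of(path, mounts) in NETWORK_FSTYPES
```

Only I/O on paths for which `is_networked` returns true would then be counted or throttled, leaving $TMPDIR on local disk alone.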

I strongly suspect there is already something like that out there; it would be a pity to reinvent the wheel…
