I am trying to parallelize the tm_spawn function using tpp_mcast and need your expertise.
Here is a brief summary of how tm_spawn currently works. The client that uses the task manager API (for example, pbs_dsh) has a list of nodes on which it wants to spawn a task, and for each node it calls tm_spawn, which has the signature below:
int tm_spawn(int argc, char **argv, char **envp, tm_node_id where, tm_task_id *tid, tm_event_t *event)
Note that the client passes an “event”, which is just a number. If there are n nodes, there will be n event numbers. These will be used to track the status of the tm_spawn requests.
Once the mom gets a TM_SPAWN request from the client and finds that the node requested for the spawn is not herself, she sends an IM_SPAWN (inter-mom message) to the sister, forwarding the event number as well. When the mom gets a reply back from the sister, she uses tm_reply to reply to the client. The client learns the status of each request it sent out by using tm_poll with the event number.
Now I would like to achieve something like this:
int tm_spawn_multi(int argc, char **argv, char **envp, tm_node_id *where, tm_task_id *tid, tm_event_t *event)
so that a client can spawn tasks on multiple nodes at once (note that where has turned into an array) instead of using a loop. The actual implementation could use the tpp_mcast functionality. Once mom gets the TM_SPAWN request from the client, she could use tpp_mcast_add_strm to add the list of receivers (sister moms) specified by where, and then use im_compose to send all the sisters an IM_SPAWN request at once.
What I could not figure out is how to use events to track the status of these requests. The tpp_mcast suite of functions was implemented on the assumption that the same data is sent to all the streams (sisters). As you can see from the signature of im_compose, all the streams will be sent the same event number:
int im_compose(int stream, char *jobid, char *cookie, int command, tm_event_t event, tm_task_id taskid, int version)
Is there a way to use tpp_mcast, but send each receiver a different event number?
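One workaround I can imagine, if tpp_mcast really cannot vary the payload per receiver, is to multicast a single shared event number and have mom keep a local table that maps each sister's stream back to the event number the client originally gave for that node. The sketch below is only an illustration of that idea: event_map and its functions are hypothetical names, not part of PBS, plain int stands in for tm_event_t, and it assumes mom can tell which stream a sister's reply arrived on.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SISTERS 64      /* hypothetical fixed capacity for the sketch */
#define NO_EVENT    (-1)    /* hypothetical "not found" marker */

/* Hypothetical per-request table kept by mom. */
typedef struct event_map {
    int shared_event;               /* event number carried in the multicast IM_SPAWN */
    int nentries;                   /* number of sisters registered so far */
    int stream[MAX_SISTERS];        /* stream of each sister */
    int client_event[MAX_SISTERS];  /* event the client passed for that sister's node */
} event_map;

void event_map_init(event_map *m, int shared_event)
{
    assert(m != NULL);
    m->shared_event = shared_event;
    m->nentries = 0;
}

/* Register one sister. Returns 0 on success, -1 if the table is full. */
int event_map_add(event_map *m, int stream, int client_event)
{
    assert(m != NULL);
    if (m->nentries >= MAX_SISTERS)
        return -1;
    m->stream[m->nentries] = stream;
    m->client_event[m->nentries] = client_event;
    m->nentries++;
    return 0;
}

/* Translate a sister's reply (stream + shared event) into the client's
 * per-node event. Returns NO_EVENT if the reply does not belong here. */
int event_map_lookup(const event_map *m, int stream, int shared_event)
{
    int i;

    assert(m != NULL);
    if (shared_event != m->shared_event)
        return NO_EVENT;
    for (i = 0; i < m->nentries; i++) {
        if (m->stream[i] == stream)
            return m->client_event[i];
    }
    return NO_EVENT;
}
```

With something like this, mom would call event_map_add once per sister while building the multicast, and on each sister's reply look up the client's event before doing the usual reply to the client, so the client side would not need to change.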
Or is there a better way to track the status of the tm_spawn requests? Maybe instead of replying to the client every time a sister completes the spawn, the mom could somehow use a single event to track how many sisters have finished spawning the task, and then send a single reply to the client once all sisters have finished?
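To make the single-event idea concrete, here is a rough sketch of the bookkeeping I have in mind on the mom side: one record per multi-spawn request that counts outstanding sisters and signals when the last reply has arrived. The names spawn_group, spawn_group_init and spawn_group_on_reply are hypothetical and do not exist in PBS, and plain int stands in for tm_event_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kept by mom for one multi-spawn request. */
typedef struct spawn_group {
    int client_event;  /* the single event number the client passed in */
    int pending;       /* sisters that have not replied yet */
    int failed;        /* sisters that reported an error */
} spawn_group;

/* Start tracking a request that was multicast to nsisters sisters. */
void spawn_group_init(spawn_group *g, int client_event, int nsisters)
{
    assert(g != NULL && nsisters > 0);
    g->client_event = client_event;
    g->pending = nsisters;
    g->failed = 0;
}

/* Record one sister's reply (sister_error is 0 on success).
 * Returns 1 when this was the last outstanding reply, meaning mom
 * should now send the single reply to the client; returns 0 otherwise. */
int spawn_group_on_reply(spawn_group *g, int sister_error)
{
    assert(g != NULL && g->pending > 0);
    if (sister_error != 0)
        g->failed++;
    g->pending--;
    return g->pending == 0;
}
```

Mom would call spawn_group_on_reply for every sister reply that carries the shared event number, and only when it returns 1 reply to the client using client_event. The failed count is one possible way to report partial failure; how the per-node task IDs would be returned to the client in that single reply is still an open question for me.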