Brandon Wigfield February 2016

GNU Parallel - Detecting that a command run in parallel has completed

I'm running numerous commands with parallel and piping the output to another script that consumes it. The problem is that the consuming script needs to know when a particular command has finished executing.

I'm using the --tag option so that I know which command generated each line of output, but currently I have to wait until parallel has finished running all commands before I can know that I'm not going to get any more output from a particular command. From my understanding of parallel, I see the following possible solutions, but none really suits me.

  1. I could group the output lines with the --line-buffer option so it looks like the commands were run sequentially. Then whenever I see output from the next command, I know the previous one has finished. However, doing it that way slows me down, as one command may take 30 seconds to complete while after it there may be 20 other commands that only took one second each, and I wish to process them in as close to real time as possible.

  2. I could wrap my command in a tiny bash script that outputs 'Process with some ID DONE' to signal that the command has completed. I don't really like this because I'm running several hundred commands at a time and don't really want to add all those extra bash processes.

I am really hoping that I'm just missing something in the docs and there is a flag in there to do what I'm looking for.

My understanding is that parallel is implemented in Perl, which I'm comfortable with, but I would rather not have to add the functionality myself unless it's completely necessary.

Any help or suggestions are greatly appreciated.

Answers


Ole Tange February 2016

The default behaviour with --tag should work perfectly. By default parallel groups output, so it will not output anything from a job until that job is done. Your postprocessor can then simply grab the argument from the start of the line.

Example:

parallel -j3 --tag 'echo Job {} start; sleep {}; echo Job {} ended' ::: 7 1 3 5 2 4 6

If you want to keep the order:

parallel -j3 --keep-order --tag 'echo Job {} start; sleep {}; echo Job {} ended' ::: 7 1 3 5 2 4 6

Notice how the jobs would mix if the output was done immediately. Compare with --ungroup (which you do not want):

parallel -j3 --ungroup 'echo Job {} start; sleep {}; echo Job {} ended' ::: 7 1 3 5 2 4 6
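
As a rough illustration of the post-processing idea above, here is a minimal sketch of a filter that reads --tag output and emits a marker line when a job's block of lines ends. The function name mark_done and the DONE marker are hypothetical, not part of GNU Parallel. The sketch assumes the default grouped output (so each job's lines arrive together), that the tag is separated from the line by a TAB, that every job has a distinct argument, and that every job prints at least one line.

```shell
# Hypothetical post-processor for `parallel --tag` output read from stdin.
# Assumptions: grouped output (the default), distinct arguments per job,
# and at least one output line per job.
mark_done() {
  awk -F '\t' '
    # Tag changed: the previous job has no more output.
    NR > 1 && $1 != prev { print "DONE\t" prev }
    # Pass the line through and remember its tag; flush for near real-time use.
    { print; prev = $1; fflush() }
    # End of input: the last job seen is finished too.
    END { if (NR > 0) print "DONE\t" prev }
  '
}

# Possible usage (requires GNU Parallel to be installed):
#   parallel -j3 --tag 'echo Job {} start; sleep {}; echo Job {} ended' ::: 7 1 3 5 2 4 6 | mark_done
```

Since grouped output is only printed once a job has finished, the first line carrying a new tag already implies that job is done; the marker just makes the end of each block explicit for the consumer.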

