I have been asked to parallelize an existing C program in order to decrease its runtime.
I only have some (very limited) experience using basic MPI, and all my programming knowledge is self-taught, so it is somewhat spotty. I am currently trying to figure out the best parallelization approach.
Currently, during every iteration of the main loop (M = number of iterations), the program sequentially reads a set of input files (N = number of files), each of varying length. After all the input files are read, the program sorts the data and updates a set of output files. Both N and M are known at the start, and N is always larger than M. In fact, N is too large to read all the input data into memory, so each time the files are read, only the information pertinent to that main-loop iteration is kept.
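To make the structure concrete, here is a rough self-contained sketch of the current sequential layout (the file parsing is replaced by a dummy value, and the constants and names are made up for illustration):

```c
/* Rough sketch of the sequential structure; file parsing is stubbed out. */
#include <stdio.h>
#include <stdlib.h>

#define M 50      /* number of main-loop iterations (known up front) */
#define N 1000    /* number of input files, N > M (known up front)   */

static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

int main(void)
{
    double *kept = malloc(N * sizeof(double));  /* only what this iteration needs */

    for (int m = 0; m < M; m++) {
        /* Read all N files, keeping only the records relevant to iteration m. */
        for (int n = 0; n < N; n++)
            kept[n] = 0.5 * n + m;              /* stand-in for parsing file n */

        qsort(kept, N, sizeof(double), cmp_double);  /* sort the kept data */
        /* ...update the output files for iteration m here...              */
    }

    printf("finished %d iterations over %d files\n", M, N);
    free(kept);
    return 0;
}
```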
I am confident I can make each main loop iteration independent, but every iteration would still need to access all N files. What would be the best way to use OpenMPI (technically OpenRTE 1.6.2 running on Rocks, i.e. Red Hat Linux) to parallelize this program?
My first idea was simply to split the reading of the input files across multiple MPI processes, each process handling a subset of the files, and then to order the combined inputs at the end.
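Something like the following is what I am picturing for this first idea: a minimal MPI sketch where each rank handles a block of the N files and the extracted records are gathered back to rank 0 for sorting. The actual file parsing is replaced by dummy values, and the constants and layout are just assumptions for illustration.

```c
/* Idea 1: split the N input files across ranks, gather results to rank 0. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000            /* number of input files (assumed)           */
#define RECS_PER_FILE 4   /* records kept per file, for the dummy data */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block-distribute the N files: rank r handles files [lo, hi). */
    int per = N / size, rem = N % size;
    int lo  = rank * per + (rank < rem ? rank : rem);
    int hi  = lo + per + (rank < rem ? 1 : 0);
    int nlocal = (hi - lo) * RECS_PER_FILE;

    double *local = malloc(nlocal * sizeof(double));
    for (int f = lo, i = 0; f < hi; f++)
        for (int r = 0; r < RECS_PER_FILE; r++)
            local[i++] = f + 0.1 * r;           /* stand-in for "read file f" */

    /* Gather the variable-sized contributions onto rank 0. */
    int *counts = NULL, *displs = NULL;
    double *all = NULL;
    if (rank == 0) {
        counts = malloc(size * sizeof(int));
        displs = malloc(size * sizeof(int));
    }
    MPI_Gather(&nlocal, 1, MPI_INT, counts, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        int total = 0;
        for (int r = 0; r < size; r++) { displs[r] = total; total += counts[r]; }
        all = malloc(total * sizeof(double));
    }
    MPI_Gatherv(local, nlocal, MPI_DOUBLE,
                all, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("rank 0 gathered all records; sorting/output would happen here\n");

    free(local); free(counts); free(displs); free(all);
    MPI_Finalize();
    return 0;
}
```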
My second idea was instead to split the main M-iteration loop across the processes, which would be a much better use of MPI. But would this method require copying all the input files to every process (to avoid read conflicts)? If so, I am worried the copying may offset any time gained from parallelizing the main loop. Also, besides building a test program for each approach, is there an easier way to determine which method would be faster?
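For the second idea, the distribution itself would look roughly like this minimal sketch, where the M iterations are dealt out round-robin across the ranks and each rank still reads every input file for its own iterations (the per-iteration work is only a placeholder, and the constants are assumptions):

```c
/* Idea 2: distribute the M main-loop iterations across MPI ranks. */
#include <mpi.h>
#include <stdio.h>

#define M 50     /* number of main-loop iterations (assumed) */
#define N 1000   /* number of input files (assumed)          */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int m = rank; m < M; m += size) {   /* round-robin over iterations */
        for (int n = 0; n < N; n++) {
            /* Here the real program would open input file n read-only and
             * extract only the records relevant to iteration m. */
        }
        /* ...sort and write the output file(s) for iteration m... */
        printf("rank %d finished iteration %d\n", rank, m);
    }

    MPI_Finalize();
    return 0;
}
```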
Edit: The file system is NFS.
After reading the comments I went back and ran a few tests on the code. The program spends 93% of its runtime reading in data. From what has been said, it seems parallelization alone may not be the best solution. At this point it seems the file reading itself is what I need to address.
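(For anyone wanting to reproduce this kind of breakdown: a coarse way to get it is to wrap the read phase and the rest of each iteration in wall-clock timers, roughly as below. The commented-out calls are placeholders, not the program's real functions.)

```c
/* Coarse wall-clock timing of the read phase vs. the rest of each iteration. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    const int M = 50;                 /* number of iterations (assumed) */
    double t_read = 0.0, t_rest = 0.0;

    for (int m = 0; m < M; m++) {
        double t0 = now_sec();
        /* read_all_inputs(m);    -- read the N files for iteration m */
        double t1 = now_sec();
        /* process_iteration(m);  -- sort + update the output files   */
        double t2 = now_sec();
        t_read += t1 - t0;
        t_rest += t2 - t1;
    }

    double total = t_read + t_rest;
    printf("read: %.3f s, rest: %.3f s, read share: %.0f%%\n",
           t_read, t_rest, total > 0.0 ? 100.0 * t_read / total : 0.0);
    return 0;
}
```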