Lance February 2016

MPI parallelization help for reading a large amount of data multiple times

I have been asked to parallelize an existing C program in order to decrease its runtime. I have only some (very limited) experience with basic MPI, and all my programming knowledge is self-taught, so it is somewhat spotty. I am currently trying to figure out the best parallelization approach.

Currently, during every iteration of the main loop (M = number of iterations), the program sequentially reads a set of input files (N = number of files), each of varying length. After all the input files are read, the program sorts the data and updates a set of output files. Both N and M are known at the start, and N is always larger than M. In fact, N is too large to read all the input data into memory, so each time the files are read, only the information pertinent to that main-loop iteration is kept.

I am confident I can make each main-loop iteration independent, but every iteration would still need to access all N files. What would be the best way to use Open MPI (technically OpenRTE 1.6.2 running on Rocks, i.e. Red Hat Linux) to parallelize this program?

My first idea was to simply split the reading of the input files across multiple processes, each handling a subset of the files, and then to merge and order the inputs at the end.
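Something like this rough sketch is what I have in mind. The helper read_one_file() and the sizes N_FILES and MAX_VALS_PER_FILE are placeholders standing in for my existing read code, not the real program:

```c
/* Sketch of idea 1: each MPI rank reads a subset of the N input files,
 * then the pieces are gathered on rank 0, which can sort them. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_FILES 1000          /* placeholder number of input files */
#define MAX_VALS_PER_FILE 64  /* placeholder bound on values kept per file */

/* Placeholder: reads file f, appends the kept values to buf, returns count. */
static int read_one_file(int f, double *buf)
{
    buf[0] = (double)f;       /* dummy data */
    return 1;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank reads files f = rank, rank+size, rank+2*size, ... */
    double *local = malloc((size_t)N_FILES * MAX_VALS_PER_FILE * sizeof *local);
    int nlocal = 0;
    for (int f = rank; f < N_FILES; f += size)
        nlocal += read_one_file(f, local + nlocal);

    /* Gather the variable-sized pieces on rank 0. */
    int *counts = NULL, *displs = NULL;
    double *all = NULL;
    if (rank == 0) {
        counts = malloc(size * sizeof *counts);
        displs = malloc(size * sizeof *displs);
    }
    MPI_Gather(&nlocal, 1, MPI_INT, counts, 1, MPI_INT, 0, MPI_COMM_WORLD);

    int total = 0;
    if (rank == 0) {
        for (int r = 0; r < size; r++) { displs[r] = total; total += counts[r]; }
        all = malloc((size_t)total * sizeof *all);
    }
    MPI_Gatherv(local, nlocal, MPI_DOUBLE,
                all, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("rank 0 gathered %d values\n", total);

    free(local); free(counts); free(displs); free(all);
    MPI_Finalize();
    return 0;
}
```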

My second idea was to instead split the main M loop across the processes, which would make much better use of MPI. But would this method require a copy of all the input files for every process (to avoid read conflicts)? If so, I worry that copying the files may offset any time gained from parallelizing the main loop. Also, besides building a test program for each approach, is there an easier way to determine which method would be faster?
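For the second idea, my understanding is that read-only access does not conflict, so each rank could open the same files without making copies. A rough sketch of what I mean, with do_iteration(), M_ITERS, and N_FILES as placeholders for my real loop body and sizes:

```c
/* Sketch of idea 2: the M main-loop iterations are divided across MPI ranks.
 * Every rank opens the same N input files read-only (no copies needed, since
 * concurrent reads do not conflict), keeps only the data relevant to its own
 * iterations, and writes its own share of the output. */
#include <mpi.h>
#include <stdio.h>

#define M_ITERS 16   /* placeholder number of main-loop iterations */
#define N_FILES 1000 /* placeholder number of input files */

/* Placeholder: reads all N files, keeps what iteration m needs, sorts it,
 * and updates the output for iteration m. */
static void do_iteration(int m)
{
    for (int f = 0; f < N_FILES; f++) {
        /* open file f read-only, keep what iteration m needs, close */
    }
    printf("finished iteration %d\n", m);
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Cyclic split of the M iterations: rank r handles m = r, r+size, ... */
    for (int m = rank; m < M_ITERS; m += size)
        do_iteration(m);

    /* If the output files are shared, either have each rank write distinct
     * files or funnel results to rank 0 here before writing. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```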

Edit: The file system is NFS.

After reading the comments I went back and ran a few tests on the code. The program spends 93% of its runtime reading in data. From what has been said it seems parallelization alone may not be the best solution. At this point it seems

Answers


ron February 2016

Based on the comment responses: with the file system being NFS, you mean you are reading your files across the network? This can be very problematic if you parallelize over N, the number of files. If N is too large you risk exceeding the maximum number of file descriptors that can be open at once, which is typically defined in /etc/security/limits.conf. If your shell is csh or tcsh, typing limit at the prompt will display all of those values; sorry, I forget the command that displays them in a bash shell.

You also risk overloading the NFS server, as well as LAN or WAN bandwidth problems. If your network is 100 Mbps, that is only about 12 megabytes of data per second at best, and if you don't measure it, how do you know the real rate isn't actually in kilobytes per second?
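(In bash, ulimit -n prints the same open-file limit. If it helps, a minimal C snippet can also query the limit from inside the program itself via getrlimit, so the code can refuse to open more descriptors than the system allows:)

```c
/* Query the per-process open-file-descriptor limit (the same value the
 * shell's limit / ulimit -n reports). */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        printf("open file descriptors: soft limit %llu, hard limit %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    } else {
        perror("getrlimit");
    }
    return 0;
}
```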

If the biggest cause of the program's run time is reading in data, there may be little you can do about it. Besides the NFS issue, I would suggest thinking in terms of how the hard drive (wherever it is located) will be commanded to read each chunk/file of data. It is usually best to have only one file pointer reading the data from the disk as sequentially as possible, and it is then up to you how to buffer that data for use in your program. You would need to do the math and figure out whether you have enough RAM; if not, that is what you need to increase, otherwise you are forced to rely on disk I/O, which is a killer.
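One way to combine that single-reader idea with MPI, sketched very roughly here (file names, sizes, and the buffering are placeholders, not your program): rank 0 is the only process that touches NFS, reading each file sequentially into a buffer, and the buffer is broadcast so every rank can filter out the data it needs.

```c
/* Sketch of a single-reader pattern: only rank 0 reads from disk/NFS;
 * each file's contents are broadcast to all ranks over MPI. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_FILES   1000        /* placeholder number of input files */
#define CHUNK_MAX (1 << 20)   /* placeholder 1 MiB upper bound per file */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(CHUNK_MAX);

    for (int f = 0; f < N_FILES; f++) {
        long nbytes = 0;
        if (rank == 0) {
            char name[64];
            snprintf(name, sizeof name, "input_%d.dat", f); /* placeholder name */
            FILE *fp = fopen(name, "rb");
            if (fp) {
                nbytes = (long)fread(buf, 1, CHUNK_MAX, fp);
                fclose(fp);
            }
        }
        /* One read from storage per file, then one broadcast to all ranks,
         * instead of every rank hitting NFS separately. */
        MPI_Bcast(&nbytes, 1, MPI_LONG, 0, MPI_COMM_WORLD);
        MPI_Bcast(buf, (int)nbytes, MPI_BYTE, 0, MPI_COMM_WORLD);

        /* ... every rank now filters buf for the data its iterations need ... */
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```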


Rob Latham February 2016

Parallel I/O to NFS is a fool's errand. MPI implementations will try their best, but NFS, in addition to being serial, provides horrible consistency semantics. A client's writes show up to other processes at some vague, undefined time. You can turn off caching and fcntl-lock around each operation and you still won't get the consistency you might expect.

MPI implementations provide NFS support because NFS is everywhere, but for not much more effort you can deploy something like PVFS / OrangeFS, should profiling determine that I/O is actually an important bottleneck for you.

Post Status

Asked in February 2016
Viewed 3,106 times
Voted 10
Answered 2 times
