bob.sacamento February 2016

Trying to pass MPI derived types between processors (and failing)

I am trying to parallelize a customer's Fortran code with MPI. f is an array of 4-byte reals dimensioned f(dimx,dimy,dimz,dimf). I need the various processes to work on different parts of the array's first dimension. (I would rather have started with the last, but it wasn't up to me.) So I define a derived type mpi_x_interface like so:

! dimy*dimz*dimf blocks, each 1 real long, strided dimx reals apart
call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL,  &
                     mpi_x_interface, mpi_err)
call mpi_type_commit(mpi_x_interface, mpi_err)

My intent is that a single mpi_x_interface will contain all of the data in 'f' at some given first index "i". That is, for given i, it should contain f(i,:,:,:). (Note that at this stage of the game, all procs have a complete copy of f. I intend to eventually split f up between the procs, except I want proc 0 to have a full copy for the purpose of gathering.)

ptsinproc is an array containing the number of "i" indices handled by each proc. x_slab_displs is the displacement from the beginning of the array for each proc. For two procs, which is what I am testing on, they are ptsinproc=(/61,60/), x_slab_displs=(/0,61/). myminpt is a simple integer giving the minimum "i" index handled by each proc.
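
For reference, the bookkeeping is set up roughly like this (a sketch rather than the actual code, assuming ptsinproc and x_slab_displs are dimensioned 0:nproc-1):

    ! split the dimx "i" indices as evenly as possible over nproc ranks
    do p = 0, nproc-1
      ptsinproc(p) = dimx/nproc
      if (p < mod(dimx,nproc)) ptsinproc(p) = ptsinproc(p) + 1
    end do
    ! offset (in "i" points) of each rank's slab from the start of f
    x_slab_displs(0) = 0
    do p = 1, nproc-1
      x_slab_displs(p) = x_slab_displs(p-1) + ptsinproc(p-1)
    end do
    myminpt = x_slab_displs(myrank) + 1   ! 1-based Fortran index

With dimx=121 and two procs this gives the values above.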

So now I want to gather all of f into proc 0 and I run

    if (myrank == 0) then
      call mpi_gatherv(MPI_IN_PLACE, ptsinproc(myrank),
 +                     mpi_x_interface, f(1,1,1,1), ptsinproc,
 +                     x_slab_displs, mpi_x_interface, 0,
 +                     mpi_comm_world, mpi_err)
    else
      call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank),
 +                     mpi_x_interface, f(1,1,1,1), ptsinproc,
 +                     x_slab_displs, mpi_x_interface, 0,
 +                     mpi_comm_world, mpi_err)
    endif

Answers


NoseKnowsAll February 2016

First off, I just want to say that the reason you're running into so many problems is that you are trying to split up the first (fastest) axis. This is not recommended at all, because as written, packing your mpi_x_interface requires a lot of non-contiguous memory accesses. We're talking a huge loss in performance.

Splitting up the slowest axis across MPI processes is a much better strategy. I would highly recommend transposing your 4D matrix so that the x axis is last if you can.

Now to your actual problem(s)...

Derived datatypes

As you have deduced, one problem is the extent of your derived datatype. Let's simplify your problem a bit so I can draw a picture. Say dimy*dimz*dimf=3 and dimx=4. As written, your datatype mpi_x_interface describes the following data in memory:

| X |   |   |   | X |   |   |   | X |   |   |   |

That is, every 4th MPI_REAL, 3 of them in total. Since this is what you want, so far so good: the size of your datatype is correct. However, if you try to send "the next" mpi_x_interface, your MPI implementation will start reading at the next point in memory (which in your case has not been allocated) and throw an "invalid memory access" at you:

                                             tries to access and bombs
                                                 vvv
| X |   |   |   | X |   |   |   | X |   |   |   | Y |   |   |   | Y | ...
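
You can check this yourself by asking MPI for the type's size and extent (assuming use mpi so MPI_ADDRESS_KIND is available):

    integer :: type_size
    integer(kind=MPI_ADDRESS_KIND) :: lb, type_extent

    ! amount of actual data in one mpi_x_interface: 3 reals = 12 bytes here
    call mpi_type_size(mpi_x_interface, type_size, mpi_err)
    ! its extent -- the distance MPI steps to reach "the next" element --
    ! spans nearly the whole slab instead of a single real
    call mpi_type_get_extent(mpi_x_interface, lb, type_extent, mpi_err)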

What you need to tell MPI as part of your datatype is that "the next" mpi_x_interface starts only 1 real further along in the array. You do this by redefining the "extent" of your derived datatype with MPI_Type_create_resized().
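
Something along these lines should do it (an untested sketch; mpi_x_iface_resized is just a name I made up, and again I'm assuming use mpi so MPI_ADDRESS_KIND is available):

    integer :: mpi_x_iface_resized
    integer(kind=MPI_ADDRESS_KIND) :: lb, real_extent

    ! extent of a single MPI_REAL (4 bytes for your reals)
    call mpi_type_get_extent(MPI_REAL, lb, real_extent, mpi_err)

    ! same layout as mpi_x_interface, but with the extent shrunk to one
    ! real, so "the next" element starts 1 real further along in f
    call mpi_type_create_resized(mpi_x_interface, 0_MPI_ADDRESS_KIND, real_extent, &
                                 mpi_x_iface_resized, mpi_err)
    call mpi_type_commit(mpi_x_iface_resized, mpi_err)

Then use mpi_x_iface_resized in place of mpi_x_interface in your mpi_gatherv call. Since gatherv displacements are measured in units of the receive type's extent, your x_slab_displs=(/0,61/) will then mean "0 and 61 reals from the start of f", which is exactly what you want.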

