Library for 2D pencil decomposition and distributed Fast Fourier Transform |
This page explains the key public interfaces of the 2D decomposition library. After reading this section, users should be able to easily build applications using this domain decomposition strategy. The library interface is designed to be very simple. One can refer to the sample applications for a quick start.
The 2D Pencil Decomposition API is defined in one Fortran module which should be used by applications:
use decomp_2d
Global Variables
Following is the list of global variables defined by the library that can be used in applications. Obviously these names should not be redefined in applications to avoid conflict. Also note that some variables contain duplicate or redundant information just to simplify the programming.
It is recommended that an application define its major data structures using these global variables in its main program immediately after initialising the decomposition library. Using allocatable arrays is preferred, as shown in the following examples:
! allocate a X-pencil array allocate(ux(xsize(1), xsize(2), xsize(3))) ! allocate a Y-pencil array using global indices allocate(uy(ystart(1):yend(1), ystart(2):yend(2), ystart(3):yend(3)))
There are also a number of utility subroutines to help allocate 3D arrays.
For debugging or otherwise writing information to standard output, the following is recommended:
if (nrank==0) then write(*,*) ! list of variables ...... end if
Basic 2D Decomposition API
All the global variables defined above are initialised by:
call decomp_2d_init(nx, ny, nz, P_row, P_col)
where nx, ny and nz are the size of 3D global data to be distributed over a 2D processor grid P_row*P_col. Note that none of the dimensions need to be divisible by P_row or P_col, i.e. the library can handle non-evenly distributed data. However, choose the numbers smartly to avoid significant load-imbalance, which could lead to poor performance. Also note that a constraint imposed by the algorithm is: P_row <= min(nx, ny) and P_col <= min(ny, nz).
An optional parameter may be passed to this initialisation routine:
call decomp_2d_init(nx, ny, nz, P_row, P_col, periodic_bc)
Here periodic_bc is a 1D array containing 3 logical values that specify whether periodic boundary condition should apply in certain dimensions. Note this is only applicable if halo-cell communication is to be used.
To get the best performance, it is very important to choose the best possible processor grid P_row*P_col. Please follow this note for more instructions.
A key element of this library is a set of communication routines that actually perform the data transpositions. As mentioned, one needs to perform 4 global transpositions to go through all 3 pencil orientations. Correspondingly, the library provides 4 communication subroutines:
call transpose_x_to_y(in, out) call transpose_y_to_z(in, out) call transpose_z_to_y(in, out) call transpose_y_to_x(in, out)
The input array in and output array out should have been defined and contain distributed data for the correct pencil orientations.
Note that the library is written using Fortran's generic interface so different data types are supported without user intervention. That means in and out above can be either real arrays or complex arrays, the latter being useful for FFT-type of applications.
As seen, the communication details are packed within a black box. From a user's perspective, it is not necessary to understand the internal logic of these transposition routines. From the developer's perspective, he has the freedom to change the implementation without breaking user codes.
It is however noted that the communication routines are expensive, especially when running on large number of processors. So applications should try to minimize the number of calls to them by adjusting the algorithms in use, even sometimes by duplicating computations.
Finally, before exit, applications should clean up the memory by:
call decomp_2d_finalize
Advanced 2D Decomposition API
While the basic decomposition API is very user-friendly, there may be situations in which applications need to handle more complex data structures. There are quite a few examples:
In all these examples, there are multiple global sizes and applications need to be able to distributed different data sets as 2D pencils. 2DECOMP&FFT provides a powerful and flexible programming interface to handle this:
TYPE(DECOMP_INFO) :: decomp call decomp_info_init(n1, n2, n3, decomp)
Here decomp is an instance of Fortran derived data type DECOMP_INFO encapsulating the 2D decomposition information associated with one particular global size n1*n2*n3. The decomposition object can be initialised using the decomp_info_init routine. This object then can be passed to the communication routines defined in the basic interface as a third parameter. For example:
call transpose_x_to_y(in, out, decomp)
The third parameter is optional and can be safely ignored by users using the basic interface only. When it is in use, the global transposition will be applied to data set of the associated global size instead of the default global size nx*ny*nz. The decomp object can also be used in many other subroutines where arbitrary global data size is required.
The following code snippets demonstrate the use of such API in 2DECOMP&FFT's FFT implementations:
! Two decomposition objects used by physical- and spectral-space variables, respectively. TYPE(DECOMP_INFO), save :: ph ! physical space TYPE(DECOMP_INFO), save :: sp ! spectral space ! ...... ! Initialise the decomposition objects. call decomp_info_init(nx, ny, nz, ph) call decomp_info_init(nx/2+1, ny, nz, sp) ! spectral space roughly half size
The DECOMP_INFO type defines the following public components that can be used by applications:
These describe the sizes, starting indices and ending indices of the sub-domains held by the current process - exactly the same meaning as the size/start/end variables defined earlier, except that they apply to the global size associated with one particular decomposition object, rather than the default global size. Here is more sample code:
real(mytype), allocatable, dimension(:,:,:) : wk1 complex(mytype), allocatable, dimension(:,:,:) : wk2 ! allocate memory space allocate(wk1(ph%xsz(1),ph%xsz(2),ph%xsz(3))) ! X-pencil, for physical-space data allocate(wk2(sp%zsz(1),sp%zsz(2),sp%zsz(3))) ! Z-pencil, for spectral-space data ! to loop through the allocated array do k = 1, sp%xsz(3) do j = 1, sp%xsz(2) do i = 1, sp%xsz(1) wk1(i,j,k) = ...... end do end do end do
The DECOMP_INFO type also defines other components that are mainly used internally in the library, such as parameters describing the memory addresses, sizes and displacements of the MPI buffers to be used by the communication code.
If a decomposition object is no longer needed in an application, one can release its associated memory by:
call decomp_info_finalize(decomp)
Utility Routines to Help Allocate 3D Arrays
If a large number of distributed arrays are to be used in an application, using the global variables defined above for memory allocation can become a bit awkward. For this reason, a number of utility routines have been introduced in version 1.5.x of the library to simplify such operation. In the simplest form, these utility routines look like:
call alloc_x(var) call alloc_y(var) call alloc_z(var)
The three routines are used to create X-pencil, Y-pencil and Z-pencil distributed arrays, respectively. Here var is a three-dimensional real or complex allocatable array declared in the calling program and to be allocated in the subroutines. The following two lines of code are equivalent:
call alloc_x(var) allocate(var(xsize(1), xsize(2), xsize(3)))
The routines also accept additional parameters for more general case:
call alloc_x(var, decomp, global)
Here the optional parameter decomp of type DECOMP_INFO, if present, specifies the global size of the array to be allocated; the optional logical parameter global, if set to .true., indicates that the array should be allocated using the global coordinates. So the above statement is equivalent to:
allocate(var(decomp%xsz(1), decomp%xsz(2), decomp%xsz(3))) ! if global==.false. allocate(var(decomp%xst(1):decomp%xen(1), decomp%xst(2):decomp%xen(2), & decomp%xst(3):decomp%xen(3))) ! if global==.true.
As can be seen, the utility routines are in a much more compact form. They will also report an error if the memory allocation is somehow not successful.