Library for 2D pencil decomposition and distributed Fast Fourier Transform
When developing large-scale parallel applications, it is often a good idea to use middle-layer libraries to simplify development and improve performance. One good example of such a solution is the Global Arrays Toolkit (GA), a high-level library that operates on array-based distributed data. It provides access to distributed data through a global shared-memory view, using one-sided communication calls, and it can be used alongside a message-passing library.
Version 1.5.x of 2DECOMP&FFT contains an experimental API for using the GA library (the API appears to work but has not yet been tested rigorously in production codes). To use it, first enable the GA integration code at compile time by passing the -DGLOBAL_ARRAYS preprocessor flag to the compiler; a GA installation must be available on the target system. Applications may then use the following API:
call get_global_array(ga, ipencil, data_type, opt_decomp)
Here the input ipencil specifies the distribution of the data (valid values are 1 for X-pencil, 2 for Y-pencil and 3 for Z-pencil), and data_type is one of two pre-defined constants: ga_real_type (for real data) or ga_complex_type (for complex data). The output ga is a reference to a global array object whose data partitioning exactly matches the pencils defined by 2DECOMP&FFT. GA library calls can later be made on this reference. Finally, opt_decomp is an optional parameter of type DECOMP_INFO that allows global arrays of arbitrary size to be created.
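As an illustration of the call described above, the following minimal Fortran sketch requests two global arrays: one matching the default X-pencil distribution and one of a different size described by its own DECOMP_INFO object. Several details are assumptions rather than facts from this page: the grid sizes and the 2x2 processor grid are arbitrary, the ga references are taken to be default integers (as GA handles are in Fortran), and the library is assumed to initialise GA inside decomp_2d_init when built with -DGLOBAL_ARRAYS (if it does not, explicit ga_initialize and ga_terminate calls would be needed).

```fortran
program ga_handle_sketch
  ! Sketch only: requires MPI, GA and a 2DECOMP&FFT build with -DGLOBAL_ARRAYS.
  use MPI
  use decomp_2d
  implicit none

  integer, parameter :: nx = 16, ny = 16, nz = 16  ! default global grid (arbitrary)
  integer, parameter :: p_row = 2, p_col = 2       ! 2D processor grid: run on 4 ranks
  integer :: ga_x, ga_big, ierror
  logical :: ok
  logical, external :: ga_destroy                  ! GA routine, returns a status flag
  type(DECOMP_INFO) :: big                         ! describes a second, larger grid

  call MPI_INIT(ierror)
  ! Assumption: GA is initialised here when the GA integration is enabled.
  call decomp_2d_init(nx, ny, nz, p_row, p_col)

  ! Real-valued global array partitioned like the default X-pencil.
  call get_global_array(ga_x, 1, ga_real_type)

  ! Complex-valued global array of a different size, partitioned as Z-pencils.
  call decomp_info_init(2*nx, 2*ny, 2*nz, big)
  call get_global_array(ga_big, 3, ga_complex_type, big)

  ! ... GA library calls on ga_x and ga_big would go here ...

  ! Release the global arrays before shutting the libraries down.
  ok = ga_destroy(ga_x)
  ok = ga_destroy(ga_big)
  call decomp_info_finalize(big)
  call decomp_2d_finalize
  call MPI_FINALIZE(ierror)
end program ga_handle_sketch
```

Passing a DECOMP_INFO object keeps the partitioning of the larger array consistent with the same processor grid, so pencil index ranges stored in that object (rather than the default xstart, xend and so on) should be used when addressing it.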
Sample Application
This example demonstrates the use of the API above to perform a global transposition. To transpose a distributed 3D real array from X-pencil to Y-pencil using 2DECOMP&FFT, one can use the following communication routine (implemented using message passing):
call transpose_x_to_y(input, output)
Alternatively, one can apply the API above to create two GA objects:
call get_global_array(ga1, 1, ga_real_type)
call get_global_array(ga2, 2, ga_real_type)
Then the same transposition can be done using GA operations:
call nga_put(ga1, xstart, xend, input, xsize)
call ga_copy(ga1, ga2)
call nga_get(ga2, ystart, yend, output, ysize)
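To see how the two approaches fit together, the calls above can be placed in a small test program that performs the transposition both ways and compares the results. This is a hedged sketch, not the example shipped with the library: the grid size, the 2x2 processor grid, the fill pattern and the variable names out_mpi and out_ga are illustrative choices, and it again assumes that decomp_2d_init takes care of GA initialisation when the library is built with -DGLOBAL_ARRAYS.

```fortran
program ga_transpose_sketch
  ! Sketch only: requires MPI, GA and a 2DECOMP&FFT build with -DGLOBAL_ARRAYS.
  use MPI
  use decomp_2d
  implicit none

  integer, parameter :: nx = 16, ny = 16, nz = 16  ! global grid (arbitrary)
  integer, parameter :: p_row = 2, p_col = 2       ! run on p_row*p_col = 4 ranks
  real(mytype), allocatable, dimension(:,:,:) :: input, out_mpi, out_ga
  double precision :: err_local, err_global
  integer :: ga1, ga2, i, j, k, ierror
  logical :: ok
  logical, external :: ga_destroy                  ! GA routine, returns a status flag

  call MPI_INIT(ierror)
  call decomp_2d_init(nx, ny, nz, p_row, p_col)

  ! Local arrays indexed by global coordinates: X-pencil input, Y-pencil outputs.
  allocate(input  (xstart(1):xend(1), xstart(2):xend(2), xstart(3):xend(3)))
  allocate(out_mpi(ystart(1):yend(1), ystart(2):yend(2), ystart(3):yend(3)))
  allocate(out_ga (ystart(1):yend(1), ystart(2):yend(2), ystart(3):yend(3)))

  ! Give every grid point a value that is unique across the global array.
  do k = xstart(3), xend(3)
    do j = xstart(2), xend(2)
      do i = xstart(1), xend(1)
        input(i,j,k) = real(i + nx*(j-1) + nx*ny*(k-1), mytype)
      end do
    end do
  end do

  ! Route 1: message-passing transposition provided by 2DECOMP&FFT.
  call transpose_x_to_y(input, out_mpi)

  ! Route 2: the same transposition through two global arrays.
  call get_global_array(ga1, 1, ga_real_type)      ! X-pencil layout
  call get_global_array(ga2, 2, ga_real_type)      ! Y-pencil layout
  call nga_put(ga1, xstart, xend, input, xsize)    ! store the local X-pencil
  call ga_copy(ga1, ga2)                           ! collective redistribution
  call nga_get(ga2, ystart, yend, out_ga, ysize)   ! fetch the local Y-pencil

  ! Largest difference between the two results, reduced over all ranks.
  err_local = real(maxval(abs(out_ga - out_mpi)), kind(1.0d0))
  call MPI_ALLREDUCE(err_local, err_global, 1, MPI_DOUBLE_PRECISION, &
                     MPI_MAX, MPI_COMM_WORLD, ierror)
  if (nrank == 0) write(*,*) 'max |GA - MPI| difference:', err_global

  ok = ga_destroy(ga1)
  ok = ga_destroy(ga2)
  deallocate(input, out_mpi, out_ga)
  call decomp_2d_finalize
  call MPI_FINALIZE(ierror)
end program ga_transpose_sketch
```

Because both global arrays describe the same logical 3D array, only with different partitionings, ga_copy is what moves the data between processes; the put and get calls touch only the section that already matches each rank's local pencil.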
One complete example distributed with the 2DECOMP&FFT library shows how GA can be used to perform a distributed 3D FFT. Note, however, that GA is much better suited to applications with irregular and dynamic communication patterns. For example, one might build a fluid solver on a structured mesh using 2DECOMP&FFT while using GA to perform particle tracking (particles are usually not evenly distributed in space, which results in irregular communication patterns and requires dynamic load balancing).