Library for 2D pencil decomposition and distributed Fast Fourier Transform


P3DFFT is probably the best-known open-source distributed FFT library. The project was initiated at the San Diego Supercomputer Center at UCSD by Dmitry Pekurovsky. It is highly efficient and has been widely adopted by scientists doing large-scale simulations, such as high-resolution turbulence simulations.

P3DFFT was ported to HECToR (the author's development system) at an early stage of the 2DECOMP&FFT project. Fig. 1 shows its good scaling on the hardware of the time (in early 2009, the system was a Cray XT4 with dual-core AMD Opteron processors and the Cray SeaStar interconnect).

Fig. 1: P3DFFT scaling on Cray XT4 HECToR.

What motivated the author to develop a new and somewhat competing library are the following:

Performance Comparison

The parallel performance of 2DECOMP&FFT and P3DFFT has been studied in great detail in an MSc thesis by E. Brachos at the University of Edinburgh. Fig. 2 shows a set of benchmarks on r2c/c2r transforms of size 256^3. The MPI interface of FFTW 3.3 was also examined, although it can only run in 1D slab decomposition mode.

Fig. 2: Speedup of 2DECOMP&FFT, P3DFFT and FFTW 3.3's MPI interface.
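The slab restriction noted above limits how many MPI ranks can hold data at all. As a rough illustration (a sketch only, assuming a cubic complex grid of size n^3 and ignoring the halved dimension of r2c output), a slab decomposition splits one dimension and so can use at most n ranks, while a pencil decomposition splits two dimensions and can use up to n^2. The function name `max_useful_ranks` is hypothetical and not part of any of the libraries discussed.

```python
def max_useful_ranks(n: int) -> dict:
    """Upper bounds on ranks that can own data for an n x n x n grid.

    Assumption: a cubic grid; each rank must own at least one plane
    (slab) or one pencil of grid points.
    """
    return {
        "slab": n,        # one dimension is distributed
        "pencil": n * n,  # two dimensions are distributed
    }


limits = max_useful_ranks(256)
print(limits)
```

For the 256^3 case in Fig. 2, this gives a ceiling of 256 ranks for a slab decomposition, which is one reason the pencil-based libraries can continue to scale at higher core counts.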

The performance difference between 2DECOMP&FFT and P3DFFT is often shown to be marginal, although the best 2D processor grid to achieve the optimal performance can be very different due to the different internal architecture of the two libraries.
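Because the best 2D processor grid differs between libraries, it is usually found by trying the candidates. The sketch below enumerates the possible factorisations of a rank count into a 2D grid. It assumes a cubic grid of size n^3 and the simplified constraint that neither grid dimension exceeds n; the names `candidate_grids`, `p_row` and `p_col` are illustrative, and the exact constraints of each library should be checked against its documentation.

```python
def candidate_grids(nprocs: int, n: int) -> list:
    """List (p_row, p_col) pairs with p_row * p_col == nprocs.

    Assumption: for an n^3 grid, each grid dimension must not exceed n,
    so that every rank owns at least one pencil.
    """
    grids = []
    for p_row in range(1, nprocs + 1):
        if nprocs % p_row != 0:
            continue  # p_row must divide the rank count exactly
        p_col = nprocs // p_row
        if p_row <= n and p_col <= n:
            grids.append((p_row, p_col))
    return grids


grids = candidate_grids(1024, 256)
for p_row, p_col in grids:
    print(f"{p_row} x {p_col}")
```

Each candidate would then be timed with a short benchmark of the actual transform, since the fastest shape depends on the library and the machine's interconnect.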

The scalability and the absolute performance of both 2DECOMP&FFT and P3DFFT are better than those of FFTW 3.3 running in MPI mode. FFTW is, however, much more efficient in OpenMP mode. This suggests that a hybrid implementation may be the future direction of 2DECOMP&FFT.