Library for 2D pencil decomposition and distributed Fast Fourier Transform
Memory consumption is normally not an issue for applications built on 2DECOMP&FFT. Because the library scales extremely well, increasing the core count, which shrinks each process's share of the data, solves most memory-related problems.
For distributed FFTs with 2D decomposition, assuming the size of the input is 1X, the memory footprint of the algorithms is 5X, including:
Some of the temporary space could be allocated on demand. For performance reasons, however (Fortran ALLOCATE can be quite slow for large blocks of memory), it is preferable to allocate these buffers once and keep them in global storage.
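The allocate-once approach can be sketched as follows. This is a minimal illustration only, not the library's actual code: the module `workspace_sketch`, the buffer `work`, and the routines `workspace_init` and `workspace_finalize` are hypothetical names chosen for this example.

```fortran
! Hypothetical sketch (not part of 2DECOMP&FFT): a work buffer held at
! module scope, allocated once and reused by every later call.
module workspace_sketch
  implicit none
  private
  public :: work, workspace_init, workspace_finalize

  ! Module-level storage persists for the lifetime of the program.
  complex(kind=kind(1.0d0)), allocatable, save :: work(:)

contains

  subroutine workspace_init(n)
    integer, intent(in) :: n
    integer :: ierr
    ! Skip the costly ALLOCATE when a large enough buffer already exists.
    if (allocated(work)) then
      if (size(work) >= n) return
      deallocate(work)
    end if
    allocate(work(n), stat=ierr)
    if (ierr /= 0) error stop 'workspace_init: allocation failed'
  end subroutine workspace_init

  subroutine workspace_finalize()
    if (allocated(work)) deallocate(work)
  end subroutine workspace_finalize

end module workspace_sketch

program demo
  use workspace_sketch
  implicit none
  integer :: step

  call workspace_init(1024)          ! pay the allocation cost once
  do step = 1, 3
    ! Each step reuses the same storage; no ALLOCATE inside the loop.
    work = cmplx(step, 0, kind=kind(1.0d0))
  end do
  print *, 'buffer size:', size(work)
  call workspace_finalize()
end program demo
```

The trade-off is the one described above: the buffer occupies memory for the whole run, in exchange for avoiding repeated allocation inside the time-stepping loop.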
The following optimisations are possible and may be introduced in a future version of the library:
In Practice
Footnotes
1. This can be achieved in Fortran very efficiently using the EQUIVALENCE statement if the input and output are both static arrays. However, this technique does not apply to dynamically allocated arrays and is strongly discouraged in modern Fortran. More discussion can be found here.