As covered elsewhere in the documentation, a major strength of the 2DECOMP&FFT library is its ease of use. This case study demonstrates how a real-world CFD application was parallelised with it.
While I was working for the CSE team supporting HECToR, the UK's national supercomputing facility, I was contacted by Dr Shankar Balakrishnan of Southampton University regarding the use of FFTs in his CFD code. He later provided the following summary:
"For direct numerical simulations of fluid flow phenomenon involving vortex rings, the distribution of vorticity within the core of the vortex ring is initialised by an analytical expression usually a Gaussian distribution. From this known vorticity field the corresponding velocity field needs to be obtained to initialise the flow. A three-dimensional Fourier-transform is performed on the vorticity field and the Fourier coefficients are obtained in 'wave-space'. The Fourier components of the velocity field are then obtained and an inverse Fourier transform is performed to obtain the velocity field in real-space.
The computational time for this initialisation process is typically small compared to the overall runtime for the simulation of the evolution of the fluid flow with time. Hence the gain in computational time does not justify the parallelisation of the code used for the initialisation process. However, the memory required for the process becomes too large to be performed on a single node as the domain size increases. For this reason the initialisation process needs to be parallelised to distribute the work over multiple nodes. Using a code for performing fast-Fourier transforms in parallel, Vorticity field-Velocity field transformations can be performed for computational domains of indefinitely large sizes."
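To make the procedure in the summary concrete, the following is a minimal serial sketch in Python/NumPy. It is not Dr Balakrishnan's code. It assumes a triply periodic cubic box and an incompressible (divergence-free) velocity field, in which case the Fourier coefficients are related by û = i k × ω̂ / |k|². The function name `velocity_from_vorticity` and the `length` parameter are hypothetical, chosen only for this illustration.

```python
import numpy as np


def velocity_from_vorticity(omega, length=2.0 * np.pi):
    """Recover velocity from vorticity on a periodic n x n x n grid.

    omega  : real array of shape (3, n, n, n) holding the vorticity components
    length : side of the periodic box (assumed cubic for this sketch)
    """
    n = omega.shape[1]
    # Angular wavenumbers along one axis, then on the full 3D grid.
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0  # avoid 0/0; the mean mode is zeroed explicitly below

    # Forward 3D transform of each vorticity component ("wave-space").
    w_hat = np.fft.fftn(omega, axes=(1, 2, 3))

    # u_hat = i (k x w_hat) / |k|^2, valid for a divergence-free velocity.
    u_hat = np.empty_like(w_hat)
    u_hat[0] = 1j * (ky * w_hat[2] - kz * w_hat[1]) / k2
    u_hat[1] = 1j * (kz * w_hat[0] - kx * w_hat[2]) / k2
    u_hat[2] = 1j * (kx * w_hat[1] - ky * w_hat[0]) / k2
    u_hat[:, 0, 0, 0] = 0.0  # mean velocity is not determined by vorticity

    # Inverse transform back to real space.
    return np.real(np.fft.ifftn(u_hat, axes=(1, 2, 3)))
```

Every array here is held in full on one process, which is exactly the memory limitation described in the summary. In the parallel code the same three steps are applied to the portion of each array owned by each MPI rank.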
Without knowing many of the scientific details of his work, I was able to parallelise his code using the 2DECOMP&FFT library in a matter of days. Some essential parts of the serial and parallel codes are compared side by side below to show how easy the library is to use.
Serial Code | Parallel Code |
---|---|
program vort | program vort |
As can be seen, the code structure remains unchanged in the parallel version. The main effort in the parallel code is to define correctly the portion of data residing locally on each processor (in allocation statements and loop bounds); all such decomposition information is available through public variables once the library has been initialised. The code looks messier than it should because it was translated from C, which uses 0-based indexing. The FFT computations themselves are much simpler because the use of FFTW (or any other third-party FFT library) is handled internally by 2DECOMP&FFT. It is also noticeable that very few MPI routines are called in the parallel code: the library hides as many communication details as possible.
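The bookkeeping for the locally held portion of the data can be illustrated with a small serial sketch in plain Python. It computes 1-based start, end and size triples for one rank of a 2D pencil decomposition, analogous in spirit to the `xstart`, `xend` and `xsize` arrays that the library exposes. The helper names `split_range` and `x_pencil`, the row-major mapping from rank to process-grid coordinates, and the rule for distributing any remainder are all assumptions made for this illustration; the library's own choices may differ.

```python
def split_range(n, nproc):
    """Split indices 1..n into nproc contiguous (start, end) ranges.

    Any remainder is given to the lowest-numbered parts; this is an
    arbitrary choice for the sketch.
    """
    base, extra = divmod(n, nproc)
    ranges = []
    start = 1
    for p in range(nproc):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def x_pencil(nx, ny, nz, p_row, p_col, rank):
    """Local index ranges for one rank when the x direction is kept whole.

    The y direction is split over p_row processes and z over p_col.
    The rank is mapped to (row, col) in row-major order (an assumption).
    """
    row, col = divmod(rank, p_col)
    y_start, y_end = split_range(ny, p_row)[row]
    z_start, z_end = split_range(nz, p_col)[col]
    start = (1, y_start, z_start)
    end = (nx, y_end, z_end)
    size = tuple(e - s + 1 for s, e in zip(start, end))
    return start, end, size


# Example: an 8 x 8 x 8 grid on a 2 x 2 process grid, seen from rank 3.
start, end, size = x_pencil(8, 8, 8, 2, 2, rank=3)
```

In a parallel code, triples like these are what appear in the allocation statements and loop bounds, so the body of each loop can stay the same as in the serial version.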
In the end, Dr Balakrishnan reported back that the data generated and written out using the MPI-IO routines in the parallel code was identical to that written out using Fortran direct-access IO from the serial code. He was able to initialise much bigger datasets for his direct numerical simulations.