Reimplement complex 2D FFT in idg-fft
Use a composite kernel for large transformations: instead of calling a multi-threaded 2D FFT, the transformation is performed as a series of 1D FFTs (kernel_fft_composite). The main benefit is better scalability and therefore (much) better performance on large transformations. For smaller transformations, another new kernel (kernel_fft_coarse) is applied.
These kernels resemble the kernels in CPU Optimized, albeit without scaling.
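
For readers unfamiliar with the row/column decomposition, the following is a minimal sketch of how a square 2D transform can be expressed as a series of 1D transforms. It is not the code from this change: the names dft_1d and fft_2d_composite are hypothetical, and a naive O(n^2) DFT stands in for whatever 1D FFT routine the real kernel calls. Like the kernels described above, the sketch applies no scaling.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical stand-in for a 1D FFT: a naive, unscaled O(n^2) DFT.
// Transforms n elements in place, reading every `stride`-th element.
// sign = -1 gives the forward transform, sign = +1 the backward one.
void dft_1d(cfloat* data, std::size_t n, std::size_t stride, int sign) {
  const double two_pi = 2.0 * std::acos(-1.0);
  std::vector<cfloat> out(n);
  for (std::size_t k = 0; k < n; ++k) {
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      // Reduce j*k modulo n to keep the phase argument small.
      const double phase =
          sign * two_pi * static_cast<double>((j * k) % n) / static_cast<double>(n);
      sum += std::complex<double>(data[j * stride]) * std::polar(1.0, phase);
    }
    out[k] = cfloat(sum);
  }
  for (std::size_t k = 0; k < n; ++k) {
    data[k * stride] = out[k];
  }
}

// Hypothetical composite 2D transform on a row-major n x n grid:
// one 1D transform per row, then one 1D transform per column.
void fft_2d_composite(std::vector<cfloat>& grid, std::size_t n, int sign) {
  assert(grid.size() == n * n);
  // Pass 1: rows are contiguous in memory (stride 1).
  for (std::size_t y = 0; y < n; ++y) {
    dft_1d(&grid[y * n], n, 1, sign);
  }
  // Pass 2: columns are separated by one full row (stride n).
  for (std::size_t x = 0; x < n; ++x) {
    dft_1d(&grid[x], n, n, sign);
  }
}
```

Within each pass the 1D transforms touch disjoint elements, so the iterations of each loop are independent of one another and could be distributed over threads. Because no scaling is applied, a forward transform followed by a backward transform returns the input multiplied by n * n.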