Extend coherent Stokes transpose kernel with FFT
The CoherentStokesTransposeKernel can now optionally perform an FFT over all channels (per tab, sample, and polarization), followed by an FFT-shift, before writing the samples to device memory. A test for this new functionality has been added. The kernel can also operate in 16-bit mode, where input and output are half precision; the FFT itself is still performed in 32-bit. The benchmark programs have been updated to work with the extended kernel.
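
For reference, the per-(tab, polarization, sample) processing described above can be sketched on the host side roughly as follows. This is only an illustration, not the kernel code: the names `to_half`, `dft`, `fft_shift` and `transform_channels` are hypothetical, a naive DFT stands in for the FFT, and the arithmetic is done in Python's double precision rather than 32-bit. The half-precision rounding only mimics the 16-bit input and output.

```python
import cmath
import struct


def to_half(x):
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dft(values):
    """Naive O(n^2) DFT over the channel axis (stand-in for the FFT)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def fft_shift(spectrum):
    """Move the zero-frequency bin to the centre of the spectrum."""
    split = (len(spectrum) + 1) // 2
    return spectrum[split:] + spectrum[:split]


def transform_channels(channel_samples, half_precision=False):
    """Transform the channels of one (tab, polarization, sample) triple.

    With half_precision=True, input and output are rounded to 16-bit
    floats, while the transform itself runs at higher precision.
    """
    if half_precision:
        channel_samples = [
            complex(to_half(s.real), to_half(s.imag)) for s in channel_samples
        ]
    shifted = fft_shift(dft(channel_samples))
    if half_precision:
        shifted = [complex(to_half(s.real), to_half(s.imag)) for s in shifted]
    return shifted


if __name__ == "__main__":
    # A constant (DC) signal across 4 channels ends up in the centre bin.
    print(transform_channels([1, 1, 1, 1], half_precision=True))
```

With four channels holding a constant value, all energy lands in index 2 after the shift, which is the behaviour the new test is expected to check in spirit.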