Remove samples4 in beamFormer kernel
The code implicitly assumed that the input has a [2] dimension, which
corresponds to [NR_POLARIZATIONS]. This dimension is now part of the
multidimensional type definition, and samples4 has been removed from
the shared memory union. A for-loop in loadSamples now copies all 4
values from device memory to shared memory. These changes also
marginally improve kernel performance.
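
For illustration only, a host-side C++ sketch of the idea, not the actual
kernel code. The array sizes, the type names InputData and SharedSamples,
and the loadSamples signature are hypothetical. It assumes that the 4 values
are the real and imaginary parts of the two polarizations.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical sizes, chosen only for this sketch.
constexpr std::size_t NR_POLARIZATIONS = 2;
constexpr std::size_t NR_STATIONS = 4;
constexpr std::size_t NR_TIMES = 8;

// The polarization dimension is spelled out in the type instead of being
// hidden behind a 4-float view of the same memory.
using InputData = std::complex<float>[NR_STATIONS][NR_TIMES][NR_POLARIZATIONS];

// Stand-in for the shared memory tile: one time step, all stations.
using SharedSamples = std::complex<float>[NR_STATIONS][NR_POLARIZATIONS];

// Copy both polarizations (2 complex numbers, i.e. 4 floats) of one sample.
void loadSamples(SharedSamples &shared, const InputData &input,
                 std::size_t station, std::size_t time)
{
  assert(station < NR_STATIONS && time < NR_TIMES);

  for (std::size_t pol = 0; pol < NR_POLARIZATIONS; pol++) {
    shared[station][pol] = input[station][time][pol];
  }
}
```

With the dimension in the type, the loop bound comes from NR_POLARIZATIONS
rather than from a hard-coded vector width.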