Vectorization of sincos() bottleneck
The beamformer weight calculation is bottlenecked by the sincos() operation, which can be vectorized by splitting the loop into separate sin() and cos() loops. Performance: building with '-O2 -ffast-math -ftree-vectorize' instead of the generic '-O3' shows a speedup.