add sincos() vectorization
The point source phase calculation is limited by the sincos() operation, which can be vectorized by splitting the loop into separate sin() and cos() loops. Performance: compared with a generic '-O3' build, a build with '-O2 -ffast-math -ftree-vectorize' shows a speedup.
This repeats the previous merge request, which could not be rebased onto master.