Fixed AVX2 support for radec2lmn.
The current SmartVector implementation does not guarantee that its data is aligned. Once we have made sure that SmartVector is actually aligned, we can switch back to the aligned mode.
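As a rough illustration of what "actually aligned" could mean here, below is a minimal sketch in standard C++17. It is not the SmartVector code: `is_aligned`, `AlignedAllocator`, `AlignedVector`, `can_use_aligned_mode` and `kAvx2Alignment` are hypothetical names made up for this example. It assumes a 32-byte boundary, which is what aligned 256-bit AVX2 loads and stores require; note that the allocator visible in the perf output below is `xsimd::aligned_allocator<double, 16ul>`, i.e. only 16-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Assumption: aligned 256-bit AVX2 loads/stores need a 32-byte boundary.
constexpr std::size_t kAvx2Alignment = 32;

// Hypothetical helper: true if `ptr` sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  // The alignment must be a non-zero power of two.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical minimal allocator that over-aligns every allocation
// using the C++17 aligned operator new/delete.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
  using value_type = T;
  static_assert(Alignment >= alignof(T), "alignment too small for T");
  static_assert((Alignment & (Alignment - 1)) == 0, "power of two required");

  AlignedAllocator() noexcept = default;
  template <typename U>
  AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

  // Explicit rebind is needed because of the non-type template parameter.
  template <typename U>
  struct rebind {
    using other = AlignedAllocator<U, Alignment>;
  };

  T* allocate(std::size_t n) {
    return static_cast<T*>(
        ::operator new(n * sizeof(T), std::align_val_t(Alignment)));
  }
  void deallocate(T* p, std::size_t) noexcept {
    ::operator delete(p, std::align_val_t(Alignment));
  }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept {
  return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept {
  return false;
}

// Hypothetical stand-in for a container whose buffer is 32-byte aligned.
using AlignedVector =
    std::vector<double, AlignedAllocator<double, kAvx2Alignment>>;

// Hypothetical runtime guard: only take the aligned code path when the
// buffer really starts on a 32-byte boundary.
inline bool can_use_aligned_mode(const double* data) {
  return is_aligned(data, kAvx2Alignment);
}
```

A guard like `can_use_aligned_mode(v.data())` could also serve as a debug assertion before switching back to the aligned mode, so that a misaligned buffer fails loudly instead of faulting inside the SIMD kernel.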
Benchmark results (node504):
benchmark/microbenchmarks --benchmark_filter="DirectionsBenchmark" --benchmark_min_warmup_time=1
--------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU  Iterations
--------------------------------------------------------------------------------------------
DirectionsBenchmark/DirectionsOneByOne/#dir:1024        0.042 ms        0.042 ms       16301
DirectionsBenchmark/DirectionsOneByOne/#dir:2048        0.083 ms        0.083 ms        8384
DirectionsBenchmark/DirectionsOneByOne/#dir:4096        0.165 ms        0.165 ms        4158
DirectionsBenchmark/DirectionsOneByOne/#dir:8192        0.334 ms        0.333 ms        2119
DirectionsBenchmark/DirectionsMulti/#dir:1024           0.036 ms        0.036 ms       19541
DirectionsBenchmark/DirectionsMulti/#dir:2048           0.071 ms        0.071 ms        9743
DirectionsBenchmark/DirectionsMulti/#dir:4096           0.142 ms        0.141 ms        5047
DirectionsBenchmark/DirectionsMulti/#dir:8192           0.278 ms        0.277 ms        2527
DirectionsBenchmark/DirectionsMultiSIMD/#dir:1024       0.030 ms        0.030 ms       23531
DirectionsBenchmark/DirectionsMultiSIMD/#dir:2048       0.060 ms        0.060 ms       11790
DirectionsBenchmark/DirectionsMultiSIMD/#dir:4096       0.125 ms        0.125 ms        5856
DirectionsBenchmark/DirectionsMultiSIMD/#dir:8192       0.273 ms        0.271 ms        2943
perf results:
72.51% microbenchmarks microbenchmarks [.] Directions::radec2lmn<(Directions::computation_strategy)2>
9.52% microbenchmarks microbenchmarks [.] xt::stepper_tools<(xt::layout_type)1>::increment_stepper<xt::stepper_assigner<xt::xtensor_container<xt::uvector<doub
3.22% microbenchmarks microbenchmarks [.] xt::strided_loop_assigner<true>::run<xt::xtensor_container<xt::uvector<double, xsimd::aligned_allocator<double, 16ul
1.93% microbenchmarks microbenchmarks [.] xsimd::kernel::detail::trigo_reducer<xsimd::batch<double, xsimd::sse2>, xsimd::kernel::detail::trigo_radian_tag>::re
Edited by Wiebe van Breukelen