
Fixed AVX2 support for radec2lmn.

The current SmartVector implementation does not guarantee that its data is aligned. Once we have made sure that SmartVector is actually aligned, we can switch back to the aligned mode.
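As a rough illustration of what "actually aligned" means here, the sketch below checks a pointer against the 32-byte boundary that 256-bit AVX2 aligned loads require, and picks a load mode accordingly. The names `is_aligned`, `kAvx2Alignment`, `LoadMode`, and `select_load_mode` are hypothetical and not taken from this merge request; how SmartVector allocates its storage is not shown here, so this is only an assumption about how such a check could look.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of this merge request): returns true when
// `ptr` is aligned to `alignment` bytes. `alignment` must be a power of two.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// AVX2 registers are 256 bits wide, so aligned loads need 32-byte alignment.
constexpr std::size_t kAvx2Alignment = 32;

// Hypothetical switch between an aligned and an unaligned code path.
enum class LoadMode { kAligned, kUnaligned };

// Picks the aligned path only when the buffer start sits on a 32-byte boundary.
inline LoadMode select_load_mode(const double* data) {
  return is_aligned(data, kAvx2Alignment) ? LoadMode::kAligned
                                          : LoadMode::kUnaligned;
}
```

A check like this could serve as a debug assertion before switching back to the aligned mode; a compile-time guarantee from the allocator would be preferable if SmartVector can provide one.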

Benchmark results (node504):

benchmark/microbenchmarks --benchmark_filter="DirectionsBenchmark" --benchmark_min_warmup_time=1
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
DirectionsBenchmark/DirectionsOneByOne/#dir:1024       0.042 ms        0.042 ms        16301
DirectionsBenchmark/DirectionsOneByOne/#dir:2048       0.083 ms        0.083 ms         8384
DirectionsBenchmark/DirectionsOneByOne/#dir:4096       0.165 ms        0.165 ms         4158
DirectionsBenchmark/DirectionsOneByOne/#dir:8192       0.334 ms        0.333 ms         2119
DirectionsBenchmark/DirectionsMulti/#dir:1024          0.036 ms        0.036 ms        19541
DirectionsBenchmark/DirectionsMulti/#dir:2048          0.071 ms        0.071 ms         9743
DirectionsBenchmark/DirectionsMulti/#dir:4096          0.142 ms        0.141 ms         5047
DirectionsBenchmark/DirectionsMulti/#dir:8192          0.278 ms        0.277 ms         2527
DirectionsBenchmark/DirectionsMultiSIMD/#dir:1024      0.030 ms        0.030 ms        23531
DirectionsBenchmark/DirectionsMultiSIMD/#dir:2048      0.060 ms        0.060 ms        11790
DirectionsBenchmark/DirectionsMultiSIMD/#dir:4096      0.125 ms        0.125 ms         5856
DirectionsBenchmark/DirectionsMultiSIMD/#dir:8192      0.273 ms        0.271 ms         2943

perf

  72.51%  microbenchmarks  microbenchmarks      [.] Directions::radec2lmn<(Directions::computation_strategy)2>
   9.52%  microbenchmarks  microbenchmarks      [.] xt::stepper_tools<(xt::layout_type)1>::increment_stepper<xt::stepper_assigner<xt::xtensor_container<xt::uvector<doub
   3.22%  microbenchmarks  microbenchmarks      [.] xt::strided_loop_assigner<true>::run<xt::xtensor_container<xt::uvector<double, xsimd::aligned_allocator<double, 16ul
   1.93%  microbenchmarks  microbenchmarks      [.] xsimd::kernel::detail::trigo_reducer<xsimd::batch<double, xsimd::sse2>, xsimd::kernel::detail::trigo_radian_tag>::re
Edited by Wiebe van Breukelen
