Add matrixMultiplyAVX2b

Bram Veenboer requested to merge second-avx2-implementation into main

This version is based on matrixMultiplyAVX2 with some changes:

  • Remove the multiplication by inv
  • Use _mm256_addsub_ps
  • Replace the overkill _mm256_permutevar8x32_ps with the cheaper (and cleaner) _mm256_permute_ps
  • Reshuffle b_1 and b_2 to get b_3 and b_4
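For readers unfamiliar with these intrinsics, here is a minimal sketch of how _mm256_permute_ps and _mm256_addsub_ps can combine in a complex multiply. It is not the code from this merge request: the function name complex_mul4, the interleaved {re, im} layout, and the GCC/Clang target attribute are all assumptions made for illustration. The idea is that _mm256_addsub_ps subtracts in even lanes and adds in odd lanes, so no separate multiplication by a sign vector is needed, and a compile-time _mm256_permute_ps is enough to swap the real and imaginary parts within each pair.

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical helper (not from the merge request): multiplies four
// interleaved complex floats, out[k] = a[k] * b[k].
// Assumed layout: {re0, im0, re1, im1, re2, im2, re3, im3}.
// The target attribute is GCC/Clang-specific and enables AVX for this
// function without requiring -mavx on the command line.
__attribute__((target("avx")))
void complex_mul4(const float* a, const float* b, float* out) {
    assert(a && b && out);

    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);

    // Broadcast real and imaginary parts of a within each pair.
    __m256 a_re = _mm256_moveldup_ps(va);  // {ar0, ar0, ar1, ar1, ...}
    __m256 a_im = _mm256_movehdup_ps(va);  // {ai0, ai0, ai1, ai1, ...}

    // Swap re/im of b within each pair using an immediate-controlled
    // in-lane permute: 0xB1 selects elements {1, 0, 3, 2} per 128-bit lane.
    __m256 b_swap = _mm256_permute_ps(vb, 0xB1);  // {bi0, br0, bi1, br1, ...}

    __m256 t0 = _mm256_mul_ps(a_re, vb);      // {ar*br, ar*bi, ...}
    __m256 t1 = _mm256_mul_ps(a_im, b_swap);  // {ai*bi, ai*br, ...}

    // addsub: even lanes compute t0 - t1, odd lanes compute t0 + t1,
    // giving {ar*br - ai*bi, ar*bi + ai*br, ...}.
    _mm256_storeu_ps(out, _mm256_addsub_ps(t0, t1));
}
```

_mm256_permutevar8x32_ps can shuffle across the two 128-bit lanes and takes its indices from a vector register; a swap like the one above stays within a lane, which is why the immediate form suffices here.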