Add matrixMultiplyAVX2b (!4) · Merge requests · Mattia Mancini / Microbenchmarks · GitLab

Bram Veenboer requested to merge second-avx2-implementation into main May 14, 2024

This version is based on matrixMultiplyAVX2 with some changes:

- Remove the multiplication with inv
- Use _mm256_addsub_ps
- Replace the unnecessarily general _mm256_permutevar8x32_ps with the cheaper (and cleaner) _mm256_permute_ps
- Reshuffle b_1 and b_2 to get b_3 and b_4
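
For readers unfamiliar with these intrinsics, here is a minimal sketch of how _mm256_addsub_ps and _mm256_permute_ps can combine in a complex multiplication. It is not the code from this merge request: the function names complexMultiply4, complexMultiply4Scalar and checkAgainstScalar are hypothetical, and the interleaved (re, im) layout is an assumption. _mm256_permute_ps shuffles within each 128-bit lane using an immediate control byte, which is sufficient here because every complex number stays inside its lane; _mm256_permutevar8x32_ps is only needed for cross-lane shuffles. The target attribute and __builtin_cpu_supports are GCC/Clang specific.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cmath>

// Hypothetical helper, not taken from the merge request: multiplies four
// complex floats stored interleaved as (re, im, re, im, ...) in a and b.
// The target attribute (GCC/Clang) enables AVX for this function only.
__attribute__((target("avx")))
void complexMultiply4(const float* a, const float* b, float* out) {
    const __m256 va = _mm256_loadu_ps(a);
    const __m256 vb = _mm256_loadu_ps(b);

    // In-lane shuffles with an immediate control byte:
    const __m256 a_re   = _mm256_permute_ps(va, 0xA0);  // (ar, ar, ...)
    const __m256 a_im   = _mm256_permute_ps(va, 0xF5);  // (ai, ai, ...)
    const __m256 b_swap = _mm256_permute_ps(vb, 0xB1);  // (bi, br, ...)

    const __m256 t1 = _mm256_mul_ps(a_re, vb);      // (ar*br, ar*bi, ...)
    const __m256 t2 = _mm256_mul_ps(a_im, b_swap);  // (ai*bi, ai*br, ...)

    // addsub subtracts in even positions and adds in odd positions:
    // (ar*br - ai*bi, ar*bi + ai*br, ...)
    _mm256_storeu_ps(out, _mm256_addsub_ps(t1, t2));
}

// Scalar reference used to check the vectorized version.
void complexMultiply4Scalar(const float* a, const float* b, float* out) {
    for (int i = 0; i < 4; ++i) {
        const float ar = a[2 * i], ai = a[2 * i + 1];
        const float br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
}

// Compares both versions on a small input; skipped on CPUs without AVX.
void checkAgainstScalar() {
    if (!__builtin_cpu_supports("avx")) return;
    const float a[8] = {1.0f, 2.0f, 3.0f, 4.0f, -1.0f, 0.5f, 0.0f, 1.0f};
    const float b[8] = {5.0f, 6.0f, 7.0f, 8.0f, 2.0f, -3.0f, 1.0f, 0.0f};
    float got[8];
    float want[8];
    complexMultiply4(a, b, got);
    complexMultiply4Scalar(a, b, want);
    for (int i = 0; i < 8; ++i) {
        assert(std::fabs(got[i] - want[i]) < 1e-5f);
    }
}
```

Calling checkAgainstScalar() from a test or from main exercises both paths. Because addsub supplies the sign pattern directly, this formulation needs no separate sign-flip vector.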