Draft: Initial version
Replaced all compile-time CPU support detection using #ifdefs with an __attribute__ ((target (...))) annotation where the #ifdef covered a complete function, or with __builtin_cpu_supports(...) where the #ifdefs occurred inside a function body. The latter has a small run-time overhead, but the speed gains are larger than the penalty.
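
For illustration, a minimal sketch of the two patterns, assuming GCC or Clang on x86. The function names (add_i32, add_i32_avx2, add_i32_generic) and the choice of AVX2 are hypothetical and not taken from this change; they only show how a target-annotated function and a run-time check can fit together.

```c
#include <stddef.h>
#include <immintrin.h>

/* Whole-function case: only this function is compiled with AVX2 enabled,
 * so the rest of the file still builds for the baseline CPU.
 * (Hypothetical example function, not part of this change.) */
__attribute__((target("avx2")))
static void add_i32_avx2(int *dst, const int *a, const int *b, size_t n)
{
    size_t i = 0;
    /* Process 8 ints per iteration with 256-bit vectors. */
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi32(va, vb));
    }
    /* Scalar tail for the remaining elements. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Baseline fallback that runs on any CPU. */
static void add_i32_generic(int *dst, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Inside-the-body case: the former #ifdef becomes a run-time check.
 * The check costs a little on every call, which is the small overhead
 * mentioned above. */
void add_i32(int *dst, const int *a, const int *b, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        add_i32_avx2(dst, a, b, n);
    else
        add_i32_generic(dst, a, b, n);
}
```

Callers only use add_i32, so the same binary runs on CPUs with and without AVX2 and no longer needs to be built with -mavx2.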