Add kernel_tuner application
Most of the GEMM kernel parameters are currently hard-coded. This application tunes them to determine suitable values for a particular target.
Python
Proposal
/tuner
Tunable parameters:
BLOCK_SIZE_X
BLOCK_SIZE_Y
BLOCK_SIZE_Z
M_PER_BLOCK
N_PER_BLOCK
M_PER_WARP
N_PER_WARP
M_PER_WMMA
N_PER_WMMA
K_PER_WMMA
NBUFFER
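
To illustrate how the tunable parameters above could be expressed as a search space, here is a minimal sketch in plain Python. The candidate values and the divisibility constraints in `is_valid` are assumptions made for illustration only; the actual ranges and constraints depend on the kernel and the target.

```python
import itertools

# Hypothetical candidate values for each tunable parameter.
# The real ranges depend on the kernel and the target GPU.
tune_params = {
    "BLOCK_SIZE_X": [32],
    "BLOCK_SIZE_Y": [1, 2, 4],
    "BLOCK_SIZE_Z": [1, 2, 4],
    "M_PER_BLOCK": [32, 64, 128],
    "N_PER_BLOCK": [32, 64, 128],
    "M_PER_WARP": [16, 32, 64],
    "N_PER_WARP": [16, 32, 64],
    "M_PER_WMMA": [16],
    "N_PER_WMMA": [16],
    "K_PER_WMMA": [16],
    "NBUFFER": [1, 2, 4],
}


def is_valid(cfg):
    """Assumed constraint: each tile level divides evenly into the one above it."""
    return (
        cfg["M_PER_BLOCK"] % cfg["M_PER_WARP"] == 0
        and cfg["N_PER_BLOCK"] % cfg["N_PER_WARP"] == 0
        and cfg["M_PER_WARP"] % cfg["M_PER_WMMA"] == 0
        and cfg["N_PER_WARP"] % cfg["N_PER_WMMA"] == 0
    )


def search_space(params):
    """Yield every combination of parameter values that passes is_valid."""
    names = list(params)
    for values in itertools.product(*params.values()):
        cfg = dict(zip(names, values))
        if is_valid(cfg):
            yield cfg


configs = list(search_space(tune_params))
print(len(configs), "valid configurations")
```

If the tuner is built on the Kernel Tuner Python package, a dictionary of this shape is what `tune_kernel` takes as its `tune_params` argument, and constraints like the ones in `is_valid` can be passed through its `restrictions` argument instead of being filtered by hand.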
Set via command line:
NR_INPUT_BITS
M_GLOBAL
N_GLOBAL
K_GLOBAL
BATCH_SIZE
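
The command-line parameters listed above could be collected with `argparse`, as in the following sketch. The flag spellings (`--m-global` and so on), the default of 1 for `BATCH_SIZE`, and the helper `to_defines` are hypothetical choices for illustration, not part of the proposal.

```python
import argparse


def build_parser():
    """Build a parser for the problem-size parameters (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="GEMM kernel tuner (sketch)")
    parser.add_argument("--nr-input-bits", dest="NR_INPUT_BITS", type=int,
                        required=True, help="number of bits per input value")
    parser.add_argument("--m-global", dest="M_GLOBAL", type=int, required=True,
                        help="global M dimension of the GEMM")
    parser.add_argument("--n-global", dest="N_GLOBAL", type=int, required=True,
                        help="global N dimension of the GEMM")
    parser.add_argument("--k-global", dest="K_GLOBAL", type=int, required=True,
                        help="global K dimension of the GEMM")
    # Assumed default: a single GEMM when no batch size is given.
    parser.add_argument("--batch-size", dest="BATCH_SIZE", type=int, default=1,
                        help="number of GEMMs in the batch")
    return parser


def to_defines(args):
    """Turn parsed arguments into a name -> value mapping of fixed parameters."""
    return dict(vars(args))


# Example invocation with an explicit argument list instead of sys.argv.
example_argv = ["--nr-input-bits", "16", "--m-global", "1024",
                "--n-global", "2048", "--k-global", "512"]
defines = to_defines(build_parser().parse_args(example_argv))
print(defines)
```

Keeping these values separate from the tunable parameters makes the split explicit: the problem size is fixed per run, while the tuner searches over the remaining parameters.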