
Add kernel_tuner application

Most of the GEMM parameters are currently hard-coded. This application would determine suitable values for these parameters for a particular target.

Python

Proposal

/tuner

Tunable parameters:

BLOCK_SIZE_X
BLOCK_SIZE_Y
BLOCK_SIZE_Z
M_PER_BLOCK
N_PER_BLOCK
M_PER_WARP
N_PER_WARP
M_PER_WMMA
N_PER_WMMA
K_PER_WMMA
NBUFFER
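
As a rough illustration of the list above, the search space could be written as a dictionary that maps each parameter name to its candidate values, which is the shape Kernel Tuner's `tune_kernel` takes for its `tune_params` argument. Everything else in this sketch is an assumption: the candidate values are placeholders, and the constraints in the hypothetical `is_valid` helper (tiles nest evenly, at most 1024 threads per block) are guesses that would need to be checked against the actual kernel.

```python
from itertools import product

# Candidate values are placeholders, not measured or recommended settings.
tune_params = {
    "BLOCK_SIZE_X": [32, 64, 128, 256],
    "BLOCK_SIZE_Y": [1, 2, 4, 8],
    "BLOCK_SIZE_Z": [1, 2],
    "M_PER_BLOCK": [64, 128],
    "N_PER_BLOCK": [64, 128],
    "M_PER_WARP": [32, 64, 128],
    "N_PER_WARP": [32, 64, 128],
    "M_PER_WMMA": [16],
    "N_PER_WMMA": [16],
    "K_PER_WMMA": [16],
    "NBUFFER": [1, 2, 4],
}


def is_valid(cfg):
    """Assumed constraints: tiles nest evenly, at most 1024 threads per block."""
    threads = cfg["BLOCK_SIZE_X"] * cfg["BLOCK_SIZE_Y"] * cfg["BLOCK_SIZE_Z"]
    return (
        threads <= 1024
        and cfg["M_PER_BLOCK"] % cfg["M_PER_WARP"] == 0
        and cfg["N_PER_BLOCK"] % cfg["N_PER_WARP"] == 0
        and cfg["M_PER_WARP"] % cfg["M_PER_WMMA"] == 0
        and cfg["N_PER_WARP"] % cfg["N_PER_WMMA"] == 0
    )


def enumerate_configs(params, predicate):
    """Yield every combination of candidate values that passes the predicate."""
    names = list(params)
    for values in product(*params.values()):
        cfg = dict(zip(names, values))
        if predicate(cfg):
            yield cfg


configs = list(enumerate_configs(tune_params, is_valid))
print(f"{len(configs)} valid configurations")
```

Filtering invalid combinations up front keeps the tuner from compiling configurations that cannot run. The same predicate could be passed on as a restriction when the search is handed to the tuner.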

Parameters set via the command line:

NR_INPUT_BITS
M_GLOBAL
N_GLOBAL
K_GLOBAL
BATCH_SIZE
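
A minimal sketch of how the command-line parameters above might be parsed, using Python's standard `argparse`. The flag spellings (`--m-global` and so on), the default values, and the helper names `build_parser` and `to_defines` are all hypothetical. The proposal only names the parameters themselves.

```python
import argparse

# Parameter names come from the proposal; the defaults are placeholders.
CLI_PARAMS = {
    "NR_INPUT_BITS": 16,
    "M_GLOBAL": 4096,
    "N_GLOBAL": 4096,
    "K_GLOBAL": 4096,
    "BATCH_SIZE": 1,
}


def build_parser():
    """Create one integer option per parameter, e.g. M_GLOBAL -> --m-global."""
    parser = argparse.ArgumentParser(description="GEMM tuner (sketch)")
    for name, default in CLI_PARAMS.items():
        flag = "--" + name.lower().replace("_", "-")
        parser.add_argument(flag, type=int, default=default,
                            help=f"value for {name} (default: {default})")
    return parser


def to_defines(args):
    """Map parsed options back to the upper-case names used by the kernel."""
    return {name: getattr(args, name.lower()) for name in CLI_PARAMS}


# Example invocation with an explicit argument list instead of sys.argv.
example_args = build_parser().parse_args(["--m-global", "8192", "--batch-size", "4"])
print(to_defines(example_args))
```

The resulting dictionary keeps the fixed problem description separate from the tunable search space, so it could be merged into the compile-time definitions for each configuration that is tried.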