Update to latest WMMA kernel
Update WMMA kernel. Add a switch for the WMMA_K parameter to support 16-bit and 32-bit floats in GEMM. Improve device<->shared memory copying.