Consider using KernelFloat

To make conversions to and from reduced-precision types more flexible, we could consider using the KernelFloat library. It also supports vector types and complex arithmetic. From their README:

In a nutshell, Kernel Float offers the following features:

  • Single type vec<T, N> that unifies all vector types.
  • Operator overloading to simplify programming.
  • Support for half (16 bit) floating-point arithmetic, with a fallback to single precision for unsupported operations.
  • Support for quarter (8 bit) floating-point types.
  • Easy integration as a single header file.
  • Written for C++17.
  • Compatible with NVCC (NVIDIA Compiler) and NVRTC (NVIDIA Runtime Compilation).
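
To make the first two bullets concrete, here is a minimal host-side sketch in plain C++17 of what a single `vec<T, N>` type with operator overloading and explicit conversions buys us. It is a toy stand-in, not Kernel Float's implementation or API: the `cast_to` and `add_in_double` helpers are hypothetical names invented for illustration, and `float`/`double` stand in for a reduced-precision/full-precision pair so that the example compiles without CUDA.

```cpp
#include <array>
#include <cstddef>

// Toy stand-in, NOT Kernel Float itself: one fixed-size vector template
// that works for any element type T and length N.
template <typename T, std::size_t N>
struct vec {
    std::array<T, N> data{};

    T& operator[](std::size_t i) { return data[i]; }
    const T& operator[](std::size_t i) const { return data[i]; }
};

// Element-wise addition of two vectors with the same element type and length.
template <typename T, std::size_t N>
vec<T, N> operator+(const vec<T, N>& a, const vec<T, N>& b) {
    vec<T, N> out;
    for (std::size_t i = 0; i < N; ++i) out[i] = a[i] + b[i];
    return out;
}

// Hypothetical helper: convert every element of v to type R.
template <typename R, typename T, std::size_t N>
vec<R, N> cast_to(const vec<T, N>& v) {
    vec<R, N> out;
    for (std::size_t i = 0; i < N; ++i) out[i] = static_cast<R>(v[i]);
    return out;
}

// Hypothetical helper: widen the narrow inputs, add in the wide type,
// then narrow the result again. Here float plays the role of the
// reduced-precision type and double the role of the compute type.
template <std::size_t N>
vec<float, N> add_in_double(const vec<float, N>& a, const vec<float, N>& b) {
    return cast_to<float>(cast_to<double>(a) + cast_to<double>(b));
}
```

The point of the sketch is that the storage type and the compute type are chosen at the call site through template arguments, so switching precision would not require rewriting the arithmetic. Whether Kernel Float's actual interface fits our kernels this cleanly would need to be checked against its documentation.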