Consider using KernelFloat
To be flexible in conversions to and from reduced-precision types, we could consider using the KernelFloat library. It also supports vector types and complex arithmetic. From their README:
In a nutshell, Kernel Float offers the following features:
- Single type vec<T, N> that unifies all vector types.
- Operator overloading to simplify programming.
- Support for half (16 bit) floating-point arithmetic, with a fallback to single precision for unsupported operations.
- Support for quarter (8 bit) floating-point types.
- Easy integration as a single header file.
- Written for C++17.
- Compatible with NVCC (NVIDIA Compiler) and NVRTC (NVIDIA Runtime Compilation).
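
To make the idea concrete, here is a minimal host-side C++17 sketch of what a unified `vec<T, N>` with operator overloading and explicit conversions could look like. This is a hypothetical stand-in written for this discussion, not KernelFloat's implementation or API: the names `vec`, `cast_to`, and `add_in_high_precision` are assumptions, and `float`/`double` are used in place of half/quarter types so the sketch compiles without CUDA.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a unified vector type (NOT KernelFloat's vec).
template <typename T, std::size_t N>
struct vec {
    std::array<T, N> data{};

    T& operator[](std::size_t i) {
        assert(i < N);  // bounds check in debug builds
        return data[i];
    }
    const T& operator[](std::size_t i) const {
        assert(i < N);
        return data[i];
    }
};

// Element-wise addition via operator overloading.
template <typename T, std::size_t N>
vec<T, N> operator+(const vec<T, N>& a, const vec<T, N>& b) {
    vec<T, N> out;
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Explicit element-wise conversion between element types (hypothetical helper).
template <typename To, typename From, std::size_t N>
vec<To, N> cast_to(const vec<From, N>& v) {
    vec<To, N> out;
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = static_cast<To>(v[i]);
    }
    return out;
}

// Usage sketch: store in the narrower type, compute in the wider one.
inline vec<float, 2> add_in_high_precision(const vec<float, 2>& a,
                                           const vec<float, 2>& b) {
    return cast_to<float>(cast_to<double>(a) + cast_to<double>(b));
}
```

The point of the sketch is the shape of the interface: one templated type covers every element type and width, so conversions become a single generic call rather than per-type intrinsics. Whether KernelFloat's actual conversion functions fit our kernels would need to be checked against its documentation.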