The find_package(CUDAToolkit) call should always be executed, not just when NVML is built.
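A minimal sketch of what this could look like, assuming the NVML build is gated by an option. The option name `ENABLE_NVML`, the target `myapp`, and the source file `main.cpp` are hypothetical and not taken from the project. `find_package(CUDAToolkit)`, `CUDAToolkit_FOUND`, and the `CUDA::cudart` / `CUDA::nvml` imported targets come from CMake's FindCUDAToolkit module.

```cmake
cmake_minimum_required(VERSION 3.17)  # FindCUDAToolkit was added in CMake 3.17
project(example LANGUAGES CXX)

# Hypothetical option name; the real project may use a different switch.
option(ENABLE_NVML "Build with NVML support" ON)

# Called unconditionally, so CUDAToolkit_FOUND and the CUDA:: imported
# targets are available to the whole build, not only to the NVML branch.
find_package(CUDAToolkit)

add_executable(myapp main.cpp)  # hypothetical target and source file

# Anything that needs the CUDA runtime can now check the result directly.
if(CUDAToolkit_FOUND)
  target_link_libraries(myapp PRIVATE CUDA::cudart)
endif()

# Only the NVML-specific linkage stays behind the option.
if(ENABLE_NVML AND TARGET CUDA::nvml)
  target_link_libraries(myapp PRIVATE CUDA::nvml)
endif()
```

Because `find_package(CUDAToolkit)` is called without `REQUIRED`, configuration still succeeds on machines without the toolkit; the `if()` guards simply skip the CUDA-specific parts.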