Add alternative compilation method using NVRTC

This is based on code cherry-picked from cob-148 and then refactored
substantially. A test case is added to t_gpu_util that compares the
patched NVCC compilation method with the new NVRTC compilation method.
NVCC is still used by default. The PTX generated by the two methods
differs, which may need further investigation.