Support l,m shift in CUDA kernels.
