Improve performance of CUDA gridder and degridder with many aterms

Bram Veenboer requested to merge cuda-aterms into master

The CUDA gridder and degridder kernels have been optimized for the case where multiple aterms need to be applied to a single subgrid. In addition, the mechanism for selecting batch size and block size parameters in InstanceCUDA is outdated. Since the new kernels have been tested on an RTX A4000 (ga102), it does not make much sense to keep parameters for other GPUs. If these changes cause performance issues on older architectures, those will need to be addressed separately.

Performance was measured using a synthetic benchmark that is kept separate from this repository:

throughput-gridder.pdf throughput-degridder.pdf
