
Optimize correlator

Jan David Mol requested to merge optimize-correlator into cobalt2.1

The correlator code, originally added to the LOFAR (now Cobalt) codebase over a decade ago, has remained largely unchanged aside from minor maintenance. Since then, the NVIDIA compiler has significantly improved its ability to optimise code automatically, reducing the need for manual optimisations, such as loop unrolling.

This MR introduces a substantial cleanup and simplification of the kernel code. Key changes include:

  • Introducing new helper functions:
    • load_samples
    • do_correlate
    • compute_do_baseline
  • Replacing the separate correlate_1x1 through correlate_4x4 functions with a single templated correlate_nxn function.
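The kernel source is not shown here, so the following is only a CPU-side C++ model of the structure described in the list above, not the CUDA kernel itself. The helper names come from the list. Their signatures, the samples[station][time] layout (one channel, one polarisation), and the rule in compute_do_baseline (keep each baseline once and drop stations past the end of the array) are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Sample = std::complex<float>;
// Assumed layout: samples[station][time], one channel and one polarisation.
using Samples = std::vector<std::vector<Sample>>;
// Assumed layout: vis[station1][station2], only the lower triangle is filled.
using Visibilities = std::vector<std::vector<Sample>>;

// Hypothetical load_samples: fetch one time step for N consecutive stations,
// zero-filling stations past the end so partial tiles need no special case.
template <std::size_t N>
std::array<Sample, N> load_samples(const Samples& samples, std::size_t first, std::size_t t) {
  std::array<Sample, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t s = first + i;
    if (s < samples.size()) {
      assert(t < samples[s].size());
      out[i] = samples[s][t];
    }
  }
  return out;
}

// Hypothetical do_correlate: accumulate x * conj(y) for one baseline.
inline void do_correlate(Sample& sum, Sample x, Sample y) { sum += x * std::conj(y); }

// Hypothetical compute_do_baseline: keep each baseline once (s2 <= s1) and
// skip stations that only exist because the tile overhangs the array.
inline bool compute_do_baseline(std::size_t s1, std::size_t s2, std::size_t nr_stations) {
  return s1 < nr_stations && s2 <= s1;
}

// One template in place of correlate_1x1 ... correlate_4x4: correlate a tile
// of N stations starting at stat_a with N stations starting at stat_b.
// Because N is a compile-time constant, the compiler is free to unroll the
// inner loops instead of relying on hand-unrolled variants.
template <std::size_t N>
void correlate_nxn(const Samples& samples, std::size_t stat_a, std::size_t stat_b,
                   std::size_t nr_times, Visibilities& vis) {
  std::array<std::array<Sample, N>, N> sums{};
  for (std::size_t t = 0; t < nr_times; ++t) {
    const auto a = load_samples<N>(samples, stat_a, t);
    const auto b = load_samples<N>(samples, stat_b, t);
    for (std::size_t i = 0; i < N; ++i)
      for (std::size_t j = 0; j < N; ++j)
        do_correlate(sums[i][j], a[i], b[j]);
  }
  // Write back only the baselines that are actually wanted.
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      if (compute_do_baseline(stat_a + i, stat_b + j, samples.size()))
        vis[stat_a + i][stat_b + j] = sums[i][j];
}
```

For example, calling correlate_nxn<2> on the tiles (0,0), (2,0) and (2,2) of a three-station array fills the whole lower triangle of vis: the non-existent station 3 in the last two tiles is zero-filled by load_samples and masked out by compute_do_baseline.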

As shown in the figures below, the performance on an NVIDIA Tesla V100 remains virtually unchanged between the original (‘reference’) and updated kernel implementations.

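[Figures: performance of the original (‘reference’) and updated correlator kernels on an NVIDIA Tesla V100]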

Previously, the tCorrelatorPerformance test only ran when a Tesla K10 GPU was detected; it now runs when a Tesla V100 is detected instead. The runtimes measured with the new correlator kernel were entered as the reference runtimes. Applying only this test change (i.e. updating tCorrelatorPerformance while keeping the existing correlator kernel) makes the test fail in the following cases, revealing notable speedups of the optimised kernel:

Failure in 48_Stations_250ms_16ch: Expected 2.5 +/- 0.5 but was 5.37528
Failure in 80_Stations_250ms_16ch: Expected 3.6 +/- 0.5 but was 10.1913

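In other words, for these cases the new version is roughly 2 to 3 times faster: about 2.2x for 48 stations (5.37528 vs 2.5) and about 2.8x for 80 stations (10.1913 vs 3.6).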
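The failure lines above have the shape of a simple tolerance check against a reference runtime. How tCorrelatorPerformance actually implements that check is not shown in this MR; the C++ sketch below is a hypothetical reconstruction (the check_runtime function and its signature are assumptions) that only reproduces the message format.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical tolerance check modelled on the failure messages: a run passes
// if its measured runtime lies within expected +/- tolerance. Returns an empty
// string on success, otherwise a message in the same format as the test output.
std::string check_runtime(const std::string& name, double expected,
                          double tolerance, double measured) {
  assert(tolerance >= 0.0);
  if (std::fabs(measured - expected) <= tolerance) return "";
  std::ostringstream msg;
  msg << "Failure in " << name << ": Expected " << expected << " +/- "
      << tolerance << " but was " << measured;
  return msg.str();
}
```

With the 48-station numbers, |5.37528 - 2.5| = 2.875 exceeds the 0.5 tolerance, so the check fails. Because the band in this sketch is two-sided, a run much faster than the reference would be flagged as well.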

Edited by Bram Veenboer
