
Add SIMD-based optimizations and refactor PredictPlanExecCPU

This merge request introduces significant performance improvements: a reported speedup of up to 2× over the main branch when computing cross-correlations involving point and Gaussian sources. Compared with the reference Predict implementation, a speedup of at least 2.5× is observed.

Key changes include:

  • Optimized memory layout: The real and imaginary parts of complex values are now stored in separate contiguous arrays rather than in std::complex's interleaved layout. This improves opportunities for vectorization.
  • Code refactoring: The cross-correlation code has been rewritten with three implementations:
    • A single-threaded (original) version
    • A version using manual SIMD
    • An implementation using xsimd
  • Profiler support: Added integration with the Tracy profiler for performance analysis.
  • Renamed execution class: PredictPlanExec has been renamed to PredictPlanExecCPU to pave the way for future GPU implementations (e.g., PredictPlanExecGPU).
  • Extended test cases
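To illustrate why the split layout helps vectorization, here is a minimal sketch contrasting an interleaved std::complex multiply-accumulate with the same operation on separate real and imaginary arrays. The names SplitComplexVec, to_split, mac_interleaved and mac_split are hypothetical and not taken from this merge request; the element-wise acc += a * b kernel merely stands in for the kind of complex arithmetic the cross-correlation code performs.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Interleaved layout used by std::complex: re0, im0, re1, im1, ...
using InterleavedVec = std::vector<std::complex<float>>;

// Hypothetical split layout: all real parts in one contiguous array,
// all imaginary parts in another.
struct SplitComplexVec {
  std::vector<float> re;
  std::vector<float> im;

  explicit SplitComplexVec(std::size_t n) : re(n, 0.0f), im(n, 0.0f) {}
  std::size_t size() const { return re.size(); }
};

// Convert from the interleaved layout to the split layout.
SplitComplexVec to_split(const InterleavedVec& v) {
  SplitComplexVec out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out.re[i] = v[i].real();
    out.im[i] = v[i].imag();
  }
  return out;
}

// acc[i] += a[i] * b[i] using std::complex arithmetic on interleaved data.
void mac_interleaved(InterleavedVec& acc, const InterleavedVec& a,
                     const InterleavedVec& b) {
  for (std::size_t i = 0; i < acc.size(); ++i) {
    acc[i] += a[i] * b[i];
  }
}

// The same complex multiply-accumulate on split data. Every load and store
// is a unit-stride access to a float array, so a vectorizing compiler (or
// hand-written SIMD) can process several consecutive elements per
// instruction without first separating real and imaginary parts.
void mac_split(SplitComplexVec& acc, const SplitComplexVec& a,
               const SplitComplexVec& b) {
  const std::size_t n = acc.size();
  for (std::size_t i = 0; i < n; ++i) {
    const float ar = a.re[i], ai = a.im[i];
    const float br = b.re[i], bi = b.im[i];
    acc.re[i] += ar * br - ai * bi;
    acc.im[i] += ar * bi + ai * br;
  }
}
```

Both functions compute the same values; the difference is only in how the data sits in memory, which is what lets the manual SIMD and xsimd variants operate on full vector registers of real or imaginary parts.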
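For readers unfamiliar with Tracy, a minimal sketch of how zones are typically placed in C++ code follows. ZoneScopedN and FrameMark are Tracy macros, and recent Tracy releases ship the client header as tracy/Tracy.hpp; the TRACY_ENABLE guard with empty fallback macros and the sum_squares function are illustrative assumptions, not necessarily how this merge request wires up the profiler.

```cpp
#include <cstddef>
#include <vector>

// Tracy's client header already expands its macros to nothing when
// TRACY_ENABLE is not defined. The fallback branch below is an assumption
// that additionally lets the code build where Tracy is not installed.
#ifdef TRACY_ENABLE
#include <tracy/Tracy.hpp>
#else
#define ZoneScoped
#define ZoneScopedN(name)
#define FrameMark
#endif

// Hypothetical hot function: the named zone records each call's duration
// in the Tracy timeline when profiling is enabled.
float sum_squares(const std::vector<float>& x) {
  ZoneScopedN("sum_squares");
  float s = 0.0f;
  for (float v : x) {
    s += v * v;
  }
  return s;
}
```

With profiling disabled the zone macros compile away, so the instrumentation can stay in the code paths being benchmarked.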

Fixes #24.

Performance Comparison

Old Performance (main branch, commit 5ce36ed2)

[Figure: Old Performance]

New Performance

[Figure: New Performance]

Edited by Wiebe van Breukelen
