Add SIMD-based optimizations and refactor PredictPlanExecCPU
This merge request introduces significant performance improvements: up to a reported 2× speedup, compared to the main branch, for computing cross-correlations involving point and Gaussian sources. Compared to the reference Predict implementation, a speedup of at least 2.5× is observed.
Key changes include:
- Optimized memory layout: The real and imaginary parts of complex values are now each stored in contiguous memory rather than using `std::complex`'s interleaved layout. This change improves opportunities for vectorization.
- Code refactoring: The cross-correlation code has been rewritten with three implementations:
  - A single-threaded (original) version
  - A version using manual SIMD
  - An implementation using `xsimd`
- Profiler support: Added integration with the Tracy profiler for performance analysis.
- Renamed execution class: `PredictPlanExec` has been renamed to `PredictPlanExecCPU` to pave the way for future GPU implementations (e.g., `PredictPlanExecGPU`).
- Extended test cases
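
To illustrate the memory layout change, here is a minimal sketch of a split (non-interleaved) complex buffer and a conjugate multiply-accumulate loop over it. The names `SplitComplexBuffer`, `ToSplit`, and `MultiplyConjAccumulate` are hypothetical and do not come from this merge request, and the kernel is a simplified stand-in, not the actual cross-correlation code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical split-layout buffer: all real parts in one contiguous array,
// all imaginary parts in another (illustrative names, not from the MR).
struct SplitComplexBuffer {
  std::vector<float> re;
  std::vector<float> im;

  explicit SplitComplexBuffer(std::size_t n) : re(n, 0.0f), im(n, 0.0f) {}
  std::size_t size() const { return re.size(); }
};

// Convert from the interleaved std::complex layout (re, im, re, im, ...)
// to the split layout.
SplitComplexBuffer ToSplit(const std::vector<std::complex<float>>& in) {
  SplitComplexBuffer out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out.re[i] = in[i].real();
    out.im[i] = in[i].imag();
  }
  return out;
}

// Accumulate a[i] * conj(b[i]) into acc[i], element-wise.
// Every access is a unit-stride load or store on a plain float array, so no
// shuffling is needed to separate real and imaginary parts before the math.
void MultiplyConjAccumulate(const SplitComplexBuffer& a,
                            const SplitComplexBuffer& b,
                            SplitComplexBuffer& acc) {
  assert(a.size() == b.size() && a.size() == acc.size());
  const std::size_t n = a.size();
  for (std::size_t i = 0; i < n; ++i) {
    // (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    acc.re[i] += a.re[i] * b.re[i] + a.im[i] * b.im[i];
    acc.im[i] += a.im[i] * b.re[i] - a.re[i] * b.im[i];
  }
}
```

With the interleaved layout, a vectorized version of the same loop would first have to de-interleave each loaded register into real and imaginary lanes. The split layout avoids that step, which is one plausible reason it vectorizes more readily.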
Fixes #24 (closed).
Performance Comparison
Old Performance (current main branch commit 5ce36ed2) | New Performance
Edited by Wiebe van Breukelen