Pre-allocate memory for transposed input

Bram Veenboer requested to merge optimize-allocation into main

The BeamFormerCCGKernel now allocates the cu::DeviceMemory for the transposed input once, in the constructor, instead of allocating and freeing it asynchronously on demand. The asynchronous scheme turned out to be surprisingly slow, so this change improves overall throughput at the cost of keeping that device memory allocated for the lifetime of the kernel object.
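To illustrate the trade-off, here is a minimal sketch that runs without a GPU. It is not the actual BeamFormerCCGKernel code: `DeviceBuffer` is a hypothetical stand-in for cu::DeviceMemory that only counts allocations and frees, and `OnDemandKernel`, `PreallocatedKernel`, `AllocationStats`, and `g_stats` are names invented for this example. The on-demand variant is also synchronous here, so it shows the allocation pattern only, not the asynchronous API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping, used only to make the allocation pattern visible.
struct AllocationStats {
  int allocations = 0;
  int frees = 0;
};
AllocationStats g_stats;

// Hypothetical stand-in for cu::DeviceMemory: backed by host memory,
// counts how often a buffer is created and destroyed.
class DeviceBuffer {
public:
  explicit DeviceBuffer(std::size_t bytes) : storage_(bytes) {
    assert(bytes > 0);
    ++g_stats.allocations;
  }
  ~DeviceBuffer() { ++g_stats.frees; }
  DeviceBuffer(const DeviceBuffer &) = delete;
  DeviceBuffer &operator=(const DeviceBuffer &) = delete;

  std::size_t size() const { return storage_.size(); }

private:
  std::vector<unsigned char> storage_;
};

// Old pattern (simplified): a scratch buffer is allocated and freed per run.
class OnDemandKernel {
public:
  explicit OnDemandKernel(std::size_t bytes) : bytes_(bytes) {}

  void run() {
    DeviceBuffer transposed(bytes_); // allocated here, freed at end of scope
    // ... transpose input into `transposed`, then beamform ...
  }

private:
  std::size_t bytes_;
};

// New pattern: the buffer is allocated once in the constructor and reused.
class PreallocatedKernel {
public:
  explicit PreallocatedKernel(std::size_t bytes) : transposed_(bytes) {}

  void run() {
    // ... transpose input into `transposed_`, then beamform ...
  }

  // Memory held for the lifetime of the object, whether or not run() is called.
  std::size_t reserved_bytes() const { return transposed_.size(); }

private:
  DeviceBuffer transposed_;
};
```

In this sketch, N calls to `OnDemandKernel::run()` cost N allocations and N frees, while `PreallocatedKernel` performs one allocation regardless of N and releases it only when the object is destroyed. That is the cost named in the description: the memory stays reserved between runs.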
