
Add support for extreme cases in QuantizeOutput kernel

Bram Veenboer requested to merge cob-90-kernel into cob-90

Based on pipeline-buffers.txt, it was assumed that the CoherentStokes and IncoherentStokes kernels never produce more than 12288 samples per visibility. However, this does not hold in 'extreme cases' (according to Sarod and Cees). The updated kernel keeps the same shared memory buffer (sized for at most 12288 samples) and iterates over the input in batches when the input is larger than the batch size. The 'two-pass' implementation is removed in the process.
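The batching idea can be illustrated with a small host-side C++ model. This is only a sketch, not the kernel code from this merge request: the names `kBatchSize`, `numBatches`, `Range` and `rangeInBatches` are hypothetical, the `std::array` merely stands in for the shared memory buffer, and computing a min/max range is an assumed example of a statistic gathered per batch.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical batch size, taken from the 12288-sample figure above.
constexpr std::size_t kBatchSize = 12288;

// Number of batches needed to cover nrSamples with a fixed-size buffer.
constexpr std::size_t numBatches(std::size_t nrSamples) {
    return (nrSamples + kBatchSize - 1) / kBatchSize;
}

struct Range {
    float min;
    float max;
};

// Stage the input through a fixed-size buffer, one batch at a time, and
// accumulate the overall value range. The buffer never grows, regardless
// of how many samples the input holds.
Range rangeInBatches(const std::vector<float>& input) {
    std::array<float, kBatchSize> buffer;  // stands in for shared memory
    Range r{std::numeric_limits<float>::infinity(),
            -std::numeric_limits<float>::infinity()};
    for (std::size_t begin = 0; begin < input.size(); begin += kBatchSize) {
        // The last batch may be shorter than kBatchSize.
        const std::size_t count = std::min(kBatchSize, input.size() - begin);
        std::copy_n(input.begin() + static_cast<std::ptrdiff_t>(begin), count,
                    buffer.begin());
        const auto mm =
            std::minmax_element(buffer.begin(), buffer.begin() + count);
        r.min = std::min(r.min, *mm.first);
        r.max = std::max(r.max, *mm.second);
    }
    return r;
}
```

In this model an input of 30000 samples is covered in three batches, and an input of at most 12288 samples still takes a single pass through the buffer.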

Furthermore, with a large number of channels (e.g. 512) and all four Stokes parameters, the number of supported TABs was limited to 32, because CUDA does not support grids larger than 65536 blocks (512 channels × 4 Stokes × 32 TABs = 65536). As a workaround, thread blocks are now assigned to Stokes parameters and TABs only (not to channels), and each block iterates over all channels.
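The effect of the new block assignment on the grid size can be checked with a few lines of C++. Again this is a host-side sketch with hypothetical names (`blocksWithChannelMapping`, `blocksWithTabMapping`, `workItemsWithTabMapping`), assuming one block per (channel, Stokes, TAB) combination in the old scheme and one block per (Stokes, TAB) pair in the new one.

```cpp
#include <cstddef>

// Old scheme (assumed): one thread block per (channel, Stokes, TAB).
constexpr std::size_t blocksWithChannelMapping(std::size_t nrChannels,
                                               std::size_t nrStokes,
                                               std::size_t nrTABs) {
    return nrChannels * nrStokes * nrTABs;
}

// New scheme: one thread block per (Stokes, TAB); channels are not
// part of the grid.
constexpr std::size_t blocksWithTabMapping(std::size_t nrStokes,
                                           std::size_t nrTABs) {
    return nrStokes * nrTABs;
}

// Simulate the new scheme: every block loops over all channels, so the
// same number of (channel, Stokes, TAB) items is visited as before.
std::size_t workItemsWithTabMapping(std::size_t nrChannels,
                                    std::size_t nrStokes,
                                    std::size_t nrTABs) {
    std::size_t visited = 0;
    for (std::size_t tab = 0; tab < nrTABs; ++tab) {
        for (std::size_t stokes = 0; stokes < nrStokes; ++stokes) {
            // Body of one "block": iterate over the channels.
            for (std::size_t channel = 0; channel < nrChannels; ++channel) {
                ++visited;
            }
        }
    }
    return visited;
}
```

For 512 channels, four Stokes parameters and 32 TABs, the old scheme needs 65536 blocks while the new one needs 128, with each block covering 512 channels.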

Edited by Bram Veenboer
