Improve performance consistency for the different quantization use cases

Bram Veenboer requested to merge cob-90-kernel into cob-90

Now that tQuantizeOutput tests various 'extreme cases', it has become apparent that some cases take much longer to quantize than others. These changes fix that by moving to a different thread-block mapping and by further fine-tuning the launch configuration.
