Update reduction
The changes related to parallel reductions were originally part of a larger merge request. They have now been split out into this new merge request.
The main changes are:
- Move reduction code out of `QuantizeOutput.cu` into `reduction.cuh`
- Make the `reduce_sum` function more robust (e.g. correct regardless of block size)
- Add `tReduction` to test the functionality provided in `reduction.cuh`
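
To illustrate what "correct regardless of block size" can mean, here is a minimal sketch of a block-wide sum that does not assume the block size is a power of two or a multiple of the warp size. It is not the `reduce_sum` from `reduction.cuh`; the names `block_reduce_sum`, `sum_kernel` and `sum_with_block_size` are hypothetical, and the sketch assumes a 1D thread block with one `float` of shared memory per thread.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sketch, not the code in reduction.cuh.
// Sums one value per thread over a 1D block of arbitrary size.
// `scratch` must point to shared memory holding blockDim.x floats.
// Every thread of the block must call this function.
__device__ float block_reduce_sum(float value, float* scratch) {
  const unsigned tid = threadIdx.x;
  scratch[tid] = value;
  __syncthreads();

  // Fold the upper part of the active range onto the lower part.
  // Rounding `half` up keeps the middle element when n is odd,
  // so block sizes such as 7 or 100 are handled correctly.
  for (unsigned n = blockDim.x; n > 1;) {
    const unsigned half = (n + 1) / 2;
    if (tid + half < n) {
      scratch[tid] += scratch[tid + half];
    }
    __syncthreads();
    n = half;
  }

  const float total = scratch[0];
  __syncthreads();  // allow scratch to be reused by a later call
  return total;
}

// Single-block kernel: each thread accumulates a strided partial sum,
// then the partial sums are combined with block_reduce_sum.
__global__ void sum_kernel(const float* in, float* out, int count) {
  extern __shared__ float scratch[];
  float partial = 0.0f;
  for (int i = threadIdx.x; i < count; i += blockDim.x) {
    partial += in[i];
  }
  const float total = block_reduce_sum(partial, scratch);
  if (threadIdx.x == 0) {
    *out = total;
  }
}

// Host helper: sums `host` on the device using one block of `block_size`
// threads. Error checking is omitted to keep the sketch short.
float sum_with_block_size(const std::vector<float>& host, unsigned block_size) {
  float* d_in = nullptr;
  float* d_out = nullptr;
  const size_t bytes = host.size() * sizeof(float);
  cudaMalloc(&d_in, bytes);
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

  sum_kernel<<<1, block_size, block_size * sizeof(float)>>>(
      d_in, d_out, static_cast<int>(host.size()));

  float result = 0.0f;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

A test in the spirit of `tReduction` could call `sum_with_block_size` on the same input with several block sizes (for example 1, 7, 32, 100 and 1024) and check that each call returns the same sum. The shared-memory tree is chosen here for clarity; a warp-shuffle based reduction would typically be faster but needs extra care for a partially filled last warp.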
Edited by Bram Veenboer