Merge/cleanup/fix CUDA W-Tiling code
The W-Tiling code of GenericOptimized is moved to common/CUDA, and Generic now uses the same code. The Unified Memory submode is added to this shared set of W-Tiling routines. All submodes are tested and, where needed, fixed. Some documentation is added to the constructors of Generic and GenericOptimized to explain the capabilities of the proxies.
Edited by Bram Veenboer