Update CPU W-Tiling
Major update to CPU W-Tiling code:
- Combine the forward and backward code in a single source file
- Apply various optimizations
- Split functions into smaller sub-functions
- Expose the `wtiles_to_grid` and `wtiles_from_grid` kernels in `InstanceCPU`
These changes make the code both faster and more readable. Furthermore, they allow for a better comparison against the CUDA W-Tiling code: the two implementations now look more similar and share a number of optimizations.
Edited by Bram Veenboer