Remove limit on number of aterms in CUDA average beam kernel
Add an extra loop over blocks of size MAX_NR_ATERMS in the CUDA average beam kernel.
Also fix an error in the accumulation of the average beam.
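
For reviewers, the blocking idea can be sketched on the host side as below. This is a minimal illustration only, not the kernel code: `sum_in_blocks`, the local `buffer`, and the value chosen for `MAX_NR_ATERMS` are hypothetical stand-ins, and the sketch does not reproduce the accumulation fix itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical capacity of the fixed-size staging buffer; the real value
// used by the kernel is not stated in this description.
constexpr std::size_t MAX_NR_ATERMS = 4;

// Sums all entries of `values` while staging at most MAX_NR_ATERMS entries
// at a time, so the input length is no longer bounded by the buffer size.
double sum_in_blocks(const std::vector<double>& values) {
    double total = 0.0;
    // Outer loop: walk the input in blocks of at most MAX_NR_ATERMS entries.
    for (std::size_t offset = 0; offset < values.size(); offset += MAX_NR_ATERMS) {
        // The last block may be shorter than MAX_NR_ATERMS.
        const std::size_t count = std::min(MAX_NR_ATERMS, values.size() - offset);
        double buffer[MAX_NR_ATERMS];
        std::copy_n(values.begin() + offset, count, buffer);
        // Inner loop: accumulate the staged entries of the current block.
        for (std::size_t i = 0; i < count; ++i) {
            total += buffer[i];
        }
    }
    return total;
}
```

The outer loop is what removes the limit: only the per-block staging buffer has a fixed size, while the number of entries processed is arbitrary.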