Minor performance fix for CoherentStokesKernel
The output does not need to be initialized to zero when the time integration is exactly equal to the number of threads in a warp.