Compare revisions

Sarod Yatawatta · Sarod Yatawatta · Sarod Yatawatta · Sarod Yatawatta · Sarod Yatawatta · Sarod Yatawatta
--- a/RTCP/Cobalt/CoInterface/src/Parset.cc
+++ b/RTCP/Cobalt/CoInterface/src/Parset.cc
@@ -1321,6 +1321,8 @@ namespace LOFAR
      // The correlator.enabled setting is used as default value for sap.correlatorEnabled
      // and thus has thus be set before addSAPs is called.
      settings.correlator.enabled = getBool("Observation.DataProducts.Output_Correlated.enabled", false);
+      // Doppler correction in correlator 
+      settings.correlator.dopplerCorrection= getBool("Cobalt.Correlator.dopplerCorrection", false);

      // Pointing information
      addSAPs(settings);

--- a/RTCP/Cobalt/CoInterface/src/Parset.h
+++ b/RTCP/Cobalt/CoInterface/src/Parset.h
@@ -508,6 +508,11 @@ namespace LOFAR
        // set to: subbandWidth() / nrChannels
        double channelWidth;

+        // Doppler correction 
+        //
+        // key: Cobalt.Correlator.dopplerCorrection, default false
+        bool dopplerCorrection;
+
        // The number of samples in one block of one channel.
        //
        // key: OLAP.CNProc.integrationSteps

--- a/RTCP/Cobalt/CoInterface/test/tParset.cc
+++ b/RTCP/Cobalt/CoInterface/test/tParset.cc
@@ -963,6 +963,20 @@ SUITE(correlator) {
    }
  }

+  TEST(dopplerCorrection) {
+    LOG_INFO("Test correlator dopplerCorrection");
+
+    Parset ps = makeDefaultTestParset();
+    
+    // set
+    ps.replace("Observation.DataProducts.Output_Correlated.enabled", "true");
+    ps.replace("Cobalt.Correlator.dopplerCorrection", "true");
+    ps.updateSettings();
+
+    // verify settings
+    CHECK_EQUAL(true, ps.settings.correlator.dopplerCorrection);
+  }
+
  TEST(nrSamplesPerChannel) {
    LOG_INFO("Test correlator nrSamplesPerChannel");


--- a/RTCP/Cobalt/GPUProc/doc/doppler-correction/Doppler.md
+++ b/RTCP/Cobalt/GPUProc/doc/doppler-correction/Doppler.md
+# Doppler correction
+This is a brief description of how Doppler correction is implemented in the FIR_Filter and subsequent DelayAndBandpass correction.
+
+
+Let the intrinsic sky signal be *g(t)* with its Fourier transform *G(f)*. Due to the Doppler shift, the measured signal will have a frequency shift *f_0* and in order to get back the orignal signal, we shift the measured signal Fourier transform as *G(f-f_0)*, which is done in continuous time by multiplication of *g(t)* with *exp(-j 2 pi f_0 t)*.
+
+<img src="phaseramp.png" width="900"/>
+
+The signal streaming to FIR_Filter is a discrete time signal, about 190k samples per one block of about 1 second duration (see the figure). The sampling frequency is *f_s* = clock frequency/1024.
+The delays at the start of the block of data and after the end of the block are *tau0* and *tau1*, respectively.
+
+Let *t* denote time within the block and *T* be the duration of the block, so *t* in *[0,T]*.
+The linear delay for the sample at time *t* is *tau=tau1 (t/T) + tau0(1-t/T)*. The corresponding discrete time exponential is *exp(-j 2 pi f tau)*. Discarding the constant terms (that do not vary with time), the exponential becomes  *exp(-j 2 pi f (tau1-tau0)/T t)*. The subband frequency is *f*.
+
+As seen in the figure, each block is divided to many FFT blocks, so within each block, we use the same delay gradient *(tau0-tau0)/T* to calculate the phase ramp. In other words, the constant part of the delay ramp is not used as it will only increase decorelation, not affecting the needed shift in frequency.
+
+To summarize, the exact term used at channel *chan* is *j 2 pi (f/f_s) (tau1-tau0)/NR_FFT_BLOCKS chan/NR_CHANNELS*. 
+
+Due to this correction, the shift in channels is *(tau1-tau0)/NR_FFT_BLOCKS (f/f_s)*. The corresponding shift in frequency is *(tau1-tau0)/NR_FFT_BLOCKS (f/NR_CHANNELS)*.
+
+The Doppler correction is not affected by the rotation of the dipoles, so both polarizations get the same correction (unless the delays for each polarization is different).
+The bandpass correction is modified by linear interpolation of the bandpass correction weights with the appropriate channel shift.
\ No newline at end of file
--- a/RTCP/Cobalt/GPUProc/doc/doppler-correction/phaseramp.png
+++ b/RTCP/Cobalt/GPUProc/doc/doppler-correction/phaseramp.png
--- a/RTCP/Cobalt/GPUProc/doc/quantization/Quantization.md
+++ b/RTCP/Cobalt/GPUProc/doc/quantization/Quantization.md
+This document describes the inner workings of the CUDA kernel used in quantizing the 32 bit input data into 8 bit data.
+The Parset keys used in quantization of beamformed data are
+
+For coherent Stokes:
+
+* `Cobalt.BeamFormer.CoherentStokes.quantize=false`
+* `Cobalt.BeamFormer.CoherentStokes.quantizeBits=8`
+* `Cobalt.BeamFormer.CoherentStokes.quantizeScaleMax=5`
+* `Cobalt.BeamFormer.CoherentStokes.quantizeScaleMin=-5`
+* `Cobalt.BeamFormer.CoherentStokes.quantizeIpositive=false`
+
+For incoherent Stokes:
+
+* `Cobalt.BeamFormer.IncoherentStokes.quantize=false`
+* `Cobalt.BeamFormer.IncoherentStokes.quantizeBits=8`
+* `Cobalt.BeamFormer.IncoherentStokes.quantizeScaleMax=5`
+* `Cobalt.BeamFormer.IncoherentStokes.quantizeScaleMin=-5`
+* `Cobalt.BeamFormer.IncoherentStokes.quantizeIpositive=false`
+
+The values for each key shown above are the default values (if that particular key is not defined in the parset). The description of the keys:
+
+* `.quantize=true|false`: If true, the output will be quantized (instead of using 32 bit float as output datatype, a reduced number of bits will be used).
+* `.quantizeBits=8`: Currently, 8 bits will be used to store each quantized data point. This implies the output data type will be signed char (int8_t) or unsigned char (uint8_t). In addtion, scale and offset of each data block will also be produced as 32 bit float values.
+* `.quantizeScaleMax` and .`quantizeScaleMin`: These two keys will be used to cut off extreme data points above or below a threshold, prior to quantization.
+* `.quantizeIpositive=true|false`: If `quantizeScaleMin` is negative, the usable range for Stokes I will also include some values below zero. However, by definition, Stokes I is always positive. By setting this key to true, we can override this to only consider the positive range of Stokes I for quantization. In this way, the available number of bits in the quantizer are not wasted by representing values that do not exist in the orignal data.
+
+Let us call the values defined by `quantizeScaleMax` and `quantizeScaleMin` as `Smax` and `Smin`, respectively. 
+
+<img src="quantization.png" alt="How quantization works" width="700"/>
+
+We can explain the workings of the quantization kernel by looking at the probability density function (PDF) from the input 32 bit data to output 8 bit data as shown in the above figure. 
+
+  * The input PDF will have data in the range -inf...inf with one exception, for Stokes I, it will be 0...inf. (The notation inf represents infinity). This is shown on the top plot.
+  * Using the user defined *Smax* and *Smin*. the range of the data to be quantized is selected. In most cases it is the range between *Smin*&times;&sigma; and *Smax*&times;&sigma;. (&sigma; is the standard deviation). The user can specify the best values for *Smax* and *Smin* depending on how the input data are distributed (i.e., by looking at the PDF). The input data that fall outside this range are cut-off as show in in the middle plot. In addition the mean (&mu;) of the original data will be subtracted in the cases of Stokes Q,U,V. 
+  * Finally, the original range is mapped to values in the range 0 to 255 (for unsigned data, i.e., Stokes I) or in the range -128 to 127 (for signed data) as shown in the bottom plot. The output PDF will be the area-sampled version of the input PDF (after cutoff) as shown in this plot.
+  * A special case is quantization of Stokes I, because the original data have values in 0...inf. If the user-specified *Smin* value is negative, some useful number of quantized levels will be used to represent negative data that does not exist. Therefore, in this case, the user supplied *Smin* value will be over-ridden to use 0 instead. This behaviour is enabled by setting `.quantizeIpositive=true`.
+
+In order to (approximately) recover the original data from the quantized data, the scale and offset that are used to transform the data to the ranges 0 to 255 or -128 to 127  are also produced as output. The scale and offset are determined per each block of data (duration approximately 1 sec) and per each channel and polarization. The number of data samples per each block is dependent on the integration factor and the sampling clock frequency.
--- a/RTCP/Cobalt/GPUProc/doc/quantization/quantization.png
+++ b/RTCP/Cobalt/GPUProc/doc/quantization/quantization.png
--- a/RTCP/Cobalt/GPUProc/share/gpu/kernels/DelayAndBandPass.cu
+++ b/RTCP/Cobalt/GPUProc/share/gpu/kernels/DelayAndBandPass.cu
@@ -45,6 +45,8 @@
 * - @c DELAY_COMPENSATION: defined or not
 * - @c BANDPASS_CORRECTION: defined or not
 * - @c DO_TRANSPOSE: defined or not
+* - @c DOPPLER_CORRECTION: if defined, DELAY_COMPENSATION and CLOCK_MHZ also must be defined
+* - @c CLOCK_MHZ: clock frequency in MHz, normally 200 or 160 
 */

 #include "gpu_math.cuh"
@@ -62,6 +64,15 @@
 #  undef BANDPASS_CORRECTION
 #endif

+#if defined DOPPLER_CORRECTION   
+#ifndef CLOCK_MHZ
+#error DOPPLER_CORRECTION=1 but CLOCK_MHZ not defined
+#endif
+#ifndef DELAY_COMPENSATION
+#error DOPPLER_CORRECTION=1 but DELAY_COMPENSATION not enabled
+#endif
+#endif
+
 #if defined DO_TRANSPOSE
 typedef  fcomplex(*OutputDataType)[NR_STATIONS][NR_CHANNELS][NR_SAMPLES_PER_CHANNEL][NR_POLARIZATIONS];
 #else
@@ -128,7 +139,7 @@ extern "C" {
    const fcomplex * filteredDataPtr,
    const unsigned * delayIndices,
    double subbandFrequency,
-    unsigned beam,
+    unsigned beam, // =nrSAPS
    const double * delaysAtBeginPtr,
    const double * delaysAfterEndPtr,
    const double * phase0sPtr,
@@ -139,7 +150,9 @@ extern "C" {

    /* The z dimension is NR_STATIONS wide. */
    const unsigned station = blockIdx.z * blockDim.z + threadIdx.z;
+#if defined DELAY_COMPENSATION
    const unsigned delayIdx = delayIndices[station];
+#endif

    /*
     * channel: will cover all channels
@@ -173,10 +186,12 @@ extern "C" {
 #endif

 #if defined BANDPASS_CORRECTION
+#ifndef DOPPLER_CORRECTION
    BandPassFactorsType bandPassFactors = (BandPassFactorsType)bandPassFactorsPtr;

    float weight((*bandPassFactors)[channel]);
 #endif
+#endif

 #if defined DELAY_COMPENSATION
    DelaysType delaysAtBegin = (DelaysType)delaysAtBeginPtr;
@@ -222,9 +237,38 @@ extern "C" {
    // Calculate the angles to rotate for for the first and (beyond the) last sample.
    //
    // We need to undo the delay, so we rotate BACK, resulting in a negative constant factor.
+#if defined DOPPLER_CORRECTION   
+    const double2 freqOffset=(( delayAfterEnd - delayAtBegin ))*((subbandFrequency / NR_CHANNELS)/(NR_SAMPLES_PER_CHANNEL)); //divide this with (CLOCK_MHZ*1e6/1024.0)/NR_CHANNELS to get channel offset 
+
+    // Since Doppler correction has already been applied at subbandFrequency,
+    // we shift frequencies
+    const double2 phiAtBegin = make_double2(-2.0 * M_PI * (frequency+freqOffset.x) * delayAtBegin.x - ((*phase0s)[delayIdx]).x,
+            -2.0 * M_PI * (frequency+freqOffset.y) * delayAtBegin.y - ((*phase0s)[delayIdx]).y);
+    const double2 phiAfterEnd = make_double2(-2.0 * M_PI * (frequency+freqOffset.x) * delayAfterEnd.x - ((*phase0s)[delayIdx]).x,
+            -2.0 * M_PI * (frequency+freqOffset.y) * delayAfterEnd.y - ((*phase0s)[delayIdx]).y);
+
+#if defined BANDPASS_CORRECTION
+    BandPassFactorsType bandPassFactors = (BandPassFactorsType)bandPassFactorsPtr;
+
+    // positive offset means moving to right ->->
+    const double2 chanOffset=freqOffset*(NR_CHANNELS/(CLOCK_MHZ*1e6/1024.0)); //mult this with (CLOCK_MHZ*1e6/1024.0)/NR_CHANNELS to get freq
+
+    const float2 chanShifted={chanOffset.x+channel,chanOffset.y+channel};
+    unsigned chanlow[2]={__float2uint_rd(chanShifted.x),__float2uint_rd(chanShifted.y)};
+    // check and adjust to valid range
+    chanlow[0]=(chanlow[0]>NR_CHANNELS-1?NR_CHANNELS-1:chanlow[0]);
+    chanlow[1]=(chanlow[1]>NR_CHANNELS-1?NR_CHANNELS-1:chanlow[1]);
+    const unsigned chanhigh[2]={(chanlow[0]<NR_CHANNELS-1?chanlow[0]+1:chanlow[0]),
+                          (chanlow[1]<NR_CHANNELS-1?chanlow[1]+1:chanlow[1])};
+    const float2 w1={chanShifted.x-chanlow[0],chanShifted.y-chanlow[1]};
+    float2 weight=make_float2((*bandPassFactors)[chanlow[0]]*(1.0f-w1.x)+(*bandPassFactors)[chanhigh[0]]*w1.x,
+               (*bandPassFactors)[chanlow[1]]*(1.0f-w1.y)+(*bandPassFactors)[chanhigh[1]]*w1.y);
+#endif
+#else
    const double2 phiAtBegin = -2.0 * M_PI * frequency * delayAtBegin - (*phase0s)[delayIdx];
    const double2 phiAfterEnd = -2.0 * M_PI * frequency * delayAfterEnd - (*phase0s)[delayIdx];
-#endif
+#endif // DOPPLER_CORRECTION
+#endif // DELAY_COMPENSATION

    for (unsigned time = timeStart; time < NR_SAMPLES_PER_CHANNEL; time += timeInc)
    {
@@ -250,10 +294,17 @@ extern "C" {
 #endif

 #if defined BANDPASS_CORRECTION
+#if defined DOPPLER_CORRECTION   
+      sampleX.x *= weight.x;
+      sampleX.y *= weight.x;
+      sampleY.x *= weight.y;
+      sampleY.y *= weight.y;
+#else
      sampleX.x *= weight;
      sampleX.y *= weight;
      sampleY.x *= weight;
      sampleY.y *= weight;
+#endif
 #endif

      // Support all variants of NR_CHANNELS and DO_TRANSPOSE for testing etc.

--- a/RTCP/Cobalt/GPUProc/share/gpu/kernels/FIR_Filter.cu
+++ b/RTCP/Cobalt/GPUProc/share/gpu/kernels/FIR_Filter.cu
@@ -71,8 +71,14 @@ typedef float SampleType;
 typedef signed char (*SampledDataType)[NR_STABS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][NR_POLARIZATIONS];
 #define SAMPLE(time) extractRI((*sampledData)[station][time][channel][pol], ri)
 #  else
+#ifndef DOPPLER_CORRECTION
 typedef SampleType (*SampledDataType)[NR_STABS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][NR_POLARIZATIONS * COMPLEX];
 #define SAMPLE(time) (*sampledData)[station][time][channel][pol_ri]
+#else //DOPPLER_CORRECTION
+typedef SampleType (*SampledDataType)[NR_STABS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][NR_POLARIZATIONS][COMPLEX];
+#define REAL(time) convertIntToFloat((*sampledData)[station][time][channel][pol][0])
+#define IMAG(time) convertIntToFloat((*sampledData)[station][time][channel][pol][1])
+#endif //DOPPLER_CORRECTION
 #endif

 #else
@@ -88,10 +94,38 @@ inline __device__ float convertIntToFloat(float x)
 }
 #endif

+// subbandIdx=1 is assumed below
+#ifdef DOPPLER_CORRECTION
+typedef float (*HistoryDataType)[1][NR_STABS][NR_TAPS - 1][NR_CHANNELS][NR_POLARIZATIONS * COMPLEX];
+#else
 typedef SampleType (*HistoryDataType)[1][NR_STABS][NR_TAPS - 1][NR_CHANNELS][NR_POLARIZATIONS * COMPLEX];
+#endif
 typedef float (*FilteredDataType)[NR_STABS][NR_POLARIZATIONS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][COMPLEX];
 typedef const float (*WeightsType)[NR_CHANNELS][NR_TAPS];

+#ifdef DOPPLER_CORRECTION
+// Check if CLOCK_MHZ is also defined
+#ifndef CLOCK_MHZ
+#error DOPPLER_CORRECTION=1 but CLOCK_MHZ not defined
+#endif
+
+// this is faster than doing "pol ? sin(phi) : cos(phi)"
+// because that statement forces CUDA to still compute both
+// as GPUs always compute both branches.
+inline __device__ float2 sincos_d2f_select(double phi, int ri)
+{
+  double r[2];
+
+  sincos(phi, &r[1], &r[0]);
+  float c=__double2float_rz(r[0]);
+  float s=__double2float_rz(r[1]);
+  // return: ri==0 (cos,-sin) ri==1 (sin,cos)
+  return make_float2((ri?s:c),(ri?c:-s));
+}
+
+typedef  const double(*DelaysType)[1][NR_STABS][NR_POLARIZATIONS]; // 2 Polarizations; in seconds
+typedef  const double(*Phase0sType)[NR_STABS][NR_POLARIZATIONS]; // 2 Polarizations; in radians
+#endif /* DOPPLER_CORRECTION */

 /*!
 * Applies the Finite Input Response filter defined by the weightsPtr array
@@ -122,7 +156,8 @@ typedef const float (*WeightsType)[NR_CHANNELS][NR_TAPS];
 * COMPLEX                 | 2                           | size of complex in number of floats/doubles
 * INPUT_IS_STATIONDATA    | defined or not              | if true, input is intX[stabs][samples][pol]
 *                         |                             | if false, input is float[stabs][pol][samples]
- *
+ * DOPPLER_CORRECTION      | defined or not              | If true, apply subband-based delay compensation
+ * CLOCK_MHZ               | >0 clock Freq in MHz        | Must be defined with DOPPLER_CORRECTION
 * Execution configuration: (TODO: enforce using __attribute__ reqd_work_group_size)
 * - Work dim == 2  (can be 1 iff NR_STABS == 1)
 *     + Inner dim: the channel, pol, real/imag the thread processes
@@ -137,7 +172,10 @@ __global__ void FIR_filter( void *filteredDataPtr,
                            const void *sampledDataPtr,
                            const void *weightsPtr,
                            void *historyDataPtr,
-                            unsigned subbandIdx)
+                            const double * delaysAtBeginPtr,
+                            const double * delaysAfterEndPtr,
+                            unsigned subbandIdx,
+                            double subbandFrequency)
 {
  SampledDataType sampledData = (SampledDataType) sampledDataPtr;
  FilteredDataType filteredData = (FilteredDataType) filteredDataPtr;
@@ -159,6 +197,38 @@ __global__ void FIR_filter( void *filteredDataPtr,
 #endif
  unsigned station = blockIdx.y;

+#ifdef DOPPLER_CORRECTION
+  DelaysType delaysAtBegin = (DelaysType)delaysAtBeginPtr;
+  DelaysType delaysAfterEnd = (DelaysType)delaysAfterEndPtr;
+  const double delayAtBegin = (*delaysAtBegin)[0][station][pol];
+  const double delayAfterEnd = (*delaysAfterEnd)[0][station][pol];
+
+  // Calculate the angles to rotate for for the first and (beyond the) last sample.
+  //
+  // We need to undo the delay, so we rotate BACK, resulting in a negative constant factor.
+  // Note we use the sample frequency=CLOCK_MHZ/1024 (MHz) to normalize the frequency
+  // Let tau_0: delay at begin, tau_1: delay after end 
+  // t : time within a block, T: total time (duration) of block,
+  // So, t in [0,T] and  t/T=blockoffset in [0,1]
+  // then delay at t = tau_1 (t/T) + tau_0 (1 - t/T)
+  // exponent at frqeuency f = -j 2 pi (f tau) 
+  // simplifying, exponent = -j 2 pi f tau_0 - j 2 pi f (tau_1 -tau_0) (t/T)
+  // We only need the term that changes with t, so discard the rest
+  // and only keep : 2 pi f (tau_1 - tau_0) (t/T) for the rest of calculations
+  // also replace f with f/f_s where f_s is sample frequency = clock/1024
+  // phi = phiGradient x blockOffset
+  // Offset of this sample between begin and end. = t/T fraction, within one FFT block of NR_CHANNELS
+  const double phi = 2.0 * M_PI * (subbandFrequency / (CLOCK_MHZ*1e6/1024.0) )*( delayAfterEnd - delayAtBegin )/NR_SAMPLES_PER_CHANNEL * double(channel)/NR_CHANNELS;
+
+  //
+  // Use double precision here, when phi~=0, error in cos() is minimum
+  // but error in sin() is highest, and will affect most baselines (whose Doppler effect ~=0)
+  // Note: both X and Y polarizations will have same correction
+  // so real=cos(phi), imag=sin(phi) correction factor
+  // phi=phiGradient * blockOffset
+  const float2 FACTOR=sincos_d2f_select(phi, ri);
+#endif
+
  //# const float16 weights = (*weightsData)[channel];
  const float weights_s0 = (*weightsData)[channel][0];
  const float weights_s1 = (*weightsData)[channel][1];
@@ -183,6 +253,23 @@ __global__ void FIR_filter( void *filteredDataPtr,
        delayLine_s8, delayLine_s9, delayLine_sA, delayLine_sB,
        delayLine_sC, delayLine_sD, delayLine_sE, delayLine_sF;
  
+#if defined DOPPLER_CORRECTION
+  delayLine_s0 = ((*historyData)[subbandIdx][station][0][channel][pol_ri]);
+  delayLine_s1 = ((*historyData)[subbandIdx][station][1][channel][pol_ri]);
+  delayLine_s2 = ((*historyData)[subbandIdx][station][2][channel][pol_ri]);
+  delayLine_s3 = ((*historyData)[subbandIdx][station][3][channel][pol_ri]);
+  delayLine_s4 = ((*historyData)[subbandIdx][station][4][channel][pol_ri]);
+  delayLine_s5 = ((*historyData)[subbandIdx][station][5][channel][pol_ri]);
+  delayLine_s6 = ((*historyData)[subbandIdx][station][6][channel][pol_ri]);
+  delayLine_s7 = ((*historyData)[subbandIdx][station][7][channel][pol_ri]);
+  delayLine_s8 = ((*historyData)[subbandIdx][station][8][channel][pol_ri]);
+  delayLine_s9 = ((*historyData)[subbandIdx][station][9][channel][pol_ri]);
+  delayLine_sA = ((*historyData)[subbandIdx][station][10][channel][pol_ri]);
+  delayLine_sB = ((*historyData)[subbandIdx][station][11][channel][pol_ri]);
+  delayLine_sC = ((*historyData)[subbandIdx][station][12][channel][pol_ri]);
+  delayLine_sD = ((*historyData)[subbandIdx][station][13][channel][pol_ri]);
+  delayLine_sE = ((*historyData)[subbandIdx][station][14][channel][pol_ri]);
+#else
  delayLine_s0 = convertIntToFloat((*historyData)[subbandIdx][station][0][channel][pol_ri]);
  delayLine_s1 = convertIntToFloat((*historyData)[subbandIdx][station][1][channel][pol_ri]);
  delayLine_s2 = convertIntToFloat((*historyData)[subbandIdx][station][2][channel][pol_ri]);
@@ -198,6 +285,7 @@ __global__ void FIR_filter( void *filteredDataPtr,
  delayLine_sC = convertIntToFloat((*historyData)[subbandIdx][station][12][channel][pol_ri]);
  delayLine_sD = convertIntToFloat((*historyData)[subbandIdx][station][13][channel][pol_ri]);
  delayLine_sE = convertIntToFloat((*historyData)[subbandIdx][station][14][channel][pol_ri]);
+#endif

  float sum_s0, sum_s1, sum_s2, sum_s3,
        sum_s4, sum_s5, sum_s6, sum_s7,
@@ -206,9 +294,21 @@ __global__ void FIR_filter( void *filteredDataPtr,

  for (unsigned time = 0; time < NR_SAMPLES_PER_CHANNEL; time += NR_TAPS) 
  {
+
+#if defined DOPPLER_CORRECTION
+    //(X,Y): sample real,imag parts, output a X + b Y
+    // FACTOR (cos,sin) = (c,s)
+    // ri=0 : a=cos, b=-sin ; ri=1: a=sin, b=cos
+    delayLine_sF = REAL(time + 0)*FACTOR.x+IMAG(time + 0)*FACTOR.y;
+#else
    delayLine_sF = convertIntToFloat(SAMPLE(time + 0));
+#endif
    sum_s0 = weights_sF * delayLine_s0;
+#if defined DOPPLER_CORRECTION
+    delayLine_s0 = REAL(time + 1)*FACTOR.x+IMAG(time + 1)*FACTOR.y;
+#else
    delayLine_s0 = convertIntToFloat(SAMPLE(time + 1));
+#endif
    sum_s0 += weights_sE * delayLine_s1;
    sum_s0 += weights_sD * delayLine_s2;
    sum_s0 += weights_sC * delayLine_s3;
@@ -227,7 +327,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 0][channel][ri] = sum_s0;

    sum_s1 = weights_sF * delayLine_s1;
+#if defined DOPPLER_CORRECTION
+    delayLine_s1 = REAL(time + 2)*FACTOR.x+IMAG(time + 2)*FACTOR.y;
+#else
    delayLine_s1 = convertIntToFloat(SAMPLE(time + 2));
+#endif
    sum_s1 += weights_sE * delayLine_s2;
    sum_s1 += weights_sD * delayLine_s3;
    sum_s1 += weights_sC * delayLine_s4;
@@ -246,7 +350,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 1][channel][ri] = sum_s1;

    sum_s2 = weights_sF * delayLine_s2;
+#if defined DOPPLER_CORRECTION
+    delayLine_s2 = REAL(time + 3)*FACTOR.x+IMAG(time + 3)*FACTOR.y;
+#else
    delayLine_s2 = convertIntToFloat(SAMPLE(time + 3));
+#endif
    sum_s2 += weights_sE * delayLine_s3;
    sum_s2 += weights_sD * delayLine_s4;
    sum_s2 += weights_sC * delayLine_s5;
@@ -265,7 +373,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 2][channel][ri] = sum_s2;

    sum_s3 = weights_sF * delayLine_s3;
+#if defined DOPPLER_CORRECTION
+    delayLine_s3 = REAL(time + 4)*FACTOR.x+IMAG(time + 4)*FACTOR.y;
+#else
    delayLine_s3 = convertIntToFloat(SAMPLE(time + 4));
+#endif
    sum_s3 += weights_sE * delayLine_s4;
    sum_s3 += weights_sD * delayLine_s5;
    sum_s3 += weights_sC * delayLine_s6;
@@ -284,7 +396,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 3][channel][ri] = sum_s3;

    sum_s4 = weights_sF * delayLine_s4;
+#if defined DOPPLER_CORRECTION
+    delayLine_s4 = REAL(time + 5)*FACTOR.x+IMAG(time + 5)*FACTOR.y;
+#else
    delayLine_s4 = convertIntToFloat(SAMPLE(time + 5));
+#endif
    sum_s4 += weights_sE * delayLine_s5;
    sum_s4 += weights_sD * delayLine_s6;
    sum_s4 += weights_sC * delayLine_s7;
@@ -303,7 +419,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 4][channel][ri] = sum_s4;

    sum_s5 = weights_sF * delayLine_s5;
+#if defined DOPPLER_CORRECTION
+    delayLine_s5 = REAL(time + 6)*FACTOR.x+IMAG(time + 6)*FACTOR.y;
+#else
    delayLine_s5 = convertIntToFloat(SAMPLE(time + 6));
+#endif
    sum_s5 += weights_sE * delayLine_s6;
    sum_s5 += weights_sD * delayLine_s7;
    sum_s5 += weights_sC * delayLine_s8;
@@ -322,7 +442,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 5][channel][ri] = sum_s5;

    sum_s6 = weights_sF * delayLine_s6;
+#if defined DOPPLER_CORRECTION
+    delayLine_s6 = REAL(time + 7)*FACTOR.x+IMAG(time + 7)*FACTOR.y;
+#else
    delayLine_s6 = convertIntToFloat(SAMPLE(time + 7));
+#endif
    sum_s6 += weights_sE * delayLine_s7;
    sum_s6 += weights_sD * delayLine_s8;
    sum_s6 += weights_sC * delayLine_s9;
@@ -341,7 +465,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 6][channel][ri] = sum_s6;

    sum_s7 = weights_sF * delayLine_s7;
+#if defined DOPPLER_CORRECTION
+    delayLine_s7 = REAL(time + 8)*FACTOR.x+IMAG(time + 8)*FACTOR.y;
+#else
    delayLine_s7 = convertIntToFloat(SAMPLE(time + 8));
+#endif
    sum_s7 += weights_sE * delayLine_s8;
    sum_s7 += weights_sD * delayLine_s9;
    sum_s7 += weights_sC * delayLine_sA;
@@ -360,7 +488,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 7][channel][ri] = sum_s7;

    sum_s8 = weights_sF * delayLine_s8;
+#if defined DOPPLER_CORRECTION
+    delayLine_s8 = REAL(time + 9)*FACTOR.x+IMAG(time + 9)*FACTOR.y;
+#else
    delayLine_s8 = convertIntToFloat(SAMPLE(time + 9));
+#endif
    sum_s8 += weights_sE * delayLine_s9;
    sum_s8 += weights_sD * delayLine_sA;
    sum_s8 += weights_sC * delayLine_sB;
@@ -379,7 +511,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 8][channel][ri] = sum_s8;

    sum_s9 = weights_sF * delayLine_s9;
+#if defined DOPPLER_CORRECTION
+    delayLine_s9 = REAL(time + 10)*FACTOR.x+IMAG(time + 10)*FACTOR.y;
+#else
    delayLine_s9 = convertIntToFloat(SAMPLE(time + 10));
+#endif
    sum_s9 += weights_sE * delayLine_sA;
    sum_s9 += weights_sD * delayLine_sB;
    sum_s9 += weights_sC * delayLine_sC;
@@ -398,7 +534,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 9][channel][ri] = sum_s9;

    sum_sA = weights_sF * delayLine_sA;
+#if defined DOPPLER_CORRECTION
+    delayLine_sA = REAL(time + 11)*FACTOR.x+IMAG(time + 11)*FACTOR.y;
+#else
    delayLine_sA = convertIntToFloat(SAMPLE(time + 11));
+#endif
    sum_sA += weights_sE * delayLine_sB;
    sum_sA += weights_sD * delayLine_sC;
    sum_sA += weights_sC * delayLine_sD;
@@ -417,7 +557,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 10][channel][ri] = sum_sA;

    sum_sB = weights_sF * delayLine_sB;
+#if defined DOPPLER_CORRECTION
+    delayLine_sB = REAL(time + 12)*FACTOR.x+IMAG(time + 12)*FACTOR.y;
+#else
    delayLine_sB = convertIntToFloat(SAMPLE(time + 12));
+#endif
    sum_sB += weights_sE * delayLine_sC;
    sum_sB += weights_sD * delayLine_sD;
    sum_sB += weights_sC * delayLine_sE;
@@ -436,7 +580,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 11][channel][ri] = sum_sB;

    sum_sC = weights_sF * delayLine_sC;
+#if defined DOPPLER_CORRECTION
+    delayLine_sC = REAL(time + 13)*FACTOR.x+IMAG(time + 13)*FACTOR.y;
+#else
    delayLine_sC = convertIntToFloat(SAMPLE(time + 13));
+#endif
    sum_sC += weights_sE * delayLine_sD;
    sum_sC += weights_sD * delayLine_sE;
    sum_sC += weights_sC * delayLine_sF;
@@ -455,7 +603,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 12][channel][ri] = sum_sC;

    sum_sD = weights_sF * delayLine_sD;
+#if defined DOPPLER_CORRECTION
+    delayLine_sD = REAL(time + 14)*FACTOR.x+IMAG(time + 14)*FACTOR.y;
+#else
    delayLine_sD = convertIntToFloat(SAMPLE(time + 14));
+#endif
    sum_sD += weights_sE * delayLine_sE;
    sum_sD += weights_sD * delayLine_sF;
    sum_sD += weights_sC * delayLine_s0;
@@ -474,7 +626,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 13][channel][ri] = sum_sD;

    sum_sE = weights_sF * delayLine_sE;
+#if defined DOPPLER_CORRECTION
+    delayLine_sE = REAL(time + 15)*FACTOR.x+IMAG(time + 15)*FACTOR.y;
+#else
    delayLine_sE = convertIntToFloat(SAMPLE(time + 15));
+#endif
    sum_sE += weights_sE * delayLine_sF;
    sum_sE += weights_sD * delayLine_s0;
    sum_sE += weights_sC * delayLine_s1;
@@ -511,10 +667,16 @@ __global__ void FIR_filter( void *filteredDataPtr,
    (*filteredData)[station][pol][time + 15][channel][ri] = sum_sF;
  }

-  for (unsigned time = 0; time < NR_TAPS - 1; time++)
+  for (unsigned tap= 0; tap < NR_TAPS - 1; tap++)
  {
-    (*historyData)[subbandIdx][station][time][channel][pol_ri] =
-      SAMPLE(NR_SAMPLES_PER_CHANNEL - (NR_TAPS - 1) + time);
+#if defined DOPPLER_CORRECTION 
+    const unsigned time = NR_SAMPLES_PER_CHANNEL - (NR_TAPS - 1) + tap;
+    (*historyData)[subbandIdx][station][tap][channel][pol_ri] = // subbandIdx=1 assumed here
+      REAL(time)*FACTOR.x+IMAG(time)*FACTOR.y;
+#else
+    (*historyData)[subbandIdx][station][tap][channel][pol_ri] = // subbandIdx=1 assumed here
+      SAMPLE(NR_SAMPLES_PER_CHANNEL - (NR_TAPS - 1) + tap);
+#endif
  }
 }
 }
--- a/RTCP/Cobalt/GPUProc/src/KernelParameters.cc
+++ b/RTCP/Cobalt/GPUProc/src/KernelParameters.cc
@@ -34,7 +34,8 @@ namespace LOFAR
      nrBitsPerSample(obsSettings.nrBitsPerSample),
      blockSize(obsSettings.blockSize),
      subbandWidth(obsSettings.subbandWidth()),
-      subbands(obsSettings.subbands)
+      subbands(obsSettings.subbands),
+      clockMHz(obsSettings.clockMHz)
    {
      ASSERT(obsSettings.antennaFieldNames.size() == obsSettings.antennaFields.size());
    }
@@ -69,4 +70,4 @@ namespace LOFAR
    }

  } // end namespace Cobalt
-} // end namespace LOFAR
\ No newline at end of file
+} // end namespace LOFAR
--- a/RTCP/Cobalt/GPUProc/src/KernelParameters.h
+++ b/RTCP/Cobalt/GPUProc/src/KernelParameters.h
@@ -54,6 +54,7 @@ namespace LOFAR
        unsigned blockSize;
        double subbandWidth;
        std::vector<ObservationSettings::Subband> subbands;
+        unsigned clockMHz;
      };

      Observation observation;
@@ -120,4 +121,4 @@ namespace LOFAR
  } // end namespace Cobalt
 } // end namespace LOFAR

-#endif
\ No newline at end of file
+#endif
--- a/RTCP/Cobalt/GPUProc/src/Kernels/DelayAndBandPassKernel.cc
+++ b/RTCP/Cobalt/GPUProc/src/Kernels/DelayAndBandPassKernel.cc
@@ -51,12 +51,14 @@ namespace LOFAR
      unsigned nrBitsPerSample_,
      unsigned nrChannels_,
      unsigned nrSamplesPerChannel_,
+      unsigned clockMHz_,
      double subbandBandwidth_,
      unsigned nrSAPs_,
      bool correlator_,
      bool delayCompensation_,
      bool correctBandPass_,
      bool transpose_,
+      bool dopplerCorrection_,
      bool dumpBuffers_,
      std::string dumpFilePattern_) :
      Kernel::Parameters(correlator_ ? "delayAndBandPass" : "delayCompensation"),
@@ -66,11 +68,13 @@ namespace LOFAR
      nrBitsPerSample(nrBitsPerSample_),
      nrChannels(nrChannels_),
      nrSamplesPerChannel(nrSamplesPerChannel_),
+      clockMHz(clockMHz_),
      subbandBandwidth(subbandBandwidth_),
      nrSAPs(nrSAPs_),
      delayCompensation(delayCompensation_),
      correctBandPass(correctBandPass_),
-      transpose(transpose_)
+      transpose(transpose_),
+      dopplerCorrection(dopplerCorrection_)
    {
      if (correlator_) {
        // Use identity mappnig for station indices
@@ -110,7 +114,7 @@ namespace LOFAR
          delayIndices.size() * sizeof delayIndices[0];
      case DelayAndBandPassKernel::DELAYS:
        return 
-          (size_t) nrSAPs * nrDelays * 
+          (size_t) nrSAPs * nrDelays *  
            NR_POLARIZATIONS * sizeof(double);
      case DelayAndBandPassKernel::PHASE_ZEROS:
        return
@@ -225,6 +229,11 @@ namespace LOFAR
      if (itsParameters.transpose)
        defs["DO_TRANSPOSE"] = "1";

+      if (itsParameters.dopplerCorrection) {
+        defs["DOPPLER_CORRECTION"] = "1";
+        defs["CLOCK_MHZ"] = lexical_cast<string>(itsParameters.clockMHz);
+      }
+
      return defs;
    }
  }

--- a/RTCP/Cobalt/GPUProc/src/Kernels/DelayAndBandPassKernel.h
+++ b/RTCP/Cobalt/GPUProc/src/Kernels/DelayAndBandPassKernel.h
@@ -62,12 +62,14 @@ namespace LOFAR
          unsigned nrBitsPerSample,
          unsigned nrChannels,
          unsigned nrSamplesPerChannel,
+          unsigned clockMHz,
          double subbandBandwidth,
          unsigned nrSAPs,
          bool correlator,
          bool delayCompensation,
          bool correctBandPass,
          bool transpose,
+          bool dopplerCorrection,
          bool dumpBuffers = false,
          std::string dumpFilePattern = "");

@@ -78,6 +80,9 @@ namespace LOFAR

        unsigned nrChannels;
        unsigned nrSamplesPerChannel;
+        // Clock freq used to calculate input samplie freq=clockMHz/1024 MHz
+        // for Doppler correction
+        unsigned clockMHz;
        double subbandBandwidth;

        unsigned nrSAPs;
@@ -85,6 +90,10 @@ namespace LOFAR
        bool delayCompensation;
        bool correctBandPass;
        bool transpose;
+        // if true, 
+        // Doppler correction has ALREADY been applied in the FIR_Filter,
+        // so only incremental DelayCompensation and BandPass correction is done
+        bool dopplerCorrection;

        unsigned nrSamplesPerSubband() const;
        unsigned nrBytesPerComplexSample() const;

--- a/RTCP/Cobalt/GPUProc/src/Kernels/FIR_FilterKernel.cc
+++ b/RTCP/Cobalt/GPUProc/src/Kernels/FIR_FilterKernel.cc
@@ -47,9 +47,11 @@ namespace LOFAR
      unsigned nrSTABs,
      unsigned nrBitsPerSample,
      bool inputIsStationData,
+      bool dopplerCorrection,
      unsigned nrSubbands,
      unsigned nrChannels,
      unsigned nrSamplesPerChannel,
+      unsigned clockMHz,
      float scaleFactor,
      const std::string &name,
      const bool dumpBuffers_,
@@ -62,9 +64,12 @@ namespace LOFAR
      nrSamplesPerChannel(nrSamplesPerChannel),

      nrSubbands(nrSubbands),
+      clockMHz(clockMHz),
      scaleFactor(scaleFactor),
-      inputIsStationData(inputIsStationData)
+      inputIsStationData(inputIsStationData),
+      dopplerCorrection(dopplerCorrection)
    {
+      ASSERTSTR(dopplerCorrection?inputIsStationData:true,"Doppler correction only works if inputIsStationData=true");
      dumpBuffers = dumpBuffers_;
      dumpFilePattern = dumpFilePattern_;
    }
@@ -106,10 +111,16 @@ namespace LOFAR
            sizeof(float);
      case FIR_FilterKernel::HISTORY_DATA:
        // History is split over 2 bytes in 4-bit mode, to avoid unnecessary packing/unpacking
+        // If Doppler corr. enabled, history is a float buffer
        return
          (size_t) nrSubbands *
            nrHistorySamples() * nrSTABs * 
-            NR_POLARIZATIONS * (nrBitsPerSample == 4 ? 2U : nrBytesPerComplexSample());
+            NR_POLARIZATIONS * (dopplerCorrection? sizeof(std::complex<float>) 
+                : (nrBitsPerSample == 4 ? 2U : nrBytesPerComplexSample()));
+      case FIR_FilterKernel::DELAYS:
+        return (dopplerCorrection?
+          (size_t) 1 * nrSTABs * // nrSAPs=1 here
+            NR_POLARIZATIONS * sizeof(double) : 0);
      default:
        THROW(GPUProcException, "Invalid bufferType (" << bufferType << ")");
      }
@@ -124,12 +135,16 @@ namespace LOFAR
      h_filterWeights(stream.getContext(), params.bufferSize(FILTER_WEIGHTS)),
      d_filterWeights(stream.getContext(), params.bufferSize(FILTER_WEIGHTS)),
      historySamples(stream.getContext(), params.bufferSize(HISTORY_DATA)),
-      historyFlags(boost::extents[params.nrSubbands][params.nrSTABs])
+      historyFlags(boost::extents[params.nrSubbands][params.nrSTABs]),
+      delaysAtBegin(stream.getContext(), params.bufferSize(DELAYS)),
+      delaysAfterEnd(stream.getContext(), params.bufferSize(DELAYS))
    {
      setArg(0, buffers.output);
      setArg(1, buffers.input);
      setArg(2, d_filterWeights);
      setArg(3, historySamples);
+      setArg(4, delaysAtBegin);
+      setArg(5, delaysAfterEnd);

      unsigned totalNrThreads = params.nrChannels * NR_POLARIZATIONS * 2;
      unsigned nrPasses = ceilDiv(totalNrThreads, maxThreadsPerBlock);
@@ -173,9 +188,10 @@ namespace LOFAR
    }

    void FIR_FilterKernel::enqueue(const BlockID &blockId,
-                                   unsigned subbandIdx)
+                                   unsigned subbandIdx, double subbandFrequency)
    {
-      setArg(4, subbandIdx);
+      setArg(6, subbandIdx);
+      setArg(7, subbandFrequency);
      Kernel::enqueue(blockId);
    }

@@ -222,6 +238,11 @@ namespace LOFAR
      if (itsParameters.inputIsStationData)
        defs["INPUT_IS_STATIONDATA"] = "1";

+      if (itsParameters.dopplerCorrection) {
+        defs["DOPPLER_CORRECTION"] = "1";
+        defs["CLOCK_MHZ"] = lexical_cast<string>(itsParameters.clockMHz);
+      }
+
      return defs;
    }
  }

--- a/RTCP/Cobalt/GPUProc/src/Kernels/FIR_FilterKernel.h
+++ b/RTCP/Cobalt/GPUProc/src/Kernels/FIR_FilterKernel.h
@@ -44,7 +44,8 @@ namespace LOFAR
        INPUT_DATA,
        OUTPUT_DATA,
        FILTER_WEIGHTS,
-        HISTORY_DATA
+        HISTORY_DATA,
+        DELAYS
      };

      // Parameters that must be passed to the constructor of the
@@ -55,9 +56,11 @@ namespace LOFAR
          unsigned nrSTABs,
          unsigned nrBitsPerSample,
          bool inputIsStationData,
+          bool dopplerCorrection,
          unsigned nrSubbands,
          unsigned nrChannels,
          unsigned nrSamplesPerChannel,
+          unsigned clockMHz,
          float scaleFactor,
          const std::string &name = "FIR",
          const bool dumpBuffers = false,
@@ -85,6 +88,10 @@ namespace LOFAR
        // The number of history samples used for each block
        unsigned nrHistorySamples() const;

+        // Clock freq used to calculate input samplie freq=clockMHz/1024 MHz
+        // for Doppler correction
+        unsigned clockMHz;
+
        // Additional scale factor (e.g. for FFT normalization).
        // Derived differently from nrChannelsPerSubband for correlation
        // and beamforming, so must be passed into this class.
@@ -97,6 +104,10 @@ namespace LOFAR
        // pipeline: float[stab][pol][sample]
        bool inputIsStationData;

+        // if true, 
+        // enable Doppler correction
+        bool dopplerCorrection;
+
        size_t bufferSize(FIR_FilterKernel::BufferType bufferType) const;
      };

@@ -106,7 +117,8 @@ namespace LOFAR
                       const Parameters& param);

      void enqueue(const BlockID &blockId,
-                   unsigned subbandIdx);
+                   unsigned subbandIdx,
+                   double subbandFrequency=0.0); // needed for Doppler corr.

      // Put the historyFlags[subbandIdx] in front of the given inputFlags,
      // and update historyFlags[subbandIdx] with the flags of the last samples
@@ -128,6 +140,10 @@ namespace LOFAR
      //
      // Dimensions: [nrSubbands][nrStations]
      MultiDimArray<SparseSet<unsigned>, 2> historyFlags;
+
+    public:
+      // Delay compensation constants to be written by the caller before enqueue()
+      gpu::DeviceMemory delaysAtBegin, delaysAfterEnd;
    };

    //# --------  Template specializations for KernelFactory  -------- #//

--- a/RTCP/Cobalt/GPUProc/src/SubbandProcs/BeamFormerCoherentStep.cc
+++ b/RTCP/Cobalt/GPUProc/src/SubbandProcs/BeamFormerCoherentStep.cc
@@ -88,9 +88,11 @@ namespace LOFAR
            bfParameters.maxNrCoherentTABsPerSAP,
            obsParameters.nrBitsPerSample,
            false, // inputIsStationData
+            false, // dopplerCorrection
            nrSubbandsPerSubbandProc,
            bfParameters.coherentSettings.nrChannels,
            obsParameters.blockSize / bfParameters.coherentSettings.nrChannels,
+            obsParameters.clockMHz,
            static_cast<float>(bfParameters.coherentSettings.nrChannels),
            "FIR (coherent, final)",
            cobParameters.kernel.dumpFIR_FilterKernel,

--- a/RTCP/Cobalt/GPUProc/src/SubbandProcs/BeamFormerIncoherentStep.cc
+++ b/RTCP/Cobalt/GPUProc/src/SubbandProcs/BeamFormerIncoherentStep.cc
@@ -74,10 +74,12 @@ namespace LOFAR
        ? new KernelFactory<FIR_FilterKernel>(FIR_FilterKernel::Parameters(
            bfParameters.preStationIndices.size(),
            obsParameters.nrBitsPerSample,
-            false,
+            false, // inputIsStationData
+            false, // dopplerCorrection
            nrSubbandsPerSubbandProc,
            bfParameters.incoherentSettings.nrChannels,
            obsParameters.blockSize / bfParameters.incoherentSettings.nrChannels,
+            obsParameters.clockMHz,
            static_cast<float>(bfParameters.incoherentSettings.nrChannels),
            "FIR (incoherent, final)"))
        : NULL),

--- a/RTCP/Cobalt/GPUProc/src/SubbandProcs/BeamFormerPreprocessingStep.cc
+++ b/RTCP/Cobalt/GPUProc/src/SubbandProcs/BeamFormerPreprocessingStep.cc
@@ -84,12 +84,14 @@ namespace LOFAR
          obsParameters.nrBitsPerSample,
          preParameters.nrDelayCompensationChannels,
          obsParameters.blockSize / preParameters.nrDelayCompensationChannels,
+          obsParameters.clockMHz, //not needed in beamformer pipeline
          obsParameters.subbandWidth,
          obsParameters.nrSAPs,
          false, // correlator
          preParameters.delayCompensationEnabled,
-          false , // correctBandPass
+          false, // correctBandPass
          false, // transpose
+          false, // dopplerCorrection
          cobParameters.kernel.dumpDelayAndBandPassKernel,
          str(boost::format("L%d_SB%%03d_BL%%03d_BFPre_DelayAndBandPassKernel_%c%c%c.dat") %
            obsParameters.observationID %

--- a/RTCP/Cobalt/GPUProc/src/SubbandProcs/CorrelatorStep.cc
+++ b/RTCP/Cobalt/GPUProc/src/SubbandProcs/CorrelatorStep.cc
@@ -55,10 +55,11 @@ namespace LOFAR
            obsParameters.nrStations,
            obsParameters.nrBitsPerSample,
            true, // inputIsStationData
+            corParameters.dopplerCorrection, // DopplerCorrection
            nrSubbandsPerSubbandProc,
            corParameters.nrChannels,
            obsParameters.blockSize / corParameters.nrChannels,
-
+            obsParameters.clockMHz,
            // Scale to always output visibilities or stokes with the same flux scale.
            // With the same bandwidth, twice the (narrower) channels _average_ (not
            // sum) to the same fluxes (and same noise). Twice the channels (twice the
@@ -97,12 +98,14 @@ namespace LOFAR
        obsParameters.nrBitsPerSample,
        corParameters.nrChannels,
        obsParameters.blockSize / corParameters.nrChannels,
+        obsParameters.clockMHz,
        obsParameters.subbandWidth,
        obsParameters.nrSAPs,
-        preParameters.delayCompensationEnabled,
        true, // correlator
+        preParameters.delayCompensationEnabled,
        preParameters.bandPassCorrectionEnabled,
        true, // transpose
+        corParameters.dopplerCorrection, // DopplerCorrection
        cobParameters.kernel.dumpDelayAndBandPassKernel,
        str(boost::format("L%d_SB%%03d_BL%%03d_Cor_DelayAndBandPassKernel_%c%c%c.dat") %
          obsParameters.observationID %
@@ -348,6 +351,13 @@ namespace LOFAR
    {
      htodStream->waitEvent(executeFinished);

+      if (correlatorPPF && corParameters.dopplerCorrection) { // check nrChannels>1 for Doppler corr. 
+        htodStream->writeBuffer(firFilterKernel->delaysAtBegin,
+          input.delaysAtBegin, false);
+        htodStream->writeBuffer(firFilterKernel->delaysAfterEnd,
+          input.delaysAfterEnd, false);
+      }
+
      if (preParameters.delayCompensationEnabled) {
        htodStream->writeBuffer(delayAndBandPassKernel->delaysAtBegin,
          input.delaysAtBegin, false);
@@ -368,7 +378,8 @@ namespace LOFAR
      if (correlatorPPF) {
        // The subbandIdx immediate kernel arg must outlive kernel runs.
        firFilterKernel->enqueue(input.blockID, 
-                                 input.blockID.subbandProcSubbandIdx);
+                                 input.blockID.subbandProcSubbandIdx,
+                                 obsParameters.subbands[input.blockID.globalSubbandIdx].centralFrequency);
        fftKernel->enqueue(input.blockID);

        // Process flags enough to determine which data to zero

--- a/RTCP/Cobalt/GPUProc/test/Kernels/tDelayAndBandPassKernel.cc
+++ b/RTCP/Cobalt/GPUProc/test/Kernels/tDelayAndBandPassKernel.cc
@@ -67,12 +67,14 @@ int main(int argc, char *argv[])
      ps.settings.nrBitsPerSample,
      ps.settings.beamFormer.nrDelayCompensationChannels,
      ps.settings.blockSize / ps.settings.beamFormer.nrDelayCompensationChannels,
+      ps.settings.clockMHz,
      ps.settings.subbandWidth(),
      ps.settings.SAPs.size(),
      ps.settings.delayCompensation.enabled,
      correlator,
      false, // correctBandPass
-      false // transpose
+      false, // transpose
+      false // dopplerCorrection
    ));

  gpu::DeviceMemory
No results found