Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • ro/lofar
1 result
Show changes
Commits on Source (86)
Showing
with 395 additions and 25 deletions
......@@ -1321,6 +1321,8 @@ namespace LOFAR
// The correlator.enabled setting is used as default value for sap.correlatorEnabled
// and thus has thus be set before addSAPs is called.
settings.correlator.enabled = getBool("Observation.DataProducts.Output_Correlated.enabled", false);
// Doppler correction in correlator
settings.correlator.dopplerCorrection= getBool("Cobalt.Correlator.dopplerCorrection", false);
// Pointing information
addSAPs(settings);
......
......@@ -508,6 +508,11 @@ namespace LOFAR
// set to: subbandWidth() / nrChannels
double channelWidth;
// Doppler correction
//
// key: Cobalt.Correlator.dopplerCorrection, default false
bool dopplerCorrection;
// The number of samples in one block of one channel.
//
// key: OLAP.CNProc.integrationSteps
......
......@@ -963,6 +963,20 @@ SUITE(correlator) {
}
}
TEST(dopplerCorrection) {
LOG_INFO("Test correlator dopplerCorrection");
Parset ps = makeDefaultTestParset();
// set
ps.replace("Observation.DataProducts.Output_Correlated.enabled", "true");
ps.replace("Cobalt.Correlator.dopplerCorrection", "true");
ps.updateSettings();
// verify settings
CHECK_EQUAL(true, ps.settings.correlator.dopplerCorrection);
}
TEST(nrSamplesPerChannel) {
LOG_INFO("Test correlator nrSamplesPerChannel");
......
# Doppler correction
This is a brief description of how Doppler correction is implemented in the FIR_Filter and subsequent DelayAndBandpass correction.
Let the intrinsic sky signal be *g(t)* with its Fourier transform *G(f)*. Due to the Doppler shift, the measured signal will have a frequency shift *f_0* and in order to get back the orignal signal, we shift the measured signal Fourier transform as *G(f-f_0)*, which is done in continuous time by multiplication of *g(t)* with *exp(-j 2 pi f_0 t)*.
<img src="phaseramp.png" width="900"/>
The signal streaming to FIR_Filter is a discrete time signal, about 190k samples per one block of about 1 second duration (see the figure). The sampling frequency is *f_s* = clock frequency/1024.
The delays at the start of the block of data and after the end of the block are *tau0* and *tau1*, respectively.
Let *t* denote time within the block and *T* be the duration of the block, so *t* in *[0,T]*.
The linear delay for the sample at time *t* is *tau=tau1 (t/T) + tau0(1-t/T)*. The corresponding discrete time exponential is *exp(-j 2 pi f tau)*. Discarding the constant terms (that do not vary with time), the exponential becomes *exp(-j 2 pi f (tau1-tau0)/T t)*. The subband frequency is *f*.
As seen in the figure, each block is divided to many FFT blocks, so within each block, we use the same delay gradient *(tau0-tau0)/T* to calculate the phase ramp. In other words, the constant part of the delay ramp is not used as it will only increase decorelation, not affecting the needed shift in frequency.
To summarize, the exact term used at channel *chan* is *j 2 pi (f/f_s) (tau1-tau0)/NR_FFT_BLOCKS chan/NR_CHANNELS*.
Due to this correction, the shift in channels is *(tau1-tau0)/NR_FFT_BLOCKS (f/f_s)*. The corresponding shift in frequency is *(tau1-tau0)/NR_FFT_BLOCKS (f/NR_CHANNELS)*.
The Doppler correction is not affected by the rotation of the dipoles, so both polarizations get the same correction (unless the delays for each polarization is different).
The bandpass correction is modified by linear interpolation of the bandpass correction weights with the appropriate channel shift.
\ No newline at end of file
RTCP/Cobalt/GPUProc/doc/doppler-correction/phaseramp.png

29.1 KiB

This document describes the inner workings of the CUDA kernel used in quantizing the 32 bit input data into 8 bit data.
The Parset keys used in quantization of beamformed data are
For coherent Stokes:
* `Cobalt.BeamFormer.CoherentStokes.quantize=false`
* `Cobalt.BeamFormer.CoherentStokes.quantizeBits=8`
* `Cobalt.BeamFormer.CoherentStokes.quantizeScaleMax=5`
* `Cobalt.BeamFormer.CoherentStokes.quantizeScaleMin=-5`
* `Cobalt.BeamFormer.CoherentStokes.quantizeIpositive=false`
For incoherent Stokes:
* `Cobalt.BeamFormer.IncoherentStokes.quantize=false`
* `Cobalt.BeamFormer.IncoherentStokes.quantizeBits=8`
* `Cobalt.BeamFormer.IncoherentStokes.quantizeScaleMax=5`
* `Cobalt.BeamFormer.IncoherentStokes.quantizeScaleMin=-5`
* `Cobalt.BeamFormer.IncoherentStokes.quantizeIpositive=false`
The values for each key shown above are the default values (if that particular key is not defined in the parset). The description of the keys:
* `.quantize=true|false`: If true, the output will be quantized (instead of using 32 bit float as output datatype, a reduced number of bits will be used).
* `.quantizeBits=8`: Currently, 8 bits will be used to store each quantized data point. This implies the output data type will be signed char (int8_t) or unsigned char (uint8_t). In addtion, scale and offset of each data block will also be produced as 32 bit float values.
* `.quantizeScaleMax` and .`quantizeScaleMin`: These two keys will be used to cut off extreme data points above or below a threshold, prior to quantization.
* `.quantizeIpositive=true|false`: If `quantizeScaleMin` is negative, the usable range for Stokes I will also include some values below zero. However, by definition, Stokes I is always positive. By setting this key to true, we can override this to only consider the positive range of Stokes I for quantization. In this way, the available number of bits in the quantizer are not wasted by representing values that do not exist in the orignal data.
Let us call the values defined by `quantizeScaleMax` and `quantizeScaleMin` as `Smax` and `Smin`, respectively.
<img src="quantization.png" alt="How quantization works" width="700"/>
We can explain the workings of the quantization kernel by looking at the probability density function (PDF) from the input 32 bit data to output 8 bit data as shown in the above figure.
* The input PDF will have data in the range -inf...inf with one exception, for Stokes I, it will be 0...inf. (The notation inf represents infinity). This is shown on the top plot.
* Using the user defined *Smax* and *Smin*. the range of the data to be quantized is selected. In most cases it is the range between *Smin*&times;&sigma; and *Smax*&times;&sigma;. (&sigma; is the standard deviation). The user can specify the best values for *Smax* and *Smin* depending on how the input data are distributed (i.e., by looking at the PDF). The input data that fall outside this range are cut-off as show in in the middle plot. In addition the mean (&mu;) of the original data will be subtracted in the cases of Stokes Q,U,V.
* Finally, the original range is mapped to values in the range 0 to 255 (for unsigned data, i.e., Stokes I) or in the range -128 to 127 (for signed data) as shown in the bottom plot. The output PDF will be the area-sampled version of the input PDF (after cutoff) as shown in this plot.
* A special case is quantization of Stokes I, because the original data have values in 0...inf. If the user-specified *Smin* value is negative, some useful number of quantized levels will be used to represent negative data that does not exist. Therefore, in this case, the user supplied *Smin* value will be over-ridden to use 0 instead. This behaviour is enabled by setting `.quantizeIpositive=true`.
In order to (approximately) recover the original data from the quantized data, the scale and offset that are used to transform the data to the ranges 0 to 255 or -128 to 127 are also produced as output. The scale and offset are determined per each block of data (duration approximately 1 sec) and per each channel and polarization. The number of data samples per each block is dependent on the integration factor and the sampling clock frequency.
RTCP/Cobalt/GPUProc/doc/quantization/quantization.png

37.2 KiB

......@@ -45,6 +45,8 @@
* - @c DELAY_COMPENSATION: defined or not
* - @c BANDPASS_CORRECTION: defined or not
* - @c DO_TRANSPOSE: defined or not
* - @c DOPPLER_CORRECTION: if defined, DELAY_COMPENSATION and CLOCK_MHZ also must be defined
* - @c CLOCK_MHZ: clock frequency in MHz, normally 200 or 160
*/
#include "gpu_math.cuh"
......@@ -62,6 +64,15 @@
# undef BANDPASS_CORRECTION
#endif
#if defined DOPPLER_CORRECTION
#ifndef CLOCK_MHZ
#error DOPPLER_CORRECTION=1 but CLOCK_MHZ not defined
#endif
#ifndef DELAY_COMPENSATION
#error DOPPLER_CORRECTION=1 but DELAY_COMPENSATION not enabled
#endif
#endif
#if defined DO_TRANSPOSE
typedef fcomplex(*OutputDataType)[NR_STATIONS][NR_CHANNELS][NR_SAMPLES_PER_CHANNEL][NR_POLARIZATIONS];
#else
......@@ -128,7 +139,7 @@ extern "C" {
const fcomplex * filteredDataPtr,
const unsigned * delayIndices,
double subbandFrequency,
unsigned beam,
unsigned beam, // =nrSAPS
const double * delaysAtBeginPtr,
const double * delaysAfterEndPtr,
const double * phase0sPtr,
......@@ -139,7 +150,9 @@ extern "C" {
/* The z dimension is NR_STATIONS wide. */
const unsigned station = blockIdx.z * blockDim.z + threadIdx.z;
#if defined DELAY_COMPENSATION
const unsigned delayIdx = delayIndices[station];
#endif
/*
* channel: will cover all channels
......@@ -173,10 +186,12 @@ extern "C" {
#endif
#if defined BANDPASS_CORRECTION
#ifndef DOPPLER_CORRECTION
BandPassFactorsType bandPassFactors = (BandPassFactorsType)bandPassFactorsPtr;
float weight((*bandPassFactors)[channel]);
#endif
#endif
#if defined DELAY_COMPENSATION
DelaysType delaysAtBegin = (DelaysType)delaysAtBeginPtr;
......@@ -222,9 +237,38 @@ extern "C" {
// Calculate the angles to rotate for for the first and (beyond the) last sample.
//
// We need to undo the delay, so we rotate BACK, resulting in a negative constant factor.
#if defined DOPPLER_CORRECTION
const double2 freqOffset=(( delayAfterEnd - delayAtBegin ))*((subbandFrequency / NR_CHANNELS)/(NR_SAMPLES_PER_CHANNEL)); //divide this with (CLOCK_MHZ*1e6/1024.0)/NR_CHANNELS to get channel offset
// Since Doppler correction has already been applied at subbandFrequency,
// we shift frequencies
const double2 phiAtBegin = make_double2(-2.0 * M_PI * (frequency+freqOffset.x) * delayAtBegin.x - ((*phase0s)[delayIdx]).x,
-2.0 * M_PI * (frequency+freqOffset.y) * delayAtBegin.y - ((*phase0s)[delayIdx]).y);
const double2 phiAfterEnd = make_double2(-2.0 * M_PI * (frequency+freqOffset.x) * delayAfterEnd.x - ((*phase0s)[delayIdx]).x,
-2.0 * M_PI * (frequency+freqOffset.y) * delayAfterEnd.y - ((*phase0s)[delayIdx]).y);
#if defined BANDPASS_CORRECTION
BandPassFactorsType bandPassFactors = (BandPassFactorsType)bandPassFactorsPtr;
// positive offset means moving to right ->->
const double2 chanOffset=freqOffset*(NR_CHANNELS/(CLOCK_MHZ*1e6/1024.0)); //mult this with (CLOCK_MHZ*1e6/1024.0)/NR_CHANNELS to get freq
const float2 chanShifted={chanOffset.x+channel,chanOffset.y+channel};
unsigned chanlow[2]={__float2uint_rd(chanShifted.x),__float2uint_rd(chanShifted.y)};
// check and adjust to valid range
chanlow[0]=(chanlow[0]>NR_CHANNELS-1?NR_CHANNELS-1:chanlow[0]);
chanlow[1]=(chanlow[1]>NR_CHANNELS-1?NR_CHANNELS-1:chanlow[1]);
const unsigned chanhigh[2]={(chanlow[0]<NR_CHANNELS-1?chanlow[0]+1:chanlow[0]),
(chanlow[1]<NR_CHANNELS-1?chanlow[1]+1:chanlow[1])};
const float2 w1={chanShifted.x-chanlow[0],chanShifted.y-chanlow[1]};
float2 weight=make_float2((*bandPassFactors)[chanlow[0]]*(1.0f-w1.x)+(*bandPassFactors)[chanhigh[0]]*w1.x,
(*bandPassFactors)[chanlow[1]]*(1.0f-w1.y)+(*bandPassFactors)[chanhigh[1]]*w1.y);
#endif
#else
const double2 phiAtBegin = -2.0 * M_PI * frequency * delayAtBegin - (*phase0s)[delayIdx];
const double2 phiAfterEnd = -2.0 * M_PI * frequency * delayAfterEnd - (*phase0s)[delayIdx];
#endif
#endif // DOPPLER_CORRECTION
#endif // DELAY_COMPENSATION
for (unsigned time = timeStart; time < NR_SAMPLES_PER_CHANNEL; time += timeInc)
{
......@@ -250,10 +294,17 @@ extern "C" {
#endif
#if defined BANDPASS_CORRECTION
#if defined DOPPLER_CORRECTION
sampleX.x *= weight.x;
sampleX.y *= weight.x;
sampleY.x *= weight.y;
sampleY.y *= weight.y;
#else
sampleX.x *= weight;
sampleX.y *= weight;
sampleY.x *= weight;
sampleY.y *= weight;
#endif
#endif
// Support all variants of NR_CHANNELS and DO_TRANSPOSE for testing etc.
......
......@@ -71,8 +71,14 @@ typedef float SampleType;
typedef signed char (*SampledDataType)[NR_STABS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][NR_POLARIZATIONS];
#define SAMPLE(time) extractRI((*sampledData)[station][time][channel][pol], ri)
# else
#ifndef DOPPLER_CORRECTION
typedef SampleType (*SampledDataType)[NR_STABS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][NR_POLARIZATIONS * COMPLEX];
#define SAMPLE(time) (*sampledData)[station][time][channel][pol_ri]
#else //DOPPLER_CORRECTION
typedef SampleType (*SampledDataType)[NR_STABS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][NR_POLARIZATIONS][COMPLEX];
#define REAL(time) convertIntToFloat((*sampledData)[station][time][channel][pol][0])
#define IMAG(time) convertIntToFloat((*sampledData)[station][time][channel][pol][1])
#endif //DOPPLER_CORRECTION
#endif
#else
......@@ -88,10 +94,38 @@ inline __device__ float convertIntToFloat(float x)
}
#endif
// subbandIdx=1 is assumed below
#ifdef DOPPLER_CORRECTION
typedef float (*HistoryDataType)[1][NR_STABS][NR_TAPS - 1][NR_CHANNELS][NR_POLARIZATIONS * COMPLEX];
#else
typedef SampleType (*HistoryDataType)[1][NR_STABS][NR_TAPS - 1][NR_CHANNELS][NR_POLARIZATIONS * COMPLEX];
#endif
typedef float (*FilteredDataType)[NR_STABS][NR_POLARIZATIONS][NR_SAMPLES_PER_CHANNEL][NR_CHANNELS][COMPLEX];
typedef const float (*WeightsType)[NR_CHANNELS][NR_TAPS];
#ifdef DOPPLER_CORRECTION
// Check if CLOCK_MHZ is also defined
#ifndef CLOCK_MHZ
#error DOPPLER_CORRECTION=1 but CLOCK_MHZ not defined
#endif
// this is faster than doing "pol ? sin(phi) : cos(phi)"
// because that statement forces CUDA to still compute both
// as GPUs always compute both branches.
inline __device__ float2 sincos_d2f_select(double phi, int ri)
{
double r[2];
sincos(phi, &r[1], &r[0]);
float c=__double2float_rz(r[0]);
float s=__double2float_rz(r[1]);
// return: ri==0 (cos,-sin) ri==1 (sin,cos)
return make_float2((ri?s:c),(ri?c:-s));
}
typedef const double(*DelaysType)[1][NR_STABS][NR_POLARIZATIONS]; // 2 Polarizations; in seconds
typedef const double(*Phase0sType)[NR_STABS][NR_POLARIZATIONS]; // 2 Polarizations; in radians
#endif /* DOPPLER_CORRECTION */
/*!
* Applies the Finite Input Response filter defined by the weightsPtr array
......@@ -122,7 +156,8 @@ typedef const float (*WeightsType)[NR_CHANNELS][NR_TAPS];
* COMPLEX | 2 | size of complex in number of floats/doubles
* INPUT_IS_STATIONDATA | defined or not | if true, input is intX[stabs][samples][pol]
* | | if false, input is float[stabs][pol][samples]
*
* DOPPLER_CORRECTION | defined or not | If true, apply subband-based delay compensation
* CLOCK_MHZ | >0 clock Freq in MHz | Must be defined with DOPPLER_CORRECTION
* Execution configuration: (TODO: enforce using __attribute__ reqd_work_group_size)
* - Work dim == 2 (can be 1 iff NR_STABS == 1)
* + Inner dim: the channel, pol, real/imag the thread processes
......@@ -137,7 +172,10 @@ __global__ void FIR_filter( void *filteredDataPtr,
const void *sampledDataPtr,
const void *weightsPtr,
void *historyDataPtr,
unsigned subbandIdx)
const double * delaysAtBeginPtr,
const double * delaysAfterEndPtr,
unsigned subbandIdx,
double subbandFrequency)
{
SampledDataType sampledData = (SampledDataType) sampledDataPtr;
FilteredDataType filteredData = (FilteredDataType) filteredDataPtr;
......@@ -159,6 +197,38 @@ __global__ void FIR_filter( void *filteredDataPtr,
#endif
unsigned station = blockIdx.y;
#ifdef DOPPLER_CORRECTION
DelaysType delaysAtBegin = (DelaysType)delaysAtBeginPtr;
DelaysType delaysAfterEnd = (DelaysType)delaysAfterEndPtr;
const double delayAtBegin = (*delaysAtBegin)[0][station][pol];
const double delayAfterEnd = (*delaysAfterEnd)[0][station][pol];
// Calculate the angles to rotate for for the first and (beyond the) last sample.
//
// We need to undo the delay, so we rotate BACK, resulting in a negative constant factor.
// Note we use the sample frequency=CLOCK_MHZ/1024 (MHz) to normalize the frequency
// Let tau_0: delay at begin, tau_1: delay after end
// t : time within a block, T: total time (duration) of block,
// So, t in [0,T] and t/T=blockoffset in [0,1]
// then delay at t = tau_1 (t/T) + tau_0 (1 - t/T)
// exponent at frqeuency f = -j 2 pi (f tau)
// simplifying, exponent = -j 2 pi f tau_0 - j 2 pi f (tau_1 -tau_0) (t/T)
// We only need the term that changes with t, so discard the rest
// and only keep : 2 pi f (tau_1 - tau_0) (t/T) for the rest of calculations
// also replace f with f/f_s where f_s is sample frequency = clock/1024
// phi = phiGradient x blockOffset
// Offset of this sample between begin and end. = t/T fraction, within one FFT block of NR_CHANNELS
const double phi = 2.0 * M_PI * (subbandFrequency / (CLOCK_MHZ*1e6/1024.0) )*( delayAfterEnd - delayAtBegin )/NR_SAMPLES_PER_CHANNEL * double(channel)/NR_CHANNELS;
//
// Use double precision here, when phi~=0, error in cos() is minimum
// but error in sin() is highest, and will affect most baselines (whose Doppler effect ~=0)
// Note: both X and Y polarizations will have same correction
// so real=cos(phi), imag=sin(phi) correction factor
// phi=phiGradient * blockOffset
const float2 FACTOR=sincos_d2f_select(phi, ri);
#endif
//# const float16 weights = (*weightsData)[channel];
const float weights_s0 = (*weightsData)[channel][0];
const float weights_s1 = (*weightsData)[channel][1];
......@@ -183,6 +253,23 @@ __global__ void FIR_filter( void *filteredDataPtr,
delayLine_s8, delayLine_s9, delayLine_sA, delayLine_sB,
delayLine_sC, delayLine_sD, delayLine_sE, delayLine_sF;
#if defined DOPPLER_CORRECTION
delayLine_s0 = ((*historyData)[subbandIdx][station][0][channel][pol_ri]);
delayLine_s1 = ((*historyData)[subbandIdx][station][1][channel][pol_ri]);
delayLine_s2 = ((*historyData)[subbandIdx][station][2][channel][pol_ri]);
delayLine_s3 = ((*historyData)[subbandIdx][station][3][channel][pol_ri]);
delayLine_s4 = ((*historyData)[subbandIdx][station][4][channel][pol_ri]);
delayLine_s5 = ((*historyData)[subbandIdx][station][5][channel][pol_ri]);
delayLine_s6 = ((*historyData)[subbandIdx][station][6][channel][pol_ri]);
delayLine_s7 = ((*historyData)[subbandIdx][station][7][channel][pol_ri]);
delayLine_s8 = ((*historyData)[subbandIdx][station][8][channel][pol_ri]);
delayLine_s9 = ((*historyData)[subbandIdx][station][9][channel][pol_ri]);
delayLine_sA = ((*historyData)[subbandIdx][station][10][channel][pol_ri]);
delayLine_sB = ((*historyData)[subbandIdx][station][11][channel][pol_ri]);
delayLine_sC = ((*historyData)[subbandIdx][station][12][channel][pol_ri]);
delayLine_sD = ((*historyData)[subbandIdx][station][13][channel][pol_ri]);
delayLine_sE = ((*historyData)[subbandIdx][station][14][channel][pol_ri]);
#else
delayLine_s0 = convertIntToFloat((*historyData)[subbandIdx][station][0][channel][pol_ri]);
delayLine_s1 = convertIntToFloat((*historyData)[subbandIdx][station][1][channel][pol_ri]);
delayLine_s2 = convertIntToFloat((*historyData)[subbandIdx][station][2][channel][pol_ri]);
......@@ -198,6 +285,7 @@ __global__ void FIR_filter( void *filteredDataPtr,
delayLine_sC = convertIntToFloat((*historyData)[subbandIdx][station][12][channel][pol_ri]);
delayLine_sD = convertIntToFloat((*historyData)[subbandIdx][station][13][channel][pol_ri]);
delayLine_sE = convertIntToFloat((*historyData)[subbandIdx][station][14][channel][pol_ri]);
#endif
float sum_s0, sum_s1, sum_s2, sum_s3,
sum_s4, sum_s5, sum_s6, sum_s7,
......@@ -206,9 +294,21 @@ __global__ void FIR_filter( void *filteredDataPtr,
for (unsigned time = 0; time < NR_SAMPLES_PER_CHANNEL; time += NR_TAPS)
{
#if defined DOPPLER_CORRECTION
//(X,Y): sample real,imag parts, output a X + b Y
// FACTOR (cos,sin) = (c,s)
// ri=0 : a=cos, b=-sin ; ri=1: a=sin, b=cos
delayLine_sF = REAL(time + 0)*FACTOR.x+IMAG(time + 0)*FACTOR.y;
#else
delayLine_sF = convertIntToFloat(SAMPLE(time + 0));
#endif
sum_s0 = weights_sF * delayLine_s0;
#if defined DOPPLER_CORRECTION
delayLine_s0 = REAL(time + 1)*FACTOR.x+IMAG(time + 1)*FACTOR.y;
#else
delayLine_s0 = convertIntToFloat(SAMPLE(time + 1));
#endif
sum_s0 += weights_sE * delayLine_s1;
sum_s0 += weights_sD * delayLine_s2;
sum_s0 += weights_sC * delayLine_s3;
......@@ -227,7 +327,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 0][channel][ri] = sum_s0;
sum_s1 = weights_sF * delayLine_s1;
#if defined DOPPLER_CORRECTION
delayLine_s1 = REAL(time + 2)*FACTOR.x+IMAG(time + 2)*FACTOR.y;
#else
delayLine_s1 = convertIntToFloat(SAMPLE(time + 2));
#endif
sum_s1 += weights_sE * delayLine_s2;
sum_s1 += weights_sD * delayLine_s3;
sum_s1 += weights_sC * delayLine_s4;
......@@ -246,7 +350,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 1][channel][ri] = sum_s1;
sum_s2 = weights_sF * delayLine_s2;
#if defined DOPPLER_CORRECTION
delayLine_s2 = REAL(time + 3)*FACTOR.x+IMAG(time + 3)*FACTOR.y;
#else
delayLine_s2 = convertIntToFloat(SAMPLE(time + 3));
#endif
sum_s2 += weights_sE * delayLine_s3;
sum_s2 += weights_sD * delayLine_s4;
sum_s2 += weights_sC * delayLine_s5;
......@@ -265,7 +373,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 2][channel][ri] = sum_s2;
sum_s3 = weights_sF * delayLine_s3;
#if defined DOPPLER_CORRECTION
delayLine_s3 = REAL(time + 4)*FACTOR.x+IMAG(time + 4)*FACTOR.y;
#else
delayLine_s3 = convertIntToFloat(SAMPLE(time + 4));
#endif
sum_s3 += weights_sE * delayLine_s4;
sum_s3 += weights_sD * delayLine_s5;
sum_s3 += weights_sC * delayLine_s6;
......@@ -284,7 +396,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 3][channel][ri] = sum_s3;
sum_s4 = weights_sF * delayLine_s4;
#if defined DOPPLER_CORRECTION
delayLine_s4 = REAL(time + 5)*FACTOR.x+IMAG(time + 5)*FACTOR.y;
#else
delayLine_s4 = convertIntToFloat(SAMPLE(time + 5));
#endif
sum_s4 += weights_sE * delayLine_s5;
sum_s4 += weights_sD * delayLine_s6;
sum_s4 += weights_sC * delayLine_s7;
......@@ -303,7 +419,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 4][channel][ri] = sum_s4;
sum_s5 = weights_sF * delayLine_s5;
#if defined DOPPLER_CORRECTION
delayLine_s5 = REAL(time + 6)*FACTOR.x+IMAG(time + 6)*FACTOR.y;
#else
delayLine_s5 = convertIntToFloat(SAMPLE(time + 6));
#endif
sum_s5 += weights_sE * delayLine_s6;
sum_s5 += weights_sD * delayLine_s7;
sum_s5 += weights_sC * delayLine_s8;
......@@ -322,7 +442,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 5][channel][ri] = sum_s5;
sum_s6 = weights_sF * delayLine_s6;
#if defined DOPPLER_CORRECTION
delayLine_s6 = REAL(time + 7)*FACTOR.x+IMAG(time + 7)*FACTOR.y;
#else
delayLine_s6 = convertIntToFloat(SAMPLE(time + 7));
#endif
sum_s6 += weights_sE * delayLine_s7;
sum_s6 += weights_sD * delayLine_s8;
sum_s6 += weights_sC * delayLine_s9;
......@@ -341,7 +465,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 6][channel][ri] = sum_s6;
sum_s7 = weights_sF * delayLine_s7;
#if defined DOPPLER_CORRECTION
delayLine_s7 = REAL(time + 8)*FACTOR.x+IMAG(time + 8)*FACTOR.y;
#else
delayLine_s7 = convertIntToFloat(SAMPLE(time + 8));
#endif
sum_s7 += weights_sE * delayLine_s8;
sum_s7 += weights_sD * delayLine_s9;
sum_s7 += weights_sC * delayLine_sA;
......@@ -360,7 +488,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 7][channel][ri] = sum_s7;
sum_s8 = weights_sF * delayLine_s8;
#if defined DOPPLER_CORRECTION
delayLine_s8 = REAL(time + 9)*FACTOR.x+IMAG(time + 9)*FACTOR.y;
#else
delayLine_s8 = convertIntToFloat(SAMPLE(time + 9));
#endif
sum_s8 += weights_sE * delayLine_s9;
sum_s8 += weights_sD * delayLine_sA;
sum_s8 += weights_sC * delayLine_sB;
......@@ -379,7 +511,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 8][channel][ri] = sum_s8;
sum_s9 = weights_sF * delayLine_s9;
#if defined DOPPLER_CORRECTION
delayLine_s9 = REAL(time + 10)*FACTOR.x+IMAG(time + 10)*FACTOR.y;
#else
delayLine_s9 = convertIntToFloat(SAMPLE(time + 10));
#endif
sum_s9 += weights_sE * delayLine_sA;
sum_s9 += weights_sD * delayLine_sB;
sum_s9 += weights_sC * delayLine_sC;
......@@ -398,7 +534,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 9][channel][ri] = sum_s9;
sum_sA = weights_sF * delayLine_sA;
#if defined DOPPLER_CORRECTION
delayLine_sA = REAL(time + 11)*FACTOR.x+IMAG(time + 11)*FACTOR.y;
#else
delayLine_sA = convertIntToFloat(SAMPLE(time + 11));
#endif
sum_sA += weights_sE * delayLine_sB;
sum_sA += weights_sD * delayLine_sC;
sum_sA += weights_sC * delayLine_sD;
......@@ -417,7 +557,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 10][channel][ri] = sum_sA;
sum_sB = weights_sF * delayLine_sB;
#if defined DOPPLER_CORRECTION
delayLine_sB = REAL(time + 12)*FACTOR.x+IMAG(time + 12)*FACTOR.y;
#else
delayLine_sB = convertIntToFloat(SAMPLE(time + 12));
#endif
sum_sB += weights_sE * delayLine_sC;
sum_sB += weights_sD * delayLine_sD;
sum_sB += weights_sC * delayLine_sE;
......@@ -436,7 +580,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 11][channel][ri] = sum_sB;
sum_sC = weights_sF * delayLine_sC;
#if defined DOPPLER_CORRECTION
delayLine_sC = REAL(time + 13)*FACTOR.x+IMAG(time + 13)*FACTOR.y;
#else
delayLine_sC = convertIntToFloat(SAMPLE(time + 13));
#endif
sum_sC += weights_sE * delayLine_sD;
sum_sC += weights_sD * delayLine_sE;
sum_sC += weights_sC * delayLine_sF;
......@@ -455,7 +603,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 12][channel][ri] = sum_sC;
sum_sD = weights_sF * delayLine_sD;
#if defined DOPPLER_CORRECTION
delayLine_sD = REAL(time + 14)*FACTOR.x+IMAG(time + 14)*FACTOR.y;
#else
delayLine_sD = convertIntToFloat(SAMPLE(time + 14));
#endif
sum_sD += weights_sE * delayLine_sE;
sum_sD += weights_sD * delayLine_sF;
sum_sD += weights_sC * delayLine_s0;
......@@ -474,7 +626,11 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 13][channel][ri] = sum_sD;
sum_sE = weights_sF * delayLine_sE;
#if defined DOPPLER_CORRECTION
delayLine_sE = REAL(time + 15)*FACTOR.x+IMAG(time + 15)*FACTOR.y;
#else
delayLine_sE = convertIntToFloat(SAMPLE(time + 15));
#endif
sum_sE += weights_sE * delayLine_sF;
sum_sE += weights_sD * delayLine_s0;
sum_sE += weights_sC * delayLine_s1;
......@@ -511,10 +667,16 @@ __global__ void FIR_filter( void *filteredDataPtr,
(*filteredData)[station][pol][time + 15][channel][ri] = sum_sF;
}
for (unsigned time = 0; time < NR_TAPS - 1; time++)
for (unsigned tap= 0; tap < NR_TAPS - 1; tap++)
{
(*historyData)[subbandIdx][station][time][channel][pol_ri] =
SAMPLE(NR_SAMPLES_PER_CHANNEL - (NR_TAPS - 1) + time);
#if defined DOPPLER_CORRECTION
const unsigned time = NR_SAMPLES_PER_CHANNEL - (NR_TAPS - 1) + tap;
(*historyData)[subbandIdx][station][tap][channel][pol_ri] = // subbandIdx=1 assumed here
REAL(time)*FACTOR.x+IMAG(time)*FACTOR.y;
#else
(*historyData)[subbandIdx][station][tap][channel][pol_ri] = // subbandIdx=1 assumed here
SAMPLE(NR_SAMPLES_PER_CHANNEL - (NR_TAPS - 1) + tap);
#endif
}
}
}
......@@ -34,7 +34,8 @@ namespace LOFAR
nrBitsPerSample(obsSettings.nrBitsPerSample),
blockSize(obsSettings.blockSize),
subbandWidth(obsSettings.subbandWidth()),
subbands(obsSettings.subbands)
subbands(obsSettings.subbands),
clockMHz(obsSettings.clockMHz)
{
ASSERT(obsSettings.antennaFieldNames.size() == obsSettings.antennaFields.size());
}
......@@ -69,4 +70,4 @@ namespace LOFAR
}
} // end namespace Cobalt
} // end namespace LOFAR
\ No newline at end of file
} // end namespace LOFAR
......@@ -54,6 +54,7 @@ namespace LOFAR
unsigned blockSize;
double subbandWidth;
std::vector<ObservationSettings::Subband> subbands;
unsigned clockMHz;
};
Observation observation;
......@@ -120,4 +121,4 @@ namespace LOFAR
} // end namespace Cobalt
} // end namespace LOFAR
#endif
\ No newline at end of file
#endif
......@@ -51,12 +51,14 @@ namespace LOFAR
unsigned nrBitsPerSample_,
unsigned nrChannels_,
unsigned nrSamplesPerChannel_,
unsigned clockMHz_,
double subbandBandwidth_,
unsigned nrSAPs_,
bool correlator_,
bool delayCompensation_,
bool correctBandPass_,
bool transpose_,
bool dopplerCorrection_,
bool dumpBuffers_,
std::string dumpFilePattern_) :
Kernel::Parameters(correlator_ ? "delayAndBandPass" : "delayCompensation"),
......@@ -66,11 +68,13 @@ namespace LOFAR
nrBitsPerSample(nrBitsPerSample_),
nrChannels(nrChannels_),
nrSamplesPerChannel(nrSamplesPerChannel_),
clockMHz(clockMHz_),
subbandBandwidth(subbandBandwidth_),
nrSAPs(nrSAPs_),
delayCompensation(delayCompensation_),
correctBandPass(correctBandPass_),
transpose(transpose_)
transpose(transpose_),
dopplerCorrection(dopplerCorrection_)
{
if (correlator_) {
// Use identity mappnig for station indices
......@@ -110,7 +114,7 @@ namespace LOFAR
delayIndices.size() * sizeof delayIndices[0];
case DelayAndBandPassKernel::DELAYS:
return
(size_t) nrSAPs * nrDelays *
(size_t) nrSAPs * nrDelays *
NR_POLARIZATIONS * sizeof(double);
case DelayAndBandPassKernel::PHASE_ZEROS:
return
......@@ -225,6 +229,11 @@ namespace LOFAR
if (itsParameters.transpose)
defs["DO_TRANSPOSE"] = "1";
if (itsParameters.dopplerCorrection) {
defs["DOPPLER_CORRECTION"] = "1";
defs["CLOCK_MHZ"] = lexical_cast<string>(itsParameters.clockMHz);
}
return defs;
}
}
......
......@@ -62,12 +62,14 @@ namespace LOFAR
unsigned nrBitsPerSample,
unsigned nrChannels,
unsigned nrSamplesPerChannel,
unsigned clockMHz,
double subbandBandwidth,
unsigned nrSAPs,
bool correlator,
bool delayCompensation,
bool correctBandPass,
bool transpose,
bool dopplerCorrection,
bool dumpBuffers = false,
std::string dumpFilePattern = "");
......@@ -78,6 +80,9 @@ namespace LOFAR
unsigned nrChannels;
unsigned nrSamplesPerChannel;
// Clock freq used to calculate input samplie freq=clockMHz/1024 MHz
// for Doppler correction
unsigned clockMHz;
double subbandBandwidth;
unsigned nrSAPs;
......@@ -85,6 +90,10 @@ namespace LOFAR
bool delayCompensation;
bool correctBandPass;
bool transpose;
// if true,
// Doppler correction has ALREADY been applied in the FIR_Filter,
// so only incremental DelayCompensation and BandPass correction is done
bool dopplerCorrection;
unsigned nrSamplesPerSubband() const;
unsigned nrBytesPerComplexSample() const;
......
......@@ -47,9 +47,11 @@ namespace LOFAR
unsigned nrSTABs,
unsigned nrBitsPerSample,
bool inputIsStationData,
bool dopplerCorrection,
unsigned nrSubbands,
unsigned nrChannels,
unsigned nrSamplesPerChannel,
unsigned clockMHz,
float scaleFactor,
const std::string &name,
const bool dumpBuffers_,
......@@ -62,9 +64,12 @@ namespace LOFAR
nrSamplesPerChannel(nrSamplesPerChannel),
nrSubbands(nrSubbands),
clockMHz(clockMHz),
scaleFactor(scaleFactor),
inputIsStationData(inputIsStationData)
inputIsStationData(inputIsStationData),
dopplerCorrection(dopplerCorrection)
{
ASSERTSTR(dopplerCorrection?inputIsStationData:true,"Doppler correction only works if inputIsStationData=true");
dumpBuffers = dumpBuffers_;
dumpFilePattern = dumpFilePattern_;
}
......@@ -106,10 +111,16 @@ namespace LOFAR
sizeof(float);
case FIR_FilterKernel::HISTORY_DATA:
// History is split over 2 bytes in 4-bit mode, to avoid unnecessary packing/unpacking
// If Doppler corr. enabled, history is a float buffer
return
(size_t) nrSubbands *
nrHistorySamples() * nrSTABs *
NR_POLARIZATIONS * (nrBitsPerSample == 4 ? 2U : nrBytesPerComplexSample());
NR_POLARIZATIONS * (dopplerCorrection? sizeof(std::complex<float>)
: (nrBitsPerSample == 4 ? 2U : nrBytesPerComplexSample()));
case FIR_FilterKernel::DELAYS:
return (dopplerCorrection?
(size_t) 1 * nrSTABs * // nrSAPs=1 here
NR_POLARIZATIONS * sizeof(double) : 0);
default:
THROW(GPUProcException, "Invalid bufferType (" << bufferType << ")");
}
......@@ -124,12 +135,16 @@ namespace LOFAR
h_filterWeights(stream.getContext(), params.bufferSize(FILTER_WEIGHTS)),
d_filterWeights(stream.getContext(), params.bufferSize(FILTER_WEIGHTS)),
historySamples(stream.getContext(), params.bufferSize(HISTORY_DATA)),
historyFlags(boost::extents[params.nrSubbands][params.nrSTABs])
historyFlags(boost::extents[params.nrSubbands][params.nrSTABs]),
delaysAtBegin(stream.getContext(), params.bufferSize(DELAYS)),
delaysAfterEnd(stream.getContext(), params.bufferSize(DELAYS))
{
setArg(0, buffers.output);
setArg(1, buffers.input);
setArg(2, d_filterWeights);
setArg(3, historySamples);
setArg(4, delaysAtBegin);
setArg(5, delaysAfterEnd);
unsigned totalNrThreads = params.nrChannels * NR_POLARIZATIONS * 2;
unsigned nrPasses = ceilDiv(totalNrThreads, maxThreadsPerBlock);
......@@ -173,9 +188,10 @@ namespace LOFAR
}
void FIR_FilterKernel::enqueue(const BlockID &blockId,
unsigned subbandIdx)
unsigned subbandIdx, double subbandFrequency)
{
setArg(4, subbandIdx);
setArg(6, subbandIdx);
setArg(7, subbandFrequency);
Kernel::enqueue(blockId);
}
......@@ -222,6 +238,11 @@ namespace LOFAR
if (itsParameters.inputIsStationData)
defs["INPUT_IS_STATIONDATA"] = "1";
if (itsParameters.dopplerCorrection) {
defs["DOPPLER_CORRECTION"] = "1";
defs["CLOCK_MHZ"] = lexical_cast<string>(itsParameters.clockMHz);
}
return defs;
}
}
......
......@@ -44,7 +44,8 @@ namespace LOFAR
INPUT_DATA,
OUTPUT_DATA,
FILTER_WEIGHTS,
HISTORY_DATA
HISTORY_DATA,
DELAYS
};
// Parameters that must be passed to the constructor of the
......@@ -55,9 +56,11 @@ namespace LOFAR
unsigned nrSTABs,
unsigned nrBitsPerSample,
bool inputIsStationData,
bool dopplerCorrection,
unsigned nrSubbands,
unsigned nrChannels,
unsigned nrSamplesPerChannel,
unsigned clockMHz,
float scaleFactor,
const std::string &name = "FIR",
const bool dumpBuffers = false,
......@@ -85,6 +88,10 @@ namespace LOFAR
// The number of history samples used for each block
unsigned nrHistorySamples() const;
// Clock freq used to calculate input samplie freq=clockMHz/1024 MHz
// for Doppler correction
unsigned clockMHz;
// Additional scale factor (e.g. for FFT normalization).
// Derived differently from nrChannelsPerSubband for correlation
// and beamforming, so must be passed into this class.
......@@ -97,6 +104,10 @@ namespace LOFAR
// pipeline: float[stab][pol][sample]
bool inputIsStationData;
// if true,
// enable Doppler correction
bool dopplerCorrection;
size_t bufferSize(FIR_FilterKernel::BufferType bufferType) const;
};
......@@ -106,7 +117,8 @@ namespace LOFAR
const Parameters& param);
void enqueue(const BlockID &blockId,
unsigned subbandIdx);
unsigned subbandIdx,
double subbandFrequency=0.0); // needed for Doppler corr.
// Put the historyFlags[subbandIdx] in front of the given inputFlags,
// and update historyFlags[subbandIdx] with the flags of the last samples
......@@ -128,6 +140,10 @@ namespace LOFAR
//
// Dimensions: [nrSubbands][nrStations]
MultiDimArray<SparseSet<unsigned>, 2> historyFlags;
public:
// Delay compensation constants to be written by the caller before enqueue()
gpu::DeviceMemory delaysAtBegin, delaysAfterEnd;
};
//# -------- Template specializations for KernelFactory -------- #//
......
......@@ -88,9 +88,11 @@ namespace LOFAR
bfParameters.maxNrCoherentTABsPerSAP,
obsParameters.nrBitsPerSample,
false, // inputIsStationData
false, // dopplerCorrection
nrSubbandsPerSubbandProc,
bfParameters.coherentSettings.nrChannels,
obsParameters.blockSize / bfParameters.coherentSettings.nrChannels,
obsParameters.clockMHz,
static_cast<float>(bfParameters.coherentSettings.nrChannels),
"FIR (coherent, final)",
cobParameters.kernel.dumpFIR_FilterKernel,
......
......@@ -74,10 +74,12 @@ namespace LOFAR
? new KernelFactory<FIR_FilterKernel>(FIR_FilterKernel::Parameters(
bfParameters.preStationIndices.size(),
obsParameters.nrBitsPerSample,
false,
false, // inputIsStationData
false, // dopplerCorrection
nrSubbandsPerSubbandProc,
bfParameters.incoherentSettings.nrChannels,
obsParameters.blockSize / bfParameters.incoherentSettings.nrChannels,
obsParameters.clockMHz,
static_cast<float>(bfParameters.incoherentSettings.nrChannels),
"FIR (incoherent, final)"))
: NULL),
......
......@@ -84,12 +84,14 @@ namespace LOFAR
obsParameters.nrBitsPerSample,
preParameters.nrDelayCompensationChannels,
obsParameters.blockSize / preParameters.nrDelayCompensationChannels,
obsParameters.clockMHz, //not needed in beamformer pipeline
obsParameters.subbandWidth,
obsParameters.nrSAPs,
false, // correlator
preParameters.delayCompensationEnabled,
false , // correctBandPass
false, // correctBandPass
false, // transpose
false, // dopplerCorrection
cobParameters.kernel.dumpDelayAndBandPassKernel,
str(boost::format("L%d_SB%%03d_BL%%03d_BFPre_DelayAndBandPassKernel_%c%c%c.dat") %
obsParameters.observationID %
......
......@@ -55,10 +55,11 @@ namespace LOFAR
obsParameters.nrStations,
obsParameters.nrBitsPerSample,
true, // inputIsStationData
corParameters.dopplerCorrection, // DopplerCorrection
nrSubbandsPerSubbandProc,
corParameters.nrChannels,
obsParameters.blockSize / corParameters.nrChannels,
obsParameters.clockMHz,
// Scale to always output visibilities or stokes with the same flux scale.
// With the same bandwidth, twice the (narrower) channels _average_ (not
// sum) to the same fluxes (and same noise). Twice the channels (twice the
......@@ -97,12 +98,14 @@ namespace LOFAR
obsParameters.nrBitsPerSample,
corParameters.nrChannels,
obsParameters.blockSize / corParameters.nrChannels,
obsParameters.clockMHz,
obsParameters.subbandWidth,
obsParameters.nrSAPs,
preParameters.delayCompensationEnabled,
true, // correlator
preParameters.delayCompensationEnabled,
preParameters.bandPassCorrectionEnabled,
true, // transpose
corParameters.dopplerCorrection, // DopplerCorrection
cobParameters.kernel.dumpDelayAndBandPassKernel,
str(boost::format("L%d_SB%%03d_BL%%03d_Cor_DelayAndBandPassKernel_%c%c%c.dat") %
obsParameters.observationID %
......@@ -348,6 +351,13 @@ namespace LOFAR
{
htodStream->waitEvent(executeFinished);
if (correlatorPPF && corParameters.dopplerCorrection) { // check nrChannels>1 for Doppler corr.
htodStream->writeBuffer(firFilterKernel->delaysAtBegin,
input.delaysAtBegin, false);
htodStream->writeBuffer(firFilterKernel->delaysAfterEnd,
input.delaysAfterEnd, false);
}
if (preParameters.delayCompensationEnabled) {
htodStream->writeBuffer(delayAndBandPassKernel->delaysAtBegin,
input.delaysAtBegin, false);
......@@ -368,7 +378,8 @@ namespace LOFAR
if (correlatorPPF) {
// The subbandIdx immediate kernel arg must outlive kernel runs.
firFilterKernel->enqueue(input.blockID,
input.blockID.subbandProcSubbandIdx);
input.blockID.subbandProcSubbandIdx,
obsParameters.subbands[input.blockID.globalSubbandIdx].centralFrequency);
fftKernel->enqueue(input.blockID);
// Process flags enough to determine which data to zero
......
......@@ -67,12 +67,14 @@ int main(int argc, char *argv[])
ps.settings.nrBitsPerSample,
ps.settings.beamFormer.nrDelayCompensationChannels,
ps.settings.blockSize / ps.settings.beamFormer.nrDelayCompensationChannels,
ps.settings.clockMHz,
ps.settings.subbandWidth(),
ps.settings.SAPs.size(),
ps.settings.delayCompensation.enabled,
correlator,
false, // correctBandPass
false // transpose
false, // transpose
false // dopplerCorrection
));
gpu::DeviceMemory
......