Commit a2fea135 authored by Ronald Nijboer
bug 1153: update (rjn)

parent 0a8f3725
\section{Central Processing: Calibration}
The off-line processing of LOFAR data has to deal with a number of challenges~\cite{Noordam:04,Nijboer:07}. First of all, the data volumes are huge. Second, compared to traditional steel dishes, the phased array station beams are far more variable (in time, in frequency, as well as over the different stations), they have a high degree of instrumental polarization that varies with scan angle, and they have relatively high sidelobes. All these issues complicate the processing of the data, especially since a high dynamic range must be reached.
The third category of challenges lies in the sky itself. At the low frequencies where LOFAR observes there are very bright sources, so a high dynamic range and, hence, a high accuracy are needed to see the faint background sources. The sky will also be filled with a large number of sources, giving rise to confusion. Last, but not least, the Earth's ionosphere seriously defocuses the images.
An introduction to signal processing for radio astronomical arrays can be found in~\cite{Veen1:04,Veen2:04,Boonstra:05}. With LOFAR we enter a new regime in radio astronomical data processing. The challenges imply that for LOFAR we have to reconsider existing processing strategies and algorithms and develop new ones. Therefore, the off-line processing is still a work in progress, of which we give an overview of the current status.
\label{sec:offline}
\subsection{Processing large data volumes}
The total amount of data that is produced is determined by the total number of stations that are used in the observation. This number depends on the particular mode of observation. The correlator produces a data stream of the order of a few Gbyte/s, which yields of the order of several tens of Tbytes of data after a typical observation of four hours. Since permanent data storage is not part of the LOFAR telescope, these data volumes have to be processed in near real time. Fortunately, the non-imaging LOFAR applications are not as data intensive, so that for every hour of observation approximately four hours are available to further process the data off-line. With this in mind, data I/O becomes an issue. Obviously the data needs to be processed in a parallelized and distributed way, minimizing the I/O that is needed~\cite{Loose:08,Diepen:08}.
Data can be distributed over a large number of processing nodes in a number of ways. Distribution over baselines is not very suitable for imaging, where data from all baselines must be combined to produce an image. Distribution over time has the disadvantage that up to several Gbyte/s have to be sent to a single processing node. Distribution over frequency, therefore, seems to be the best choice. This scheme matches the design of the correlator. It is also a convenient scheme for the imager, where images are created per (combined) frequency channel.
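As a minimal illustration of this scheme (a sketch only, not the actual LOFAR pipeline code, and with invented numbers), subbands can simply be mapped to compute nodes round-robin:

\begin{verbatim}
# Minimal sketch (not LOFAR pipeline code): map subbands to compute
# nodes round-robin, so each node images its own frequency channels.
def assign_subbands(n_subbands, n_nodes):
    return {sb: sb % n_nodes for sb in range(n_subbands)}

# e.g. 248 subbands over 32 nodes: node 0 gets subbands 0, 32, 64, ...
mapping = assign_subbands(248, 32)
\end{verbatim}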
A consequence of distribution over frequency is that in the self-calibration step solver equations from different compute nodes may need to be combined, allowing estimation of parameters from data that is distributed over several nodes. The combining of solver equations, however, involves far less data I/O than the underlying observed visibility data.
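The following sketch indicates why this is cheap for a linearized least-squares solver (function names and shapes are our own illustration, not the LOFAR solver interface): each node condenses its visibilities into normal equations whose size scales with $n_{\rm par}^{2}$, and only these are exchanged and summed.

\begin{verbatim}
import numpy as np

# Illustrative sketch: per-node normal equations for a linearized
# least-squares problem; their size depends on the number of
# parameters, not on the number of visibilities held by the node.
def local_normal_equations(J, r):
    # J: (n_vis, n_par) Jacobian, r: (n_vis,) residuals on one node
    return J.conj().T @ J, J.conj().T @ r

def combine_and_solve(per_node):
    # sum the (JtJ, Jtr) pairs sent by all nodes, then solve once
    JtJ = sum(eq[0] for eq in per_node)
    Jtr = sum(eq[1] for eq in per_node)
    return np.linalg.solve(JtJ, Jtr)
\end{verbatim}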
Even though the processing of the data will be done on a large cluster of computers, the total amount of data can be such that we expect the quality of the final result to be processing limited. This means that for all the algorithms we have to weigh accuracy against the number of Flops needed. It also means that the LOFAR instrument can be improved by upgrading the processing cluster in the future.
\subsection{Processing steps}
LOFAR calibration is a joint estimation problem for instrumental, environmental, and source parameters. At its heart lies the ``Measurement Equation'' that is used to model the observed data~\cite{Hamaker:96}. A detailed description of all steps involved can be found in~\cite{Noordam:05}. A signal processing data model and a Cram\'er-Rao lower bound analysis are given in~\cite{Tol:07}. The latter paper also provides a good introduction to the signal processing aspects of LOFAR Self-Calibration.
\label{sec:RFI}
The current LOFAR calibration strategy consists of the following steps. The first step is the removal of bad data points, which are due to e.g. Radio Frequency Interference (RFI). After this step the contaminating contribution of a couple of very strong sources (like CasA, CygA, TauA, VirA) that enter through the station beam sidelobes needs to be removed. Since modelling the station beam sidelobes is infeasible due to the large number of parameters involved, the combined effect of the sources and the instrumental effects has to be estimated and subtracted from the data.
Once the interfering signals are removed from the data, the data is further integrated. The final resolution is determined by bandwidth and time-average smearing requirements that follow from the desired field of view (FoV) and the maximum baseline~\cite{SIRAII:99}. In the frequency direction the data may be reduced by at most one order of magnitude. In principle the data is also integrated along the time axis. Here, however, the effect of the ionosphere has to remain constant over the integration period. The maximal reduction factor determined by time-average smearing is also at most one order of magnitude.
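A sketch of the integration step itself is given below; the block-averaging factors are assumptions for illustration, not LOFAR settings (the real limits follow from the smearing analysis cited above).

\begin{verbatim}
import numpy as np

# Sketch: integrate visibilities by block-averaging in time and
# frequency; the reduction factors below are illustrative only.
def integrate(vis, t_fac=5, f_fac=5):
    # vis: complex array (n_time, n_freq); axes must divide evenly
    nt, nf = vis.shape
    blocks = vis.reshape(nt // t_fac, t_fac, nf // f_fac, f_fac)
    return blocks.mean(axis=(1, 3))
\end{verbatim}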
Next an iterative loop, dubbed the ``Major Cycle'', is entered where we first estimate instrumental, environmental, and source parameters using the visibility data, then image the data, and finally refine the estimation of the source parameters using image data. Since not all parameters are estimated jointly, the Major Cycle will be traversed a number of times in order to iteratively refine the estimates~\cite{Nijboer:07}.
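The structure of the Major Cycle can be made concrete with a small self-contained toy: a one-dimensional interferometer with a single phase gain. Every number and name below is invented for the sketch and bears no relation to the actual LOFAR solver; it only mirrors the solve--subtract--image--update loop described above.

\begin{verbatim}
import numpy as np

# Toy Major Cycle on a 1-D interferometer: solve a gain against the
# current source model, subtract the model, image the residual, and
# extend the model with the strongest residual source.
rng = np.random.default_rng(1)
u = np.arange(1.0, 65.0)                      # baseline lengths
src_l, src_f = np.array([0.0, 0.31]), np.array([100.0, 5.0])
g_true = np.exp(0.4j)                         # unknown phase gain
vis = g_true * (src_f * np.exp(-2j * np.pi * np.outer(u, src_l))).sum(1)
vis = vis + 0.1 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))

l_grid = np.linspace(-0.5, 0.5, 513)
mod_l, mod_f = [0.0], [100.0]                 # start: brightest source only
for _ in range(3):                            # the Major Cycle
    m = (np.array(mod_f) * np.exp(-2j * np.pi * np.outer(u, mod_l))).sum(1)
    g = np.vdot(m, vis) / np.vdot(m, m)       # least-squares gain estimate
    resid = vis / g - m                       # calibrated residual data
    dirty = (resid * np.exp(2j * np.pi * np.outer(l_grid, u))).mean(1).real
    peak = int(np.argmax(dirty))              # crude source finding
    if dirty[peak] > 5.0 * dirty.std():       # only keep significant peaks
        mod_l.append(l_grid[peak]); mod_f.append(dirty[peak])
\end{verbatim}

After the first pass the faint second source appears in the residual image and is added to the model, so the gain solve in the next pass improves; this is the refinement the Major Cycle is designed to deliver.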
After initial operation of the LOFAR instrument the parameters for the strongest sources will be known. From then on the strongest sources can be used in every observation to estimate ionospheric parameters and instrumental parameters, and to refine the estimate for the station beams that is available from the station calibration. The direction-dependent estimation of ionospheric parameters is the most challenging part of this estimation problem.
In~\cite{Tol:07} it is shown that the unconstrained direction-dependent calibration problem is ambiguous. However, three physical constraints that yield an unambiguous solution are presented:
%
\begin{enumerate}
\item use a calibrated subarray to calibrate the rest of the array,
\item exploit the known frequency dependence of the ionospheric effects,
\item combine multiple samples in time and frequency in a joint estimation.
\end{enumerate}
%
In the first approach the LOFAR core is calibrated first, making use of the fact that the core stations all share the same ionosphere. This is a simpler problem. Van der Tol et al.~\cite{Tol:07,Tol:05} show that in this case the remote stations can be calibrated, provided the number of independent calibration directions is smaller than the number of core stations.
In the second approach, use is made of the fact that the effect of the ionosphere has a predictable frequency dependence~\cite{Tol:05}. The number of parameters that need to be estimated may be further reduced by using suitable basis functions for the spatial dependence of the ionosphere. The use of Karhunen-Lo\`eve basis functions seems very promising in this respect~\cite{Tol2:07}.
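As an illustration of the idea (with an arbitrary covariance model standing in for the one used in the cited work), Karhunen-Lo\`eve modes are simply the eigenvectors of the assumed spatial covariance of the phase screen:

\begin{verbatim}
import numpy as np

# Sketch: Karhunen-Loeve modes from an assumed spatial covariance of
# the ionospheric phase screen; the covariance model is an invented
# stand-in (the 5/3 exponent mimics Kolmogorov-like turbulence).
x = np.linspace(0.0, 1.0, 50)                 # pierce-point coordinates
cov = np.exp(-(np.abs(x[:, None] - x[None, :]) / 0.3) ** (5.0 / 3.0))
w, v = np.linalg.eigh(cov)
kl_basis = v[:, np.argsort(w)[::-1][:5]]      # keep the 5 strongest modes
# a screen is then modelled as kl_basis @ coeffs: 5 coefficients
# instead of 50 independent phases
\end{verbatim}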
In the third approach, multiple samples in frequency and time are combined in a joint estimation, where the time and frequency dependence is modelled by e.g. polynomials. In this way the number of parameters that need to be estimated is reduced from one per individual sample to the polynomial coefficients for all samples together. Here use can be made of prior knowledge that not all parameters vary on the same time and frequency scales. In~\cite{Tol:07} it is reported, however, that this approach needs good initial estimates, since for instance the continuous phase polynomial is ambiguous up to integer multiples of $2\pi$.
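A single-track illustration of both the reduction and the pitfall (all numbers invented): fitting a low-order polynomial to an unwrapped phase track replaces 200 per-sample phases by three coefficients, but since only the wrapped phase is observed, the constant term is recovered only up to a multiple of $2\pi$, hence the need for good initial estimates.

\begin{verbatim}
import numpy as np

# Sketch: model a slowly varying phase track with a polynomial in time.
t = np.linspace(0.0, 1.0, 200)
true_phase = 2.0 + 30.0 * t - 12.0 * t ** 2   # invented smooth trend
wrapped = np.angle(np.exp(1j * true_phase))   # what the data provides
# 3 polynomial coefficients instead of 200 per-sample phase parameters;
# np.unwrap only works if phase steps between samples stay below pi,
# so a reasonable starting point is already required.
coeffs = np.polyfit(t, np.unwrap(wrapped), deg=2)
\end{verbatim}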
For LOFAR all three approaches will be used and they will be combined with the so-called ``Peeling'' approach~\cite{Noordam:04,Tol:07}.
\ldots {\bf Peeling} \ldots
The sky image is the Fourier transform of the visibility domain. Because the visibility domain is only discretely sampled, sources in the sky image are convolved with a Point Spread Function (PSF). The contribution from sources whose PSF far sidelobes are higher than the image noise level should be subtracted from the visibility data. Using the solutions to the parameter estimation problem on the visibility data, the contributions from the strongest sources are removed from the visibility data. The remaining residual visibility data is then corrected and imaged.
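In the visibility domain this subtraction takes a simple form. The sketch below assumes, for one time/frequency sample, a visibility matrix between stations and one complex gain per station toward the source; all names and shapes are our own illustration.

\begin{verbatim}
import numpy as np

# Sketch: subtract a strong source from the data using the model
# visibilities of that source, corrupted by the gains estimated for
# its direction (V - G M G^H with diagonal gain matrices).
def subtract_source(vis, model_vis, g):
    # vis, model_vis: (n_station, n_station); g: (n_station,) complex
    return vis - np.outer(g, g.conj()) * model_vis
\end{verbatim}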
One visibility sample contains the combined contribution from all sources in the sky. Since LOFAR has a large FoV, the contribution from different sources is distorted by different ionospheric and beam effects. When imaging the visibility data, however, it is only possible to correct the data for one direction in the sky. This would mean that the image would be sharp for the direction of correction and the image quality would degrade outwards. To overcome this problem LOFAR images will be made in facets, where we can correct the data for the center of each facet.
Facet imaging is a well known technique to overcome the problem of the so-called ``w-term'', which arises because the baselines are non-coplanar~\cite{SIRAII:99}. However, the non-coplanar baseline problem is better solved by the w-projection algorithm~\cite{Cornwell:05}. Therefore, we will apply the w-projection technique per facet, and the facet size will only be determined by the variability of the station beam and the ionosphere.
Since we correct the data per facet, this means multiplying the total amount of data by the number of facets. Fortunately, the facet size will be far smaller than the total FoV. This allows us to shift the data to the center of the facet and then integrate the data in both time and frequency. Hence, the total amount of data will remain more or less the same.
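The shift itself is a phase rotation of the visibilities toward the facet centre, after which the integration step sketched earlier can be applied; coordinates and shapes below are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Sketch: phase-shift visibilities to a facet centre (l0, m0), given as
# direction cosines relative to the current phase centre; u, v are the
# baseline coordinates in wavelengths for each visibility sample.
def shift_to_facet(vis, u, v, l0, m0):
    return vis * np.exp(2j * np.pi * (u * l0 + v * m0))
\end{verbatim}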
By sampling the data in each iteration of the Major Cycle, doubling the sampling density in every cycle, and using only the full resolution data in the last cycle, we effectively need no more than twice the I/O that is needed for the full resolution data. We expect that this will significantly improve the total speed of the processing.
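The factor of two follows from a geometric series: if the last of $K$ cycles reads the full data volume $N$ and every earlier cycle reads half the volume of the one that follows, the total I/O is
\[
\sum_{k=0}^{K-1} \frac{N}{2^{k}} \;=\; N\left(2 - 2^{1-K}\right) \;<\; 2N .
\]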
%The correlator produces visibility data at a resolution of 1 second and 0.62 kHz or 0.78 kHz, depending on the sampling frequency of the station processing. The frequency resolution is needed for excision of RFI signals. Strong RFI sources can in principle be suppressed at the station level by applying spatial filtering techniques~\cite{Veen1:04,Veen2:04,Boonstra:05}. However, applying these techniques at the station level would imply that all station beams will be different. That would mean that the number of station beam parameters to be estimated in the following self-calibration step would increase considerably. We foresee that this will be impractical in the initial LOFAR operations. However, in later upgrades these spatial filtering techniques may be incorporated. Initially, LOFAR processing will use traditional flagging techniques~\cite{Renting:07}.
Once the image is produced, source finding and source extraction algorithms will be used to estimate source parameters. This leads to an updated source model, and a new iteration of the Major Cycle is entered.
\section{Current state and future work}
\begin{figure*}
\centering
\includegraphics[width=0.32\textwidth]{LBA_observed.eps}
\includegraphics[width=0.32\textwidth]{LBA_calibrated.eps}
\includegraphics[width=0.32\textwidth]{LBA_peeled.eps}
\caption{All sky images from the LOFAR CS1 configuration using 48 hours and about 20 subbands of data. Observed: an image of the flagged, non-calibrated data. Calibrated: an image of the flagged, calibrated data showing CasA and CygA. Residual: an image of the flagged, calibrated data where CasA and CygA are removed from the data. Images courtesy of S.B. Yatawatta.}
\label{fig:skymap}
\end{figure*}
Figure \ref{fig:skymap} shows a series of images that were made from data using the LOFAR CS1 configuration. Sixteen microstations, each consisting of a single dipole with essentially an all-sky FoV, were used. The images are centered on the North Celestial Pole and contain 48 hours and about 20 subbands of data. First the data is flagged for RFI; an image of the flagged-only data is shown on the left (``observed'').
The following calibration is performed in two steps. In the first step, a point source model is used for both CasA and CygA, each with a flux of 20000 Jy and no polarization. An analytical beam shape is used and we solve for a single complex gain per station for the whole sky. In this way an estimate for the instrumental complex gains (e.g. due to clock drifts) and ionospheric phase differences is obtained. After correcting the data, a second step is performed where we estimate the complex gains in the directions of both CasA and CygA. In this second step no assumptions about the beam are made. For the middle image (``calibrated'') the data is corrected using the estimates for the direction of CasA. In this middle image CasA and CygA can clearly be seen as point sources.
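A hedged sketch of such a per-station complex gain solve, via alternating linear least squares (in the spirit of, but not necessarily identical to, the solver actually used):

\begin{verbatim}
import numpy as np

# Sketch: estimate per-station complex gains g such that
# V ~ diag(g) M diag(g)^H, by fixing all gains except one and solving
# the resulting linear least-squares problem, station by station.
def solve_gains(V, M, n_iter=20):
    # V, M: (n_st, n_st) observed and model visibility matrices
    n = V.shape[0]
    g = np.ones(n, dtype=complex)
    for _ in range(n_iter):
        for p in range(n):
            z = g.conj() * M[p, :]    # model row as seen by station p
            z[p] = 0.0                # exclude the autocorrelation
            g[p] = np.vdot(z, V[p, :]) / np.vdot(z, z)
    return g                          # overall phase left unconstrained
\end{verbatim}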
CasA and CygA completely dominate the background sources, since they are at least 50 times stronger than the average background source. After subtracting the contributions of CasA and CygA from the data, some hundred other sources become visible. This is shown in the right panel (``residual'').
\subsection{International stations}
Several countries have shown interest in building stations as well, in order to achieve longer baselines. In this context the first international LBA station, consisting of 96 antennas, is currently operational in Effelsberg (Germany) and will be connected to the rest of the LOFAR system. Two additional German stations will be installed in Garching and Tautenberg. Furthermore, stations are planned in Sweden, France, and the UK, and interest has been shown by Potsdam (Germany), Poland, and Italy.