Commit 8a69c9ec authored by Jan David Mol

bug 1362: paper update, references update (replaced - by --)

parent 2d477e67
@@ -14,7 +14,7 @@ PNG_SOURCES =
 STY_SOURCES =
-SVG_SOURCES = ION-processing.svg pencilbeams.svg
+SVG_SOURCES = pencilbeams.svg
 AUX_FILES = $(TEX_SOURCES:%.tex=%.aux)
 GEN_FIGURES = $(FIG_SOURCES:%.fig=%.pdf) $(JGR_SOURCES:%.jgr=%.pdf) $(SVG_SOURCES:%.svg=%.pdf)
No preview for this file type
@@ -55,17 +55,17 @@ Another novelty is the elaborate use of software to process the telescope data i
 For processing LOFAR data, we use an IBM BlueGene/P (BG/P) supercomputer. The LOFAR antennas are grouped into stations, and each station sends its data (up to 198 Gb/s for all stations) to the BG/P super computer. Inside the BG/P, the data are split and recombined using both real-time signal processing routines as well as two all-to-all exchanges. The output data streams are sufficiently reduced in size in order to be able to stream them out of the BG/P and store them on disks in our storage cluster.
-The stations can be configured to observe in several directions in parallel, but have to divide their output bandwidth among them. In this paper, we present the \emph{beam former}, an extension to the LOFAR software which allows the telescope to be aimed in tens of directions simultaneously at LOFAR's full observational bandwidth, and in hundreds of directions at reduced bandwidth. Both feats cannot be matched by any other telescope. The data streams corresponding to each observational direction, called \emph{beams}, are generated through (weighted) summations of the station inputs, which are demultiplexed using an all-to-all exchange, and routed to the storage cluster.
+The stations can be configured to observe in several directions in parallel, but have to divide their output bandwidth among them. In this paper, we present the \emph{beam former}, an extension to the LOFAR software which allows the telescope to be aimed in tens of directions simultaneously at LOFAR's full observational bandwidth, and in hundreds of directions at reduced bandwidth. Both feats cannot be matched by any other telescope. The data streams corresponding to each observational direction, called \emph{tied-array beams}, are generated through (weighted) summations of the station inputs, which are demultiplexed using an all-to-all exchange, and routed to the storage cluster.
-The primary scientific use case driving the work presented in this paper is pulsar research. A pulsar is a rapidly rotating, highly magnetised neutron star, which emits electromagnetic radiation from its poles. Similar to the behaviour of a lighthouse, the radiation is visible to us only if one of the poles points towards Earth, and subsequently appears to us as a very regular series of pulses, with a period as low as 1.4~ms~\cite{Hessels:06}. Pulsars are relatively weak radio sources, and their individual pulses often do not rise above the background noise that fills our universe. LOFAR is one of the few telescopes which operates in the frequency range (10 -- 240 MHz) in which pulsars are typically at their brightest. Our beam former also makes LOFAR the only telescope that can observe in hundreds of directions simultaneously with high sensitivity. These aspects make LOFAR an ideal instrument to discover unknown pulsars by doing a sensitive sky survey in a short amount of time, as well as an ideal instrument to study known pulsars in more detail. Astronomers can also use our beam former to focus on planets, exoplanets, the sun, and other radio objects, with unprecedented sensitivity. Furthermore, our pipeline allows fast broad-sky surveys to discover not only new pulsars but also other radio sources.
+The primary scientific use case driving the work presented in this paper is pulsar research. A pulsar is a rapidly rotating, highly magnetised neutron star, which emits electromagnetic radiation from its poles. Similar to the behaviour of a lighthouse, the radiation is visible to us only if one of the poles points towards Earth, and subsequently appears to us as a very regular series of pulses, with a period as low as 1.4~ms~\cite{Hessels:06}. Pulsars are weak radio sources, and their individual pulses often do not rise above the background noise that fills our universe.
+LOFAR is one of the few telescopes which operates in the frequency range (10 -- 240 MHz) in which pulsars are typically at their brightest. Our beam former also makes LOFAR the only telescope that can observe in hundreds of directions simultaneously with high sensitivity. These aspects make LOFAR an ideal instrument to discover unknown pulsars by doing a sensitive sky survey in a short amount of time, as well as an ideal instrument to study known pulsars in more detail. Apart from pulsar research, our beam former can be used to focus on planets, exoplanets, the sun, and other radio objects, with unprecedented sensitivity. Furthermore, our pipeline allows fast broad-sky surveys to discover not only new pulsars but also other radio sources.
-In this paper, we will show how a software solution and the use of massive parallelism allows us to achieve this feat. We provide an in-depth study on all performance aspects, real-time behaviour, and scaling characteristics. The paper is organised as follows.
+In this paper, we will show how a software solution and the use of a massively parallel machine allows us to achieve this feat. We provide an in-depth study on all performance aspects, real-time behaviour, and scaling characteristics.
 \section{Related Work}
-MWA.
+Traditional radio dishes are unsuitable for beam forming due to their narrow field-of-view. A radio dish can be extended to focus on multiple sources by deploying multiple receivers in its focal point (a \emph{focal plane array})~\cite{Staveley-Smith:96}, a solution which does not scale. The Murchison Widefield Array (MWA) uses a design similar to LOFAR (omnidirectional antennas), and plans to build a beam former, but is still under construction~\cite{Lonsdale:09}. The LOFAR beam former is thus the only beam former capable of producing hundreds of tied-array beams.
-LOFAR imaging pipeline \cite{Romein:10a}
+This paper builds upon the design of the imaging pipeline~\cite{Romein:10a}, further showing the flexibility and the power of a software telescope. Although many parameters in our imaging pipeline are platform-specific, a comparison across several hardware platforms~\cite{Nieuwpoort:09} showed that the issues faced are platform agnostic.
 \section{LOFAR}
 \label{Sec:LOFAR}
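The tied-array beams introduced in the hunk above are generated by a weighted summation of the station inputs. The Python sketch below shows one way to derive such weights for a single frequency channel. It assumes a far-field plane wave, station positions and direction unit vectors expressed in the same Cartesian frame, and station data that is already delay-compensated towards the stations' own pointing direction, so only the residual delay between beam and station pointing needs a phase correction. The helper names (`residual_delays`, `beam_weights`, `simulated_station_samples`, `form_beam`) and the phase sign convention are hypothetical, not taken from the LOFAR code base.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def residual_delays(positions, beam_dir, station_dir):
    """Geometric delay (s) per station for beam_dir, relative to the direction
    the station data was already compensated for (assumed plane-wave model)."""
    diff = [b - s for b, s in zip(beam_dir, station_dir)]
    return [dot(p, diff) / SPEED_OF_LIGHT for p in positions]


def beam_weights(positions, beam_dir, station_dir, freq_hz):
    """One complex phase weight per station, for one beam and one channel."""
    return [cmath.exp(-2j * math.pi * freq_hz * tau)
            for tau in residual_delays(positions, beam_dir, station_dir)]


def simulated_station_samples(positions, source_dir, station_dir, freq_hz):
    """Hypothetical test signal: a unit-amplitude plane wave from source_dir,
    as seen after station-level delay compensation towards station_dir."""
    return [cmath.exp(2j * math.pi * freq_hz * tau)
            for tau in residual_delays(positions, source_dir, station_dir)]


def form_beam(samples, weights):
    """One tied-array beam sample: the weighted sum over all stations."""
    return sum(w * x for w, x in zip(weights, samples))


# Usage: a source 0.01 rad away from the station pointing, observed at 150 MHz.
positions = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 250.0, 0.0), (400.0, 300.0, 0.0)]
station_dir = (0.0, 0.0, 1.0)
beam_dir = (math.sin(0.01), 0.0, math.cos(0.01))
samples = simulated_station_samples(positions, beam_dir, station_dir, 150e6)
on_target = form_beam(samples, beam_weights(positions, beam_dir, station_dir, 150e6))
unsteered = form_beam(samples, beam_weights(positions, station_dir, station_dir, 150e6))
# abs(on_target) is 4 up to rounding (all stations add coherently);
# abs(unsteered) is roughly 2 because the station phases partly cancel.
```

With one weight per station and beam, forming a beam costs one complex multiply-add per station per sample, which is the kind of operation the beam former description in the next hunk implements in assembly.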
@@ -202,7 +202,7 @@ Up to this point, processing chunks from different stations can be done independ
 The beam former creates the beams as described in Section \ref{Sec:Beamforming}. First, the different weights required for the different beams are computed, based on the station positions and the beam directions. Note that the data in the chunks are already delay compensated with respect to the source at which the stations are pointed. Any delay compensation performed by the beam former is therefore to compensate the delay differences between the desired beams and the station's source. The reason for this two-stage approach is flexibility. By already compensating for the station's source in the previous step, the resulting data can not only be fed to the beam former, but also to other pipelines, such as the imaging pipeline. Because we have a software pipeline, we can implement and connect different processing pipelines with only a small increase in complexity.
-The delays are applied to the station data through complex multiplications and additions, programmed in assembly. In order to take full advantage of the L1 cache and the available registers, data is processed in sets of 6 stations, producing 3 beams, or a subset thereof to cover the remaining stations and beams. While the exact ideal set size in which the data is to be processed depends on the architecture at hand, we have shown in previous work that similar tradeoffs exist for similar problems across different architectures~\cite{Nieuwpoort:09,BAR}.
+The delays are applied to the station data through complex multiplications and additions, programmed in assembly. In order to take full advantage of the L1 cache and the available registers, data is processed in sets of 6 stations, producing 3 beams, or a subset thereof to cover the remaining stations and beams. While the exact ideal set size in which the data is to be processed is platform specific, we have shown in previous work that similar tradeoffs exist for similar problems across different architectures~\cite{Nieuwpoort:09}.
 Because each beam is an accumulation of the data from all stations, the bandwidth of each beam is equal to the bandwidth of data from a single station, which is 6.2~Gb/s now that the samples are 32-bit floats. Once the beams are formed, they are kept as XY polarisations or transformed into the Stokes IQUV or the Stokes I parameters. In the latter case, the beams can also be integrated temporally to reduce the resulting data rate.
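The beam former hunk above describes three steps: complex multiply-adds over sets of 6 stations and 3 beams, conversion of the XY beams to Stokes parameters, and optional temporal integration. The Python sketch below mirrors that structure for one channel and one polarisation stream at a time. It only illustrates the loop order of the blocking; the cache and register benefits come from the hand-written assembly kernel, which this sketch does not reproduce. The function names, the sign convention for Stokes V, the choice to sum (rather than average) during integration, and the LOFAR defaults in `beam_rate_bps` (248 subbands of 195.3125 kHz) are assumptions, not taken from the paper.

```python
def form_beams_blocked(samples, weights, station_block=6, beam_block=3):
    """Accumulate beams[b][t] = sum_s weights[b][s] * samples[s][t], visiting
    beams in sets of `beam_block` and stations in sets of `station_block`
    (smaller sets cover the remaining beams and stations)."""
    n_beams, n_stations, n_times = len(weights), len(samples), len(samples[0])
    beams = [[0j] * n_times for _ in range(n_beams)]
    for b0 in range(0, n_beams, beam_block):
        beam_set = range(b0, min(b0 + beam_block, n_beams))
        for s0 in range(0, n_stations, station_block):
            station_set = range(s0, min(s0 + station_block, n_stations))
            # An optimised kernel would keep this set's weights and partial
            # sums in registers while streaming over the time samples.
            for t in range(n_times):
                for b in beam_set:
                    acc = beams[b][t]
                    for s in station_set:
                        acc += weights[b][s] * samples[s][t]
                    beams[b][t] = acc
    return beams


def stokes_iquv(x, y):
    """Stokes I, Q, U, V of one XY sample pair, using V = 2 Im(X conj(Y))."""
    xx = x.real * x.real + x.imag * x.imag
    yy = y.real * y.real + y.imag * y.imag
    xy = x * y.conjugate()
    return (xx + yy, xx - yy, 2.0 * xy.real, 2.0 * xy.imag)


def integrate(values, factor):
    """Sum each run of `factor` consecutive samples; an incomplete tail is dropped."""
    return [sum(values[i:i + factor])
            for i in range(0, len(values) - factor + 1, factor)]


def beam_rate_bps(subbands=248, subband_hz=195_312.5, pols=2, bits_per_float=32):
    """Data rate of one XY beam: one complex (two-float) sample per polarisation
    per subband per sample period. The LOFAR defaults are assumptions."""
    return subbands * subband_hz * pols * 2 * bits_per_float
```

With these assumed defaults `beam_rate_bps()` evaluates to 6.2e9, matching the 6.2 Gb/s per beam quoted in the text. Keeping only Stokes I reduces that rate by a factor of four (one real value instead of two complex values per sample), and integrating over `factor` samples reduces it by a further factor of `factor`.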
@@ -229,6 +229,11 @@
 PAMIV = {Parallel Algorithms for Machine Intelligence and Vision}
 }
+@string
+{
+PASA = {Publications Astronomical Society of Australia}
+}
 @string
 {
 PDC = {Principles of Distributed Computing}
@@ -439,7 +444,7 @@
 title = {{Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture}},
 author = {D.I. August and D.A. Connors and S.A. Mahlke and J.W. Sias and K.M. Crozier and B.-C. Cheng and P.R. Eaton and Q.B. Olaniran and W.-m.W. Hwu},
 booktitle = ISCA,
-pages = {227-237},
+pages = {227--237},
 month = {July},
 year = {1998},
 }
@@ -908,7 +913,7 @@
 title = {{Efficiently Searching the 15-Puzzle}},
 author = {J. Culberson and J. Schaeffer},
 institution = {Department of Computing Science, University of Alberta},
-number = {94-08},
+number = {94--08},
 year = {1994}
 }
@@ -1589,7 +1594,7 @@
 title = {{Group Communication in the Amoeba Distributed Operating System}},
 author = {M.F. Kaashoek and A.S. Tanenbaum},
 booktitle = ICDCS,
-pages = {222-230},
+pages = {222--230},
 address = {Arlington, TX},
 month = {May},
 year = {1991}
@@ -1695,7 +1700,7 @@
 journal = MICRO,
 volume = {23},
 number = {2},
-pages = {56-65},
+pages = {56--65},
 month = {March},
 year = {2003}
 }
@@ -1818,7 +1823,7 @@
 journal = ICGA,
 volume = {23},
 number = {3},
-pages = {131-138},
+pages = {131--138},
 month = {September},
 year = {2000}
 }
@@ -1831,7 +1836,7 @@
 journal = ICGA,
 volume = {23},
 number = {3},
-pages = {173-174},
+pages = {173--174},
 month = {September},
 year = {2000}
 }
@@ -1846,6 +1851,19 @@
 year = {2002}
 }
+@article
+{
+Lonsdale:09,
+title = {{The Murchison Widefield Array: Design Overview}},
+author = {C.J. Lonsdale et al},
+journal = IEEE,
+volume = {97},
+number = {8},
+pages = {1497--1506},
+month = {August},
+year = {2009}
+}
 @misc
 {
 LOFAR_SPECS,
@@ -2156,7 +2174,7 @@
 author = {G.L. Peterson},
 journal = IPL,
 volume = {12},
-pages = {115-116},
+pages = {115--116},
 month = {June},
 year = {1981}
 }
@@ -2580,7 +2598,7 @@
 journal = PAMI,
 volume = {11},
 number = {11},
-pages = {1203-1212},
+pages = {1203--1212},
 year = {1989}
 }
@@ -2733,6 +2751,18 @@
 year = {1984}
 }
+@article
+{
+Staveley-Smith:96,
+author = {L. Staveley-Smith et al},
+title = {{The Parkes 21cm Multibeam Receiver}},
+volume = {13},
+number = {3},
+pages = {243--248},
+month = {November},
+year = {1996}
+}
 @inproceedings
 {
 Stern:97,
@@ -2798,7 +2828,7 @@
 booktitle = ACC # { 3},
 editor = {M.R.B. Clarke},
 publisher = {Pergamon Press, Oxford},
-pages = {55-56},
+pages = {55--56},
 year = {1982}
 }
@@ -2845,7 +2875,7 @@
 booktitle = ICPP,
 volume = {3},
 address = {Bloomingdale, IL},
-pages = {156-165},
+pages = {156--165},
 month = {August},
 year = {1996}
 }
@@ -2868,7 +2898,7 @@
 title = {{Cluster Computers and Grid Processing in the First Radio-Telescope of a New Generation}},
 author = {{C.M. de} Vos and {K. van der} Schaaf and J.D. Bregman},
 booktitle = CCGRID,
-pages = {156-160},
+pages = {156--160},
 month = {May},
 year = {2001}
 }
@@ -3004,7 +3034,7 @@
 OPTkey = {},
 volume = {52},
 number = {1/2},
-pages = {199-220},
+pages = {199--220},
 month = {January/March},
 OPTnote = {},
 OPTannote = {}
@@ -3051,7 +3081,7 @@
 OPTkey = {},
 volume = {26},
 number = {2},
-pages = {10-24},
+pages = {10--24},
 OPTmonth = {},
 OPTnote = {},
 OPTannote = {}
@@ -3117,7 +3147,7 @@
 OPTkey = {},
 volume = {22},
 number = {3},
-pages = {151-273},
+pages = {151--273},
 OPTmonth = {},
 OPTnote = {DOI: 10.1007/s10686-008-9124-7},
 OPTannote = {The future of cm and m-wave astronomy lies with the
@@ -3195,7 +3225,7 @@ Keywords: Radio astronomy techniques, Radio telescopes, Square
 OPTkey = {},
 OPTvolume = {17},
 OPTnumber = {1--3},
-OPTpages = {65-77},
+OPTpages = {65--77},
 OPTmonth = {june},
 OPTnote = {},
 OPTannote = {},
@@ -3231,7 +3261,7 @@ Keywords: Radio astronomy techniques, Radio telescopes, Square
 OPTcrossref = {},
 OPTkey = {},
 booktitle = {ACM Transactions on Graphics, Proceedings of SIGGRAPH 2004},
-pages = {777-786},
+pages = {777--786},
 year = {2004},
 OPTeditor = {},
 OPTvolume = {},
@@ -3284,7 +3314,7 @@ optimized CPU version on an Intel 2.4 GHz Core 2 with a 4 MB L2 cache.
 booktitle = {Proceedings of the 22nd ACM International Conference on Supercomputing},
 keywords = {gpu},
 month = {June},
-pages = {309-318},
+pages = {309--318},
 title = {{Efficient Computation of Sum-products on GPUs Through Software-Managed Cache}},
 year = {2008}
 }
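The bibliography part of this commit replaces single hyphens by `--` in the `pages` fields (and in one `number` field), so that LaTeX typesets the ranges with an en-dash. A Python sketch that automates the same substitution could look as follows. The regular expression is an assumption chosen to match the field layout visible in this diff (one field per line, optionally prefixed with `OPT`); it deliberately leaves other fields, such as titles containing hyphens, untouched.

```python
import re

# One pages/number field per line, optionally OPT-prefixed, whose value is
# two digit runs joined by a single hyphen, e.g. "pages = {227-237},".
RANGE_FIELD = re.compile(
    r"^([ \t]*(?:OPT)?(?:pages|number)[ \t]*=[ \t]*\{)(\d+)-(\d+)(\})",
    re.MULTILINE | re.IGNORECASE,
)


def endash_ranges(bib_text):
    """Return bib_text with 'a-b' rewritten to 'a--b' in pages/number fields."""
    return RANGE_FIELD.sub(r"\1\2--\3\4", bib_text)


entry = (
    "title = {{Efficiently Searching the 15-Puzzle}},\n"
    "number = {94-08},\n"
    "OPTpages = {65-77},\n"
    "OPTnumber = {1--3},\n"
)
print(endash_ranges(entry))
```

Because already-converted values such as `OPTnumber = {1--3}` do not match the pattern, running the function twice gives the same result as running it once.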