Commit a27e0936 authored by Marcel Loose

BugID: 1038

Merged with Joris' work.
parent 6b92b905
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
\usepackage{layout} \usepackage{layout}
\usepackage{color} \usepackage{color}
\usepackage{xspace} \usepackage{xspace}
\usepackage{url} %% could also use package hyperref.
%\usepackage[colorlinks=false]{hyperref} %\usepackage[colorlinks=false]{hyperref}
% %
\newcounter{decision} \newcounter{decision}
...@@ -60,19 +61,16 @@ ...@@ -60,19 +61,16 @@
\subsection{Purpose of This Document} \subsection{Purpose of This Document}
\label{subsec:purpose} \label{subsec:purpose}
%
\textsc{aips++}
\aips
\meqtree
This document provides a detailed description of the architectural and This document provides a detailed description of the architectural and
software design of the Blackboard Selfcal System (BBS) that will be used for software design of the Blackboard Selfcal System (BBS) that will be used for
the off-line calibration of the LOFAR observations. The primary goal of this the calibration of the LOFAR observations. The primary goal of this document
document is to provide information that is detailed enough to help the reader is to provide information that is detailed enough to help the reader
understand the design considerations, choice of software architecture and understand the design considerations, choice of software architecture and
global design. We will not delve into the level of detailed software design, global design. We will not delve into the level of detailed software design,
since this will likely cause discrepancies between the actual code and this since this will likely cause discrepancies between the actual code and this
document. For this level of detail, the reader is advised to consult the document. For this level of detail, the reader is advised to consult the
online code documentation. online code documentation. This document supersedes the previous version of
the BBS SDD~\cite{LOFAR-ASTRON-SDD-052}.
\subsection{Executive Summary} \subsection{Executive Summary}
\label{subsec:summary} \label{subsec:summary}
...@@ -83,7 +81,7 @@ online code documentation. ...@@ -83,7 +81,7 @@ online code documentation.
\item [BBS] BlackBoard Selfcal System \item [BBS] BlackBoard Selfcal System
\end{description} \end{description}
\pagebreak \cleardoublepage
\section{Architectural Design} \section{Architectural Design}
\label{sec:architectural-design} \label{sec:architectural-design}
...@@ -93,9 +91,9 @@ online code documentation. ...@@ -93,9 +91,9 @@ online code documentation.
\subsection{Design Considerations} \subsection{Design Considerations}
\label{subsec:considerations} \label{subsec:considerations}
The BlackBoard SelfCal (BBS) system is designed to do the calibration of LOFAR The BlackBoard SelfCal (BBS) system is designed to do the calibration of LOFAR
in an efficient way. Although BBS is mainly developed for LOFAR, it may also be in an efficient way. Although BBS is mainly developed for LOFAR, it may also
used to calibrate other instruments as soon as their specific algorithms are be used to calibrate other instruments as soon as their specific algorithms
plugged in. are plugged in.
\subsubsection{Data Volume} \subsubsection{Data Volume}
\label{subsubsec:data-volume} \label{subsubsec:data-volume}
...@@ -115,40 +113,41 @@ computers. Each computer will have to store and manipulate part of the data. ...@@ -115,40 +113,41 @@ computers. Each computer will have to store and manipulate part of the data.
The Selfcal application will be running on the off-line and auxiliary The Selfcal application will be running on the off-line and auxiliary
processing clusters of the central processing facility (see processing clusters of the central processing facility (see
\cite{LOFAR-ASTRON-ADD-012}). These clusters consist of Linux PCs in a high \cite{LOFAR-ASTRON-ADD-012}). These clusters consist of Linux PCs in a high
bandwidth network. The BBS application will typically run on 50 to 500 of these bandwidth network. The BBS application will run on a large cluster, typically
nodes. Data stored in the CEP intermediate storage facility will be distributed consisting of several hundred nodes. Data stored in the CEP intermediate
over multiple disks and will be accessed by multiple nodes concurrently. storage facility will be distributed over multiple disks and will be accessed
Reordering tens of terabytes of data takes a lot of time and should be avoided. by multiple nodes concurrently. Reordering tens of terabytes of data takes too
Therefore the data should be distributed such that the various applications much time and should be avoided. Therefore the data should be distributed
(e.g., calibration and imaging) can operate well without reordering. The such that the various applications (e.g., calibration and imaging) can operate
distribution should be such that large chunks of data can be well without reordering. The distribution should be such that large chunks of
processed locally and only small amounts of data need to be sent to other data can be processed locally and only small amounts of data need to be sent
machines. There are a few axes along which the data may be distributed: to other machines. There are a few axes along which the data may be
distributed:
\begin{description} \begin{description}
\item [Time] is a bad candidate, because a time slot contains a lot of data (up \item [Time] is probably not a good candidate, because a time slot contains a
to 0.7~GBytes during initial operation). This may lead to problems in the lot of data (up to 0.7~GBytes during initial operation). This may lead to
online system when all data of a time slot are sent to a single machine and problems in the online system when all data of a time slot are sent to a
written there. Another problem is that parallelization of imaging gets hard single machine and written there. Another problem is that parallelization of
because the data of all time slots have to be combined. imaging gets hard because the data of all time slots have to be combined.
\item [Baseline] seems a better candidate, but will lead to imaging problems as \item [Baseline] seems a better candidate, but will lead to imaging problems.
well. This is because a single image needs data from different machines, so This is because a single image needs data from different machines, so large
large amounts of gridded or FFT-ed data have to be sent around. amounts of gridded or FFT-ed data have to be sent around.
\item [Frequency] seems to be the best candidate. Creating an image is usually \item [Frequency] seems to be the best candidate. Creating an image is usually
done per channel or for a few channels, so in principle the whole imaging done per channel or for a few channels, so in principle the whole imaging
process can be done locally. It will result in an image cube distributed over process can be done locally. It will result in an image cube distributed over
many machines, so the image display and analysis software have to be able to many machines, so the image display and analysis software have to be able to
handle this. The image cube can be very large (e.g., 256~GBytes for handle this. The image cube can be very large (e.g., 256~GBytes for
1000~channels of $4000 \times 4000$ images for the 4~Stokes parameters). \\ 1000~channels of $4000 \times 4000$ pixels for the 4~Stokes parameters). \\
Distribution in frequency means that each subband is stored on a separate Distribution in frequency means that, e.g., each subband is stored on a
machine. If needed, each subband can be distributed further. Of course, each separate machine. If needed, each subband can be distributed further. Of
machine should contain about the same amount of data to get good load course, each machine should contain about the same amount of data to get good
balancing. \\ load balancing. \\
Note that this distribution matches well with the way the correlator and Note that this distribution matches well with the way the
online system are designed. correlator and online system are designed.
\end{description} \end{description}
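As a rough consistency check on the image cube size quoted above, and assuming that each pixel value is stored as a 4-byte single-precision number (an assumption; the storage type is not stated here), the volume works out as
\begin{displaymath}
  1000 \times 4000 \times 4000 \times 4 \times 4~\mathrm{bytes}
  = 2.56 \times 10^{11}~\mathrm{bytes} \approx 256~\mathrm{GBytes},
\end{displaymath}
where the factors are, in order, the number of channels, the two image dimensions in pixels, the number of Stokes parameters, and the assumed number of bytes per value.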
The BBS calibration software is not dependent on a specific distribution, so The BBS calibration software is not dependent on a specific distribution, so
in the future other distributions can be used when applicable. However, it has in the future other distributions can be used when applicable. However, it has
not been decided yet if that is also true for the imaging software. not been evaluated yet if that is also true for the imaging software.
\subsubsection{Scalable Architecture} \subsubsection{Scalable Architecture}
\label{subsubsec:scalable-architecture} \label{subsubsec:scalable-architecture}
...@@ -158,14 +157,15 @@ be avoided as much as possible. When distributing data over frequency, we can ...@@ -158,14 +157,15 @@ be avoided as much as possible. When distributing data over frequency, we can
almost completely decouple the computing nodes, as we saw in the previous almost completely decouple the computing nodes, as we saw in the previous
section. Another way to reduce coupling is to make communication indirect as section. Another way to reduce coupling is to make communication indirect as
well. Computing nodes should communicate through some kind of global shared well. Computing nodes should communicate through some kind of global shared
memory. There are several architectural patterns that describe this approach. memory. There are several architectural patterns that describe this
One of the oldest and best known is the Blackboard pattern, which we will approach. One of the oldest and best known is the Blackboard pattern, which we
describe briefly below. will describe briefly below.
Computing nodes should communicate through some kind of global shared Computing nodes should communicate through some kind of global shared
memory. One obvious candidate for such shared memory is a database system. It memory. One obvious candidate for such shared memory is a database system. It
provides locking and notification (trigger) mechanisms, and sometimes even provides locking and notification (trigger) mechanisms, and sometimes even
command queueing. command queueing. We have to be careful, though, that the database will not
become a bottleneck.
\subsection{Blackboard Pattern} \subsection{Blackboard Pattern}
\label{subsec:blackboard} \label{subsec:blackboard}
...@@ -180,8 +180,7 @@ control component evaluates the current state of processing and coordinates ...@@ -180,8 +180,7 @@ control component evaluates the current state of processing and coordinates
the specialized programs. This data-directed control regime makes the specialized programs. This data-directed control regime makes
experimentation with different algorithms possible, and allows experimentally experimentation with different algorithms possible, and allows experimentally
derived heuristics to control processing. This architecture is described in derived heuristics to control processing. This architecture is described in
\cite{Buschmann1996} and \cite{Buschmann1996} and \cite{LOFAR-ASTRON-SDD-002}.
\cite{LOFAR-ASTRON-SDD-002}.
The Blackboard architecture is ideal for solving problems for which no The Blackboard architecture is ideal for solving problems for which no
predetermined algorithm or solve strategy is known. However, for the design of predetermined algorithm or solve strategy is known. However, for the design of
...@@ -189,7 +188,12 @@ the BBS system, we've come to the conclusion that the operational system will ...@@ -189,7 +188,12 @@ the BBS system, we've come to the conclusion that the operational system will
benefit in terms of performance when using a predefined solving strategy. The benefit in terms of performance when using a predefined solving strategy. The
"best" algorithm to perform a self-calibration run can be chosen from a "best" algorithm to perform a self-calibration run can be chosen from a
relatively short list of calibration strategies in advance (based, e.g., on relatively short list of calibration strategies in advance (based, e.g., on
heuristics, or suggested by research done with the \meqtree system). heuristics, or suggested by research done with the \meqtree system). In fact,
the Shared Repository pattern~\cite{Lalanda1998}, which can be seen as a
generalization of the Blackboard pattern, is probably a better match for the
BBS system. It realizes indirect communication using a repository as shared
memory. Figure~\ref{fig:shared-repository-pattern} shows the specialization
hierarchy of patterns based on the Shared Repository pattern.
\begin{figure}[!ht] \begin{figure}[!ht]
\centering \centering
...@@ -214,12 +218,6 @@ heuristics, or suggested by research done with the \meqtree system). ...@@ -214,12 +218,6 @@ heuristics, or suggested by research done with the \meqtree system).
\label{fig:shared-repository-pattern} \label{fig:shared-repository-pattern}
\end{figure} \end{figure}
In fact, the Shared Repository pattern~\cite{Lalanda1998}, which can be seen
as a generalization of the Blackboard pattern, is probably a better match for
the BBS system. It realizes indirect communication using a repository as
shared memory. Figure~\ref{fig:shared-repository-pattern} shows the
specialization hierarchy of patterns based on the Shared Repository pattern.
For BBS, we will need a global controller, which could be implemented using For BBS, we will need a global controller, which could be implemented using
the Controller pattern; and a notification or trigger mechanism to inform the the Controller pattern; and a notification or trigger mechanism to inform the
computing nodes of changes to the shared memory, which could be implemented computing nodes of changes to the shared memory, which could be implemented
...@@ -235,29 +233,18 @@ will contain the values and quality of the (partial) solutions calculated by ...@@ -235,29 +233,18 @@ will contain the values and quality of the (partial) solutions calculated by
each computing node. The database can be used as an external source for each computing node. The database can be used as an external source for
various assessments of the solutions. various assessments of the solutions.
% \cleardoublepage
%\subsubsection{Controller}
%\label{subsubsec:controller}
%
%\subsubsection{Knowledge Sources}
%\label{subsubsec:ks}
%
%\subsubsection{Blackboard}
%\label{subsubsec:bb}
%
\pagebreak
\section{System Overview} \section{System Overview}
\label{sec:overview} \label{sec:overview}
\subsection{Subsystems} \subsection{Subsystems}
\label{subsec:subsystems} \label{subsec:subsystems}
BBS is split into two parts which are described in detail in other BBS is split into two parts. BBS Control takes care of the distributed
chapters. The BBS Control takes care of the distributed processing by means of processing by means of the Blackboard pattern. BBS Kernel does the actual
the Blackboard pattern. The BBS Kernel does the actual processing; it executes processing; it executes a series of steps where each step consists of an
a series of steps where each step consists of an operation like solve or operation like solve or correct.
correct.
\subsubsection{BBS Control} \subsubsection{BBS Control}
\label{subsubsec:sys-control} \label{subsubsec:sys-control}
...@@ -266,20 +253,21 @@ The BBS Control subsystem is responsible for controlling the execution of a ...@@ -266,20 +253,21 @@ The BBS Control subsystem is responsible for controlling the execution of a
self calibration strategy. A strategy consists of an ordered list of commands, self calibration strategy. A strategy consists of an ordered list of commands,
which will be executed by the BBS Kernel subsystem. which will be executed by the BBS Kernel subsystem.
The key idea is that a subset of the data (the so-called "work domain") is The key idea is that a subset of the data (the so-called \emph{work domain})
kept in memory; as many commands as possible are executed on these data before is kept in memory; as many commands as possible are executed on these data
the next data chunk is accessed. A strategy defines the size of the work before the next data chunk is accessed. A strategy defines the size of the
domain (in time and frequency) and optionally which stations and correlations work domain (in time and frequency) and optionally which stations and
are contained in the work domain. It is also possible to define an integration correlations are contained in the work domain. It is also possible to define
interval in time and frequency to achieve that, say, a longer time interval an integration interval in time and frequency to achieve that, say, a longer
can be used. The basic concept is that on each machine the data contained in time interval can be used. The basic concept is that on each machine the data
the work domain have to fit in memory. The BBS Kernel iterates over the work contained in the work domain have to fit in memory. The BBS Kernel iterates
domains to process all the data. For each strategy a number of steps can be over the work domains to process all the data. For each strategy a number of
defined. For instance, when peeling 10 Cat I sources, 20 steps can be steps can be defined. For instance, when peeling 10 Cat I sources, at least 30
defined. For each source step 1 is solving for the gain in the direction of steps can be defined. For each source, step~1 is solving for the gain in the
the source and step 2 is subtracting the source. Note that only after the last direction of the source, step~2 is subtracting the source, and step~3 shifts
subtraction the residual data need to be written. In this way the data are to the next source. Note that only after the last subtraction the residual
read and/or written only once per strategy. data need to be written. In this way the data are read and/or written only
once per strategy.
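The peeling example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names \texttt{Step}, \texttt{peeling\_steps} and \texttt{run\_strategy} are hypothetical and are not part of BBS, and the kernel work itself is replaced by a placeholder. The sketch shows that ten sources give thirty steps, and that each work domain is read once and written once regardless of the number of steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    operation: str  # "solve", "subtract" or "shift" (illustrative names)
    source: str


def peeling_steps(sources):
    """Three steps per source: solve for the gain, subtract, shift to the next."""
    steps = []
    for src in sources:
        steps.append(Step("solve", src))
        steps.append(Step("subtract", src))
        steps.append(Step("shift", src))
    return steps


def run_strategy(work_domains, steps):
    """Read each work domain once, apply every step in memory, write once."""
    io_log = []
    for domain in work_domains:
        io_log.append(("read", domain))
        for step in steps:
            pass  # placeholder: a kernel would apply `step` to the in-memory data
        io_log.append(("write", domain))
    return io_log


steps = peeling_steps([f"src{i}" for i in range(10)])
io_log = run_strategy(["wd0", "wd1"], steps)
```

The I/O log contains exactly one read and one write per work domain, which is the property the work-domain concept is meant to guarantee.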
\begin{figure}[!ht] \begin{figure}[!ht]
\centering \centering
...@@ -291,8 +279,8 @@ read and/or written only once per strategy. ...@@ -291,8 +279,8 @@ read and/or written only once per strategy.
The calibration process is controlled by the BBS Control The calibration process is controlled by the BBS Control
subsystem. Figure~\ref{fig:bbs-control-global-design} depicts the general subsystem. Figure~\ref{fig:bbs-control-global-design} depicts the general
control structure. The BBS Control subsystem consists of one global control structure. The BBS Control subsystem consists of one global
controller, which acts as the main process, and multiple local controller, controller, which acts as the main process, and multiple local controllers,
which control the BBS Kernel subsystem. The global controller posts one or each controlling one BBS Kernel subsystem. The global controller posts one or
more commands (steps) to the Command Queue. Each local controller fetches the more commands (steps) to the Command Queue. Each local controller fetches the
next command from the Command Queue and forwards the command to the BBS Kernel next command from the Command Queue and forwards the command to the BBS Kernel
subsystem. The kernel returns parameter solutions and their quality metrics to subsystem. The kernel returns parameter solutions and their quality metrics to
...@@ -303,7 +291,7 @@ which action should be taken next. ...@@ -303,7 +291,7 @@ which action should be taken next.
Since all communication takes place via the Blackboard, there is no need for Since all communication takes place via the Blackboard, there is no need for
a direct connection between the BBS Control and the BBS Kernel subsystems. a direct connection between the BBS Control and the BBS Kernel subsystems.
The Blackboard contains all the relevant information about the current state The Blackboard contains all the relevant information about the current state
of the self calibration process. This information that can be used by other of the self calibration process. This information can be used by other
(external) processes to monitor the calibration process and to plot results. (external) processes to monitor the calibration process and to plot results.
See~\cite{LOFAR-ASTRON-SDD-002} for more details on the Blackboard See~\cite{LOFAR-ASTRON-SDD-002} for more details on the Blackboard
architecture and roles of the controller. architecture and roles of the controller.
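The control flow described above can be illustrated with a minimal, single-process Python sketch. All names in it (\texttt{Blackboard}, \texttt{LocalController}, \texttt{kernel\_execute}) are hypothetical, the shared repository is an in-memory object rather than a database, and the kernel is a stub that returns a made-up solution and quality value. The sketch also assumes that every local controller executes every posted command on its own part of the data; the text does not state whether commands are shared or consumed.

```python
class Blackboard:
    """In-memory stand-in for the shared repository (a database in the text)."""

    def __init__(self):
        self.command_queue = []  # commands posted by the global controller
        self.results = []        # (controller, command, solution, quality)

    def post_command(self, command):
        self.command_queue.append(command)

    def next_command(self, cursor):
        """Return the command at position `cursor`, or None if none is pending."""
        if cursor < len(self.command_queue):
            return self.command_queue[cursor]
        return None

    def post_result(self, controller, command, solution, quality):
        self.results.append((controller, command, solution, quality))


def kernel_execute(command, local_data):
    """Hypothetical kernel stub: returns a fake solution and quality metric."""
    return sum(local_data), 1.0


class LocalController:
    def __init__(self, name, local_data):
        self.name = name
        self.local_data = local_data
        self.cursor = 0  # index of the next command this controller will fetch

    def step(self, blackboard):
        """Fetch one command, forward it to the kernel, post the result."""
        command = blackboard.next_command(self.cursor)
        if command is None:
            return False
        solution, quality = kernel_execute(command, self.local_data)
        blackboard.post_result(self.name, command, solution, quality)
        self.cursor += 1
        return True


bb = Blackboard()
for cmd in ["solve", "subtract"]:  # the global controller posts commands
    bb.post_command(cmd)

controllers = [LocalController("node0", [1, 2]), LocalController("node1", [3, 4])]
for lc in controllers:
    while lc.step(bb):
        pass
```

Note that the controllers never call each other: the only shared object is the blackboard, which is what allows the computing nodes to stay decoupled.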
...@@ -338,7 +326,7 @@ architecture and roles of the controller. ...@@ -338,7 +326,7 @@ architecture and roles of the controller.
\subsubsection{BBS Database} \subsubsection{BBS Database}
\label{subsubsec:interf-database} \label{subsubsec:interf-database}
\pagebreak \cleardoublepage
\section{Software Design} \section{Software Design}
\label{sec:software-design} \label{sec:software-design}
...@@ -348,16 +336,43 @@ architecture and roles of the controller. ...@@ -348,16 +336,43 @@ architecture and roles of the controller.
\subsubsection{BBS Strategy} \subsubsection{BBS Strategy}
\label{subsubsec:design-strategy} \label{subsubsec:design-strategy}
One iteration in the so-called \emph{Major
Cycle}~\cite[sec.~4.1]{LOFAR-ASTRON-SDD-050} can be described by a BBS
Strategy. A strategy defines a relationship between the data set of a given
observation, which is stored in a measurement set, and the parameter database
holding (intermediate) values of the model parameters that will be estimated
as part of the self calibration process. At least two models are used in the
current self calibration setup: the Local Sky Model (LSM) and the Instrument
Model. The Data Selection associated with a BBS Strategy defines the selection
of the observed data that will be used for the complete strategy. Here you
can, for example, specify which frequency bands, time intervals, and baselines
should be used during this self calibration run. A strategy is defined in
terms of one or more BBS Steps (see section~\ref{subsubsec:design-step}
below).
\begin{figure}[!ht] \begin{figure}[!ht]
\centering \centering
\includegraphics[width=0.5\textwidth]{images/bbs-strategy-class-diagram} \includegraphics[width=0.5\textwidth]{images/bbs-strategy-class-diagram}
\caption{The BBS Strategy class defines the strategy to be used for the current self calibration run.}
\label{fig:bbsstrategy}
\end{figure} \end{figure}
\subsubsection{BBS Step} \subsubsection{BBS Step}
\label{subsubsec:design-step} \label{subsubsec:design-step}
\begin{figure}[!htb] A BBS Strategy is defined in terms of one or more BBS Steps. The BBS Step
class is designed as a Composite pattern~\cite{Gamma1995}, which means that
each BBS Step can itself be made up of one or more BBS Steps. The Composite
pattern provides an easy way to define a tree-like structure. Leaf classes,
like BBS SolveStep, cannot be further subdivided; they describe one single
piece of work that can be handed over to the BBS Kernel. Currently, there is a
total of six leaf classes, each defining one single piece of work.
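As an illustration of the Composite pattern in this setting, the following Python sketch builds a small step tree and flattens it into the ordered list of leaf steps that could be handed to a kernel. It is a sketch under assumptions, not the BBS class design: only the name \texttt{SolveStep} is taken from the text, while \texttt{SubtractStep}, \texttt{MultiStep} and the method \texttt{leaves} are hypothetical.

```python
class Step:
    """Component: common interface for single and composite steps."""

    def __init__(self, name):
        self.name = name

    def leaves(self):
        """Return the leaf steps below this step, in execution order."""
        raise NotImplementedError


class SolveStep(Step):
    """Leaf: one single piece of work."""

    def leaves(self):
        return [self]


class SubtractStep(Step):
    """Leaf (hypothetical name): one single piece of work."""

    def leaves(self):
        return [self]


class MultiStep(Step):
    """Composite (hypothetical name): a step made up of other steps."""

    def __init__(self, name, children=()):
        super().__init__(name)
        self.children = list(children)

    def leaves(self):
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result


peel_a = MultiStep("peel-A", [SolveStep("solve-A"), SubtractStep("subtract-A")])
peel_b = MultiStep("peel-B", [SolveStep("solve-B"), SubtractStep("subtract-B")])
strategy_steps = MultiStep("peel-all", [peel_a, peel_b])
leaf_names = [step.name for step in strategy_steps.leaves()]
```

Because composites and leaves share one interface, the caller can treat a whole subtree as a single step, which is the main reason for choosing this pattern.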
\begin{figure}[!ht]
\centering \centering
\includegraphics[width=0.8\textwidth]{images/bbs-step-class-diagram} \includegraphics[width=0.8\textwidth]{images/bbs-step-class-diagram}
\caption{The BBS Step class family defines single pieces of work that can be
executed by the BBS Kernel as part of the current self calibration run.}
\label{fig:bbsstep}
\end{figure} \end{figure}
\subsubsection{Global Control} \subsubsection{Global Control}
...@@ -834,13 +849,13 @@ A database can still be used as a logging mechanism and for monitoring by extern ...@@ -834,13 +849,13 @@ A database can still be used as a logging mechanism and for monitoring by extern
\end{itemize} \end{itemize}
\pagebreak \cleardoublepage
% References % References
\bibliographystyle{unsrt} \bibliographystyle{unsrt}
\bibliography{lofar} \bibliography{lofar}
\pagebreak \cleardoublepage
\appendix \appendix
\section{Configuration Syntax} \section{Configuration Syntax}
......