Commit a218a448 authored by Marcel Loose
BugID: 1038

Commit of Friday's work.
parent dae4b360
@@ -81,7 +81,8 @@ the BBS SDD~\cite{LOFAR-ASTRON-SDD-052}.
\begin{tabular}{@{}ll}
ACC & Application Configuration and Control \\
BBS & BlackBoard Selfcal system \\
OLAP & On-Line Application Processing \\
SAS & Specification And Scheduling \\
\end{tabular}
\cleardoublepage
@@ -93,17 +94,17 @@ OLAP & On-Line Appication Processing \\
\subsection{Design Considerations}
\label{subsec:considerations}
The BlackBoard SelfCal (BBS) system is designed to do the calibration of
\lofar in an efficient way. Although BBS is mainly developed for \lofar, it
may also be used to calibrate other instruments as soon as their specific
algorithms are plugged in.
\subsubsection{Data Volume}
\label{subsubsec:data-volume}
The volume of the data coming from the \lofar correlator is \emph{very}
large. During initial operation (mid 2008) the amount of data generated during
an average observation will be in the order of several terabytes. Once \lofar
is fully operational, this number will have increased to almost a hundred
terabytes. Given the output data rate of the correlator and the storage
capacity of harddisks, it is obvious that the data cannot be stored on a
single system, not even if an array of harddisks were used. The only feasible
@@ -329,8 +330,18 @@ architecture and roles of the controller.
\label{subsubsec:sys-database}
The BBS Database (or blackboard) actually consists of two different databases.
\begin{description}
\item [Command Queue] stores the commands to be executed by the BBS Kernel and
the status results returned by each kernel. In principle, commands are
executed in the order they were posted by Global Control. However, in the
future, we may need a way to send an \emph{out-of-band} command, which could
be implemented as a high priority command. This has not been fully decided
yet.
\item [Parameter Database] stores (intermediate) solutions of the model
parameters calculated by the BBS Kernel. Access to the Parameter database is
minimized in order to avoid any performance penalties. If partial solutions
can be kept in memory of a local (kernel) node, they will not be written to
the database, unless requested explicitly.
\end{description}
\subsection{Interfaces}
@@ -338,7 +349,8 @@ The BBS Database (or blackboard) actually consists of two different databases.
\subsubsection{Context Diagram}
\label{subsubsec:context}
The BlackBoard Selfcal system interfaces with several other components. The
context of BBS is shown in figure~\ref{fig:bbs-context-diagram}.
\begin{figure}[!ht]
\centering
@@ -352,25 +364,63 @@ is both read and updated by BBS.}
\end{figure}
\begin{description}
\item [ACC] configures and controls BBS. Configuration is done using a
so-called \emph{parset} file, which is generated by ACC prior to starting the
BBS applications. Each BBS application reads, during initialization, its
\emph{parset} file, containing a number of key-value pairs that set several
run-time configuration parameters of the applications. All BBS applications
implement the control interface that is provided by
ACC~\cite{LOFAR-ASTRON-SDD-037}, which enables ACC to control these
applications.
\item [OLAP] stores the observational data as visibilities into one or more
Measurement Sets. In section~\ref{subsubsec:distributed-processing} we argued
that the visibility data could probably best be distributed along the
frequency axis; this also matches the way the data are produced by the
correlator~\cite{LOFAR-ASTRON-SDD-036}. So, in the current design, we assume
that each BBS Kernel will process one or more subbands of data.
\item [Imager] will be operating on the residual visibilities that are
produced by BBS. It will convert the UV-data from the updated Measurement Sets
to the image plane.
\item [Parameter database] \sloppy will store the parameters---or more
precisely, the polynomial coefficients describing the parameters---of the
different models used during the self calibration process. Examples of such
models are: Local Sky Model, Minimal Ionospheric Model, and Instrument
Model. These parameters are solved for by the BBS Kernel, during one or more
self calibration runs.
\end{description}
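The key-value nature of the \emph{parset} file mentioned under ACC above can
be illustrated with a minimal Python sketch. It assumes one \texttt{key =
value} pair per line and \texttt{\#} as a comment marker; the function name
\texttt{parse\_parset} and the keys in the example are hypothetical and are
not taken from the BBS configuration syntax.

```python
def parse_parset(text):
    """Parse 'key = value' lines into a dict of strings.

    Assumption (not from the BBS specification): '#' starts a comment
    and blank lines are ignored.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in parset line: {raw!r}")
        params[key.strip()] = value.strip()
    return params


# Hypothetical keys, for illustration only.
example = """
# identification of the observation to calibrate
DataSet = L2007_01234.MS
Strategy.Steps = [solve1, subtract1]
"""
config = parse_parset(example)
```

Values are kept as plain strings here; interpreting them (lists, numbers) is
left to the application that reads the key.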
\subsubsection{BBS Control}
\label{subsubsec:interf-control}
In this section we will briefly describe the most important interfaces that
are provided or implemented by the BBS Control package.
\begin{description}
\item [ACC] provides the Process Control interface~\cite{LOFAR-ASTRON-SDD-037}
that will be implemented by each executable in the BBS Control package. It
provides commands like \texttt{define}, \texttt{init}, \texttt{run}, and
\texttt{quit}. Furthermore, ACC provides each executable with a so-called
\emph{parset} file, which contains important configuration parameters. See
appendix~\ref{sec:configuration-syntax} for a complete list of all key-value
pairs that are defined for the BBS applications.
\item [BBS Strategy] describes the strategy to be used for the current self
calibration run. Configurable strategy parameters are read from the
\emph{parset} file that is supplied by ACC. A strategy consists of one or more
steps.
\item [BBS Step] describes a (single or multi) step to be executed for the
current self calibration run. Configurable step parameters are read from the
\emph{parset} file that is supplied by ACC.
\item [Command Queue] contains the queue of commands to be executed by the BBS
Kernel. Commands are posted by Global Control and retrieved by each Local
Control. Commands are usually executed in the order in which they appear in
the queue, unless a particular command is marked \emph{high priority} (TBD). A
command can be a single step (a manageable piece of work that is forwarded to
the kernel), or a control message like \texttt{initialize}.
\item [Parameter database] contains the parameters that must be solved for
during the current self calibration run. It will be updated by Local
Controls. Global Control will regularly check the quality of the solutions and
take appropriate action (e.g., repeat the last step).
\end{description}
\subsubsection{BBS Kernel}
\label{subsubsec:interf-kernel}
@@ -432,11 +482,11 @@ valid\\
\subsubsection{Models, parameters, funklets, and coefficients}
\label{subsubsec:models-parameters-funklets-coefficients}
Self calibration revolves around fitting a \emph{model} to the observed data
by adjusting the parameters of the model. On the coarsest scale a model for
the calibration of \lofar describes the instrument, the environment, and the
sky. This model can be decomposed into smaller (sub)models, such as a model
for the beamshape, the bandpass, or the ionosphere (see also \cite[sec.
2]{LOFAR-ASTRON-SDD-050}).
\todo{ref Hamaker, aips++ note of Jan N.}
@@ -650,19 +700,19 @@ memory.
\subsubsection{BBS Strategy}
\label{subsubsec:design-strategy}
One iteration in the so-called \emph{Major
Cycle}~\cite[sec.~4.1]{LOFAR-ASTRON-SDD-050} can be described by a BBS
Strategy. A strategy defines a relationship between the data set of a given
observation, which is stored in a Measurement Set~\cite{aips++note229}, and
the parameter database holding (intermediate) values of the model parameters
that will be estimated as part of the self calibration process. At least two
models are used in the current self calibration setup: the Local Sky Model
(LSM) and the Instrument Model. The Data Selection associated with a BBS
Strategy defines the selection of the observed data that will be used for the
complete strategy. Here you can, for example, specify which frequency bands,
time intervals, and baselines should be used during this self calibration
run. A strategy is defined in terms of one or more BBS Steps (see
section~\ref{subsubsec:design-step} below).
\begin{figure}[!ht]
\centering
@@ -675,10 +725,10 @@ current self calibration run.}
\subsubsection{BBS Step}
\label{subsubsec:design-step}
%A BBS Strategy is defined in terms of one or more BBS Steps.
The BBS Step class is designed as a Composite pattern~\cite{Gamma1995}, which
means that each BBS Step can itself be made up of one or more BBS Steps. The
Composite pattern provides an easy way to define a tree-like structure. Leaf
classes, like SolveStep, cannot be further subdivided; they describe one single
piece of work that can be handed over to the BBS Kernel. Currently, there is a
total of seven leaf classes, each defining one single piece of work.
@@ -695,26 +745,105 @@ executed by the BBS Kernel as part of the current self calibration run.}
\subsubsection{Global Control}
\label{subsubsec:design-global-control}
BBS Global Control is responsible for managing the execution of a self
calibration run. The program flow is shown in the activity diagram (see
figure~\ref{fig:global-control-activity-diagram}).
\begin{figure}[!ht]
\centering
\includegraphics[width=0.7\textwidth]{images/bbs-global-control-activity-diagram}
\caption{Activity diagram of the BBS Global Control}
\label{fig:global-control-activity-diagram}
\end{figure}
\paragraph*{Main Flow}
At start-up, Global Control reads the \emph{parset} file that was supplied by
ACC, and initializes itself. Next, it queries the Command Queue database to
see if it is starting a new run, or recovering from an ``aborted'' run. If it
is starting a new run, it first posts the Strategy to the command queue,
followed by an \textit{initialize} command, which is needed to inform the
Local Controllers that a Strategy is available now. Next, it posts a
\textit{next chunk} command, indicating that the whole sequence of steps (as
represented by a strategy) should be repeated for the next chunk of data. The
size of a data chunk is determined by the work domain size.
Global Control now enters a loop, posting steps to the command queue, until
either there are no more steps left in the strategy, or the step posted last
is a synchronization point. In the former case, it will send a \textit{next
chunk} command and re-execute the loop. In the latter case, it will wait until
all Local Controllers have finished processing all steps posted up to now,
before sending the next step. Iteration over a chunk of data ends when there
are no more steps left in the command queue.
\paragraph*{Alternative Flows}
If all Local Controllers return an \texttt{OUT\_OF\_DATA} result to the
\textit{next chunk} command, then the self calibration run is completed.
Global Control then sends a \textit{finalize} command to inform all Local
Controllers and sets the \textit{done} flag for the current strategy.
If Global Control is recovering from an ``aborted'' run (possibly due to a
crash of itself), it sends a high priority \textit{recover} command and checks
if all Local Controllers respond to it. Next, Global Control queries the
Command Queue database to get the last step in the command queue and resumes
operation.
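The main flow and the \texttt{OUT\_OF\_DATA} alternative flow described above
can be sketched as follows. This is a simplified illustration, not the actual
controller: the class \texttt{FakeQueue}, its methods, and the result value
\texttt{"OK"} are stand-ins invented for the example, and recovery from an
aborted run is not modeled.

```python
OUT_OF_DATA = "OUT_OF_DATA"


class FakeQueue:
    """In-memory stand-in for the Command Queue database (hypothetical)."""

    def __init__(self, n_chunks, n_local_controllers=2):
        self.posted = []          # commands in the order they were posted
        self.remaining = n_chunks
        self.n_lc = n_local_controllers
        self.done = False         # "done" flag of the current strategy

    def post(self, command):
        self.posted.append(command)

    def next_chunk_results(self):
        """Results of all Local Controllers to the last 'next chunk'."""
        if self.remaining == 0:
            return [OUT_OF_DATA] * self.n_lc
        self.remaining -= 1
        return ["OK"] * self.n_lc

    def wait_for_all(self):
        """Block until all posted steps are processed (recorded only)."""
        self.posted.append("<wait>")


def run_global_control(steps, sync_points, queue):
    queue.post("strategy")
    queue.post("initialize")
    while True:
        queue.post("next chunk")
        if all(r == OUT_OF_DATA for r in queue.next_chunk_results()):
            queue.post("finalize")   # run completed: inform Local Controllers
            queue.done = True
            return
        for step in steps:
            queue.post(step)
            if step in sync_points:  # wait before sending the next step
                queue.wait_for_all()


queue = FakeQueue(n_chunks=2)
run_global_control(["predict", "solve"], {"solve"}, queue)
```

With two chunks of data, the queue receives the step sequence twice, and the
third \textit{next chunk} command triggers \textit{finalize}.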
\paragraph*{Reminders}
\begin{itemize}
\item Determine if a step is a synchronization point or not.
\item Strategy needs a ``done'' flag, which must be set by the Global Controller.
\item Strategy needs a unique identifier which must be supplied by SAS.
\item Step needs a sequence number.
\item Check result code (e.g., \texttt{OUT\_OF\_DATA}) of ``next chunk'' command.
\item Once a cluster node is out of data it will respond with
\texttt{OUT\_OF\_DATA} on all subsequent ``next step'' or ``next chunk'' commands.
\item SAS should provide the number of Local Control nodes.
\end{itemize}
\subsubsection{Local Control}
\label{subsubsec:design-local-control}
BBS Local Control is responsible for managing the processing of one command
(e.g., a BBS SingleStep) by the BBS Kernel. The program flow is shown in the
activity diagram (see figure~\ref{fig:local-control-activity-diagram}).
\begin{figure}[!ht]
\centering
\includegraphics[width=0.7\textwidth]{images/bbs-local-control-activity-diagram}
\caption{Activity diagram of the BBS Local Control}
\label{fig:local-control-activity-diagram}
\end{figure}
\paragraph*{Main Flow}
At start-up, Local Control reads the \emph{parset} file that was supplied by
ACC, and initializes itself. Next, it queries the Command Queue database in
order to find out whether it is starting a new calibration run, or recovering
from an ``aborted'' run. The logic to derive whether a new run is started is
somewhat complicated. Here it is for completeness:
\begin{quote}
If a strategy with the ID given by SAS is not yet present in the database,
then this is a new run; else if the strategy is present and its \textit{done}
flag is set, then we're done; else if the next command in the command queue is
\textit{initialize}, or if there are no commands at all, then this is a new
run; else we're recovering from, e.g., a crash.
\end{quote}
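The decision logic quoted above maps directly onto a chain of conditions. The
following Python sketch is one way to write it down; the function name, the
argument representation (a dictionary with a \texttt{done} entry, or
\texttt{None} when absent), and the returned labels are assumptions made for
the example.

```python
def classify_start(strategy, next_command):
    """Decide how Local Control should start.

    strategy:     None if no strategy with the given ID is in the database,
                  otherwise a dict with a boolean "done" entry.
    next_command: name of the next command in the queue, or None if the
                  queue holds no commands at all.
    """
    if strategy is None:
        return "new run"
    if strategy["done"]:
        return "done"
    if next_command is None or next_command == "initialize":
        return "new run"
    return "recover"  # e.g., after a crash
```

The order of the tests matters: the \textit{done} flag is checked before the
command queue is inspected, exactly as in the quoted rule.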
If it is a new run, Local Control enters the main loop. It tries to retrieve a
command from the Command Queue database. If there are no commands present in
the queue, it will wait for a trigger from the database to retrieve the next
command. Currently, three different commands can be handled by the Local
Controller: \textit{initialize}, \textit{finalize}, and any of the BBS
SingleSteps. If the command is a BBS SingleStep, it will be forwarded to the
BBS Kernel, which will process it. The result returned by the kernel will be
posted to the result table of the command queue.
\paragraph*{Alternative Flows}
If Local Control is not starting a new run, it might be recovering from a
crash. Recovery is not yet completely modeled; therefore there is currently no
control flow from the Recovery action.
If the received command is \textit{initialize}, a Strategy with the given ID
will be retrieved from the command queue. It is an error if this strategy is
not present in the Command Queue database.
If the received command is \textit{finalize}, Local Control will clean up and
exit.
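The command handling of Local Control described in this section can be
summarized in a short Python sketch. It is an illustration under simplifying
assumptions: the commands arrive as a plain sequence (waiting for a database
trigger is not modeled), the kernel is any callable, and all names are
hypothetical.

```python
def run_local_control(commands, strategies, strategy_id, kernel):
    """Process commands until 'finalize'; return the posted results."""
    results = []      # stand-in for the result table of the command queue
    strategy = None
    for command in commands:
        if command == "initialize":
            # It is an error if the strategy is not present in the database.
            if strategy_id not in strategies:
                raise LookupError(f"strategy {strategy_id!r} not found")
            strategy = strategies[strategy_id]
        elif command == "finalize":
            break     # clean up and exit
        else:
            # Any other command is treated as a single step for the kernel.
            results.append((command, kernel(command)))
    return results


posted = run_local_control(
    ["initialize", "predict", "solve", "finalize", "never-reached"],
    strategies={"obs-1": ["predict", "solve"]},
    strategy_id="obs-1",
    kernel=lambda step: "OK",
)
```

Commands that follow \textit{finalize} are never processed, which mirrors the
clean-up-and-exit behavior described above.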
\subsection{BBS Kernel}
\label{subsec:design-kernel}
@@ -955,29 +1084,35 @@ entirely local solve domains over a (unix domain) socket.
\subsection{BBS Database}
\label{subsec:design-database}
The BBS Database actually consists of two databases: a Command Queue database
and a Parameter Solutions database. However, they will probably reside on the
same node.
\subsubsection{Command Queue}
\label{subsubsec:design-command-queue}
\begin{figure}[!ht]
\includegraphics[width=0.8\textwidth]{images/command-queue-datamodel}
\caption{Data model of the Command Queue database}
\end{figure}
The Command Queue is, as the name suggests, a command queue. Commands posted
by Global Control are stored inside the Command Queue. Each entry in the
strategy table represents one BBS Strategy, which is associated with one or
more entries in the step table, representing the BBS SingleSteps in the BBS
Strategy. The Local Controllers fetch these steps from the command queue, one
by one. When one step is completed, the status result is posted to the result
table. For each single step, there are as many entries in the result table as
there are Local Controllers processing these steps. For example, suppose a
Strategy consists of 10 SingleSteps and there are 20 Local Controllers. Then,
when the Strategy is done, we will have $10\times20=200$ entries in the result
table.
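Stated generally, and under the same assumption as the example (every Local
Controller posts exactly one result per single step), the number of entries in
the result table is the product of the two counts. The symbols
$N_{\mathrm{step}}$ and $N_{\mathrm{LC}}$ are introduced here only for this
illustration:
\[
  N_{\mathrm{result}} = N_{\mathrm{step}} \times N_{\mathrm{LC}},
  \qquad\text{e.g.}\quad 10 \times 20 = 200 .
\]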
\subsubsection{Parameter Solutions}
\label{subsubsec:design-parameter-solutions}
Currently, the Parameter Solutions database is stored as an AIPS++ table, but
in the near future it will be migrated to a ``real'' database. A data model has
not been designed yet. Updated solutions of the parameters are only written at
the end of one solve operation (not for each iteration), in order to reduce
database I/O.
\subsection{Performance considerations}
\label{subsec:performance-considerations}
@@ -1042,6 +1177,7 @@ external processes.
\appendix
\section{Configuration Syntax}
\label{sec:configuration-syntax}
This appendix describes the syntax of the BBS configuration file (a.k.a.
parset). Its goal is to foster a common understanding and terminology. At the