% doc/BBS/BBS-SDD.tex, commit a27e0936 (parent 6b92b905),
% authored 18 years ago by Marcel Loose.
% Commit message: BugID: 1038. Merged with Joris' work.
% ...
\usepackage{layout}
\usepackage{color}
\usepackage{xspace}
\usepackage{url}
%% could also use package hyperref.
%\usepackage[colorlinks=false]{hyperref}
%
\newcounter{decision}
% ...
\subsection{Purpose of This Document}
\label{subsec:purpose}
%
This document provides a detailed description of the architectural and
software design of the Blackboard Selfcal System (BBS) that will be used for
the calibration of LOFAR observations. Its primary goal is to provide enough
detail to help the reader understand the design considerations, the choice of
software architecture, and the global design. We will not go down to the level
of detailed software design, since that would likely cause discrepancies
between the actual code and this document; for that level of detail, the
reader is referred to the online code documentation. This document supersedes
the previous version of the BBS SDD~\cite{LOFAR-ASTRON-SDD-052}.

\subsection{Executive Summary}
\label{subsec:summary}
% ...
\item[BBS] BlackBoard Selfcal System
\end{description}

\pagebreak
\cleardoublepage
\section{Architectural Design}
\label{sec:architectural-design}
% ...
\subsection{Design Considerations}
\label{subsec:considerations}
The BlackBoard Selfcal System (BBS) is designed to calibrate LOFAR data
efficiently. Although BBS is mainly developed for LOFAR, it may also be used
to calibrate other instruments once their specific algorithms are plugged in.

\subsubsection{Data Volume}
\label{subsubsec:data-volume}
% ...
The Selfcal application will run on the off-line and auxiliary processing
clusters of the central processing facility (see~\cite{LOFAR-ASTRON-ADD-012}).
These clusters consist of Linux PCs in a high-bandwidth network. The BBS
application will run on a large cluster, typically consisting of several
hundred nodes. Data stored in the CEP intermediate storage facility will be
distributed over multiple disks and will be accessed by multiple nodes
concurrently. Reordering tens of terabytes of data takes too much time and
should be avoided. Therefore, the data should be distributed such that the
various applications (e.g., calibration and imaging) can operate well without
reordering: large chunks of data should be processed locally, and only small
amounts of data should have to be sent to other machines. There are a few axes
along which the data may be distributed:
\begin{description}
\item[Time] is probably not a good candidate, because a time slot contains a
lot of data (up to 0.7~GBytes during initial operation). This may lead to
problems in the online system when all data of a time slot are sent to a
single machine and written there. Another problem is that parallelization of
imaging becomes difficult, because the data of all time slots have to be
combined.
\item[Baseline] seems a better candidate, but will lead to imaging problems.
This is because a single image needs data from different machines, so large
amounts of gridded or FFT-ed data have to be sent around.
\item[Frequency] seems to be the best candidate. Creating an image is usually
done per channel or for a few channels, so in principle the whole imaging
process can be done locally. It will result in an image cube distributed over
many machines, so the image display and analysis software have to be able to
handle this. The image cube can be very large (e.g., 256~GBytes for
1000~channels of $4000 \times 4000$ pixels for the 4~Stokes parameters).\\
Distribution in frequency means that, e.g., each subband is stored on a
separate machine. If needed, each subband can be distributed further. Of
course, each machine should contain about the same amount of data to get good
load balancing.\\
Note that this distribution matches well with the way the correlator and
online system are designed.
\end{description}
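As a quick check of the image-cube estimate above, assume (this is not stated
in the text) that each pixel is stored as a 4-byte single-precision value.
Multiplying the number of channels, pixels per image, Stokes parameters, and
bytes per pixel gives
\[
  \underbrace{1000}_{\mbox{\scriptsize channels}} \times
  \underbrace{4000 \times 4000}_{\mbox{\scriptsize pixels}} \times
  \underbrace{4}_{\mbox{\scriptsize Stokes}} \times
  \underbrace{4\,\mathrm{bytes}}_{\mbox{\scriptsize per pixel}}
  = 2.56 \times 10^{11}\,\mathrm{bytes} \approx 256\,\mathrm{GBytes},
\]
which matches the quoted figure when 1~GByte is taken as $10^9$~bytes.
Storing pixels in double precision would double this size.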
The BBS calibration software does not depend on a specific distribution, so
other distributions can be used in the future when applicable. However, it has
not been evaluated yet whether this is also true for the imaging software.
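The load-balancing requirement for a frequency distribution can be illustrated
with a minimal sketch. It is not taken from the BBS code: the helper names
\texttt{assign\_subbands} and \texttt{node\_loads} are hypothetical, and the
greedy largest-first heuristic is just one simple way to give each node about
the same amount of data.

```python
import heapq


def assign_subbands(subband_sizes, n_nodes):
    """Greedily assign subbands to nodes, largest subband first.

    subband_sizes: dict mapping a subband id to its data volume (any unit).
    Returns a dict mapping node index -> list of subband ids.
    Hypothetical helper for illustration; not part of BBS.
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    # Min-heap of (current load, node index): the least-loaded node is on top.
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    # Placing the largest subbands first keeps the final loads close together.
    for sb, size in sorted(subband_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[node].append(sb)
        heapq.heappush(heap, (load + size, node))
    return assignment


def node_loads(assignment, subband_sizes):
    """Total data volume stored on each node for a given assignment."""
    return {node: sum(subband_sizes[sb] for sb in sbs)
            for node, sbs in assignment.items()}
```

For example, eight equally sized subbands on four nodes end up as two subbands
per node; with unequal sizes, placing the largest subbands first keeps the
per-node totals close.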

\subsubsection{Scalable Architecture}
\label{subsubsec:scalable-architecture}
% ...
be avoided as much as possible. When distributing data over frequency, we can
almost completely decouple the computing nodes, as we saw in the previous
section. Another way to reduce coupling is to make communication indirect as
well: computing nodes should communicate through some kind of global shared
memory. There are several architectural patterns that describe this approach.
One of the oldest and best known is the Blackboard pattern, which we will
describe briefly below.

One obvious candidate for such shared memory is a database system. It
provides locking and notification (trigger) mechanisms, and sometimes even
command queueing. We have to be careful, though, that the database does not
become a bottleneck.
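A minimal sketch of the database-as-shared-memory idea, using Python's
built-in \texttt{sqlite3} module purely for illustration; the text does not
prescribe a particular database, and all table, trigger, and function names
below are hypothetical. A trigger records each posted command in an
\texttt{event} table, each controller asks for the oldest command it has not
yet completed, and the primary key on \texttt{done} prevents a controller from
completing the same command twice. SQLite triggers cannot notify other
processes, so here the event table would have to be polled; server databases
such as PostgreSQL offer \texttt{LISTEN}/\texttt{NOTIFY} for push-style
notification.

```python
import sqlite3

SCHEMA = """
CREATE TABLE command (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    name  TEXT NOT NULL
);
CREATE TABLE done (
    controller  TEXT NOT NULL,
    command_id  INTEGER NOT NULL REFERENCES command(id),
    PRIMARY KEY (controller, command_id)
);
CREATE TABLE event (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    command_id  INTEGER NOT NULL,
    kind        TEXT NOT NULL
);
-- Every newly posted command leaves a record in the event table, which
-- interested processes can poll to learn that the shared memory changed.
CREATE TRIGGER command_posted AFTER INSERT ON command
BEGIN
    INSERT INTO event (command_id, kind) VALUES (NEW.id, 'posted');
END;
"""


def open_blackboard(path=":memory:"):
    """Open (or create) the hypothetical blackboard database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def post_command(conn, name):
    """Append a command; the trigger logs it in the event table."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(
            "INSERT INTO command (name) VALUES (?)", (name,)).lastrowid


def next_command(conn, controller):
    """Oldest command this controller has not completed, as (id, name) or None."""
    return conn.execute(
        "SELECT id, name FROM command WHERE id NOT IN "
        "(SELECT command_id FROM done WHERE controller = ?) "
        "ORDER BY id LIMIT 1", (controller,)).fetchone()


def mark_done(conn, controller, command_id):
    """Record completion; a duplicate raises sqlite3.IntegrityError."""
    with conn:
        conn.execute("INSERT INTO done (controller, command_id) VALUES (?, ?)",
                     (controller, command_id))
```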

\subsection{Blackboard Pattern}
\label{subsec:blackboard}
% ...
control component evaluates the current state of processing and coordinates
the specialized programs. This data-directed control regime makes
experimentation with different algorithms possible, and allows experimentally
derived heuristics to control processing. This architecture is described
in~\cite{Buschmann1996} and~\cite{LOFAR-ASTRON-SDD-002}.

The Blackboard architecture is ideal for solving problems for which no
predetermined algorithm or solve strategy is known. However, for the design of
the BBS system, we have come to the conclusion that the operational system
will benefit in terms of performance from using a predefined solving strategy.
The ``best'' algorithm to perform a self-calibration run can be chosen in
advance from a relatively short list of calibration strategies (based, e.g.,
on heuristics, or suggested by research done with the \meqtree system). In
fact, the Shared Repository pattern~\cite{Lalanda1998}, which can be seen as a
generalization of the Blackboard pattern, is probably a better match for the
BBS system. It realizes indirect communication using a repository as shared
memory. Figure~\ref{fig:shared-repository-pattern} shows the specialization
hierarchy of patterns based on the Shared Repository pattern.
\begin{figure}[!ht]
\centering
% ...
\label{fig:shared-repository-pattern}
\end{figure}
For BBS, we will need a global controller, which could be implemented using
the Controller pattern; and a notification or trigger mechanism to inform the
computing nodes of changes to the shared memory, which could be implemented
% ...
will contain the values and quality of the (partial) solutions calculated by
each computing node. The database can be used as an external source for
various assessments of the solutions.
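The Shared Repository idea with a notification mechanism can be sketched in a
few lines. This in-memory class is a hypothetical stand-in for the database,
not BBS code: components only post records to, and read records from, the
repository, and a subscriber registered for a topic is called whenever a
record is posted there. The record fields (\texttt{node}, \texttt{domain},
\texttt{chi2}) and the assessment helper \texttt{poor\_solutions} are
illustrative assumptions.

```python
from collections import defaultdict


class SharedRepository:
    """In-memory stand-in for the shared repository (hypothetical sketch).

    Components never call each other directly: they post records to the
    repository, and parties interested in a topic are notified through
    callbacks they registered for that topic.
    """

    def __init__(self):
        self._entries = defaultdict(list)      # topic -> list of records
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def post(self, topic, record):
        self._entries[topic].append(record)
        # Notification (trigger) mechanism: inform everyone watching the topic.
        for callback in self._subscribers[topic]:
            callback(record)

    def read(self, topic):
        return list(self._entries[topic])


def poor_solutions(repo, max_chi2):
    """Example external assessment: solutions whose chi2 exceeds a limit."""
    return [r for r in repo.read("solution") if r["chi2"] > max_chi2]
```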
%
%\subsubsection{Controller}
%\label{subsubsec:controller}
%
%\subsubsection{Knowledge Sources}
%\label{subsubsec:ks}
%
%\subsubsection{Blackboard}
%\label{subsubsec:bb}
%
\pagebreak
\cleardoublepage
\section{System Overview}
\label{sec:overview}
\subsection{Subsystems}
\label{subsec:subsystems}
BBS is split into two parts. BBS Control takes care of the distributed
processing by means of the Blackboard pattern. BBS Kernel does the actual
processing; it executes a series of steps, where each step consists of an
operation like solve or correct.

\subsubsection{BBS Control}
\label{subsubsec:sys-control}
% ...
The BBS Control subsystem is responsible for controlling the execution of a
self calibration strategy. A strategy consists of an ordered list of commands,
which will be executed by the BBS Kernel subsystem.

The key idea is that a subset of the data (the so-called \emph{work domain})
is kept in memory; as many commands as possible are executed on these data
before the next data chunk is accessed. A strategy defines the size of the
work domain (in time and frequency) and, optionally, which stations and
correlations are contained in the work domain. It is also possible to define
an integration interval in time and frequency, so that, for example, a longer
time interval can be used. The basic constraint is that on each machine the
data contained in the work domain have to fit in memory. The BBS Kernel
iterates over the work domains to process all the data. For each strategy, a
number of steps can be defined. For instance, when peeling 10 Cat~I sources,
at least 30 steps are defined: for each source, step~1 solves for the gain in
the direction of the source, step~2 subtracts the source, and step~3 shifts to
the next source. Note that the residual data need to be written only after the
last subtraction. In this way, the data are read and/or written only once per
strategy.
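The work-domain processing described above can be sketched as follows. This
is a minimal illustration, not the BBS Kernel: time and frequency are counted
in integer time slots and channels, \texttt{kernel} is a hypothetical callable
standing in for the operation of one step, and the read/write counters only
record how often each chunk would be loaded and stored.

```python
def work_domains(n_times, n_chans, dt, df):
    """Yield (t0, t1, f0, f1) index ranges tiling n_times x n_chans samples.

    dt and df are the work-domain sizes set by the strategy; the last tile
    along each axis is clipped so the whole range is covered exactly once.
    """
    for t0 in range(0, n_times, dt):
        for f0 in range(0, n_chans, df):
            yield (t0, min(t0 + dt, n_times), f0, min(f0 + df, n_chans))


def peeling_steps(sources):
    """Three steps per source, as in the text: solve, subtract, shift."""
    steps = []
    for src in sources:
        steps += [("solve", src), ("subtract", src), ("shift", src)]
    return steps


def run_strategy(n_times, n_chans, dt, df, steps, kernel):
    """Run every step on each work domain; count chunk reads and writes."""
    io = {"reads": 0, "writes": 0}
    for domain in work_domains(n_times, n_chans, dt, df):
        io["reads"] += 1           # load this work domain into memory once
        data = {"domain": domain}  # placeholder for the in-memory visibilities
        for step in steps:
            kernel(step, data)     # every step works on the in-memory data
        io["writes"] += 1          # write residuals only after the last step
    return io
```

With 10 sources, \texttt{peeling\_steps} yields the 30 steps mentioned above,
and each work domain is read and written exactly once, however many steps the
strategy contains.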
\begin{figure}[!ht]
\centering
% ...
The calibration process is controlled by the BBS Control
subsystem. Figure~\ref{fig:bbs-control-global-design} depicts the general
control structure. The BBS Control subsystem consists of one global
controller, which acts as the main process, and multiple local controllers,
each controlling one BBS Kernel subsystem. The global controller posts one or
more commands (steps) to the Command Queue. Each local controller fetches the
next command from the Command Queue and forwards the command to the BBS Kernel
subsystem. The kernel returns parameter solutions and their quality metrics to
% ...
Since all communication takes place via the Blackboard, there is no need for
a direct connection between the BBS Control and the BBS Kernel subsystems.
The Blackboard contains all the relevant information about the current state
of the self calibration process. This information can be used by other
(external) processes to monitor the calibration process and to plot results.
See~\cite{LOFAR-ASTRON-SDD-002} for more details on the Blackboard
architecture and the roles of the controller.
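The interplay between the global controller, the Command Queue, and the local
controllers can be sketched with threads standing in for processes. This is a
hypothetical, simplified model rather than the BBS implementation: the Command
Queue is an append-only list in which every local controller keeps its own
read position (each one processes its own part of the data), \texttt{kernel}
is a placeholder for the BBS Kernel, and the global controller posts all steps
up front instead of deciding the next action from the returned solutions.

```python
import queue
import threading

STOP = object()  # sentinel command: tells local controllers to finish


class CommandQueue:
    """Append-only list of commands shared by all local controllers."""

    def __init__(self):
        self._commands = []
        self._cond = threading.Condition()

    def post(self, command):
        with self._cond:
            self._commands.append(command)
            self._cond.notify_all()  # wake controllers waiting for work

    def fetch(self, index):
        """Block until command number `index` has been posted, then return it."""
        with self._cond:
            self._cond.wait_for(lambda: index < len(self._commands))
            return self._commands[index]


def local_controller(name, commands, results, kernel):
    """Fetch each command in turn and forward it to this controller's kernel."""
    index = 0
    while True:
        command = commands.fetch(index)
        index += 1
        if command is STOP:
            return
        # The kernel's answer (e.g. solutions plus a quality metric) goes to
        # the shared results store, not directly to the global controller.
        results.put((name, command, kernel(name, command)))


def global_controller(steps, controller_names, kernel):
    """Post the steps, wait for all local controllers, return their results."""
    commands, results = CommandQueue(), queue.SimpleQueue()
    threads = [threading.Thread(target=local_controller,
                                args=(name, commands, results, kernel))
               for name in controller_names]
    for thread in threads:
        thread.start()
    for step in list(steps) + [STOP]:
        commands.post(step)
    for thread in threads:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```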
% ...
\subsubsection{BBS Database}
\label{subsubsec:interf-database}

\pagebreak
\cleardoublepage
\section{Software Design}
\label{sec:software-design}
% ...
\subsubsection{BBS Strategy}
\label{subsubsec:design-strategy}
One iteration in the so-called \emph{Major
Cycle}~\cite[sec.~4.1]{LOFAR-ASTRON-SDD-050} can be described by a BBS
Strategy. A strategy defines a relationship between the data set of a given
observation, which is stored in a measurement set, and the parameter database
holding (intermediate) values of the model parameters that will be estimated
as part of the self calibration process. At least two models are used in the
current self calibration setup: the Local Sky Model (LSM) and the Instrument
Model. The Data Selection associated with a BBS Strategy defines the selection
of the observed data that will be used for the complete strategy; here one
can, for example, specify which frequency bands, time intervals, and baselines
should be used during this self calibration run. A strategy is defined in
terms of one or more BBS Steps (see section~\ref{subsubsec:design-step}
below).
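A strategy as described here can be summarized in a small data structure. The
sketch below is hypothetical (the class and field names are not taken from the
BBS code): it records the measurement set and parameter database that the
strategy links, a data selection that applies to the whole strategy, and the
list of steps, which must not be empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataSelection:
    """Subset of the observed data used for the complete strategy."""
    frequency_bands: Optional[List[int]] = None        # None means all bands
    time_range: Optional[Tuple[float, float]] = None   # (start, end); None = all
    baselines: Optional[List[Tuple[str, str]]] = None  # station pairs; None = all

    def includes_baseline(self, station1, station2):
        """True if the baseline is selected, in either station order."""
        if self.baselines is None:
            return True
        pair = (station1, station2)
        return pair in self.baselines or pair[::-1] in self.baselines


@dataclass
class Strategy:
    """Links one observation's measurement set to the parameter database."""
    measurement_set: str
    parameter_db: str
    selection: DataSelection = field(default_factory=DataSelection)
    steps: List[str] = field(default_factory=list)

    def __post_init__(self):
        # A strategy is defined in terms of one or more steps.
        if not self.steps:
            raise ValueError("a strategy needs at least one step")
```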
\begin{figure}[!ht]
\centering
\includegraphics[width=0.5\textwidth]{images/bbs-strategy-class-diagram}
\caption{The BBS Strategy class defines the strategy to be used for the current self calibration run.}
\label{fig:bbsstrategy}
\end{figure}

\subsubsection{BBS Step}
\label{subsubsec:design-step}
A BBS Strategy is defined in terms of one or more BBS Steps. The BBS Step
class is designed according to the Composite pattern~\cite{Gamma1995}, which
means that each BBS Step can itself be made up of one or more BBS Steps. The
Composite pattern provides an easy way to define a tree-like structure. Leaf
classes, like BBS SolveStep, cannot be further subdivided; they describe one
single piece of work that can be handed over to the BBS Kernel. Currently,
there are six leaf classes in total, each defining one single piece of work.
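The Composite structure of the BBS Step classes can be sketched as follows.
Only \texttt{SolveStep} is named in the text; \texttt{Step},
\texttt{SingleStep}, \texttt{MultiStep}, and \texttt{SubtractStep} are
hypothetical names used for illustration, and the six actual leaf classes are
not reproduced here. Flattening a composite step into its leaves yields the
sequence of single pieces of work handed to the kernel.

```python
from abc import ABC, abstractmethod


class Step(ABC):
    """Common interface for single and composite steps (Composite pattern)."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def leaves(self):
        """Return the leaf steps contained in this step, in execution order."""


class SingleStep(Step):
    """Leaf: one piece of work that can be handed over to the kernel."""

    def leaves(self):
        return [self]


class SolveStep(SingleStep):
    def __init__(self, name, parameters):
        super().__init__(name)
        self.parameters = list(parameters)  # parameters to solve for


class SubtractStep(SingleStep):
    pass


class MultiStep(Step):
    """Composite: an ordered group of steps, which may themselves be groups."""

    def __init__(self, name, children=()):
        super().__init__(name)
        self.children = list(children)

    def add(self, step):
        self.children.append(step)
        return self

    def leaves(self):
        result = []
        for child in self.children:
            result.extend(child.leaves())  # recurse into nested groups
        return result
```

Because composites and leaves share the same interface, a strategy can hold a
single top-level step regardless of how deeply the steps are nested.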
\begin{figure}[!ht]
\centering
\includegraphics[width=0.8\textwidth]{images/bbs-step-class-diagram}
\caption{The BBS Step class family defines single pieces of work that can be
executed by the BBS Kernel as part of the current self calibration run.}
\label{fig:bbsstep}
\end{figure}

\subsubsection{Global Control}
% ...
\end{itemize}

\pagebreak
\cleardoublepage
% References
\bibliographystyle{unsrt}
\bibliography{lofar}

\pagebreak
\cleardoublepage
\appendix
\section{Configuration Syntax}
% ...