Commit bb4204ba, authored 15 years ago by Rob van Nieuwpoort
Bug 1198: s4 done
Parent: e3f05d67
doc/papers/2010/SPM/spm.tex (41 additions, 47 deletions)
@@ -149,6 +149,17 @@ to other instruments.
\section{Trends in radio astronomy}
%% @@@
%% It is important that the authors take a tutorial
%% oriented style and more carefully introduce the
%% application context, including radio-astronomy
%% basics, instruments that they use (including
%% installation roadmap). The algorithm is quite
%% simple and so the strength of the paper lies in
%% the thoroughness of the analysis, and the
%% aforementioned tutorial background.
%% @@@
%- signal processing takes on a more dominant role (more antennas, etc.)
%- examples: pathfinders for SKA.
%- computationally intensive, SKA even more
@@ -160,10 +171,6 @@ rely less on concrete, steel, and extreme cooling techniques, but more on
signal-processing techniques.
For example, LOFAR~\cite{Butcher:04,deVos:09} is a distributed sensor network
that combines the signals of tens of thousands of simple receiver elements.
%Unlike traditional telescopes, that typically use custom-built hardware to
%process data, LOFAR uses programmable FPGAs for on-the-field station
%processing and a Blue Gene/P supercomputer to process data centrally, in real
%time.
Also, Aperture Array tiles like Embrace~\cite{?} and Focal Plane Arrays
like Apertif~\cite{?} are novel multi-receiver concepts that require huge
amounts of processing power to combine the data from the receiving elements.
@@ -173,37 +180,20 @@ and multiple, concurrent observation directions.
%@@@ look at this later: computing advances make new signal-processing techniques and telescopes / instruments possible.
%% SKA + pathfinders: EMBRACE, LOFAR, ASKAP, MeerKAT
The signal-processing hardware technology used to process telescope
data also changes rapidly. Only a decade ago, correlators required
special-purpose ASICs to keep up with the high data rates and
processing requirements. The advent of sufficiently fast FPGAs
significantly lowered the development times and costs of
newer-generation correlators, and increased the flexibility
substantially. LOFAR requires even more flexibility to support many
different processing pipelines for various observation modes, and uses
FPGAs for on-the-field station processing and a Blue Gene/P
supercomputer to perform real-time, central processing.
Recent many-core architectures seem to be a viable complement to the aforementioned processing platforms.
GPUs provide more processing power and are more power-efficient than CPUs,
while GPUs are more flexible and easier to program than FPGAs.
Since GPUs of different vendors are mutually quite different, we did an
@@ -291,16 +281,19 @@ since we need this later in the pipeline for calibration purposes.
The autocorrelations can be computed with half the number of instructions.
We can implement the correlation operation very efficiently, with only
four fused multiply-add (fma) instructions, doing eight floating-point
operations in total. For each pair of receivers, we have to do this
four times, once for each combination of polarizations. Thus, in total
we need 32 operations. To perform these operations, we have to load
the samples generated by two different receivers from memory. As
explained above, the samples each consist of four single-precision
floating-point numbers (a real and an imaginary part for each of two
polarizations). Therefore, we need to load 8 floats or 32 bytes in
total. This results in \emph{exactly one FLOP/byte}. The number of
operations that are performed per byte that has to be loaded from main
memory is called the \emph{arithmetic intensity}~\cite{system-performance}.
For the correlation algorithm, the arithmetic intensity is extremely low.
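As a sketch of where these counts come from, consider one complex
multiply-accumulate for a single combination of polarizations. The symbols
$a$ and $b$ (the two samples) and $v$ (the accumulated result) are
illustrative names introduced only for this sketch, and we assume the common
convention that the second sample is conjugated:
\[
\begin{array}{rcl}
v & \leftarrow & v + a\,\overline{b}, \qquad a = a_r + i\,a_i, \quad b = b_r + i\,b_i, \\
\mathrm{Re}(v) & \leftarrow & \mathrm{Re}(v) + a_r b_r + a_i b_i \qquad \mbox{(two fma)}, \\
\mathrm{Im}(v) & \leftarrow & \mathrm{Im}(v) + a_i b_r - a_r b_i \qquad \mbox{(two fma, one with a negated product)}.
\end{array}
\]
Under these assumptions, the arithmetic intensity quoted above follows as
\[
\frac{4 \mbox{ polarization pairs} \times 4 \mbox{ fma} \times 2 \mbox{ FLOP/fma}}
     {2 \mbox{ samples} \times 4 \mbox{ floats} \times 4 \mbox{ bytes/float}}
= \frac{32 \mbox{ FLOP}}{32 \mbox{ bytes}} = 1 \mbox{ FLOP/byte}.
\]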
@@ -380,10 +373,11 @@ a summary of the most important similarities and differences for signal processi
\subsection{General Purpose multi-core CPU (Intel Core i7 920)}
As a reference, we implemented the correlator on a multi-core
general-purpose architecture, in this case an Intel Core~i7. The
theoretical peak performance of the system is 85~GFLOPS in single
precision. The parallelism comes from four cores with two-way
hyperthreading, and a vector length of four floats, provided by the
SSE4 instruction set.
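As a rough check on this figure, assume a clock frequency of about
2.67~GHz for the Core~i7 920 (a value not stated in this text), and assume
that each core can complete one four-wide vector multiply and one four-wide
vector add per cycle:
\[
4 \mbox{ cores} \times 4 \mbox{ floats} \times 2 \mbox{ FLOP/cycle} \times 2.67 \mbox{ GHz}
\approx 85.4 \mbox{ GFLOPS}.
\]
Under this estimate, hyperthreading does not raise the peak, because the
two hardware threads of a core share the same vector units.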
SSE4 does not provide fused multiply-add instructions, but the Core~i7
issues vector-multiply and vector-add instructions concurrently in