Tensor-Core Correlator
The Tensor-Core Correlator is a GPU library that exploits the tensor cores of modern NVIDIA GPUs to compute cross/auto correlations 5-10 times more efficiently than regular GPU cores. Its primary use is to combine the signals of (many) receivers of a radio telescope. The library can be used in any FX correlator, but is not a full correlator application: it only computes the correlations. The rest of the application should take care of I/O, filtering, etc. For more information, see the paper (John W. Romein, The Tensor-Core Correlator, Astronomy and Astrophysics, 656(A32), pages 1-4, December 2021).
Brief overview on how to use the Tensor-Core Correlator library:
Clone the and build the repository, see below.
Include libtcc/Correlator.h
, and link with libtcc/libtcc.so
.
Create a tcc::Correlator
object with the number of receivers, channels, etc.
as arguments; this will automatically compile the CUDA code (at runtime).
Use the launchAsync method to correlate a block of samples; you must make
sure that the samples data is already in device memory.
The TCC adheres to RAII: any error will result in the failure to create
an tcc::Correlator()
object (and throw some explanatory exception).
test/SimpleExample/SimpleExample.cu
illustrates how the TCC library can be used.
The TCC internally uses wrappers around the CUDA driver API (util/cu.h
) and
the NVRTC library (util/nvrth.h
). The rest of the correlator code can use
these wrappers as well, use the CUDA driver API directly, use the CUDA
runtime API, or the OpenCL environment. See: test/SimpleExample/SimpleExample.cu
on how to use the CUDA runtime API; test/CorrelatorTest/CorrelatorTest.cc
on
how to use the CUDA driver API (wrappers); and
test/OpenCLCorrelatorTest/OpenCLCorrelatorTest.cc
on how to use TCC in an
OpenCL program. test/CorrelatorTest/CorrelatorTest.cc
is a much more versatile,
robust (and complex) example than test/SimpleExample/SimpleExample.cu
.
Input and output data types are defined as follows:
#if NR_BITS == 4
typedef complex_int4_t Sample;
typedef std::complex<int32_t> Visibility;
#elif NR_BITS == 8
typedef std::complex<int8_t> Sample;
typedef std::complex<int32_t> Visibility;
#elif NR_BITS == 16
typedef std::complex<__half> Sample;
typedef std::complex<float> Visibility;
#endif
#define NR_TIMES_PER_BLOCK (128 / NR_BITS)
typedef Sample Samples[NR_CHANNELS][NR_SAMPLES_PER_CHANNEL / NR_TIMES_PER_BLOCK][NR_RECEIVERS][NR_POLARIZATIONS][NR_TIMES_PER_BLOCK];
typedef Visibility Visibilities[NR_CHANNELS][NR_BASELINES][NR_POLARIZATIONS][NR_POLARIZATIONS];
Note that in 4-bit and 8-bit mode, the input samples may not contain -8 or -128
respectively, as these values cannot be conjugated properly.
The input data type (Samples
) is a weird format, but this seemed to be the only
format that yields good performance (tensor cores are very unforgiving).
Limitations:
-
NR_POLARIZATIONS
must be 2 -
NR_BITS
must be 4, 8, or 16 - the amount of samples over which is integrated) must be a multiple of 128 /
NR_BITS
(i.e., 32, 16, or 8 for 4-bit, 8-bit, or 16-bit input, respectively).
Building, testing, and installation
Clone the repository:
git clone https://git.astron.nl/RD/tensor-core-correlator.git
To build and install the project, run:
cmake -S . -B build
make -C build
make -C build install
To install in a custom location, e.g. ~/.local
, run:
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$HOME/.local
make -C build
make -C build install
To compile and run the tests, run:
cmake -S. -B build -DBUILD_TESTING=ON
make -C build
make -C build test
The tests require a GPU. On the DAS-6/ASTRON cluster you can request a GPU node and run the tests with the command:
srun -N 1 --gres=gpu:A4000:1 make -C build test
Note that in the command above a node with a NVIDIA A4000 GPU is requested, because the tests require a GPU that has tensor cores.
Example usage
The example
subdirectory has a minimal example that demonstrates how this
library can be integrated in another project. This example assumes that you
pre-installed both this library (libtcc
) and cudawrappers
. E.g. when
libtcc
is installed in <prefix>/libtcc
and cudawrappers
is installed in
<prefix>/cudawrappers
, you can build the example by running:
cmake . -DCMAKE_PREFIX_PATH="<prefix>/cudawrappers;<prefix>/tcc"
make
./example
Bugs/feedback
Contact John Romein (romein@astron.nl) to report bugs/feedback