Lofar imaging compression pipeline
This repository contains CWL workflows used to compress Lofar imaging data.
Workflow steps
There are two workflows defined. A graphical overview is given below.
download_and_compress_pipeline.cwl
This workflow is used by the LDV framework.

The data in this pipeline is first fetched, then the 'compress' step is run. The compress step runs 'compress_pipeline.cwl', which does more than compression alone. The steps taken are described in the section below.
compress_pipeline.cwl

Part of this pipeline identifies known issues and applies corrections where possible. It uses in-place updates to avoid copying the measurement set.
NOTE: In-place updates do not work well with CWL conditional steps, so a small bash script is used that either calls the fixing script or does nothing, depending on whether the specific issue for that step was found.
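The conditional wrapper described above can be sketched roughly as follows. This is an illustrative sketch only: the function name, the fix-script name, and the argument convention are assumptions, not the repository's actual files.

```shell
#!/bin/bash
# Illustrative sketch of the conditional fix wrapper (names are hypothetical).
# CWL always runs this step; the branch replaces a CWL conditional.
apply_fix_if_needed() {
    local issue_found="$1"   # "true" if the preceding check flagged this issue
    local ms_path="$2"       # measurement set to update in place

    if [ "$issue_found" = "true" ]; then
        # hypothetical fixing script that modifies the measurement set in place
        python fix_antenna_table.py "$ms_path"
    else
        # issue not present: do nothing, so the workflow step still succeeds
        echo "issue not detected; nothing to do for $ms_path" >&2
    fi
}
```

Because the no-op branch exits successfully, the step never fails just because a given issue was absent.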
The checks that are performed are:
- abort if the target is the Sun
- run aoflagger if the input is raw data
- apply corrections based on the time of observation. These can be:
  - FIX_ANTENNA_TABLE
  - FIX_WEIGHT_SPECTRUM
  - FIX_BROKEN_TILES
  - FIX_STATION_ADDER
There are also some issues, tied to specific time ranges, that cannot be fixed. These include:
- INACCURATE_FLAGGING_LBA
- FAULTY_LBA_CALIBRATION_TABLES
- STATION_SENSITIVITY_ISSUE
- DELAY_COMPENSATION_ISSUE
The measurement set is then compressed with Dysco if it was not already compressed. After all measurement sets have been compressed, inspection plots are produced and metrics that quantify the amount of missing/flagged data are collected.
NOTE: Some plots may be skipped if the metadata required to make the plot is missing. This is often the case for "unspecified" datasets. When a plot is skipped, a warning is printed to stderr and thus shows up in the ATDB log file.
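The skip warning described above could be emitted along these lines; the function name, message text, and plot name are illustrative, not the pipeline's actual output format.

```shell
#!/bin/bash
# Illustrative sketch: report a skipped plot on stderr so it reaches the ATDB log.
# The exact wording of the real pipeline's warning may differ.
warn_plot_skipped() {
    local plot_name="$1"
    echo "WARNING: skipping plot '$plot_name': required metadata missing" >&2
}
```

Writing to stderr (rather than stdout) keeps the warning out of the step's regular output while still making it visible in the collected logs.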
Quality metrics
The output JSON containing a summary of the processing includes a set of quality flags, usually ranging between "poor", "moderate" and "good". The quality is "unknown" if the (meta)data needed to calculate the metric is missing. The quality can also be "N/A" if the metric is not relevant for the dataset; for example, a quality metric indicating data loss of critical international stations will be "N/A" if only Dutch stations are present. A detailed outline of the quality metrics and their meaning can be found on the following Confluence page: https://support.astron.nl/confluence/pages/viewpage.action?spaceKey=LDV&title=Imaging+compression+workflow+metrics+v.2
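As a hypothetical illustration of the flag values described above (the field names below are invented for this example and do not reflect the pipeline's actual schema, which is documented on the Confluence page):

```json
{
  "quality": {
    "flagged_data_fraction": "good",
    "dutch_station_dataloss": "moderate",
    "international_station_dataloss": "N/A",
    "ionospheric_activity": "unknown"
  }
}
```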
Four plots are created. The examples shown are taken from the following run: https://sdc-dev.astron.nl:5554/atdb/task_details/260179/1
Requirements
- CWL v1.2 compatible runner (e.g. cwltool/toil)
- Docker
Docker images
- git.astron.nl:5000/ldv/ldv-images/lofar-legacy:latest
- git.astron.nl:5000/ldv/ldv-images/lofar-ms-software:latest

Docker images will be fetched the first time the workflow is run and converted into a Singularity image (sif file).
Running the workflow
In the repository there are two workflows. One is capable of processing the data as described in the section Workflow steps and can be executed as follows:
```
# Run the workflow
cwltool compress_pipeline.cwl [--flag_autocorrelation] --msin MEASUREMENT_SET
```
The other workflow, which is meant to be executed by the LDV infrastructure, takes a SURL link to the data as input instead of a measurement set. It can be executed with the command:
```
# Run the workflow
cwltool download_and_compress_pipeline.cwl [--flag_autocorrelation] --surls [list of surl to process]
```
License
See LICENSE