Commit c3466fb2 authored by alex's avatar alex

Resolve RAP-202

.. _pipeline_overview_old:
Pipeline overview
=================
.. note::
These instructions are outdated and only valid for **prefactor** 3.2 or older. Please check the :doc:`recent instructions page<pipelineoverview>`.
**Prefactor** is organized in several major parts to process **LOFAR** data:
.. image:: prefactor_workflow_sketch.png
``Pre-Facet-Calibrator``
Processes the (amplitude-)calibrator to derive direction-independent corrections. See :ref:`calibrator_pipeline_old` for details.
``Pre-Facet-Target``
Transfers the direction-independent corrections to the target and does direction-independent calibration of the target. See :ref:`target_pipeline_old` for details.
``Concatenate``
Concatenates the single-subband target data retrieved from the LTA into bands suitable for further processing with the initial-subtract pipeline. See :ref:`concatenate_pipeline` for details.
``Initial-Subtract``
Images the full FoV (and 1st side-lobe), generating a sky-model and subtracting it from the visibilities. See :ref:`initsubtract_pipeline` for details.
``Pre-Facet-Image``
Images the full FoV using the full bandwidth. See :ref:`image_pipeline` for details.
.. _data_preparation:
Preparing the data
------------------
**Prefactor** requires **LOFAR LBA** or **HBA** raw or pre-processed data. These data are
typically obtained from the LOFAR `Long-Term Archive`_.
- The calibrator and target data have to match, i.e., they must have been observed close enough
in time that calibration values can be transferred.
- For each observation you should process all of the calibrator data together:
clock/TEC separation and flagging of bad amplitudes work better with
the full bandwidth.
- For the target pipeline you will need internet access from the machine on which you are running **prefactor**.
It is required in order to retrieve RM values from `CODE`_ and a global sky model (`TGSS`_ or `GSM`_), both of which are hosted as online services.
It is also possible to provide your own target sky model to **prefactor** (using the parameters ``target_skymodel`` and ``use_target``; see the :doc:`target<target>` pipeline parameter information).
.. note::
Processing of interleaved datasets is not currently supported.
**prefactor** cannot handle multi-epoch observations at once.
For older versions of **prefactor**, all input measurement sets for one pipeline run need to be in the same directory.
.. _CODE: ftp://ftp.aiub.unibe.ch/CODE/
.. _TGSS: http://tgssadr.strw.leidenuniv.nl/doku.php
.. _GSM: http://172.104.228.177/
.. _Long-Term Archive: https://lta.lofar.eu
.. _runprefactor:
Starting a pipeline
=====================
.. note::
If you are running the deprecated genericpipeline version of the pipeline (**prefactor** 3.2 or older), please check the :doc:`old instructions page<running_old>`.
Once you have the data and the input JSON file ready, you can run the pipeline, e.g., with ``cwltool`` or ``toil`` for the HBA calibrator pipeline::
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json
$ toil-cwl-runner <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json
where ``prefactor.json`` is the input JSON file as described in the chapter :doc:`parset` and ``<install_dir>`` is the location of the prefactor CWL description files.
.. note::
Instead of specifying all options in ``prefactor.json``, the user can also use command-line options to override the defaults.
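For example, inputs can also be passed directly on the command line instead of via a JSON file (a minimal sketch only; the measurement-set paths are placeholders, and ``msin`` is the input-data parameter described in :doc:`parset`)::
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl \
    --msin /data/L123456/L123456_SB000_uv.MS \
    --msin /data/L123456/L123456_SB001_uv.MS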
For standard LOFAR observations there are workflows available with pre-defined parameters (defaults) for **HBA** and **LBA** observations:
============================ ====================== ======================
**prefactor workflow**       **HBA**                **LBA**
---------------------------- ---------------------- ----------------------
``prefactor_calibrator.cwl`` ``HBA_calibrator.cwl`` ``LBA_calibrator.cwl``
``prefactor_target.cwl``     ``HBA_target.cwl``     ``LBA_target.cwl``
============================ ====================== ======================
.. note::
The **LBA** workflows are not (yet) available.
Pipeline options for ``cwltool``
--------------------------------
The following ``<cwl_options>`` are recommended when running **prefactor** with ``cwltool``:
* ``--outdir``: specifies the location of the pipeline outputs
* ``--tmpdir-prefix``: specifies the location of the intermediate data products
* ``--parallel``: run jobs in parallel
* ``--no-container``: do not use a Docker container (only for manual installation)
* ``--preserve-entire-environment``: use system environment variables (only for manual installation)
* ``--debug``: more verbose output (use only for debugging the pipeline)
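Putting these options together, a typical ``cwltool`` call might look like this (a sketch; the directories are placeholders)::
$ cwltool --outdir /data/prefactor/results \
    --tmpdir-prefix /data/prefactor/tmp/ \
    --parallel \
    <install_dir>/workflows/HBA_calibrator.cwl prefactor.json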
.. note::
``cwltool`` has no option to resume a failed/crashed run. If you need this feature, have a look at ``toil``.
While the pipeline runs, the terminal will show the current state of the pipeline. It is also possible to redirect this output to a runtime logfile::
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json > logfile 2>&1
In the directory specified via ``--tmpdir-prefix`` all temporary folders and files are generated. At the end of the run these files can be deleted.
Pipeline options for ``toil``
--------------------------------
The following ``<cwl_options>`` are recommended when running **prefactor** with ``toil``:
* ``--workDir``: specifies the location of the intermediate data products
* ``--outDir``: specifies the location of the final data products
* ``--jobStore``: location of the job store ("statefile")
* ``--writeLogs``: location of the pipeline job logfiles
* ``--logFile``: location of the main pipeline logfile
* ``--logLevel``: can be **CRITICAL**, **ERROR**, **WARNING**, **INFO** or **DEBUG**
* ``--batchSystem``: use a specific batch system of an HPC cluster or similar, e.g. ``slurm`` or ``single_machine``
* ``--stats``: creates runtime statistics
* ``--maxLocalJobs``: number of local jobs to run at the same time ("max_per_node")
* ``--retryCount``: number of retries for each failed pipeline job
* ``--preserve-entire-environment``: use system environment variables (only for manual installation)
* ``--no-container``: do not use a Docker container (only for manual installation)
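Putting these options together, a typical ``toil`` call might look like this (a sketch; the paths and values are placeholders)::
$ toil-cwl-runner --batchSystem single_machine \
    --jobStore /data/prefactor/jobstore \
    --workDir /data/prefactor/tmp \
    --retryCount 2 \
    --logFile /data/prefactor/pipeline.log \
    <install_dir>/workflows/HBA_calibrator.cwl prefactor.json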
Stopping and restarting the pipeline
------------------------------------
You can stop a pipeline run at any time by terminating the CWL process (typically by pressing CTRL-C in the terminal where you started it).
Sometimes some of the processes that the pipeline started do not get properly terminated, so if the
CWL runner process does not terminate you should look for its child
processes and terminate them too.
.. note::
If you stop and re-start pipelines a number of times, you should also
check occasionally whether there are orphaned child processes that are eating up
resources on your computer.
If you are using ``toil``, you can restart a pipeline by adding the parameter ``--restart`` to the command line. If you want to start from scratch, delete the directory created via ``--jobStore`` and all intermediate data products (usually located under the ``--workDir`` directory); after that you will start with a clean run. As mentioned earlier, you re-start the pipeline by running the same command with which you started it.
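A restart might then look like this (a sketch; the job store path must be the one used for the original run)::
$ toil-cwl-runner --restart --jobStore /data/prefactor/jobstore \
    <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json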
Pipeline crashes
----------------
With ``cwltool`` a pipeline crash is reported with this message::
WARNING Final process status is permanentFail
In order to figure out at which step the pipeline failed you can search for the term ``permanentFail`` in the ``toil`` or ``cwltool`` logfile::
$ grep "permanentFail" logfile
WARNING [job compare_station_list] completed permanentFail
WARNING [step compare_station_list] completed permanentFail
INFO [workflow prep] completed permanentFail
WARNING [step prep] completed permanentFail
INFO [workflow prefactor] completed permanentFail
WARNING [step prefactor] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
With that information it is possible to identify the failed step, ``compare_station_list``. To find the corresponding part of the logfile where the step was launched, search for ``[job compare_station_list]``.
It is usually best to also check all lines starting with ``ERROR`` to get additional information about the possible cause of the crash, as well as diagnostic messages that tell you what exactly went wrong. See :ref:`help` for tips on interpreting the error messages.
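For example, assuming the runtime log was redirected to ``logfile`` as shown above::
$ grep -n "ERROR" logfile
$ grep -n "\[job compare_station_list\]" logfile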
If you identify the problem and it does not affect the products that have
already been produced, you can launch the pipeline again after correcting the issue
that caused the process to stop. In most cases, however, it might be necessary to fully re-start the pipeline.
.. _runprefactor_old:
Starting a pipeline
-------------------
.. note::
These instructions are outdated and only valid for **prefactor** 3.2 or older. Please check the :doc:`recent instructions page<running>`.
Once you have the data and the parsets ready, you can run the pipeline using the
genericpipeline script, e.g.::
$ genericpipeline.py -d -c pipeline.cfg My_prefactor_calibrator.parset
.. note::
The ``-d`` option is recommended: it makes the log files extremely large
(many megabytes), but without it the important information about why a
pipeline run failed is often not included.
While the pipeline runs, new files are generated in the specified ``runtime_directory`` (see previous
section), in a directory named after the parset (e.g.,
if you are running ``My_prefactor_calibrator.parset``, a directory named
``My_prefactor_calibrator`` will appear in your ``runtime_directory``)::
$ ls My_prefactor_calibrator/
logs mapfiles parsets statefile
The ``logs`` directory contains all the logs of the pipeline runs, identified by the date
and time of execution, e.g.::
$ ls My_prefactor_calibrator/logs/2016-06-30T15:07:21/pipeline.log
These contain all the output from the processes the pipeline called as well as
diagnostic information about the pipeline, so they are useful for following the
status of the run and identifying reasons why a process crashed.
While running, the pipeline writes a statefile in the ``runtime_directory`` that records all
the steps which were successfully executed. If the pipeline stops for whatever
reason, you can re-run the same command and it will skip all the steps that are
already done and only work on those which are still missing.
The intermediate data files of the pipeline are written in the ``working_directory``
specified in the ``pipeline.cfg``.
Stopping and restarting the pipeline
------------------------------------
You can stop a pipeline run at any time by terminating the genericpipeline process
(typically by pressing CTRL-C in the terminal where you started it). Sometimes some of
the processes that the pipeline started do not get properly terminated, so if the
genericpipeline process does not terminate you should look for its child
processes and terminate them too.
.. note::
If you stop and re-start pipelines a number of times, you should also
check occasionally whether there are orphaned child processes that are eating up
resources on your computer.
As mentioned earlier, you can re-start the pipeline by running the same command
with which you started it.
Pipeline crashes
----------------
It can happen that the pipeline stops with a message like this::
ERROR genericpipeline: LOFAR Pipeline finished unsuccesfully.
WARNING genericpipeline: recipe genericpipeline completed with errors
You need to read the log of that run to identify the reason why it stopped, e.g.::
$ less My_prefactor_calibrator/logs/2016-06-30T15:07:21/pipeline.log
It is usually best to first check at the end of the file for what ended the
pipeline and then search from the beginning of the file for error or diagnostic
messages that tell you what exactly went wrong. See :ref:`help` for tips on
interpreting the error messages.
If you identify the problem and it does not affect the products that have been
already produced, you can launch the pipeline again, after correcting the issue
causing the process to stop.
Rerunning parts of the pipeline
--------------------------------
You can fully rerun a pipeline by deleting the runtime and working directories and restarting the pipeline.
To rerun parts of the pipeline that were (allegedly) already executed
successfully, you need to modify the ``statefile`` of the pipeline. To do this
there is a ``statefile_manipulation.py`` script as part of prefactor::
$ python prefactor/bin/statefile_manipulation.py My_Workdir/My_calibrator_job/statefile
If you then run the pipeline again, it will start at the step that you removed with the statefile manipulation tool.
.. _target_pipeline:
Target pipeline
===============
.. note::
If you are running the deprecated genericpipeline version of the pipeline (**prefactor** 3.2 or older), please check the :doc:`old instructions page<target_old>`.
This pipeline processes the target data in order to apply the direction-independent corrections from the calibrator pipeline. A first initial direction-independent self-calibration of the target field is performed, using a global sky model based on the `TGSS ADR`_ or the new `Global Sky Model`_ (GSM), and applied to the data.
This chapter will present the specific steps of the target pipeline in more detail.
All results (diagnostic plots and calibration solutions) are usually stored in the ``--outdir`` directory specified in your ``cwltool`` or ``toil`` command.
.. image:: targetscheme.png
Prepare target, incl. "demixing" (``prep``)
-------------------------------------------
This part of the pipeline prepares the target data in order to be calibration-ready for the first direction-independent phase-only self-calibration against a global sky model.
This mainly includes mitigation of bad data (RFI, bad antennas, contaminations from A-Team sources), selection of the data to be calibrated (usually Dutch stations only), and some averaging to reduce data size and enhance the signal-to-noise ratio.
Furthermore, ionospheric Rotation Measure corrections are applied, using `RMextract`_.
The user can specify whether to do raw data or pre-processed data flagging and whether demixing should be performed.
The basic workflows are:
- preparation of data (``prep``)
- concatenating and phase-only self-calibration against a global sky model (``gsmcal``)
- creating the final calibrated data set by applying the self-calibration solutions and compressing the data (``finalize``)
The workflow ``prep`` consists of:
- checking for a potential station mismatch between the calibrator solutions and the target data (step ``compare_station_list``)
- checking for nearby A-Team sources (step ``check_Ateam_separation``)
- creating a model of A-Team sources to be subtracted (step ``make_sourcedb_ateam``)
- getting ionospheric Rotation Measure corrections and adding them to the solutions (step ``createRMh5parm``)
.. image:: RMextract.png
- basic flagging, applying solutions, and averaging (subworkflow ``ndppp_prep_target``)
- edges of the band (``flagedge``) -- only used if ``raw_data : true``
- statistical flagging (``aoflag``) -- only used if ``raw_data : true``
- baseline flagging (``flagbaseline``)
- low elevation flagging (below 15 degrees elevation) (``flagelev``)
- low amplitude flagging (below 1e-30) (``flagamp``)
- demixing of A-Team sources (``demix``) -- only used if ``demix : true`` is specified
- applying calibrator solutions (steps ``applyPA``, ``applyBandpass``, ``prep_target_applycal``)
- averaging of the data in time and frequency
- predicting the contribution of A-Team sources and writing it to the ``MODEL_DATA`` column (step ``predict``)
- clipping time- and frequency chunks that are likely to be affected by A-Team sources (step ``Ateamclipper``)
Phase-only self-calibration (``gsmcal``)
-----------------------------------------
These steps aim at deriving a good first guess for the phase correction in the direction of the phase center (direction-independent phase correction).
Once this is done, the data is ready for further processing with direction-dependent calibration techniques, using software like `Rapthor`_, `factor`_ or `killMS`_.
The phase solutions derived from the ``gsmcal`` workflow are collected and loaded into **LoSoTo** to provide diagnostic plots:
- ``ph_freq??``: matrix plot of the phase solutions versus time for a particular chunk of target data, where both polarizations are color-coded
.. image:: ph_freq.png
- ``ph_poldif_freq??``: matrix plot of the XX-YY phase solutions with time for a particular chunk of target data
.. image:: ph_poldif_freq.png
- ``ph_pol??``: matrix plot of the phase solutions for the XX and YY polarization
.. image:: ph_polXX.png
- ``ph_poldif``: matrix plot of the phase solutions for the XX-YY polarization
.. image:: ph_poldif.png
The workflow ``gsmcal`` consists of:
- retrieving and creating a global sky model (steps ``find_skymodel_target``, ``make_sourcedb_target``)
- identification of fully flagged antennas (step ``identify_bad_antennas``)
- concatenating the data into chunks (subworkflow ``concat``)
- wide-band statistical flagging (steps ``ms_concat`` and ``aoflag``)
- checking for bad data chunks (step ``check_unflagged_fraction``)
- performing the self-calibration against the global sky model (subworkflow ``calibrate_target``), with baseline-dependent smoothing (step ``BLsmooth``) if ``do_smooth : true`` is specified
Finalizing the prefactor output (``finalize``)
----------------------------------------------
These steps produce the final data output and many helpful diagnostics.
The workflow ``finalize`` consists of:
- adding missing stations to the solution set with zero phase and unit amplitude (for international stations, step ``add_missing_stations``)
- applying the phase-only self-calibration solutions to the data and compressing them (step ``apply_gsmcal``)
- deriving the structure function of the phases (step ``structure_function``)
- making a fast image of the target field (steps ``average`` and ``wsclean``)
- creating plots of the ``uv``-coverage of the final data set (step ``uvplot``)
- creating a summary file (step ``summary``)
The last step also incorporates full `Dysco`_ compression to save disk space. The fully calibrated data is stored in the DATA column of the final data set.
.. note::
All solutions are written in the h5parm file format via the ``H5parm_collector`` steps called during all the workflows.
The solutions are stored in the final calibrator solution set ``results/cal_values/cal_solutions.h5``.
Further diagnostics
-------------------
The ``results`` directory will contain all relevant outputs of the current **prefactor** run, once the pipeline has finished:
- fully calibrated data sets
- logfiles in ``results/logs``
- summary file (JSON format) in ``results/summary``
- calibration solutions in ``results/cal_values/cal_solutions.h5``
- inspection plots in ``results/inspection``
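Once the run has finished, listing the ``results`` directory should therefore show something like this (a sketch; the names of the calibrated measurement sets depend on your observation)::
$ ls results/
cal_values  inspection  logs  summary  <calibrated measurement sets>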
The following diagnostics help to assess the quality of the data reduction:
- ``Ateam_separation.png``: shows the distance and the elevation of A-Team sources with respect to the analyzed observation
.. image:: Ateam_separation.png
- ``Ateamclipper.png``: fraction of data flagged due to potential contamination from A-Team sources versus frequency
.. image:: Ateamclipper.png
- ``unflagged_fraction.png``: fraction of remaining unflagged data versus frequency
.. image:: unflagged_fraction.png
- ``??_uv-coverage_uvdist.png``: fraction of remaining unflagged data versus ``uv``-distance
.. image:: uv-coverage_uvdist.png
- ``??_uv_coverage.png``: the ``uv``-coverage of the final data set
.. image:: uv-coverage.png
- ``??_structure.png``: plot of the ionospheric `structure function`_ of the processed target field
.. image:: structure.png
- ``??-MFS-image.fits``: FITS image of the target field
.. image:: target_field.png
You can also check the calibration solutions for more details::
$ losoto -i results/cal_values/cal_solutions.h5
Summary of results/cal_values/cal_solutions.h5
Solution set 'calibrator':
==========================
Directions: 3c286
Stations: CS001HBA0 CS001HBA1 CS002HBA0 CS002HBA1
CS003HBA0 CS003HBA1 CS004HBA0 CS004HBA1
CS005HBA0 CS005HBA1 CS006HBA0 CS006HBA1
CS007HBA0 CS007HBA1 CS011HBA0 CS011HBA1
CS017HBA0 CS017HBA1 CS021HBA0 CS021HBA1
CS024HBA0 CS024HBA1 CS026HBA0 CS026HBA1
CS028HBA0 CS028HBA1 CS030HBA0 CS030HBA1
CS031HBA0 CS031HBA1 CS032HBA0 CS032HBA1
CS101HBA0 CS101HBA1 CS103HBA0 CS103HBA1
CS201HBA0 CS201HBA1 CS301HBA0 CS301HBA1
CS302HBA0 CS302HBA1 CS401HBA0 CS401HBA1
CS501HBA0 CS501HBA1 RS106HBA RS205HBA
RS208HBA RS210HBA RS305HBA RS306HBA
RS307HBA RS310HBA RS406HBA RS407HBA
RS409HBA RS503HBA RS508HBA RS509HBA
Solution table 'bandpass' (type: amplitude): 120 times, 372 freqs, 60 ants, 2 pols
Flagged data: 0.000%
Solution table 'clock' (type: clock): 120 times, 60 ants
Flagged data: 0.000%
Solution table 'faraday' (type: rotationmeasure): 60 ants, 120 times
Flagged data: 0.014%
Solution table 'polalign' (type: phase): 120 times, 60 ants, 1484 freqs, 2 pols
Flagged data: 0.000%
Solution set 'target':
======================
Directions: P000+00
Stations: CS001HBA0 CS001HBA1 CS002HBA0 CS002HBA1
CS003HBA0 CS003HBA1 CS004HBA0 CS004HBA1
CS005HBA0 CS005HBA1 CS006HBA0 CS006HBA1
CS007HBA0 CS007HBA1 CS011HBA0 CS011HBA1
CS017HBA0 CS017HBA1 CS021HBA0 CS021HBA1
CS024HBA0 CS024HBA1 CS026HBA0 CS026HBA1
CS028HBA0 CS028HBA1 CS030HBA0 CS030HBA1
CS031HBA0 CS031HBA1 CS032HBA0 CS032HBA1
CS101HBA0 CS101HBA1 CS103HBA0 CS103HBA1
CS201HBA0 CS201HBA1 CS301HBA0 CS301HBA1
CS302HBA0 CS302HBA1 CS401HBA0 CS401HBA1
CS501HBA0 CS501HBA1 RS106HBA RS205HBA
RS208HBA RS210HBA RS305HBA RS306HBA
RS307HBA RS310HBA RS406HBA RS407HBA
RS409HBA RS503HBA RS508HBA RS509HBA
Solution table 'RMextract' (type: rotationmeasure): 60 ants, 119 times
Flagged data: 0.000%
Solution table 'TGSSphase' (type: phase): 3446 times, 58 ants, 1 freq, 2 pols
Flagged data: 0.000%
History: 2021-07-30 11:25:44: Bad stations 'CS006HBA1', 'CS006HBA0' have not been added
back.
For an overall summary it is advised to check the summary logfile::
$ cat results/logs/???_summary.log
*****************************************
*** prefactor target pipeline summary ***
*****************************************
Field name: P000+00
User-specified baseline filter: [CR]S*&
Additional antennas removed from the data: CS006HBA1, CS006HBA0
A-Team sources close to the phase reference center: NONE
XX diffractive scale: 4.4 km
YY diffractive scale: 4.0 km
Changes applied to cal_solutions.h5:
2021-07-30 11:25:44: Bad stations 'CS006HBA1', 'CS006HBA0' have not been added back.
Amount of flagged solutions per station and solution table:
Station bandpass clock faraday polalign RMextract TGSSphase
CS001HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS001HBA1 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS002HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS002HBA1 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS003HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS003HBA1 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS004HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS004HBA1 6.05% 0.00% 0.00% 0.00% 0.00% 0.05%
CS005HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS005HBA1 0.39% 0.00% 0.00% 0.00% 0.00% 0.00%
CS006HBA0 0.29% 0.00% 0.00% 0.00% 0.00%
CS006HBA1 0.29% 0.00% 0.00% 0.00% 0.00%
Amount of flagged data per station at a given state:
Station initial prep Ateam final
CS001HBA0 5.13% 5.41% 11.12% 22.74%
CS001HBA1 5.13% 5.41% 11.03% 22.51%
CS002HBA0 5.12% 5.39% 11.39% 23.18%
CS002HBA1 5.12% 5.40% 21.09% 29.95%
CS003HBA0 5.12% 5.39% 9.92% 22.58%
CS003HBA1 5.12% 5.40% 11.37% 23.95%
CS004HBA0 5.12% 5.40% 13.27% 24.62%
CS004HBA1 5.12% 5.40% 12.24% 23.53%
CS005HBA0 5.12% 5.40% 11.59% 23.38%
CS005HBA1 5.12% 15.36% 20.07% 30.09%
CS006HBA0 100.00% 100.00% 100.00%
CS006HBA1 100.00% 100.00% 100.00%
**********
Summary file is written to: ???_prefactor_target_summary.json
Summary has been created.
User-defined parameter configuration
------------------------------------
**Parameters you will need to adjust**
*Location of the target data and calibrator solutions*
- ``msin``: location of the input target data; for instructions see the :doc:`configuration instructions<parset>` page
- ``cal_solutions``: location of the calibrator solutions; for instructions see the :doc:`configuration instructions<parset>` page (a minimal example is sketched below)
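A minimal ``prefactor.json`` covering just these two parameters might look like this (a sketch only; the paths are placeholders and the authoritative format is described on the :doc:`configuration instructions<parset>` page)::
$ cat > prefactor.json <<EOF
{
    "msin": [
        {"class": "Directory", "path": "/data/L123456/L123456_SB000_uv.MS"},
        {"class": "Directory", "path": "/data/L123456/L123456_SB001_uv.MS"}
    ],
    "cal_solutions": {"class": "File", "path": "/data/L123456/cal_solutions.h5"}
}
EOF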
**Parameters you may need to adjust**
*Data selection and calibration options*
- ``refant``: regular expression of the stations that are allowed to be selected as a reference antenna by the pipeline (default: ``CS00.*``)
- ``flag_baselines``: DP3-compatible pattern for baselines or stations to be flagged (may be an empty list, i.e.: ``[]`` )
- ``process_baselines_target``: performs A-Team-clipping/demixing and direction-independent phase-only self-calibration only on these baselines. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations (default: ``[CR]S*&``)
- ``filter_baselines``: selects only this set of baselines to be processed. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations (default: ``[CR]S*&``)
- ``do_smooth``: enable or disable baseline-based smoothing (default: ``false``)
- ``rfistrategy``: strategy to be applied with the statistical flagger (`AOFlagger`_, default: ``HBAdefault.rfis``)
- ``min_unflagged_fraction``: minimal fraction of unflagged data to be accepted for further processing of the data chunk (default: 0.5)
- ``raw_data``: use autoweight, set to True in case you are using raw data (default: ``false``)
- ``compression_bitrate``: defines the bitrate of `Dysco`_ compression of the data after the final step, choose 0 if you do NOT want to compress the data
- ``propagatesolutions``: use already derived solutions as initial guess for the upcoming time slot
- ``apply_tec``: apply TEC solutions from the calibrator (default: ``false``)
- ``apply_clock``: apply clock solutions from the calibrator (default: ``true``)
- ``apply_phase``: apply full phase solutions from the calibrator (default: ``false``)
- ``apply_RM``: apply ionospheric Rotation Measure from `RMextract`_ (default: ``true``)
- ``apply_beam``: apply element beam corrections (default: ``true``)
- ``gsmcal_step``: type of calibration to be performed in the self-calibration step (default: ``phase``)
- ``updateweights``: update the ``WEIGHT_SPECTRUM`` column in a way consistent with the weights being inversely proportional to the autocorrelations (default: ``true``)
- ``use_target``: enable downloading of a target skymodel (default: ``true``)
- ``skymodel_source``: choose the target skymodel from `TGSS ADR`_ or the new `Global Sky Model`_ (GSM) (default: ``TGSS``)
A comprehensive explanation of the baseline selection syntax can be found `here`_.
*Demixing and clipping options*
- ``demix_sources``: choose sources to demix (provided as list), e.g., ``[CasA,CygA]``
- ``demix_target``: if given, the target source model (its patch in the SourceDB) is taken into account when solving (default: ``""``)
- ``demix_freqstep``: number of channels to average when demixing (default: 16)
- ``demix_timestep`` : number of time slots to average when demixing (default: 10)
- ``demix``: enable demixing (default: ``false``)
- ``clip_sources``: list of the skymodel patches to be used for A-Team clipping (default: ``[VirA_4_patch,CygAGG,CasA_4_patch,TauAGG]``)
*Further pipeline options*
- ``min_separation``: minimal accepted distance to an A-team source on the sky in degrees (will raise a WARNING, default: ``30``)
**Parameters for pipeline performance**
- ``max_dppp_threads``: number of threads per process for DP3 (default: 10)
- ``memoryperc``: maximum amount of memory used for aoflagger in raw_flagging mode, in percent (default: 20)
- ``min_length``: minimum number of subbands to concatenate in frequency necessary to perform the wide-band flagging in RAM. If the data is too big, aoflagger will use indirect reads (default: 50)
- ``overhead``: only use this fraction of the available memory for deriving the amount of data to be concatenated (default: 0.8)
*Skymodel directory*
- ``A-Team_skymodel``: location of the prefactor A-Team skymodels
- ``target_skymodel``: location of a user-defined target skymodel used for the self-calibration
*Averaging for the calibrator data*
- ``avg_timeresolution``: intermediate time resolution of the data in seconds after averaging (default: 4)
- ``avg_freqresolution`` : intermediate frequency resolution of the data after averaging (default: 48.82kHz, which translates to 4 channels per subband)
- ``avg_timeresolution_concat``: final time resolution of the data in seconds after averaging and concatenation (default: 8)
- ``avg_freqresolution_concat``: final frequency resolution of the data after averaging and concatenation (default: 97.64kHz, which translates to 2 channels per subband)
- ``num_SBs_per_group``: make concatenated measurement-sets with that many subbands, choose a high number if running LBA (default: 10)
*Concatenating of the target data*
- ``num_SBs_per_group``: make concatenated measurement-sets with that many subbands (default: 10)
- ``reference_stationSB``: station-subband number to use as reference for grouping (default: ``None`` -> use lowest frequency input data as reference)
*RMextract settings*
- ``ionex_server``: URL of the *IONEX* server (default: ``"ftp://ftp.aiub.unibe.ch/CODE/"``)
- ``ionex_prefix``: the prefix of the *IONEX* files (default: ``CODG``)
- ``proxy_server``: specify URL or IP of proxy server if needed
- ``proxy_port``: port of proxy server if needed
- ``proxy_user``: user name of proxy server if needed
- ``proxy_pass``: password of proxy server if needed
In case of **LBA** observations you might also want to enable demixing (``demix: true``).
If your **LBA** data has **not** been demixed before you may still want to keep the A-Team-clipping.
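The corresponding entries in ``prefactor.json`` might then look like this (a sketch; the source list and averaging values are only placeholders based on the defaults listed above)::
"demix": true,
"demix_sources": ["CasA", "CygA"],
"demix_timestep": 10,
"demix_freqstep": 16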
.. _structure function: https://ui.adsabs.harvard.edu/abs/2016RaSc...51..927M/abstract
.. _Rapthor: https://github.com/darafferty/rapthor
.. _Global Sky Model: https://lcs165.lofar.eu/
.. _RMextract: https://github.com/lofar-astron/RMextract/
.. _factor: https://github.com/lofar-astron/factor/
.. _killMS: https://github.com/saopicc/killMS/
.. _TGSS ADR: http://tgssadr.strw.leidenuniv.nl/
.. _Dysco: https://github.com/aroffringa/dysco/
.. _AOFlagger: https://gitlab.com/aroffringa/aoflagger.git
.. _here: https://www.astron.nl/lofarwiki/doku.php?id=public:user_software:documentation:ndppp#description_of_baseline_selection_parameters
.. _target_pipeline_old:
Target pipeline
===============
.. note::
These instructions are outdated and only valid for **prefactor** 3.2 or older. Please check the :doc:`recent instructions page<target>`.
This pipeline processes the target data in order to apply the direction-independent corrections from the calibrator pipeline (line 26). A first initial direction-independent self-calibration of the target field is performed, using a global sky model based on the `TGSS ADR`_ or the new Global Sky Model (GSM), and applied to the data.
You will find the single steps in the parameter ``pipeline.steps`` in line 99.
This chapter will present the specific steps of the target pipeline in more detail.
All results (diagnostic plots and calibration solutions) are usually stored in a subfolder of the results directory, see ``inspection_directory`` (line 71) and ``cal_values_directory`` (line 72), respectively.
Prepare target (incl. "demixing")
---------------------------------
This part of the pipeline prepares the target data in order to be calibration-ready for the first direction-independent phase-only self-calibration against a global sky model.
This mainly includes mitigation of bad data (RFI, bad antennas, contaminations from A-Team sources), selection of the data to be calibrated (usually Dutch stations only), and some averaging to reduce data size and enhance the signal-to-noise ratio.
Furthermore, ionospheric Rotation Measure corrections are applied, using `RMextract`_.
The user can specify whether to do raw data or pre-processed data flagging and whether demixing should be performed.
The basic steps are:
- mapping of data to be used (``createmap_target``)
- copying h5parm solution set from the calibrator (``copy_cal_sols``)
- gathering RM satellite information and writing it into h5parm (``h5imp_RMextract``)
.. image:: RMextract.png
- creating a model of A-Team sources to be subtracted (``make_sourcedb_ateam``)
- check of any missing solutions for the target data (``check_station_mismatch``)
- basic flagging and averaging (``ndppp_prep_target``)
- edges of the band (``flagedge``) -- only used in ``raw_flagging`` mode
- statistical flagging (``aoflag``) -- only used in ``raw_flagging`` mode
- baseline flagging (``flag``)
- low elevation flagging (below 20 degrees elevation) (``elev``)
- demix A-Team sources (``demix``) -- only used if specified
- applying clock offsets, polarization alignment, and bandpass correction derived from the calibrator (``applyclock``, ``applyPA``, ``applybandpass``)
- applying LOFAR beam and Rotation Measure correction from `RMextract`_ (``applybeam``, ``applyRM``)
- interpolation of flagged data (``interp``)
- averaging of the data to 4 seconds and 4 channels per subband (``avg``)
- write A-Team skymodel into the MODEL_DATA column (``predict_ateam``)
- clipping potentially A-Team affected data (``ateamcliptar``)
- interpolate, average (to 8 seconds and 2 channels per subband), and concatenate target data into chunks of ten subbands (``dpppconcat``). These chunks are enforced to be equidistant in frequency. Missing data will be filled back and flagged.
- wide-band statistical flagging (``aoflag``)
- remove chunks with more than 50\% flagged data (``check_unflagged``)
- identify fully flagged antennas (``check_bad_antennas``)
Now the data is prepared and cleaned from the majority of bad data.
Phase-only self-calibration
---------------------------
These steps aim at deriving a good first guess for the phase correction in the direction of the phase center (direction-independent phase correction).
Once this is done, the data is ready for further processing with direction-dependent calibration techniques, using software like `factor`_ or `killMS`_.
- download global sky model for the target field automatically (``sky_tar``)
- interpolate flagged data and perform direction-independent phase-only calibration (diagonal terms) within a limited baseline range, using the filter (``gsmcal_dysco``)
The phase solutions derived from the preparation step are now collected and loaded into **LoSoTo** to provide diagnostic plots:
- ``ph_freq??``: matrix plot of the phase solutions versus time for a particular chunk of target data, where both polarizations are color-coded
.. image:: ph_freq.png
- ``ph_poldif_freq??``: matrix plot of the XX-YY phase solutions with time for a particular chunk of target data
.. image:: ph_poldif_freq.png
- ``ph_pol??``: matrix plot of the phase solutions for the XX and YY polarization
.. image:: ph_polXX.png
- ``ph_poldif``: matrix plot of the phase solutions for the XX-YY polarization
.. image:: ph_poldif.png
The solutions are stored in the h5parm file format.
The last step also incorporates full `Dysco`_ compression to save disk space. The fully calibrated data is stored in the DATA column.
The uncompressed and uncorrected data is also stored in the results directory. These data are used for the :ref:`initsubtract_pipeline`.
User-defined parameter configuration
------------------------------------
**Parameters you will need to adjust**
*Information about the input data*
- ``target_input_path``: specify the directory where your target data is stored (a full UNIX-compatible directory is required)
- ``target_input_pattern``: regular expression pattern of all your target files (e.g. ``L72319*.MS``)
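In the parset these two entries might look like this (a sketch; the path is a placeholder, and the ``!`` variable syntax is the one used by the prefactor parsets)::
! target_input_path    = /data/scratch/username/L72319
! target_input_pattern = L72319*.MS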
*Location of the software*
- ``prefactor_directory``: full path to your prefactor copy
- ``losoto_directory``: full path to your local LoSoTo installation
- ``aoflagger``: full path to your aoflagger executable
*Location of the calibrator solutions*
- ``cal_solutions``: location of the calibrator solutions (default: ``input.output.job_directory/../Pre-Facet-Calibrator/results/cal_values/cal_solutions.h5``, if you stick to the defaults)
**Parameters you may need to adjust**
*Data selection and calibration options*
- ``refant``: name of the station that will be used as a reference for the phase-plots
- ``flag_baselines``: NDPPP-compatible pattern for baselines or stations to be flagged (may be an empty list, i.e.: ``[]`` )
- ``process_baselines_target``: performs A-Team-clipping/demixing and direction-independent phase-only self-calibration only on these baselines. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations.
- ``filter_baselines``: selects only this set of baselines to be processed. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations.
- ``do_smooth``: enable or disable baseline-based smoothing (may enhance signal-to-noise for **LBA** data)
- ``rfistrategy``: strategy to be applied with the statistical flagger (AOFlagger), default: ``HBAdefault.rfis``
- ``interp_windowsize``: size of the window over which a value is interpolated. Should be odd. (default: 15)
- ``raw_data``: use autoweight, set to True in case you are using raw data (default: False)
- ``compression_bitrate``: defines the bitrate of Dysco compression of the data after the final step, choose 0 if you do NOT want to compress the data
- ``min_unflagged_fraction``: minimal fraction of unflagged data to be accepted for further processing of the data chunk
- ``propagatesolutions``: use already derived solutions as initial guess for the upcoming time slot
A comprehensive explanation of the baseline selection syntax can be found `here`_.
*Demixing options* (only used if demix step is added to the ``prep_targ_strategy`` variable)
- ``demix_sources``: choose sources to demix (provided as list), e.g., ``[CasA,CygA]``
- ``demix_target``: if given, the target source model (its patch in the SourceDB) is taken into account when solving (default: ``""``)
- ``demix_freqstep``: number of channels to average when demixing (default: 16)
- ``demix_timestep`` : number of time slots to average when demixing (default: 10)
*Definitions for pipeline options*
- ``initial_flagging``: choose {{ raw_flagging }} if you process raw data
- ``demix_step``: choose {{ demix }} if you want to demix
- ``apply_steps``: comma-separated list of apply_steps performed in the target preparation (NOTE: only use applyRM if you have performed RMextract before!)
- ``clipAteam_step``: choose {{ none }} if you want to skip A-team-clipping
- ``gsmcal_step``: choose tec if you want to fit TEC instead of self-calibrating for phases
- ``updateweights``: update the weights column in a way consistent with the weights being inversely proportional to the autocorrelations
**Parameters for pipeline performance**
- ``num_proc_per_node``: number of processes to use per step per node (default: ``input.output.max_per_node``, reads the parameter ``max_per_node`` from the ``pipeline.cfg``)
- ``num_proc_per_node_limit``: number of processes to use per step per node for tasks with high I/O (DPPP or cp) or memory (e.g. calibration) (default: 4)
- ``max_dppp_threads``: number of threads per process for NDPPP (default: 10)
- ``min_length``: minimum number of chunks to concatenate in frequency necessary to perform the wide-band flagging in RAM. If the data is too big, aoflagger will use indirect reads.
- ``overhead``: only use this fraction of the available memory for deriving the amount of data to be concatenated.
- ``min_separation``: minimal accepted distance to an A-team source on the sky in degrees (will raise a WARNING)
- ``error_tolerance``: defines whether pipeline run will continue if single bands fail (default: False)
**Parameters you may want to adjust**
*Main directories*
- ``lofar_directory``: base directory of your **LOFAR** installation (default: $LOFARROOT)
- ``job_directory``: directory of the prefactor outputs (usually the ``job_directory`` as defined in the ``pipeline.cfg``, default: ``input.output.job_directory``)
*Script and plugin directories*
- ``scripts``: location of the prefactor scripts (default: ``{{ prefactor_directory }}/scripts``)
- ``pipeline.pluginpath``: location of the prefactor plugins: (default: ``{{ prefactor_directory }}/plugins``)
*Sky model directory*
- ``A-team_skymodel``: path to A-team skymodel (used for demixing and clipping)
- ``target_skymodel``: path to the skymodel for the phase-only calibration of the target
- ``use_target``: download the phase-only calibration skymodel from TGSS. "Force": always download; "True": download if {{ target_skymodel }} does not exist; "False": never download
- ``skymodel_source``: use GSM if you want to use the experimental (!) GSM SkyModel creator using TGSS, NVSS, WENSS and VLSS
*Result directories*
- ``results_directory``: location of the prefactor results (default: ``{{ job_directory }}/results``)
- ``inspection_directory``: location of the inspection plots (default: ``{{ results_directory }}/inspection``)
- ``cal_values_directory``: directory of the calibration solutions (h5parm file, default: ``{{ results_directory }}/cal_values``)
*Location of calibrator solutions*
- ``solutions``: location of the calibration solutions (h5parm file, default: ``{{ cal_values_directory }}/cal_solutions.h5``)
*Averaging for the calibrator data*
- ``avg_timeresolution``: intermediate time resolution of the data in seconds after averaging (default: 4)
- ``avg_freqresolution`` : intermediate frequency resolution of the data after averaging (default: 48.82kHz, which translates to 4 channels per subband)
- ``avg_timeresolution_concat``: final time resolution of the data in seconds after averaging and concatenation (default: 8)
- ``avg_freqresolution_concat``: final frequency resolution of the data after averaging and concatenation (default: 97.64kHz, which translates to 2 channels per subband)
*Concatenating of the target data*
- ``num_SBs_per_group``: make concatenated measurement-sets with that many subbands (default: 10)
- ``reference_stationSB``: station-subband number to use as reference for grouping (default: ``None`` -> use lowest frequency input data as reference)
*RMextract settings*
- ``ionex_server``: URL of the *IONEX* server (default: "ftp://ftp.aiub.unibe.ch/CODE/")
- ``ionex_prefix``: the prefix of the *IONEX* files (default: CODG)
- ``ionex_path``: location of the *IONEX* files after downloading (default: ``{{ job_directory }}/IONEX/``)
Recommended parameters for **HBA** and **LBA** observations
-----------------------------------------------------------
============================= ============================ =======================
**parameter**                 **HBA**                      **LBA**
----------------------------- ---------------------------- -----------------------
``do_smooth``                 False                        True
``rfistrategy``               HBAdefault                   LBAdefaultwideband.rfis
``apply_steps``               applyclock,applybeam,applyRM applyphase,applybeam
``gsmcal_step``               phase                        tec
``skymodel_source``           TGSS                         GSM
``clipATeam_step``            {{ clipATeam }}              {{ none }}
``avg_timeresolution_concat`` 8.                           4.
``avg_freqresolution_concat`` 97.64kHz                     48.82kHz
``num_SBs_per_group``         10                           -1
============================= ============================ =======================
In case of **LBA** observations you might also want to enable demixing in the ``prep_targ_strategy`` variable.
If your **LBA** data has **not** been demixed before you may still want to keep the A-Team-clipping.
.. _RMextract: https://github.com/lofar-astron/RMextract/
.. _factor: https://github.com/lofar-astron/factor/
.. _killMS: https://github.com/saopicc/killMS/
.. _TGSS ADR: http://tgssadr.strw.leidenuniv.nl/
.. _Dysco: https://github.com/aroffringa/dysco/
.. _here: https://www.astron.nl/lofarwiki/doku.php?id=public:user_software:documentation:ndppp#description_of_baseline_selection_parameters