Commit c3466fb2 authored by alex's avatar alex

Resolve RAP-202

.. _pipeline_overview_old:
Pipeline overview
=================
.. note::
These instructions are outdated and only valid for **prefactor** 3.2 or older. Please check the :doc:`recent instructions page<pipelineoverview>`.
**Prefactor** is organized in several major parts to process **LOFAR** data:
.. image:: prefactor_workflow_sketch.png
``Pre-Facet-Calibrator``
Processes the (amplitude-)calibrator to derive direction-independent corrections. See :ref:`calibrator_pipeline_old` for details.
``Pre-Facet-Target``
Transfers the direction-independent corrections to the target and does direction-independent calibration of the target. See :ref:`target_pipeline_old` for details.
``Concatenate``
Concatenates the single-subband target data retrieved from the LTA into bands suitable for further processing with the initial-subtract pipeline. See :ref:`concatenate_pipeline` for details.
``Initial-Subtract``
Images the full FoV (and 1st side-lobe), generating a sky-model and subtracting it from the visibilities. See :ref:`initsubtract_pipeline` for details.
``Pre-Facet-Image``
Images the full FoV using the full bandwidth. See :ref:`image_pipeline` for details.
.. _data_preparation:
Preparing the data
------------------
**Prefactor** requires **LOFAR LBA** or **HBA** raw or pre-processed data. These data are
typically obtained from the LOFAR `Long-Term Archive`_.
- The calibrator and target data have to match, i.e., they must have been observed close enough
in time that calibration values can be transferred.
- For each observation you should process all of the calibrator data together:
clock/TEC separation and flagging of bad amplitudes work better with
the full bandwidth.
- For the target pipeline you will need internet access from the machine on which you are running **prefactor**.
It is required in order to retrieve RM values from `CODE`_ and a global sky model (`TGSS`_ or `GSM`_), both of which are hosted as online services.
It is also possible to provide your own target sky model to **prefactor** (using the parameters ``target_skymodel`` and ``use_target``; see the :doc:`target<target>` pipeline parameter information).
.. note::
Processing of interleaved datasets is not currently supported.
**prefactor** cannot handle multi-epoch observations at once.
For older versions of **prefactor**, all input measurement sets for one pipeline run need to be in the same directory.
.. _CODE: ftp://ftp.aiub.unibe.ch/CODE/
.. _TGSS: http://tgssadr.strw.leidenuniv.nl/doku.php
.. _GSM: http://172.104.228.177/
.. _Long-Term Archive: https://lta.lofar.eu
.. _runprefactor:
Starting a pipeline
=====================
.. note::
If you are running the deprecated genericpipeline version of the pipeline (**prefactor** 3.2 or older), please check the :doc:`old instructions page<running_old>`.
Once you have the data and the input JSON file ready, you can run the pipeline, e.g., with ``cwltool`` or ``toil`` for the HBA calibrator pipeline::
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json
$ toil-cwl-runner <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json
where ``prefactor.json`` is the input JSON file as described in the chapter :doc:`parset` and ``<install_dir>`` is the location of the prefactor CWL description files.
.. note::
Instead of specifying all options in ``prefactor.json``, the user can also use command-line options to override the defaults.
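For example, inputs can also be passed directly on the command line instead of via a JSON file (a minimal sketch only; the measurement-set paths are placeholders, and ``msin`` is the input-data parameter described in :doc:`parset`)::
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl \
    --msin /data/L123456/L123456_SB000_uv.MS \
    --msin /data/L123456/L123456_SB001_uv.MS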
For standard LOFAR observations there are workflows available with pre-defined parameters (defaults) for **HBA** and **LBA** observations:
============================ ====================== ======================
**prefactor workflow**       **HBA**                **LBA**
---------------------------- ---------------------- ----------------------
``prefactor_calibrator.cwl`` ``HBA_calibrator.cwl`` ``LBA_calibrator.cwl``
``prefactor_target.cwl``     ``HBA_target.cwl``     ``LBA_target.cwl``
============================ ====================== ======================
.. note::
The **LBA** workflows are not (yet) available.
Pipeline options for ``cwltool``
--------------------------------
The following ``<cwl_options>`` are recommended when running **prefactor** with ``cwltool``:
* ``--outdir``: specifies the location of the pipeline outputs
* ``--tmpdir-prefix``: specifies the location of the intermediate data products
* ``--parallel``: run jobs in parallel
* ``--no-container``: do not use a Docker container (only for manual installation)
* ``--preserve-entire-environment``: use system environment variables (only for manual installation)
* ``--debug``: more verbose output (use only for debugging the pipeline)
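Putting these options together, a typical ``cwltool`` call might look like this (a sketch; the directories are placeholders)::
$ cwltool --outdir /data/prefactor/results \
    --tmpdir-prefix /data/prefactor/tmp/ \
    --parallel \
    <install_dir>/workflows/HBA_calibrator.cwl prefactor.json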
.. note::
``cwltool`` has no option to resume a failed/crashed run. If you need this feature, have a look at ``toil``.
While the pipeline runs, the terminal will show the current state of the pipeline. It is also possible to redirect this output to a runtime logfile::
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json > logfile 2>&1
In the directory specified via ``--tmpdir-prefix`` all temporary folders and files are generated. At the end of the run these files can be deleted.
Pipeline options for ``toil``
--------------------------------
The following ``<cwl_options>`` are recommended when running **prefactor** with ``toil``:
* ``--workDir``: specifies the location of the intermediate data products
* ``--outDir``: specifies the location of the final data products
* ``--jobStore``: location of the job store ("statefile")
* ``--writeLogs``: location of the pipeline job logfiles
* ``--logFile``: location of the main pipeline logfile
* ``--logLevel``: can be **CRITICAL**, **ERROR**, **WARNING**, **INFO** or **DEBUG**
* ``--batchSystem``: use a specific batch system of an HPC cluster or similar, e.g. ``slurm`` or ``single_machine``
* ``--stats``: creates runtime statistics
* ``--maxLocalJobs``: number of local jobs to run at the same time ("max_per_node")
* ``--retryCount``: number of retries for each failed pipeline job
* ``--preserve-entire-environment``: use system environment variables (only for manual installation)
* ``--no-container``: do not use a Docker container (only for manual installation)
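Putting these options together, a typical ``toil`` call might look like this (a sketch; the paths and values are placeholders)::
$ toil-cwl-runner --batchSystem single_machine \
    --jobStore /data/prefactor/jobstore \
    --workDir /data/prefactor/tmp \
    --retryCount 2 \
    --logFile /data/prefactor/pipeline.log \
    <install_dir>/workflows/HBA_calibrator.cwl prefactor.json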
Stopping and restarting the pipeline
------------------------------------
You can stop a pipeline run at any time by terminating the CWL process (typically by pressing CTRL-C in the terminal where you started it).
Sometimes some of the processes that the pipeline started do not get properly terminated, so if the
CWL runner process does not terminate you should look for its child
processes and terminate them too.
.. note::
If you stop and re-start pipelines a number of times, you should also
check occasionally whether there are orphaned child processes that are eating up
resources on your computer.
If you are using ``toil``, you can restart a pipeline by adding the parameter ``--restart`` to the command line. If you want to start from scratch, delete the directory created via ``--jobStore`` and all intermediate data products (usually located under the ``--workDir`` directory); after that you will start with a clean run. As mentioned earlier, you re-start the pipeline by running the same command with which you started it.
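A restart might then look like this (a sketch; the job store path must be the one used for the original run)::
$ toil-cwl-runner --restart --jobStore /data/prefactor/jobstore \
    <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json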
Pipeline crashes
----------------
With ``cwltool`` a pipeline crash is reported with this message::
WARNING Final process status is permanentFail
In order to figure out at which step the pipeline failed you can search for the term ``permanentFail`` in the ``toil`` or ``cwltool`` logfile::
$ grep "permanentFail" logfile
WARNING [job compare_station_list] completed permanentFail
WARNING [step compare_station_list] completed permanentFail
INFO [workflow prep] completed permanentFail
WARNING [step prep] completed permanentFail
INFO [workflow prefactor] completed permanentFail
WARNING [step prefactor] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
With that information it is possible to identify the failed step, ``compare_station_list``. To find the corresponding part of the logfile where the step was launched, search for ``[job compare_station_list]``.
It is usually best to also check all lines starting with ``ERROR`` to get additional information about the possible cause of the crash, as well as diagnostic messages that tell you what exactly went wrong. See :ref:`help` for tips on interpreting the error messages.
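For example, assuming the runtime log was redirected to ``logfile`` as shown above::
$ grep -n "ERROR" logfile
$ grep -n "\[job compare_station_list\]" logfile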
If you identify the problem and it does not affect the products that have
already been produced, you can launch the pipeline again after correcting the issue
that caused the process to stop. In most cases, however, it might be necessary to fully re-start the pipeline.
.. _runprefactor_old:
Starting a pipeline
-------------------
.. note::
These instructions are outdated and only valid for **prefactor** 3.2 or older. Please check the :doc:`recent instructions page<running>`.
Once you have the data and the parsets ready, you can run the pipeline using the
genericpipeline script, e.g.::
$ genericpipeline.py -d -c pipeline.cfg My_prefactor_calibrator.parset
.. note::
The ``-d`` option is recommended: it makes the log files extremely large
(many megabytes), but without it the important information about why a
pipeline run failed is often not included.
While the pipeline runs, new files are generated in the specified ``runtime_directory`` (see previous
section), in a directory named after the parset (e.g.,
if you are running ``My_prefactor_calibrator.parset``, a directory named
``My_prefactor_calibrator`` will appear in your ``runtime_directory``)::
$ ls My_prefactor_calibrator/
logs mapfiles parsets statefile
The ``logs`` directory contains all the logs of the pipeline runs, identified by the date
and time of execution, e.g.::
$ ls My_prefactor_calibrator/logs/2016-06-30T15:07:21/pipeline.log
These contain all the output from the processes the pipeline called as well as
diagnostic information about the pipeline, so they are useful for following the
status of the run and identifying reasons why a process crashed.
While running, the pipeline writes a statefile in the ``runtime_directory`` that records all
the steps which were successfully executed. If the pipeline stops for whatever
reason, you can re-run the same command and it will skip all the steps that are
already done and only work on those which are still missing.
The intermediate data files of the pipeline are written in the ``working_directory``
specified in the ``pipeline.cfg``.
Stopping and restarting the pipeline
------------------------------------
You can stop a pipeline run at any time by terminating the genericpipeline process
(typically by pressing CTRL-C in the terminal where you started it). Sometimes some of
the processes that the pipeline started do not get properly terminated, so if the
genericpipeline process does not terminate you should look for its child
processes and terminate them too.
.. note::
If you stop and re-start pipelines a number of times, you should also
check occasionally whether there are orphaned child processes that are eating up
resources on your computer.
As mentioned earlier, you can re-start the pipeline by running the same command
with which you started it.
Pipeline crashes
----------------
It can happen that the pipeline stops with a message like this::
ERROR genericpipeline: LOFAR Pipeline finished unsuccesfully.
WARNING genericpipeline: recipe genericpipeline completed with errors
You need to read the log of that run to identify the reason why it stopped, e.g.::
$ less My_prefactor_calibrator/logs/2016-06-30T15:07:21/pipeline.log
It is usually best to first check at the end of the file for what ended the
pipeline and then search from the beginning of the file for error or diagnostic
messages that tell you what exactly went wrong. See :ref:`help` for tips on
interpreting the error messages.
If you identify the problem and it does not affect the products that have been
already produced, you can launch the pipeline again, after correcting the issue
causing the process to stop.
Rerunning parts of the pipeline
--------------------------------
You can fully rerun a pipeline by deleting the runtime and working directories and restarting the pipeline.
To rerun parts of the pipeline that were (allegedly) already executed
successfully, you need to modify the ``statefile`` of the pipeline. To do this
there is a ``statefile_manipulation.py`` script as part of prefactor::
$ python prefactor/bin/statefile_manipulation.py My_Workdir/My_calibrator_job/statefile
If you then run the pipeline again, it will start at the step that you removed with the statefile manipulation tool.
.. _target_pipeline:
Target pipeline
===============
.. note::
If you are running the deprecated genericpipeline version of the pipeline (**prefactor** 3.2 or older), please check the :doc:`old instructions page<target_old>`.
This pipeline processes the target data in order to apply the direction-independent corrections from the calibrator pipeline. A first initial direction-independent self-calibration of the target field is performed, using a global sky model based on the `TGSS ADR`_ or the new `Global Sky Model`_ (GSM), and applied to the data.
This chapter will present the specific steps of the target pipeline in more detail.
All results (diagnostic plots and calibration solutions) are usually stored in the ``--outdir`` directory specified in your ``cwltool`` or ``toil`` command.
.. image:: targetscheme.png
Prepare target, incl. "demixing" (``prep``)
-------------------------------------------
This part of the pipeline prepares the target data in order to be calibration-ready for the first direction-independent phase-only self-calibration against a global sky model.
This mainly includes mitigation of bad data (RFI, bad antennas, contaminations from A-Team sources), selection of the data to be calibrated (usually Dutch stations only), and some averaging to reduce data size and enhance the signal-to-noise ratio.
Furthermore, ionospheric Rotation Measure corrections are applied, using `RMextract`_.
The user can specify whether to do raw data or pre-processed data flagging and whether demixing should be performed.
The basic workflows are:
- preparation of data (``prep``)
- concatenating and phase-only self-calibration against a global sky model (``gsmcal``)
- creating the final calibrated data set by applying the self-calibration solutions and compressing the data (``finalize``)
The workflow ``prep`` consists of:
- checking for a potential station mismatch between the calibrator solutions and the target data (step ``compare_station_list``)
- checking for nearby A-Team sources (step ``check_Ateam_separation``)
- creating a model of A-Team sources to be subtracted (step ``make_sourcedb_ateam``)
- getting ionospheric Rotation Measure corrections and adding them to the solutions (step ``createRMh5parm``)
.. image:: RMextract.png
- basic flagging, applying solutions, and averaging (subworkflow ``ndppp_prep_target``)
- edges of the band (``flagedge``) -- only used if ``raw_data : true``
- statistical flagging (``aoflag``) -- only used if ``raw_data : true``
- baseline flagging (``flagbaseline``)
- low elevation flagging (below 15 degrees elevation) (``flagelev``)
- low amplitude flagging (below 1e-30) (``flagamp``)
- demixing of A-Team sources (``demix``) -- only used if ``demix : true`` is specified
- applying calibrator solutions (steps ``applyPA``, ``applyBandpass``, ``prep_target_applycal``)
- averaging of the data in time and frequency
- predicting the contribution of A-Team sources and writing it to the ``MODEL_DATA`` column (step ``predict``)
- clipping time- and frequency chunks that are likely to be affected by A-Team sources (step ``Ateamclipper``)
Phase-only self-calibration (``gsmcal``)
-----------------------------------------
These steps aim at deriving a good first guess for the phase correction in the direction of the phase center (direction-independent phase correction).
Once this is done, the data is ready for further processing with direction-dependent calibration techniques, using software like `Rapthor`_, `factor`_ or `killMS`_.
The phase solutions derived from the ``gsmcal`` workflow are collected and loaded into **LoSoTo** to provide diagnostic plots:
- ``ph_freq??``: matrix plot of the phase solutions versus time for a particular chunk of target data, where both polarizations are color-coded
.. image:: ph_freq.png
- ``ph_poldif_freq??``: matrix plot of the XX-YY phase solutions with time for a particular chunk of target data
.. image:: ph_poldif_freq.png
- ``ph_pol??``: matrix plot of the phase solutions for the XX and YY polarization
.. image:: ph_polXX.png
- ``ph_poldif``: matrix plot of the phase solutions for the XX-YY polarization
.. image:: ph_poldif.png
The workflow ``gsmcal`` consists of:
- retrieving and creating a global sky model (steps ``find_skymodel_target``, ``make_sourcedb_target``)
- identification of fully flagged antennas (step ``identify_bad_antennas``)
- concatenating the data into chunks (subworkflow ``concat``)
- wide-band statistical flagging (steps ``ms_concat`` and ``aoflag``)
- checking for bad data chunks (step ``check_unflagged_fraction``)
- performing the self-calibration against the global sky model (subworkflow ``calibrate_target``), with baseline-dependent smoothing (step ``BLsmooth``) if ``do_smooth : true`` is specified
Finalizing the prefactor output (``finalize``)
----------------------------------------------
These steps produce the final data output and many helpful diagnostics.
The workflow ``finalize`` consists of:
- adding missing stations to the solution set with zero phase and unit amplitude (for international stations, step ``add_missing_stations``)
- applying the phase-only self-calibration solutions to the data and compressing them (step ``apply_gsmcal``)
- deriving the structure function of the phases (step ``structure_function``)
- making a fast image of the target field (steps ``average`` and ``wsclean``)
- creating plots of the ``uv``-coverage of the final data set (step ``uvplot``)
- creating a summary file (step ``summary``)
The last step also incorporates full `Dysco`_ compression to save disk space. The fully calibrated data is stored in the DATA column of the final data set.
.. note::
All solutions are written in the h5parm file format via the ``H5parm_collector`` steps called during all the workflows.
The solutions are stored in the final calibrator solution set ``results/cal_values/cal_solutions.h5``.
Further diagnostics
-------------------
The ``results`` directory will contain all relevant outputs of the current **prefactor** run, once the pipeline has finished:
- fully calibrated data sets
- logfiles in ``results/logs``
- summary file (JSON format) in ``results/summary``
- calibration solutions in ``results/cal_values/cal_solutions.h5``
- inspection plots in ``results/inspection``
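Once the run has finished, listing the ``results`` directory should therefore show something like this (a sketch; the names of the calibrated measurement sets depend on your observation)::
$ ls results/
cal_values  inspection  logs  summary  <calibrated measurement sets>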
The following diagnostics help to assess the quality of the data reduction:
- ``Ateam_separation.png``: shows the distance and the elevation of A-Team sources with respect to the analyzed observation
.. image:: Ateam_separation.png
- ``Ateamclipper.png``: fraction of data flagged due to potential contamination from A-Team sources versus frequency
.. image:: Ateamclipper.png
- ``unflagged_fraction.png``: fraction of remaining unflagged data versus frequency
.. image:: unflagged_fraction.png
- ``??_uv-coverage_uvdist.png``: fraction of remaining unflagged data versus ``uv``-distance
.. image:: uv-coverage_uvdist.png
- ``??_uv_coverage.png``: the ``uv``-coverage of the final data set
.. image:: uv-coverage.png
- ``??_structure.png``: plot of the ionospheric `structure function`_ of the processed target field
.. image:: structure.png
- ``??-MFS-image.fits``: FITS image of the target field
.. image:: target_field.png
You can also check the calibration solutions for more details::
$ losoto -i results/cal_values/cal_solutions.h5
Summary of results/cal_values/cal_solutions.h5
Solution set 'calibrator':
==========================
Directions: 3c286
Stations: CS001HBA0 CS001HBA1 CS002HBA0 CS002HBA1
CS003HBA0 CS003HBA1 CS004HBA0 CS004HBA1
CS005HBA0 CS005HBA1 CS006HBA0 CS006HBA1
CS007HBA0 CS007HBA1 CS011HBA0 CS011HBA1
CS017HBA0 CS017HBA1 CS021HBA0 CS021HBA1
CS024HBA0 CS024HBA1 CS026HBA0 CS026HBA1
CS028HBA0 CS028HBA1 CS030HBA0 CS030HBA1
CS031HBA0 CS031HBA1 CS032HBA0 CS032HBA1
CS101HBA0 CS101HBA1 CS103HBA0 CS103HBA1
CS201HBA0 CS201HBA1 CS301HBA0 CS301HBA1
CS302HBA0 CS302HBA1 CS401HBA0 CS401HBA1
CS501HBA0 CS501HBA1 RS106HBA RS205HBA
RS208HBA RS210HBA RS305HBA RS306HBA
RS307HBA RS310HBA RS406HBA RS407HBA
RS409HBA RS503HBA RS508HBA RS509HBA
Solution table 'bandpass' (type: amplitude): 120 times, 372 freqs, 60 ants, 2 pols
Flagged data: 0.000%
Solution table 'clock' (type: clock): 120 times, 60 ants
Flagged data: 0.000%
Solution table 'faraday' (type: rotationmeasure): 60 ants, 120 times
Flagged data: 0.014%
Solution table 'polalign' (type: phase): 120 times, 60 ants, 1484 freqs, 2 pols
Flagged data: 0.000%
Solution set 'target':
======================
Directions: P000+00
Stations: CS001HBA0 CS001HBA1 CS002HBA0 CS002HBA1
CS003HBA0 CS003HBA1 CS004HBA0 CS004HBA1
CS005HBA0 CS005HBA1 CS006HBA0 CS006HBA1
CS007HBA0 CS007HBA1 CS011HBA0 CS011HBA1
CS017HBA0 CS017HBA1 CS021HBA0 CS021HBA1
CS024HBA0 CS024HBA1 CS026HBA0 CS026HBA1
CS028HBA0 CS028HBA1 CS030HBA0 CS030HBA1
CS031HBA0 CS031HBA1 CS032HBA0 CS032HBA1
CS101HBA0 CS101HBA1 CS103HBA0 CS103HBA1
CS201HBA0 CS201HBA1 CS301HBA0 CS301HBA1
CS302HBA0 CS302HBA1 CS401HBA0 CS401HBA1
CS501HBA0 CS501HBA1 RS106HBA RS205HBA
RS208HBA RS210HBA RS305HBA RS306HBA
RS307HBA RS310HBA RS406HBA RS407HBA
RS409HBA RS503HBA RS508HBA RS509HBA
Solution table 'RMextract' (type: rotationmeasure): 60 ants, 119 times
Flagged data: 0.000%
Solution table 'TGSSphase' (type: phase): 3446 times, 58 ants, 1 freq, 2 pols
Flagged data: 0.000%
History: 2021-07-30 11:25:44: Bad stations 'CS006HBA1', 'CS006HBA0' have not been added
back.
For an overall summary it is advised to check the summary logfile::
$ cat results/logs/???_summary.log
*****************************************
*** prefactor target pipeline summary ***
*****************************************
Field name: P000+00
User-specified baseline filter: [CR]S*&
Additional antennas removed from the data: CS006HBA1, CS006HBA0
A-Team sources close to the phase reference center: NONE
XX diffractive scale: 4.4 km
YY diffractive scale: 4.0 km
Changes applied to cal_solutions.h5:
2021-07-30 11:25:44: Bad stations 'CS006HBA1', 'CS006HBA0' have not been added back.
Amount of flagged solutions per station and solution table:
Station bandpass clock faraday polalign RMextract TGSSphase
CS001HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS001HBA1 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS002HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS002HBA1 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS003HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.00%
CS003HBA1 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS004HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS004HBA1 6.05% 0.00% 0.00% 0.00% 0.00% 0.05%
CS005HBA0 0.29% 0.00% 0.00% 0.00% 0.00% 0.05%
CS005HBA1 0.39% 0.00% 0.00% 0.00% 0.00% 0.00%
CS006HBA0 0.29% 0.00% 0.00% 0.00% 0.00%
CS006HBA1 0.29% 0.00% 0.00% 0.00% 0.00%
Amount of flagged data per station at a given state:
Station initial prep Ateam final
CS001HBA0 5.13% 5.41% 11.12% 22.74%
CS001HBA1 5.13% 5.41% 11.03% 22.51%
CS002HBA0 5.12% 5.39% 11.39% 23.18%
CS002HBA1 5.12% 5.40% 21.09% 29.95%
CS003HBA0 5.12% 5.39% 9.92% 22.58%
CS003HBA1 5.12% 5.40% 11.37% 23.95%
CS004HBA0 5.12% 5.40% 13.27% 24.62%
CS004HBA1 5.12% 5.40% 12.24% 23.53%
CS005HBA0 5.12% 5.40% 11.59% 23.38%
CS005HBA1 5.12% 15.36% 20.07% 30.09%
CS006HBA0 100.00% 100.00% 100.00%
CS006HBA1 100.00% 100.00% 100.00%
**********
Summary file is written to: ???_prefactor_target_summary.json
Summary has been created.
User-defined parameter configuration
------------------------------------
**Parameters you will need to adjust**
*Location of the target data and calibrator solutions*
- ``msin``: location of the input target data; for instructions see the :doc:`configuration instructions<parset>` page
- ``cal_solutions``: location of the calibrator solutions; for instructions see the :doc:`configuration instructions<parset>` page (a minimal example is sketched below)
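A minimal ``prefactor.json`` covering just these two parameters might look like this (a sketch only; the paths are placeholders and the authoritative format is described on the :doc:`configuration instructions<parset>` page)::
$ cat > prefactor.json <<EOF
{
    "msin": [
        {"class": "Directory", "path": "/data/L123456/L123456_SB000_uv.MS"},
        {"class": "Directory", "path": "/data/L123456/L123456_SB001_uv.MS"}
    ],
    "cal_solutions": {"class": "File", "path": "/data/L123456/cal_solutions.h5"}
}
EOF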
**Parameters you may need to adjust**
*Data selection and calibration options*
- ``refant``: regular expression of the stations that are allowed to be selected as a reference antenna by the pipeline (default: ``CS00.*``)
- ``flag_baselines``: DP3-compatible pattern for baselines or stations to be flagged (may be an empty list, i.e.: ``[]`` )
- ``process_baselines_target``: performs A-Team-clipping/demixing and direction-independent phase-only self-calibration only on these baselines. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations (default: ``[CR]S*&``)
- ``filter_baselines``: selects only this set of baselines to be processed. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations (default: ``[CR]S*&``)
- ``do_smooth``: enable or disable baseline-based smoothing (default: ``false``)
- ``rfistrategy``: strategy to be applied with the statistical flagger (`AOFlagger`_, default: ``HBAdefault.rfis``)
- ``min_unflagged_fraction``: minimal fraction of unflagged data to be accepted for further processing of the data chunk (default: 0.5)
- ``raw_data``: use autoweight, set to True in case you are using raw data (default: ``false``)
- ``compression_bitrate``: defines the bitrate of `Dysco`_ compression of the data after the final step, choose 0 if you do NOT want to compress the data
- ``propagatesolutions``: use already derived solutions as initial guess for the upcoming time slot
- ``apply_tec``: apply TEC solutions from the calibrator (default: ``false``)
- ``apply_clock``: apply clock solutions from the calibrator (default: ``true``)
- ``apply_phase``: apply full phase solutions from the calibrator (default: ``false``)
- ``apply_RM``: apply ionospheric Rotation Measure from `RMextract`_ (default: ``true``)
- ``apply_beam``: apply element beam corrections (default: ``true``)
- ``gsmcal_step``: type of calibration to be performed in the self-calibration step (default: ``phase``)
- ``updateweights``: update the ``WEIGHT_SPECTRUM`` column in a way consistent with the weights being inversely proportional to the autocorrelations (default: ``true``)
- ``use_target``: enable downloading of a target skymodel (default: ``true``)
- ``skymodel_source``: choose the target skymodel from `TGSS ADR`_ or the new `Global Sky Model`_ (GSM) (default: ``TGSS``)
A comprehensive explanation of the baseline selection syntax can be found `here`_.
*Demixing and clipping options*
- ``demix_sources``: choose sources to demix (provided as list), e.g., ``[CasA,CygA]``
- ``demix_target``: if given, the target source model (its patch in the SourceDB) is taken into account when solving (default: ``""``)
- ``demix_freqstep``: number of channels to average when demixing (default: 16)
- ``demix_timestep`` : number of time slots to average when demixing (default: 10)
- ``demix``: enable demixing (default: ``false``)
- ``clip_sources``: list of the skymodel patches to be used for A-Team clipping (default: ``[VirA_4_patch,CygAGG,CasA_4_patch,TauAGG]``)
*Further pipeline options*
- ``min_separation``: minimal accepted distance to an A-team source on the sky in degrees (will raise a WARNING, default: ``30``)
**Parameters for pipeline performance**
- ``max_dppp_threads``: number of threads per process for DP3 (default: 10)
- ``memoryperc``: maximum amount of memory used for aoflagger in raw_flagging mode, in percent (default: 20)
- ``min_length``: minimum number of subbands to concatenate in frequency necessary to perform the wide-band flagging in RAM. If the data is too big, aoflagger will use indirect reads (default: 50)
- ``overhead``: only use this fraction of the available memory for deriving the amount of data to be concatenated (default: 0.8)
*Skymodel directory*
- ``A-Team_skymodel``: location of the prefactor A-Team skymodels
- ``target_skymodel``: location of a user-defined target skymodel used for the self-calibration
*Averaging for the calibrator data*
- ``avg_timeresolution``: intermediate time resolution of the data in seconds after averaging (default: 4)
- ``avg_freqresolution`` : intermediate frequency resolution of the data after averaging (default: 48.82kHz, which translates to 4 channels per subband)
- ``avg_timeresolution_concat``: final time resolution of the data in seconds after averaging and concatenation (default: 8)
- ``avg_freqresolution_concat``: final frequency resolution of the data after averaging and concatenation (default: 97.64kHz, which translates to 2 channels per subband)
- ``num_SBs_per_group``: make concatenated measurement-sets with that many subbands, choose a high number if running LBA (default: 10)
*Concatenating of the target data*
- ``num_SBs_per_group``: make concatenated measurement-sets with that many subbands (default: 10)
- ``reference_stationSB``: station-subband number to use as reference for grouping (default: ``None`` -> use lowest frequency input data as reference)
*RMextract settings*
- ``ionex_server``: URL of the *IONEX* server (default: ``"ftp://ftp.aiub.unibe.ch/CODE/"``)
- ``ionex_prefix``: the prefix of the *IONEX* files (default: ``CODG``)
- ``proxy_server``: specify URL or IP of proxy server if needed
- ``proxy_port``: port of proxy server if needed
- ``proxy_user``: user name of proxy server if needed
- ``proxy_pass``: password of proxy server if needed
In case of **LBA** observations you might also want to enable demixing (``demix: true``).
If your **LBA** data has **not** been demixed before you may still want to keep the A-Team-clipping.
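The corresponding entries in ``prefactor.json`` might then look like this (a sketch; the source list and averaging values are only placeholders based on the defaults listed above)::
"demix": true,
"demix_sources": ["CasA", "CygA"],
"demix_timestep": 10,
"demix_freqstep": 16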
.. _structure function: https://ui.adsabs.harvard.edu/abs/2016RaSc...51..927M/abstract
.. _Rapthor: https://github.com/darafferty/rapthor
.. _Global Sky Model: https://lcs165.lofar.eu/
.. _RMextract: https://github.com/lofar-astron/RMextract/
.. _factor: https://github.com/lofar-astron/factor/
.. _killMS: https://github.com/saopicc/killMS/
.. _TGSS ADR: http://tgssadr.strw.leidenuniv.nl/
.. _Dysco: https://github.com/aroffringa/dysco/
.. _AOFlagger: https://gitlab.com/aroffringa/aoflagger.git
.. _here: https://www.astron.nl/lofarwiki/doku.php?id=public:user_software:documentation:ndppp#description_of_baseline_selection_parameters
.. _target_pipeline_old:
Target pipeline
===============
.. note::
These instructions are outdated and only valid for **prefactor** 3.2 or older. Please check the :doc:`recent instructions page<target>`.
This pipeline processes the target data in order to apply the direction-independent corrections from the calibrator pipeline (line 26). A first initial direction-independent self-calibration of the target field is performed, using a global sky model based on the `TGSS ADR`_ or the new Global Sky Model (GSM), and applied to the data.
You will find the single steps in the parameter ``pipeline.steps`` in line 99.
This chapter will present the specific steps of the target pipeline in more detail.
All results (diagnostic plots and calibration solutions) are usually stored in a subfolder of the results directory, see ``inspection_directory`` (line 71) and ``cal_values_directory`` (line 72), respectively.
Prepare target (incl. "demixing")
---------------------------------
This part of the pipeline prepares the target data in order to be calibration-ready for the first direction-independent phase-only self-calibration against a global sky model.
This mainly includes mitigation of bad data (RFI, bad antennas, contaminations from A-Team sources), selection of the data to be calibrated (usually Dutch stations only), and some averaging to reduce data size and enhance the signal-to-noise ratio.
Furthermore, ionospheric Rotation Measure corrections are applied, using `RMextract`_.
The user can specify whether to do raw data or pre-processed data flagging and whether demixing should be performed.
The basic steps are:
- mapping of data to be used (``createmap_target``)
- copying h5parm solution set from the calibrator (``copy_cal_sols``)
- gathering RM satellite information and writing it into h5parm (``h5imp_RMextract``)
.. image:: RMextract.png
- creating a model of A-Team sources to be subtracted (``make_sourcedb_ateam``)
- check of any missing solutions for the target data (``check_station_mismatch``)
- basic flagging and averaging (``ndppp_prep_target``)
- edges of the band (``flagedge``) -- only used in ``raw_flagging`` mode
- statistical flagging (``aoflag``) -- only used in ``raw_flagging`` mode
- baseline flagging (``flag``)
- low elevation flagging (below 20 degrees elevation) (``elev``)
- demix A-Team sources (``demix``) -- only used if specified
- applying clock offsets, polarization alignment, and bandpass correction derived from the calibrator (``applyclock``, ``applyPA``, ``applybandpass``)
- applying LOFAR beam and Rotation Measure correction from `RMextract`_ (``applybeam``, ``applyRM``)
- interpolation of flagged data (``interp``)
- averaging of the data to 4 seconds and 4 channels per subband (``avg``)
- write A-Team skymodel into the MODEL_DATA column (``predict_ateam``)
- clipping potentially A-Team affected data (``ateamcliptar``)
- interpolate, average (to 8 seconds and 2 channels per subband), and concatenate target data into chunks of ten subbands (``dpppconcat``). These chunks are enforced to be equidistant in frequency. Missing data will be filled back and flagged.
- wide-band statistical flagging (``aoflag``)
- remove chunks with more than 50\% flagged data (``check_unflagged``)
- identify fully flagged antennas (``check_bad_antennas``)
Now the data is prepared and cleaned from the majority of bad data.
Phase-only self-calibration
---------------------------
These steps aim at deriving a good first guess for the phase correction in the direction of the phase center (direction-independent phase correction).
Once this is done, the data is ready for further processing with direction-dependent calibration techniques, using software like `factor`_ or `killMS`_.
- download global sky model for the target field automatically (``sky_tar``)
- interpolate flagged data and perform direction-independent phase-only calibration (diagonal terms) within a limited baseline range, using the filter (``gsmcal_dysco``)
The phase solutions derived from the preparation step are now collected and loaded into **LoSoTo** to provide diagnostic plots:
- ``ph_freq??``: matrix plot of the phase solutions versus time for a particular chunk of target data, where both polarizations are color-coded
.. image:: ph_freq.png
- ``ph_poldif_freq??``: matrix plot of the XX-YY phase solutions with time for a particular chunk of target data
.. image:: ph_poldif_freq.png
- ``ph_pol??``: matrix plot of the phase solutions for the XX and YY polarization
.. image:: ph_polXX.png
- ``ph_poldif``: matrix plot of the phase solutions for the XX-YY polarization
.. image:: ph_poldif.png
The solutions are stored in the h5parm file format.
The last step also incorporates full `Dysco`_ compression to save disk space. The fully calibrated data is stored in the DATA column.
The uncompressed and uncorrected data is also stored in the results directory. These data are used for the :ref:`initsubtract_pipeline`.
User-defined parameter configuration
------------------------------------
**Parameters you will need to adjust**
*Information about the input data*
- ``target_input_path``: specify the directory where your target data is stored (a full UNIX-compatible directory is required)
- ``target_input_pattern``: regular expression pattern of all your target files (e.g. ``L72319*.MS``)
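In the parset these two entries might look like this (a sketch; the path is a placeholder, and the ``!`` variable syntax is the one used by the prefactor parsets)::
! target_input_path    = /data/scratch/username/L72319
! target_input_pattern = L72319*.MS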
*Location of the software*
- ``prefactor_directory``: full path to your prefactor copy
- ``losoto_directory``: full path to your local LoSoTo installation
- ``aoflagger``: full path to your aoflagger executable
*Location of the calibrator solutions*
- ``cal_solutions``: location of the calibrator solutions (default: ``input.output.job_directory/../Pre-Facet-Calibrator/results/cal_values/cal_solutions.h5``, if you stick to the defaults)
**Parameters you may need to adjust**
*Data selection and calibration options*
- ``refant``: name of the station that will be used as a reference for the phase-plots
- ``flag_baselines``: NDPPP-compatible pattern for baselines or stations to be flagged (may be an empty list, i.e.: ``[]`` )
- ``process_baselines_target``: performs A-Team-clipping/demixing and direction-independent phase-only self-calibration only on these baselines. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations.
- ``filter_baselines``: selects only this set of baselines to be processed. Choose ``[CR]S*&`` if you want to process only cross-correlations and remove international stations.
- ``do_smooth``: enable or disable baseline-based smoothing (may enhance signal-to-noise for **LBA** data)
- ``rfistrategy``: strategy to be applied with the statistical flagger (AOFlagger), default: ``HBAdefault.rfis``
- ``interp_windowsize``: size of the window over which a value is interpolated. Should be odd. (default: 15)
- ``raw_data``: use autoweight, set to True in case you are using raw data (default: False)
- ``compression_bitrate``: defines the bitrate of Dysco compression of the data after the final step, choose 0 if you do NOT want to compress the data
- ``min_unflagged_fraction``: minimal fraction of unflagged data to be accepted for further processing of the data chunk
- ``propagatesolutions``: use already derived solutions as initial guess for the upcoming time slot
A comprehensive explanation of the baseline selection syntax can be found `here`_.
*Demixing options* (only used if demix step is added to the ``prep_targ_strategy`` variable)
- ``demix_sources``: choose sources to demix (provided as list), e.g., ``[CasA,CygA]``
- ``demix_target``: if given, the target source model (its patch in the SourceDB) is taken into account when solving (default: ``""``)
- ``demix_freqstep``: number of channels to average when demixing (default: 16)
- ``demix_timestep`` : number of time slots to average when demixing (default: 10)
*Definitions for pipeline options*
- ``initial_flagging``: choose {{ raw_flagging }} if you process raw data
- ``demix_step``: choose {{ demix }} if you want to demix
- ``apply_steps``: comma-separated list of apply_steps performed in the target preparation (NOTE: only use applyRM if you have performed RMextract before!)
- ``clipAteam_step``: choose {{ none }} if you want to skip A-team-clipping
- ``gsmcal_step``: choose tec if you want to fit TEC instead of self-calibrating for phases
- ``updateweights``: update the weights column in a way consistent with the weights being inversely proportional to the autocorrelations
**Parameters for pipeline performance**
- ``num_proc_per_node``: number of processes to use per step per node (default: ``input.output.max_per_node``, reads the parameter ``max_per_node`` from the ``pipeline.cfg``)
- ``num_proc_per_node_limit``: number of processes to use per step per node for tasks with high I/O (DPPP or cp) or memory (e.g. calibration) (default: 4)
- ``max_dppp_threads``: number of threads per process for NDPPP (default: 10)
- ``min_length``: minimum number of chunks to concatenate in frequency necessary to perform the wide-band flagging in RAM. If the data is too big, aoflagger will use indirect reads.
- ``overhead``: only use this fraction of the available memory for deriving the amount of data to be concatenated.
- ``min_separation``: minimal accepted distance to an A-team source on the sky in degrees (will raise a WARNING)
- ``error_tolerance``: defines whether pipeline run will continue if single bands fail (default: False)
**Parameters you may want to adjust**
*Main directories*
- ``lofar_directory``: base directory of your **LOFAR** installation (default: $LOFARROOT)
- ``job_directory``: directory of the prefactor outputs (usually the ``job_directory`` as defined in the ``pipeline.cfg``, default: ``input.output.job_directory``)
*Script and plugin directories*
- ``scripts``: location of the prefactor scripts (default: ``{{ prefactor_directory }}/scripts``)
- ``pipeline.pluginpath``: location of the prefactor plugins: (default: ``{{ prefactor_directory }}/plugins``)
*Sky model directory*
- ``A-team_skymodel``: path to A-team skymodel (used for demixing and clipping)
- ``target_skymodel``: path to the skymodel for the phase-only calibration of the target
- ``use_target``: download the phase-only calibration skymodel from TGSS. "Force": always download; "True": download if {{ target_skymodel }} does not exist; "False": never download
- ``skymodel_source``: use GSM if you want to use the experimental (!) GSM SkyModel creator using TGSS, NVSS, WENSS and VLSS
*Result directories*
- ``results_directory``: location of the prefactor results (default: ``{{ job_directory }}/results``)
- ``inspection_directory``: location of the inspection plots (default: ``{{ results_directory }}/inspection``)
- ``cal_values_directory``: directory of the calibration solutions (h5parm file, default: ``{{ results_directory }}/cal_values``)
*Location of calibrator solutions*
- ``solutions``: location of the calibration solutions (h5parm file, default: ``{{ cal_values_directory }}/cal_solutions.h5``)
*Averaging for the calibrator data*
- ``avg_timeresolution``: intermediate time resolution of the data in seconds after averaging (default: 4)
- ``avg_freqresolution`` : intermediate frequency resolution of the data after averaging (default: 48.82kHz, which translates to 4 channels per subband)
- ``avg_timeresolution_concat``: final time resolution of the data in seconds after averaging and concatenation (default: 8)
- ``avg_freqresolution_concat``: final frequency resolution of the data after averaging and concatenation (default: 97.64kHz, which translates to 2 channels per subband)
*Concatenating of the target data*
- ``num_SBs_per_group``: make concatenated measurement-sets with that many subbands (default: 10)
- ``reference_stationSB``: station-subband number to use as reference for grouping (default: ``None`` -> use lowest frequency input data as reference)
*RMextract settings*
- ``ionex_server``: URL of the *IONEX* server (default: "ftp://ftp.aiub.unibe.ch/CODE/")
- ``ionex_prefix``: the prefix of the *IONEX* files (default: CODG)
- ``ionex_path``: location of the *IONEX* files after downloading (default: ``{{ job_directory }}/IONEX/``)
Recommended parameters for **HBA** and **LBA** observations
-----------------------------------------------------------
============================= ============================ =======================
**parameter**                 **HBA**                      **LBA**
----------------------------- ---------------------------- -----------------------
``do_smooth``                 False                        True
``rfistrategy``               HBAdefault                   LBAdefaultwideband.rfis
``apply_steps``               applyclock,applybeam,applyRM applyphase,applybeam
``gsmcal_step``               phase                        tec
``skymodel_source``           TGSS                         GSM
``clipATeam_step``            {{ clipATeam }}              {{ none }}
``avg_timeresolution_concat`` 8.                           4.
``avg_freqresolution_concat`` 97.64kHz                     48.82kHz
``num_SBs_per_group``         10                           -1
============================= ============================ =======================
In case of **LBA** observations you might also want to enable demixing in the ``prep_targ_strategy`` variable.
If your **LBA** data has **not** been demixed before you may still want to keep the A-Team-clipping.
.. _RMextract: https://github.com/lofar-astron/RMextract/
.. _factor: https://github.com/lofar-astron/factor/
.. _killMS: https://github.com/saopicc/killMS/
.. _TGSS ADR: http://tgssadr.strw.leidenuniv.nl/
.. _Dysco: https://github.com/aroffringa/dysco/
.. _here: https://www.astron.nl/lofarwiki/doku.php?id=public:user_software:documentation:ndppp#description_of_baseline_selection_parameters