Commit d448282b authored by Mick Veldhuis

Merge branch 'rap-840-add-docs' into 'main'

RAP-840 Add Read the Docs

See merge request !9
parents e5ba73f2 73aa022d
Pipeline #108241 failed

.gitignore
@@ -3,6 +3,9 @@ __pycache__/
 .venv/
 venv/
 
+# Sphinx documentation
+docs/_build/
+
 # Data
 *.ms
 *.MS

.readthedocs.yaml
@@ -12,11 +12,11 @@ build:
 # Build documentation in the "docs/" directory with Sphinx
 sphinx:
-  configuration: docs/conf.py
+  configuration: docs/source/conf.py
 
 # Optionally, but recommended,
 # declare the Python requirements required to build your documentation
 # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
-python:
-  install:
-    - requirements: docs/requirements.txt
+# python:
+#   install:
+#     - requirements: docs/requirements.txt

README.md
@@ -4,6 +4,10 @@ The Low-Frequency Array (LOFAR) Pre-Processing Pipeline is a workflow meant to b
 This is an implementation of the pre-processing pipeline in the Common Workflow Language (CWL), which is replacing the Generic Pipeline implementation.
 
+## Documentation
+
+Documentation is available on [Read the Docs](https://lofar-pre-processing-pipeline.readthedocs.io/).
+
 ## Running the Pipeline
 
 The pipeline is not yet in a state in which it should be run! However, if you would like to test the development version, please refer to `tests/README.md`.

docs/source/conf.py

# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))

# -- Project information -----------------------------------------------------

project = "LOFAR Pre-Processing Pipeline"
copyright = "2024, Team Rapthor"
author = "Team Rapthor"

# -- General configuration ---------------------------------------------------

extensions = []

templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ["_static"]
\ No newline at end of file

docs/source/index.rst

LOFAR Pre-Processing Pipeline
=============================

The Low-Frequency Array (LOFAR) Pre-Processing Pipeline is a workflow meant to be run on raw LOFAR data post-correlation. The input is assumed to have been produced with COBALT (the Correlator and Beamformer Application for the LOFAR Telescope), and the pipeline is meant to be run on the Central Processing (CEP) cluster.

The pipeline prepares raw LOFAR data for further processing, e.g., with calibration and imaging pipelines such as `LINC <https://linc.readthedocs.io/>`_ and `Rapthor <https://rapthor.readthedocs.io/>`_. It is a re-implementation of the old Pre-Processing Pipeline, written in the `Common Workflow Language (CWL) <https://www.commonwl.org/>`_; the old pipeline was implemented in the `Generic Pipeline Framework <https://support.astron.nl/softwaregenericpipeline/>`_.

It performs the following operations:

* initial flagging of bad data
* removal of bright off-axis sources (demixing)
* averaging in time and frequency
* compression of the visibility data

.. attention::

   The `CWL <https://www.commonwl.org/>`_ implementation of the Pre-Processing Pipeline is currently in development. Please be aware that it is in an experimental state and not fully functional yet. Use at your own risk!

Getting Started
---------------

.. toctree::
   :maxdepth: 2

   installation
   running

Further Details
---------------

.. toctree::
   :maxdepth: 2

   overview

docs/source/installation.rst

Downloading and Installation
============================

Instructions for downloading and installing the Pre-Processing Pipeline are summarised below.

Obtaining the Pre-Processing Pipeline
-------------------------------------

The pipeline is written in the `Common Workflow Language (CWL) <https://www.commonwl.org/>`_ and consists of CWL workflows, which can be obtained with:

.. code:: console

   $ git clone https://git.astron.nl/RD/preprocessing-cwl

Manual Installation
-------------------

To run the pipeline, you need a CWL runner, such as ``cwltool`` or ``toil``. It's recommended to install these in a virtual environment:

.. code:: console

   $ python3 -m venv venv
   $ source venv/bin/activate
   $ pip install cwltool toil[cwl]

Note that it's also necessary to install NodeJS, which can be installed either via your favourite package manager or via the ``nodejs-wheel`` Python package.
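
For instance, assuming you prefer the pip route, NodeJS can be installed into the same virtual environment (an illustrative sketch; ``nodejs-wheel`` is the PyPI package referenced above):

.. code:: console

   $ pip install nodejs-wheel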

Additionally, the Pre-Processing Pipeline depends on the following software for processing:

* `LofarStMan <https://github.com/lofar-astron/LofarStMan>`_
* `DP3 <https://git.astron.nl/RD/DP3>`_
* `AOFlagger <https://gitlab.com/aroffringa/aoflagger>`_
* `SAGECal <https://github.com/nlesc-dirac/sagecal>`_
* `Casacore <https://github.com/casacore/casacore>`_
* `EveryBeam <https://git.astron.nl/RD/EveryBeam>`_

Please follow their respective installation instructions.

Docker Installation
-------------------

To run the pipeline within a Docker container, build the image located in the ``docker/pipeline`` directory of the `repository <https://git.astron.nl/RD/preprocessing-cwl>`_:

.. code:: console

   $ docker build ../.. -f Dockerfile -t preprocess:latest

Give the image an appropriate tag with the ``-t`` option. Building requires Docker to be installed on your system, either `Docker Engine <https://docs.docker.com/engine/>`_ (only available for Linux) or `Docker Desktop <https://docs.docker.com/desktop/>`_ (available for Windows, macOS, and Linux). Instead of Docker, one could also use `Apptainer <https://apptainer.org/>`_ (previously called Singularity) on Linux.
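
As an illustration of the Apptainer route, an image built with Docker can be converted into an Apptainer image via the local Docker daemon (a minimal sketch, assuming the ``preprocess:latest`` tag from above):

.. code:: console

   $ apptainer build preprocess.sif docker-daemon://preprocess:latest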

docs/source/overview.rst

Overview of the Pipeline
========================

Processing workflow
-------------------

The pre-processing takes place in the ``preprocess`` step, which is a wrapper around `DP3 <https://dp3.readthedocs.io/>`_. This step covers four major categories of operations: flagging, demixing, averaging, and compression. The flagging procedure consists of multiple stages of flagging with DP3's PreFlagger and flagging of radio-frequency interference (RFI) with `AOFlagger <https://aoflagger.readthedocs.io/>`_. Subsequently, A-team sources are demixed and the data is averaged in time and frequency. Finally, the visibility data is compressed using the `Dysco <https://arxiv.org/abs/1609.02019>`_ compression algorithm.

The pipeline performs the following operations in order:

* flagging of edge channels
* flagging by correlation type (e.g., all auto-correlations)
* flagging of low-amplitude signals (below 1E-30)
* flagging of interfering radio signals
* demixing of A-team sources
* averaging in time and frequency
* compression of the visibility data

Output
------

The Pre-Processing Pipeline produces the following output in the directory specified by ``--outdir``:

* pre-processed LOFAR MeasurementSets
* a log file with the captured standard output and standard error (``pipeline.log``)
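
For instance, a run on a single sub-band might leave the following in the output directory (a hypothetical listing; the MS name reuses the example input shown in :doc:`running`):

.. code:: console

   $ ls results/
   L888536_SAP000_SB026_uv.MS  pipeline.log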

Diagnostics
-----------

.. note::

   The pipeline will eventually produce a number of diagnostics; however, these are currently not implemented.

User-defined parameters
-----------------------

Most of these parameters set similarly named DP3 parameters. Hence, please refer to the relevant DP3 documentation pages for further details regarding their possible values, specifically the following pages: `PreFlagger <https://dp3.readthedocs.io/en/latest/steps/PreFlagger.html>`_, `AOFlagger <https://dp3.readthedocs.io/en/latest/steps/AOFlagger.html>`_, `Demixer <https://dp3.readthedocs.io/en/latest/steps/Demixer.html>`_, and `Output <https://dp3.readthedocs.io/en/latest/steps/Output.html>`_.

**Mandatory parameters:**

* ``msin``: path to the input data (list of MeasurementSets).
* ``demix_timestep``: number of time steps to average during demixing; this does not affect the averaging of the output.
* ``demix_freqstep``: number of channels to average during demixing; this does not affect the averaging of the output.
* ``avg_timestep``: the number of time steps by which the output data will be averaged.
* ``avg_freqstep``: the number of channels by which the output data will be averaged.

**Optional parameters:**

*Flagging options:*

* ``preflag_corrtype``: select a type of correlation to flag, e.g., the auto-correlations (default: ``auto``).
* ``preflag_min_amplitude``: data below this amplitude will be flagged (default: ``1E-30``).
* ``aoflagger_rfistrategy``: the RFI flagging strategy used by AOFlagger (default: ``lofar-default.lua``).

*Options for demixing:*

* ``demix_skymodel``: the sky model used by the demixing algorithm (default: ``Ateam.skymodel``).
* ``demix_sources``: the list of sources to demix, e.g., ``[CasA, CygA]``; note that these sources have to be present in the provided sky model (default: ``[]``).
* ``demix_baselines``: select the baselines used to demix; the baseline selection syntax is described in the `DP3 documentation <https://dp3.readthedocs.io/en/latest/steps/Description%20of%20baseline%20selection%20parameters.html>`_ (default: ``[CR]S*&``).
* ``demix_ignoretarget``: if set to ``true``, the source model of the target will not be taken into account during demixing, i.e., the target will be ignored (default: ``false``).
* ``demix_lbfgs_robustdof``: the degrees of freedom of the LBFGS solver's noise model (default: ``200``).
* ``demix_lbfgs_historysize``: the history size the LBFGS solver uses to approximate the inverse Hessian (default: ``10``).

*Options for compression:*

* ``dysco_distribution``: the distribution used by the Dysco compression algorithm (default: ``TruncatedGaussian``).
* ``dysco_databitrate``: the number of bits per float used to represent the visibility data (default: ``10``).
* ``dysco_weightbitrate``: the number of bits per float used to represent the weights (default: ``12``).

*Miscellaneous parameters:*

* ``sasid``: identifier of the SAS process that called the pipeline; this unique identifier is prefixed with an 'L'. The SASID is used to name the output MSs, replacing the Observation ID (default: reuse the Observation ID).
* ``msin_autoweight``: set this parameter to ``true`` when the input consists of raw LOFAR data; this ensures proper weights are set (default: ``true``).
* ``dp3_numthreads``: the number of threads per process used by DP3 (default: ``10``).
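
As an illustration, a hypothetical input file setting the mandatory parameters alongside a few optional ones might look as follows (all paths and values are placeholders, not recommendations):

.. code:: json

   {
       "msin": [
           {
               "class": "Directory",
               "path": "/data/L888536_SAP000_SB026_uv.MS"
           }
       ],
       "demix_timestep": 5,
       "demix_freqstep": 16,
       "demix_sources": ["CasA", "CygA"],
       "avg_timestep": 2,
       "avg_freqstep": 4
   }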

docs/source/running.rst

Running the Pipeline
====================

Starting a pipeline run
-----------------------

The Pre-Processing Pipeline can be run from the command line using a CWL runner, e.g., ``cwltool`` or ``toil``:

.. code:: console

   $ cwltool $INSTALL_DIR/workflows/pipeline.cwl input.json
   $ toil-cwl-runner $INSTALL_DIR/workflows/pipeline.cwl input.json

where ``$INSTALL_DIR`` refers to the location where the CWL files have been installed. The pipeline parameters are provided via a JSON file, described at the bottom of this page. Additionally, ``cwltool`` and ``toil`` come with a number of useful command-line arguments, some of which are listed below. Please refer to their respective documentation for a full overview.

Starting a run from within a container
--------------------------------------

If you followed the Docker installation instructions on the :doc:`installation` page, you can run the container using Docker as follows:

.. code:: console

   $ docker run --rm -v <source_directory>:<mount_point> -w <mount_point> preprocess cwltool --no-container /usr/local/share/prep/workflows/pipeline.cwl input.json

Since the Pre-Processing Pipeline already runs inside a container, do not forget to add the ``--no-container`` option to your CWL runner.
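
For instance, mounting a local data directory in place of the placeholders (an illustrative invocation; the paths are hypothetical):

.. code:: console

   $ docker run --rm -v $PWD/data:/data -w /data preprocess cwltool --no-container /usr/local/share/prep/workflows/pipeline.cwl input.json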

``cwltool`` options
-------------------

There are a number of command-line options you might want to consider adding when running ``cwltool``:

* ``--outdir``: specifies the (relative) path to the directory containing the output of the pipeline (make sure to mount this directory when running the pipeline in a container)
* ``--log-dir``: specifies the location of the log files produced by the ``stdout`` and ``stderr`` of a ``CommandLineTool`` (make sure to mount this directory when running the pipeline in a container)
* ``--preserve-entire-environment``: use your system's environment variables when the dependencies have been installed manually
* ``--no-container``: do not execute jobs in a container (add this when the dependencies have been installed manually)
* ``--singularity``: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
* ``--debug``: more verbose output, useful when debugging

Running the pipeline without the ``--no-container`` option will always attempt to run the steps inside their respective (Docker) containers. Make sure to mount the output and log directories, specified by ``--outdir`` and ``--log-dir``, when running the pipeline inside a container, to ensure the files are not lost after execution.

A full overview of CLI arguments is available in the `cwltool documentation <https://cwltool.readthedocs.io/en/latest/cli.html>`_.
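
For example, a run with manually installed dependencies might combine these options as follows (an illustrative invocation, not a prescribed one; the directory names are placeholders):

.. code:: console

   $ cwltool --no-container --preserve-entire-environment --outdir results --log-dir logs $INSTALL_DIR/workflows/pipeline.cwl input.json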

``toil`` options
----------------

Similarly, these options might be of interest when using ``toil``:

* ``--outdir``: specifies the path to the directory containing the output of the pipeline
* ``--workDir``: specifies the path to the directory where the temporary files generated by Toil should be placed
* ``--log-dir``: specifies the location of the log files produced by the ``stdout`` and ``stderr`` of a ``CommandLineTool``
* ``--logFile``: path to the main log file
* ``--jobStore``: path to the Toil job store (must not exist yet)
* ``--batchSystem``: use a specific batch system of an HPC cluster (e.g., ``slurm`` or ``single_machine``)
* ``--preserve-entire-environment``: use your system's environment variables when the dependencies have been installed manually
* ``--no-container``: do not execute jobs in a container (add this when the dependencies have been installed manually)
* ``--singularity``: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
* ``--stats``: with this option, Toil collects runtime statistics (which can be inspected with ``toil stats``)

Make sure to mount the output, log, and working directories when running the pipeline inside a container, to ensure the files are not lost after execution.

A full overview of CLI arguments is available in the `Toil documentation <https://toil.readthedocs.io/en/latest/running/cliOptions.html>`_.
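
For instance, a Slurm-based run could combine these options as follows (an illustrative invocation; the paths are placeholders):

.. code:: console

   $ toil-cwl-runner --batchSystem slurm --jobStore ./jobstore --workDir ./work --outdir results --logFile pipeline.log $INSTALL_DIR/workflows/pipeline.cwl input.json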

Configuring the pipeline
------------------------

The parameters of the pipeline are provided as a JSON file. As an example, a minimal input could be a list of MeasurementSets (MSs) that you would like to process:

.. code:: json

   {
       "msin": [
           {
               "class": "Directory",
               "path": "/data/L888536_SAP000_SB026_uv.MS"
           },
           {
               "class": "Directory",
               "path": "/data/L888536_SAP000_SB027_uv.MS"
           }
       ]
   }

    Refer to the :doc:`overview` section for a full overview of all pipeline parameters and their default values.