Commit d448282b authored by Mick Veldhuis

Merge branch 'rap-840-add-docs' into 'main'

RAP-840 Add Read the Docs

See merge request !9
parents e5ba73f2 73aa022d
Pipeline #108241 failed

.gitignore
@@ -3,6 +3,9 @@ __pycache__/
 .venv/
 venv/
 
+# Sphinx documentation
+docs/_build/
+
 # Data
 *.ms
 *.MS

.readthedocs.yaml
@@ -12,11 +12,11 @@ build:
 # Build documentation in the "docs/" directory with Sphinx
 sphinx:
-  configuration: docs/conf.py
+  configuration: docs/source/conf.py
 
 # Optionally, but recommended,
 # declare the Python requirements required to build your documentation
 # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
-python:
-  install:
-    - requirements: docs/requirements.txt
+# python:
+#   install:
+#     - requirements: docs/requirements.txt

README.md
@@ -4,6 +4,10 @@ The Low-Frequency Array (LOFAR) Pre-Processing Pipeline is a workflow meant to b
 This is an implementation of the pre-processing pipeline in the Common Workflow Language (CWL), which is replacing the Generic Pipeline implementation.
 
+## Documentation
+
+Documentation is available on [Read the Docs](https://lofar-pre-processing-pipeline.readthedocs.io/).
+
 ## Running the Pipeline
 
 The pipeline is not yet in a state in which it should be run! However, if you would like to test the development version, please refer to `tests/README.md`.

docs/source/conf.py

# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))

# -- Project information -----------------------------------------------------

project = "LOFAR Pre-Processing Pipeline"
copyright = "2024, Team Rapthor"
author = "Team Rapthor"

# -- General configuration ---------------------------------------------------

extensions = []

templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ["_static"]
\ No newline at end of file

docs/source/index.rst

LOFAR Pre-Processing Pipeline
=============================

The Low-Frequency Array (LOFAR) Pre-Processing Pipeline is a workflow meant to be run on raw LOFAR data post-correlation. The input is assumed to have been produced with COBALT (the Correlator and Beamformer Application for the LOFAR Telescope), and the pipeline is meant to be run on the Central Processing (CEP) cluster.

The pipeline prepares raw LOFAR data for further processing, e.g., with calibration and imaging pipelines such as `LINC <https://linc.readthedocs.io/>`_ and `Rapthor <https://rapthor.readthedocs.io/>`_. It is a re-implementation of the old Pre-Processing Pipeline, written in the `Common Workflow Language (CWL) <https://www.commonwl.org/>`_; the old pipeline was implemented in the `Generic Pipeline Framework <https://support.astron.nl/softwaregenericpipeline/>`_.

It performs the following operations:

* initial flagging of bad data
* removal of bright off-axis sources (demixing)
* averaging in time and frequency
* compression of the visibility data

.. attention::

   The `CWL <https://www.commonwl.org/>`_ implementation of the Pre-Processing Pipeline is currently in development. Please be aware that it is in an experimental state and not fully functional yet. Use at your own risk!

Getting Started
---------------

.. toctree::
   :maxdepth: 2

   installation
   running

Further Details
---------------

.. toctree::
   :maxdepth: 2

   overview

docs/source/installation.rst

Downloading and Installation
============================

Instructions for downloading and installing the Pre-Processing Pipeline are summarised below.

Obtaining the Pre-Processing Pipeline
-------------------------------------

The pipeline is written in the `Common Workflow Language (CWL) <https://www.commonwl.org/>`_ and consists of CWL workflows, which can be obtained with:

.. code:: console

   $ git clone https://git.astron.nl/RD/preprocessing-cwl

Manual Installation
-------------------

To run the pipeline, you need a CWL runner, such as ``cwltool`` or ``toil``. It's recommended to install these in a virtual environment:

.. code:: console

   $ python3 -m venv venv
   $ source venv/bin/activate
   $ pip install cwltool toil[cwl]

Note that it's also necessary to install NodeJS, which can be installed either via your favourite package manager or via the ``nodejs-wheel`` Python package.
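
For instance, assuming you prefer the pip route, NodeJS can be installed into the same virtual environment (an illustrative sketch; ``nodejs-wheel`` is the PyPI package referenced above):

.. code:: console

   $ pip install nodejs-wheel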

Additionally, the Pre-Processing Pipeline depends on the following software for processing:

* `LofarStMan <https://github.com/lofar-astron/LofarStMan>`_
* `DP3 <https://git.astron.nl/RD/DP3>`_
* `AOFlagger <https://gitlab.com/aroffringa/aoflagger>`_
* `SAGECal <https://github.com/nlesc-dirac/sagecal>`_
* `Casacore <https://github.com/casacore/casacore>`_
* `EveryBeam <https://git.astron.nl/RD/EveryBeam>`_

Please follow their respective installation instructions.

Docker Installation
-------------------

To run the pipeline within a Docker container, build the image located in the ``docker/pipeline`` directory of the `repository <https://git.astron.nl/RD/preprocessing-cwl>`_:

.. code:: console

   $ docker build ../.. -f Dockerfile -t preprocess:latest

Give the image an appropriate tag with the ``-t`` option. Building requires Docker to be installed on your system, either `Docker Engine <https://docs.docker.com/engine/>`_ (only available for Linux) or `Docker Desktop <https://docs.docker.com/desktop/>`_ (available for Windows, macOS, and Linux). Instead of Docker, one could also use `Apptainer <https://apptainer.org/>`_ (previously called Singularity) on Linux.
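
As an illustration of the Apptainer route, an image built with Docker can be converted into an Apptainer image via the local Docker daemon (a minimal sketch, assuming the ``preprocess:latest`` tag from above):

.. code:: console

   $ apptainer build preprocess.sif docker-daemon://preprocess:latest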

docs/source/overview.rst

Overview of the Pipeline
========================

Processing workflow
-------------------

The pre-processing takes place in the ``preprocess`` step, which is a wrapper around `DP3 <https://dp3.readthedocs.io/>`_. This step covers four major categories of operations: flagging, demixing, averaging, and compression. The flagging procedure consists of multiple stages of flagging with DP3's PreFlagger and flagging of radio-frequency interference (RFI) with `AOFlagger <https://aoflagger.readthedocs.io/>`_. Subsequently, A-team sources are demixed and the data is averaged in time and frequency. Finally, the visibility data is compressed using the `Dysco <https://arxiv.org/abs/1609.02019>`_ compression algorithm.

The pipeline performs the following operations in order:

* flagging of edge channels
* flagging by correlation type (e.g., all auto-correlations)
* flagging of low-amplitude signals (below 1E-30)
* flagging of interfering radio signals
* demixing of A-team sources
* averaging in time and frequency
* compression of the visibility data

Output
------

The Pre-Processing Pipeline produces the following output in the directory specified by ``--outdir``:

* pre-processed LOFAR MeasurementSets
* a log file with the captured standard output and standard error (``pipeline.log``)
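
For instance, a run on a single sub-band might leave the following in the output directory (a hypothetical listing; the MS name reuses the example input shown in :doc:`running`):

.. code:: console

   $ ls results/
   L888536_SAP000_SB026_uv.MS  pipeline.log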

Diagnostics
-----------

.. note::

   The pipeline will eventually produce a number of diagnostics; however, these are currently not implemented.

User-defined parameters
-----------------------

Most of these parameters set similarly named DP3 parameters. Hence, please refer to the relevant DP3 documentation pages for further details regarding their possible values, specifically the following pages: `PreFlagger <https://dp3.readthedocs.io/en/latest/steps/PreFlagger.html>`_, `AOFlagger <https://dp3.readthedocs.io/en/latest/steps/AOFlagger.html>`_, `Demixer <https://dp3.readthedocs.io/en/latest/steps/Demixer.html>`_, and `Output <https://dp3.readthedocs.io/en/latest/steps/Output.html>`_.

**Mandatory parameters:**

* ``msin``: path to the input data (list of MeasurementSets).
* ``demix_timestep``: number of time steps to average during demixing; this does not affect the averaging of the output.
* ``demix_freqstep``: number of channels to average during demixing; this does not affect the averaging of the output.
* ``avg_timestep``: the number of time steps by which the output data will be averaged.
* ``avg_freqstep``: the number of channels by which the output data will be averaged.

**Optional parameters:**

*Flagging options:*

* ``preflag_corrtype``: select a type of correlation to flag, e.g., the auto-correlations (default: ``auto``).
* ``preflag_min_amplitude``: data below this amplitude will be flagged (default: ``1E-30``).
* ``aoflagger_rfistrategy``: the RFI flagging strategy used by AOFlagger (default: ``lofar-default.lua``).

*Options for demixing:*

* ``demix_skymodel``: the sky model used by the demixing algorithm (default: ``Ateam.skymodel``).
* ``demix_sources``: the list of sources to demix, e.g., ``[CasA, CygA]``; note that these sources have to be present in the provided sky model (default: ``[]``).
* ``demix_baselines``: select the baselines used to demix; the baseline selection syntax is described in the `DP3 documentation <https://dp3.readthedocs.io/en/latest/steps/Description%20of%20baseline%20selection%20parameters.html>`_ (default: ``[CR]S*&``).
* ``demix_ignoretarget``: if set to ``true``, the source model of the target will not be taken into account during demixing, i.e., the target will be ignored (default: ``false``).
* ``demix_lbfgs_robustdof``: the degrees of freedom of the LBFGS solver's noise model (default: ``200``).
* ``demix_lbfgs_historysize``: the history size the LBFGS solver uses to approximate the inverse Hessian (default: ``10``).

*Options for compression:*

* ``dysco_distribution``: the distribution used by the Dysco compression algorithm (default: ``TruncatedGaussian``).
* ``dysco_databitrate``: the number of bits per float used to represent the visibility data (default: ``10``).
* ``dysco_weightbitrate``: the number of bits per float used to represent the weights (default: ``12``).

*Miscellaneous parameters:*

* ``sasid``: identifier of the SAS process that called the pipeline; this unique identifier is prefixed with an 'L'. The SASID is used to name the output MSs, replacing the Observation ID (default: reuse the Observation ID).
* ``msin_autoweight``: set this parameter to ``true`` when the input consists of raw LOFAR data; this ensures proper weights are set (default: ``true``).
* ``dp3_numthreads``: the number of threads per process used by DP3 (default: ``10``).
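
As an illustration, a hypothetical input file setting the mandatory parameters alongside a few optional ones might look as follows (all paths and values are placeholders, not recommendations):

.. code:: json

   {
       "msin": [
           {
               "class": "Directory",
               "path": "/data/L888536_SAP000_SB026_uv.MS"
           }
       ],
       "demix_timestep": 5,
       "demix_freqstep": 16,
       "demix_sources": ["CasA", "CygA"],
       "avg_timestep": 2,
       "avg_freqstep": 4
   }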

docs/source/running.rst

Running the Pipeline
====================

Starting a pipeline run
-----------------------

The Pre-Processing Pipeline can be run from the command line using a CWL runner, e.g., ``cwltool`` or ``toil``:

.. code:: console

   $ cwltool $INSTALL_DIR/workflows/pipeline.cwl input.json
   $ toil-cwl-runner $INSTALL_DIR/workflows/pipeline.cwl input.json

where ``$INSTALL_DIR`` refers to the location where the CWL files have been installed. The pipeline parameters are provided via a JSON file, described at the bottom of this page. Additionally, ``cwltool`` and ``toil`` come with a number of useful command-line arguments, some of which are listed below. Please refer to their respective documentation for a full overview.

Starting a run from within a container
--------------------------------------

If you followed the Docker installation instructions on the :doc:`installation` page, you can run the container using Docker as follows:

.. code:: console

   $ docker run --rm -v <source_directory>:<mount_point> -w <mount_point> preprocess cwltool --no-container /usr/local/share/prep/workflows/pipeline.cwl input.json

Since the Pre-Processing Pipeline already runs inside a container, do not forget to add the ``--no-container`` option to your CWL runner.
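
For instance, mounting a local data directory in place of the placeholders (an illustrative invocation; the paths are hypothetical):

.. code:: console

   $ docker run --rm -v $PWD/data:/data -w /data preprocess cwltool --no-container /usr/local/share/prep/workflows/pipeline.cwl input.json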

``cwltool`` options
-------------------

There are a number of command-line options you might want to consider adding when running ``cwltool``:

* ``--outdir``: specifies the (relative) path to the directory containing the output of the pipeline (make sure to mount this directory when running the pipeline in a container)
* ``--log-dir``: specifies the location of the log files produced by the ``stdout`` and ``stderr`` of a ``CommandLineTool`` (make sure to mount this directory when running the pipeline in a container)
* ``--preserve-entire-environment``: use your system's environment variables when the dependencies have been installed manually
* ``--no-container``: do not execute jobs in a container (add this when the dependencies have been installed manually)
* ``--singularity``: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
* ``--debug``: more verbose output, useful when debugging

Running the pipeline without the ``--no-container`` option will always attempt to run the steps inside their respective (Docker) containers. Make sure to mount the output and log directories, specified by ``--outdir`` and ``--log-dir``, when running the pipeline inside a container, to ensure the files are not lost after execution.

A full overview of CLI arguments is available in the `cwltool documentation <https://cwltool.readthedocs.io/en/latest/cli.html>`_.
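
For example, a run with manually installed dependencies might combine these options as follows (an illustrative invocation, not a prescribed one; the directory names are placeholders):

.. code:: console

   $ cwltool --no-container --preserve-entire-environment --outdir results --log-dir logs $INSTALL_DIR/workflows/pipeline.cwl input.json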

``toil`` options
----------------

Similarly, these options might be of interest when using ``toil``:

* ``--outdir``: specifies the path to the directory containing the output of the pipeline
* ``--workDir``: specifies the path to the directory where the temporary files generated by Toil should be placed
* ``--log-dir``: specifies the location of the log files produced by the ``stdout`` and ``stderr`` of a ``CommandLineTool``
* ``--logFile``: path to the main log file
* ``--jobStore``: path to the Toil job store (must not exist yet)
* ``--batchSystem``: use a specific batch system of an HPC cluster (e.g., ``slurm`` or ``single_machine``)
* ``--preserve-entire-environment``: use your system's environment variables when the dependencies have been installed manually
* ``--no-container``: do not execute jobs in a container (add this when the dependencies have been installed manually)
* ``--singularity``: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
* ``--stats``: with this option, Toil collects runtime statistics (which can be inspected with ``toil stats``)

Make sure to mount the output, log, and working directories when running the pipeline inside a container, to ensure the files are not lost after execution.

A full overview of CLI arguments is available in the `Toil documentation <https://toil.readthedocs.io/en/latest/running/cliOptions.html>`_.
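
For instance, a Slurm-based run could combine these options as follows (an illustrative invocation; the paths are placeholders):

.. code:: console

   $ toil-cwl-runner --batchSystem slurm --jobStore ./jobstore --workDir ./work --outdir results --logFile pipeline.log $INSTALL_DIR/workflows/pipeline.cwl input.json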

Configuring the pipeline
------------------------

The parameters of the pipeline are provided as a JSON file. As an example, a minimal input could be a list of MeasurementSets (MSs) that you would like to process:

.. code:: json

   {
       "msin": [
           {
               "class": "Directory",
               "path": "/data/L888536_SAP000_SB026_uv.MS"
           },
           {
               "class": "Directory",
               "path": "/data/L888536_SAP000_SB027_uv.MS"
           }
       ]
   }

    Refer to the :doc:`overview` section for a full overview of all pipeline parameters and their default values.