Commit f42c48a1 authored by Vlad Kondratiev

Merge branch 'update-docs' into 'main'

Update docs

See merge request !4
parents 59de7f8f 999398d1
Pipeline #114632 passed
Pipeline: pulp2-cwl-redigitization

#114633

    .git/
    venv/
    .venv/
    docs/_build/
    !scripts/
    # Python, sphinx
    .venv/
    venv/
    docs/_build
    # BF data
    *.h5
    *.raw
    # Etc
    *.old
    *.old2
    *.removed
    ......@@ -10,10 +10,10 @@ build_ci_runner_image:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - |
    if docker pull $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG; then
    docker build --cache-from $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG --tag $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG docker/ci-runner
    docker build --cache-from $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG --tag $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG . -f docker/workflow/Dockerfile
    else
    docker pull $CI_REGISTRY_IMAGE/ci-build-runner:latest || true
    docker build --cache-from $CI_REGISTRY_IMAGE/ci-build-runner:latest --tag $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG docker/ci-runner
    docker build --cache-from $CI_REGISTRY_IMAGE/ci-build-runner:latest --tag $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG . -f docker/workflow/Dockerfile
    fi
    - docker push $CI_REGISTRY_IMAGE/ci-build-runner:$CI_COMMIT_REF_SLUG # push the image
    - |
    ......
    ......@@ -17,11 +17,8 @@ sphinx:
    # Optionally, but recommended,
    # declare the Python requirements required to build your documentation
    # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
    # python:
    # install:
    # - requirements: docs/requirements.txt
    # Optionally declare the Python requirements required to build your docs
    #python:
    # install:
    python:
    install:
    - requirements: docs/requirements.txt
    # - method: pip
    # path: .
    # LOFAR2 Beamformed Re-digitization pipeline
    This is CWL pipeline to convert LOFAR raw beamformed complex-voltage (XXYY) 32-bit HDF5 data to 8-bit HDF5 data. This pipeline represents the CWL implementation of existing LOFAR1 functionality in Pulsar Pipeline (PulP) by means of `digitize.py` script written by Marten van Kerkwijk to be run in LOFAR2 operational framework. The script itself was slightly modified to run in Python3. Please not, that pipeline is still in development, so some changes in the workflow and documentation might happen in the future.
    This is a CWL pipeline to convert LOFAR raw beamformed complex-voltage (XXYY) 32-bit HDF5 data to 8-bit HDF5 data. This pipeline represents the CWL implementation of existing LOFAR1 functionality in the Pulsar Pipeline (PulP) by means of the `digitize.py` script written by Marten van Kerkwijk, to be run in the LOFAR operational framework. The script itself was slightly modified to run in Python 3. Please note that the pipeline is still in development, so the workflow and documentation may change in the future.
    Pulsar Pipeline v2.0 (PulP2) is a set of workflows to process LOFAR raw pulsar beamformed data in several ways. It consists of workflows to redigitize raw XXYY 32-bit data to 8 bits (current repository), to run the DSPSR program to dedisperse and fold pulsar data (pulsar timing), and to convert raw HDF5 Stokes data to 8-bit PSRFITS data.
    PulP2 is implemented in the Common Workflow Language (CWL), while the original PulP was developed in Python 2 and was run in the LOFAR framework as a "black box" using a dedicated wrapper.
    The documentation of the LOFAR2 Beamformed Re-digitization pipeline can be found [here](https://lofar2-beamformed-re-digitization-pipeline.readthedocs.io/en/latest/index.html).
    More information about the original PulP and discussions related to PulP2 development can be found here:
    - PulP / BF pulsar processing for LOFAR2.0 (https://support.astron.nl/confluence/pages/viewpage.action?pageId=154805553)
    - PulP2.0 (https://support.astron.nl/confluence/display/L2COM/PULP2.0)
    FROM ubuntu:24.04
    #
    # Lightweight container with minimum to run digitize3.py
    #
    ENV INSTALL_DIR=/usr/local
    RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get -y update && apt-get -y upgrade && \
    apt-get install -y python-is-python3 && \
    apt-get autoremove --purge && \
    # apt-get install -y wget python3 python3-pip python3-numpy git vim \
    # python3-pkgconfig libhdf5-dev hdf5-tools python3-setuptools cython3 && \
    apt-get install -y wget python3 python3-numpy git vim \
    libhdf5-dev hdf5-tools python3-h5py cwltool && \
    rm -rf /var/lib/apt/lists/*
    # Install cwltool and toil
    #RUN python3 -m pip install cwltool toil[cwl]
    #RUN cd ${INSTALL_DIR} && wget https://files.pythonhosted.org/packages/cc/0c/5c2b0a88158682aeafb10c1c2b735df5bc31f165bfe192f2ee9f2a23b5f1/h5py-3.12.1.tar.gz && \
    # tar xvfz h5py-3.12.1.tar.gz && cd h5py-3.12.1 && python setup.py install --prefix=${INSTALL_DIR}
    #RUN cd ${INSTALL_DIR} && mkdir -p src && cd src/ && git clone https://git.astron.nl/RD/pulp2-cwl-redigitization && \
    # cd pulp2-cwl-redigitization/scripts && cp digitize3.py ${INSTALL_DIR}/bin
    COPY ./scripts/digitize3.py ${INSTALL_DIR}/bin
    RUN digitize3.py -h
    FROM ubuntu:24.04
    #
    # Lightweight container with minimum to run digitize3.py
    #
    ENV INSTALL_DIR=/usr/local
    RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get -y update && apt-get -y upgrade && \
    apt-get install -y python-is-python3 && \
    apt-get autoremove --purge && \
    apt-get install -y \
    wget \
    python3 \
    python3-numpy \
    git \
    vim \
    libhdf5-dev \
    hdf5-tools \
    python3-h5py \
    cwltool && \
    rm -rf /var/lib/apt/lists/*
    RUN cd ${INSTALL_DIR} && mkdir -p src && cd src/ && git clone https://git.astron.nl/RD/pulp2-cwl-redigitization && \
    cd pulp2-cwl-redigitization/scripts && cp digitize3.py ${INSTALL_DIR}/bin
    RUN digitize3.py -h
    FROM ubuntu:24.04
    FROM ubuntu:22.04
    #
    # Lightweight container with minimum to run digitize3.py
    ......@@ -18,9 +18,14 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
    libhdf5-dev \
    hdf5-tools \
    python3-h5py \
    pip \
    cwltool && \
    rm -rf /var/lib/apt/lists/*
    COPY ./scripts/digitize3.py ${INSTALL_DIR}/bin
    # Install toil
    RUN python3 -m pip install --no-cache-dir --upgrade toil[cwl]
    RUN digitize3.py -h
    COPY . ${INSTALL_DIR}/src/pulp2-cwl-redigitization
    COPY scripts/digitize3.py ${INSTALL_DIR}/bin
    RUN ${INSTALL_DIR}/bin/digitize3.py -h
    sphinx_rtd_theme
    docs/source/_static/psr2.png

    18.6 KiB

    ......@@ -24,7 +24,8 @@ author = "Team Rapthor"
    # -- General configuration ---------------------------------------------------
    extensions = []
    #extensions = []
    extensions = [ "sphinx_rtd_theme" ]
    templates_path = ["_templates"]
    ......@@ -38,54 +39,15 @@ exclude_patterns = []
    # The theme to use for HTML and HTML Help pages. See the documentation for
    # a list of builtin themes.
    #
    html_theme = "alabaster"
    #html_theme = 'furo'
    #html_theme = 'sphinx_book_theme'
    #html_theme = 'sphinx_rtd_theme'
    #html_theme = "alabaster"
    html_theme = 'sphinx_rtd_theme'
    # Add any paths that contain custom static files (such as style sheets) here,
    # relative to this directory. They are copied after the builtin static files,
    # so a file named "default.css" will overwrite the builtin "default.css".
    # html_static_path = ["_static"]
    # Theme options are theme-specific and customize the look and feel of a theme
    # further. For a list of options available for each theme, see the
    # documentation.
    #
    #html_theme_options = {
    # "logo": {
    # "image_light": "GBO-vertical-PGGradient.svg",
    # "image_dark": "GBO-vertical-PGGradient.svg"
    # },
    # "repository_url": "https://github.com/aschmiedeke/gbtdocs",
    # "use_repository_button": True,
    # "use_edit_page_button": False,
    # "use_issues_button": True,
    #}
    #html_context = {
    # "github_user": "aschmiedeke",
    # "github_repo": "gbtdocs",
    # "github_version": "main",
    # "doc_path": "docs/source",
    #}
    #html_show_sourcelink = False
    ## for furo theme
    #html_theme_options = {
    # "light_logo": "GBO-vertical-PGGradient.svg",
    # "dark_logo": "GBO-vertical-PGGradient.svg",
    # "sidebar_hide_name": True,
    # #"announcement": "The GBT is currently offline for maintenance and expected to return to full operations by the end of September.",
    # "dark_css_variables": {"color-announcement-background": "darkred"},
    # "light_css_variables": {"color-announcement-background": "darkred"},
    # "source_repository": "https://github.com/aschmiedeke/gbtdocs/",
    # "source_branch": "main",
    # "source_directory": "docs/source/",
    # #"use_edit_page_button": True,
    # #"use_source_button": True,
    # #"use_issues_button": True,
    # #"use_download_button": True,
    # #"use_sidenotes": True
    # }
    html_static_path = ["_static"]
    html_logo = "_static/psr2.png"
    html_theme_options = {
    'logo_only': False,
    'display_version': False,
    }
    Getting Started
    ===============
    Download
    --------
    .. code-block:: console
    $ git clone https://git.astron.nl/RD/pulp2-cwl-redigitization
    .. _builddocker:
    Building docker container to run the pipeline
    ---------------------------------------------
    To run the pipeline within a `Docker <https://www.docker.com/>`_ container, use the Dockerfile located in the ``docker/workflow`` directory of the repository to build the image:
    .. code-block:: console
    $ cd pulp2-cwl-redigitization
    $ docker build . -f docker/workflow/Dockerfile -t pulp2-cwl-redigitization:latest
    You can choose your own name and tag for the Docker image by passing a different value to the ``-t`` option. If your containerization tool of choice is `Apptainer <https://apptainer.org/>`_, please refer to its online documentation on how to build an Apptainer container from the Dockerfile.
    Manual setup to run the pipeline
    --------------------------------
    To run the pipeline, you need a `CWL <https://www.commonwl.org/>`_ runner, such as ``cwltool`` or ``toil``. It is also necessary to install `NodeJS <https://nodejs.org/>`_, which is required by a CWL step of the workflow. It is recommended to install these in a virtual environment:
    .. code-block:: console
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ pip install cwltool "toil[cwl]" nodejs-wheel
    To run the pipeline it is also necessary to install the HDF5 library and ``h5py``, the Python module for accessing HDF5 files. On Ubuntu the corresponding packages are called ``libhdf5-dev`` and ``python3-h5py``. Optionally, ``hdf5-tools`` can also be installed to manipulate the HDF5 files. On Ubuntu, these packages can be installed with:
    .. code-block:: console
    $ apt-get install -y libhdf5-dev hdf5-tools python3-h5py
    Running the pipeline
    --------------------
    The LOFAR2 Beamformed Re-digitization Pipeline can be run from the command line using a `CWL <https://www.commonwl.org/>`_ runner such as ``cwltool`` or ``toil``, for example:
    .. code-block:: console
    $ cwltool ${INSTALL_DIR}/pulp2-cwl-redigitization/workflows/pulp2-xxyy-8bit-requantisation.cwl input.json
    or
    .. code-block:: console
    $ toil-cwl-runner ${INSTALL_DIR}/pulp2-cwl-redigitization/workflows/pulp2-xxyy-8bit-requantisation.cwl input.json
    where ``${INSTALL_DIR}`` refers to the directory where the pipeline repository ``pulp2-cwl-redigitization`` has been installed, and ``input.json`` is the JSON file with the input parameters. Example input JSON files can be found in the ``tests/`` directory of the repository. The pipeline parameters are described in the :doc:`overview` page. Additionally, ``cwltool`` and ``toil`` come with a number of useful command-line arguments. Please refer to their respective documentation for a full overview (`cwltool <https://cwltool.readthedocs.io/>`_ and `toil <https://toil.readthedocs.io/>`_).
    Running the pipeline from within a container
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    If you built the Docker image as described :ref:`above <builddocker>`, you can run the pipeline in a container as follows:
    .. code-block:: console
    $ docker run --rm -v <source_directory>:<mount_point> -w <mount_point> pulp2-cwl-redigitization:latest cwltool /usr/local/src/pulp2-cwl-redigitization/workflows/pulp2-xxyy-8bit-requantisation.cwl input.json
    There are a number of command-line options you might want to consider adding when running ``cwltool`` and ``toil``. For an overview of those, see the corresponding sections of the documentation for the `LOFAR Pre-Processing Pipeline <https://lofar-pre-processing-pipeline.readthedocs.io/en/latest/running.html>`_.
    LOFAR2 Beamformed Re-digitization Pipeline
    ==========================================
    This is `Common Workflow Language (CWL) <https://www.commonwl.org/>`_ pipeline to convert LOFAR raw beamformed complex-voltage (XXYY) 32-bit HDF5 data to 8-bit HDF5 data. This pipeline represents the CWL implementation of existing LOFAR1 functionality in Pulsar Pipeline (PulP) by means of `digitize.py` script written by Marten van Kerkwijk to be run in LOFAR2 operational framework. The script itself was slightly modified to run in Python3. Please not, that pipeline is still in development, so some changes in the workflow and documentation might happen in the future. The name of the pipeline is also not finally decided, the current development working name is PulP2-XXYY-8bit-redigitization, but more sensible name (such as "LOFAR2 Beamformed Re-digitization Pipeline") might be agreed upon in near future.
    This is a `Common Workflow Language (CWL) <https://www.commonwl.org/>`_ pipeline to convert LOFAR raw beamformed complex-voltage (XXYY) 32-bit HDF5 data to 8-bit HDF5 data. This pipeline represents the CWL implementation of existing LOFAR1 functionality in the Pulsar Pipeline (PulP) by means of the ``digitize.py`` script written by Marten van Kerkwijk, to be run in the LOFAR operational framework. The script itself was slightly modified to run in Python 3. Please note that the pipeline is still in development, so the workflow and documentation may change in the future. The name of the pipeline has also not been finalized: the current development working name is PulP2-XXYY-8bit-redigitization, but a more sensible name (such as "LOFAR2 Beamformed Re-digitization Pipeline") might be agreed upon in the near future.
    .. toctree::
    :maxdepth: 2
    :caption: Contents:
    Home <self>
    getting_started
    overview
    credits
    Indices and tables
    ------------------
    * :ref:`genindex`
    * :ref:`modindex`
    * :ref:`search`
    Overview
    ========
    Pipeline overview
    =================
    The pipeline converts raw `LOFAR <https://www.lofar.eu/>`_ beamformed complex-voltage (XXYY) 32-bit `HDF5 <https://www.hdfgroup.org/solutions/hdf5/>`_ data to 8-bit `HDF5 <https://www.hdfgroup.org/solutions/hdf5/>`_ data. The re-digitization happens in the ``digitize.cwl`` step, which is a wrapper for the Python script ``digitize3.py`` (located in the ``scripts/`` directory of the repository). This script is a slightly modified version of the original ``digitize.py`` `script <https://github.com/mhvk/scintellometry/blob/master/scintellometry/lofar/digitize.py>`_ written by Marten van Kerkwijk.
    Input parameters
    ----------------
    * ``h5in``: input HDF5 LOFAR BF ``.h5`` files. Each of these files should have a corresponding ``.raw`` file in the same directory. The workflow will then do the conversion on these ``.raw`` files and update the metadata in the ``.h5`` files.
    * ``out_dir``: the output directory for the new re-digitized ``.raw`` and ``.h5`` files. By default, this directory is named ``8-bit`` and is created relative to the directory specified with the ``--outdir`` option of ``cwltool`` or, if that option is not given, relative to the working directory of the Docker container (i.e. the directory passed to the ``-w`` option of the ``docker run`` command).
    * ``nsigma``: digitization threshold (in standard deviations, default is 5.0). All data fluctuations above/below the threshold will be clipped.
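To make the role of ``nsigma`` concrete, here is a minimal Python sketch of clipping followed by scaling to 8 bits. It is not the code of ``digitize3.py``. The function name ``requantize_8bit``, the assumption of zero-mean samples, and the mapping of the clipping limit onto ±127 are illustrative assumptions only.

```python
import statistics


def requantize_8bit(samples, nsigma=5.0):
    """Clip samples at +/- nsigma standard deviations and scale them to 8-bit integers.

    Hypothetical helper for illustration: assumes zero-mean input and maps the
    clipping limit onto +/-127. The real digitize3.py may scale differently.
    """
    std = statistics.pstdev(samples)
    if std == 0.0:
        # Constant input carries no fluctuations to digitize.
        return [0 for _ in samples]
    limit = nsigma * std          # clipping threshold in data units
    scale = 127.0 / limit         # the limit maps onto the largest 8-bit magnitude used here
    quantized = []
    for value in samples:
        clipped = max(-limit, min(limit, value))
        quantized.append(int(round(clipped * scale)))
    return quantized
```

In this sketch a smaller ``nsigma`` clips more of the tails of the distribution but spends the 256 available levels on a narrower range, while a larger ``nsigma`` preserves rare strong fluctuations at the cost of coarser steps.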
    Output
    ------
    The pipeline produces the following output in the directory specified by ``--outdir``:
    * ``out_dir``: the output directory, specified as the input parameter, or ``8-bit`` by default.
    * new re-digitized ``.h5`` and ``.raw`` files in the output directory given by ``out_dir`` parameter.
    * ``pipeline.log``: a concatenated log file capturing standard output for processing of all input ``.h5`` files.
    * ``pipeline_error.log``: a concatenated log file capturing standard error for processing of all input ``.h5`` files.
    Configuring the pipeline
    ------------------------
    The parameters of the pipeline are provided as a `JSON <https://www.json.org>`_ file.
    Here is an example of an input JSON file:
    .. code-block:: JSON
    {
      "h5in": [
        {
          "class": "File",
          "path": "/data/scratch/L2049631/32-bit/L2049631_SAP000_B000_S0_P000_bf.h5"
        },
        {
          "class": "File",
          "path": "/data/scratch/L2049631/32-bit/L2049631_SAP000_B000_S1_P000_bf.h5"
        },
        {
          "class": "File",
          "path": "/data/scratch/L2049631/32-bit/L2049631_SAP000_B000_S2_P000_bf.h5"
        },
        {
          "class": "File",
          "path": "/data/scratch/L2049631/32-bit/L2049631_SAP000_B000_S3_P000_bf.h5"
        }
      ],
      "out_dir": "8-bit",
      "nsigma": 5.1
    }
    The ``out_dir`` and/or ``nsigma`` parameters can be omitted, in which case their default values are used.
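For observations with many files, the input JSON can be generated rather than written by hand. The sketch below is one possible way to do that in Python. The helper name ``build_input`` and the ``*_bf.h5`` glob pattern are assumptions made for this example and are not part of the repository.

```python
import json
from pathlib import Path


def build_input(h5_files, out_dir=None, nsigma=None):
    """Build the pipeline input parameters as a dictionary.

    Hypothetical helper: optional parameters are left out entirely when not
    given, so that the pipeline defaults apply.
    """
    params = {"h5in": [{"class": "File", "path": str(path)} for path in h5_files]}
    if out_dir is not None:
        params["out_dir"] = out_dir
    if nsigma is not None:
        params["nsigma"] = nsigma
    return params


if __name__ == "__main__":
    # Directory taken from the example above; adjust to your own data location.
    data_dir = Path("/data/scratch/L2049631/32-bit")
    h5_files = sorted(data_dir.glob("*_bf.h5"))
    print(json.dumps(build_input(h5_files, nsigma=5.1), indent=2))
```

Redirecting the printed output to a file gives an ``input.json`` that can be passed to the CWL runner.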
    # Testing the pipeline
    The `tests/` directory of this repository currently contains input JSON files to test the pipeline on either the [CEP4](https://science.astron.nl/telescopes/lofar/lofar-system-overview/lofar-computing-resources/cep-and-lta-computing-facilities/) or the [DAS-6](https://www.cs.vu.nl/das6/) cluster.
    ## Test data
    The test 32-bit raw LOFAR BF complex-voltage data are a 5-min HBA-Low observation of the pulsar B1919+21 with the single station CS032 (`SASId`=2049631). In total, the volume is about 18 GB, with four `.raw` and four `.h5` files. Each `.raw` file is about 4.4 GB:
    ```
    L2049631_SAP000_B000_S0_P000_bf.h5
    L2049631_SAP000_B000_S0_P000_bf.raw
    L2049631_SAP000_B000_S1_P000_bf.h5
    L2049631_SAP000_B000_S1_P000_bf.raw
    L2049631_SAP000_B000_S2_P000_bf.h5
    L2049631_SAP000_B000_S2_P000_bf.raw
    L2049631_SAP000_B000_S3_P000_bf.h5
    L2049631_SAP000_B000_S3_P000_bf.raw
    ```
    On [CEP4](https://science.astron.nl/telescopes/lofar/lofar-system-overview/lofar-computing-resources/cep-and-lta-computing-facilities/) the data are located here:
    ```
    /data/kondratiev/pulp2-data/digitize/L2049631/32-bit
    ```
    and on [DAS-6](https://www.cs.vu.nl/das6/) here:
    ```
    /var/scratch/vkondrat/pulp2-data/digitize/L2049631/32-bit
    ```
    If you would like to test the pipeline, but do not have access to these test raw LOFAR data,
    please contact Team Rapthor.
    ## Running the tests
    To run these tests you need to have a CWL runner, such as `cwltool` or `toil`, installed on your system.
    For example, on [CEP4](https://science.astron.nl/telescopes/lofar/lofar-system-overview/lofar-computing-resources/cep-and-lta-computing-facilities/) with `cwltool` you could run the top-level `pulp2-xxyy-8bit-requantisation.cwl` workflow as follows:
    ```
    cwltool --debug --preserve-entire-environment --outdir=pipeline-out ${INSTALL_DIR}/workflows/pulp2-xxyy-8bit-requantisation.cwl ${INSTALL_DIR}/tests/digitize_input.json
    ```
    where `${INSTALL_DIR}` refers to the location of this repository on your system.
    To run the same command on [DAS-6](https://www.cs.vu.nl/das6/) you need to modify the paths of the input files in the input JSON file, or use another JSON file, `digitize_das6_input.json`, already tailored for the [DAS-6](https://www.cs.vu.nl/das6/) cluster.
    Modify the command as desired. For an overview of possible arguments, run `cwltool --help` or refer to the cwltool [Read the Docs pages](https://cwltool.readthedocs.io/).
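If the test data live somewhere else entirely, the input paths can be rewritten with a few lines of Python instead of editing the JSON by hand. This is only a sketch: the helper names `retarget_paths` and `rewrite_file` are hypothetical, and it assumes the input file lists its data under a common path prefix, such as the CEP4 and DAS-6 locations given above.

```python
import json

# Data locations quoted in the "Test data" section above.
CEP4_PREFIX = "/data/kondratiev/pulp2-data"
DAS6_PREFIX = "/var/scratch/vkondrat/pulp2-data"


def retarget_paths(params, old_prefix, new_prefix):
    """Return a copy of the input parameters with old_prefix replaced in each h5in path."""
    updated = dict(params)
    updated["h5in"] = []
    for entry in params["h5in"]:
        new_entry = dict(entry)
        if entry["path"].startswith(old_prefix):
            new_entry["path"] = new_prefix + entry["path"][len(old_prefix):]
        updated["h5in"].append(new_entry)
    return updated


def rewrite_file(src, dst, old_prefix, new_prefix):
    """Read an input JSON file, retarget its paths, and write the result to dst."""
    with open(src) as handle:
        params = json.load(handle)
    with open(dst, "w") as handle:
        json.dump(retarget_paths(params, old_prefix, new_prefix), handle, indent=2)
```

Paths that do not start with the old prefix are left untouched, so the function is safe to run on a file that has already been converted.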
    ## Validate tests
    It might also be useful to validate the workflow and its input. This is achieved by running:
    ```
    cwltool --validate ${INSTALL_DIR}/workflows/pulp2-xxyy-8bit-requantisation.cwl ${INSTALL_DIR}/tests/digitize_input.json
    ```
    This will not only validate the `pulp2-xxyy-8bit-requantisation.cwl` workflow, but also throw an error when required parameters are missing from `digitize_input.json` (or `digitize_das6_input.json` if you run the command on [DAS-6](https://www.cs.vu.nl/das6/)).
    The documentation of the LOFAR2 Beamformed Re-digitization pipeline can be found [here](https://lofar2-beamformed-re-digitization-pipeline.readthedocs.io/en/latest/index.html).
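As a complementary check before a long run, one might also confirm that every `.h5` file listed in the input JSON exists alongside a `.raw` file of the same name, since the pipeline expects the two to sit in the same directory. The sketch below is an assumption-laden example and not part of the repository: `missing_raw_files` is a hypothetical helper name.

```python
import json
from pathlib import Path


def missing_raw_files(params):
    """Return the .raw companion paths that are absent for the listed .h5 files."""
    missing = []
    for entry in params.get("h5in", []):
        raw_path = Path(entry["path"]).with_suffix(".raw")
        if not raw_path.is_file():
            missing.append(str(raw_path))
    return missing


def check_input_file(json_path):
    """Load an input JSON file and report its missing .raw companions."""
    with open(json_path) as handle:
        return missing_raw_files(json.load(handle))
```

An empty list means every `.raw` companion was found; otherwise the returned paths show which files need to be copied before starting the workflow.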