
Starting a pipeline

Note

If you are running the deprecated genericpipeline version of the pipeline (prefactor 3.2 or older), please check the :doc:`old instructions page<running_old>`.

Once you have the data and the input JSON file ready, you can run the pipeline, e.g. the HBA calibrator pipeline, with cwltool or toil:

$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json

$ toil-cwl-runner <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json

where LINC.json is the input JSON file as described in the chapter :doc:`parset` and <install_dir> is the location of the LINC CWL description files.

Note

Instead of specifying all options in LINC.json, the user can also use command-line options to override the defaults.
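For example, inputs can also be passed directly on the command line (a sketch with placeholder values; msin and refant are LINC inputs described in the :doc:`parset` chapter, and the station name is only an illustration):

```shell
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl --msin /data/L667521_SB000_uv.MS --refant 'CS001HBA0'
```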

By default, LINC will execute the processing steps (like DP3, etc.) inside a Docker container. If you prefer to use Singularity instead, the option --singularity can be added to the cwltool command line (see options below).

Note

Do not run your cwltool or toil calls inside the Docker or Singularity container unless this is exactly what you intend to do (see next section).

The following table provides the workflows to call in the command above for standard LOFAR observations. These provide the proper pipeline with pre-defined parameters (defaults) for HBA and LBA observations:

LINC workflow         HBA                  LBA
LINC_calibrator.cwl   HBA_calibrator.cwl   LBA_calibrator.cwl
LINC_target.cwl       HBA_target.cwl       LBA_target.cwl

Note

The LBA target workflow is not (yet) available.

If you have installed cwltool or toil locally on your system, LINC will automatically pull the right (u)Docker/Singularity image for you.

Running LINC from within a (u)Docker/Singularity image

If you do not want to install cwltool or toil locally on your system, you need to pull the software image first (with a local installation this step is not necessary). For Docker:

$ docker pull astronrd/linc

for uDocker:

$ udocker pull astronrd/linc

and for Singularity:

$ singularity pull docker://astronrd/linc

To run LINC you only need to add the container-specific execution command and make sure that all necessary volumes (<mount_points>) are mounted read-write and are thus accessible from inside the container, e.g.:

$ singularity exec --bind <mount_points>:<mount_points> <linc.sif> cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json

where <linc.sif> is the location of the Singularity image, or

$ docker run --rm <docker_options> -v <mount_points>:<mount_points> -w $PWD astronrd/linc cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json

Since you are running LINC inside a container, do not forget to add the --no-container flag to your call, no matter whether you use Singularity or Docker. Do not use the --singularity flag.

Pipeline options for cwltool

The following <cwl_options> are recommended for running LINC with cwltool. Please check carefully which options to choose depending on how you run LINC:

  • --outdir: specifies the location of the final pipeline output directory (results)
  • --tmpdir-prefix: specifies the location of the intermediate data products (should provide enough fast disk space; avoid using /tmp)
  • --leave-tmpdir: do not delete intermediate data products (useful for debugging)
  • --parallel: run jobs in parallel (highly recommended to achieve decent processing speed)
  • --singularity: use Singularity instead of Docker (necessary if you want to use Singularity)
  • --user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)
  • --no-container: do not use a Docker container (only for manual installations and when running from within a Docker/Singularity image)
  • --preserve-entire-environment: use system environment variables (only for manual installations and when running from within a Docker/Singularity image)
  • --debug: more verbose output (use only for debugging the pipeline)

While the pipeline runs, the terminal will show the current state of the pipeline. For debugging it is recommended to run cwltool inside a screen session or to pipe the output into a runtime logfile:

$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1

A fairly typical run that uses Singularity can look similar to this:

$ cwltool \
  --singularity \
  --outdir "/data/myproject/Linc-L628614" \
  --log-dir "/data/myproject/Log-L628614" \
  --tmpdir-prefix "/data/myproject/Tmp-L628614/" \
  ~/.local/share/linc/workflows/HBA_target.cwl \
  linc-L628614.json

In the directory specified with --tmpdir-prefix all temporary folders and files are generated. At the end of the run those files can be deleted.
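Cleaning up afterwards is then just a matter of removing that directory. A minimal sketch (using a stand-in directory name instead of your real --tmpdir-prefix location):

```shell
# Stand-in for the directory given via --tmpdir-prefix (placeholder name):
mkdir -p Tmp-L628614/scratch
touch Tmp-L628614/scratch/intermediate.data
# After a successful run the whole tree can simply be deleted:
rm -rf Tmp-L628614
```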

Note

cwltool has no option to resume a failed/crashed run. If you need this feature, have a look at toil.

Pipeline options for toil

The following <cwl_options> are recommended for running LINC with toil:

  • --workDir: specifies the location of toil-specific intermediate data products
  • --tmpdir-prefix: specifies the location of the intermediate data products (should provide enough fast disk space; avoid using /tmp)
  • --outDir: specifies the location of the final data products
  • --leave-tmpdir: do not delete intermediate data products (useful for debugging)
  • --jobStore: location of the jobStore ("statefile")
  • --writeLogs: location of the pipeline job logfiles
  • --logFile: location of the main pipeline logfile
  • --logLevel: can be CRITICAL, ERROR, WARNING, INFO, or DEBUG
  • --batchSystem: use a specific batch system of an HPC cluster or similar, e.g., slurm or single_machine
  • --stats: creates runtime statistics
  • --maxLocalJobs: number of local jobs to run at the same time ("max_per_node")
  • --retryCount: number of retries for each failed pipeline job
  • --singularity: use Singularity instead of Docker (necessary if you want to use Singularity)
  • --user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)
  • --preserve-entire-environment: use system environment variables (only for manual installations and when running from within a Docker/Singularity image)
  • --no-container: do not use a Docker container (only for manual installations and when running from within a Docker/Singularity image)
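Putting several of these options together, a typical toil run on a single machine could look like the following (a sketch only; all paths are placeholders to adapt to your system):

```shell
$ toil-cwl-runner \
  --singularity \
  --batchSystem single_machine \
  --maxLocalJobs 10 \
  --retryCount 2 \
  --jobStore /data/myproject/jobstore-L628614 \
  --workDir /data/myproject/Work-L628614 \
  --outDir /data/myproject/Linc-L628614 \
  --tmpdir-prefix /data/myproject/Tmp-L628614/ \
  --logFile /data/myproject/linc-L628614.log \
  --writeLogs /data/myproject/Log-L628614 \
  ~/.local/share/linc/workflows/HBA_target.cwl \
  linc-L628614.json
```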

Stopping and restarting the pipeline

You can stop a pipeline run at any time by terminating the CWL process (typically by pressing CTRL-C in the terminal where you started it). If you use cwltool, the pipeline cannot be resumed from the stage where it was terminated; you will have to restart the full pipeline.

If you use toil, you can restart a pipeline by re-running the same command with which you started it and adding the parameter --restart. If you want to start from scratch instead, delete the jobStore directory and all intermediate data products (usually specified via the --workDir parameter); after that you will start with a clean run.
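For example, a crashed toil run can be resumed by repeating the original command and adding --restart (a sketch; the --jobStore path must be the same as in the original call):

```shell
$ toil-cwl-runner --restart --jobStore /data/myproject/jobstore-L628614 \
  ~/.local/share/linc/workflows/HBA_target.cwl linc-L628614.json
```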

Troubleshooting

With cwltool a pipeline crash is reported via this message:

WARNING Final process status is permanentFail

If you encounter such a permanent fail, it is highly recommended to pipe the output of the pipeline run into a logfile and to add --leave-tmpdir to the <cwl_options>:

$ cwltool --leave-tmpdir <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1

In order to figure out at which step the pipeline failed, you can search for the term permanentFail in the toil or cwltool logfile:

$ grep "permanentFail" logfile

WARNING [job find_skymodel_cal] completed permanentFail
WARNING [step find_skymodel_cal] completed permanentFail
INFO [workflow prep] completed permanentFail
WARNING [step prep] completed permanentFail
INFO [workflow linc] completed permanentFail
WARNING [step linc] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING [job check_ateam_separation] completed permanentFail
WARNING [step check_ateam_separation] completed permanentFail
WARNING Final process status is permanentFail

With that information it is possible to identify the first failed job/step, in this case find_skymodel_cal. To find the part of the logfile where the step was launched, search for [job find_skymodel_cal]. The corresponding logfiles of this job/step can be found in the <tmpdir> (specified with --tmpdir-prefix):

$ find <tmpdir> | grep find_skymodel_cal

<tmpdir>/n6zgif6j/find_skymodel_cal.log
<tmpdir>/n6zgif6j/find_skymodel_cal_err.log

$ cat <tmpdir>/n6zgif6j/find_skymodel_cal.log <tmpdir>/n6zgif6j/find_skymodel_cal_err.log

Traceback (most recent call last):
File "find_sky.py", line 27, in <module>
    output = find_skymodel(mss, skymodels, max_separation_arcmin=max_separation_arcmin)
File "/usr/local/bin/find_skymodel_cal.py", line 130, in main
    ra, dec = grab_pointing(input2strlist_nomapfile(ms_input)[0])
File "/usr/local/bin/find_skymodel_cal.py", line 26, in grab_pointing
    [ra, dec] = pt.table(MS+'::FIELD', readonly=True, ack=False).getcol('PHASE_DIR')[0][0] * 180 / math.pi
File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 372, in __init__
    Table.__init__(self, tabname, lockopt, opt)
RuntimeError: Table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS::FIELD does not exist

In this example the table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS seems to be missing. Since this example makes use of Docker, we need to find the location of this file on our hard disk:

$ grep "/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS" logfile

--mount=type=bind,source=/data/L667521_SB000_uv.MS,target=/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS,readonly \

$ ls -d /data/L667521_SB000_uv.MS

ls: cannot access '/data/L667521_SB000_uv.MS': No such file or directory

So obviously we have specified a non-existing data set as an input in the LINC.json.

In toil the main logfile is written to --logFile and logfiles from single jobs/steps are put into --writeLogs. If a job has failed, the corresponding logfile location is reported in the main logfile.

If there is no error message reported or no corresponding logfile available, check all lines starting with ERROR or error for additional information about the possible cause of the crash or diagnostic messages that tell you what exactly went wrong.
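For example, the following sketch writes a few sample lines into a logfile (modelled on typical pipeline output; the error message itself is made up) and then extracts every line that mentions an error, regardless of capitalization:

```shell
# Sample logfile standing in for a real pipeline run log:
printf '%s\n' \
  'INFO [workflow linc] started' \
  'WARNING [step prep] completed permanentFail' \
  'ERROR [job check_ateam_separation] failed' \
  'Severe error while reading the input data' > logfile
# Collect all error lines, case-insensitively:
grep -i "error" logfile
```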

To get help on new or already known issues, please check :ref:`help` for further support and information.