Starting a pipeline
Note
If you are running the deprecated genericpipeline version of the pipeline (prefactor 3.2 or older), please check the :doc:`old instructions page<running_old>`.
Once you have the data and the input JSON file ready, you can run the pipeline with cwltool or toil, e.g., for the HBA calibrator pipeline:

$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json

or

$ toil-cwl-runner <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json
where LINC.json is the input JSON file as described in the chapter :doc:`parset` and <install_dir> is the location of the LINC CWL description files.
Note

Instead of specifying all options in LINC.json, the user can also use command-line options to override the defaults.
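For example, with cwltool the workflow inputs described in the chapter :doc:`parset` can also be passed on the command line after the workflow file. A sketch only: the input names msin and refant are taken from that chapter, the paths are placeholders, and whether an input can be set this way depends on its type:

$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl --msin /data/L667521_SB000_uv.MS --refant 'CS001HBA0'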
By default, LINC will execute the processing steps (like DP3, etc.) inside a Docker container. If you prefer to use Singularity instead, the option --singularity can be added to the cwltool command line (see options below).
Note

Do not run your cwltool or toil calls inside the Docker or Singularity container unless this is exactly what you intend to do (see next section).
The following table lists the workflows to call in the command above for standard LOFAR observations. They provide the proper pipeline with pre-defined parameters (defaults) for HBA and LBA observations:
LINC workflow       | HBA                | LBA
--------------------|--------------------|-------------------
LINC_calibrator.cwl | HBA_calibrator.cwl | LBA_calibrator.cwl
LINC_target.cwl     | HBA_target.cwl     | LBA_target.cwl
Note
The LBA target workflow is not (yet) available.
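For example, to run the LBA calibrator pipeline instead, only the workflow file in the call from above changes:

$ cwltool <cwl_options> <install_dir>/workflows/LBA_calibrator.cwl LINC.json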
If you have installed cwltool or toil locally on your system, LINC will automatically pull the right (u)Docker/Singularity image for you.
Running LINC from within a (u)Docker/Singularity image
If you do not want to install cwltool or toil locally on your system, you need to pull the software images first (otherwise this step is not necessary):
For Docker:

$ docker pull astronrd/linc

for uDocker:

$ udocker pull astronrd/linc

and for Singularity:

$ singularity pull docker://astronrd/linc
To run LINC you only need to add the container-specific execution command and make sure that all necessary volumes (<mount_points>) are mounted read-write and are thus accessible from inside the container, e.g.:
$ singularity exec --bind <mount_points>:<mount_points> <linc.sif> cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json
where <linc.sif> is the location of the Singularity image, or
$ docker run --rm <docker_options> -v <mount_points>:<mount_points> -w $PWD astronrd/linc cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json
Since you are running LINC inside a container, do not forget to add the --no-container flag to your call, no matter whether you use Singularity or Docker. Do not use the --singularity flag.
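For uDocker, an analogous call might look like this (a sketch only, not verified; it assumes the same mount points and the image pulled above):

$ udocker run -v <mount_points>:<mount_points> astronrd/linc cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json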
Pipeline options for cwltool
The following <cwl_options> are recommended for running LINC with cwltool. Please check carefully which options to choose depending on how you run LINC:
- --outdir: specifies the location of the final pipeline output directory (results)
- --tmpdir-prefix: specifies the location of the intermediate data products (should provide enough fast disk space; avoid using /tmp)
- --leave-tmpdir: do not delete intermediate data products (use this if you need debugging)
- --parallel: jobs will run in parallel (highly recommended to achieve decent processing speed)
- --singularity: use Singularity instead of Docker (necessary if you want to use Singularity)
- --user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)
- --no-container: do not use a Docker container (only for manual installation and in case you are running from within a Docker/Singularity image)
- --preserve-entire-environment: use system environment variables (only for manual installation and in case you are running from within a Docker/Singularity image)
- --debug: more verbose output (use only for debugging the pipeline)
While the pipeline runs, the terminal will output the current state of the pipeline. For debugging it is recommended to run cwltool inside a screen session or to pipe the output into a runtime logfile:
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1
A fairly typical run that uses Singularity can look similar to this:
$ cwltool \
--singularity \
--outdir "/data/myproject/Linc-L628614" \
--log-dir "/data/myproject/Log-L628614" \
--tmpdir-prefix "/data/myproject/Tmp-L628614/" \
~/.local/share/linc/workflows/HBA_target.cwl \
linc-L628614.json
In the specified --tmpdir-prefix all temporary folders and files are generated. At the end of the run those files can be deleted.
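For instance, after the example run above has finished successfully and no debugging output is needed anymore, the temporary files could be removed with:

$ rm -rf "/data/myproject/Tmp-L628614/"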
Note

cwltool has no option to resume a failed/crashed run. If you need this option, have a look at toil.
Pipeline options for toil
The following <cwl_options> are recommended for running LINC with toil (see the example call after this list):
- --workDir: specifies the location of toil-specific intermediate data products
- --tmpdir-prefix: specifies the location of the intermediate data products (should provide enough fast disk space; avoid using /tmp)
- --outdir: specifies the location of the final data products
- --leave-tmpdir: do not delete intermediate data products (use this if you need debugging)
- --jobStore: location of the jobStore ("statefile")
- --writeLogs: location of the pipeline job logfiles
- --logFile: location of the main pipeline logfile
- --logLevel: can be CRITICAL, ERROR, WARNING, INFO or DEBUG
- --batchSystem: use a specific batch system of an HPC cluster or similar, e.g. slurm or single_machine
- --stats: creates runtime statistics
- --maxLocalJobs: number of local jobs to run at the same time ("max_per_node")
- --retryCount: number of retries for each failed pipeline job
- --singularity: use Singularity instead of Docker (necessary if you want to use Singularity)
- --user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)
- --preserve-entire-environment: use system environment variables (only for manual installation and in case you are running from within a Docker/Singularity image)
- --no-container: do not use a Docker container (only for manual installation and in case you are running from within a Docker/Singularity image)
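By analogy with the cwltool example above, a typical toil run using Singularity might look similar to this (a sketch only; all paths and the jobStore location are placeholders you need to adapt):

$ toil-cwl-runner \
--singularity \
--batchSystem single_machine \
--jobStore "/data/myproject/JobStore-L628614" \
--workDir "/data/myproject/Work-L628614" \
--tmpdir-prefix "/data/myproject/Tmp-L628614/" \
--outdir "/data/myproject/Linc-L628614" \
--logFile "/data/myproject/linc-L628614.log" \
--writeLogs "/data/myproject/Log-L628614" \
--retryCount 2 \
~/.local/share/linc/workflows/HBA_target.cwl \
linc-L628614.json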
Stopping and restarting the pipeline
You can stop a pipeline run at any time by terminating the CWL process (typically by pressing CTRL-C in the terminal where you started it).
If you use cwltool, the pipeline cannot be resumed from the stage where it was terminated. You will have to restart the full pipeline.
If you use toil, you can resume a pipeline by running the same command with which you started it and adding the parameter --restart on the terminal. If you want to start from scratch instead, delete the directory created via --jobStore and all intermediate data products (usually specified via the --workDir parameter). After that you will start with a clean run.
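Continuing the toil example from above, a resumed run could look like this (a sketch; all other options stay identical to the original call):

$ toil-cwl-runner --restart --jobStore "/data/myproject/JobStore-L628614" <other_cwl_options> ~/.local/share/linc/workflows/HBA_target.cwl linc-L628614.json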
Troubleshooting
With cwltool a pipeline crash is reported via this message:
WARNING Final process status is permanentFail
If you encounter such a permanent fail, it is highly recommended to pipe the output of the pipeline run into a logfile and to add --leave-tmpdir to the <cwl_options>:
$ cwltool --leave-tmpdir <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1
In order to figure out at which step the pipeline failed, you can search for the term permanentFail in the toil or cwltool logfile:
$ more logfile | grep "permanentFail"
WARNING [job find_skymodel_cal] completed permanentFail
WARNING [step find_skymodel_cal] completed permanentFail
INFO [workflow prep] completed permanentFail
WARNING [step prep] completed permanentFail
INFO [workflow linc] completed permanentFail
WARNING [step linc] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING [job check_ateam_separation] completed permanentFail
WARNING [step check_ateam_separation] completed permanentFail
WARNING Final process status is permanentFail
With that information it is possible to identify the first failed job/step to be find_skymodel_cal. To find the corresponding part of the logfile where the step was launched, search for [job find_skymodel_cal].
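This can be done with, e.g., grep (the -n flag prints the matching line numbers):

$ grep -n "\[job find_skymodel_cal\]" logfile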
The corresponding logfiles of this job/step can be found in the <tmpdir> (specified with --tmpdir-prefix):
$ find <tmpdir> | grep find_skymodel_cal
<tmpdir>/n6zgif6j/find_skymodel_cal.log
<tmpdir>/n6zgif6j/find_skymodel_cal_err.log
$ cat <tmpdir>/n6zgif6j/find_skymodel_cal.log <tmpdir>/n6zgif6j/find_skymodel_cal_err.log
Traceback (most recent call last):
File "find_sky.py", line 27, in <module>
output = find_skymodel(mss, skymodels, max_separation_arcmin=max_separation_arcmin)
File "/usr/local/bin/find_skymodel_cal.py", line 130, in main
ra, dec = grab_pointing(input2strlist_nomapfile(ms_input)[0])
File "/usr/local/bin/find_skymodel_cal.py", line 26, in grab_pointing
[ra, dec] = pt.table(MS+'::FIELD', readonly=True, ack=False).getcol('PHASE_DIR')[0][0] * 180 / math.pi
File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 372, in __init__
Table.__init__(self, tabname, lockopt, opt)
RuntimeError: Table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS::FIELD does not exist
In this example the table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS seems to be missing. Since in this example we make use of Docker, we need to find the location of this file on our hard disk:
$ more logfile | grep "/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS"
--mount=type=bind,source=/data/L667521_SB000_uv.MS,target=/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS,readonly \
$ ls -d /data/L667521_SB000_uv.MS
ls: cannot access '/data/L667521_SB000_uv.MS': No such file or directory
So obviously we have specified a non-existing data set as an input in the LINC.json file.
In toil the main logfile is written to --logFile and logfiles from single jobs/steps are put into --writeLogs. If a job has failed, the corresponding logfile location is reported in the main logfile.
If there is no error message reported or no corresponding logfile is available, check for all lines starting with ERROR or error to get additional information about the possible cause of the crash or diagnostic messages that tell you what exactly went wrong.
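A simple case-insensitive search covers both spellings:

$ grep -i "^error" logfile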
To get help on new or already known issues, please check :ref:`help` for further support and information.