Starting a pipeline
Note
If you are running the deprecated genericpipeline version of the pipeline (prefactor 3.2 or older), please check the :doc:`old instructions page<running_old>`.
Once you have the data and the input JSON file ready, you can run the pipeline, e.g., with cwltool or toil for the HBA calibrator pipeline:
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json
$ toil-cwl-runner <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json
where prefactor.json is the input JSON file as described in the chapter :doc:`parset` and <install_dir> is the location of the prefactor CWL description files.
Note
Instead of specifying all options in prefactor.json, the user can also use command line options to override the defaults.
For standard LOFAR observations there are workflows available with pre-defined parameters (defaults) for HBA and LBA observations:
prefactor workflow | HBA | LBA |
prefactor_calibrator.cwl | HBA_calibrator.cwl | LBA_calibrator.cwl |
prefactor_target.cwl | HBA_target.cwl | LBA_target.cwl |
Note
The LBA workflows are not (yet) available.
Pipeline options for cwltool
The following <cwl_options> are recommended for running prefactor with cwltool:
- --outdir: specifies the location of the final pipeline output directory (results)
- --tmpdir-prefix: specifies the location of the intermediate data products
- --parallel: jobs will run in parallel
- --singularity: use Singularity instead of Docker
- --no-container: don't use Docker container (only for manual installation)
- --preserve-entire-environment: use system environment variables (only for manual installation)
- --debug: more verbose output (use only for debugging the pipeline)
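As a rough illustration, the recommended options above could be combined as shown below. This is a minimal sketch, not a prescribed setup: the helper build_cwltool_cmd and the paths in INSTALL_DIR, OUTDIR and TMPDIR_PREFIX are hypothetical names chosen for this example, and the choice of --parallel and --singularity assumes a container-based installation. The script only prints the command so that it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical locations (assumptions for this sketch); adjust to your setup.
INSTALL_DIR="${INSTALL_DIR:-/opt/prefactor}"
OUTDIR="${OUTDIR:-$PWD/results}"
# Used by cwltool as a path prefix, hence the trailing slash.
TMPDIR_PREFIX="${TMPDIR_PREFIX:-$PWD/tmp/}"

# Assemble the cwltool call from the recommended options.
build_cwltool_cmd() {
    echo "cwltool" \
        "--outdir" "$OUTDIR" \
        "--tmpdir-prefix" "$TMPDIR_PREFIX" \
        "--parallel" \
        "--singularity" \
        "$INSTALL_DIR/workflows/HBA_calibrator.cwl" \
        "prefactor.json"
}

# Print the command for inspection instead of executing it.
build_cwltool_cmd
```

For a manual installation, --singularity would be swapped for --no-container and --preserve-entire-environment, as described in the list above.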
Note
cwltool has no option to resume a failed/crashed run. If you need this option, have a look at toil.
While the pipeline runs, the terminal will output the current state of the pipeline. It is also possible to redirect this output into a runtime logfile:
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl prefactor.json > logfile 2>&1
All temporary folders and files are generated in the location specified via --tmpdir-prefix. At the end of the run those files can be deleted.
Pipeline options for toil
The following <cwl_options> are recommended for running prefactor with toil:
- --workDir: specifies the location of the intermediate data products
- --outDir: specifies the location of the final data products
- --jobStore: location of the jobStore ("statefile")
- --writeLogs: location of the pipeline job logfiles
- --logFile: location of the main pipeline logfile
- --logLevel: can be CRITICAL, ERROR, WARNING, INFO or DEBUG
- --batchSystem: use a specific batch system of an HPC cluster or similar, e.g. slurm or single_machine
- --stats: creates runtime statistics
- --maxLocalJobs: number of local jobs to be run at the same time ("max_per_node")
- --retryCount: number of retries for each failed pipeline job
- --singularity: use Singularity instead of Docker
- --preserve-entire-environment: use system environment variables (only for manual installation)
- --no-container: don't use Docker container (only for manual installation)
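A possible way of combining several of these options is sketched below. The helper build_toil_cmd, the directory layout under RUN_DIR, the install location and the chosen values (INFO, slurm, two retries) are assumptions made for this example only. As a dry run, the script prints the command rather than executing it.

```shell
#!/bin/sh
# Hypothetical locations (assumptions for this sketch); adjust to your setup.
INSTALL_DIR="${INSTALL_DIR:-/opt/prefactor}"
RUN_DIR="${RUN_DIR:-$PWD/run}"

# Assemble the toil-cwl-runner call from a subset of the listed options.
build_toil_cmd() {
    echo "toil-cwl-runner" \
        "--workDir" "$RUN_DIR/work" \
        "--jobStore" "$RUN_DIR/jobstore" \
        "--logFile" "$RUN_DIR/pipeline.log" \
        "--logLevel" "INFO" \
        "--batchSystem" "slurm" \
        "--retryCount" "2" \
        "--singularity" \
        "$INSTALL_DIR/workflows/HBA_calibrator.cwl" \
        "prefactor.json"
}

# Print the command for inspection instead of executing it.
build_toil_cmd
```

Keeping the work directory, the jobStore and the main logfile under one run directory is only a convenience of this sketch; it makes it easier to find and remove everything that belongs to one run.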
Stopping and restarting the pipeline
You can stop a pipeline run at any time by terminating the CWL process (typically by pressing Ctrl-C in the terminal where you started it).
Sometimes some of the processes that the pipeline started don't get properly terminated, so if the pipeline process doesn't terminate you should look for its child processes and terminate them too.
Note
If you stop and re-start pipelines a number of times, you should also check occasionally whether there are orphaned child processes that are eating up resources on your computer.
If you are using toil, you can restart a pipeline by running the same command with which you started it and adding the parameter --restart. If you want to start from scratch, you should delete the directory created via --jobStore and all intermediate data products (usually specified via the --workDir parameter). After that you will start with a clean run.
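The two cases can be sketched with a pair of small helpers. The function names restart_flag and clean_run are hypothetical, and the sketch assumes a file-based jobStore, i.e. a directory on the local file system.

```shell
#!/bin/sh
# Print "--restart" if the given jobStore directory exists, nothing otherwise.
restart_flag() {
    if [ -d "$1" ]; then
        echo "--restart"
    fi
}

# Remove the jobStore ($1) and the intermediate data products ($2)
# so that the next run starts from scratch. Both arguments are required.
clean_run() {
    rm -rf -- "${1:?jobStore path required}" "${2:?workDir path required}"
}

# Example usage (hypothetical paths):
#   toil-cwl-runner $(restart_flag run/jobstore) --jobStore run/jobstore ...
#   clean_run run/jobstore run/work
```

Note that clean_run deletes both directories without asking for confirmation, so the paths should be double-checked before it is called.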
Pipeline crashes
With cwltool, a pipeline crash is reported with this message:
WARNING Final process status is permanentFail
To figure out at which step the pipeline failed, you can search for the term permanentFail in the toil or cwltool logfile:
$ grep "permanentFail" logfile
WARNING [job compare_station_list] completed permanentFail
WARNING [step compare_station_list] completed permanentFail
INFO [workflow prep] completed permanentFail
WARNING [step prep] completed permanentFail
INFO [workflow prefactor] completed permanentFail
WARNING [step prefactor] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
With that information it is possible to identify the failed step, compare_station_list. To find the corresponding part of the logfile where the step was launched, search for [job compare_station_list].
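Extracting the step names can also be scripted. The following is a minimal sketch; the function name failed_steps is hypothetical, and it assumes that failed steps are logged in the form "[step <name>] completed permanentFail", as in the excerpt above.

```shell
#!/bin/sh
# List the names of all steps reported as permanentFail in a logfile.
# $1: path to the cwltool or toil logfile
failed_steps() {
    grep 'completed permanentFail' "$1" \
        | sed -n 's/.*\[step \([^]]*\)\].*/\1/p'
}

# Example usage:
#   failed_steps logfile
```

Applied to the log excerpt above, this prints compare_station_list, prep and prefactor, one per line. The first name is the innermost step that failed; the following ones are the enclosing workflows that failed as a consequence.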
It is usually best to also check all lines starting with ERROR to get additional information about the possible cause of the crash or diagnostic messages that tell you what exactly went wrong. See :ref:`help` for tips on interpreting the error messages.
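Both searches can be wrapped in small helpers, sketched here under stated assumptions. The function names error_lines and job_context are hypothetical, and error_lines assumes that the log level is the first word of each line, as in the cwltool excerpt above; logfiles with a timestamp or other prefix would need a different pattern. The -A option for printing trailing context is supported by common grep implementations such as GNU and BSD grep.

```shell
#!/bin/sh
# Print all lines starting with ERROR, prefixed with their line numbers.
# $1: path to the logfile
error_lines() {
    grep -n '^ERROR' "$1"
}

# Print the lines where a job is mentioned, each followed by some context.
# $1: path to the logfile, $2: job name, $3: number of context lines (default 20)
job_context() {
    grep -n -A "${3:-20}" -F "[job $2]" "$1"
}

# Example usage:
#   error_lines logfile
#   job_context logfile compare_station_list 40
```

The line numbers printed by both helpers make it easier to jump to the relevant part of a long logfile in an editor or pager.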
If you identify the problem and it does not affect the products that have been already produced, you can launch the pipeline again, after correcting the issue causing the process to stop. But in most cases it might be necessary to fully re-start the pipeline.