LINC failing on clocktec step for calibrator
I am running LINC with toil-cwl-runner in batch jobs, and I have done this for many calibrators on two different clusters. About 25% of the time, the pipeline fails at the losoto_ion.losoto_clocktec.losoto_clocktec step. At first I thought this might be a memory issue, but the error logs (e.g., /cosma5/data/durham/dc-mora2/surveys/processing/576823/tmp0yrd197m/output.h5-losoto_err.log) show the following (this is one example, but multiple files contain the same error):
source: /.inject-singularity-env.sh:21:31: a command can only contain words and redirects; encountered (
Traceback (most recent call last):
  File "/usr/local/bin/losoto", line 154, in <module>
    returncode += ops[ op ]._run_parser( soltab, parser, step )
  File "/usr/local/lib/python3.8/dist-packages/losoto/operations/clocktec.py", line 28, in _run_parser
    return run(soltab, tecsoltabOut, clocksoltabOut, offsetsoltabOut, tec3rdsoltabOut, flagBadChannels, flagCut, chi2cut, combinePol, removePhaseWraps, fit3rdorder, circular, reverse, invertOffset, nproc)
  File "/usr/local/lib/python3.8/dist-packages/losoto/operations/clocktec.py", line 109, in run
    result=doFit(vals,flags==0,freqs,stations,station_positions,axes,\
  File "/usr/local/lib/python3.8/dist-packages/losoto/operations/_fitClockTEC.py", line 784, in doFit
    if not 'LBA' in stations[0] and len(initSol) < 1:
IndexError: index 0 is out of bounds for axis 0 with size 0
The corresponding output log shows:
2023-05-17 02:41:03 - DEBUG - Loading ABS module.
2023-05-17 02:41:03 - DEBUG - Loading BANDPASSTEC module.
2023-05-17 02:41:03 - DEBUG - Loading CLIP module.
2023-05-17 02:41:03 - DEBUG - Loading CLOCKTEC module.
2023-05-17 02:41:03 - DEBUG - Loading DELETEAXIS module.
2023-05-17 02:41:03 - DEBUG - Loading STATIONSCREEN module.
2023-05-17 02:41:03 - DEBUG - Loading DIRECTIONSCREEN module.
2023-05-17 02:41:03 - DEBUG - Loading DUPLICATE module.
2023-05-17 02:41:03 - DEBUG - Loading EXAMPLE module.
2023-05-17 02:41:03 - DEBUG - Loading FARADAY module.
2023-05-17 02:41:03 - DEBUG - Loading FLAG module.
2023-05-17 02:41:03 - DEBUG - Loading FLAGEXTEND module.
2023-05-17 02:41:03 - DEBUG - Loading FLAGSTATION module.
2023-05-17 02:41:03 - DEBUG - Loading GLOBALDELAY module.
2023-05-17 02:41:03 - DEBUG - Loading INTERPOLATE module.
2023-05-17 02:41:03 - DEBUG - Loading INTERPOLATEDIRECTIONS module.
2023-05-17 02:41:03 - DEBUG - Loading LOFARBEAM module.
2023-05-17 02:41:03 - DEBUG - Loading NORM module.
2023-05-17 02:41:03 - DEBUG - Loading PLOT module.
2023-05-17 02:41:03 - DEBUG - Loading PLOTSCREEN module.
2023-05-17 02:41:03 - DEBUG - Loading POLALIGN module.
2023-05-17 02:41:03 - DEBUG - Loading PREFACTOR_XYOFFSET module.
2023-05-17 02:41:03 - DEBUG - Loading PREFACTOR_BANDPASS module.
2023-05-17 02:41:03 - DEBUG - Loading REFERENCE module.
2023-05-17 02:41:03 - DEBUG - Loading REPLICATEONAXIS module.
2023-05-17 02:41:03 - DEBUG - Loading RESET module.
2023-05-17 02:41:03 - DEBUG - Loading RESIDUALS module.
2023-05-17 02:41:03 - DEBUG - Loading REWEIGHT module.
2023-05-17 02:41:03 - DEBUG - Loading SCREENVALUES module.
2023-05-17 02:41:03 - DEBUG - Loading SMOOTH module.
2023-05-17 02:41:03 - DEBUG - Loading SPLITLEAK module.
2023-05-17 02:41:03 - DEBUG - Loading STRUCTURE module.
2023-05-17 02:41:03 - DEBUG - Loading TEC module.
2023-05-17 02:41:03 - DEBUG - Loading TECJUMP module.
2023-05-17 02:41:03 - DEBUG - Loading TECSINGLEFREQ module.
2023-05-17 02:41:03 - DEBUG - Appending to output.h5.
2023-05-17 02:41:03 - INFO - --> Starting 'CLOCKTEC' step (operation: CLOCKTEC).
2023-05-17 02:41:03 - DEBUG - No lofar.expion present, clock/tec separation maybe not optimal
2023-05-17 02:41:03 - INFO - Clock/TEC separation on soltab: phase000
2023-05-17 02:41:04 - DEBUG - 0 selected stations: []
2023-05-17 02:41:04 - DEBUG - Station indices: [False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False] RS []
Closing remaining open files:
output.h5...
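So this does not seem to be a memory issue. The CLOCKTEC step selects 0 stations ("0 selected stations: []"), which would explain why doFit then fails on stations[0]. Any ideas on what might be going on, or how to troubleshoot it?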
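For reference, indexing an empty NumPy array (np.array([])[0]) raises exactly "IndexError: index 0 is out of bounds for axis 0 with size 0", so the traceback is consistent with stations being empty. One check that might help narrow down why no stations are selected is to look at how much of the phase000 soltab in output.h5 is flagged per station. Below is a minimal sketch, not a confirmed diagnosis: it assumes that flagged samples have weight 0 (LoSoTo's convention), that the station axis is called 'ant', and that the solset is called 'sol000' (an assumption; adjust it to whatever output.h5 actually contains). The helper names station_flag_summary, fully_flagged and load_weights are hypothetical. If every station comes back fully flagged, that would be consistent with (though not proof of) the empty station selection in the log.

```python
import numpy as np


def station_flag_summary(weights, ant_axis, ant_names):
    """Fraction of flagged samples (weight == 0) per station.

    weights:   array of solution weights; one of its axes is the station axis
    ant_axis:  index of the station axis in `weights`
    ant_names: station names, in the same order as that axis
    """
    w = np.moveaxis(np.asarray(weights), ant_axis, 0)  # put stations first
    flagged = (w == 0).reshape(w.shape[0], -1).mean(axis=1)
    summary = {}
    for name, frac in zip(ant_names, flagged):
        # Station names may come back as bytes depending on the h5parm/LoSoTo version
        key = name.decode() if isinstance(name, bytes) else str(name)
        summary[key] = float(frac)
    return summary


def fully_flagged(summary):
    """Stations with no unflagged samples at all."""
    return [name for name, frac in summary.items() if frac == 1.0]


def load_weights(h5_path, solset="sol000", soltab="phase000"):
    """Read weights and station names from an h5parm using LoSoTo.

    Assumption: the solset name 'sol000' is a guess; check it against output.h5.
    """
    # Imported here so the two helpers above can be used without LoSoTo installed
    from losoto.h5parm import h5parm

    h5 = h5parm(h5_path, readonly=True)
    try:
        st = h5.getSolset(solset).getSoltab(soltab)
        weights, _axes_vals = st.getValues(weight=True)  # weights plus axis values
        ant_axis = st.getAxesNames().index("ant")
        ant_names = st.getAxisValues("ant")
    finally:
        h5.close()
    return weights, ant_axis, ant_names
```

Usage, on a copy of a failing output.h5 (run inside the same LINC container so LoSoTo is available): weights, ax, names = load_weights("output.h5"), then print(fully_flagged(station_flag_summary(weights, ax, names))). Keeping the NumPy logic separate from the file reading makes it easy to compare a failing run against a successful one.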
cheers Leah