H5ParmPredict step sometimes freezes
Created by: flomertens
I use H5ParmPredict to subtract a source model in several directions. The parset typically looks like this:
steps=[sub]
sub.type=h5parmpredict
sub.sourcedb=
sub.applycal.correction=fulljones
sub.applycal.parmdb=
sub.directions=
sub.operation=subtract
I work with many small 10-minute snapshot MSs, so I need to make many DPPP calls, and sometimes several of them run simultaneously on the same node (always on different MSs, of course). Occasionally one of these calls gets stuck and I have to kill it manually.
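Until the cause is found, one way to avoid killing stuck runs by hand is to give each DPPP call a wall-clock limit. The Python sketch below is a hypothetical helper, not part of DP3. It runs several `DPPP <parset>` calls in parallel, kills any call that exceeds `timeout_s` seconds, and reports it as `"timeout"`. The names `run_one`/`run_all`, the one-parset-per-MS layout and the 3600 s limit are assumptions. Pick a limit well above the normal runtime of one snapshot.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd, timeout_s):
    """Run one command and kill it if it runs longer than timeout_s seconds.

    Returns "ok", "failed" or "timeout".
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def run_all(cmds, timeout_s, max_parallel=4):
    """Run the commands concurrently, at most max_parallel at a time.

    Statuses are returned in the same order as cmds.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run_one, cmd, timeout_s) for cmd in cmds]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Hypothetical usage: one parset file per snapshot MS, passed on the command line.
    parsets = sys.argv[1:]
    statuses = run_all([["DPPP", p] for p in parsets], timeout_s=3600)
    for parset, status in zip(parsets, statuses):
        print(parset, status)
```

This only bounds the damage (the MS of a killed run still has to be reprocessed); it does not fix the hang itself.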
This is the gdb backtrace of one of these stuck processes:
(gdb) info threads
  Id   Target Id                                  Frame
* 1    Thread 0x7fe51115f480 (LWP 49769) "DPPP"  0x00007fe5062e5705 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0

(gdb) bt
#0  0x00007fe5062e5705 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007fe5060849ac in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib64/libstdc++.so.6
#2  0x0000000000542abd in DP3::ThreadPool::write(unsigned long, std::function<void (unsigned long)>&&, unsigned long*) ()
#3  0x00000000005402df in DP3::DPPP::Predict::process(DP3::DPPP::DPBuffer const&) ()
#4  0x0000000000540765 in DP3::DPPP::Predict::process(DP3::DPPP::DPBuffer const&) ()
#5  0x000000000055df98 in DP3::DPPP::H5ParmPredict::process(DP3::DPPP::DPBuffer const&) ()
#6  0x000000000049b89f in DP3::DPPP::MSReader::process(DP3::DPPP::DPBuffer const&) ()
#7  0x000000000046a076 in DP3::DPPP::DPRun::execute(std::string const&, int, char**) ()
#8  0x000000000045b502 in main ()
I don't have much more information right now, but if you need anything in particular, I can try to capture it the next time this happens.
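When it happens again, a backtrace of every thread (not only the current one) would probably be the most useful thing to attach. A minimal sketch, assuming gdb is installed and you are allowed to attach to the process (ptrace permissions): the hypothetical helper below runs gdb in batch mode against a PID, prints `info threads` and `thread apply all bt full`, and saves the output to a file. The helper names and the output file naming are assumptions.

```python
import subprocess
import sys


def gdb_backtrace_cmd(pid):
    """Build a gdb command that attaches to pid, dumps all thread stacks, then quits.

    On quit, gdb detaches from an attached process instead of killing it.
    """
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "info threads",
        "-ex", "thread apply all bt full",
    ]


def capture_backtrace(pid, out_path):
    """Write gdb's output (stdout and stderr) for a presumably hung process to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_cmd(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)


if __name__ == "__main__":
    # Usage: python capture_bt.py <pid> [<pid> ...]
    for pid in sys.argv[1:]:
        capture_backtrace(int(pid), f"dppp_{pid}_backtrace.txt")
```

For example, `python capture_bt.py 49769` for the process above. The frames in the backtrace have no source file or line information, so a DP3 build with debug symbols would likely make such a capture more informative.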