Solver threading tweaks
Merge request !1348 (merged) changed the threading approach of DDECal's iterative solvers. While this improved some use cases, others were significantly slowed down, as reported by e.g. Francesco and Reinout. The performance difference depends strongly on how much data is in one "solve": Francesco's solve covered 8 timesteps and takes less than a second, whereas the optimizations were made for a long, slow solve with many (30-50) directions, which takes several minutes. In the latter case the overhead of thread synchronization is insignificant, whereas in Francesco's case it is a considerable factor.
This MR makes the approach conditional: if there are enough channel blocks available to keep all CPUs busy, no nested parallelization is done.
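As a rough illustration of the condition, the decision could be sketched as below. This is not the code in this MR: `ThreadPlan` and `PlanThreads` are hypothetical names, and the way threads are divided when there are too few channel blocks is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type, not part of DP3 or aocommon.
struct ThreadPlan {
  std::size_t outer_threads;  // threads that loop over channel blocks
  std::size_t inner_threads;  // threads used inside one channel block solve
};

// Sketch of the conditional approach: only nest when the channel blocks
// alone cannot keep all threads busy.
inline ThreadPlan PlanThreads(std::size_t n_channel_blocks,
                              std::size_t n_threads) {
  assert(n_threads > 0);
  if (n_channel_blocks >= n_threads) {
    // Enough channel blocks to keep every CPU busy: no nested parallelization.
    return {n_threads, 1};
  }
  // Assumption for this sketch: with too few channel blocks, the available
  // threads are divided evenly over the blocks (integer division).
  const std::size_t outer = std::max<std::size_t>(n_channel_blocks, 1);
  return {outer, n_threads / outer};
}
```

For example, with 8 channel blocks and 4 threads the sketch uses no inner threads beyond one, whereas with 2 channel blocks and 8 threads each block would get 4 inner threads.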
Depends on https://gitlab.com/aroffringa/aocommon/-/merge_requests/182, which should be merged first.