MERL-141: Fix undefined behaviour in Python API execute and execute_from_command_line

Issue description

A few weeks ago, I started using the Python API's execute_from_command_line in the SKA batch pre-processing pipeline (BPP), instead of subprocess calls. BPP generates DP3 commands in various contexts, one of them being to split the work along the frequency dimension using msin.startchan and msin.nchan; the work is divided in chunks and msin.nchan has the same value for all of them.

I ran into a mysterious issue a few days ago where DP3 would successfully parse all command line arguments except msin.nchan, which would be then set to the default of 0, which would then obviously lead to the work being incorrectly split in frequency. There were two symptoms:

  1. msin.nchan is silently set to 0
  2. This warning appears
*** WARNING: the following parset keywords were not used ***
             maybe they are misspelled
    - pe

The list of arguments passed to DP3 was obviously correct (they are logged by BPP), and there was no pe argument. But somehow DP3 received something different.

Minimum reproducible example

Reproducibility is always tricky with UB (see below), but this example below seems to be a reliable one.

import dp3

# Actual data paths do not matter, number of arguments does
arglist = [
    "checkparset=1",
    "steps=[]",
    "msin=/datasets/mkt_ecdfs25_nano.ms",
    "msin.startchan=1",
    "msin.nchan=2",
    "msout=test.ms",
    "msout.overwrite=true",
]

dp3.execute_from_command_line(arglist)

After adding code to debug print all command line args in dp3::base::ExecuteFromCommandLine this produces:

ARGUMENT LIST:
kt_ecdfs25_nano.ms
`Qd
msin=/datasets/mkt_ecdfs25_nano.ms
msin.startchan=1
msin.nchan=2
msout=test.ms
msout.overwrite=true
END ARGUMENT LIST

The first argument should be "DP3" and the second "checkparset=1", but we get some mangled output. The contents of the second argument changes between executions.

Cause

The problem is in how the list of argument strings received from Python is converted to a sequence of char *, which is what dp3::base::ExecuteFromCommandLine wants. The relevant function is in pydp3.cc. These added comments below explain what is going on.

// This is called in two places
// In both cases, an empty `argv_strings` is defined by the caller and passed in
std::vector<char *> pylist_to_char_array(
    py::list argv_list, std::vector<std::string> &argv_strings) {
  std::vector<char *> argv;

  for (auto &arg : argv_list) {
	// argv_strings may get dynamically resized and its members moved to a new memory location
    argv_strings.push_back(arg.cast<std::string>());
    
    // ... and if that happens, the raw pointers in already present in argv
    // may now refer to invalid memory locations!
    argv.push_back(argv_strings.back().data());
  }

  return argv;
}

Solution

Refactored so that DP3's main argument parsing code path now has interfaces that are tied to std::vector<std::string> . The code path is that is tied to (int argc, char* argv[]) still exists and is unchanged, but instead goes through the new path after handling the conversion char* argv[] to vector<string> safely. That should prevent any future re-occurrence of the problem.

Affected versions

The issue affects DP3 versions locally compiled and versions built in SKA's spack environment, which is where the issue was first detected.

The issue does NOT affect the dp3 Python packages on PyPI, which is the one that runs within BPP's Gitlab CI pipeline, and that explains why the issue was not caught much earlier; I suspect this is because the Python wheels are built with more portable compilation flags, and that somehow affects the dynamic reallocation behaviour of push_back.

Edited by Vincent Morello

Merge request reports

Loading