From f74f5528b12184c67e40a7a289d79a9b82ac7d46 Mon Sep 17 00:00:00 2001
From: Bram Veenboer <bram.veenboer@gmail.com>
Date: Mon, 26 Apr 2021 09:43:21 +0000
Subject: [PATCH] COB-121: Update PerformanceTest.md for tSubbandProcPerformance

---
 .../doc/performanceTest/PerformanceTest.md | 83 ++++---------------
 1 file changed, 18 insertions(+), 65 deletions(-)

diff --git a/RTCP/Cobalt/GPUProc/doc/performanceTest/PerformanceTest.md b/RTCP/Cobalt/GPUProc/doc/performanceTest/PerformanceTest.md
index 170927b9041..7be0a0e8695 100644
--- a/RTCP/Cobalt/GPUProc/doc/performanceTest/PerformanceTest.md
+++ b/RTCP/Cobalt/GPUProc/doc/performanceTest/PerformanceTest.md
@@ -1,73 +1,26 @@
 # Goal
-We run a set of representative benchmarks to test performance regression of the kernels for the different pipelines. As we only aim to test a representative parameter space for the kernels it seems best to run a set of pipeline tests with representative use cases for LOFAR and LOFAR Mega Mode. Tests are specified through the LOFAR parsets and are passed to the gpu_load test that sets up a relative simple pipeline with performance benchmarking enabled.
+We run a set of representative benchmarks to test for performance regression of the kernels in the different pipelines. Since we only aim to test a representative parameter space for the kernels, it seems best to run a set of pipeline tests with representative use cases for LOFAR and LOFAR Mega Mode. Tests are specified through LOFAR parsets and are passed to the `gpu_load` test, which sets up a relatively simple pipeline with performance benchmarking enabled.
 
-# Run benchmarks
+# The `tSubbandProcPerformance` test
+Start the performance test using the `ctest` framework:
 ```
-INPUT: (required) location of gpu_load binary,
-       (required) location of source parsets,
-       (required) root directory for results,
-       (optional) specify [int] number of iterations for gpu_load, default is 1000 iterations
-       (optional) enable storing of parsets,
-       (optional) select specific parset(s)
-       (optional) disable processing (e.g. dry run)
-
-Take as test suites a list of: <parset-name><observation ID>
-
-FOR EACH test suite P in a list of test suites (P1 through Pn):
-    Take the base parset
-    SET Cobalt.Benchmark.file=<benchmark file path>/<P.name>.csv
-    IF storing of parsets is enabled store modified parset to <path>/<P.name>.parset
-    IF processing is enabled
-        blocking call gpu_load with parset as argument, results are written (by gpu_load) to <benchmark file path>/<P.name>.csv
-    ENDIF
-END
-
-Optionally later add test cases (variation of parset parameters)
-FOR EACH test suite P in a list of test suites (P1 through Pn):
-    Take its base parset (observation ID) or a minimal version of that (TBD)
-    FOR EACH test case C in the test suite
-        Modify the parset with use case specific parameters
-        SET Cobalt.Benchmark.file=<benchmark file path>/<P.name><C.name>.csv
-        IF storing of parsets is enabled store modified parset to <path>/<P.name><C.name>.parset
-        IF processing is enabled
-            blocking call gpu_load with parset as argument, results are written (by gpu_load) to <benchmark file path>/<P.name><C.name>.csv
-        ENDIF
-    END
-END
+ctest -V -R tSubbandProcPerformance
 ```
+This will run all the `tSubbandProcPerformance_*.parset` files in `test/SubbandProc` and compare the results with the reference output in `tSubbandProcPerformance_reference`. Note that these parsets do not yet contain the `Cobalt.Benchmark` keys; they are filled in during the test.
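+
+For illustration, the CSV output contains `PerformanceCounter` statistics per kernel, in the format shown below. This example is taken from the previous revision of this document; the kernel name and numbers are illustrative only and will differ per parset and GPU:
+```
+format; kernelName; count; mean; stDev; min; max; unit
+PerformanceCounter; output (incoherent); 10; 0.28483; 0.01499; 0.24310; 0.29318; ms
+```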
 
-# Analyze benchmarks
-```
-INPUT: (required) root directory with benchmark results
+The format of the reference output is `<OBSID>_<GPUNAME>.csv`. The GPU name
+should match the name reported by `nvidia-smi`, with spaces replaced by dashes, e.g. "Tesla V100-PCIE-16GB" becomes "Tesla-V100-PCIE-16GB". If no reference output is found (for instance, when running on a system with different GPUs), the test is skipped.
 
-FOR EACH file in <benchmark file path> (list of files <P.name>.csv)
-    Read the file
-    FOR EACH KernelName in the file
-        Filter for "PerformanceCounter" and list the performance statistics
-        <File name (test ID) > ; <kernel name> ; <statistics>
-        Construct kernelIndex from test ID and kernel name
-        Write in a separate file: <kernelIndex> ; <mean>
-    END
-END
-```
-Example of input that we are filtering for:
-```
-format; kernelName; count; mean; stDev; min; max; unit
-PerformanceCounter; output (incoherent); 10; 0.28483; 0.01499; 0.24310; 0.29318; ms
-```
-# Compare analysis
-```
-INPUT: (required) file with reference mean values (output format from analysis script)
-       (required) file with current mean values (output format from analysis script)
+One can add new reference files, or update existing ones, as follows:
+1. Get a parset and make sure that the `OBSID` is either unique (for new tests), or matches the `OBSID` of an existing `tSubbandProcPerformance_*.parset` (when updating an existing reference output).
+2. Add the `Cobalt.Benchmark` keys to enable dumping of the performance counter data to a CSV file.
+3. Run `gpu_load` with the new parset.
+4. Rename the resulting CSV file as described above and copy it to the `tSubbandProcPerformance_reference` directory.
 
-Construct a dict with kernelIndex and mean values for both files
-FOR EACH key in the reference dict
-    IF key also exists in the current dict
-        Calculate diff and write output as <kernelIndex> ; <referece mean> ; <current mean> ; <difference of mean values>
-    END
-END
-```
+The `tSubbandProcPerformance_compare.py` script compares two CSV files and has a configurable tolerance: if the tolerance is, for example, 10%, a measurement is considered a PASS if it is no more than 10% slower than the reference. Measurements that run faster than the reference are also considered a PASS; this can happen, for instance, when a kernel has been optimized.
+
+It is advised to add new reference files separately from any code changes (that may have caused performance differences). This way, one cannot accidentally introduce a performance regression.
 
-Optionally extend with:
-* Calculation of % real-time
-* Compare with performance (golden) reference file, pass fail output to top script for CI pipeline
\ No newline at end of file
+This test has two caveats:
+1. Some timings are unreliable, as the kernels run too briefly to provide a meaningful measurement. Therefore, any timing of less than 5% of the total runtime is ignored. This is currently a hard-coded value in the comparison script.
+2. When using an LMM parset, performance counters with the same names are likely to co-exist. This case is not yet taken into account in the benchmark.
\ No newline at end of file
--
GitLab
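
Below is a minimal Python sketch of the pass/fail rule described in the updated document. It is not the actual `tSubbandProcPerformance_compare.py` implementation; the function name, parameter names, and the choice of applying the 5% cutoff to the reference timing are assumptions made for illustration only.
```
# Minimal sketch only -- not the actual tSubbandProcPerformance_compare.py.
# Names and the exact interpretation of the 5% cutoff are assumptions.

def measurement_passes(current_ms, reference_ms, total_runtime_ms,
                       tolerance=0.10, min_fraction=0.05):
    """Apply the comparison rule to a single kernel timing."""
    # Timings below `min_fraction` of the total runtime are considered
    # unreliable and ignored (treated as PASS). Whether the cutoff applies
    # to the reference or the current timing is an assumption here.
    if reference_ms < min_fraction * total_runtime_ms:
        return True
    # PASS when at most `tolerance` slower than the reference; running
    # faster than the reference always passes.
    return current_ms <= reference_ms * (1.0 + tolerance)

# Example: 8% slower than the reference passes with the default 10% tolerance.
assert measurement_passes(current_ms=1.08, reference_ms=1.00, total_runtime_ms=10.0)
```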