nabajour edited this page Jul 10, 2020 · 5 revisions

debugging tools

Overview

Debug tools are configured from src/headers/debug.h by defining preprocessor macros. They then run a set of tests at check points placed throughout the code.

The check points sit in the main loop and in the Thor and ProfX functions, checking values between the different steps of the simulation. Because the update consists of nested loops (big steps containing small steps), each check point is tagged with the relevant loop iteration counters, passed as its first arguments.

Check points look like this:

BENCH_POINT_I_S(current_step, rk, "Compute_Temperature_H_Pt_Geff", (), ("temperature_d", "h_d", "hh_d", "pt_d", "pth_d", "gtil_d", "gtilh_d"))

The arguments are:

  • 1, 2 or 3 integer arguments (depending on whether you use BENCH_POINT_I, BENCH_POINT_I_S or BENCH_POINT_I_SS) describing the level in the update loop (first iteration, sub-iteration and sub-sub-iteration)
  • A string describing the step in the update loop (e.g. "RK2": second Runge-Kutta step, "Vertical_Eq": vertical equilibrium computation)
  • A vector of strings naming the arrays that are inputs to the next step of the simulation (currently unused, but they can be stored to compare inputs against outputs)
  • A vector of strings naming the arrays that are outputs of the previous step of the simulation (before the call to the debug function)

It then runs various checks depending on the other debug flags enabled.
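The compile-time gating described above can be sketched with a macro that expands to a check when BENCHMARKING is defined and to nothing otherwise. This is an illustrative mock-up, not THOR's actual implementation: check_point_summary and its output format are invented for the example.

```cpp
#include <string>
#include <vector>

// Toggle the checks at compile time, mirroring src/headers/debug.h.
#define BENCHMARKING

// Hypothetical stand-in for the real check: it only builds a one-line
// summary of the check point and the output arrays it would inspect.
std::string check_point_summary(const std::string&              iteration,
                                const std::string&              name,
                                const std::vector<std::string>& outputs) {
    std::string s = "[" + iteration + "] " + name + ":";
    for (const auto& o : outputs)
        s += " " + o;
    return s;
}

#ifdef BENCHMARKING
// The loop counters become the "[step/rk]" prefix of the summary.
#    define BENCH_POINT_I_S(it, sub, name, in, out) \
        check_point_summary(std::to_string(it) + "/" + std::to_string(sub), name, out)
#else
// With benchmarking disabled, the check point compiles away entirely.
#    define BENCH_POINT_I_S(it, sub, name, in, out) std::string()
#endif
```

The parentheses around the input and output lists in the real calls keep the commas inside them from being split into separate macro arguments.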

binary comparison

The debug tools can dump the intermediate state of the simulation's arrays at each timestep, and then, on subsequent runs, compare the computed values to the ones in the stored files. This helps check consistency when making code changes that shouldn't affect the computation.

To use the binary comparison file:

  • enable the debug tools: in src/headers/debug.h, uncomment #define BENCHMARKING

Dump the reference files:

  • compile
  • run THOR with the --binwrite option; this saves the reference files
  • the intermediate states are written to the ref subdirectory of the output directory
  • the regular output files now go to the write subdirectory of the output directory

Then, apply the code changes you want to test, and enable dump comparison:

  • compile
  • run THOR with the --bincompare option
  • the intermediate states are compared against the reference files in the ref subdirectory of the output directory
  • the regular output files now go to the compare subdirectory of the output directory

This prints data whenever a stored value isn't exactly equal to the computed value: the bench-point name, the iteration numbers, the short name of each array, and 1 if the array matched, 0 if it didn't, or NA if the array wasn't found in the input file.
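The 1/0/NA statuses can be sketched as a small comparison helper. This is an illustrative sketch, not the actual code in binary_test.cpp; here the reference file is stood in by a map from short names to arrays.

```cpp
#include <map>
#include <string>
#include <vector>

// Status for one array, as in the comparison report described above:
// "1"  - every element matches the reference exactly,
// "0"  - at least one element differs,
// "NA" - the reference did not contain the array.
std::string compare_status(const std::map<std::string, std::vector<double>>& reference,
                           const std::string&         short_name,
                           const std::vector<double>& computed) {
    auto it = reference.find(short_name);
    if (it == reference.end())
        return "NA";
    return (it->second == computed) ? "1" : "0";  // exact, element-wise
}
```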

You can check additional arrays from the dynamical core by adding them to the list defined in src/devel/binary_test.cpp::build_definitions(ESP & esp, Icogrid & grid):

// {named index, { pointer, size, long name, short name, on device }}
{"Rho_d", {esp.Rho_d, esp.nv * esp.point_num, "Density", "rho", true}},
  • named index: the index used in the code to look up the info for that element
  • pointer: the pointer to the data (on device or host)
  • size: the size of the array
  • long name: the name to display
  • short name: the name used in the short debug summary
  • on device: boolean telling whether the pointer points to data on the device or on the host. If the data is on the device, it is copied to the host, or worked on directly on the device if needed.
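The fields of one definition entry map onto a structure like the following. The field types here are assumptions for illustration; see binary_test.cpp for the actual definition.

```cpp
#include <string>

// One entry of the definitions table, keyed by the "named index".
// Field names mirror the comment in build_definitions().
struct ArrayDefinition {
    double*     pointer;    // pointer to the data (on device or host)
    long long   size;       // number of elements, e.g. esp.nv * esp.point_num
    std::string long_name;  // name to display, e.g. "Density"
    std::string short_name; // name used in the short debug summary, e.g. "rho"
    bool        on_device;  // true if the pointer points to device memory
};
```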
Additional debug output

To get more verbose output, define BENCH_PRINT_DEBUG. This prints all the comparisons, not only the ones that fail.

You can also print error statistics by defining BENCH_COMPARE_PRINT_STATISTICS. For each array with errors, this prints statistics computed over the failed values only:

  • absolute deviation: max and mean
  • relative deviation: max and mean
  • reference value: max and mean
  • absolute value: max and mean
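The deviation statistics above can be sketched as follows. Here "failed values" is assumed to mean elements that differ from the reference; the reference-value and absolute-value statistics would be accumulated the same way. This is a sketch of the idea, not THOR's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct DeviationStats {
    double abs_dev_max = 0.0, abs_dev_mean = 0.0;
    double rel_dev_max = 0.0, rel_dev_mean = 0.0;
};

// Max and mean of the absolute and relative deviation, computed only
// over the failed values (here: elements differing from the reference).
DeviationStats deviation_stats(const std::vector<double>& val,
                               const std::vector<double>& ref) {
    DeviationStats s;
    int n_failed = 0;
    for (std::size_t i = 0; i < val.size(); ++i) {
        if (val[i] == ref[i])
            continue;  // only failed values enter the statistics
        double abs_dev = std::fabs(val[i] - ref[i]);
        double rel_dev = abs_dev / std::fabs(ref[i]);
        s.abs_dev_max = std::max(s.abs_dev_max, abs_dev);
        s.rel_dev_max = std::max(s.rel_dev_max, rel_dev);
        s.abs_dev_mean += abs_dev;
        s.rel_dev_mean += rel_dev;
        ++n_failed;
    }
    if (n_failed > 0) {
        s.abs_dev_mean /= n_failed;
        s.rel_dev_mean /= n_failed;
    }
    return s;
}
```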
Fuzzy compare

If you define BENCH_COMPARE_USE_EPSILON and set an epsilon value in BENCH_COMPARE_EPSILON_VALUE, the tools compare the relative difference abs(val - ref)/abs(ref) against the epsilon value instead of performing an exact comparison.

Exact comparison is useful for bitwise checks, but if an algorithm is changed, the exact values will change while staying close to the reference. For such tests, the fuzzy compare is useful.
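A minimal sketch of the epsilon test described above; the zero-reference fallback is an assumption for this sketch, and the actual handling in THOR may differ.

```cpp
#include <cmath>

// Fuzzy comparison: relative difference against epsilon instead of
// exact equality. When ref is zero the relative difference is
// undefined, so fall back to an absolute check (an assumption).
bool fuzzy_equal(double val, double ref, double epsilon) {
    if (ref == 0.0)
        return std::fabs(val) <= epsilon;
    return std::fabs(val - ref) / std::fabs(ref) <= epsilon;
}
```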

tracing

The debug tools can also help with debugging and tracing, by running checks and printing results at each check point. The checks are enabled in debug.h with BENCHMARKING and the check flags.

  • BENCH_NAN_CHECK: runs NaN checks on the list of input arrays and prints out the arrays that contain NaNs.
    • new in v2.1: when NaNs are detected, the code now writes text files to a crash subdirectory of the output directory. These files contain the grid indices and locations of the detected NaNs for each array (some files may be empty, since typically only one array contains NaNs)
  • BENCH_CHECK_LAST_CUDA_ERROR: checks for CUDA error codes and prints them out with the name of the check point where they occurred
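A host-side sketch of what a NaN check collects. The real BENCH_NAN_CHECK runs over the arrays listed at each bench point, and the crash files additionally record grid locations; this sketch only gathers the flat indices.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scan an array and collect the indices of NaN entries -- roughly the
// per-array information written to the crash text files.
std::vector<std::size_t> find_nans(const std::vector<double>& data) {
    std::vector<std::size_t> nan_indices;
    for (std::size_t i = 0; i < data.size(); ++i)
        if (std::isnan(data[i]))
            nan_indices.push_back(i);
    return nan_indices;
}
```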

You can add more checks by adding flags in src/headers/debug.h and the corresponding code in src/devel/binary_test.cpp, in the function binary_test::check_data(), guarded by an appropriate #ifdef clause. The function receives text for the iteration count and the name of the step, and a vector naming the arrays to work on. At the top, it looks up the definitions (pointers and sizes) of those arrays.

Profiling

Profiling can be run with NVIDIA's nvprof tool.

Note that the profiling tool installation has changed and you might run into access permission issues.