Debugging tools
Debug tools are configured in `src/headers/debug.h` by defining preprocessor macros. They then execute tests at check points placed throughout the code.

The check points sit in the main loop and in the `Thor` and `ProfX` functions, checking values between the different steps of the simulation. Because the update consists of nested loops (big steps containing small steps), the current loop indices are passed as the first arguments.
Check points look like this:

```cpp
BENCH_POINT_I_S(current_step, rk, "Compute_Temperature_H_Pt_Geff", (), ("temperature_d", "h_d", "hh_d", "pt_d", "pth_d", "gtil_d", "gtilh_d"))
```
The arguments are:

- 1, 2 or 3 index arguments (depending on whether `BENCH_POINT_I`, `BENCH_POINT_I_S` or `BENCH_POINT_I_SS` is used) describing the level in the update loop (first iteration, sub-iteration and sub-sub-iteration).
- A string argument describing the step in the update loop (e.g. `"RK2"`: second Runge-Kutta step, `"Vertical_Eq"`: vertical equilibrium computation).
- A vector of strings naming the arrays that are inputs to the next step of the simulation (currently unused, but can be stored to compare inputs against outputs).
- A vector of strings naming the arrays that are outputs of the previous step of the simulation (before the call to the debug function).
It then runs various checks depending on the other debug flags enabled.
The debug tools can dump the intermediate state of arrays from the simulation at each time step. In subsequent runs, they can then compare the computed values to the ones in the stored files. This helps to check consistency when making code changes that shouldn't affect the computation.
To use the binary comparison files:

- enable the debug tools: in `src/headers/debug.h`, uncomment `#define BENCHMARKING`
Dump the reference files:

- compile
- run THOR with the `--binwrite` option
  - the intermediate states go to the output directory, in the `ref` subdirectory
  - the output files now go to the output directory, in the `write` subdirectory

This saves the reference files to the reference path.
Then, apply the code changes you want to test, and enable dump comparison:

- compile
- run THOR with the `--bincompare` option
  - the intermediate states are read from the output directory, in the `ref` subdirectory
  - the output files now go to the output directory, in the `compare` subdirectory
This will print out data whenever a stored value isn't exactly equal to the computed value. It prints the bench-point name, the iteration numbers, the short names of the arrays, and `1` if an array matched, `0` if it didn't match, or `NA` if the array wasn't found in the input file.
You can add more arrays from the dynamical core to the checks by adding them to the list defined in `src/devel/binary_test.cpp::build_definitions(ESP & esp, Icogrid & grid)`:

```cpp
// {named index, { pointer, size, long name, short name, on device }}
{"Rho_d", { esp.Rho_d, esp.nv*esp.point_num, "Density", "rho", true}},
```
- named index: the index used in the code to find the info for that element
- pointer: the pointer to the data (on device or host)
- size: the size of the array
- long name: the name to display
- short name: the name to use in the short debug summary
- on device: boolean telling whether the pointer points to data on the device or on the host. If the data is on the device, it is copied to the host, or worked on directly on the device if needed.
For more verbose output, define `BENCH_PRINT_DEBUG`. This prints out all the comparisons, not only the ones that fail.
You can also print statistics on errors by defining `BENCH_COMPARE_PRINT_STATISTICS`. For arrays with errors, this prints statistics covering the failed values:

- absolute deviation: max and mean
- relative deviation: max and mean
- reference value: max and mean
- absolute value: max and mean
By defining `BENCH_COMPARE_USE_EPSILON` and setting an epsilon value in `BENCH_COMPARE_EPSILON_VALUE`, the tools compare the relative difference `abs(val - ref)/abs(ref)` to the epsilon value instead of performing an exact comparison.

Exact comparison is useful for bitwise checks, but when algorithms change, the exact values will differ while remaining close to the reference. For those tests, the fuzzy comparison is useful.
The debug tools can also help with debugging and tracing by running checks and printing results at each check point. The checks are enabled in `debug.h` with `BENCHMARKING` and the check flags:
- `BENCH_NAN_CHECK`: runs NaN checks on the list of input arrays and prints out the arrays that contain NaNs.
  - new in v2.1: when NaNs are detected, the code now writes text files to the output directory, in a `crash` subdirectory. These text files contain the grid indices and locations of the detected NaNs for each array (note that some files may be empty, as typically only one array contains NaNs).
- `BENCH_CHECK_LAST_CUDA_ERROR`: checks for CUDA error codes and prints them out with the name of the check point.
You can add more checks by adding flags in `src/headers/debug.h` and the corresponding code in `src/devel/binary_test.cpp`, in the function `binary_test::check_data()`, guarded by an appropriate `#ifdef` clause.

The function receives text for the iteration count and the name of the step, and a vector of array names to work on. At the top, it builds the definitions of the arrays to work on (with pointers and sizes).
Profiling can be run with NVIDIA's `nvprof` tool.

Note that the profiling tool installation has changed and you might run into access permission issues.