-
Notifications
You must be signed in to change notification settings - Fork 37
Performance profiling of Spicy parsers
This captures some notes for gathering performance statistics for Spicy parsers. The following assumes we're using Spicy and plugin bundled with Zeek.
The two main challenges when creating flame graphs is (1) compiling everything with frame pointer to get consistent stack traces, and (2) cleaning up the collected performance samples to account for Spicy's stack switching.
-
Build Zeek (and hence Spicy's runtime librart) with frame pointers:
# CFLAGS=-fno-omit-frame-pointer CXXFLAGS=-fno-omit-frame-pointer ./configure --generator=Ninja --ccache --disable-zeekctl --disable-python
If you also want to use a non-standard compiler (say, clang), add
CC=/path/to/clang CXX=/path/to/clang++
. -
We also need
malloc
and friends with frame pointers. One way to do that is using jemalloc. It's easy to compile yourself:# CFLAGS=-fno-omit-frame-pointer ./configure && make && make install
(It might even compile with frame pointers by default, not sure.)
Then inject it into processes like this:
# LD_PRELOAD=/usr/local/lib/libjemalloc.so.2
With frame pointers in place, we can now collect perf
data for Zeek:
# perf record -F 99 --call-graph fp -- zeek -Cr trace.pcap my-analyzer.hlto
To create a flame graph out of that, we need to clean up the stack traces first. Get the flame graph tools and run:
# perf script | awk '/^$/ { skip=0; } !skip { print; } /hilti::rt::detail::Callback::Callback/ { skip=1; }' | stackcollapse-perf.pl /dev/stdin >perf.folded
# flamegraph.pl perf.folded >~/tmp/perf.svg
You should now see the Spicy parsers as their stacks in the graphs, placed right on top of the main Zeek process.
Normally, when using a profiler/debugger with *.hlto
files, they won't have access to the generated C++ code, meaning they can't show source code information for a lot of what's going on. To retain the source code when building the analyzer, use spicyz
with -T
:
# TMPDIR=/dir/to/place/code spicyz -T -o my-analyzer.hlto my-analyzer.spicy my-analyzer.evt
Once https://github.com/zeek/spicy-plugin/pull/205 will have landed, you can pass that -T
to CMake when building your analyzer through the standard build setup.
# spicyz -Z -o my-analyzer.hlto my-analyzer.spicy my-analyzer.evt
Then run Zeek with Spicy::enable_profiling
to get a timing dump at the end:
# zeek -r -Cr trace.pcap my-analyzer.hlto Spicy::enable_profiling=T
#
# Profiling results
#
# name count time avg-% total-%
hlt/fiber/save-stack 2 8938 0.09 0.18
hlt/func/Foo::X::__on_x 1 4446076 87.58 87.58
hlt/func/Foo::X::__parse_Foo_X_stage2 1 4565265 89.93 89.93
hlt/func/Foo::X::__parse_stage1 1 4649733 91.59 91.59
hlt/func/Foo::X::parse3 1 4690787 92.40 92.40
hlt/func/Foo::__register_Foo_X 1 5705 0.11 0.11
hlt/func/Foo::fibo 1 4401818 86.71 86.71
hlt/func/Foo::y 88 141955 0.03 2.80
hlt/total 1 5076714 100.00 100.00
spicy/unit/Foo::X 1 4608137 90.77 90.77
spicy/unit/Foo::X::x 1 4506502 88.77 88.77
Usually the most interesting parts are the lines starting with spicy/unit
(showing execution times for parsing units and their fields), and spicy/event
(showing the time it takes to create (not process) a Zeek event). The latter isn't shown in the example.
The time values represent a high-precision internal clock with no direct meaning, so the percentages are most useful. The measure corresponding blocks of code executed. count
counts how often the blocks were executed. For recursive functions/units, only the top-level instance is counted/measured.