cargo: build with frame pointers #10226

erikgrinaker · 2024-12-22T22:01:30Z

Problem

Frame pointers are typically disabled by default (depending on CPU architecture), to improve performance. This frees up a CPU register, and avoids a couple of instructions per function call. However, it makes stack unwinding much more inefficient, since it has to use DWARF debug information instead, and gives worse results with e.g. perf and eBPF profiles. The backtrace implementation of libunwind is also suspected to cause seg faults.

The performance benefit of frame pointer omission doesn't appear to matter that much on modern 64-bit CPU architectures (which have plenty of registers and optimized instruction execution), and benchmarks did not show measurable overhead.

The Rust standard library and jemalloc already enable frame pointers by default.

For more information, see https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html.

Resolves #10224.
Touches #10225.

Summary of changes

Enable frame pointers in all builds, and use frame pointers for pprof-rs stack sampling.

github-actions · 2024-12-22T22:59:43Z

7227 tests run: 6875 passed, 0 failed, 352 skipped (full report)

Flaky tests (5)

Postgres 17

test_compute_pageserver_connection_stress: debug-x86-64
test_physical_replication_config_mismatch_max_locks_per_transaction: release-arm64
test_timeline_archive[4]: release-arm64

Postgres 16

test_lr_with_slow_safekeeper: release-arm64
test_physical_replication_config_mismatch_max_locks_per_transaction: release-x86-64

Code coverage* (full report)

functions: 31.2% (8412 of 26948 functions)
lines: 48.0% (66780 of 139204 lines)

* collected from Rust tests only

_{The comment gets automatically updated with the latest test results
e180c0b at 2025-01-04T15:21:22.545Z :recycle:}

erikgrinaker · 2024-12-29T15:02:30Z

I ran some benchmarks for WalStreamDecoder::complete_record() on a Linux amd64 box. This isn't necessarily representative (it's mostly very optimized CRC32 checksumming), but it is a CPU-bound hot path that doesn't involve allocations or IO. The results don't show significant overhead -- there is a small amount of variation though, so I ran each benchmark for 30 seconds multiple times, with similar results.

complete_record/size=64 time:   [41.561 ns 41.891 ns 42.152 ns]
                        thrpt:  [1.4141 GiB/s 1.4229 GiB/s 1.4341 GiB/s]
                 change:
                        time:   [+0.7893% +1.1356% +1.4278%] (p = 0.00 < 0.05)
                        thrpt:  [-1.4077% -1.1228% -0.7831%]

complete_record/size=1024
                        time:   [144.64 ns 144.73 ns 144.83 ns]
                        thrpt:  [6.5847 GiB/s 6.5893 GiB/s 6.5935 GiB/s]
                 change:
                        time:   [-3.4100% -3.3531% -3.2898%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4017% +3.4694% +3.5303%]

complete_record/size=8192
                        time:   [965.16 ns 965.27 ns 965.39 ns]
                        thrpt:  [7.9029 GiB/s 7.9039 GiB/s 7.9048 GiB/s]
                 change:
                        time:   [+0.4465% +0.4593% +0.4730%] (p = 0.00 < 0.05)
                        thrpt:  [-0.4708% -0.4572% -0.4445%]

complete_record/size=131072
                        time:   [13.298 µs 13.352 µs 13.425 µs]
                        thrpt:  [9.0927 GiB/s 9.1425 GiB/s 9.1799 GiB/s]
                 change:
                        time:   [+5.9949% +6.8990% +7.8233%] (p = 0.00 < 0.05)
                        thrpt:  [-7.2557% -6.4537% -5.6558%]

erikgrinaker · 2024-12-29T16:12:38Z

Interestingly, the frame-pointer feature of pprof-rs is an order of magnitude slower than just using libunwind without frame pointers (11 µs vs. 1.4 µs). libunwind performance did not change with or without frame pointers.

This was using a benchmark of pprof-rs TraceImpl::trace() with a stack depth of 40 on Linux.

This probably isn't worth it then. There was some hope that it might resolve #10225, as seen in grafana/pyroscope-rs#124, but if it requires the frame-pointer feature then it'll cause a 10x slowdown of traces which isn't acceptable.

erikgrinaker · 2025-01-04T14:16:07Z

Given the frequency with which we're seeing seg faults in staging, even with heap profiling disabled, maybe this is worth a shot -- at the very least, to see if it resolves the seg faults.

skyzh · 2025-01-06T18:59:30Z

neon/Dockerfile

Lines 47 to 58 in 95f1920

    
           RUN set -e \ 
        
               && PQ_LIB_DIR=$(pwd)/pg_install/v${STABLE_PG_VERSION}/lib RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment ${ADDITIONAL_RUSTFLAGS}" cargo build \ 
        
                 --bin pg_sni_router  \ 
        
                 --bin pageserver  \ 
        
                 --bin pagectl  \ 
        
                 --bin safekeeper  \ 
        
                 --bin storage_broker  \ 
        
                 --bin storage_controller  \ 
        
                 --bin proxy  \ 
        
                 --bin neon_local \ 
        
                 --bin storage_scrubber \ 
        
                 --locked --release

Do we also need to add it to the dockerfile or the flags will be combined?

erikgrinaker · 2025-01-06T19:06:44Z

Do we also need to add it to the dockerfile or the flags will be combined?

Good catch, the environment variable will override them. I'll submit a followup.

See #10226 (comment).

erikgrinaker changed the title ~~cargo: build with fram pointers~~ cargo: build with frame pointers Dec 22, 2024

cargo: build with frame pointers

de36026

erikgrinaker force-pushed the erik/frame-pointer branch from c89b06c to de36026 Compare December 29, 2024 11:55

erikgrinaker closed this Dec 29, 2024

erikgrinaker mentioned this pull request Dec 29, 2024

Build with frame pointers for improved profiling #10224

Closed

erikgrinaker reopened this Jan 4, 2025

erikgrinaker marked this pull request as ready for review January 4, 2025 14:20

Merge branch 'main' into erik/frame-pointer

e180c0b

erikgrinaker requested review from a team and jcsp and removed request for a team January 4, 2025 14:20

jcsp approved these changes Jan 6, 2025

View reviewed changes

jcsp added this pull request to the merge queue Jan 6, 2025

Merged via the queue into main with commit 95f1920 Jan 6, 2025
82 checks passed

jcsp deleted the erik/frame-pointer branch January 6, 2025 17:28

erikgrinaker mentioned this pull request Jan 6, 2025

Dockerfile: build with force-frame-pointers=yes #10286

Merged

github-merge-queue bot pushed a commit that referenced this pull request Jan 6, 2025

Dockerfile: build with force-frame-pointers=yes (#10286)

b342a02

See #10226 (comment).

erikgrinaker mentioned this pull request Jan 7, 2025

pprof-rs seg faults #10225

Closed

cloneable mentioned this pull request Jan 9, 2025

Proxy release 2025-01-09 #10320

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cargo: build with frame pointers #10226

cargo: build with frame pointers #10226

erikgrinaker commented Dec 22, 2024 •

edited

Loading

github-actions bot commented Dec 22, 2024 •

edited

Loading

Postgres 17

Postgres 16

erikgrinaker commented Dec 29, 2024

erikgrinaker commented Dec 29, 2024

erikgrinaker commented Jan 4, 2025

skyzh commented Jan 6, 2025

erikgrinaker commented Jan 6, 2025

cargo: build with frame pointers #10226

cargo: build with frame pointers #10226

Conversation

erikgrinaker commented Dec 22, 2024 • edited Loading

Problem

Summary of changes

github-actions bot commented Dec 22, 2024 • edited Loading

7227 tests run: 6875 passed, 0 failed, 352 skipped (full report)

Postgres 17

Postgres 16

Code coverage* (full report)

erikgrinaker commented Dec 29, 2024

erikgrinaker commented Dec 29, 2024

erikgrinaker commented Jan 4, 2025

skyzh commented Jan 6, 2025

erikgrinaker commented Jan 6, 2025

erikgrinaker commented Dec 22, 2024 •

edited

Loading

github-actions bot commented Dec 22, 2024 •

edited

Loading