Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Add initial benchmarks for Arrow IPC Stream writer #6972

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Jan 13, 2025

Which issue does this PR close?

Part of #6968

Rationale for this change

Add some initial benchmarks to help with efforts to optimize IPC reading and writing.

This is a copy of existing code that we had in Comet and is not as comprehensive as the suggestions in #6968 but it provides a starting point at least.

arrow_ipc/write_single_batch
                        time:   [20.784 µs 20.947 µs 21.091 µs]

arrow_ipc/write_multiple_batches
                        time:   [196.58 µs 199.16 µs 201.40 µs]

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 13, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @andygrove

I tried this out locally:

cargo bench --bench stream_writer

I also made a flamegraph with

 sudo flamegraph -- target/release/deps/stream_writer-a9122e21216257a4 --bench

flamegraph

I have a suggestion for improvement (don't buffer) but that could be done as a follow on

Comment on lines +38 to +41
group.bench_function("write_multiple_batches", |b| {
let batch = create_batch(8192, true);
b.iter(|| {
let buffer = vec![];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran a quick flamegraph of this benchmark and a non trivial amount of time was spent resizing this buffer:

Screenshot 2025-01-14 at 10 50 06 AM

I recommend we resize / reuse it so we aren't testing the speed of the allocator

Something like this:

@alamb
Copy link
Contributor

alamb commented Jan 14, 2025

Interestingly, when I resized the output buffers reasonably in andygrove#76 and reran the flamegraph, a substantial amount of time is still spent reallocating / resizing:

Screenshot 2025-01-14 at 11 51 32 AM

Here is the whole file:
flamegraph

@alamb alamb changed the title chore: Add initial benchmarks for Arrow IPC writer chore: Add initial benchmarks for Arrow IPC Stream writer Jan 14, 2025
@tustvold
Copy link
Contributor

I believe this is waiting on andygrove#76 so marking as draft

@tustvold tustvold marked this pull request as draft January 25, 2025 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants