Skip to content

Investigate unstable benchmark results on macOS #1648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andygrove opened this issue Apr 14, 2025 · 17 comments · Fixed by #1647 or #1679
Closed

Investigate unstable benchmark results on macOS #1648

andygrove opened this issue Apr 14, 2025 · 17 comments · Fixed by #1647 or #1679
Assignees
Labels
bug Something isn't working
Milestone

Comments

@andygrove
Copy link
Member

Describe the bug

I am running TPC-H benchmarks on macOS using the instructions in #1647.

q13 sometimes takes 7 seconds and sometimes takes more than 5 minutes when Comet is enabled. I do not see any spilling. I am testing with Comet with the DataFusion 47 upgrade from #1563

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

@andygrove andygrove added the bug Something isn't working label Apr 14, 2025
@andygrove andygrove self-assigned this Apr 14, 2025
@andygrove andygrove added this to the 0.8.0 milestone Apr 14, 2025
@andygrove
Copy link
Member Author

This comparison of two runs suggests that the issue is with a hash aggregate.

Image

@mbutrovich
Copy link
Contributor

I wonder if it's related to the Vec resize issues that @Kontinuation was seeing in #1511 (comment)

@parthchandra
Copy link
Contributor

Why wouldn't we see the issue every time?

@andygrove
Copy link
Member Author

The issue seems at least partly due to a mix of performance and efficiency cores being used, which we do not have any control over.

@andygrove
Copy link
Member Author

thermal management (CPU throttling) could also be a factor

@andygrove
Copy link
Member Author

andygrove commented Apr 15, 2025

I switched from 100 GB to 10 GB data set and the situation is even worse. q13 hangs and there are no stats at all in the plan.

edit: it did eventually complete, and another run was fast, so this is just more of the same instability

Image

@kazuyukitanimura
Copy link
Contributor

The plan is to put a disclaimer

@andygrove andygrove modified the milestones: 0.8.0, 0.9.0 Apr 17, 2025
@mbutrovich
Copy link
Contributor

I profiled it and we're getting crushed in OS mutexes in the allocator. I noticed there's some support for mimalloc in Comet already that's not really documented. DF enables it by default for its benchmarks. I am testing a build with make release COMET_FEATURES=mimalloc to see if the issue goes away.

Image

@mbutrovich
Copy link
Contributor

Anecdotally, my performance issues on macOS disappear when using mimalloc. We should have a larger discussion about where third-party allocators fit into the picture of optimizing Comet (off-heap, etc.).

@parthchandra
Copy link
Contributor

That's a great find. Arrow Cpp discussion: https://lists.apache.org/thread/dts9ggvkthczfpmd25wrz449mxod76o2
Also, I always thought we were using mimalloc (but never actually checked)!

@andygrove
Copy link
Member Author

For my local Linux benchmark, I saw performance improve from ~275 s to ~266 s when using mimalloc

@Dandandan
Copy link

FYI: We found mimalloc to be unstable for running long term (keeps memory allocated but seems doesn't release it over time => running into OOMs much more easily). Switching to jemalloc fixed this.

@Kontinuation
Copy link
Member

We prefer to use jemalloc with Comet by setting LD_PRELOAD, as it returns freed memory to the operating system more aggressively than ptmalloc, resulting in a more predictable RSS for the Spark executor process. Additionally, jemalloc's memory profiler is helpful for diagnosing OOM issues.

However, I'm uncertain whether jemalloc can still be used with LD_PRELOAD when Comet is compiled with mimalloc. The last time I attempted to dynamically override malloc it didn't work.

@mbutrovich
Copy link
Contributor

I've also only ever used jemalloc in the past. I'm not sure what the discussion was at the time to pursue mimalloc for Comet. @mdcallag has some recent writing on the topic, althrough RocksDB is a different workload than what we're doing:

https://smalldatum.blogspot.com/2025/04/battle-of-mallocators.html
https://smalldatum.blogspot.com/2025/04/battle-of-mallocators-part-2.html

The Germans 🇩🇪 did a bake-off a few years ago and they chose jemalloc for Umbra (and likely CedarDB): https://www.adms-conf.org/2019-camera-ready/durner_adms19.pdf

@mbutrovich
Copy link
Contributor

However, I'm uncertain whether jemalloc can still be used with LD_PRELOAD when Comet is compiled with mimalloc. The last time I attempted to dynamically override malloc it didn't work.

Yeah I don't think you want to mix and match like that.

Does LD_PRELOAD also change the allocator for the JVM?

@Kontinuation
Copy link
Member

Does LD_PRELOAD also change the allocator for the JVM?

Memory allocated by Unsafe_AllocateMemory0 (Arrow native memory, Spark off-heap memory) uses the allocator. The large JVM heap regions seems to be directly allocated using syscalls such as mmap, which is not affected by allocator.

@alamb
Copy link
Contributor

alamb commented Apr 29, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
7 participants