-
Notifications
You must be signed in to change notification settings - Fork 203
Investigate unstable benchmark results on macOS #1648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I wonder if it's related to the Vec resize issues that @Kontinuation was seeing in #1511 (comment) |
Why wouldn't we see the issue every time? |
The issue seems at least partly due to a mix of performance and efficiency cores being used, which we do not have any control over. |
thermal management (CPU throttling) could also be a factor |
The plan is to put a disclaimer |
I profiled it and we're getting crushed in OS mutexes in the allocator. I noticed there's some support for mimalloc in Comet already that's not really documented. DF enables it by default for its benchmarks. I am testing a build with |
Anecdotally, my performance issues on macOS disappear when using mimalloc. We should have a larger discussion about where third-party allocators fit into the picture of optimizing Comet (off-heap, etc.). |
That's a great find. Arrow Cpp discussion: https://lists.apache.org/thread/dts9ggvkthczfpmd25wrz449mxod76o2 |
For my local Linux benchmark, I saw performance improve from ~275 s to ~266 s when using mimalloc |
FYI: We found mimalloc to be unstable for running long term (keeps memory allocated but seems doesn't release it over time => running into OOMs much more easily). Switching to jemalloc fixed this. |
We prefer to use jemalloc with Comet by setting LD_PRELOAD, as it returns freed memory to the operating system more aggressively than ptmalloc, resulting in a more predictable RSS for the Spark executor process. Additionally, jemalloc's memory profiler is helpful for diagnosing OOM issues. However, I'm uncertain whether jemalloc can still be used with LD_PRELOAD when Comet is compiled with mimalloc. The last time I attempted to dynamically override malloc it didn't work. |
I've also only ever used jemalloc in the past. I'm not sure what the discussion was at the time to pursue mimalloc for Comet. @mdcallag has some recent writing on the topic, althrough RocksDB is a different workload than what we're doing: https://smalldatum.blogspot.com/2025/04/battle-of-mallocators.html The Germans 🇩🇪 did a bake-off a few years ago and they chose jemalloc for Umbra (and likely CedarDB): https://www.adms-conf.org/2019-camera-ready/durner_adms19.pdf |
Yeah I don't think you want to mix and match like that. Does LD_PRELOAD also change the allocator for the JVM? |
Memory allocated by |
Related discord thread: https://discord.com/channels/885562378132000778/1363995762182193373 |
Describe the bug
I am running TPC-H benchmarks on macOS using the instructions in #1647.
q13 sometimes takes 7 seconds and sometimes takes more than 5 minutes when Comet is enabled. I do not see any spilling. I am testing with Comet with the DataFusion 47 upgrade from #1563
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: