fix: Fix data race in memory profiling #1727
Conversation
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1727      +/-   ##
============================================
+ Coverage     56.12%   58.65%    +2.53%
- Complexity      976     1138      +162
============================================
  Files           119      129       +10
  Lines         11743    12692      +949
  Branches       2251     2375      +124
============================================
+ Hits           6591     7445      +854
- Misses         4012     4058       +46
- Partials       1140     1189       +49

View full report in Codecov by Sentry.
-        used -= size;
+        long newUsed = used.addAndGet(-size);
+        if (newUsed < 0) {
+          logger.warn("Used memory is negative: " + newUsed);
Should probably make this an error message (or maybe even throw an exception since this is a pretty fatal condition?)
I guess this situation is likely to happen: the memory acquired by MemoryPool.grow may be less than the actual request. Filed #1733
        self.acquire(additional)
            .unwrap_or_else(|_| panic!("Failed to acquire {} bytes", additional));
        self.used.fetch_add(additional, Relaxed);
I changed this to use error logging rather than a warning. I'd prefer not to throw an exception because many jobs will run to completion without issue even if this variable does go negative.
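For context, here is a minimal sketch of the acquire/release pattern discussed in this thread. It is not the actual CometTaskMemoryManager code: the class and method names are illustrative, and it assumes an SLF4J-style logger. The release path logs an error instead of throwing when the counter drifts negative.

import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch of the pattern under discussion, not project code.
class UsedMemoryTracker {
  private static final Logger logger = LoggerFactory.getLogger(UsedMemoryTracker.class);
  private final AtomicLong used = new AtomicLong();

  void acquired(long size) {
    // Atomic read-modify-write: safe when multiple threads update concurrently.
    used.addAndGet(size);
  }

  void released(long size) {
    long newUsed = used.addAndGet(-size);
    if (newUsed < 0) {
      // Log an error rather than throw: jobs can usually still run to completion
      // even if this accounting counter drifts negative.
      logger.error("Used memory is negative: " + newUsed);
    }
  }
}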
      /** The id uniquely identifies the native plan this memory manager is associated to */
      private final long id;

      private final TaskMemoryManager internal;
      private final NativeMemoryConsumer nativeMemoryConsumer;
-     private long used;
+     private final AtomicLong used = new AtomicLong();
👍
spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java (review thread outdated; resolved)
lgtm thanks @andygrove
Co-authored-by: Oleks V <[email protected]>
lgtm. (pending ci)
Which issue does this PR close?
Closes #1726
Rationale for this change
Fix a data race that caused negative memory use to be reported
What changes are included in this PR?
Use AtomicLong instead of long to track used memory.
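To illustrate why the switch matters, here is a small, self-contained demo (hypothetical, not project code) of the lost-update race that a plain long counter allows, compared with AtomicLong:

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo of the race the PR fixes: "used += size" / "used -= size"
// on a plain long is a non-atomic read-modify-write, so concurrent updates can
// overwrite each other and the counter can drift (and go negative once the
// matching releases are applied). AtomicLong.addAndGet is atomic.
public class LostUpdateDemo {
  static long plainUsed = 0;
  static final AtomicLong atomicUsed = new AtomicLong();

  public static void main(String[] args) throws InterruptedException {
    Runnable work = () -> {
      for (int i = 0; i < 100_000; i++) {
        plainUsed += 1;           // racy: read, add, write as three separate steps
        atomicUsed.addAndGet(1);  // atomic
      }
    };
    Thread t1 = new Thread(work);
    Thread t2 = new Thread(work);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    // plainUsed is typically less than 200000 due to lost updates;
    // atomicUsed is always exactly 200000.
    System.out.println("plain=" + plainUsed + " atomic=" + atomicUsed.get());
  }
}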
How are these changes tested?
Manually.