Another Out-Of-Memory Error on Poseidon #453
We have another occurrence of an OOM kill from just this morning (Sep 18 06:38:45). See more details with …
Unfortunately, Poseidon's log does not really help identify the memory leak. During the OOM kill, three actions were happening: …

Accordingly, I was also not able to reproduce the issue in the staging environment.
I feared that, but thanks for looking into it nevertheless. Besides the PR #457 you've just created, do we have a chance to analyze a core dump or a (performance) profile? I am still hoping it is possible to get a memory dump just before the memory is cleared, which might further assist us in debugging the cause.
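As an illustration, a minimal sketch of one way to grab such a dump with Go's standard `runtime/pprof`: a small watcher that writes a heap profile once usage crosses a threshold. The threshold, interval, and file path are made up, and this is not what #457 actually implements:

```go
// Sketch: watch heap usage and write a pprof heap profile once it crosses a
// threshold, so a dump exists shortly before the OOM killer would strike.
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

const heapDumpThreshold = 1 << 30 // illustrative: dump once HeapAlloc exceeds 1 GiB

func watchHeap() {
	for range time.Tick(10 * time.Second) {
		var stats runtime.MemStats
		runtime.ReadMemStats(&stats)
		if stats.HeapAlloc < heapDumpThreshold {
			continue
		}
		f, err := os.Create("/tmp/poseidon-heap.pprof") // illustrative path
		if err != nil {
			log.Printf("creating heap profile failed: %v", err)
			continue
		}
		if err := pprof.Lookup("heap").WriteTo(f, 0); err != nil {
			log.Printf("writing heap profile failed: %v", err)
		}
		_ = f.Close()
		return // one dump is enough for post-mortem analysis
	}
}

func main() {
	go watchHeap()
	select {} // stand-in for the actual service
}
```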
I haven't found an automated core dump for OOM-killed processes [1]. Anyway, our custom memory dump (#457) identified the cause: Sentry Profiling [2] [3].

```go
// Excerpt from the sentry-go profiler: walk the sample ring buffer backwards,
// collecting every bucket whose timestamp is at or after relativeStartNS.
buckets = make([]*profileSamplesBucket, 0, int64((relativeEndNS-relativeStartNS)/uint64(profilerSamplingRate.Nanoseconds()))+1)
for start.Value != nil {
	var bucket = start.Value.(*profileSamplesBucket)
	if bucket.relativeTimeNS < relativeStartNS {
		break
	}
	samplesCount += len(bucket.goIDs)
	buckets = append(buckets, bucket)
	start = start.Prev()
}
```

My guess would be that the bug is the following: …

What could we do: …
Sentry issue: POSEIDON-4H
Ah, thanks for investigating here! I just went ahead and disabled the profile sampling (by setting …).
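For reference, a minimal sketch of how profile sampling can be switched off via sentry-go's `ClientOptions`, assuming `ProfilesSampleRate` is the relevant option (the thread does not show which setting was actually changed):

```go
package main

import (
	"log"
	"os"
	"time"

	"github.com/getsentry/sentry-go"
)

func main() {
	// Sketch: keep tracing enabled but switch profile sampling off.
	// ProfilesSampleRate is assumed to be the relevant option here.
	err := sentry.Init(sentry.ClientOptions{
		Dsn:                os.Getenv("SENTRY_DSN"),
		EnableTracing:      true,
		TracesSampleRate:   1.0,
		ProfilesSampleRate: 0, // 0 (or omitting the field) keeps the Go profiler disabled
	})
	if err != nil {
		log.Fatalf("sentry.Init: %v", err)
	}
	defer sentry.Flush(2 * time.Second)
}
```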
It seems like the issue didn't occur another time; that's good! 🥳 Nevertheless, I was able to confirm that it is indeed a reproducible problem. Using CodeOcean and our Python 3.8 exercise (with …
You're right, thanks for drawing further attention to that. With the following configuration, I was able to reproduce the behavior (10/10): …

This creates the described memory leak in Sentry's …
Awesome, it's great to hear we have these simple reproduction steps, at least with Poseidon. Can we add a regression test for that, even when no DSN is set? This would at least allow us to identify the issue in the future. And of course, we should probably now go one step further and avoid the memory leak even when the profiler is being used.
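For what it's worth, a rough sketch of how such a test could look, assuming a heap-growth check around repeatedly started (and profiled) transactions is good enough; the package name, workload, and threshold below are illustrative, not existing Poseidon code:

```go
package sentryprofiler_test

import (
	"context"
	"runtime"
	"testing"
	"time"

	"github.com/getsentry/sentry-go"
)

// Sketch of a heap-growth regression test: run many profiled transactions with
// an empty DSN and fail if the heap keeps climbing between measurements.
func TestProfilerDoesNotLeak(t *testing.T) {
	if err := sentry.Init(sentry.ClientOptions{
		Dsn:                "", // no DSN: events are dropped locally
		EnableTracing:      true,
		TracesSampleRate:   1.0,
		ProfilesSampleRate: 1.0, // exercise the profiler
	}); err != nil {
		t.Fatalf("sentry.Init: %v", err)
	}

	heapInUse := func() uint64 {
		runtime.GC()
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		return m.HeapAlloc
	}
	runTransactions := func(n int) {
		for i := 0; i < n; i++ {
			tx := sentry.StartTransaction(context.Background(), "leak-check")
			time.Sleep(10 * time.Millisecond) // give the profiler time to take samples
			tx.Finish()
		}
	}

	runTransactions(10) // warm-up
	before := heapInUse()
	runTransactions(100)
	after := heapInUse()

	const maxGrowth = 50 << 20 // ~50 MiB of growth would be suspicious
	if after > before+maxGrowth {
		t.Errorf("heap grew from %d to %d bytes; possible profiler leak", before, after)
	}
}
```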
Today, we discussed this issue again. For now, we won't add a regression test, since this would mainly cover code originally written in the Sentry dependency. Nevertheless, we agreed to create a bug report (ideally with a minimal reproduction example independent of Poseidon) for the Sentry team to take a look at this issue.
Submitted @ getsentry/sentry-go#724
The published release …
Yes, please. Let's do so!
Done with 3d87cfc and …
We are experiencing another out-of-memory error on Poseidon that should be investigated further.

I don't have many details to share yet. The issue first appeared on Sunday, September 17th, with release 39fc0f9. After rolling back to the previous commit 68cd8f4 (around 3:30pm CEST), the issue did not occur another time. This could point to one of the dependencies, but I am not sure about that either.