Race condition with net_http instrumentation on JRuby causing intermittent errors #3021
Comments
Thank you for letting us know about this issue.
Hello @NC-piercej, I'm going to make a change that will prevent this specific error from occurring, but I suspect it is a side effect of some other error happening, due to … Are you seeing any other errors logged by the Ruby agent in the …
I am going to see if I can find someone who has access to those logs. In the meantime, I do have some more details. This error is always triggered from the same place: logging the response asynchronously on a different thread after the server has already sent the response. There may be some weirdness around this. Basically, the sequence is:
Second, we found another race-condition-related error within the agent gem code that may be related:
We're tweaking some logging settings and should hopefully have some data for you tomorrow.
We are seeing agent logs, but nothing unusual when the race-condition error occurs. Should we try a log level other than the default?
Hmm, it's interesting that you aren't seeing anything related to the errors I was expecting. The error I was specifically interested in is logged at the error level, so it should appear with the default settings if an error is happening in that area. No other errors either?

I was looking at the newer error you provided, the hash iteration one. Does the original error you reported usually occur in the same transaction? I noticed the original stack trace also includes code from the AWS SDK, but that one seems to be using the S3 SDK. I just wanted to check because we do have DynamoDB-specific instrumentation, whereas S3 calls would only be picked up by the net_http instrumentation. Are these errors usually happening at the same time, or are they separate problems that both happen to be failing? I just want to get a better understanding.

Also, I made a change that will prevent the original error from occurring when the segment is nil (#3046); it should be included in our next release. I'll start looking at that second error and see what's going on there.
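The nil-segment guard mentioned above can be illustrated with a minimal sketch. This is not the actual #3046 change and does not use the agent's API: `FakeTracer`, `start_segment`, and `instrumented_request` are hypothetical stand-ins, and we assume the segment-start call may return nil (for example, when no transaction is visible on the current thread).

```ruby
# Hypothetical stand-in for a tracer whose segment-start call may return nil.
class FakeTracer
  Segment = Struct.new(:name, :finished) do
    def finish
      self.finished = true
    end
  end

  def initialize(active:)
    @active = active
  end

  # Returns a segment when a transaction is active, otherwise nil.
  def start_segment(name)
    @active ? Segment.new(name, false) : nil
  end
end

# Runs the wrapped request and only touches the segment if one was started,
# so a nil segment cannot raise NoMethodError. The block's value is returned.
def instrumented_request(tracer, name)
  segment = tracer.start_segment(name)
  begin
    yield
  ensure
    segment&.finish # safe navigation: skipped when segment is nil
  end
end

result = instrumented_request(FakeTracer.new(active: false), "s3") { :ok }
puts result # prints ok
```

The design choice here is to degrade to "no instrumentation" rather than fail the caller's request when tracing state is missing.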
@tannalynn We took another look at the logs and we think we found a few things:
Looking at the code, I think this array mutation needs to be guarded by a mutex: we have multiple services (S3, DynamoDB, etc.) being written to in parallel from our logging code. These would all be triggered from the same overall request/transaction.
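Guarding a shared array with a Mutex, as suggested above, might look like the minimal sketch below. `SynchronizedList` is a hypothetical class for illustration, not the agent's internal data structure. On JRuby, core Array and Hash are not thread-safe, so concurrent appends, or iterating while another thread mutates the collection, can fail. Iterating over a snapshot taken under the lock also avoids the "modified during iteration" class of error.

```ruby
# Hypothetical collection appended to by several threads at once.
# A Mutex serializes every access to the underlying Array.
class SynchronizedList
  def initialize
    @items = []
    @lock = Mutex.new
  end

  # Append under the lock.
  def <<(item)
    @lock.synchronize { @items << item }
    self
  end

  # Iterate over a copy taken under the lock, so other threads can keep
  # appending without mutating the array being iterated.
  def each(&block)
    snapshot = @lock.synchronize { @items.dup }
    snapshot.each(&block)
  end

  def size
    @lock.synchronize { @items.size }
  end
end

list = SynchronizedList.new
threads = Array.new(8) do |t|
  Thread.new { 1_000.times { |i| list << [t, i] } }
end
threads.each(&:join)
puts list.size # prints 8000
```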
Description
We are seeing occasional errors logged in New Relic that appear to originate in the New Relic gem code. Here is a sample backtrace:
Specifically, it appears to be blowing up here, meaning that NewRelic::Agent::Tracer.start_external_request_segment seems to be incorrectly returning nil.

Expected Behavior
New Relic instrumentation works reliably when multiple requests are in flight at once on different threads on JRuby.
Steps to Reproduce
This appears to be a race condition. We see it quite frequently when multiple requests are running at once on different threads, but it is hard to reproduce on demand.
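Because the failure only shows up under concurrency, one way to make it surface more often is a stress harness that runs the suspect code path on many threads at once and collects every exception. The sketch below is a hypothetical helper (`stress` is not part of the agent) and assumes you can wrap the code path in a block; it raises the odds of hitting a race but does not guarantee reproduction.

```ruby
# Hypothetical stress harness: runs the block on many threads simultaneously,
# repeatedly, and returns the exceptions raised instead of losing them
# inside background threads.
def stress(threads: 16, iterations: 200, &block)
  errors = Queue.new
  gate = Queue.new # all workers block here so they start together

  workers = Array.new(threads) do |t|
    Thread.new do
      gate.pop
      iterations.times do |i|
        begin
          block.call(t, i)
        rescue StandardError => e
          errors << e
        end
      end
    end
  end

  threads.times { gate << :go } # release every worker at once
  workers.each(&:join)
  Array.new(errors.size) { errors.pop }
end

# Usage: exercise a mutex-guarded append and check nothing was lost.
shared = []
lock = Mutex.new
errors = stress { |t, i| lock.synchronize { shared << [t, i] } }
puts errors.size # prints 0
puts shared.size # prints 3200
```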
Your Environment
JRuby 9.4.9.0 (jruby:9.4.9.0-jdk21 Docker image)
NewRelic 9.16.1