Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write heartbeat thread output to safe crash log #153

Merged
merged 3 commits into from
May 27, 2024
Merged

Conversation

kpamnany
Copy link
Collaborator

@kpamnany kpamnany commented May 24, 2024

PR Description

#140 added the --safe-crash-log-file command line argument. When this is specified, and when in signal handler context, jl_safe_printf additionally writes whatever it is printing to the file in JSON format.

With this PR, jl_safe_printf writes to the file in JSON format from the heartbeat thread's context also.

This PR also refactors the JSON printing function to use a single write call rather than multiple, to reduce the possibility of output from multiple threads being interleaved.

Checklist

Requirements for merging:

kpamnany added 2 commits May 24, 2024 12:11
To allow `jl_safe_printf()` to determine whether it's being called
from the heartbeat thread.
In `jl_safe_printf()`, we're already writing to the safe crash log
if we're in signal handler context (in addition to writing to
`stderr`). Now we do the same if we're in the heartbeat thread.
@github-actions github-actions bot added port-to-v1.10 This change should apply to Julia v1.10 builds port-to-master This change should apply to all future Julia builds labels May 24, 2024
@kpamnany kpamnany requested review from NHDaly, Drvi and d-netto May 24, 2024 18:43
src/jl_uv.c Outdated Show resolved Hide resolved
src/jl_uv.c Show resolved Hide resolved
src/jl_uv.c Outdated Show resolved Hide resolved
Copy link
Member

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Lgtm except for some buffer math questions. Would be good to print what we can if possible, and best if we could print it all!

Concurrent `write()` calls to the same file descriptor should be
thread-safe, but can result in interleaving. Refactor the JSON
printing code used from `jl_safe_printf()` to assemble the message
into a buffer and use a single `write()` call.
@kpamnany
Copy link
Collaborator Author

Improved the "JSONification" implementation based on @NHDaly's comments.

My first stab at this was to use alloca for a sufficiently large buffer, but that didn't work well. I then moved to using a large but fixed-size buffer and hoping it was big enough -- terrible approach. But I didn't want to add the overhead of a malloc and a free for every call, so I'm now following Nathan's suggestion of truncating the message if it is too big for the fixed-size buffer. I've also updated the buffer math to accommodate the JSON preamble, etc.

@kpamnany
Copy link
Collaborator Author

I've tested this locally, starting Julia with --safe-crash-log-file:

  • Sending a signal: the output is correctly duplicated into the safe crash log file as before.
  • Starting the heartbeat mechanism and forcing it to timeout: the heartbeat loss messages and the task backtraces are correctly duplicated into the safe crash log file.

@kpamnany kpamnany merged commit 6479c07 into v1.10.2+RAI May 27, 2024
5 checks passed
@kpamnany kpamnany deleted the kp-hb-safe-log branch May 27, 2024 18:12
@kpamnany kpamnany restored the kp-hb-safe-log branch May 27, 2024 18:13
@kpamnany kpamnany deleted the kp-hb-safe-log branch May 27, 2024 19:39
@NHDaly
Copy link
Member

NHDaly commented May 28, 2024

I chose your first suggestion -- to truncate if the message is too large -- mainly because I'm pretty sure that it will almost never happen because of the way jl_safe_printf is used.

Ohhhhhh too bad - So we're still going to get one json message per line in the safe printf in the julia task stacktraces? I was incorrectly believing this PR would also be combining the whole report into one json message, which would be nicer to consume, but I guess would be annoying to implement.

So that will have to wait for a future PR, yeah?

@kpamnany
Copy link
Collaborator Author

So we're still going to get one json message per line in the safe printf in the julia task stacktraces? I was incorrectly believing this PR would also be combining the whole report into one json message, which would be nicer to consume, but I guess would be annoying to implement.

Yes, it's one JSON message per line. To do what you're thinking would require changing how jl_safe_printf is used in the stack trace collection code and I think that would be a lot more complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
port-to-master This change should apply to all future Julia builds port-to-v1.10 This change should apply to Julia v1.10 builds
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants