-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Panic handling: thread safety; set hook once and not repeatedly #1037
base: master
Are you sure you want to change the base?
Conversation
API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1037 |
API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1037 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! 🙂
This PR removes the parameter, and prints anyways in the panic hook. I don't understand the intention behind suppressing the print call for the two aforementioned functions, and I don't want to modify their behavior until I understand the reasoning. I marked this PR as a draft because of this.
This needs to stay, it's used to make the integration tests less spammy, see here. Eventually we might do this via cfg, but for now I don't want to make the crate setup more complex.
Thanks for double-checking, it's appreciated! 👍
There is one unsafe block added. It's used to optimize debug information (so
error_context
remains evaluated, unless it is needed). Specifically, it casts&dyn Fn() -> String
to&'static dyn Fn()
, so that the value can be (temporarily) pushed onto athread_local
vec, operating as a stack. This'static
reference will be removed by a drop guard before it is invalidated, which prevents any UB access.
This sounds a bit shady. This part adds a great deal of complexity with thread_local!
and unsafe
, so could you elaborate what we are optimizing exactly? What if we didn't do this optimization?
Another thing you removed is the #[cfg(debug_assertions)]
. Quite some discussion went into the status quo, see #889 and #923. I'm inclined to keep this distinction -- panics in Release should be as fast as possible, while information density is secondary.
Oh I see. The linked source temporarily replaces the panic hook to suppress the print, which should suppress printing now (previously printing was handled outside of the panic hook, but now that it's back in, this should be enough on its own).
It is a little shady, but it's ultimately used to avoid evaluating the
I'll double check the performance impact of removing
Which I don't think would affect performance notably, but I'll double check the performance with this commit/without it to see if there's any difference on release builds to see if there's something else I'm missing. |
I ran some quick tests to check performance between the latest gdext master and this commit. I used the latest godot master, and built both debug and release template builds. Running on windows 10.
I ran a test similar to the previously mentioned test, displaying the number of ms required to run 1mil noop calls. So there's definitely something going on here. I expected a speedup in debug builds, but nowhere near 2x; the performance in debug builds now beats gdscript. Conversely, I wasn't really expecting much change at all in release, but this commit runs about 33% slower. I'll do some digging to see if I can figure out what's happening here. Edit: Unsurprisingly, the slowdown has to due with storing the error context. Removing the error context (so that In summary: the easiest solution would be to mark the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the detailed answer and for the measurements! 👍
It should be completely valid;
error_context
(with a local lifetime) is converted to'static
and pushed onto a global, but removed from that global immediately before the lifetime is invalidated. In other words, the'static
lifetime is needed to place the function in a global, but the lifetime is respected.
Ah I see now, so it's similar to the scoped-tls
crate. This makes sense; it also seems like extending the lifetime to 'static
is sound, as long as the reference isn't accessed past its original lifetime (I checked with Miri).
That said, I wonder if it wouldn't be clearer if instead of transmuting the lifetime to &'static
, you would insert a raw pointer into the stack, and then simply have a // SAFETY:
statement on unsafe access. This will make it less tempting to accidentally reuse the reference (e.g. store it) in the hook. What do you think?
Unfortunately, using a
Cell<...>
instead of aRefCell<Vec<...>>
means that nested calls won't have theirerror_context
values respected, so I don't think it can be used.
Could you elaborate "nested calls" with a small (pseudo-)code scenario? Isn't panicking while a panic hook is processing already a problem?
In summary: the easiest solution would be to mark the
error_context
behind thedebug_assertions
flag. This would result in identical performance to the current gdext master. Release panics would now include location information of the panic, but theerror_context
would only be available on debug builds.
I do think this might be the better trade-off. There are many places where we give away performance of FFI calls due to subtle little interactions in the library, and this also reflects badly in library benchmarks. Eventually I'd like to make this configurable in a very coarse way (setting global flags during library init), but then it would be easier to already have some case differentiation.
I can confirm that this is true. lifetimes are effectively just a safety invariant on references, meaning that breaking them wont actually cause any undefined behavior. so it is sound to create references with too large lifetimes, provided nothing ever accesses them after their true lifetime ends.
I would agree that it'd be nicer to use a type that clearly indicates the unsafety here. but i dont think a raw pointer will work? because they're not |
Yeah, using pointers makes things more clear. I added a type
Sorry, I meant nested calls to
Fair enough.
One of the nifty benefits of |
So I think I understand why the linux tests are failing. Broadly speaking, it's because
Some stacktrace indicates that modifying
to
(plus the get/set calls) fixes everything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant nested calls to
handle_panic
.
Yes, but when do they happen? Did you encounter this?
Should we not try to avoid it?
Some stacktrace indicates that modifying
ERROR_CONTEXT_STACK
(thethread_local
from this PR) leads tois_hot_reload_enabled
being called, which makesenable_hot_reload
fail.
Kind of a weird bug, but simple enough fix
This happens here, in the thread_atexit
hook.
While AtomicBool
may make the issue go away, I'd rather be sure that the hot-reload-enabled flag is initialized early enough so that it's not an issue. This should be done very early, on startup. Is the whole panic handling not only set up later, or why the interference here?
Either way, would be good to have a separate PR to discuss this 🙂
godot-core/src/private.rs
Outdated
/// # Safety | ||
/// Function must removed (using pop_function) before lifetime is invalidated. | ||
unsafe fn push_function(&mut self, function: &dyn Fn() -> String) { | ||
/// # Safety | ||
/// The caller must ensure that the function isn't used past its original lifetime. | ||
/// (Invariant must be held by push_function) | ||
unsafe fn assume_static_lifetime( | ||
value: &dyn Fn() -> String, | ||
) -> &'static dyn Fn() -> String { | ||
std::mem::transmute(value) | ||
} | ||
self.0.push(assume_static_lifetime(function) as *const _); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use raw pointers, lifetimes are thrown away, so the extension with assume_static_lifetime
is no longer needed. Just insert the raw pointer directly.
Furthermore, which function is unsafe depends a bit on how we lay out the responsibilities. I'd spontaneously have said that get_last()
should be unsafe since it's the one introducing UB when dereferencing a dangling pointer, but on the other hand there's no real invariant the caller can uphold inside the panic hook; they rely on everything being done correctly in handle_panic
.
As such, this probably makes sense?
@lilizoey opinions on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use raw pointers, lifetimes are thrown away, so the extension with
assume_static_lifetime
is no longer needed. Just insert the raw pointer directly.
I'm not a hundred percent certain why, but inserting the raw pointer without the call to assume_static_lifetime
results in a compiler error:
error: lifetime may not live long enough
--> godot-core\src\private.rs:296:29
|
287 | unsafe fn push_function(&mut self, function: &dyn Fn() -> String) {
| - let's call the lifetime of this reference `'1`
...
296 | self.functions.push(function as *const dyn Fn() -> String);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cast requires that `'1` must outlive `'static`
error: could not compile `godot-core` (lib) due to 1 previous error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it has to do with the lifetime of the function trait itself, but I'm not certain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, which function is unsafe depends a bit on how we lay out the responsibilities. I'd spontaneously have said that
get_last()
should be unsafe since it's the one introducing UB when dereferencing a dangling pointer, but on the other hand there's no real invariant the caller can uphold inside the panic hook; they rely on everything being done correctly inhandle_panic
.As such, this probably makes sense?
@lilizoey opinions on this?
Honestly it's a bit of a tossup, at least one of push_function
and get_last
have to be unsafe for sure. I think this works as is though. The api isn't the easiest to use safely but we also dont use this anywhere else, so it's probably fine.
I think it has to do with the lifetime of the function trait itself, but I'm not certain.
Yeah, you can do
function as (*const dyn Fn() -> String) as (*const (dyn Fn() -> String + 'static))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still pending 🙂
First, let's not use as
cast to go from reference to pointer, there's a dedicated function for it. Unfortunately, we can't use ptr::cast()
for the second conversion, as that implicitly needs Sized
for some reason.
Using intermediate variables for clarity, this then becomes:
let function = std::ptr::from_ref(function);
let function = function as *const (dyn Fn() -> String + 'static);
Sidenote: as _
casts are a rusty crowbar which is very dangerous, let's write types explicitly in this context, not with _
. It's ironic that mem::transmute
got a lint to specify types explicitly, but pointer bending at will is totally not seen as a problem, not even in clippy. (I'm not saying it should be unsafe
, of course; a lint would be nice though).
I was assuming (maybe incorrectly?) that Altogether, this PR should be pretty much done, pending some kind of fix for the interference with TLS causing the linux tests to fail (#1040). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed you changed code style in several places and deleted some comments documenting behavior, but the previous code had often a reason to be. So I added some comments asking to revert such cases.
Sorry for being nitpicky, but panic handling is a particularly central and important part of the codebase 😉
Once we find a solution for the hot-reload flag, I'll also try to run it locally to test.
godot-core/src/private.rs
Outdated
/// # Safety | ||
/// Function must removed (using pop_function) before lifetime is invalidated. | ||
unsafe fn push_function(&mut self, function: &dyn Fn() -> String) { | ||
/// # Safety | ||
/// The caller must ensure that the function isn't used past its original lifetime. | ||
/// (Invariant must be held by push_function) | ||
unsafe fn assume_static_lifetime( | ||
value: &dyn Fn() -> String, | ||
) -> &'static dyn Fn() -> String { | ||
std::mem::transmute(value) | ||
} | ||
self.0.push(assume_static_lifetime(function) as *const _); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still pending 🙂
First, let's not use as
cast to go from reference to pointer, there's a dedicated function for it. Unfortunately, we can't use ptr::cast()
for the second conversion, as that implicitly needs Sized
for some reason.
Using intermediate variables for clarity, this then becomes:
let function = std::ptr::from_ref(function);
let function = function as *const (dyn Fn() -> String + 'static);
Sidenote: as _
casts are a rusty crowbar which is very dangerous, let's write types explicitly in this context, not with _
. It's ironic that mem::transmute
got a lint to specify types explicitly, but pointer bending at will is totally not seen as a problem, not even in clippy. (I'm not saying it should be unsafe
, of course; a lint would be nice though).
Co-authored-by: Jan Haller <[email protected]>
Co-authored-by: Jan Haller <[email protected]>
For:
Clippy complains that the second cast is unnecessary:
... and yet, without it, it doesn't acknowledge the function is being |
Work-in-progress change to how gdext handles panics. This PR is mostly complete, but there's one or two things I don't understand and wanted to get feedback on. Mostly, it comes down to
handle_varcall_panic
andhandle_ptrcall_panic
; the existing implementation ofhandle_panic_with_print
provides aprint
parameter, which allowshandle_varcall_panic
andhandle_ptrcall_panic
to suppress printing on error. This PR removes the parameter, and prints anyways in the panic hook. I don't understand the intention behind suppressing the print call for the two aforementioned functions, and I don't want to modify their behavior until I understand the reasoning. I marked this PR as a draft because of this.Otherwise, this PR should be mostly straightforward. Instead of calling
set_hook
for all calls tohandle_panic_with_print
, the set_hook function is set only on initialization. This should both make panic handling thread safe, and also allow users to set their own panic hook without conflicting with gdext.There is one unsafe block added. It's used to optimize debug information (so
error_context
remains unevaluated, unless it is needed). Specifically, it casts&dyn Fn() -> String
to&'static dyn Fn()
, so that the value can be (temporarily) pushed onto athread_local
vec, operating as a stack. This'static
reference will be removed by a drop guard before it is invalidated, which prevents any UB access.