-
-
Notifications
You must be signed in to change notification settings - Fork 224
Virtual methods that return pointers should only be implementable through an unsafe trait #649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I can see a couple of solutions to this:
We should check how many methods this applies to. If there are very few then the third option would be fairly viable, because sometimes we can create a safe custom virtual method (see for instance my comment on The first option is definitely simplest, but could cause an issue where someone who doesn't care about overriding the methods in question would suddenly need to add The second option limits how much people need to use |
I wrote a comment about this here: gdext/godot-codegen/src/generator/functions_common.rs Lines 99 to 103 in fe5b02b
While "technically correct", this approach is super impractical:
That might be an option, but we'd need to see how much this bloats the API. Since virtual methods are duplicated across all derived classes, this can cause a proliferation of extra traits just in support of one method.
I like this 🙂 For the user, it would likely be more ergonomic if no unsafety is involved at all. But I could imagine there are cases where this can be desired for more low-level control; would need to look more closely into the affected APIs. I also have another possible idea:
// Instead of:
fn instance_create(&self, for_object: Gd<Object>) -> *mut c_void {
do_thing(self, for_object)
}
// We would have:
fn instance_create(&self, for_object: Gd<Object>) -> Checked<*mut c_void> {
let ptr: *mut c_void = do_thing(self, for_object);
unsafe { Checked::from_raw(ptr) }
} Not the nicest, but at least it would re-introduce locality and could be applied in a generic way. Of course this would not necessarily exclude your suggestion 3). We could also combine the two (either by manually selecting some methods for which we have a safe counterpart, and remove them; or keep both and check that only one is overridden). But this depends a bit on how often such methods occur. |
i hadn't seen that before. i do disagree with this being "technically more correct", it is in fact currently incorrect and making the trait unsafe is one of several correct options. soundness of an api isn't a matter of degree really. except ig in how many different ways you can break it, but that's just varying degrees of incorrect not varying degrees of correctness. a particular api either is sound, all other cases are unsound. regarding i dont think that fourth option really works to be honest. because like what is the actual safety invariant of it's also not the case that constructing the i dont think it's actually possible to create a coherent safety invariant for that type which doesn't just become exhaustively enumerating every possible method it can be returned from. I'd much rather do option 2 or even option 1 over that one. |
Not sure if you're getting hung up on the "more" here, but this incorrectness is the exact reason why I'm looking into the different alternatives now 😉 I just didn't feel like solving all problems at once, which is why I wrote that comment. "Technically correct" still is "correct", but the context is that it's impractical to blindly follow rules without thinking about the further API implications. Instead, we should look for a solution which is both correct and ergonomic (which 1) isn't). I'm not advocating to sacrifice correctness for ergonomics.
That's not how Example:
Context is important -- people would only encounter
I would not even start trying this. It's fine to be vague, people need to understand the context when using it. I really don't see the problem.
I don't see option 1) happening, for the stated reasons. Option 2) can be discussed but this being a niche use case, I'm not sure if someone truly benefits from extra traits. On the contrary, this may become harder to discover as people wonder where certain methods are, etc. And code organization wise, it's still non-local. If I have multiple unsafe methods, then I need to describe their contracts in the trait description, for each: // SAFETY:
// - why method1 is correct
// - why method2 is correct
unsafe impl UnsafeIThing for MyClass {
fn method1() -> void* {
do_thing()
}
fn method2() -> void* {
do_thing()
}
} instead of near the place where the actual unsafety can occur: impl IThing for MyClass {
fn method1() -> Checked<void*> {
// SAFETY: why the return value is correct
unsafe { Checked::from_raw(do_thing()) }
}
fn method2() -> Checked<void*> {
// SAFETY: why the return value is correct
unsafe { Checked::from_raw(do_thing()) }
}
} Conceptually, each method has its own invariant, there is no inherent unsafety in the trait itself. To demonstrate this, one can implement the trait and override nothing, in which case there is no invariant to uphold. Per-method |
My point was that we can choose where to put safety obligations, and the obligation should be returning this type. But we can't do that without making the trait unsafe.
Safe methods cannot have safety invariants on their implementation unless they are in a unsafe trait. What you're suggesting is removing the safety invariants imposed on implementing the methods, and instead requiring the user to return a type that must fulfill some safety invariant separate to the method. Conceptually i am fine with this, for instance making What i am against here is making a catch-all such return type. Because the safety invariant of a function should ideally be as minimal as possible. To make it feasible to understand and check the uses of unsafe that such a function imposes on the user. But you're suggesting making an extremely broad safety invariant which encompasses any method that happens to return a pointer. At the very least if we're gonna do this, we can codegen separate like |
From a quick look, it seems only 5 virtual methods have pointer returns? trait IScriptExtension {
fn instance_create(&self, for_object: Gd<Object>) -> *mut c_void;
fn placeholder_instance_create(&self, for_object: Gd<Object>) -> *mut c_void;
}
trait ITextServerExtension {
fn shaped_text_get_glyphs(&self, shaped: Rid) -> *const Glyph;
fn shaped_text_sort_logical(&self, shaped: Rid) -> *const Glyph;
fn shaped_text_get_ellipsis_glyphs(&self, shaped: Rid) -> *const Glyph;
} See Godot docs for The first 2 methods create new independent objects, which makes a safe abstraction a bit easier. The second 3 however seem to return a pointer to internal glyphs (there are no docs, so I have to guess) -- which means they are coupled to a lifetime which we don't know. A safe abstraction like this might be possible, but it would need more thorough investigation? fn shaped_text_get_glyphs(&'a self, shaped: Rid) -> &'a [Glyph];
// (lifetime for explicitness, can be elided) |
(Btw, unrelated to the discussion here: I just noticed that functions accepting |
ok so that does mean that option 3 is likely entirely viable which makes the discussion partially moot. we could still have option 2 as a fallback in cases where we havent made a custom impl. |
Yeah, if we can get by with safe methods without losing any functionality, that would be the best case scenario 🙂 Maybe we could add a mechanism that stops nightly CI when there are new methods that don't have a safe counterpart, otherwise we'll forget those. Regarding the above case, I dug a bit in the Godot source code, and the returned glyph pointers are indirectly used as follows: TypedArray<Dictionary> TextServer::_shaped_text_get_glyphs_wrapper(const RID &p_shaped) const {
TypedArray<Dictionary> ret;
const Glyph *glyphs = shaped_text_get_glyphs(p_shaped);
int gl_size = shaped_text_get_glyph_count(p_shaped);
for (int i = 0; i < gl_size; i++) {
Dictionary glyph;
glyph["start"] = glyphs[i].start;
glyph["end"] = glyphs[i].end;
glyph["repeat"] = glyphs[i].repeat;
glyph["count"] = glyphs[i].count;
glyph["flags"] = glyphs[i].flags;
glyph["offset"] = Vector2(glyphs[i].x_off, glyphs[i].y_off);
glyph["advance"] = glyphs[i].advance;
glyph["font_rid"] = glyphs[i].font_rid;
glyph["font_size"] = glyphs[i].font_size;
glyph["index"] = glyphs[i].index;
ret.push_back(glyph);
}
return ret;
} So it really looks like the pointers are just used for temporary reading, and all the contents are copied. Returning a slice with lifetime of We can add an assertion that |
unfortunately we cannot rely on a user-generated count method always returning the right length of such a slice, since there's nothing stopping a user from just returning different lengths every single time. and if we were to check the length every time when calling the method then that's basically the same as us writing the count method but with extra steps. so that leaves either us making the count method or an unsafe trait. |
This should probably be fine: // overridden by user
fn shaped_text_get_glyphs(&self, shaped: Rid) -> &[Glyph];
// hidden outside trait, generated when user overrides the above
fn shaped_text_get_glyph_count(&self, shaped: Rid) -> i64 {
self.shaped_text_get_glyphs(shaped).len() as i64
} Since the former returns a reference, it's unlikely to incur expensive computations, so calling it twice shouldn't be an issue. |
the user is still free to return differently sized slices from |
A supervillain could theoretically implement Currently we have the following call sequences: // _shaped_text_get_glyphs_wrapper
const Glyph *glyphs = shaped_text_get_glyphs(p_shaped);
int gl_size = shaped_text_get_glyph_count(p_shaped);
// _shaped_text_sort_logical_wrapper
const Glyph *glyphs = shaped_text_sort_logical(p_shaped);
int gl_size = shaped_text_get_glyph_count(p_shaped);
// _shaped_text_get_ellipsis_glyphs_wrapper
const Glyph *glyphs = shaped_text_get_ellipsis_glyphs(p_shaped);
int gl_size = shaped_text_get_ellipsis_glyph_count(p_shaped); And of course people can query the count directly. I don't like where this is heading 🙈 a ton of work for an API that might not ever be used at all |
if we're gonna fix this to be correct we should actually make it correct. if we can't find some good relatively simple way to do this correctly then we should just go for something like option 2. (even option 1 may be fine since it's gonna be a single rarely-used virtual method trait that'll be affected) half-assing the correctness here is gonna give people false confidence. since it'll look like we've taken extra care to make this particular api completely sound, when in reality we've not. it'd arguably be better to have something that is obviously unsound at that point. |
This problem is almost certainly not limited to methods returning pointers. There are tons of APIs that return some sort of count, or multiple arrays that are expected to have the same length. E.g. Given the vast implementation, how can we be sure that logic errors never result in UB?
What's your suggestion then? We would need to conservatively make every single interface trait And making everything |
Maybe a useful delineation is "known to cause UB if incorrect". The methods returning pointers can definitely cause UB, so making it unsafe to implement them is standing to reason. If we identify other virtual methods that have this property in the future, we can extend the So what remains is:
|
yeah basically.
if we want to provide a safe abstraction over it then there are a lot of options, but all of them have their own drawbacks. the main issue is that we need to find the length given a pointer to the array, and there isn't a very good way to do that in rust. (well if you have a pointer to a slice it's easy, but we have a pointer to the data in the slice not a pointer to the slice itself) some examples (these are things i've considered for property lists which have many of the same issues):
there isn't really a perfect solution for that case. so i do think the best would be to put it into an unsafe trait, since this is an issue that is easily solvable by just implementing the functions the intended way. making the methods return slices could help guide people towards doing the right thing better so i'd be in favor of that. but it also isn't super important. having the ability to codegen a subset of some virtual method trait into a separate unsafe trait could be useful in the future too if this comes up again. |
Currently when generating virtual methods that take pointers as arguments or return pointers we label the method itself
unsafe
. However generally speaking, a method that returns a pointer shouldn't cause any UB if it's called wrong. For instanceBox::into_raw
is a safe method. So it's wrong (though generally harmless) for methods that only return pointers, but doesn't take them as arguments, to be labelled asunsafe
.However when someone implements this trait, they must ensure that the pointer they return from the method satisfies certain criteria. Otherwise they may cause UB. This means that the trait the method is in must be an unsafe trait.
The text was updated successfully, but these errors were encountered: