Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOUNDNESS: Query should not be Copy #17693

Open
alice-i-cecile opened this issue Feb 5, 2025 · 12 comments · May be fixed by #17694
Open

SOUNDNESS: Query should not be Copy #17693

alice-i-cecile opened this issue Feb 5, 2025 · 12 comments · May be fixed by #17694
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior P-Unsound A bug that results in undefined compiler behavior S-Needs-Investigation This issue requires detective work to figure out what's going wrong
Milestone

Comments

@alice-i-cecile
Copy link
Member

I don't think Query being Copy is sound?
IIUC you can go from a mutable Query to a read-only QueryLens, make a Query out of that lens again, then copy that Query, transmute_lens it again, and get a Query from it for a last time. Then you'd have a read-only query that is no longer bound to the &mut self of the original mutable Query.
Did I overlook something?

Originally posted by @Victoronz in #15858 (review)

@alice-i-cecile alice-i-cecile added C-Bug An unexpected or incorrect behavior A-ECS Entities, components, systems, and events S-Needs-Investigation This issue requires detective work to figure out what's going wrong P-Unsound A bug that results in undefined compiler behavior labels Feb 5, 2025
@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Feb 5, 2025
@alice-i-cecile
Copy link
Member Author

Good eye; that's worth double checking. I think the best way to test this is to open a draft PR with a test that tries to exploit this, and see if miri complains.

@13ros27
Copy link
Contributor

13ros27 commented Feb 5, 2025

To check if I understand this idea correctly:

fn test(query: Query<&mut Test>) {
    let lens = query.transmute_lens::<&Test>();
    let query2 = lens.query();
    let query_copy = query2;
    let lens2 = query_copy.transmute_lens::<&Test>();
    let query3 = lens2.query(); // And the issue is that this potentially no longer has a reference on `query`?
}

@chescock
Copy link
Contributor

chescock commented Feb 5, 2025

I think that translates to

fn foo(mut query: Query<EntityMut>) {
    let mut first_lens = query.as_query_lens();
    let second_query = first_lens.query();
    let mut second_query_copy = second_query;
    let mut second_lens = second_query_copy.as_query_lens();
    let bad_item_one = query.iter_mut().next();
    let bad_item_two = second_lens.query().into_iter().next();
}

That does give me a "cannot borrow query as mutable more than once at a time" error.

Oh, but wait, rust-analyzer tells me everything there is a Query<EntityMut>, so I haven't managed to Copy anything yet.

@13ros27
Copy link
Contributor

13ros27 commented Feb 5, 2025

Oh, but wait, rust-analyzer tells me everything there is a Query<EntityMut>, so I haven't managed to Copy anything yet.

I think you need to use transmute_lens to make it read_only?

@chescock
Copy link
Contributor

chescock commented Feb 5, 2025

The chain of custody should be in the 'w parameter. transmute_lens() goes from &mut self to QueryLens<'_, D>, so the lens is only allowed to access the world while the original borrow is ongoing. Ah, QueryLens::query goes from &mut self to Query<'w, '_, Q, F>. I see: The intent there was that restricting the 's lifetime would be enough, and it could use the original 'w lifetime.

If that's unsound here, would it have already been unsound with get_inner() and iter_inner()? lens.query().get_inner() would give you something with the 'w lifetime from the lens, but only if it was read-only. And the lens was permanently read-only, since there was no Query::into_readonly() or anything. So I think that was sound then, but is unsound now that we can do lens.query().into_readonly().get_inner().

So, I still haven't worked out the exact way to reproduce it, but I agree that I introduced unsoundness!

Could we fix it by changing QueryLens::query to return Query<'_, '_, Q, F>? That would fit the model that we're borrowing the world access for the duration of the query. But it would make it impossible to get 'w data back out of the lens.

Oh, but maybe we could have QueryLens::query_inner that returns Query<'w, '_, Q, F> but that requires Q:ReadOnlyQueryData? If that's sound, it should cover all the cases that were possible before, since you could only get _inner data for read-only queries.

@chescock
Copy link
Contributor

chescock commented Feb 5, 2025

Okay, one reproduction is

fn bad<'w>(mut lens: QueryLens<'w, EntityMut>, entity: Entity) {
    let one: EntityMut<'w> = lens.query().get_inner(entity).unwrap();
    let two: EntityMut<'w> = lens.query().get_inner(entity).unwrap();
    assert!(one.entity() == two.entity());
}

And changing the return type of QueryLens::query() from Query<'w, '_, Q, F> to Query<'_, '_, Q, F> does make that fail to compile.

@Victoronz
Copy link
Contributor

I remember making this comment on another PR, I think it summarizes the 'w quite well:

Revisiting this, I think this is definitely a more intricate issue than it at first seems. This is my current understanding of the lifetimes here:

The 'w lifetime doesn't semantically mean what it sounds like:
As far as I understand, it is supposed to represent a "borrow from world", which conceptually means that there is a &'w World that 'w refers to. However, that isn't really the case. Semantically, a 'w lifetime is simply a named lifetime like any other, i.e. 'a. Its scope must match any other mention of 'w in the item it is defined in, and satisfy any specified bounds on top of that. Because there is not just one "world borrow lifetime", we are lead to confusing situations where we have both 'world and 'w, which are both supposed to be the same, yet are semantically not.

Furthermore, &'w mut Query<'w, ...> doesn't mean we have a borrow of Query that lives as long as world, but that 'w is now constrained to match the scope of that query borrow.
If we obtain an EntityMut<'> from that query, what is originally written as 'w in the struct definition instead becomes scoped on our query borrow (we could call this 'q). If we now fetch an item from EntityMut, we rely on a borrow of that struct, not directly of Query, or World.
In these constrained cases, switching '
to 'w is actually unsound! See: #17035
I think there are handful of these unsound conversions in this PR.
(Similar to this are transmutes, where the 's in a newly constructed Query borrows from a QueryLens).

I do like '_ when it can remove noise, and say "it is inferred, don't worry about this", and I like 'w when it can clarify noise, and say "it is a world borrow, don't worry about this".

When either applies is heavily case dependent, and I don't think I can broadly say.

@chescock
Copy link
Contributor

chescock commented Feb 5, 2025

I believe the semantics for Query should be that Query<'w, 's, D, F> comes with permission to access the components matched by self.state.component_access for the lifetime 'w. That's what the executor gives the parameter when we schedule a system, and that's enough to justify the implementations of its methods both before and after #15858.

I would expect the same thing for QueryLens<'w, D, F>.

Do those rules seem reasonable? If not, what should the rules be?

Under those rules, QueryLens::query() is unsound, since it hands out permission for the full 'w lifetime, but then claims it has it again when the '_ lifetime ends. The fix is to hand out permission only for '_.

But I think a QueryLens::query_inner() that hands out a full 'w lifetime but requires D: ReadOnlyQueryData should be sound. It does have the access to give away, and it's okay that it has it again later because read-only access doesn't conflict with itself. Since the only way to use the full 'w lifetime prior to #15858 required ReadOnlyQueryData, that should cover any cases broken by the change to QueryLens::query().

I'll try to push up a PR with those two changes in case that's the way we want to go.

@Victoronz
Copy link
Contributor

Victoronz commented Feb 5, 2025

I believe the semantics for Query should be that Query<'w, 's, D, F> comes with permission to access the components matched by self.state.component_access for the lifetime 'w. That's what the executor gives the parameter when we schedule a system, and that's enough to justify the implementations of its methods both before and after #15858.

I would expect the same thing for QueryLens<'w, D, F>.

Do those rules seem reasonable? If not, what should the rules be?

Under those rules, QueryLens::query() is unsound, since it hands out permission for the full 'w lifetime, but then claims it has it again when the '_ lifetime ends. The fix is to hand out permission only for '_.

This captures why this all is so confusing!
As I understand:
&'a Query<'w, 's, D, F> introduces the requirement that both 'w and 's live for at least as long as 'a.
Any method of query that returns something containing a '_ lifetime constrains that return type to 'a.
The QueryLens return types do properly contain '_, meaning that they propagate the 'a constraint.
If we now return a Query from that QueryLens, we now introduce 'b from &'b QueryLens, and this time 's inherits that constraint.
As long as the constraints are propagated via the self lifetimes, we remain sound!
However, as soon as we try to skirt it, we need to be careful.

Query<'w, 's, D, F> does not exactly come with the permission to access the components it matches for the lifetime of 'w. It is both 'w and 's.
Additionally, 'w does not, and must not, semantically, only mean the original &'a World, instead, it can be instantiated with multiple constraints, only one of which should have been derived from &'a World at some point.

Note that in some items, 'a is defined as a separate lifetime, and in some places, it is defined as &'w Query<'w, 's, D, F> which is more restrictive than need be. It means that if 's is shorter than 'w, like in a QueryLens-derived Query, those impls are no longer applicable. Hierarchy code contained such overly restrictive impls for example.

In general, we have the inconsistent naming scheme of 'w, 'world, 'a, ... today because we intuitively want to write 'w to mean "the world borrow lifetime" when in actuality it should be "a lifetime originally derived from a world borrow".

@chescock
Copy link
Contributor

chescock commented Feb 5, 2025

I think we might be talking past each other somehow? I understand the language-level rules, and that lifetime parameters can be substituted with different arguments.

Here we're building safe abstractions with unsafe implementations, so we need to define our own safety requirements. That is, somewhere in <&T as QueryData>::fetch() we get a raw pointer from storage and cast it to an &T. We need a proof that this doesn't cause aliasing violations. To construct that proof, we want to define safety requirements for individual pieces so that we can check each step locally.

Do you agree that that's a good way to construct proofs for unsafe operations?

I'm proposing that the safety requirements for struct Query<'world, 'state, D: QueryData, F: QueryFilter = ()> be that it have access as defined by state.component_access that is valid for the lifetime 'world, and similarly for struct QueryLens<'w, Q: QueryData, F: QueryFilter = ()> and 'w.

I am not including the 'state lifetime in any way, because it is only used for safe borrows of QueryState and so can be checked by the compiler.

Do you think that is a reasonable safety requirement for those types?

The signature of QueryLens::query() is fn(&'a mut QueryLens<'w, Q, F>) -> Query<'w, 'a, Q, F>. Under my proposal, the QueryLens there has some access for lifetime 'w, and we are borrowing that access exclusively for some shorter lifetime 'a. But then we hand out a Query that has that access for the full lifetime 'w. That's unsound, because the original QueryLens still holds that access after the 'a lifetime ends. It's essentially doing fn(&'a mut &'w mut T) -> &'w mut T.

Do you agree that QueryLens::query() is unsound under my proposal?

@Victoronz
Copy link
Contributor

Here we're building safe abstractions with unsafe implementations, so we need to define our own safety requirements. That is, somewhere in <&T as QueryData>::fetch() we get a raw pointer from storage and cast it to an &T. We need a proof that this doesn't cause aliasing violations. To construct that proof, we want to define safety requirements for individual pieces so that we can check each step locally.

Do you agree that that's a good way to construct proofs for unsafe operations?

I think we agree? In the sense that we need to determine each use of unsafe to be sound for safe abstraction to be sound in turn.

I'm proposing that the safety requirements for struct Query<'world, 'state, D: QueryData, F: QueryFilter = ()> be that it have access as defined by state.component_access that is valid for the lifetime 'world, and similarly for struct QueryLens<'w, Q: QueryData, F: QueryFilter = ()> and 'w.

I am not including the 'state lifetime in any way, because it is only used for safe borrows of QueryState and so can be checked by the compiler.

Do you think that is a reasonable safety requirement for those types?

Hmm, I don't have full overview over the access infrastructure in the engine, so I am a bit cautious about this, I would definitely want other eyes to confirm that 'state isn't relied upon elsewhere, or will be in the future.

The signature of QueryLens::query() is fn(&'a mut QueryLens<'w, Q, F>) -> Query<'w, 'a, Q, F>. Under my proposal, the QueryLens there has some access for lifetime 'w, and we are borrowing that access exclusively for some shorter lifetime 'a. But then we hand out a Query that has that access for the full lifetime 'w. That's unsound, because the original QueryLens still holds that access after the 'a lifetime ends. It's essentially doing fn(&'a mut &'w mut T) -> &'w mut T.

Do you agree that QueryLens::query() is unsound under my proposal?

I agree that it is unsound. If 'w is the only constraint in the end, then that is what we have to limit when we hand out borrows.

@chescock
Copy link
Contributor

chescock commented Feb 7, 2025

I think we agree?

Oh, good! It must just have been me failing to understand something, then. Sorry!

Hmm, I don't have full overview over the access infrastructure in the engine, so I am a bit cautious about this, I would definitely want other eyes to confirm that 'state isn't relied upon elsewhere, or will be in the future.

Yup, agreed that more eyes are good! I already missed that QueryLens didn't work the way I thought, so I've almost certainly missed something else.

It might also help to document the safety rules a little better. I'm not sure where to put them, though. The obvious place is on the Query type itself, but that's one of the first types a user will encounter, so I don't think we want a lot of scary implementation details about internal engine safety requirements on the front page of its documentation. Maybe Query::new(), which is pub(crate)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior P-Unsound A bug that results in undefined compiler behavior S-Needs-Investigation This issue requires detective work to figure out what's going wrong
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants