-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce API to safely initialize Packets #3533
base: master
Are you sure you want to change the base?
Conversation
5db3dcd
to
9cdcfc6
Compare
This avoids potential UB by calling .set_len() on the PacketBatch before the items have properly been initialized
9cdcfc6
to
c904a9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few initial comments.
@alessandrod all this work to avoid setting zeros. Additional motivation for Packet!
fn init_packet_meta(packet: &mut mem::MaybeUninit<Packet>, meta: Meta) { | ||
// SAFETY: Access the field by pointer as creating a reference to | ||
// and/or within the uninitialized Packet is undefined behavior | ||
unsafe { ptr::addr_of_mut!((*packet.as_mut_ptr()).meta).write(meta) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't there a new syntax for getting ptr of a field introduced in 1.82?
Iirc all this concern came from the upgrade to that version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't there a new syntax for getting ptr of a field introduced in 1.82?
Oh nice, didn't know about this; thanks for mentioning and will read up on it
irc all this concern came from the upgrade to that version
Yep, validator was panicking with 1.82. The panic was addressed with #3325, so we could hypothetically go back to 1.82 to take advantage of the new syntax (&raw
) in 1.82
@@ -224,6 +292,61 @@ impl PartialEq for Packet { | |||
} | |||
} | |||
|
|||
/// A custom implementation of io::Write to facilitate safe (non-UB) | |||
/// initialization of a MaybeUninit<Packet> | |||
struct PacketWriter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simple wrapper around MaybeUninit packet?
We know the capacity of the buffer, and can determine remaining bytes from the current length and the fixed capacity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know the capacity of the buffer, and can determine remaining bytes from the current length and the fixed capacity
The main motivation for writing this wrapper was to have something that implements std::io::Write
that we could pass to bincode::serialize_into()
.
If you drill down into bincode, writer.write()
might get called repeatedly for one invocation of bincode::serialize_into()
. Thus, we need to track how many bytes we have written after each call to write
; we don't have the ability to update packet.meta
(which may not have been initialized yet) as we go.
Not convinced right now that keeping set_len is the ideal safe API. Can we not do some sort of 'push_data'? This could be a safe fn that can set_len internally. but our function would be safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not convinced right now that keeping set_len is the ideal safe API.
Can we not do some sort of 'push_data'? This could be a safe fn that can set_len internally. but our function would be safe.
Would push_data()
do anything else besides call set_len()
? I'd be open to demoting set_len()
to private in favor of something else where we do the set_len()
+ add comments to hopefully avoid future abuse. Just want to make sure I follow your suggestion
fn init_packet_meta(packet: &mut mem::MaybeUninit<Packet>, meta: Meta) { | ||
// SAFETY: Access the field by pointer as creating a reference to | ||
// and/or within the uninitialized Packet is undefined behavior | ||
unsafe { ptr::addr_of_mut!((*packet.as_mut_ptr()).meta).write(meta) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't there a new syntax for getting ptr of a field introduced in 1.82?
Oh nice, didn't know about this; thanks for mentioning and will read up on it
irc all this concern came from the upgrade to that version
Yep, validator was panicking with 1.82. The panic was addressed with #3325, so we could hypothetically go back to 1.82 to take advantage of the new syntax (&raw
) in 1.82
@@ -224,6 +292,61 @@ impl PartialEq for Packet { | |||
} | |||
} | |||
|
|||
/// A custom implementation of io::Write to facilitate safe (non-UB) | |||
/// initialization of a MaybeUninit<Packet> | |||
struct PacketWriter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know the capacity of the buffer, and can determine remaining bytes from the current length and the fixed capacity
The main motivation for writing this wrapper was to have something that implements std::io::Write
that we could pass to bincode::serialize_into()
.
If you drill down into bincode, writer.write()
might get called repeatedly for one invocation of bincode::serialize_into()
. Thus, we need to track how many bytes we have written after each call to write
; we don't have the ability to update packet.meta
(which may not have been initialized yet) as we go.
I was imagining something like this: fn try_push_data(&mut self, bytes: &[u8], addr: Option<&SocketAddr>) -> bool {
if self.len() == self.capacity() {
return false;
}
let uninitialized_packet = self.x.spare_capacity_mut()[0];
Packet::init_packet_from_bytes(uninitialized_packet, bytes, addr);
self.set_len(self.len() + 1);
true
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clarify where exactly the undefined behavior happens with the existing code?
unsafe { | ||
packet_batch.set_len(PACKETS_PER_BATCH); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removing one unsafe
but then adding two new ones.
Why is the new code better than the old one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://doc.rust-lang.org/std/vec/struct.Vec.html#method.set_len
Safety
- new_len must be less than or equal to capacity().
- The elements at old_len..new_len must be initialized.
We are not initializing the data first so we are violating the second bullet
packet_batch.set_len(num_transactions); | ||
} | ||
|
||
let uninitialized_packets = packet_batch.spare_capacity_mut().iter_mut(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need assume_init
somewhere below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not, set_len()
does the work for us at the end:
assume_init()
yields aT
from aMaybeUninit<T>
- We are starting with a
Vec<Packet>
(the type under the hood forPacketBatch
) packet_batch.spare_capacity_mut()
allows us to access elements at indexi
wherevec_length <= i < vec_capacity
- But, our access to these is of type
MaybeUninit<Packet>
- But, our access to these is of type
- We initialize the elements in place, so calling
set_len()
is saying "these are valid elements of the Vec now and can be accessed normally; also drop them normally when dropping the Vec`
addr: Option<&SocketAddr>, | ||
) -> io::Result<()> { | ||
let mut writer = PacketWriter::new_from_uninit_packet(packet); | ||
let num_bytes_written = writer.write(bytes)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably use Write::write_all
.
Write::write
is not meant to write the entire buffer.
And there is no need to rely on the implementation details of PacketWriter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And there is no need to rely on the implementation details of PacketWriter.
Yeah, that's fair and also simplifies things on the caller side (ie no longer need the debug_assert
); will make this change
// SAFETY: We previously verifed that buf.len() <= self.spare_capacity | ||
// so this write will not push us past the end of the buffer. Likewise, | ||
// we can update self.spare_capacity without fear of overflow | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these new instances of unsafe
are not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should be very stingy with our use of unsafe
. However, our current code has the potential for UB which is even less ideal than unsafe
's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clarify where exactly the undefined behavior happens with the existing code?
Basically, we're doing something the docs tell us not to:
https://doc.rust-lang.org/beta/std/mem/union.MaybeUninit.html#initialization-invariant
For example, when we do:
*packet.meta_mut() = Meta::default();
On a packet that wasn't actually initialized, drop
will get called on the old Meta
that wasn't actually initialized.
The particular areas this PR changes don't currently appear to be having an observable effect, but this PR was made to be proactive instead of reactive in light of the issue that #3325 addressed
unsafe { | ||
packet_batch.set_len(PACKETS_PER_BATCH); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://doc.rust-lang.org/std/vec/struct.Vec.html#method.set_len
Safety
- new_len must be less than or equal to capacity().
- The elements at old_len..new_len must be initialized.
We are not initializing the data first so we are violating the second bullet
packet_batch.set_len(num_transactions); | ||
} | ||
|
||
let uninitialized_packets = packet_batch.spare_capacity_mut().iter_mut(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not, set_len()
does the work for us at the end:
assume_init()
yields aT
from aMaybeUninit<T>
- We are starting with a
Vec<Packet>
(the type under the hood forPacketBatch
) packet_batch.spare_capacity_mut()
allows us to access elements at indexi
wherevec_length <= i < vec_capacity
- But, our access to these is of type
MaybeUninit<Packet>
- But, our access to these is of type
- We initialize the elements in place, so calling
set_len()
is saying "these are valid elements of the Vec now and can be accessed normally; also drop them normally when dropping the Vec`
addr: Option<&SocketAddr>, | ||
) -> io::Result<()> { | ||
let mut writer = PacketWriter::new_from_uninit_packet(packet); | ||
let num_bytes_written = writer.write(bytes)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And there is no need to rely on the implementation details of PacketWriter.
Yeah, that's fair and also simplifies things on the caller side (ie no longer need the debug_assert
); will make this change
// SAFETY: We previously verifed that buf.len() <= self.spare_capacity | ||
// so this write will not push us past the end of the buffer. Likewise, | ||
// we can update self.spare_capacity without fear of overflow | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should be very stingy with our use of unsafe
. However, our current code has the potential for UB which is even less ideal than unsafe
's
Gotcha, we could do something like that and I don't feel too strongly either way. To confirm, is the idea to concentrate as much as possible (if not all) of the |
If I understand this correctly rust says doing To avoid |
The issue is Drop. Say that you have
This is never ok. Because |
I now understand the issue you're calling out Behzad & thanks for the elaboration Alessandro; the fill-rest-with-zero is something I didn't consider / account for. So I guess we have several options:
I don't immediately see any other options without a much more major change to avoid the |
But does drop implementation for type |
The latter is what we should do EDIT: OR we create our custom array type that doesn't drop the uninitialized part
drop of |
okay, but where does this emit any instructions to read the memory? |
The problem is that this is not C: the compiler is telling you what happens if you make me call drop on something that isn't initialized is undefined and I might crash or sell your SOL. See the SIGILL bug we just fixed. |
Please feel free to tell me I'm dumb. Can we not make |
I was thinking along the same line ( By having the entire thing be |
all correct! |
Okay, so there is no uninitialized memory read but this is still considered UB arbitrarily by the compiler. Is that what you are saying? and do you have any reference for this?
That was different though. There was a pointer there and a pointer initialized with garbage can be invalid. Trying this out on godbolt I only get a ud2 only if I uncomment the last line which uses a pointer: |
Correct me if I'm wrong @steviez. There is not really an issue now with the current code. It compiles, it may work, even if we're relying on undefined behavior (drop may not work correctly at some points). That can change in any future update of our rustc version, if the compiler becomes more strict about checking for UB - which is what happened to us in 1.82. The goal of the PR is to "future proof" this so it doesn't break in a future compiler version - and since we just fixed another issue related to this, it is fresh in our minds. It makes sense to fix now rather than wait until it breaks and we have to re-gather the context. We're breaking the assumptions laid out by
Also,
I'm pretty sure this isn't true. |
Problem
We currently abuse uninitialized data with the
Packet
type that leaves us open to UB in several places. A common pattern is:PacketBatch
(mostly aVec<Packet>
) with capacityN
set_len(N)
on thatPacketBatch
Packet
's as if they are properly initializedagave/entry/src/entry.rs
Lines 546 to 569 in f621667
As that comment suggests, we did this to avoid writing data / 0's that we will immediately overwrite. However, the manner in which we're doing it is not safe.
Summary of Changes
PacketWriter
to help fill the buffer of aMaybeUninit<Packet>
through pointers instead of referencesPacket
to initialize from some serializable data or a regular byte streamentry.rs
andShredFetchStage
Subsequent PR's
Some other work that I have in mind that I'd like to defer to subsequent PR's for the sake of keeping each individual PR as small as possible.
Remaining packet_batch.set_len()
There is still one more location where we do
set_len()
and access the item:agave/streamer/src/nonblocking/quic.rs
Lines 960 to 972 in f621667
This one is slightly different as the writes could come out of order, so I was thinking of refactoring that in a different PR
clippy::uninit_assumed_init()
Another instance where me may immediately access uninitialized data is here:
agave/sdk/packet/src/lib.rs
Lines 210 to 219 in f621667
I would like to address this separately as well, as I think the proper solution is to deprecate
Packet::default()
. However,Packet
lives in sdk so we have to be a little more careful with that one