-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use gRPC streams for linera storage service. #3346
base: main
Are you sure you want to change the base?
Use gRPC streams for linera storage service. #3346
Conversation
…nto linera-storage-service-grpc-streams
@@ -53,7 +32,7 @@ message RequestContainsKeys { | |||
} | |||
|
|||
message ReplyContainsKeys { | |||
repeated bool tests = 1; | |||
repeated bool tests = 1 [packed = true]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't packed
the default anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its in proto3, it was not in proto2.
linera-storage-service/src/client.rs
Outdated
if value.is_none() { | ||
return Ok(None); | ||
} | ||
|
||
reformed_value.append(&mut value.unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid the unwrap
(and why the &mut
?):
if value.is_none() { | |
return Ok(None); | |
} | |
reformed_value.append(&mut value.unwrap()); | |
let Some(value) = value else { | |
return Ok(None); | |
}; | |
reformed_value.append(value); |
Also, should we log a warning if one of the values is None
? That isn't expected behavior, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Append takes an exclusive reference?
No, its fine. None is actually part of the message optional field. If key is not found None is send and we return from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd just like to avoid unwrap
wherever it's easily possible. if let else
works just as well here, without the unwrap
.
Right, the append
could probably be replaced with extend
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd just like to avoid
unwrap
wherever it's easily possible.if let else
works just as well here, without theunwrap
.
Actually, was talking about the need to log the None. It's fine and expected.
Right, the
append
could probably be replaced withextend
.
That would be quite bad, extend
will iterate move things on the stack for each element, append
moves things in bulk in-place on the heap. It will be like O(1) vs O(n), where n is at least 6 orders of magnitude greater. Sometimes the compiler inlines and optimizes it away so both are same, but let's not rely on that.
let mut chunk_size = 0; | ||
for full_key in full_keys { | ||
if MAX_PAYLOAD_SIZE >= full_key.len() + chunk_size { | ||
chunk_size += full_key.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically the chunk size probably also contains the encoded length of the full key, and then the encoded length of grouped_keys
?
Not sure if MAX_PAYLOAD_SIZE
leaves enough room for that or if we should add them here? (@MathieuDutSik)
(Also below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, gRPC proto encodes a little more than even the length.
There were two extreme cases
- keys are closer to MAX_KEY_SIZE, then there is a lot more than enough room in MAX_PAYLOAD_SIZE.
- keys are small like 10 bytes, then there will be like hundreds of thousands of keys to test before the limit, even in synthetic tests we are not doing that.
Both are unrealistic cases, and there will performance and efficiency tradeoffs. Extra encodings are also variable for each element as non primitive types are not packed together. Its like tag + length + data where tag can be anything from 1 to 5 bytes depending on the field number and same for length.
Just a simple buffer of like 10-20 bytes will dominate the size of very small keys.
Let me know.
linera-storage-service/src/client.rs
Outdated
} | ||
}; | ||
|
||
let _ = std::thread::spawn(spawner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really worth doing that in a separate thread? There's no heavy computation going on in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to get to the receiver at the end of functions body to take the advantages of streaming for switching the compute bound and IO bound operations. Preprocessing should be avoided as possible.
I just did that as there was no future to be awaited in the closure.
It just hit me that the receiver will likely be awaited by the tonic, we can just do this like a normal future.
|
||
if 0 == num_messages { | ||
values.push(Some(value)); | ||
value = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather declare value
and num_messages
at the beginning of the while
loop's body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't they declared at beginning only?
Its just reassignment of the binding after moving the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean after the while
, inside the outer loop's body: That would remove the need to reset it. I'd also find it a bit more readable, because it conceptually tracks the progress of the inner loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it needs to be persisted across the while loop. You may refer to the server module.
linera-storage-service/src/client.rs
Outdated
self.read_entries(message_index, num_chunks).await | ||
|
||
let is_zero = |buf: &[u8]| { | ||
let (prefix, aligned, suffix) = unsafe { buf.align_to::<u128>() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid unsafe
.
Why not keep using repeated KeyValue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I was going to remove the unsafe, it was for me just during the testing as its just a very fast way to check if slice is all 0s. Code should do fine removing the condition using it.
repeated KeyValue
will give unnecessary complexity to the code and will degrade the performance and efficiency.
Best way to approach it serialize the whole thing mimicking bcs, and de-serialize at the other end on-the-fly.
get_random_key_values2(30, 4, 10), | ||
get_random_key_values2(30, 4, 100), | ||
//get_random_key_values2(10, 200, 3_999_700), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that value have a particular significance?
Anyway, let's either remove the line or the //
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large key lengths or number of keys are exponentially slow in the tests. Likely due to the way run_reads
is designed.
Just left this for the thoughts.
@@ -19,7 +19,7 @@ test = ["linera-views/test"] | |||
|
|||
[[bin]] | |||
name = "linera-storage-server" | |||
path = "src/server.rs" | |||
path = "src/main.rs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was quite handy for me during the testing. We get a server
library function we can spawn any time to perform tests using #[tokio::test]
without building everything.
…nto linera-storage-service-grpc-streams
addresses #1793