-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quietly prepare unprepared queries in batches #812
Conversation
Please change the PR title to mention that it's about unprepared queries in batches. Currently, it's too similar to the previous PR's title IMO (#806). |
scylla/src/transport/connection.rs
Outdated
let mut batch: Batch = Default::default(); | ||
batch.config = init_batch.config.clone(); | ||
|
||
let mut to_prepare = HashSet::<usize>::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use a HashSet
to store the index? Every index you are going to insert will be unique, so it using just Vec
would be simpler.
Besides that, using a HashSet
would make more sense if you stored query strings in it - this way if somebody wants to execute a single unprepared multiple times with different parameters, it will only have to be prepared once. I think we should follow this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced HashSet
with Vec
. I tried to store query strings, but I would still need to know the indexes to maintain the right order of the statements in batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the code looks more complicated than before, actually. The vec_id
variable and comparing indexes looks tricky...
I replaced
HashSet
withVec
. I tried to store query strings, but I would still need to know the indexes to maintain the right order of the statements in batch
You can first create a hashset with all needed strings. Then, you can prepare the statements and put them in a hashmap, indexed by statement string. Finally, when you re-create the batch, when you iterate and encounter an unprepared statement/query in the original batch, look up the prepared statement in the hashmap.
scylla/src/transport/connection.rs
Outdated
values: impl BatchValues, | ||
consistency: Consistency, | ||
serial_consistency: Option<SerialConsistency>, | ||
) -> Result<QueryResult, QueryError> { | ||
let batch = { | ||
let mut batch: Batch = Default::default(); | ||
batch.config = init_batch.config.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no statements to prepare, can we avoid creating a new batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added checking if to_prepare
is empty
scylla/src/transport/connection.rs
Outdated
{ | ||
let mut value_iter = values.batch_values_iter(); | ||
for (id, stmt) in init_batch.statements.iter().enumerate() { | ||
let value = value_iter.next_serialized().transpose()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serializing values for a query can be relatively expensive, so it would be best to avoid it. Let's use skip_next
instead of next_serialized
in case of BatchStatement::PreparedStatement
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
c4bf01a
to
27c733d
Compare
27c733d
to
b581e02
Compare
scylla/src/transport/connection.rs
Outdated
let mut batch: Batch = Default::default(); | ||
|
||
if to_prepare.is_empty() { | ||
batch = init_batch.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather avoid cloning anything if we can use the init_batch
directly. Can we use Cow
for this? If we can use init_batch
, assign Cow::Borrowed(init_batch)
to batch
. Otherwise, construct a new one and assign Cow::Owned(new_batch)
to batch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
scylla/src/transport/connection.rs
Outdated
let mut batch: Batch = Default::default(); | ||
batch.config = init_batch.config.clone(); | ||
|
||
let mut to_prepare = HashSet::<usize>::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the code looks more complicated than before, actually. The vec_id
variable and comparing indexes looks tricky...
I replaced
HashSet
withVec
. I tried to store query strings, but I would still need to know the indexes to maintain the right order of the statements in batch
You can first create a hashset with all needed strings. Then, you can prepare the statements and put them in a hashmap, indexed by statement string. Finally, when you re-create the batch, when you iterate and encounter an unprepared statement/query in the original batch, look up the prepared statement in the hashmap.
scylla/src/transport/connection.rs
Outdated
} | ||
|
||
batch | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, let's see if we can extract all of the added logic here to a separate method/function. It's getting quite long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but to extract whole logic I have to pass values
as an argument and that's not obvious, because I cannot clone/copy it, and it is used by the rest of the function so I moving is not possible. I am not sure if it is better to extract only part of the logic or to not extract it at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to copy or clone values
, you only need to pass it by reference. As I can see, it's only needed for .batch_values_iter()
which takes values
by reference - so you can pass values
by reference to that function.
I imagine that the final method will look like this (I hope I didn't mess up lifetimes):
async fn prepare_batch<'b>(
&self,
batch: &'b Batch,
values_iter: impl BatchValuesIterator<'_>, // or: `values: &impl BatchValues,`
) -> Cow<'b, Batch> {
// ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few errors and I have to change it a little bit to:
async fn prepare_batch<'b>(
&self,
init_batch: &'b Batch,
mut values_iter: impl BatchValuesIterator<'_>,
) -> Result<Cow<'b, Batch>, QueryError> {
//...
}
but it still does not build, there is a problem with lifetimes in session_test.rs:408
(implementation of std::marker::Send
is not general enough
|
= note: std::marker::Send
would have to be implemented for the type &'0 str
, for any lifetime '0
...
= note: ...but std::marker::Send
is actually implemented for the type &'1 str
, for some specific lifetime '1
).
I am not sure how to fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Please try passing &values
instead. For some reason, the BatchValuesIterator
cannot be held across await points, so I suppose there might be issues when trying to pass it to async functions.
3c58014
to
a623af9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some final nitpicks with regards to the implementation. What about tests?
I'd like to see some simple test that sends a batch, queries the table and sees whether the batch modified the table as expected. No need to involve the proxy. I'd like to see several test cases that exercise different paths in the prepare_batch
code:
- Batch only contains prepared statements,
- Batch only contains unprepared statements (queries) without values,
- Batch only contains unprepared statements with values,
- Batch that contains a combination of the above
scylla/src/transport/connection.rs
Outdated
init_batch: &'b Batch, | ||
values: impl BatchValues, | ||
) -> Result<Cow<'b, Batch>, QueryError> { | ||
let mut to_prepare = HashSet::<String>::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you can probably use &str
as the key type. The strings will be borrowed from the init_batch
object which is alive for the whole duration of this function and we don't modify it, so lifetimes should check out, This will save you an allocation when inserting to the hashset.
scylla/src/transport/connection.rs
Outdated
let mut batch: Cow<Batch>; | ||
|
||
if to_prepare.is_empty() { | ||
batch = Cow::Borrowed(init_batch); | ||
} else { | ||
batch = Cow::Owned(Default::default()); | ||
batch.to_mut().config = init_batch.config.clone(); | ||
for stmt in &init_batch.statements { | ||
match stmt { | ||
BatchStatement::Query(query) => match prepared_queries.get(&query.contents) { | ||
Some(prepared) => batch.to_mut().append_statement(prepared.clone()), | ||
None => batch.to_mut().append_statement(query.clone()), | ||
}, | ||
BatchStatement::PreparedStatement(prepared) => { | ||
batch.to_mut().append_statement(prepared.clone()); | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Rust, if
s are expressions and evaluate to a value - you can use that to assign to batch
immediately.
Moreover, in the else
part, I suggest creating a value of type Batch
, initializing it and then assigning it to Cow<Batch>
. This will allow to avoid calling batch.to_mut()
in multiple places.
let batch = if to_prepare.is_empty() {
Cow::Borrowed(init_batch)
} else {
let mut batch = Batch::default();
// ...
Cow::Owned(batch)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, if you implement an early return like I suggested before, here you will be able to drop the if
altogether and only leave the code from the else
branch.
scylla/src/transport/connection.rs
Outdated
} | ||
} | ||
|
||
let mut prepared_queries = HashMap::<String, PreparedStatement>::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one can probably take a &str
, too.
Before sending an unprepared statement in batch, quietly prepare it on one connection to obtain the information about the bind markers. If the list of values to be bound is empty, we just send it as an unprepared query.
a623af9
to
fad88a1
Compare
fad88a1
to
b8a330e
Compare
Motivation
See the referenced issue (#800)
What's done
Information about the types of the bind markers is necessary to make the serialization API safe (i.e. so that the user data won't be misinterpreted when sending it to the database). The problem with unprepared statements in batch is that the driver doesn't have a good way to determine column names and types of the bind markers.
Now before sending a batch, every unprepared statement is quietly prepared on one connection to obtain the information about the bind markers. If the list of values to be bound to the statement is empty, we just send it as an unprepared query.
Fixes: #800
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.