-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Postgresql Copy Mode Support #26
Comments
I think this would be a great addition, but it requires some though. I'll think out loud now, but I would also like to hear any further though you've had about the API. First, I would prefer if Caqti do the encoding or decoding of data sent or received with corresponding type information included in the request type. Apart from type safety and convenience, this offers more abstraction from the RDBMS, which is one of the goals of Caqti. This means we can't just attach auxiliary input/output streaming to existing response-processing functions. From my reading of the manual the most general solution would allow parameters, result rows, and either input or output stream in the same request. This suggests we would need one or two more parameters for val copy_in :
?env: (Caqti_driver_info.t -> string -> query) ->
?oneshot: bool ->
'a Caqti_type.t -> 'b Caqti_type.t ->
string -> ('a, 'b, [> `Copy_in]) t
val copy_out :
?env: (Caqti_driver_info.t -> string -> query) ->
?oneshot: bool ->
'a Caqti_type.t -> 'b Caqti_type.t ->
string -> ('a, 'b, [> `Copy_out]) t However, I think we could unify the latter with regular response processing. This would allow falling back to But there is another issue: there are various escape mechanisms indicated by the query string, which would be better chosen by the driver. So, maybe the user should just supply the table and column names instead of the full query. Then we can skip the potentially fragile inspection of the query string to determine the mode, and we could probably also simulate copy-in with So, a revision of the above might look like: val copy_in :
?env: (Caqti_driver_info.t -> string -> query) ->
?oneshot: bool ->
'a Caqti_type.t -> 'b Caqti_type.t ->
table: string -> ?columns: string list -> unit -> ('a, 'b, [> `Many_in]) t
val copy_out :
?env: (Caqti_driver_info.t -> string -> query) ->
?oneshot: bool ->
'a Caqti_type.t -> 'b Caqti_type.t ->
table: string -> ?columns: string list -> unit -> ('a, 'b, [> `Many]) t I am open to a copy-in only implementation, but I think it's useful to discuss both directions to see how it will fit into a full solution. Before we start coding anything: The current function supported by the postgresql bindings are obsolete, so we may want to contribute support for |
I proposed the addition of the new COPY-handing functions in mmottl/postgresql-ocaml#31. |
Firstly, thanks for handling the updates to I initially envisaged only adding copy-in functionality, but I agree working out an API that suits both is a good way to do this. Some points on the above:
I couldn't agree more - this is definitely the right approach.
This probably needs broadening slightly as it is possible to copy out from the result of an arbitrary query, but that is not a huge issue here, as you could change val copy_out :
?env: (Caqti_driver_info.t -> string -> query) ->
?oneshot: bool ->
'a Caqti_type.t -> 'b Caqti_type.t ->
?table: string -> ?columns: string list -> ?query: string -> unit -> ('a, 'b, [> `Many]) t and then validate that either a query or a table name and list of columns was passed in. More generally, a quick question on naming: is it worth changing the naming of the functions to something like Other than that, this looks good to me. |
Do you mean that one can use The name changes are okay. I also considered |
Fortunately not - I just mean that the docs say that you can do
|
You are right. But the variant with a general query only applies to There is one think which has been bugging me about my proposed |
About my last point on how to type in input stream, we could do something like: type counit
(** The empty type. *)
type ('a, 'b, 'c, +'m) t4 constraint 'm = [< `Zero | `One | `Many]
(** The full request type, generalizing {!t} with the type parameter ['b], being
the type of a row of an input stream used to populate a database table. *)
type ('a, 'c, 'm) t = ('a, counit, 'c, 'm) t4
(** ... *) |
That makes sense, but could we just reuse the existing multiplicity type, as it seems like this would duplicate the behaviour? |
I am not sure if I understand your question or suggestion, but .the |
Sorry ignore the last comment, I completely misread your suggestion. Just to check I understand everything, what you are actually proposing is to change the type of That sounds sensible to me - and I guess for the initial version of the copy in function, As a side question, would you be open to renaming the variables on the type to make this a bit clearer at-a-glance in function signatures etc? I am going to attempt a first implementation of this over the next few days, as I think the design is now at a point where it should be possible to get something working. I'll focus on the copy-in behaviour on postgres first, as that should require all the new interfaces to be put in place. |
I think it is good in the long run to rename |
Sorry for more confusion here - I actually mean making the type parameters more explicit, so the above becomes something like: type counit
(** The empty type. *)
type ('params, 'input_row_type, 'output_row_type, +'multiplicity) t4 constraint 'm = [< `Zero | `One | `Many]
(** The full request type, generalizing {!t} with the type parameter ['b], being
the type of a row of an input stream used to populate a database table. *)
type ('params, 'row_type, 'multiplicity) t = ('params, counit, 'row_type, 'multiplicity) t4
(** ... *) maybe something a bit less verbose than this, but you get the idea. |
Yes, that may be a good idea, esp. in the type definition. |
I have put up a (very rough) first attempt at this for comments in #27, but I have run into quite a few issues when attempting to implement - I am happy to move the discussion there if it is easier. |
As a slight improvement, I also changed I am planning to do a release soon, so this is a good time for any final comments before fixing the |
Thanks for the heads up about this. I have a half-baked version of actually using copy mode for On the For the |
Yes, we could include The reason we didn't include a convenience version of |
That all sounds sensible to me - can I just check I have understood something correctly?
I don't quite understand why this is an issue. Is it that a convenience version would end up returning a stream that could be used later in the code, which would present issues because we would no longer be able to control exactly when the query that generates the stream is executed - whereas with the existing convenience functions like If this is the issue, then could a potential convenience function just eagerly evaluate the query and then pass around the resulting objects? |
Your assessment is right. Even if the underlying libraries supported detaching the result retrieval, I think it would be wrong to rely on GC to release the external resources. Yes, we could copy the result, but then the user might as well use |
Cool - thanks for the clarification 👍 |
… the postgresql driver
… the postgresql driver
… the postgresql driver
… the postgresql driver
… the postgresql driver
… the postgresql driver
… the postgresql driver
… the postgresql driver
… the postgresql driver
Further to #24, I am now trying to work out the best way to insert large amounts of data into a
postgres
database quickly. Based on the documentation here, the best way to do this is to make use of theCOPY
command to stream in the data.I have run some experiments, and the use of
COPY
speeds up data insertion by roughly 10x in the case of my data, so I definitely want to make use of it (although I guess it is not always this much of a speed up).Unfortunately,
COPY
is apostgres
-specific extension (which is supported byocaml-postgres
but unused bycaqti
). It seems there is an equivalent formariadb
, but I haven't been able to find a convincing equivalent forsqlite
.This leads to a couple of questions:
caqti
should be supporting, or should it live in application code?My opinion is that this would be a nice addition to
caqti
. I think that a good equivalent here is transaction support, which seems to trigger different connection code based on thecan_transact
parameter in thedriver_info
for a given driver - maybe acan_copy
parameter would allow the drivers to declare copy methods, and would raise an error if you tried to use them with a driver that did not support it? This would be combined with some new helper functions forCONNECTION
objects.I am happy to make a PR with and attempt at changes if you think this is worth pursuing.
The text was updated successfully, but these errors were encountered: