feat: support bind named parameters via `Adbc.Buffer` #68

cocoa-xu · 2024-05-03T09:55:56Z

Hi @josevalim, this PR is currently a WIP and a draft for issue #66.

If I understand your suggestions in #66 correctly, we're expecting users to do things like

Adbc.Connection.query(conn, "INSERT INTO table(column_name) VALUES(?)", [Adbc.Buffer.i8([1])])

In adbc_buffer_to_arrow_type_struct (we can rename this later), we iterate over the parameter list, [Adbc.Buffer.<type>...].
Each element corresponds to a field/column and is parsed in adbc_buffer_to_adbc_field, which builds the corresponding ArrowArray and ArrowSchema.
The parsed fields/columns are then assembled in adbc_buffer_to_arrow_type_struct, and the result is passed to AdbcStatementBind.

I haven't quite decided yet how we should represent more complex data types, like nested structs and lists, using Adbc.Buffer.

cocoa-xu · 2024-05-03T10:22:04Z

For complex data types, it could look something like this:

[
    Adbc.Buffer.i8([1,2,3]),
    Adbc.Buffer.list(
        Adbc.Buffer.i8([1,2,3])
    ),
    Adbc.Buffer.list(
        Adbc.Buffer.list(
            Adbc.Buffer.i8([1,2,3])
        )
    ),
    Adbc.Buffer.struct([
        Adbc.Buffer.i8([1,2,3], name: "col1"),
        Adbc.Buffer.i8([4,5,6], name: "col2")
    ]),
    Adbc.Buffer.struct([
        Adbc.Buffer.struct([
            Adbc.Buffer.i8([1,2,3], name: "field1_1"),
            Adbc.Buffer.i8([4,5,6], name: "field1_2")
        ], name: "field1"),
        Adbc.Buffer.struct([
            Adbc.Buffer.i8([7,8,9], name: "field2_1"),
            Adbc.Buffer.i8([10,11,12], name: "field2_2")
        ], name: "field2")
    ])
]

But I'm not sure whether this is a good design or just an okay-ish one. WDYT?

cocoa-xu · 2024-05-03T19:18:34Z

All primitive data types are supported now. The remaining ones are:

NANOARROW_TYPE_HALF_FLOAT
NANOARROW_TYPE_DATE32
NANOARROW_TYPE_DATE64
NANOARROW_TYPE_TIMESTAMP
NANOARROW_TYPE_TIME32
NANOARROW_TYPE_TIME64
NANOARROW_TYPE_INTERVAL_MONTHS
NANOARROW_TYPE_INTERVAL_DAY_TIME
NANOARROW_TYPE_DECIMAL128
NANOARROW_TYPE_DECIMAL256
NANOARROW_TYPE_LIST
NANOARROW_TYPE_STRUCT
NANOARROW_TYPE_SPARSE_UNION
NANOARROW_TYPE_DENSE_UNION
NANOARROW_TYPE_DICTIONARY
NANOARROW_TYPE_MAP
NANOARROW_TYPE_EXTENSION
NANOARROW_TYPE_FIXED_SIZE_LIST
NANOARROW_TYPE_DURATION
NANOARROW_TYPE_LARGE_LIST
NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO

josevalim

This looks great to me. My only question is if in some cases it is better to keep the buffer as a binary, instead of a list, but I think that's just impossible for certain data types, such as strings, so a list sounds good to me.

Also, a follow-up question: should we change Adbc.Connection.query to return a map of buffers (instead of a map of lists)? So this way we have more information available (such as the metadata of each buffer)? In any case, if we want to do this, let's do it in a separate pull request.

cocoa-xu · 2024-05-06T13:03:02Z

> My only question is if in some cases it is better to keep the buffer as a binary, instead of a list, but I think that's just impossible for certain data types, such as strings, so a list sounds good to me.

Actually, I think we can probably keep the buffer as a binary for some types (i.e., all integer types plus f32 and f64); that should be more efficient when passing data to other libraries.

> should we change Adbc.Connection.query to return a map of buffers (instead of a map of lists)?

Returning a map of Adbc.Buffer seems better, I think. We could provide a helper function that converts a buffer to a list if the user wishes to process the returned data without other libraries.

> So this way we have more information available (such as the metadata of each buffer)? In any case, if we want to do this, let's do it in a separate pull request.

Absolutely, let's do this in another PR.

cocoa-xu · 2024-05-06T14:05:39Z

> > My only question is if in some cases it is better to keep the buffer as a binary, instead of a list, but I think that's just impossible for certain data types, such as strings, so a list sounds good to me.
>
> Actually, I think we can probably keep the buffer as a binary for some types (i.e., all integer types plus f32 and f64); that should be more efficient when passing data to other libraries.

Oh wait, if there is a nil somewhere in the list, then we can't simply represent the data of that list with a single binary. We'd basically have to do the same thing as Arrow: use a bitmap (or something similar) to indicate whether the element at each index is nil or not...
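
To make the nil problem concrete, here is a minimal sketch in plain Elixir of what "a binary plus a bitmap" could look like for int8 values. `NullableSketch` and its functions are hypothetical names used only for illustration; they are not part of Adbc. The bit order (least-significant bit first) follows the Arrow validity bitmap convention, and nil slots are filled with 0 in the data binary as an arbitrary placeholder.

```elixir
defmodule NullableSketch do
  # Hypothetical helper, not part of Adbc.
  import Bitwise

  # Packs a list of int8 values (possibly containing nil) into
  # {validity_bitmap, data_binary}.
  def encode_i8(values) do
    # nil slots get a placeholder 0; the bitmap marks them as invalid.
    data = for v <- values, into: <<>>, do: <<(v || 0)::signed-integer-8>>

    validity =
      values
      |> Enum.chunk_every(8)
      |> Enum.map(fn chunk ->
        chunk
        |> Enum.with_index()
        |> Enum.reduce(0, fn
          {nil, _i}, acc -> acc
          {_v, i}, acc -> acc ||| (1 <<< i)
        end)
      end)
      |> :erlang.list_to_binary()

    {validity, data}
  end

  # Reverses encode_i8/1, restoring nil wherever the validity bit is 0.
  def decode_i8({validity, data}) do
    values = for <<v::signed-integer-8 <- data>>, do: v

    values
    |> Enum.with_index()
    |> Enum.map(fn {v, i} ->
      byte = :binary.at(validity, div(i, 8))
      if ((byte >>> rem(i, 8)) &&& 1) == 1, do: v, else: nil
    end)
  end
end
```

With this sketch, `NullableSketch.encode_i8([1, nil, 3])` yields a one-byte bitmap with bits 0 and 2 set alongside a three-byte data binary, and `decode_i8/1` restores the original list.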

josevalim · 2024-05-06T18:25:51Z

> Oh wait, if there is a nil somewhere in the list, then we can't simply represent the data of that list with a single binary. We'd basically have to do the same thing as Arrow: use a bitmap (or something similar) to indicate whether the element at each index is nil or not...

Yes, exactly. Given they are lists, I think we probably should not have named it Adbc.Buffer. What do you think about:

Renaming Adbc.Buffer to Adbc.Column
Change Arrow.ResultSet to return a list of Adbc.Columns (so we preserve the returned order)
Add Adbc.ResultSet.to_map that converts a list of columns into a map of lists

Could you please send a PR?
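
As a rough illustration of the proposal above, the conversion from an ordered list of columns to a map of lists could be sketched as follows. `ColumnSketch`, its fields (`name`, `type`, `data`), and `new/3` are assumptions made for this example; the actual Adbc.Column struct and Adbc.ResultSet.to_map may look different.

```elixir
defmodule ColumnSketch do
  # Hypothetical stand-in for the proposed Adbc.Column; fields are assumptions.
  defstruct [:name, :type, :data]

  def new(name, type, data) do
    %__MODULE__{name: name, type: type, data: data}
  end

  # Converts an ordered list of columns into a map of name => list.
  # Column order is lost here, which is why the result set itself
  # would stay a list.
  def to_map(columns) do
    Map.new(columns, fn %__MODULE__{name: name, data: data} -> {name, data} end)
  end
end

columns = [
  ColumnSketch.new("id", :i8, [1, 2, 3]),
  ColumnSketch.new("label", :string, ["a", nil, "c"])
]

ColumnSketch.to_map(columns)
```

Keeping the list as the primary representation preserves the order returned by the driver, while the map is a convenience for callers who only need lookup by name.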

cocoa-xu · 2024-05-06T18:31:46Z

> > Oh wait, if there is a nil somewhere in the list, then we can't simply represent the data of that list with a single binary. We'd basically have to do the same thing as Arrow: use a bitmap (or something similar) to indicate whether the element at each index is nil or not...
>
> Yes, exactly. Given they are lists, I think we probably should not have named it Adbc.Buffer. What do you think about:
>
> Renaming Adbc.Buffer to Adbc.Column
>
> Change Arrow.ResultSet to return a list of Adbc.Columns (so we preserve the returned order)
>
> Add Adbc.ResultSet.to_map that converts a list of columns into a map of lists
>
> Could you please send a PR?

No problem! I'll send a PR for this ;)

cocoa-xu · 2024-05-17T16:32:24Z

Updated the list for the types left to be implemented:

NANOARROW_TYPE_HALF_FLOAT
NANOARROW_TYPE_INTERVAL_MONTHS
NANOARROW_TYPE_INTERVAL_DAY_TIME
NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO
NANOARROW_TYPE_LIST
NANOARROW_TYPE_FIXED_SIZE_LIST
NANOARROW_TYPE_LARGE_LIST
NANOARROW_TYPE_STRUCT
NANOARROW_TYPE_SPARSE_UNION
NANOARROW_TYPE_DENSE_UNION
NANOARROW_TYPE_DICTIONARY
NANOARROW_TYPE_MAP
NANOARROW_TYPE_EXTENSION

cocoa-xu · 2024-06-18T12:40:27Z

So now I've done most of these; the only things left to do are

encode/decode dictionary
encode sparse/dense unions

As for map, it's a logical type, which is essentially a list of structs, and we can already encode/decode that.
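
To illustrate the "list of structs" point in plain Elixir terms, a map can be flattened into a list of key/value entries and rebuilt from it. `MapLayoutSketch` is a hypothetical module for illustration only and makes no Adbc calls; the `key` and `value` field names mirror the Arrow map layout, and sorting the entries is just a choice to make the output deterministic.

```elixir
defmodule MapLayoutSketch do
  # Hypothetical helper, not part of Adbc.

  # Flattens a map into a list of %{key: ..., value: ...} entries,
  # i.e. the "list of structs" shape.
  def to_entries(map) do
    map
    |> Enum.sort()
    |> Enum.map(fn {k, v} -> %{key: k, value: v} end)
  end

  # Rebuilds the map from its list-of-entries form.
  def from_entries(entries) do
    Map.new(entries, fn %{key: k, value: v} -> {k, v} end)
  end
end
```

Because the entry list carries everything the map does, supporting lists and structs is enough to round-trip a map.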

NANOARROW_TYPE_EXTENSION stands for user-defined types, so perhaps we can skip it for now.

I'll do encode/decode dictionary today, and if there's enough time I'll also add support for encoding sparse/dense unions. And after that, all official types will be available for encoding/decoding in this library.

/cc @josevalim

cocoa-xu added 2 commits May 3, 2024 17:37

WIP

cf7a7a1

WIP

511999e

done Adbc.Buffer

bb8545d

cocoa-xu marked this pull request as ready for review May 3, 2024 18:59

added Adbc.Buffer.fixed_size_binary

e346d25

cocoa-xu added 3 commits May 4, 2024 04:00

added support for metadata

423a442

minor fix

c11e912

fix spec for Adbc.Buffer.boolean/2

10db689

josevalim approved these changes May 5, 2024

cocoa-xu merged commit 8bcf492 into main May 6, 2024
3 checks passed

cocoa-xu deleted the cx/support-bind-maps branch May 6, 2024 13:03

cocoa-xu mentioned this pull request May 8, 2024

return Adbc.Column #70

Merged
