Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support bind named parameters via Adbc.Buffer #68

Merged
merged 7 commits into from
May 6, 2024

Conversation

cocoa-xu
Copy link
Member

@cocoa-xu cocoa-xu commented May 3, 2024

Hi @josevalim, this PR is currently a WIP and a draft for issue #66.

If I understand your suggestions in #66 correctly, we're expecting users to do things like

Adbc.Connection.query(conn, "INSERT INTO table(column_name) VALUES(?)", [Adbc.Buffer.i8([1])])
  1. in adbc_buffer_to_arrow_type_struct (we can rename this later), we will iterate the parameter list, [Adbc.Buffer.<type>...],
  2. each of them corresponds to a field/column and will be parsed in adbc_buffer_to_adbc_field, and a corresponding ArrowArray and ArrowSchema will be built.
  3. the parsed fields/columns will be assembled in adbc_buffer_to_arrow_type_struct and the result will be passed to AdbcStatementBind

Although I haven't quite decided how we should represent more complex data types like nested structs and lists using Adbc.Buffer yet.

@cocoa-xu
Copy link
Member Author

cocoa-xu commented May 3, 2024

For complex data types it can be something like below

[
    Adbc.Buffer.i8([1,2,3]),
    Adbc.Buffer.list(
        Adbc.Buffer.i8([1,2,3])
    ),
    Adbc.Buffer.list(
        Adbc.Buffer.list(
            Adbc.Buffer.i8([1,2,3])
        )
    ),
    Adbc.Buffer.struct([
        Adbc.Buffer.i8([1,2,3], name: "col1"),
        Adbc.Buffer.i8([4,5,6], name: "col2")
    ]),
    Adbc.Buffer.struct([
        Adbc.Buffer.struct([
            Adbc.Buffer.i8([1,2,3], name: "field1_1"),
            Adbc.Buffer.i8([4,5,6], name: "field1_2")
        ], name: "field1"),
        Adbc.Buffer.struct([
            Adbc.Buffer.i8([7,8,9], name: "field2_1"),
            Adbc.Buffer.i8([10,11,12], name: "field2_2")
        ], name: "field2")
    ])
]

But I'm not sure if it's good or okay-ish design. WDYT?

@cocoa-xu cocoa-xu marked this pull request as ready for review May 3, 2024 18:59
@cocoa-xu
Copy link
Member Author

cocoa-xu commented May 3, 2024

All primitive data types are supported now. The left ones are

  • NANOARROW_TYPE_HALF_FLOAT
  • NANOARROW_TYPE_DATE32
  • NANOARROW_TYPE_DATE64
  • NANOARROW_TYPE_TIMESTAMP
  • NANOARROW_TYPE_TIME32
  • NANOARROW_TYPE_TIME64
  • NANOARROW_TYPE_INTERVAL_MONTHS
  • NANOARROW_TYPE_INTERVAL_DAY_TIME
  • NANOARROW_TYPE_DECIMAL128
  • NANOARROW_TYPE_DECIMAL256
  • NANOARROW_TYPE_LIST
  • NANOARROW_TYPE_STRUCT
  • NANOARROW_TYPE_SPARSE_UNION
  • NANOARROW_TYPE_DENSE_UNION
  • NANOARROW_TYPE_DICTIONARY
  • NANOARROW_TYPE_MAP
  • NANOARROW_TYPE_EXTENSION
  • NANOARROW_TYPE_FIXED_SIZE_LIST
  • NANOARROW_TYPE_DURATION
  • NANOARROW_TYPE_LARGE_LIST
  • NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. My only question is if in some cases it is better to keep the buffer as a binary, instead of a list, but I think that's just impossible to certain data types, such as strings, so a list sounds good to me.

Also, a follow up question is, should we change Adbc.Connection.query to return a map of buffers (instead of a map of lists)? So this way we have more information available (such as the metadata of each buffer)? In any case, if we want to do this, let's do it in a separate pull request.

@cocoa-xu
Copy link
Member Author

cocoa-xu commented May 6, 2024

My only question is if in some cases it is better to keep the buffer as a binary, instead of a list, but I think that's just impossible to certain data types, such as strings, so a list sounds good to me.

Actually I think we probably can keep the buffer as a binary for some types (i.e., all integer types and f32, f64), it should be more efficient when passing to other libraries.

should we change Adbc.Connection.query to return a map of buffers (instead of a map of lists)?

Return a map of Adbc.Buffer seems to be better I think. We can perhaps provide a helper function that converts a buffer to a list if the user wishes to process the returned data without other libraries.

So this way we have more information available (such as the metadata of each buffer)? In any case, if we want to do this, let's do it in a separate pull request.

Absolutely, let's do this in another PR.

@cocoa-xu cocoa-xu merged commit 8bcf492 into main May 6, 2024
3 checks passed
@cocoa-xu cocoa-xu deleted the cx/support-bind-maps branch May 6, 2024 13:03
@cocoa-xu
Copy link
Member Author

cocoa-xu commented May 6, 2024

My only question is if in some cases it is better to keep the buffer as a binary, instead of a list, but I think that's just impossible to certain data types, such as strings, so a list sounds good to me.

Actually I think we probably can keep the buffer as a binary for some types (i.e., all integer types and f32, f64), it should be more efficient when passing to other libraries.

Oh wait, if there is a nil somewhere in the list then we can't simply represent the data of that list with a single binary, we'd basically do the same thing as arrow - using a bitmap (or similar thing) to indicate if an element at that index is nil or not...

@josevalim
Copy link
Member

Oh wait, if there is a nil somewhere in the list then we can't simply represent the data of that list with a single binary, we'd basically do the same thing as arrow - using a bitmap (or similar thing) to indicate if an element at that index is nil or not...

Yes, exactly. Given they are lists, I think we should probably have not named Adbc.Buffer. What do you think about:

  1. Renaming Adbc.Buffer to Adbc.Column
  2. Change Arrow.ResultSet to return a list of Adbc.Columns (so we preserve the returned order)
  3. Add Adbc.ResultSet.to_map that converts a list of columns into a map of lists

Could you please send a PR?

@cocoa-xu
Copy link
Member Author

cocoa-xu commented May 6, 2024

Oh wait, if there is a nil somewhere in the list then we can't simply represent the data of that list with a single binary, we'd basically do the same thing as arrow - using a bitmap (or similar thing) to indicate if an element at that index is nil or not...

Yes, exactly. Given they are lists, I think we should probably have not named Adbc.Buffer. What do you think about:

  1. Renaming Adbc.Buffer to Adbc.Column
  2. Change Arrow.ResultSet to return a list of Adbc.Columns (so we preserve the returned order)
  3. Add Adbc.ResultSet.to_map that converts a list of columns into a map of lists

Could you please send a PR?

No problem! I'll send a PR for this ;)

@cocoa-xu cocoa-xu mentioned this pull request May 8, 2024
@cocoa-xu
Copy link
Member Author

Updated the list for the types left to be implemented:

  • NANOARROW_TYPE_HALF_FLOAT
  • NANOARROW_TYPE_INTERVAL_MONTHS
  • NANOARROW_TYPE_INTERVAL_DAY_TIME
  • NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO
  • NANOARROW_TYPE_LIST
  • NANOARROW_TYPE_FIXED_SIZE_LIST
  • NANOARROW_TYPE_LARGE_LIST
  • NANOARROW_TYPE_STRUCT
  • NANOARROW_TYPE_SPARSE_UNION
  • NANOARROW_TYPE_DENSE_UNION
  • NANOARROW_TYPE_DICTIONARY
  • NANOARROW_TYPE_MAP
  • NANOARROW_TYPE_EXTENSION

@cocoa-xu
Copy link
Member Author

cocoa-xu commented Jun 18, 2024

So now I've done most of these, the only things left to do is

  • encode/decode dictionary
  • encode sparse/dense unions

As for map, it's a logical type, which is essentially a list of structs, and we can already encode/decode that.

For NANOARROW_TYPE_EXTENSION, it stands for user defined type so perhaps we can skip it for now.

I'll do encode/decode dictionary today, and if there's enough time I'll also add support for encoding sparse/dense unions. And after that, all official types will be available for encoding/decoding in this library.

/cc @josevalim

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants