feat: avoid allocating ADBC inputs and outputs twice #97

cocoa-xu · 2024-06-22T23:09:01Z

This PR should allow us to use query results as inputs (query parameters) without allocating twice.

However, by ADBC's design, an ArrowArray and its ArrowSchema are moved once they are used as a parameter, and the nanoarrow library does not ship a deep-copy function for ArrowArray.

~~It's possible to write one that does a deep copy of the ArrowArray but I'm not sure if that's what we wanted to do. But if so, I'll be happy to write one and send another PR for it.~~

Actually, a shallow copy + setting release to nullptr seems to be fine, and we can now reuse the results multiple times.
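For readers unfamiliar with the Arrow C data interface, here is a minimal sketch of what "a shallow copy + setting release to nullptr" could look like. It is only one reading of that sentence, not the code in this PR: `make_int64_array`, `release_owned`, `borrow_copy` and `release_calls` are hypothetical names made up for the illustration, and the struct layout is copied from the Arrow C data interface spec so the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Layout as defined by the Arrow C data interface specification. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical counter, only here so the sketch can show how often memory is freed. */
static int release_calls = 0;

/* Hypothetical owner-side release callback: frees what make_int64_array allocated. */
static void release_owned(struct ArrowArray* array) {
  free((void*)array->buffers[1]);
  free((void*)array->buffers);
  array->release = NULL; /* the spec marks a released struct with release == NULL */
  release_calls++;
}

/* Hypothetical helper: builds a one-element int64 array that owns its buffers. */
static void make_int64_array(struct ArrowArray* out, int64_t value) {
  int64_t* data = malloc(sizeof(int64_t));
  const void** buffers = malloc(2 * sizeof(void*));
  data[0] = value;
  buffers[0] = NULL; /* no validity bitmap, since there are no nulls */
  buffers[1] = data;
  *out = (struct ArrowArray){.length = 1,
                             .n_buffers = 2,
                             .buffers = buffers,
                             .release = release_owned};
}

/* Hypothetical helper: copies the struct fields only, so the copy shares the
 * owner's buffers, and clears release so the copy can never free them. */
static struct ArrowArray borrow_copy(const struct ArrowArray* owner) {
  assert(owner->release != NULL); /* the owner must not have been released yet */
  struct ArrowArray copy = *owner;
  copy.release = NULL;
  return copy;
}
```

With this shape the original stays the sole owner, so `borrow_copy` can be called as many times as needed and the buffers are freed exactly once, when the owner is released. One caveat: in the Arrow C data interface a NULL `release` normally marks a structure as already released, so whether a consumer accepts such a copy depends on the consumer.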

cocoa-xu · 2024-06-23T08:13:16Z

At the moment we have something like this:

Current version (up to 614e4a4)

{:ok,
  results = %Adbc.Result{
  data: %Adbc.Column{
    data: [#Reference<0.351247108.3006922760.20174>],
    name: "",
    type:
      {:struct,
        [
          %Adbc.Column{
            name: "num",
            type: :s64,
            metadata: nil,
            nullable: true
          }
        ]},
    metadata: nil,
    nullable: true
  },
  num_rows: nil
  }} = Connection.query(conn, "SELECT 123 as num")

Maybe it would be easier to use if the reference data is associated with the actual column:

Result with a single column (todo)

{:ok,
  results = %Adbc.Result{
  data: %Adbc.Column{
    data: [
      %Adbc.Column{
        name: "num",
        type: :s64,
        metadata: nil,
        nullable: true,
        data: #Reference<0.351247108.3006922760.20174>
      }
    ],
    name: "",
    type: :struct,
    metadata: nil,
    nullable: true
  },
  num_rows: nil
  }} = Connection.query(conn, "SELECT 123 as num")

Result with multiple columns (todo)

{:ok,
  results = %Adbc.Result{
  data: %Adbc.Column{
    data: [
      %Adbc.Column{
        name: "num",
        type: :s64,
        metadata: nil,
        nullable: true,
        data: #Reference<0.351247108.3006922760.20174>
      },
      %Adbc.Column{
        name: "fp",
        type: :f64,
        metadata: nil,
        nullable: true,
        data: #Reference<0.351247108.3006922760.20175>
      }
    ],
    name: "",
    type: :struct,
    metadata: nil,
    nullable: true
  },
  num_rows: nil
  }} = Connection.query(conn, "SELECT 123 as num, 456.78 as fp")

This probably looks better. WDYT? @josevalim

josevalim · 2024-06-23T09:55:49Z

@cocoa-xu what does ADBC return? Several columns or a struct with multiple entries?

cocoa-xu · 2024-06-23T10:10:04Z

> @cocoa-xu what does ADBC return? Several columns or a struct with multiple entries?

It's the latter. The result is always a struct at the top level, and that struct contains all the columns.
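To make the "struct at the top level" shape concrete, here is a small sketch against the Arrow C data interface. The `ArrowSchema` layout and the `+s` (struct) format string come from that spec; `column_names` is a hypothetical helper written for this comment, not a function from this PR or from nanoarrow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout as defined by the Arrow C data interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

/* Hypothetical helper: writes up to `max` column names of a top-level struct
 * schema into `out`. Returns the number of names written, or -1 when the
 * top-level schema is not a struct. */
static int64_t column_names(const struct ArrowSchema* top, const char** out,
                            int64_t max) {
  assert(top != NULL && out != NULL);
  if (strcmp(top->format, "+s") != 0) return -1; /* "+s" is the struct format */
  int64_t n = top->n_children < max ? top->n_children : max;
  for (int64_t i = 0; i < n; i++) {
    out[i] = top->children[i]->name; /* each child is one result column */
  }
  return n;
}
```

For a query like `SELECT 123 as num, 456.78 as fp`, the top-level schema would be a struct with two children, so the helper would hand back `num` and `fp`. Flattening the result on the Elixir side amounts to exposing those children directly.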

Co-authored-by: José Valim <[email protected]>

cocoa-xu · 2024-06-23T16:18:17Z

Hrmmm, MSVC doesn't like some code in arrow-adbc when compiling in debug mode:

D:\a\adbc\adbc\3rd_party\apache-arrow-adbc\c\driver\framework\base_driver.cc(69) : error C2220: the following warning is treated as an error
D:\a\adbc\adbc\3rd_party\apache-arrow-adbc\c\driver\framework\base_driver.cc(57) : error C2220: the following warning is treated as an error
D:\a\adbc\adbc\3rd_party\apache-arrow-adbc\c\driver\framework\base_driver.cc(57) : error C2220: the following warning is treated as an error
D:\a\adbc\adbc\3rd_party\apache-arrow-adbc\c\driver\framework\base_driver.cc(57) : warning C4702: unreachable code
D:\a\adbc\adbc\3rd_party\apache-arrow-adbc\c\driver\framework\base_driver.cc(57) : warning C4702: unreachable code
D:\a\adbc\adbc\3rd_party\apache-arrow-adbc\c\driver\framework\base_driver.cc(69) : warning C4702: unreachable code
NMAKE : fatal error U1077: '"C:\Program Files\CMake\bin\cmake.exe" -E cmake_cl_compile_depends --dep-file=CMakeFiles\adbc_driver_framework.dir\base_driver.cc.obj.d --working-dir=D:\a\adbc\adbc\_build\test\lib\adbc\cmake_adbc\driver\framework --filter-prefix="Note: including file: " -- C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1440~1.338\bin\Hostx64\x64\cl.exe @C:\Users\RUNNER~1\AppData\Local\Temp\nm9629.tmp' : return code '0x2'
Stop.

josevalim

This is great! One comment to migrate to a list of refs and we can ship it!!

Co-authored-by: José Valim <[email protected]>

cocoa-xu · 2024-06-23T18:20:04Z

CI is all green and everything looks great, so I'll go ahead and merge it!

cocoa-xu added 12 commits June 22, 2024 12:38

arrow_metadata_to_nif_term

3418111

res->private_data should only alloc once

9853ebc

check if stream ends in adbc_arrow_array_stream_next

d7c1a7b

fix get_arrow_array_map_children

b7a31ac

fix: use strcmp/strncmp properly

1fe4fd0

added a helper function

9b828d2

return reference(s) in Adbc.Column.data

b4d645f

implemented Adbc.{Result,Column}.materialize/1

18a49be

updated existing test cases

c660668

added support for using data with a single reference inside

7463a45

added rwlock

4243341

updated new test case

614e4a4

cocoa-xu requested a review from josevalim June 22, 2024 23:09

a shallow copy + setting release to nullptr seems to be fine

4583350

josevalim reviewed Jun 23, 2024

lib/adbc_column.ex (outdated, resolved)

josevalim reviewed Jun 23, 2024

lib/adbc_connection.ex (outdated, resolved)

cocoa-xu and others added 9 commits June 23, 2024 11:11

Update lib/adbc_connection.ex

318be00

Co-authored-by: José Valim <[email protected]>

flatten top-level columns in result

409017c

updated test cases

23212f4

minor changes to helper functions

7cdb887

fix materialize/1

740086c

fix segfault

94ffcae

removed unused var

85facc1

updated make_env

7032f88

removed make_env

ee1a56e

josevalim reviewed Jun 23, 2024

lib/adbc_connection.ex (outdated, resolved)

josevalim approved these changes Jun 23, 2024

Update lib/adbc_connection.ex

f1edabe

Co-authored-by: José Valim <[email protected]>

cocoa-xu merged commit 10f74c8 into main Jun 23, 2024
3 checks passed

cocoa-xu deleted the cx-data-ref branch June 23, 2024 18:20
