Copy Arrow JS objects directly into Rust without IPC #41

Merged
merged 4 commits into main from kyle/read-arrow-js
Dec 16, 2023

Conversation

kylebarron
Owner

@kylebarron kylebarron commented Dec 16, 2023

Change list

  • Implement copying from Arrow JS DataType, Field, Schema, Data, RecordBatch and Table into their Rust equivalents, without going through an intermediate representation like IPC.
  • This differs from arrow-js-ffi and from the prototype of having JS write into wasm memory: in those approaches JS drives the transfer, whereas here Rust pulls the data instead of JS pushing it. This means that I don't have to manage any raw pointers directly; I just have to rely on JS APIs to copy buffers back and forth.
  • This also involves duplicating a good deal of Arrow JS property definitions, type codes, and enums, but that's unavoidable if Rust is to compile against them.
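
To make the "Rust pulls" idea concrete, here is a minimal sketch in plain Rust. The `JsBuffer` trait and `FakeJsBuffer` type are hypothetical stand-ins for a JS-owned typed array, not types from this PR; their `copy_to` method is modeled on `js_sys::Uint8Array::copy_to`, which copies into a Rust-owned slice. The point is that Rust allocates the destination and asks the source to fill it, so no raw pointer ever crosses the boundary.

```rust
/// Hypothetical stand-in for a JS-owned byte buffer (e.g. a Uint8Array).
/// In a real wasm-bindgen build this role would be played by a JS typed array.
trait JsBuffer {
    /// Number of bytes held on the "JS side".
    fn byte_length(&self) -> usize;
    /// Copy the contents into a Rust-owned slice of exactly the same length.
    fn copy_to(&self, dst: &mut [u8]);
}

/// Test double that keeps its bytes in a Vec so the sketch runs natively.
struct FakeJsBuffer(Vec<u8>);

impl JsBuffer for FakeJsBuffer {
    fn byte_length(&self) -> usize {
        self.0.len()
    }

    fn copy_to(&self, dst: &mut [u8]) {
        // Panics if the lengths differ, like a strict slice copy.
        dst.copy_from_slice(&self.0);
    }
}

/// Rust drives the transfer: it allocates the destination, then pulls the bytes.
fn pull_buffer(src: &dyn JsBuffer) -> Vec<u8> {
    let mut dst = vec![0u8; src.byte_length()];
    src.copy_to(&mut dst);
    dst
}

/// Reinterpret pulled little-endian bytes as f64 values (wasm is little-endian).
/// Any trailing bytes that do not fill a whole f64 are ignored.
fn f64_values_from_le_bytes(bytes: &[u8]) -> Vec<f64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(chunk);
            f64::from_le_bytes(raw)
        })
        .collect()
}
```

Because the destination `Vec<u8>` is created and owned by Rust from the start, there is no pointer arithmetic and no lifetime to coordinate with the JS side.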

Todo:

  • This is expected to support all data types except dictionary-encoded arrays, which are left for future work.
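
As a rough illustration of what mirroring type codes and rejecting dictionary types could look like, here is a small sketch. The `TypeId` and `ImportError` enums and the `type_id_from_js` function are hypothetical names, not code from this PR. The numeric codes are an assumption based on the Arrow JS `Type` enum (Int = 2, Float = 3, Utf8 = 5, Bool = 6, Dictionary = -1) and should be checked against the Arrow JS version in use; only a small subset is shown.

```rust
/// Hypothetical subset of Arrow type ids, named after the Arrow JS `Type` enum.
#[derive(Debug, PartialEq)]
enum TypeId {
    Int,
    Float,
    Utf8,
    Bool,
}

/// Hypothetical error type for the import step.
#[derive(Debug, PartialEq)]
enum ImportError {
    /// Dictionary-encoded data is not handled yet.
    DictionaryNotSupported,
    /// A code this sketch does not know about.
    UnknownTypeId(i32),
}

/// Map a numeric Arrow JS type code to the Rust-side enum.
/// The literal values are assumptions mirrored from Arrow JS, not verified here.
fn type_id_from_js(code: i32) -> Result<TypeId, ImportError> {
    match code {
        2 => Ok(TypeId::Int),
        3 => Ok(TypeId::Float),
        5 => Ok(TypeId::Utf8),
        6 => Ok(TypeId::Bool),
        -1 => Err(ImportError::DictionaryNotSupported),
        other => Err(ImportError::UnknownTypeId(other)),
    }
}
```

Returning an explicit error for the dictionary code keeps the unsupported case visible to callers instead of failing later during buffer import.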

With this, we should be able to write wasm-bindgen functions that transparently accept either JS-owned or Rust-owned data.
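
One way such a dual-input function might be structured is sketched below in plain Rust. Everything here is hypothetical: `RustTable`, `FakeJsTable`, `TableInput`, and `import_table_sketch` are stand-ins invented for illustration, and the real wasm-bindgen signature would look different. The idea is only that the caller's input is normalized to Rust-owned data once, so the rest of the function does not care where the data came from.

```rust
/// Hypothetical Rust-owned table: one Vec<f64> per column.
#[derive(Debug, Clone, PartialEq)]
struct RustTable {
    columns: Vec<Vec<f64>>,
}

/// Hypothetical stand-in for a table that lives on the JS side.
struct FakeJsTable {
    columns: Vec<Vec<f64>>,
}

/// Stand-in for an import step: copies JS-owned columns into Rust-owned memory.
fn import_table_sketch(js: &FakeJsTable) -> RustTable {
    RustTable {
        columns: js.columns.clone(),
    }
}

/// Either kind of input a public function might receive.
enum TableInput<'a> {
    Js(&'a FakeJsTable),
    Rust(RustTable),
}

impl TableInput<'_> {
    /// Normalize to Rust-owned data, copying only when the data is JS-owned.
    fn into_rust(self) -> RustTable {
        match self {
            TableInput::Js(js) => import_table_sketch(js),
            TableInput::Rust(table) => table,
        }
    }
}

/// Example consumer that works identically for both input kinds.
fn row_count(input: TableInput<'_>) -> usize {
    input
        .into_rust()
        .columns
        .first()
        .map_or(0, |col| col.len())
}
```

With this shape, the JS-owned path pays for one copy at the boundary while the Rust-owned path is passed through unchanged.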

Example:

use arrow::array::make_array;
use wasm_bindgen::prelude::*;

use crate::arrow1::arrow_js::data::{import_data, JSData};
use crate::arrow1::arrow_js::r#type::{import_data_type, JSDataType};
use crate::arrow1::arrow_js::record_batch::{import_record_batch, JSRecordBatch};
use crate::arrow1::arrow_js::table::{import_table, JSTable};

macro_rules! log {
    ( $( $t:tt )* ) => {
        web_sys::console::log_1(&format!( $( $t )* ).into());
    }
}

#[wasm_bindgen]
pub fn read_type(input: &JSDataType) {
    let data_type = import_data_type(input);
    log!("{:?}", data_type);
}

#[wasm_bindgen]
pub fn read_table(input: &JSTable) {
    let table = import_table(input);
    log!("{:?}", table);
}

#[wasm_bindgen]
pub fn read_record_batch(input: &JSRecordBatch) {
    let batch = import_record_batch(input);
    log!("{:?}", batch);
}

#[wasm_bindgen]
pub fn read_array(input: &JSData) {
    let data = import_data(input);
    let arr = make_array(data);
    log!("{:?}", arr);
}

> var arrow = require('apache-arrow')
> var wasm = require('./pkg/node/arrow1')
> table = arrow.tableFromArrays({'a': [1, 2, 3, 4]})
Table {
  schema: Schema {
    fields: [ [Field] ],
    metadata: Map(0) {},
    dictionaries: Map(0) {},
    metadataVersion: 4
  },
  batches: [ RecordBatch { schema: [Schema], data: [Data] } ],
  _offsets: Uint32Array(2) [ 0, 4 ]
}
> wasm.read_table(table)
Table([RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Float64>
[
  1.0,
  2.0,
  3.0,
  4.0,
]], row_count: 4 }])
undefined

Closes #40

@kylebarron
Owner Author

cc @domoritz; you may like to know this exists!

@kylebarron kylebarron merged commit a088cce into main Dec 16, 2023
1 check passed
@kylebarron kylebarron deleted the kyle/read-arrow-js branch December 16, 2023 02:10
Development

Successfully merging this pull request may close these issues.

Copy Arrow JS Data into Wasm directly using duck-typed APIs