Interpret Arrow memory across the WebAssembly boundary without serialization.
Arrow is a high-performance memory layout for analytical programs. Since Arrow's memory layout is defined to be the same in every implementation, programs that use Arrow in WebAssembly are using exactly the same layout that Arrow JS implements! This means we can use plain `ArrayBuffer`s to move highly structured data back and forth to WebAssembly memory, entirely avoiding serialization.
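To make the idea concrete, here is a minimal sketch that uses only standard Web APIs, not this library. A typed array created over a `WebAssembly.Memory` buffer reads bytes that a Wasm module wrote, with nothing copied or decoded. The Wasm side is simulated by writing into the memory from JS, and `dataPtr` is a made-up value standing in for a pointer that a Wasm function would return.

```typescript
// Minimal sketch: reading data from Wasm linear memory without serialization.
// The "Wasm side" is simulated from JS; `dataPtr` is a hypothetical pointer.
const linearMemory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB

const dataPtr = 1024; // must be 8-byte aligned for a Float64Array view
const valueCount = 3;

// Simulated Wasm side: write three f64 values (Wasm memory is little-endian).
const writer = new DataView(linearMemory.buffer);
[1.5, 2.5, 3.5].forEach((v, i) => writer.setFloat64(dataPtr + i * 8, v, true));

// JS side: a typed-array view over the same bytes. Nothing is copied or
// decoded. This assumes a little-endian host, since typed arrays use host order.
const values = new Float64Array(linearMemory.buffer, dataPtr, valueCount);
console.log(values.join(", ")); // 1.5, 2.5, 3.5

// Because it is a view, later writes into Wasm memory show up immediately.
writer.setFloat64(dataPtr, 10, true);
console.log(values[0]); // 10
```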
I wrote an interactive blog post that goes into more detail on why this is useful and how this library implements Arrow's C Data Interface in JavaScript.
This package exports `parseField` for parsing the `ArrowSchema` struct into an `arrow.Field`, `parseVector` for parsing the `ArrowArray` struct into an `arrow.Vector`, and `parseRecordBatch` for parsing an `ArrowArray` and `ArrowSchema` pair into an `arrow.RecordBatch`.
`parseField` parses an `ArrowSchema` C FFI struct into an `arrow.Field` instance. The `Field` is needed later when calling `parseVector`, whose `dataType` argument comes from `field.type`.
- `buffer` (`ArrayBuffer`): The buffer of the `WebAssembly.Memory` instance to read from (i.e. `memory.buffer`).
- `ptr` (`number`): The numeric pointer in `buffer` where the C struct is located.
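```ts
import { parseField } from "arrow-js-ffi";
const WASM_MEMORY: WebAssembly.Memory = ...
// `fieldPtr` points to an `ArrowSchema` struct in Wasm memory
const field = parseField(WASM_MEMORY.buffer, fieldPtr);
```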
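As a rough illustration of what reading an `ArrowSchema` involves, the sketch below decodes the `format`, `name`, `flags`, and `n_children` fields by hand. It is not this library's implementation: `readCString` and `readSchemaHeader` are hypothetical helpers, and the byte offsets assume the wasm32 C ABI (4-byte pointers, 8-byte-aligned `int64_t`) together with the field order from the Arrow C Data Interface specification. The usage part lays out a fake struct in memory so the example runs on its own.

```typescript
// Hypothetical helpers (not this library's API) for reading the start of an
// ArrowSchema struct from wasm32 linear memory. Offsets assume the wasm32 C
// ABI: pointers are 4 bytes and int64_t fields are 8-byte aligned.
const SCHEMA_FORMAT_OFFSET = 0; // const char* format
const SCHEMA_NAME_OFFSET = 4; // const char* name
const SCHEMA_FLAGS_OFFSET = 16; // int64_t flags (4 bytes of padding before it)
const SCHEMA_N_CHILDREN_OFFSET = 24; // int64_t n_children
const ARROW_FLAG_NULLABLE = 2;

// Decode a NUL-terminated UTF-8 string starting at `ptr`.
function readCString(buffer: ArrayBuffer, ptr: number): string {
  const bytes = new Uint8Array(buffer);
  let end = ptr;
  while (end < bytes.length && bytes[end] !== 0) end++;
  return new TextDecoder().decode(bytes.subarray(ptr, end));
}

interface SchemaHeader {
  format: string;
  name: string | null;
  nullable: boolean;
  nChildren: number;
}

function readSchemaHeader(buffer: ArrayBuffer, ptr: number): SchemaHeader {
  const view = new DataView(buffer);
  const formatPtr = view.getUint32(ptr + SCHEMA_FORMAT_OFFSET, true);
  const namePtr = view.getUint32(ptr + SCHEMA_NAME_OFFSET, true);
  // Only the low 32 bits of each int64_t are read, which is enough for the
  // small flag bits and child counts used here.
  const flags = view.getUint32(ptr + SCHEMA_FLAGS_OFFSET, true);
  const nChildren = view.getUint32(ptr + SCHEMA_N_CHILDREN_OFFSET, true);
  return {
    format: readCString(buffer, formatPtr),
    name: namePtr === 0 ? null : readCString(buffer, namePtr),
    nullable: (flags & ARROW_FLAG_NULLABLE) !== 0,
    nChildren,
  };
}

// Usage: lay out a fake nullable Int32 field named "x" by hand.
const schemaMemory = new WebAssembly.Memory({ initial: 1 });
const schemaBytes = new Uint8Array(schemaMemory.buffer);
schemaBytes.set(new TextEncoder().encode("i"), 256); // format "i" = Int32
schemaBytes.set(new TextEncoder().encode("x"), 264); // field name
// Memory starts zeroed, so both strings are already NUL-terminated.
const schemaPtr = 64;
const setup = new DataView(schemaMemory.buffer);
setup.setUint32(schemaPtr + SCHEMA_FORMAT_OFFSET, 256, true);
setup.setUint32(schemaPtr + SCHEMA_NAME_OFFSET, 264, true);
setup.setUint32(schemaPtr + SCHEMA_FLAGS_OFFSET, ARROW_FLAG_NULLABLE, true);

const header = readSchemaHeader(schemaMemory.buffer, schemaPtr);
console.log(JSON.stringify(header)); // {"format":"i","name":"x","nullable":true,"nChildren":0}
```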
`parseVector` parses an `ArrowArray` C FFI struct into an `arrow.Vector` instance. Multiple `Vector` instances can be joined to make an `arrow.Table`.
- `buffer` (`ArrayBuffer`): The buffer of the `WebAssembly.Memory` instance to read from (i.e. `memory.buffer`).
- `ptr` (`number`): The numeric pointer in `buffer` where the C struct is located.
- `dataType` (`arrow.DataType`): The type of the vector to parse. This is retrieved from `field.type` on the result of `parseField`.
- `copy` (`boolean`): If `true`, data is copied across the Wasm boundary, so you can free the original on the Wasm side. If `false`, the resulting `arrow.Vector` objects are views on Wasm memory. This requires careful usage, as the arrays become invalid if that region of Wasm memory changes: for example, if the Wasm side frees or overwrites it, or if the memory grows, which detaches the previous `ArrayBuffer`.
```ts
import { parseVector } from "arrow-js-ffi";

const WASM_MEMORY: WebAssembly.Memory = ...
// Create views on Wasm memory
const wasmVector = parseVector(WASM_MEMORY.buffer, arrayPtr, field.type);
// Copy arrays into JS instead of creating views
const copiedVector = parseVector(WASM_MEMORY.buffer, arrayPtr, field.type, true);
```
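The `copy` flag trades a one-time copy for safety. This sketch uses only standard typed arrays, not `parseVector` itself, to show why views need care: a view tracks whatever the Wasm side later writes to that region, and growing the memory detaches the `ArrayBuffer` the view was built on. `columnPtr` is a hypothetical pointer chosen for illustration.

```typescript
// Minimal sketch of the trade-off behind `copy`, using plain typed arrays
// rather than this library. `columnPtr` is a hypothetical pointer.
const wasmMemory = new WebAssembly.Memory({ initial: 1 });
const columnPtr = 512;
const columnLength = 4;

// Simulated Wasm side: an Int32 data buffer in linear memory.
const wasmSide = new Int32Array(wasmMemory.buffer, columnPtr, columnLength);
wasmSide.set([1, 2, 3, 4]);

const asView = new Int32Array(wasmMemory.buffer, columnPtr, columnLength); // shares memory
const asCopy = asView.slice(); // copies into a new, JS-owned ArrayBuffer

// If the Wasm side reuses the region (e.g. after freeing it), the view sees
// the new contents while the copy keeps the old ones.
wasmSide[0] = 99;
console.log(asView[0], asCopy[0]); // 99 1

// Growing the memory detaches the old ArrayBuffer: existing views report a
// length of 0, while the copy is unaffected.
wasmMemory.grow(1);
console.log(asView.length, asCopy.length); // 0 4
```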
`parseRecordBatch` parses an `ArrowArray` C FFI struct plus an `ArrowSchema` C FFI struct into an `arrow.RecordBatch` instance. Note that the underlying array and field must be of `Struct` type. In essence, a `Struct` array, whose children are the batch's columns, is used to mimic a `RecordBatch` while being only one array.
- `buffer` (`ArrayBuffer`): The buffer of the `WebAssembly.Memory` instance to read from (i.e. `memory.buffer`).
- `arrayPtr` (`number`): The numeric pointer in `buffer` where the `ArrowArray` C struct is located.
- `schemaPtr` (`number`): The numeric pointer in `buffer` where the `ArrowSchema` C struct is located.
- `copy` (`boolean`): If `true`, data is copied across the Wasm boundary, so you can free the original on the Wasm side. If `false`, the vectors in the resulting `arrow.RecordBatch` are views on Wasm memory. This requires careful usage, as the arrays become invalid if that region of Wasm memory changes: for example, if the Wasm side frees or overwrites it, or if the memory grows, which detaches the previous `ArrayBuffer`.
```ts
import { parseRecordBatch } from "arrow-js-ffi";
const WASM_MEMORY: WebAssembly.Memory = ...
// Pass `true` to copy arrays across the boundary instead of creating views.
const recordBatch = parseRecordBatch(WASM_MEMORY.buffer, arrayPtr, schemaPtr, true);
```
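The Arrow C Data Interface specification allows a record batch to be exported as a single struct array, which is the trick `parseRecordBatch` relies on. The sketch below reads the top-level `ArrowArray` of such a struct and collects pointers to its child arrays, each of which would correspond to one column of the `arrow.RecordBatch`. The helper name `readStructChildren` is hypothetical, the offsets assume the wasm32 C ABI, and the usage part builds a fake struct by hand; it is not this library's implementation.

```typescript
// Hypothetical helpers (not this library's API) for the top-level ArrowArray
// of a Struct array. Offsets assume the wasm32 C ABI (4-byte pointers,
// 8-byte-aligned int64_t fields).
const ARRAY_LENGTH_OFFSET = 0; // int64_t length
const ARRAY_N_CHILDREN_OFFSET = 32; // int64_t n_children
const ARRAY_CHILDREN_OFFSET = 44; // struct ArrowArray** children

interface StructChildren {
  length: number;
  childPtrs: number[];
}

// Reads only the low 32 bits of each int64_t, which assumes non-negative
// values below 2^32.
function readStructChildren(buffer: ArrayBuffer, arrayPtr: number): StructChildren {
  const view = new DataView(buffer);
  const length = view.getUint32(arrayPtr + ARRAY_LENGTH_OFFSET, true);
  const nChildren = view.getUint32(arrayPtr + ARRAY_N_CHILDREN_OFFSET, true);
  const childrenPtr = view.getUint32(arrayPtr + ARRAY_CHILDREN_OFFSET, true);
  const childPtrs: number[] = [];
  for (let i = 0; i < nChildren; i++) {
    // Each entry of `children` is a 4-byte pointer to a child ArrowArray,
    // i.e. one column of the resulting record batch.
    childPtrs.push(view.getUint32(childrenPtr + i * 4, true));
  }
  return { length, childPtrs };
}

// Usage: lay out a fake struct array with 3 rows and 2 children by hand.
// Each ArrowArray occupies 64 bytes on wasm32.
const batchMemory = new WebAssembly.Memory({ initial: 1 });
const layout = new DataView(batchMemory.buffer);
const structPtr = 1024;
const childPtrA = 1088;
const childPtrB = 1152;
const childrenArrayPtr = 2048; // holds the two child pointers

layout.setUint32(structPtr + ARRAY_LENGTH_OFFSET, 3, true);
layout.setUint32(structPtr + ARRAY_N_CHILDREN_OFFSET, 2, true);
layout.setUint32(structPtr + ARRAY_CHILDREN_OFFSET, childrenArrayPtr, true);
layout.setUint32(childrenArrayPtr, childPtrA, true);
layout.setUint32(childrenArrayPtr + 4, childPtrB, true);

const structInfo = readStructChildren(batchMemory.buffer, structPtr);
console.log(structInfo.length, structInfo.childPtrs.join(", ")); // 3 1088, 1152
```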
Most of the unsupported types should be pretty straightforward to implement; they just need some testing.
- Null
- Boolean
- Int8
- Uint8
- Int16
- Uint16
- Int32
- Uint32
- Int64
- Uint64
- Float16
- Float32
- Float64
- Binary
- Large Binary (Not implemented by Arrow JS but supported by downcasting to `Binary`.)
- String
- Large String (Not implemented by Arrow JS but supported by downcasting to `String`.)
- Fixed-width Binary
- Decimal128 (failing a test)
- Decimal256 (failing a test)
- Date32
- Date64
- Time32
- Time64
- Timestamp (with timezone)
- Duration
- Interval
- List
- Large List (Not implemented by Arrow JS but supported by downcasting to `List`.)
- Fixed-size List
- Struct
- Map
- Dense Union
- Sparse Union
- Dictionary-encoded arrays
- Field metadata is preserved.
- Call the release callback on the C structs. This requires figuring out how to call C function pointers from JS.
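Each type in the list above is identified in an `ArrowSchema` by its `format` string, as defined in the Arrow C Data Interface specification. As a rough illustration, the hypothetical helper `describeFormat` (not this library's code) maps a format string to the names used in the list. Dictionary-encoded arrays have no format string of their own: the format gives the index type and the struct's `dictionary` pointer is non-null, so they are not covered by this sketch.

```typescript
// Hypothetical helper: map a C Data Interface format string to the type
// names used in the list above. Covers a subset; not this library's code.
const EXACT_FORMATS = new Map<string, string>([
  ["n", "Null"], ["b", "Boolean"],
  ["c", "Int8"], ["C", "Uint8"], ["s", "Int16"], ["S", "Uint16"],
  ["i", "Int32"], ["I", "Uint32"], ["l", "Int64"], ["L", "Uint64"],
  ["e", "Float16"], ["f", "Float32"], ["g", "Float64"],
  ["z", "Binary"], ["Z", "Large Binary"],
  ["u", "String"], ["U", "Large String"],
  ["tdD", "Date32"], ["tdm", "Date64"],
  ["tts", "Time32"], ["ttm", "Time32"], ["ttu", "Time64"], ["ttn", "Time64"],
  ["tiM", "Interval"], ["tiD", "Interval"], ["tin", "Interval"],
  ["+l", "List"], ["+L", "Large List"], ["+s", "Struct"], ["+m", "Map"],
]);

// Parameterized formats, matched by prefix. None of these prefixes is a
// prefix of another, so their order does not matter.
const PREFIX_FORMATS: Array<[string, string]> = [
  ["w:", "Fixed-width Binary"], // w:<byte width>
  ["+w:", "Fixed-size List"], // +w:<list size>
  ["+ud:", "Dense Union"], // +ud:<type ids>
  ["+us:", "Sparse Union"], // +us:<type ids>
  ["ts", "Timestamp"], // ts<unit>:<timezone, possibly empty>
  ["tD", "Duration"], // tD<unit>
];

function describeFormat(format: string): string {
  const exact = EXACT_FORMATS.get(format);
  if (exact !== undefined) return exact;
  if (format.startsWith("d:")) {
    // d:<precision>,<scale> is Decimal128; an optional third parameter gives
    // the bit width, e.g. d:38,10,256 for Decimal256.
    const params = format.slice(2).split(",");
    return params.length === 3 ? `Decimal${params[2]}` : "Decimal128";
  }
  for (const [prefix, typeName] of PREFIX_FORMATS) {
    if (format.startsWith(prefix)) return typeName;
  }
  throw new Error(`Unsupported format string: ${format}`);
}

console.log(describeFormat("U")); // Large String
console.log(describeFormat("tsu:UTC")); // Timestamp
console.log(describeFormat("d:38,10,256")); // Decimal256
```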