[JS] Improve JS documentation on how to read/deserialize arrow data #37856

Open
bluehat974 opened this issue Sep 25, 2023 · 3 comments

Comments

@bluehat974

Describe the enhancement requested

cc @domoritz

The current JS documentation does not clearly explain how to read and manipulate data with Apache Arrow JS.

The JS version of Apache Arrow is used in several JS environments (DuckDB Wasm, ObservableHQ, Arquero),
and people keep asking how to read the data properly, but there is no clear answer:
duckdb/duckdb-wasm#1418

Some documentation on reading Arrow data or deserializing it to JSON already exists elsewhere:
https://duckdb.org/docs/api/wasm/query.html#arrow-table-to-json
https://observablehq.com/@theneuralbit/using-apache-arrow-js-with-large-datasets

but these examples should be consolidated into the official Apache Arrow JS documentation:
https://github.com/apache/arrow/blob/main/js/README.md

Some ideas for code examples to add to the documentation:

  • Best way to read data without deserializing it to JSON
  • Explain how to take advantage of JS Proxies to read data faster instead of deserializing to JSON
  • If serialization is required, how to do it properly
  • How to convert columns to rows
  • How to read nested types (STRUCT, MAP, DICTIONARY...)
  • How to cast between Arrow types (e.g. from DECIMAL to DOUBLE)
  • How to convert Arrow types (LONG, DOUBLE, DECIMAL) to the desired JS type (bigint, number, string...)
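As a rough sketch of the first and fourth ideas in the list above (reading data without a JSON round trip, and turning columns into rows), something along these lines should work with recent apache-arrow releases (the v9+ API is assumed). The table, its columns `id` and `score`, and its values are made up for illustration.

```typescript
import { tableFromArrays } from 'apache-arrow'

// Illustrative table; the columns `id` and `score` are made up for this sketch.
const table = tableFromArrays({
  id: Int32Array.from([1, 2, 3]),
  score: Float64Array.from([0.5, 1.5, 2.5]),
})

// Column access: getChild returns the column as a Vector (null if missing).
const scores = table.getChild('score')!

// For a primitive column like this one, toArray() returns a Float64Array
// without building any per-row objects.
const scoreArray = scores.toArray()

// Reading values by index also avoids any JSON conversion.
let total = 0
for (let i = 0; i < scores.length; i++) {
  total += scores.get(i) ?? 0
}

// Columns to rows: table.get(i) returns a lazy row proxy whose fields are
// read on access; toJSON() copies it into a plain object when one is needed.
const rows = Array.from({ length: table.numRows }, (_, i) => table.get(i)!.toJSON())
```

Keeping the data columnar (`getChild` plus `get`/`toArray`) is usually the cheaper path; converting to row objects is only worth it when a consumer insists on records.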

Component(s)

Documentation, JavaScript

bluehat974 changed the title from "Improve JS documentation on how to read/deserialize arrow data" to "[JS] Improve JS documentation on how to read/deserialize arrow data" on Sep 25, 2023
domoritz self-assigned this on Sep 26, 2023
domoritz removed their assignment on Jan 5, 2024
@kevinschaich

100% agree with the points mentioned above. I'm also curious whether Arrow has built-in functionality for casting to native JavaScript types.

My workaround:

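import { Table } from 'apache-arrow'
import { mapValues } from 'lodash'

// Convert an Arrow Table into an array of plain objects, turning top-level
// bigint values (e.g. from Int64 columns) into numbers.
export const arrowTableToRecords = (arrow: Table): Record<string, any>[] => {
    // row.toJSON() alone does not convert BigInts, and the row prototype can't be
    // overridden because it refers to a private symbol:
    // const after = arrow.toArray().map((row) => row.toJSON())

    return arrow.toArray().map((row: object) => {
        return mapValues(row, (v: any) => {
            if (typeof v === 'bigint') {
                // Values outside the safe-integer range may lose precision as a number, so throw instead.
                if (v < Number.MIN_SAFE_INTEGER || v > Number.MAX_SAFE_INTEGER) {
                    throw new TypeError(`${v} is not safe to convert to a number.`)
                }
                return Number(v)
            }
            // Other values (including nested structs and lists) are returned unchanged.
            return v
        })
    })
}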

LMK if others have a better way to do this.
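Another option, sketched here under the assumption that the goal is JSON output rather than live JS objects, is a `JSON.stringify` replacer. `bigintReplacer` is a hypothetical helper, not part of apache-arrow: it turns `bigint` values into numbers when they are in the safe-integer range and into decimal strings otherwise, instead of throwing.

```typescript
// Hypothetical replacer (not an apache-arrow API) for JSON.stringify.
// JSON.stringify throws on bigint values, so convert them first:
// numbers when they fit in the safe-integer range, strings otherwise.
export function bigintReplacer(_key: string, value: unknown): unknown {
  if (typeof value === 'bigint') {
    const fitsInNumber =
      value >= BigInt(Number.MIN_SAFE_INTEGER) && value <= BigInt(Number.MAX_SAFE_INTEGER)
    return fitsInNumber ? Number(value) : value.toString()
  }
  return value
}

// Plain values standing in for a row as produced by toJSON():
// one small Int64 value, one too large for a double, and a string.
const json = JSON.stringify(
  { small: BigInt(42), big: BigInt('1152921504606846976'), name: 'a' },
  bigintReplacer,
)
// json is '{"small":42,"big":"1152921504606846976","name":"a"}'
```

Because Arrow row proxies provide `toJSON()`, `JSON.stringify(table.toArray(), bigintReplacer)` should apply the same conversion to nested rows as well, though this sketch has not been checked against every Arrow type (decimals in particular).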

@domoritz
Member

I'm thinking about adding a way to tell Arrow that you want data returned in more compatible types (e.g. arrays of numbers instead of bigints, numbers instead of decimal objects). It's not there yet, but I think toArray often doesn't produce what people want.
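Until something like that exists, one way to get plain numbers for a whole column is to convert it after `toArray()`. The sketch below assumes a recent apache-arrow release where an Int64 column's `toArray()` yields a `BigInt64Array`; `toFloat64` is a hypothetical helper, and values beyond ±2^53 lose precision.

```typescript
import { tableFromArrays } from 'apache-arrow'

// Hypothetical helper (not part of apache-arrow): copy numeric column values
// into a Float64Array, converting bigint entries with Number().
// Values outside ±2^53 are rounded, and null slots are not distinguished.
export function toFloat64(values: ArrayLike<number | bigint>): Float64Array {
  const out = new Float64Array(values.length)
  for (let i = 0; i < values.length; i++) {
    out[i] = Number(values[i])
  }
  return out
}

// An Int64 column built from a BigInt64Array; the values are illustrative.
const table = tableFromArrays({
  n: BigInt64Array.from([BigInt(1), BigInt(2), BigInt(3)]),
})

// Assumed: toArray() on an Int64 column returns a BigInt64Array.
const nums = toFloat64(table.getChild('n')!.toArray())
```

Converting per column keeps the result a dense typed array, which is what plotting and numeric libraries tend to expect.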

@Fil

Fil commented Oct 10, 2024

Today I tripped over the (non-)handling of nulls in toArray(). See observablehq/plot#2195 (I welcome comments on this PR; I'm not sure it's correct).
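A minimal sketch of a null-preserving alternative, assuming the Vector methods `isValid(i)` and `get(i)` from recent apache-arrow releases: reading each slot individually keeps nulls as `null`, whereas `toArray()` on numeric columns appears to return the underlying value buffer, where null slots just hold placeholder values. `toNullableArray` is a hypothetical helper name.

```typescript
import { Vector, vectorFromArray } from 'apache-arrow'

// Hypothetical helper: materialize a column as a plain array, keeping nulls.
// isValid(i) checks the validity bitmap; get(i) reads the value at slot i.
export function toNullableArray<T>(vec: Vector): (T | null)[] {
  const out: (T | null)[] = []
  for (let i = 0; i < vec.length; i++) {
    out.push(vec.isValid(i) ? (vec.get(i) as T) : null)
  }
  return out
}

// vectorFromArray infers a nullable Float64 column from this input.
const vec = vectorFromArray([1, null, 3])
const values = toNullableArray<number>(vec)
```

This trades the speed of a single typed-array copy for correct nulls, so it may be worth applying only to columns whose `nullCount` is non-zero.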


4 participants