-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow table from query result much larger than equivalent inserted table #1741
Comments
I think this comes down to these lines: const buffer = await this._bindings.runQuery(this._conn, text);
const reader = arrow.RecordBatchReader.from<T>(buffer); On the JS side, the arrow.RecordBatchReader could potentially be modified to return dictionary-encoded columns. However if the buffer passed is not already dictionary-encoded, then the memory issue may persist due to the buffer being very large. |
Hmm, but if it's a buffer, doesn't that mean that JS here is just instantiating an arrow record batch from IPC and the IPC dictates the schema and so we can't just dictionary encode after the fact? |
Yes that is what i was trying to convey with
i.e., that the fix might need to be on the C++ side |
EDIT I suppose the above statement does not make sense if there is no "parsing" step https://github.com/apache/arrow/blob/ff9921ffa89585be69ae85674bb365d03cb22ba4/js/src/ipc/reader.ts#L357 - then the only option would be on the C++ side |
Yep, that's what I'm thinking as well but I may be wrong. |
What happens?
Arrow tables returned by
conn.query
are much larger than expected due to lack of usage of Dictionary encoding. In the web browser, large (>1 million row) tables that can be inserted into the database successfully cannot be subsequently queried in their entirety due to the memory footprint of the returned Arrow table.To Reproduce
This Observable notebook contains a minimal reproduction: https://observablehq.com/d/56ceaa780133858a
Browser/Environment:
Firefox Developer Edition 127.0b4
Device:
Macbook
DuckDB-Wasm Version:
1.24.0
DuckDB-Wasm Deployment:
Observable Notebook
Full Name:
Mark Keller
Affiliation:
Harvard Medical School
The text was updated successfully, but these errors were encountered: