Asynchronous HTTP reads #1723

Open
@ravwojdyla

Description

#381 has some great observations on the sequential nature of HTTP range requests. We have a largish Parquet file (~500 MB) with 34 row groups. Downloading the whole file takes less time than reading a small subset of columns through sequential range requests, even though the range reads transfer less data. I could not find a dedicated open issue for asynchronous reads.

... we're (not yet) fully async during I/O. This has a few far-reaching requirements for the query execution model that haven't been tackled yet.
Right now, we're always sitting in a C++ call stack when doing I/O, which restricts us to single blocking HTTP reads (via XHR).
Threads would offer an escape hatch here but they're immediately bringing up the problems with SharedArrayBuffers and cross-origin-isolation.
I'd love to implement the web filesystem using multiple concurrent fetches but that's not quite possible today.

Originally posted by @ankoh in #381 (comment)
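To make the "multiple concurrent fetches" idea concrete, here is a minimal sketch of what issuing several range requests at once could look like in plain TypeScript, outside of any blocking C++ call stack. It is not how the web filesystem is implemented today. The `ByteRange` type and the `rangeHeader` and `fetchRanges` names are hypothetical, and the sketch assumes the server supports `Range` requests and answers with `206 Partial Content`.

```typescript
// Hypothetical byte range, inclusive on both ends, matching HTTP Range semantics.
interface ByteRange {
  start: number;
  end: number;
}

// Format a range as an HTTP Range header value, e.g. "bytes=0-1023".
function rangeHeader(r: ByteRange): string {
  return `bytes=${r.start}-${r.end}`;
}

// Start one request per range without waiting for the previous one,
// then wait for all of them. Results come back in the order of `ranges`.
async function fetchRanges(url: string, ranges: ByteRange[]): Promise<Uint8Array[]> {
  return Promise.all(
    ranges.map(async (r) => {
      const res = await fetch(url, { headers: { Range: rangeHeader(r) } });
      // Assumption: a server that ignores the Range header replies 200 with
      // the full body; this sketch treats that as an error.
      if (res.status !== 206) {
        throw new Error(`expected 206 Partial Content, got ${res.status}`);
      }
      return new Uint8Array(await res.arrayBuffer());
    }),
  );
}
```

With 34 row groups and a handful of columns, a reader built this way would have the column-chunk requests in flight together instead of paying one round trip after another. The browser's per-origin connection limits still apply, so the actual speedup depends on the server and protocol.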
