parquet: support predicate pushdown for partial reads / exclusions #47231

derekperkins · 2024-06-19T16:38:59Z

Feature request

Is your feature request related to a problem? Please describe.

When accessing a remote Parquet file using FILES, the entire file is fetched across the network before the query executes. This can mean waiting for hundreds of megabytes to download, only to then hit an error like #37169, where the file's encoding isn't supported.

Error when reading parquet file using DELTA_BINARY_PACKED encoding (IOError: Not yet implemented: Unsupported encoding.) #37169

As of StarRocks v3.3.0-rc02, an unsupported encoding in any column of a Parquet file makes the entire file unqueryable, even if the query never references that column. Only the columns referenced in the SELECT should be fetched. That saves network bandwidth, and it would also let StarRocks read the supported columns even when other columns use unsupported encodings.

This is listed in the 2024 roadmap, but I couldn't find a tracking issue for it.

StarRocks Roadmap 2024 #39686

Describe the solution you'd like

Support Parquet predicate pushdown so that only the required metadata and/or columns are read.

By reading the file metadata (the footer) first, an unsupported encoding could raise an error without reading the entire file.
By using object store range reads, only the column data requested by the query would be fetched, rather than the whole file.
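A minimal Python sketch of the two points above, for illustration only (it is not StarRocks code). It assumes the Thrift-encoded FileMetaData has already been decoded into the hypothetical `ColumnChunk` records (Thrift decoding is left out), that `read_range` stands in for an object-store ranged GET, and that the `supported` encoding set is an assumed example rather than StarRocks's actual list. The footer layout (metadata, then a 4-byte little-endian length, then the `PAR1` magic) and the chunk offset/size fields follow the Parquet format specification.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# Stand-in for an object-store ranged GET (e.g. an HTTP Range request):
# returns `length` bytes starting at byte `offset` of the remote file.
RangeReader = Callable[[int, int], bytes]

MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte little-endian FileMetaData length + 4-byte "PAR1" magic


def read_footer_bytes(read_range: RangeReader, file_size: int) -> bytes:
    """Fetch only the serialized FileMetaData, using two small range reads."""
    tail = read_range(file_size - TAIL_LEN, TAIL_LEN)
    if tail[4:] != MAGIC:
        raise ValueError("not a Parquet file: trailing magic bytes missing")
    (footer_len,) = struct.unpack("<I", tail[:4])
    footer_start = file_size - TAIL_LEN - footer_len
    if footer_start < len(MAGIC):
        raise ValueError("footer length is larger than the file")
    return read_range(footer_start, footer_len)


@dataclass
class ColumnChunk:
    """Hypothetical, already-decoded subset of one row group's ColumnMetaData."""
    name: str
    encodings: Set[str]
    data_page_offset: int
    dictionary_page_offset: Optional[int]
    total_compressed_size: int

    def byte_range(self) -> Tuple[int, int]:
        # A chunk starts at its dictionary page if it has one, else its first data page.
        start = (self.dictionary_page_offset
                 if self.dictionary_page_offset is not None
                 else self.data_page_offset)
        return start, self.total_compressed_size


def plan_column_reads(chunks: List[ColumnChunk], selected: Set[str],
                      supported: Set[str], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges covering only the selected columns.

    Fails fast, before any column data is fetched, but only when a selected
    column uses an unsupported encoding; unreferenced columns are ignored.
    Ranges separated by at most `max_gap` bytes are merged into one request.
    """
    wanted = [c for c in chunks if c.name in selected]
    for c in wanted:
        unsupported = c.encodings - supported
        if unsupported:
            raise NotImplementedError(
                f"column {c.name!r} uses unsupported encoding(s) {sorted(unsupported)}")
    merged: List[Tuple[int, int]] = []
    for start, length in sorted(c.byte_range() for c in wanted):
        if merged and start - (merged[-1][0] + merged[-1][1]) <= max_gap:
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


if __name__ == "__main__":
    # Synthetic "file": header magic, 1000 bytes of column data, fake footer, tail.
    footer = b"fake-thrift-footer"
    blob = MAGIC + b"\x00" * 1000 + footer + struct.pack("<I", len(footer)) + MAGIC
    requests = []

    def read_range(offset: int, length: int) -> bytes:
        requests.append((offset, length))
        return blob[offset:offset + length]

    assert read_footer_bytes(read_range, len(blob)) == footer
    print(requests)  # [(1022, 8), (1004, 18)]: 26 bytes fetched out of 1030

    chunks = [
        ColumnChunk("id", {"PLAIN", "RLE"}, 4, None, 100),
        ColumnChunk("ts", {"DELTA_BINARY_PACKED"}, 104, None, 200),
        ColumnChunk("name", {"RLE_DICTIONARY", "PLAIN"}, 350, 304, 150),
    ]
    supported = {"PLAIN", "PLAIN_DICTIONARY", "RLE", "RLE_DICTIONARY"}  # assumed list
    print(plan_column_reads(chunks, {"id", "name"}, supported))  # [(4, 100), (304, 150)]
```

In this sketch a query on `id` and `name` succeeds even though `ts` uses DELTA_BINARY_PACKED, and the `ts` bytes are never requested; selecting `ts` fails after only the footer has been read. A real reader would repeat the planning per row group. Raising `max_gap` trades a few wasted bytes for fewer round trips, which usually matters more on object stores than on local disk.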

Describe alternatives you've considered

DuckDB, ClickHouse, etc.

Additional context


derekperkins added the type/feature-request label Jun 19, 2024
