Feature request
Is your feature request related to a problem? Please describe.
When accessing a remote Parquet file using FILES, the entire file is fetched across the network before the query executes. This can mean waiting for hundreds of megabytes to download, only to then hit an error like #37169 because an encoding isn't supported.
As of StarRocks v3.3.0-rc02, an unsupported encoding in a Parquet file makes the entire file unqueryable, even when the affected column isn't referenced by the query. Only the columns named in the SELECT should be fetched; this saves network bandwidth and would let StarRocks read the supported columns even when others are not.
This is listed in the 2024 roadmap, but I couldn't find a tracking issue for it.
Describe the solution you'd like
Support Parquet predicate and projection pushdown, so that only the required metadata and the column chunks referenced by the query are read.
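To illustrate the idea (this is a hypothetical sketch, not StarRocks internals): a Parquet footer maps each column chunk to a byte range, so a reader can issue range requests for only the referenced columns. The names below (`footer`, `fetch_range`, `bytes_fetched`) are made up for the example.

```python
# Hypothetical sketch of projection pushdown over a remote Parquet file.
# The footer maps each column chunk to an (offset, length) byte range;
# only the ranges for columns named in the SELECT are fetched, so a
# column with an unsupported encoding is never downloaded or decoded.
footer = {
    "id":      (0,    4096),    # small column chunk for 'id'
    "payload": (4096, 900_000), # large column with an unsupported encoding
}

def fetch_range(offset, length):
    # Stand-in for an HTTP Range request against the remote file;
    # here we just report how many bytes would be transferred.
    return length

def bytes_fetched(selected_columns):
    # Download only the byte ranges of the referenced columns.
    return sum(fetch_range(off, ln)
               for off, ln in (footer[c] for c in selected_columns))

print(bytes_fetched(["id"]))             # 4096 -- 'payload' never touched
print(bytes_fetched(["id", "payload"]))  # 904096 -- only when requested
```

With this behavior, `SELECT id FROM FILES(...)` would transfer kilobytes instead of the whole file, and the unsupported 'payload' encoding would only matter if 'payload' were actually selected.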
Describe alternatives you've considered
Other engines such as DuckDB and ClickHouse already implement this kind of pushdown for remote Parquet reads.
Additional context