DRILL-8507, DRILL-8508 Better handling of partially missing parquet columns #2937

…h existing ones

1. In ParquetSchema#createMissingColumn, replaced col.toExpr() with col.getAsUnescapedPath() so that the missing column name is not quoted with backticks
2. Fixed a typo in UnionAllRecordBatch ("counthas" -> "counts")
3. In TestParquetFilterPushDown, worked around a NumberFormatException with CONVERT_TO
4. Removed the testCoalesceWithUntypedNullValues* test methods from TestCaseNullableTypes
5. Moved the testCoalesceOnNotExistentColumns* test methods from TestUntypedNull to a separate TestParquetMissingColumns class and made them expect Nullable Int instead of Untyped Null
6. Created a new TestParquetPartiallyMissingColumns test class with test cases for the "backticks problem"
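To illustrate item 1, here is a minimal sketch of why a backtick-quoted name for a missing column can clash with the same column read from another file. It does not use Drill classes: `MissingColumnNames`, `quotedName`, `unescapedName` and `distinctColumns` are hypothetical stand-ins, and treating the output columns as a name-keyed map is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a scan that registers output columns by name.
final class MissingColumnNames {

    // Mimics an expression-style rendering that wraps the name in backticks.
    static String quotedName(String column) {
        return "`" + column + "`";
    }

    // Mimics an unescaped path: the bare column name.
    static String unescapedName(String column) {
        return column;
    }

    // Registers the column once under the name used by a file that contains it
    // and once under the name used by a file where it is missing, then reports
    // how many distinct output columns resulted.
    static int distinctColumns(String existingName, String missingName) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put(existingName, "read from file");
        columns.put(missingName, "null-filled");
        return columns.size();
    }
}
```

With the quoted name the two files yield two different columns for what should be one; with the unescaped name they collapse into a single column.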

…ing parquet column (minor type solution)

1. Passed the overall table schema from AbstractParquetRowGroupScan to ParquetSchema
2. In ParquetSchema#createMissingColumn, used the minor type from that schema instead of hardcoding INT
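The idea behind the minor type solution can be sketched as a lookup against the table schema. This is not the Drill implementation: `MissingColumnType`, its reduced `MinorType` enum and `resolve` are hypothetical, and falling back to INT for a column absent from every file is an assumption based on the tests above expecting Nullable Int.

```java
import java.util.Map;

// Hypothetical helper that picks the type for a column missing from one file.
final class MissingColumnType {

    // Reduced, illustrative set of minor types.
    enum MinorType { INT, BIGINT, VARCHAR }

    // Uses the type recorded in the overall table schema when the column is
    // known there; otherwise falls back to INT (assumed default).
    static MinorType resolve(Map<String, MinorType> tableSchema, String column) {
        return tableSchema.getOrDefault(column, MinorType.INT);
    }
}
```

A column that exists as VARCHAR in some files then gets a VARCHAR placeholder in the files that lack it, so the batches agree on the type.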

…ing parquet column (data mode solution)

1. Added the TypeCastRules#getLeastRestrictiveMajorType method for convenience
2. In Metadata, added data mode resolution when collecting file schemas and merging them into a single table schema, so that the less restrictive mode is always preferred. Merging is now synchronized to make that possible
3. In ParquetTableMetadataUtils, made a column OPTIONAL in the overall table schema if it is either found as OPTIONAL or missing in any of the files
4. For such cases, added enforcement of the OPTIONAL data mode in ParquetSchema, ParquetColumnMetadata and ColumnReaderFactory. Now, even if the file has the column as REQUIRED but we need it as OPTIONAL, a nullable column reader and a nullable value vector are created
5. Added "() -> 1" initialization for definitionLevels in PageReader so that the nullable column reader can read REQUIRED columns
6. Added testEnforcingOptional* test cases in TestParquetPartiallyMissingColumns
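The merging rule from items 2 and 3 can be sketched roughly as follows. This is a simplified illustration rather than the code in Metadata or ParquetTableMetadataUtils: `TableSchemaMerge`, `leastRestrictive` and `merge` are hypothetical names, only the REQUIRED and OPTIONAL modes are modelled, and each file schema is reduced to a map from column name to data mode.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of merging per-file schemas into one table schema.
final class TableSchemaMerge {

    // Only the two modes relevant to this sketch.
    enum DataMode { REQUIRED, OPTIONAL }

    // OPTIONAL is less restrictive than REQUIRED, so it wins whenever present.
    static DataMode leastRestrictive(DataMode a, DataMode b) {
        return (a == DataMode.OPTIONAL || b == DataMode.OPTIONAL)
                ? DataMode.OPTIONAL
                : DataMode.REQUIRED;
    }

    // A column ends up OPTIONAL if any file declares it OPTIONAL or lacks it.
    static Map<String, DataMode> merge(List<Map<String, DataMode>> fileSchemas) {
        Set<String> allColumns = new TreeSet<>();
        for (Map<String, DataMode> file : fileSchemas) {
            allColumns.addAll(file.keySet());
        }
        Map<String, DataMode> table = new TreeMap<>();
        for (String column : allColumns) {
            DataMode mode = DataMode.REQUIRED;
            for (Map<String, DataMode> file : fileSchemas) {
                DataMode inFile = file.get(column);
                // A missing column is treated as OPTIONAL for this file.
                mode = leastRestrictive(mode, inFile == null ? DataMode.OPTIONAL : inFile);
            }
            table.put(column, mode);
        }
        return table;
    }
}
```

For two files where `id` is REQUIRED in both and `name` is REQUIRED in only the first, the merged schema keeps `id` as REQUIRED and makes `name` OPTIONAL, which is the case where a reader would then need to treat a REQUIRED file column as nullable.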


Commits on Aug 28, 2024
