-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28732: Sorted dynamic partition optimization does not apply hive.default.nulls.last #5627
base: master
Are you sure you want to change the base?
HIVE-28732: Sorted dynamic partition optimization does not apply hive.default.nulls.last #5627
Conversation
f33d1f8
to
f3fc77a
Compare
f3fc77a
to
92fc68f
Compare
….default.nulls.last
92fc68f
to
91c7542
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
val_238 238 | ||
val_238 238 | ||
val_238 238 | ||
val_238 238 | ||
val_238_n 238 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this doesn't violate the semantics, I didn't understand why this diff showed up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the query
SELECT * FROM partition_mv_3 where key = 238
does not specify ordering I think it depends on the execution engine internals.
if (sortField.nullOrder() == NullOrder.NULLS_FIRST) { | ||
nullOrdering = NullOrdering.NULLS_FIRST; | ||
} | ||
customSortNullOrder.add(nullOrdering.getCode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be OK as a conclusion, but I am leaving a note that came to my mind.
We added a feature to configure the sort order of Iceberg tables. I wondered if we can keep the semantics of WRITE LOCALLY ORDERED BY a
.
https://github.com/apache/hive/pull/5541/files#r1894857758
The null ordering of sort order specs is required. So, our DDL specifies either NULLS_FIRST or NULLS_LAST, and then createDPContext
just follows the definition. In other words, sortField.nullOrder()
is non-null and this change never takes effect in the real world.
https://github.com/apache/iceberg/blob/8839c9bf1f1d8c9b718f9766302ff8a2018e515f/core/src/main/java/org/apache/iceberg/SortOrderParser.java#L159-L175
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were two failing test without this change.
TestHiveIcebergInserts.testSortedInsert
TestHiveIcebergInserts.testSortedAndTransformedInsertIntoPartitionedTable
When I analyzed the cause of the failure of these tests I bumped into this bug:
sortField.nullOrder() == NullOrder.NULLS_FIRST ? 0 : 1
maps NULLS_FIRST
to 0
which actually means nulls last in Hive
hive/ql/src/java/org/apache/hadoop/hive/ql/util/NullOrdering.java
Lines 30 to 31 in fa6813f
NULLS_FIRST(1, HiveParser.TOK_NULLS_FIRST, NullValueOption.MINVALUE, 'a', RelFieldCollation.NullDirection.FIRST), | |
NULLS_LAST(0, HiveParser.TOK_NULLS_LAST, NullValueOption.MAXVALUE, 'z', RelFieldCollation.NullDirection.LAST); |
We added a feature to configure the sort order of Iceberg tables. I wondered if we can keep the semantics of WRITE LOCALLY ORDERED BY a
I found that null sort order is set here when it was not specified explicitly at create table:
hive/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Line 292 in fa6813f
sortFields = JSON_OBJECT_MAPPER.reader().readValue(sortOderJSONString, SortFields.class); |
and the defaults are defined at the parser level
hive/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
Lines 2205 to 2206 in fa6813f
-> {$orderSpec.tree == null && $nullSpec.tree == null}? | |
^(TOK_TABSORTCOLNAMEASC ^(TOK_NULLS_FIRST identifier)) |
This is not affected by the setting hive.default.nulls.last
.
sortField.nullOrder() is non-null and this change never takes effect in the real world.
I don't understand this part. Could you please elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this PR, assuming we don't change the semantics of customSortNullOrder.add(sortField.nullOrder() == NullOrder.NULLS_FIRST ? 0 : 1);
. Now, I understand 0 and 1 will be correctly inverted. Let me check...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved again.
if (i == 0) { | ||
nullOrderStr += "a"; | ||
} else { | ||
nullOrderStr += "z"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my note. HiveIcebergStorageHandler
mapped NULLS_FIRST to 0 and NULLS_LAST to 1. It should have been inverted for the consistency with NullOrdering#getCode
. But the inconsistency didn't introduce real issues because this part didn't also follow NullOrdering#getCode
.
|
||
for (Integer ignored : keyColsPosInVal) { | ||
newSortNullOrder.add(nullOrder); | ||
nullOrder = NullOrdering.fromCode(sortNullOrder.get(0)).getSign(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered why we pick up only the first element of sortOrder
and sortNullOrder
. This question doesn't directly relate to the problem of this ticket
What changes were proposed in this pull request?
hive.default.nulls.last
.NullOrder.NULLS_FIRST
-> HiveNullOrdering.NULLS_LAST
and vice versa.Why are the changes needed?
To unify nulls ordering.
Does this PR introduce any user-facing change?
No but explain output of affected queries may change.
Is the change a dependency upgrade?
No.
How was this patch tested?