-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28732: Sorted dynamic partition optimization does not apply hive.default.nulls.last #5627
Open
kasakrisz
wants to merge
1
commit into
apache:master
Choose a base branch
from
kasakrisz:HIVE-28732-master-sorted-dyn-part-opt-nulls
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be OK as a conclusion, but I am leaving a note that came to my mind.
We added a feature to configure the sort order of Iceberg tables. I wondered if we can keep the semantics of
WRITE LOCALLY ORDERED BY a
.https://github.com/apache/hive/pull/5541/files#r1894857758
The null ordering of sort order specs is required. So, our DDL specifies either NULLS_FIRST or NULLS_LAST, and then
createDPContext
just follows the definition. In other words,sortField.nullOrder()
is non-null and this change never takes effect in the real world.https://github.com/apache/iceberg/blob/8839c9bf1f1d8c9b718f9766302ff8a2018e515f/core/src/main/java/org/apache/iceberg/SortOrderParser.java#L159-L175
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were two failing test without this change.
When I analyzed the cause of the failure of these tests I bumped into this bug:
maps
NULLS_FIRST
to0
which actually means nulls last in Hivehive/ql/src/java/org/apache/hadoop/hive/ql/util/NullOrdering.java
Lines 30 to 31 in fa6813f
I found that null sort order is set here when it was not specified explicitly at create table:
hive/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Line 292 in fa6813f
and the defaults are defined at the parser level
hive/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
Lines 2205 to 2206 in fa6813f
This is not affected by the setting
hive.default.nulls.last
.I don't understand this part. Could you please elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this PR, assuming we don't change the semantics of
customSortNullOrder.add(sortField.nullOrder() == NullOrder.NULLS_FIRST ? 0 : 1);
. Now, I understand 0 and 1 will be correctly inverted. Let me check...