Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28732: Sorted dynamic partition optimization does not apply hive.default.nulls.last #5627

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kasakrisz
Copy link
Contributor

@kasakrisz kasakrisz commented Jan 31, 2025

What changes were proposed in this pull request?

  • Currently the nulls order of RS keys in Sorted dynamic partition optimization are hardcoded to ASC NULLS FIRST and DESC NULLS LAST. This patch changes the logic to respect the value of setting hive.default.nulls.last.
  • Fix The mapping between Iceberg null order and Hive null order: the current mapping is mixed: Iceberg NullOrder.NULLS_FIRST -> Hive NullOrdering.NULLS_LAST and vice versa.

Why are the changes needed?

To unify nulls ordering.

Does this PR introduce any user-facing change?

No but explain output of affected queries may change.

Is the change a dependency upgrade?

No.

How was this patch tested?

mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=dynpart_sort_optimization.q -pl itests/qtest -Pitests

Copy link

sonarqubecloud bot commented Feb 6, 2025

Copy link
Contributor

@okumin okumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

val_238 238
val_238 238
val_238 238
val_238 238
val_238_n 238
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this doesn't violate the semantics, I didn't understand why this diff showed up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the query

SELECT * FROM partition_mv_3 where key = 238

does not specify ordering I think it depends on the execution engine internals.

if (sortField.nullOrder() == NullOrder.NULLS_FIRST) {
nullOrdering = NullOrdering.NULLS_FIRST;
}
customSortNullOrder.add(nullOrdering.getCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be OK as a conclusion, but I am leaving a note that came to my mind.

We added a feature to configure the sort order of Iceberg tables. I wondered if we can keep the semantics of WRITE LOCALLY ORDERED BY a.
https://github.com/apache/hive/pull/5541/files#r1894857758

The null ordering of sort order specs is required. So, our DDL specifies either NULLS_FIRST or NULLS_LAST, and then createDPContext just follows the definition. In other words, sortField.nullOrder() is non-null and this change never takes effect in the real world.
https://github.com/apache/iceberg/blob/8839c9bf1f1d8c9b718f9766302ff8a2018e515f/core/src/main/java/org/apache/iceberg/SortOrderParser.java#L159-L175

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were two failing test without this change.

TestHiveIcebergInserts.testSortedInsert
TestHiveIcebergInserts.testSortedAndTransformedInsertIntoPartitionedTable

When I analyzed the cause of the failure of these tests I bumped into this bug:

sortField.nullOrder() == NullOrder.NULLS_FIRST ? 0 : 1

maps NULLS_FIRST to 0 which actually means nulls last in Hive

NULLS_FIRST(1, HiveParser.TOK_NULLS_FIRST, NullValueOption.MINVALUE, 'a', RelFieldCollation.NullDirection.FIRST),
NULLS_LAST(0, HiveParser.TOK_NULLS_LAST, NullValueOption.MAXVALUE, 'z', RelFieldCollation.NullDirection.LAST);

We added a feature to configure the sort order of Iceberg tables. I wondered if we can keep the semantics of WRITE LOCALLY ORDERED BY a

I found that null sort order is set here when it was not specified explicitly at create table:

sortFields = JSON_OBJECT_MAPPER.reader().readValue(sortOderJSONString, SortFields.class);

and the defaults are defined at the parser level
-> {$orderSpec.tree == null && $nullSpec.tree == null}?
^(TOK_TABSORTCOLNAMEASC ^(TOK_NULLS_FIRST identifier))

This is not affected by the setting hive.default.nulls.last.

sortField.nullOrder() is non-null and this change never takes effect in the real world.

I don't understand this part. Could you please elaborate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR, assuming we don't change the semantics of customSortNullOrder.add(sortField.nullOrder() == NullOrder.NULLS_FIRST ? 0 : 1);. Now, I understand 0 and 1 will be correctly inverted. Let me check...

Copy link
Contributor

@okumin okumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved again.

if (i == 0) {
nullOrderStr += "a";
} else {
nullOrderStr += "z";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my note. HiveIcebergStorageHandler mapped NULLS_FIRST to 0 and NULLS_LAST to 1. It should have been inverted for the consistency with NullOrdering#getCode. But the inconsistency didn't introduce real issues because this part didn't also follow NullOrdering#getCode.


for (Integer ignored : keyColsPosInVal) {
newSortNullOrder.add(nullOrder);
nullOrder = NullOrdering.fromCode(sortNullOrder.get(0)).getSign();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered why we pick up only the first element of sortOrder and sortNullOrder. This question doesn't directly relate to the problem of this ticket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants