[#6196] feat(iceberg): adjust table distribution if creating table without specifying disribution mode #6214

Merged
merged 5 commits into from
Jan 16, 2025

Conversation

FANNG1
Contributor

@FANNG1 FANNG1 commented Jan 13, 2025

What changes were proposed in this pull request?

Adjust the distribution mode when creating an Iceberg table with NONE distribution. The following is Spark's adjustment logic; Flink's is similar.

```java
  private DistributionMode defaultWriteDistributionMode() {
    if (table.sortOrder().isSorted()) {
      return RANGE;
    } else if (table.spec().isPartitioned()) {
      return HASH;
    } else {
      return NONE;
    }
  }
```
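
The same decision can be sketched as a self-contained Java method. This is a minimal sketch, assuming plain boolean flags in place of Iceberg's `table.sortOrder().isSorted()` and `table.spec().isPartitioned()`; the class `DefaultDistributionSketch` and its `Mode` enum are hypothetical stand-ins, not Gravitino or Iceberg types.

```java
/** Hypothetical stand-in for illustration; not Gravitino's or Iceberg's actual classes. */
final class DefaultDistributionSketch {

  /** Mirrors the three distribution modes named in the Spark snippet. */
  enum Mode { NONE, HASH, RANGE }

  /**
   * Picks a default when the client did not specify a distribution:
   * sorted tables get RANGE, partitioned tables get HASH, everything else NONE.
   */
  static Mode defaultMode(boolean isSorted, boolean isPartitioned) {
    if (isSorted) {
      return Mode.RANGE;
    } else if (isPartitioned) {
      return Mode.HASH;
    }
    return Mode.NONE;
  }

  public static void main(String[] args) {
    System.out.println(defaultMode(true, true));   // RANGE: sort order is checked first
    System.out.println(defaultMode(false, true));  // HASH
    System.out.println(defaultMode(false, false)); // NONE
  }
}
```

Note that the sort-order check comes first, so a table that is both sorted and partitioned resolves to RANGE.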

Why are the changes needed?

Fix: #6196

Does this PR introduce any user-facing change?

Yes. Documentation is added.

How was this patch tested?

Added unit tests (UT) and integration tests (IT).

@FANNG1 FANNG1 marked this pull request as draft January 13, 2025 11:23
@FANNG1 FANNG1 changed the title [#6196] feat(iceberg): adjust distribution if creating table with none disribution. [#6196] feat(iceberg): adjust distribution if creating table without specifying disribution mode Jan 13, 2025
@FANNG1 FANNG1 changed the title [#6196] feat(iceberg): adjust distribution if creating table without specifying disribution mode [#6196] feat(iceberg): adjust table distribution if creating table without specifying disribution mode Jan 13, 2025
@FANNG1 FANNG1 self-assigned this Jan 16, 2025
@FANNG1 FANNG1 added the 0.8.0 Release v0.8.0 label Jan 16, 2025
@FANNG1 FANNG1 force-pushed the iceberg_distribution branch 3 times, most recently from ac4876f to 75c9027 Compare January 16, 2025 03:07
@FANNG1 FANNG1 force-pushed the iceberg_distribution branch from 75c9027 to b6b71fb Compare January 16, 2025 03:14
@FANNG1 FANNG1 marked this pull request as ready for review January 16, 2025 03:18
@FANNG1
Contributor Author

FANNG1 commented Jan 16, 2025

@jerryshao @jerqi PTAL

@FANNG1 FANNG1 added branch-0.8 Automatically cherry-pick commit to branch-0.8 and removed 0.8.0 Release v0.8.0 labels Jan 16, 2025
@@ -588,6 +596,16 @@ public void testConnection(
}
}

private static Distribution getIcebergDefaultDistribution(
Boolean isSorted, Boolean isPartitioned) {
Contributor

Boolean -> boolean

Contributor Author

updated
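
A likely motivation for the `Boolean -> boolean` suggestion is that a boxed `Boolean` parameter can be `null` and then fails on auto-unboxing, while a primitive `boolean` cannot. The sketch below illustrates that Java behavior in isolation; `BoxedFlagSketch` and its methods are hypothetical and are not the PR's code.

```java
/** Hypothetical illustration of boxed vs. primitive flags; not code from the PR. */
final class BoxedFlagSketch {

  /** Boxed parameter: the condition auto-unboxes, so a null argument throws NullPointerException. */
  static String withBoxed(Boolean isSorted) {
    return isSorted ? "RANGE" : "NONE";
  }

  /** Primitive parameter: callers cannot pass null, so no null check is needed. */
  static String withPrimitive(boolean isSorted) {
    return isSorted ? "RANGE" : "NONE";
  }

  public static void main(String[] args) {
    System.out.println(withPrimitive(true));
    try {
      withBoxed(null);
    } catch (NullPointerException e) {
      System.out.println("a null boxed flag fails at runtime");
    }
  }
}
```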

@@ -513,6 +514,13 @@ public Table createTable(
.build())
.toArray(IcebergColumn[]::new);

// Gravitino NONE distribution means the client side doesn't specify distribution not the same
Contributor

distribution not the -> distribution, which is not the same as none distribution in Iceberg.

Contributor Author

updated

@@ -513,6 +514,13 @@ public Table createTable(
.build())
.toArray(IcebergColumn[]::new);

// Gravitino NONE distribution means the client side doesn't specify distribution, which is
Contributor

Actually, this doesn't seem accurate to me. The unspecified distribution and the unpartitioned case should be treated differently, but that may require changing more code. This fix is probably still better than the legacy implementation, so I approve it.
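
One way the distinction raised in this comment could be modeled is to represent "not specified" separately from an explicit NONE, for example with `Optional`. This is only a sketch under that assumption and is not what the PR implements (the PR treats Gravitino's NONE as "unspecified"); `ExplicitVsUnspecifiedSketch`, `Mode`, and `resolve` are hypothetical names.

```java
import java.util.Optional;

/** Hypothetical sketch: keeps "unspecified" distinct from an explicit NONE. */
final class ExplicitVsUnspecifiedSketch {

  enum Mode { NONE, HASH, RANGE }

  /**
   * An explicit request (including an explicit NONE) is honored as-is.
   * Only an empty Optional falls back to the table-derived default.
   */
  static Mode resolve(Optional<Mode> requested, boolean isSorted, boolean isPartitioned) {
    if (requested.isPresent()) {
      return requested.get();
    }
    if (isSorted) {
      return Mode.RANGE;
    }
    return isPartitioned ? Mode.HASH : Mode.NONE;
  }

  public static void main(String[] args) {
    // Explicit NONE on a partitioned table stays NONE.
    System.out.println(resolve(Optional.of(Mode.NONE), false, true));
    // Unspecified on the same table falls back to HASH.
    System.out.println(resolve(Optional.empty(), false, true));
  }
}
```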

@FANNG1 FANNG1 merged commit e6225a0 into apache:main Jan 16, 2025
28 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 16, 2025
…thout specifying disribution mode (#6214)

Labels
branch-0.8 Automatically cherry-pick commit to branch-0.8
Development

Successfully merging this pull request may close these issues.

[Improvement] correct the behaviors when creating Iceberg table with none distribution
3 participants