Fix handling of hidden objects in migration during add_files/add_files_from_table #24092

Open: wants to merge 1 commit into master

uditvarshney (Contributor)

Fixes #23891

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

cla-bot added the cla-signed label on Nov 10, 2024
github-actions added the iceberg (Iceberg connector) label on Nov 10, 2024
ebyhr (Member) left a comment:

Please add tests to TestIcebergMigrateProcedure and TestIcebergAddFilesProcedure.

uditvarshney (Contributor, Author) replied:

> Please add tests to TestIcebergMigrateProcedure and TestIcebergAddFilesProcedure.

Added the test cases.

wendigo requested a review from ebyhr on Nov 12, 2024, 15:27
@@ -109,7 +109,7 @@ public static List<DataFile> buildDataFiles(
 FileEntry file = files.next();
 String fileLocation = file.location().toString();
 String relativePath = fileLocation.substring(location.length());
-if (relativePath.contains("/_") || relativePath.contains("/.")) {
+if (relativePath.contains("/_") || fileLocation.contains("/.")) {
ebyhr (Member) commented on Nov 12, 2024:

This change seems wrong. As far as I know, Hive doesn't ignore files whose parent directory starts with a dot (`.`).
The reason we used `relativePath` here was to apply the hidden-file check only to the part of the path under the table location.
You can investigate Hive's behavior with `S3HiveQueryRunner`.
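To make the reviewer's point concrete, here is a minimal standalone sketch that contrasts the original and proposed conditions from the diff. It is not Trino's implementation: the class `HiddenPathCheck`, the methods `isHiddenRelative`/`isHiddenProposed`, and the sample `s3://` paths are hypothetical. Only the two `contains` conditions and the `substring(location.length())` step come from the diff. It also assumes the table location has no trailing slash, so `relativePath` starts with `/`.

```java
import java.util.List;

// Hypothetical sketch contrasting the two hidden-file conditions from the diff.
// Not Trino code; names and sample paths are illustrative assumptions.
final class HiddenPathCheck
{
    private HiddenPathCheck() {}

    // Original condition: both checks run only on the part of the path
    // below the table location (assumes `location` has no trailing slash).
    static boolean isHiddenRelative(String location, String fileLocation)
    {
        String relativePath = fileLocation.substring(location.length());
        return relativePath.contains("/_") || relativePath.contains("/.");
    }

    // Proposed condition: the "/." check runs on the full file location,
    // so a dot-prefixed directory in the table location itself also matches.
    static boolean isHiddenProposed(String location, String fileLocation)
    {
        String relativePath = fileLocation.substring(location.length());
        return relativePath.contains("/_") || fileLocation.contains("/.");
    }

    public static void main(String[] args)
    {
        // Hypothetical table location whose parent directory starts with "."
        String location = "s3://bucket/.staging/orders";
        List<String> files = List.of(
                location + "/part-0.parquet",
                location + "/_SUCCESS",
                location + "/.tmp/part-1.parquet");
        for (String file : files) {
            System.out.printf("%-50s original=%-5b proposed=%b%n",
                    file, isHiddenRelative(location, file), isHiddenProposed(location, file));
        }
    }
}
```

Under these assumptions, `part-0.parquet` is kept by the original check but skipped by the proposed one, because the `.staging` directory in the table location matches `/.`. That is the case the reviewer raises: if Hive does read files under a dot-prefixed parent directory, the proposed condition would drop files Hive still treats as table data.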

Labels: cla-signed, iceberg (Iceberg connector)

Development: successfully merging this pull request may close this issue:
Fix handling of hidden objects in migrate and add_files/add_files_from_table procedures

2 participants