[core] Introduce deletion files to DataSplit #2988

Merged: 2 commits merged on Mar 12, 2024

JingsongLi
Contributor

Purpose

Deletion files should be carried in the Split, and the reader should apply the deletion files from the Split instead of reading the deletion manifest again, which is very costly.
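A minimal sketch of what carrying deletion files in a split could look like. `DeletionFileRef` and `SplitSketch` are hypothetical names, not Paimon's actual DataSplit API; the path/offset/length fields only mirror the `deletionFile.path()`, `offset()`, and `length()` accessors used later in this review. Each data file has at most one deletion file, so the reader can look it up by index instead of re-reading the deletion manifest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical pointer to one serialized deletion vector: a byte range inside an index file. */
final class DeletionFileRef {
    final String path;  // index file that holds the serialized deletion vector
    final long offset;  // start of this vector's bytes within that file
    final long length;  // number of bytes to read

    DeletionFileRef(String path, long offset, long length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical split: data files plus a parallel list of nullable deletion files. */
final class SplitSketch {
    private final List<String> dataFiles;
    // Same size as dataFiles; a null entry means that data file has no deleted rows.
    private final List<DeletionFileRef> deletionFiles;

    SplitSketch(List<String> dataFiles, List<DeletionFileRef> deletionFiles) {
        if (deletionFiles.size() != dataFiles.size()) {
            throw new IllegalArgumentException("deletionFiles must align with dataFiles");
        }
        // ArrayList copies tolerate null entries (List.of would reject them).
        this.dataFiles = Collections.unmodifiableList(new ArrayList<>(dataFiles));
        this.deletionFiles = Collections.unmodifiableList(new ArrayList<>(deletionFiles));
    }

    List<String> dataFiles() {
        return dataFiles;
    }

    /** The reader gets the deletion file for data file i directly, with no manifest scan. */
    Optional<DeletionFileRef> deletionFileFor(int i) {
        return Optional.ofNullable(deletionFiles.get(i));
    }
}
```

Keeping the two lists aligned by index keeps the split a plain value object, which fits the goal of letting any reader consume it without extra lookups.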

Tests

API and Format

Documentation

Contributor

Zouxxyy left a comment

Thanks for the change! I left some questions.

        file.schemaId(), fileName, file.fileSize(), file.level());
if (deletionFile != null) {
    DeletionVector deletionVector =
            DeletionVector.read(fileIO, deletionFile);
Contributor

Maybe we can add this to IndexFileHandler; its readDeletionVector performs version and CRC verification:

public DeletionVector readDeletionVector(DeletionFile deletionFile) {
    return deletionVectorsIndex.readDeletionVector(
            new Path(deletionFile.path()),
            Pair.of(deletionFile.offset(), deletionFile.length()));
}
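To illustrate what "version and CRC verification" on a deletion-file byte range might involve, here is a self-contained sketch. The blob layout (1-byte version, payload, 4-byte big-endian CRC32 of the payload) and the names `VerifiedBlobReader`, `readVerified`, and `writeBlob` are assumptions for illustration, not Paimon's on-disk format, and the sketch reads from an in-memory byte array rather than through `fileIO`.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical reader for one deletion-vector blob; the layout is assumed, not Paimon's. */
final class VerifiedBlobReader {
    static final byte EXPECTED_VERSION = 1; // hypothetical format version

    /**
     * Assumed layout of the bytes at [offset, offset + length):
     * 1 byte version | payload | 4-byte big-endian CRC32 of the payload.
     */
    static byte[] readVerified(byte[] file, int offset, int length) {
        if (length < 5 || offset < 0 || offset + length > file.length) {
            throw new IllegalArgumentException("range cannot hold a versioned, checksummed blob");
        }
        ByteBuffer buf = ByteBuffer.wrap(file, offset, length); // big-endian by default
        byte version = buf.get();
        if (version != EXPECTED_VERSION) {
            throw new IllegalStateException("unsupported version " + version);
        }
        byte[] payload = new byte[length - 5];
        buf.get(payload);
        int storedCrc = buf.getInt();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        // getValue() holds the checksum in the low 32 bits; compare as int.
        if ((int) crc.getValue() != storedCrc) {
            throw new IllegalStateException("CRC mismatch: blob is corrupted");
        }
        return payload;
    }

    /** Writes a blob in the same assumed layout (handy for tests). */
    static byte[] writeBlob(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 5)
                .put(EXPECTED_VERSION)
                .put(payload)
                .putInt((int) crc.getValue())
                .array();
    }
}
```

A static helper like this keeps the check free of handler classes such as IndexFileHandler, which is the kind of minimal dependency surface the author's reply argues for.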

Contributor Author

I don't want to introduce IndexFileHandler here. Imagine a C++ reader: it should not depend on any complex classes. The simpler, the better.

if (deletionFile != null) {
    DeletionVector deletionVector =
            DeletionVector.read(fileIO, deletionFile);
    reader = ApplyDeletionVectorReader.create(reader, deletionVector);
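The wrapping step `reader = ApplyDeletionVectorReader.create(reader, deletionVector)` can be pictured as a filter over row positions. A minimal sketch, assuming a deletion vector is a set of deleted row positions within one data file; `java.util.BitSet` stands in for whatever bitmap the real `DeletionVector` uses, and `BitSetDeletionVector` and `DeletionFilteringIterator` are hypothetical names, not Paimon classes.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a deletion vector: the set of deleted row positions. */
final class BitSetDeletionVector {
    private final BitSet deleted = new BitSet();

    void delete(int position) {
        deleted.set(position);
    }

    boolean isDeleted(long position) {
        return position <= Integer.MAX_VALUE && deleted.get((int) position);
    }
}

/** Hypothetical wrapper: yields only rows whose file position is not marked deleted. */
final class DeletionFilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> rows;
    private final BitSetDeletionVector dv;
    private long position = 0; // position in the data file of the next row pulled from `rows`
    private T pending;
    private boolean hasPending;

    DeletionFilteringIterator(Iterator<T> rows, BitSetDeletionVector dv) {
        this.rows = rows;
        this.dv = dv;
        advance();
    }

    /** Pulls rows until one survives the deletion vector or the input is exhausted. */
    private void advance() {
        hasPending = false;
        pending = null;
        while (rows.hasNext()) {
            T row = rows.next();
            long pos = position++; // counted before filtering, so deleted rows still advance it
            if (!dv.isDeleted(pos)) {
                pending = row;
                hasPending = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasPending) {
            throw new NoSuchElementException();
        }
        T result = pending;
        advance();
        return result;
    }

    /** Mirrors the reviewed code's shape: wrap only when a deletion vector exists. */
    static <R> Iterator<R> create(Iterator<R> rows, BitSetDeletionVector dvOrNull) {
        return dvOrNull == null ? rows : new DeletionFilteringIterator<>(rows, dvOrNull);
    }
}
```

Positions are counted on the underlying iterator, before filtering. If the counter advanced only for surviving rows, every row after the first deletion would be checked against the wrong position.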
Contributor

I'm worried this may conflict with readerFactoryBuilder.withDeletionVectorSupplier in the future.

Contributor Author

JingsongLi, Mar 12, 2024

Unified here.

Contributor

Zouxxyy left a comment

+1

JingsongLi merged commit 1024c56 into apache:master on Mar 12, 2024
8 of 9 checks passed
Contributor

Zouxxyy commented Mar 13, 2024

#2898
