Convert RunEndEncoded field to Parquet #8069

Draft: wants to merge 1 commit into main from ree-to-parquet

@vegarsti vegarsti commented Aug 6, 2025

Which issue does this PR close?

Rationale for this change

TODO

What changes are included in this PR?

TODO

Are these changes tested?

TODO

Are there any user-facing changes?

TODO

@github-actions github-actions bot added the parquet (Changes to the parquet crate) and arrow (Changes to the arrow crate) labels Aug 6, 2025
@vegarsti vegarsti force-pushed the ree-to-parquet branch 3 times, most recently from ec40c7f to 5a43f13 Compare August 10, 2025 05:29
Comment on lines +225 to +228
DataType::RunEndEncoded(_, v) if is_leaf(v.data_type()) => {
    let levels = ArrayLevels::new(parent_ctx, is_nullable, array.clone());
    Ok(Self::Primitive(levels))
}
Author

This is exactly the same as the Dictionary arm above:

DataType::Dictionary(_, v) if is_leaf(v.as_ref()) => {
    let levels = ArrayLevels::new(parent_ctx, is_nullable, array.clone());
    Ok(Self::Primitive(levels))
}
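
For readers comparing the two arms, here is a minimal sketch of why treating both as a primitive leaf is plausible: a dictionary-encoded column and a run-end-encoded column are both compact encodings of one flat sequence of values. The sketch uses plain slices only. `expand_dictionary` and `expand_runs` are hypothetical helper names chosen for illustration; they are not part of arrow-rs or of this PR. It assumes run ends are cumulative, exclusive end indices, as in the Arrow run-end-encoded layout.

```rust
// Illustrative only: plain-std stand-ins, not the arrow-rs array types.

/// Expand a dictionary-encoded column: each key indexes into `values`.
fn expand_dictionary<T: Clone>(keys: &[usize], values: &[T]) -> Vec<T> {
    keys.iter().map(|&k| values[k].clone()).collect()
}

/// Expand a run-end-encoded column. `run_ends[i]` is the exclusive logical
/// end index of run `i`, so run `i` covers `run_ends[i - 1]..run_ends[i]`
/// (the first run starts at 0).
fn expand_runs<T: Clone>(run_ends: &[usize], values: &[T]) -> Vec<T> {
    assert_eq!(run_ends.len(), values.len(), "one value per run");
    let mut out = Vec::with_capacity(run_ends.last().copied().unwrap_or(0));
    let mut start = 0;
    for (&end, value) in run_ends.iter().zip(values) {
        assert!(end > start, "run ends must be strictly increasing");
        // Repeat the run's value once per logical row it covers.
        out.extend(std::iter::repeat(value.clone()).take(end - start));
        start = end;
    }
    out
}
```

With values `["a", "b", "c"]`, the run ends `[2, 5, 6]` and the dictionary keys `[0, 0, 1, 1, 1, 2]` expand to the same six logical rows. Whether the Parquet writer can rely on this equivalence for level computation is exactly what the comment above leaves open.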

@@ -59,6 +59,28 @@ macro_rules! downcast_dict_op {
};
}

macro_rules! downcast_ree_impl {
Author

This mimics downcast_dict.

@@ -167,6 +168,7 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
can_cast_types(from_key.data_type(), to_key.data_type()) && can_cast_types(from_value.data_type(), to_value.data_type()),
_ => false
},
// TODO: RunEndEncoded here?
Author

This is handled in #7713

ArrowDataType::FixedSizeBinary(_) => out.push(bytes(leaves.next().unwrap())?),
_ => out.push(col(leaves.next().unwrap())?),
},
ArrowDataType::RunEndEncoded(_run_ends, value_type) => match value_type.data_type() {
Author

I've basically copied what Dictionary does. Not sure if this is correct!

Labels
arrow (Changes to the arrow crate), parquet (Changes to the parquet crate)
Development

Successfully merging this pull request may close these issues.

Support converting RunEndEncodedType to parquet
1 participant