Skip to content

Introduce MetricsMaxInferredColumnDefaultsStrategy #13039

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

jkolash
Copy link
Contributor

@jkolash jkolash commented May 13, 2025

These changes address issue #11253 allowing for setting of a new default strategy that considers the total number of field metrics rather than just the number of top level columns.

the new property is
write.metadata.metrics.max-inferred-column-defaults.strategy

and valid values would be original, depth, breadth

It currently preserves the original default behavior as changing
that may be a more disruptive change as it could lead to unexpected
performance regressions. A Breadth first strategy would likely be most
compatible with the original strategy so it would be safer to default
into vs the depth strategy. The original strategy could then be
deprecated and removed in the future

This could also easily support a previously discussed feature of
reversing order of field ids for considering defaults. Though that
won't be included in this PR

I'm inclined to remove the depth strategy unless there is a strong
desire to keep it.

These changes address issue apache#11253 allowing for setting
of a new default strategy that considers the total number
of field metrics rather than just the number of top level columns.

the new property is
```write.metadata.metrics.max-inferred-column-defaults.strategy```

and valid values would be ```original, depth, breadth```

It currently preserves the original default behavior as changing
that may be a more disruptive change as it could lead to unexpected
performance regressions. A Breadth first strategy would likely be most
compatible with the original strategy so it would be safer to default
into vs the depth strategy. The original strategy could then be
deprecated and removed in the future

This could also easily support a previously discussed feature of
reversing order of field ids for considering defaults. Though that
won't be included in this PR

I'm inclined to remove the depth strategy unless there is a strong
desire to keep it.
… level

columns list not the number of total projected fieldIds from the schema.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant