
feat: add to concat different data types error message the data types #7166

Merged: 5 commits merged into apache:main on Mar 8, 2025


rluvaton
Contributor

Which issue does this PR close?

N/A

Rationale for this change

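Better debugging experience: when concat fails because the input arrays have different data types, the error message now says which data types were involved.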

What changes are included in this PR?

Added the unique data types of the input arrays to the concat error message, and updated the tests accordingly.

Are there any user-facing changes?

Yes, users will see a more helpful error message.

github-actions bot added the arrow label (Changes to the arrow crate) on Feb 20, 2025
@tustvold
Contributor

I wonder if we need to incorporate some sort of cardinality limit here, e.g. similar to what we do when printing long arrays. I think this could potentially lead to long error messages, which in turn can lead to application hangs that are hard to diagnose.

WDYT?

.map(|dt| format!("{dt}"))
.collect::<Vec<_>>();

// Only sort in tests to make the error message deterministic
Contributor


Perhaps we could just use a BTreeSet? It will be slightly slower, but I think non-deterministic error messages would be surprising for people.
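A minimal sketch of the BTreeSet suggestion, for illustration only. It assumes the data types are already formatted as strings (a stand-in for arrow's `DataType`), and `distinct_types_sorted` is a hypothetical helper, not code from this PR. Because a `BTreeSet` iterates in sorted order, the resulting message is deterministic regardless of hashing, though it no longer reflects the order of the input arrays.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: joins the distinct data type names in sorted order.
/// Assumption: each data type is represented by its display string, so the
/// ordering here is plain lexicographic string ordering.
fn distinct_types_sorted(types: &[&str]) -> String {
    // A BTreeSet both removes duplicates and iterates in sorted order,
    // so the output does not depend on hash seeds.
    let unique: BTreeSet<&str> = types.iter().copied().collect();
    unique.into_iter().collect::<Vec<_>>().join(", ")
}
```

For example, `distinct_types_sorted(&["Utf8", "Int32", "Utf8"])` yields `"Int32, Utf8"`.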

Contributor Author


I kept the HashSet, but only to track which data types have already been seen. The error message now lists the data types in the order they appear in the input, which is deterministic and also gives people a sense of where in the input each type occurs.
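A sketch of the approach described above, again assuming string stand-ins for `DataType`; `distinct_types_in_input_order` is a hypothetical name, not the function added in this PR. The `HashSet` only answers "seen before?", while a `Vec` records each type the first time it appears, so the output follows input order.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns each distinct data type once, in the order
/// it first appears among the input arrays.
fn distinct_types_in_input_order<'a>(types: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut ordered = Vec::new();
    for &dt in types {
        // `insert` returns true only the first time a value is added,
        // so the set is used purely for membership tracking.
        if seen.insert(dt) {
            ordered.push(dt);
        }
    }
    ordered
}
```

Unlike the sorted variant, the first array's type always comes first, which hints at where the mismatch starts.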

rluvaton added 2 commits March 6, 2025 13:59
and also change the data type order to appear in the same order as the arrays for easier debugging
@rluvaton
Contributor Author

rluvaton commented Mar 6, 2025

> I wonder if we need to incorporate some sort of cardinality limit here, e.g. similar to what we do when printing long arrays. I think this could potentially lead to long error messages, which in turn can lead to application hangs that are hard to diagnose.
>
> WDYT?

I've added a limit of 10 data types to the error message.
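A sketch of how such a cap might look. The value 10 comes from the comment above; everything else (the helper `concat_type_error`, the message wording, counting distinct types rather than arrays, and the "... and N more" suffix) is an assumption for illustration, not arrow's actual implementation or error text.

```rust
use std::collections::HashSet;

// Assumed cap on how many distinct data types the message lists.
const MAX_TYPES_IN_MESSAGE: usize = 10;

/// Hypothetical sketch: builds an error message listing at most
/// MAX_TYPES_IN_MESSAGE distinct data types in input order, followed by a
/// count of any distinct types that were left out.
fn concat_type_error(types: &[&str]) -> String {
    // De-duplicate while preserving first-seen order.
    let mut seen = HashSet::new();
    let distinct: Vec<&str> = types.iter().copied().filter(|dt| seen.insert(*dt)).collect();

    // Keep only the first MAX_TYPES_IN_MESSAGE entries in the message.
    let shown = distinct
        .iter()
        .take(MAX_TYPES_IN_MESSAGE)
        .copied()
        .collect::<Vec<_>>()
        .join(", ");
    let omitted = distinct.len().saturating_sub(MAX_TYPES_IN_MESSAGE);

    // Illustrative wording only; not arrow's real error text.
    let mut msg = format!("cannot concatenate arrays of different data types: {shown}");
    if omitted > 0 {
        msg.push_str(&format!(", ... and {omitted} more"));
    }
    msg
}
```

With 12 distinct types `T0` through `T11`, the message lists `T0` through `T9` and ends with `... and 2 more`, so its length stays bounded however many arrays are passed in.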

Contributor

alamb left a comment


Thank you @rluvaton and @tustvold. This looks like a very nice and useful improvement to me.

tustvold merged commit f5138fc into apache:main on Mar 8, 2025
26 checks passed
rluvaton deleted the add-to-concat-what-is-incompatible branch on March 9, 2025 11:58
3 participants