Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44501: [C#] Use an index lookup for O(1) field index access #44633

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vthemelis
Copy link

@vthemelis vthemelis commented Nov 4, 2024

Fixes #44501

Rationale for this change

This should speed up the reasonably common operation of looking up a column via name in Schema or RecordBatch.

I would argue that the fact that this operation was previously O(N) (now O(1)) is unintuitive and could easily lead to accidental performance woes.

Note that I would like to also replace the existing Lookups with signature string -> Field but unfortunately those use StringComparer.Default instead of StringComparer.CurrentCulture (which I'm using to make the changed public function backwards compatible). Not sure if this is intentional.

What changes are included in this PR?

This PR simply memoizes the index lookup of the fields. It should not break any existing behaviour.

It also fixes a recent regression about the behaviour of the changed function in the case of using a custom comparator and have no matches in the schema.

Are these changes tested?

All code paths changed are covered by new unit tests.

Are there any user-facing changes?

Yes, the lookup function throws an exception when there are no matches fuels rather than returning -1. This was a regression introduces O(days) ago.

Copy link

github-actions bot commented Nov 4, 2024

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@vthemelis vthemelis changed the title Use an index lookup for O(1) field index access GH-44501: [C#] Use an index lookup for O(1) field index access Nov 4, 2024
Copy link

github-actions bot commented Nov 4, 2024

⚠️ GitHub issue #44501 has been automatically assigned in GitHub to PR creator.

@vthemelis vthemelis force-pushed the constant-col-lookup branch 4 times, most recently from ce4dda4 to c7a0c58 Compare November 4, 2024 16:45
@vthemelis
Copy link
Author

@CurtHagenlocher it looks like my tests are failing in CI but I can't replicate this locally.

@vthemelis
Copy link
Author

Oh, I see the issue. The CI does a merge into main first and this has changed since my PR: #44576

The PR above has changed the lookup behaviour when the field is not matched.

CC @georgevanburgh

@vthemelis vthemelis force-pushed the constant-col-lookup branch 2 times, most recently from e6d024d to 52b374a Compare November 5, 2024 10:26
@vthemelis
Copy link
Author

@CurtHagenlocher and @georgevanburgh I've reverted to the original behaviour (throwing an InvalidOperationException) for now. Let me know if you prefer keeping the return -1 regression.

@CurtHagenlocher
Copy link
Contributor

I'm a bit tied up with work but will look at this by Friday.

@vthemelis
Copy link
Author

I'm a bit tied up with work but will look at this by Friday.

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C#] Column(string) method in RecordBatch is linear to the number of columns
2 participants