-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-44501: [C#] Use an index lookup for O(1)
field index access
#44633
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
O(1)
field index access
|
ce4dda4
to
c7a0c58
Compare
@CurtHagenlocher it looks like my tests are failing in CI but I can't replicate this locally. |
c7a0c58
to
21a86cd
Compare
Oh, I see the issue. The CI does a merge into The PR above has changed the lookup behaviour when the field is not matched. |
e6d024d
to
52b374a
Compare
@CurtHagenlocher and @georgevanburgh I've reverted to the original behaviour (throwing an |
52b374a
to
d099cdf
Compare
I'm a bit tied up with work but will look at this by Friday. |
Thank you very much! |
Fixes #44501
Rationale for this change
This should speed up the reasonably common operation of looking up a column via name in
Schema
orRecordBatch
.I would argue that the fact that this operation was previously O(N) (now O(1)) is unintuitive and could easily lead to accidental performance woes.
Note that I would like to also replace the existing Lookups with signature
string -> Field
but unfortunately those useStringComparer.Default
instead ofStringComparer.CurrentCulture
(which I'm using to make the changedpublic
function backwards compatible). Not sure if this is intentional.What changes are included in this PR?
This PR simply memoizes the index lookup of the fields. It should not break any existing behaviour.
It also fixes a recent regression about the behaviour of the changed function in the case of using a custom comparator and have no matches in the schema.
Are these changes tested?
All code paths changed are covered by new unit tests.
Are there any user-facing changes?
Yes, the lookup function throws an exception when there are no matches fuels rather than returning -1. This was a regression introduces O(days) ago.
Column(string)
method inRecordBatch
is linear to the number of columns #44501