PERF-#7397: Avoid materializing index/columns in shape checks #7398
What do these changes do?
Calling `len(pd.DataFrame(...))` will currently materialize the frame's Index and return the length of the `pd.Index` object. This PR adds a `get_axis_len` method to the query compiler to potentially avoid this materialization when determining the length of the columns or index.

This may not make a large difference for existing backends, as the underlying `PandasDataFrame` caches the index/column labels together with the length of that axis. However, other backends may choose to cache the shape separate from the actual labels, and this extra method lets us potentially avoid materializing those labels. As such, frontend methods that previously called `len(df.index)` should instead call the equivalent `len(df)` to avoid potentially triggering this materialization.

flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date
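
To illustrate the idea behind `get_axis_len` described above, here is a minimal toy sketch, not Modin's actual code. The classes `LazyLabels`, `ToyQueryCompiler`, and `ToyFrame` are hypothetical, and the `get_axis_len(axis)` signature is an assumption made for illustration. It shows a backend that knows an axis length without having built the labels, so `len(df)` can skip materialization while `len(df.index)` cannot.

```python
class LazyLabels:
    """Hypothetical axis labels: length is known up front, labels are built lazily."""

    def __init__(self, length):
        self._length = length
        self._labels = None
        self.materialized = False  # flips to True once labels are built

    def __len__(self):
        # Cheap: the length is cached separately from the labels.
        return self._length

    def materialize(self):
        # Stand-in for an expensive operation (e.g. gathering labels from partitions).
        if self._labels is None:
            self._labels = list(range(self._length))
            self.materialized = True
        return self._labels


class ToyQueryCompiler:
    """Hypothetical query compiler; not the real Modin class."""

    def __init__(self, n_rows, n_cols):
        self._axes = (LazyLabels(n_rows), LazyLabels(n_cols))

    @property
    def index(self):
        # Accessing the labels forces materialization.
        return self._axes[0].materialize()

    def get_axis_len(self, axis):
        # Assumed signature: axis 0 is the index, axis 1 is the columns.
        return len(self._axes[axis])


class ToyFrame:
    """Hypothetical frontend object that delegates to the query compiler."""

    def __init__(self, query_compiler):
        self._query_compiler = query_compiler

    @property
    def index(self):
        return self._query_compiler.index

    def __len__(self):
        # Ask for the axis length directly instead of len(self.index).
        return self._query_compiler.get_axis_len(0)


qc = ToyQueryCompiler(n_rows=1000, n_cols=4)
df = ToyFrame(qc)
n_rows = len(df)  # does not build the row labels
built_after_len = qc._axes[0].materialized
n_rows_via_index = len(df.index)  # builds the row labels as a side effect
built_after_index = qc._axes[0].materialized
```

In this sketch both calls return the same number; only the second one pays for building the labels, which is the cost the PR aims to avoid in frontend shape checks.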