feat: support to read IVF partitions #3462

Merged: 16 commits merged on Feb 21, 2025

@BubbleCal BubbleCal commented Feb 19, 2025

No description provided.

@github-actions github-actions bot added enhancement New feature or request python labels Feb 19, 2025
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
@BubbleCal BubbleCal marked this pull request as ready for review February 19, 2025 08:22
codecov-commenter commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 1.60000% with 123 lines in your changes missing coverage. Please review.

Project coverage is 78.71%. Comparing base (cca98fc) to head (f8a8205).

Files with missing lines                       Patch %   Lines
rust/lance/src/index/vector/pq.rs              0.00%     34 Missing ⚠️
rust/lance/src/index.rs                        6.06%     31 Missing ⚠️
rust/lance-index/src/vector/hnsw/index.rs      0.00%     21 Missing ⚠️
rust/lance/src/index/vector/ivf/v2.rs          0.00%     21 Missing ⚠️
rust/lance/src/index/vector/ivf.rs             0.00%     7 Missing ⚠️
rust/lance-index/src/vector.rs                 0.00%     3 Missing ⚠️
rust/lance/src/index/vector/fixture_test.rs    0.00%     3 Missing ⚠️
rust/lance/src/session/index_extension.rs      0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3462      +/-   ##
==========================================
- Coverage   78.82%   78.71%   -0.11%     
==========================================
  Files         251      251              
  Lines       92866    92983     +117     
  Branches    92866    92983     +117     
==========================================
- Hits        73202    73192      -10     
- Misses      16686    16814     +128     
+ Partials     2978     2977       -1     
Flag Coverage Δ
unittests 78.71% <1.60%> (-0.11%) ⬇️


Signed-off-by: BubbleCal <[email protected]>


class VectorIndexReader:
    def __init__(self, dataset: LanceDataset, index_name: str):
Contributor

Do we need to support versions?

Btw, can we write an example in the docstring, and also file a doc issue to write the docs?

Contributor Author

@BubbleCal BubbleCal Feb 20, 2025

I think this can be done by:

old_ds = ds.checkout_version(old)
reader = VectorIndexReader(old_ds, index_name)

        self.index_name = index_name
        self.stats = stats

    def num_partitions(self) -> int:
Contributor

Could you make sure this docstring renders correctly?


        Returns
        -------
        pa.RecordBatchReader
Contributor

What would happen if partition_id is out of range?

Contributor Author

It would raise an IndexError.
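
To make the out-of-range behaviour concrete, here is a minimal, self-contained sketch. `ToyPartitionReader` is a hypothetical stand-in, not the Lance class: it only mirrors the contract discussed in this thread (a `num_partitions()` method, and an `IndexError` when the partition id is out of range). The method name `read_partition` and the rejection of negative ids are assumptions made for illustration.

```python
class ToyPartitionReader:
    """Hypothetical stand-in for an IVF partition reader (not the Lance API)."""

    def __init__(self, partitions):
        # Each partition is modelled as a list of row ids assigned to it.
        self._partitions = list(partitions)

    def num_partitions(self) -> int:
        return len(self._partitions)

    def read_partition(self, partition_id: int):
        # Assumption: ids outside [0, num_partitions) are rejected explicitly.
        # Plain list indexing would silently accept negative ids.
        if not 0 <= partition_id < self.num_partitions():
            raise IndexError(
                f"partition_id {partition_id} out of range "
                f"[0, {self.num_partitions()})"
            )
        return list(self._partitions[partition_id])


reader = ToyPartitionReader([[0, 3], [1], [2, 4, 5]])

# Iterating over range(num_partitions()) never triggers the error.
sizes = [len(reader.read_partition(i)) for i in range(reader.num_partitions())]

# An id one past the end raises IndexError.
try:
    reader.read_partition(reader.num_partitions())
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True
```

Driving the loop from `num_partitions()` keeps callers inside the valid range, so the `IndexError` path should only be hit by a genuinely bad id.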


        self.dataset = dataset
        self.index_name = index_name
        self.stats = stats
Contributor

Will it raise an error if the index with this name is not a vector index?

Contributor Author

It would raise a ValueError.

Contributor

Can we write a detailed docstring with:

  1. What this class does
  2. Parameters
  3. Examples
  4. Exceptions

Let's make this work well with Copilot and Cursor.

Contributor Author

Added.
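
For reference, the four requested docstring parts could be laid out roughly as below, in the NumPy style the diff already uses for its `Returns` section. This is only a sketch: `ToyVectorIndexReader`, its `indices` mapping, and the `VECTOR_TYPES` set are hypothetical and are not the merged docstring or the Lance API. The one behaviour taken from this thread is that a non-vector index raises `ValueError`.

```python
class ToyVectorIndexReader:
    """Read the partitions of a vector index (toy stand-in, not the Lance class).

    Parameters
    ----------
    indices : dict[str, str]
        Hypothetical mapping of index name to index type, standing in
        for a dataset's index metadata.
    index_name : str
        Name of the index to read.

    Raises
    ------
    ValueError
        If ``index_name`` does not refer to a vector index.

    Examples
    --------
    >>> reader = ToyVectorIndexReader({"vec_idx": "IVF_PQ"}, "vec_idx")
    >>> reader.index_name
    'vec_idx'
    """

    # Assumed set of vector index types, for illustration only.
    VECTOR_TYPES = {"IVF_PQ", "IVF_HNSW_SQ"}

    def __init__(self, indices, index_name):
        # A missing name yields None, which is also rejected here.
        index_type = indices.get(index_name)
        if index_type not in self.VECTOR_TYPES:
            raise ValueError(f"{index_name!r} is not a vector index")
        self.index_name = index_name
```

Keeping `Parameters`, `Raises`, and `Examples` as separate underlined sections is what lets documentation tools and editor assistants pick them up individually.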

        self, partition_id: int, *, with_vector: bool = False
    ) -> pa.RecordBatchReader:
        """
        Returns a reader for the given IVF partition
Contributor

Let's put an Examples section here, and also mention the schema of the output record batches.

Is there a reason we don't return pa.Table here? pa.RecordBatchReader is not very end-user friendly; you don't see it on the first level of the pyarrow docs.

Contributor Author

Changed it to return a pyarrow Table.

Signed-off-by: BubbleCal <[email protected]>
@BubbleCal BubbleCal mentioned this pull request Feb 20, 2025
2 tasks
@BubbleCal BubbleCal requested a review from eddyxu February 20, 2025 10:38
@eddyxu eddyxu left a comment

LGTM. Pending docstring fix.

Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
@BubbleCal BubbleCal merged commit 59b414b into lancedb:main Feb 21, 2025
26 of 27 checks passed
Labels
enhancement New feature or request python
3 participants