-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support to read IVF partitions #3462
Conversation
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3462 +/- ##
==========================================
- Coverage 78.82% 78.71% -0.11%
==========================================
Files 251 251
Lines 92866 92983 +117
Branches 92866 92983 +117
==========================================
- Hits 73202 73192 -10
- Misses 16686 16814 +128
+ Partials 2978 2977 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: BubbleCal <[email protected]>
|
||
|
||
class VectorIndexReader: | ||
def __init__(self, dataset: LanceDataset, index_name: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to support version ?
Btw, can we write example in the docstring, also file a doc issue to write docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this can be done by:
old_ds = ds.checkout_version(old)
reader = VectorIndexReader(old_ds, index_name)
self.index_name = index_name | ||
self.stats = stats | ||
|
||
def num_partitions(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make sure this docstring is appropriately rendered?
python/python/lance/dataset.py
Outdated
|
||
Returns | ||
------- | ||
pa.RecordBatchReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would it happen if partition_id is out of range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would raise an IndexError
|
||
self.dataset = dataset | ||
self.index_name = index_name | ||
self.stats = stats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it yield error if the index by the index name is not vector index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would raise a ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we write detailed doc string with:
- What does this class do
- Parameters
- Exampels
- Exceptions
Lets make this work well with copilot and cursor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
python/python/lance/dataset.py
Outdated
self, partition_id: int, *, with_vector: bool = False | ||
) -> pa.RecordBatchReader: | ||
""" | ||
Returns a reader for the given IVF partition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets put an example section here, also mention what is the schema of output recordBatch?
Is there a reason that we dont return pa.Table here? pa.RecordBatchReader
is not very end-user friendly, you dont see it on the first level of pyarrow docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it to return pyarrow table
Signed-off-by: BubbleCal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Pending docstring fix.
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
No description provided.