Add dimension support #72

Merged


izellevy
Collaborator

@izellevy commented Jan 31, 2024

Problem

OpenAI has released embedding models that accept the output dimension as a parameter. We want to support that use case.

Solution

Added a `dimension` property to all the base dense encoders.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Describe specific steps for validating this change.

@izellevy requested a review from acatav January 31, 2024 16:36
Collaborator

@acatav left a comment


LGTM, see some small comments



```python
from abc import ABC
from typing import Any, Optional

class BaseDenseEncoder(ABC):
    def __init__(self, *, dimension: Optional[int] = None, **kwargs: Any): ...
```
Collaborator


I think that if we make `dimension` an optional parameter of the base class, we should either allow it in all the child classes or not put it in the base class at all. Right now it seems we are only going to allow it for OpenAI, so maybe it is better not to put it in the base class? WDYT?
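To make the concern concrete, here is a minimal sketch, assuming a hypothetical `FixedSizeEncoder` child whose model cannot change its output size: if the base `__init__` accepts `dimension`, every child inherits the parameter and may silently ignore it. These classes are illustrative only, not the pinecone-text code.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class BaseDenseEncoder(ABC):
    # Hypothetical base class that accepts `dimension` on behalf of every subclass.
    def __init__(self, *, dimension: Optional[int] = None, **kwargs: Any):
        self.dimension = dimension

    @abstractmethod
    def encode(self, text: str) -> List[float]: ...


class FixedSizeEncoder(BaseDenseEncoder):
    # Hypothetical child whose model always produces 4-dimensional vectors.
    def encode(self, text: str) -> List[float]:
        return [0.0] * 4


encoder = FixedSizeEncoder(dimension=2)  # accepted without error...
vector = encoder.encode("hello")
print(encoder.dimension, len(vector))  # ...but ignored: prints "2 4"
```

The reported `dimension` and the actual vector length disagree, which is the inconsistency the comment warns about.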

Collaborator Author


I want `dimension` to be accessible in all the child classes, since it would be weird if only OpenAI exposed it. You are right that the base `__init__` should not receive this parameter. I now create a private property with a default in the base class and override it in the OpenAI encoder, which is the only one that accepts `dimension` as an initialization parameter.
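A minimal sketch of the design described above, assuming hypothetical class names (`OtherEncoder`, `OpenAIStyleEncoder`): the base class keeps a private `_dimension` defaulting to `None` (read here as "use the model's default") and exposes it through a read-only `dimension` property, while only the OpenAI-style subclass accepts `dimension` in its `__init__`. The real classes in `pinecone_text/dense/` may differ.

```python
from abc import ABC
from typing import Any, Optional


class BaseDenseEncoder(ABC):
    # Private default shared by all encoders; None means "model default".
    _dimension: Optional[int] = None

    @property
    def dimension(self) -> Optional[int]:
        # Public, read-only accessor available on every child class.
        return self._dimension


class OtherEncoder(BaseDenseEncoder):
    # Hypothetical encoder without custom-dimension support: no `dimension` argument.
    def __init__(self) -> None:
        super().__init__()


class OpenAIStyleEncoder(BaseDenseEncoder):
    # Hypothetical OpenAI-style encoder: the only one that takes `dimension`.
    def __init__(self, *, dimension: Optional[int] = None, **kwargs: Any):
        super().__init__()
        self._dimension = dimension


print(OtherEncoder().dimension)                     # None
print(OpenAIStyleEncoder(dimension=256).dimension)  # 256
```

With this layout, passing `dimension` to an encoder that does not support it raises `TypeError` instead of being silently ignored.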

pinecone_text/dense/openai_encoder.py
```python
)
if self._dimension:
```
Collaborator


Either we should check `self._dimension is not None` here, or, better, just assert on input that `dimension > 0`. Right now, if someone passes 0, it silently falls back to the OpenAI default, which is kind of weird.
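The pitfall comes from Python truthiness: `0` is falsy, so `if self._dimension:` treats `dimension=0` exactly like "not set". A minimal sketch, assuming two hypothetical helpers that build the keyword arguments for an embeddings request (OpenAI's embeddings endpoint names this field `dimensions`):

```python
from typing import Any, Dict, Optional


def request_kwargs_truthy(dimension: Optional[int]) -> Dict[str, Any]:
    # Buggy variant: 0 is falsy, so dimension=0 is silently dropped.
    kwargs: Dict[str, Any] = {}
    if dimension:
        kwargs["dimensions"] = dimension
    return kwargs


def request_kwargs_checked(dimension: Optional[int]) -> Dict[str, Any]:
    # Fixed variant: reject non-positive values up front, and omit
    # the field only when the caller did not set it at all.
    if dimension is not None and dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    kwargs: Dict[str, Any] = {}
    if dimension is not None:
        kwargs["dimensions"] = dimension
    return kwargs


print(request_kwargs_truthy(0))     # {} -> falls back to the model default
print(request_kwargs_checked(256))  # {'dimensions': 256}
```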

Collaborator Author


Right, fixing.

pyproject.toml
Collaborator

@acatav left a comment


LGTM

```python
)
if self._dimension is not None:
    assert self._dimension > 0, "dimension must be a positive integer"
```
Collaborator


  1. I think it's better to check that in `__init__` and not here.
  2. Maybe it's better to use `self.dimension` here and not the private property.
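Both suggestions can be sketched together, assuming a hypothetical `OpenAIStyleEncoder` with a hypothetical `_build_request` helper: the value is validated once in `__init__`, and the request-building code reads the public `dimension` property. This sketch raises `ValueError` instead of using `assert`, because `assert` statements are skipped when Python runs with `-O`; that is a choice made for the sketch, not necessarily what the PR merged.

```python
from typing import Any, Dict, List, Optional


class OpenAIStyleEncoder:
    # Hypothetical encoder; validation happens once, at construction time.
    def __init__(self, model_name: str, *, dimension: Optional[int] = None):
        if dimension is not None and dimension <= 0:
            raise ValueError("dimension must be a positive integer")
        self._model_name = model_name
        self._dimension = dimension

    @property
    def dimension(self) -> Optional[int]:
        return self._dimension

    def _build_request(self, texts: List[str]) -> Dict[str, Any]:
        # Reads the public property, so a subclass overriding `dimension` is respected.
        request: Dict[str, Any] = {"model": self._model_name, "input": texts}
        if self.dimension is not None:
            request["dimensions"] = self.dimension
        return request


encoder = OpenAIStyleEncoder("text-embedding-3-small", dimension=256)
print(encoder._build_request(["hello"]))
# {'model': 'text-embedding-3-small', 'input': ['hello'], 'dimensions': 256}
```

An invalid value now fails at construction time, before any request is built.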

@izellevy added this pull request to the merge queue Feb 1, 2024
Merged via the queue into pinecone-io:main with commit 0f50445 Feb 1, 2024
16 checks passed

2 participants