-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Amazon Bedrock Embedding function #1361
Changes from 1 commit
49a33b3
0b4bfd0
eedf49d
a53e415
cc77bb7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -581,6 +581,105 @@ def __call__(self, input: Union[Documents, Images]) -> Embeddings: | |
return embeddings | ||
|
||
|
||
class AmazonBedrockEmbeddingFunction(EmbeddingFunction[Documents]): | ||
def __init__( | ||
self, | ||
profile_name: Optional[str] = None, | ||
region: Optional[str] = None, | ||
model_name: str = "amazon.titan-embed-text-v1", | ||
): | ||
"""Initialize AmazonBedrockEmbeddingFucntion. | ||
|
||
Args: | ||
profile_name (str, optional): The name of a profile to use. If not given, then the default profile is used, defaults to None | ||
region (str, optional): Default region when creating new connections, defaults to None | ||
model_name (str, optional): Identifier of the model, defaults to "amazon.titan-embed-text-v1" | ||
""" | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @chezou, it would be nice to check if boto3 package is installed and if not guide to user with appropriate error message. Check other EFs as ref. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I originally did checking it, but I followed @beggers suggestion to remove the dependency of boto3 from chroma itself. |
||
self._model_name = model_name | ||
|
||
try: | ||
import boto3 | ||
from botocore.config import Config | ||
except ImportError: | ||
raise ValueError( | ||
"The boto3 python package is not installed. Please install it with `pip install boto3`" | ||
) | ||
|
||
if not region: | ||
target_region = os.environ.get( | ||
"AWS_REGION", os.environ.get("AWS_DEFAULT_REGION") | ||
) | ||
else: | ||
target_region = region | ||
|
||
session_kwargs = {"region_name": target_region} | ||
client_kwargs = {**session_kwargs} | ||
|
||
if profile_name: | ||
target_profile = profile_name | ||
else: | ||
target_profile = os.environ.get("AWS_PROFILE") | ||
|
||
if target_profile: | ||
session_kwargs["profile_name"] = target_profile | ||
|
||
retry_config = Config( | ||
region_name=target_region, | ||
retries={ | ||
"max_attempts": 10, | ||
"mode": "standard", | ||
}, | ||
) | ||
|
||
session = boto3.Session(**session_kwargs) | ||
self._client = session.client( | ||
service_name="bedrock-runtime", config=retry_config, **client_kwargs | ||
) | ||
|
||
def __call__(self, input: Documents) -> Embeddings: | ||
"""Get the embeddings for a list of texts. | ||
|
||
Args: | ||
input (Documents): A list of texts to get embeddings for. | ||
|
||
Returns: | ||
Embeddings: The embeddings for the texts. | ||
|
||
Example: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we move this example into the docstring for the class instead of its There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed 0b4bfd0 |
||
>>> bedrock = AmazonBedrockEmbeddingFunction(profile_name="profile") | ||
>>> texts = ["Hello, world!", "How are you?"] | ||
>>> embeddings = bedrock(texts) | ||
""" | ||
embeddings = [] | ||
for text in input: | ||
embeddings.append(self._invoke(text)) | ||
return embeddings | ||
|
||
def _invoke( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we need There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed 0b4bfd0 |
||
self, | ||
text: str, | ||
) -> Embedding: | ||
import json | ||
|
||
input_body = {"inputText": text} | ||
body = json.dumps(input_body) | ||
accept = "application/json" | ||
content_type = "application/json" | ||
try: | ||
response = self._client.invoke_model( | ||
body=body, | ||
modelId=self._model_name, | ||
accept=accept, | ||
contentType=content_type, | ||
) | ||
embedding = json.load(response.get("body")).get("embedding") | ||
except Exception as e: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we remove this and the re-throw? I want clients to be able to handle exceptions differently depending on the type of exception, e.g. trying again in the case of a timeout. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed 0b4bfd0 |
||
raise ValueError(f"Error raised by bedrock service: {e}") | ||
|
||
return embedding | ||
|
||
|
||
# List of all classes in this module | ||
_classes = [ | ||
name | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a cleaner API boundary here and I'd like to hear your thoughts.
This constructor could accept a
boto3.Session
and optional kwargs to pass into theclient()
call. That way we give users full access to the configuration options for both theirSession
and theirclient
, and we have less code to maintain (we don't need to read fromos.environ
for example). I may be missing something here -- is there a reason we need to accept these and construct theSession
ourselves? If not could we accept aSession
and config kwargs?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with just passing a
boto3.Session
and necessary kwargs for theclient()
call instead of creating a session within the class.I don't have a specific reason to create a
boto3.Session
inside the class. I tried to align with the existing codes. Looking at the other classes creates a client internally, andHuggingFaceEmbeddingFunction
createsrequests.Session
within the class.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, do you think we should support all the possible config kwargs as much as possible?
Looking at the boto3 document, I think most of the kwargs are covered by the session, so we may need to pass
api_version
,use_ssl
,verify
, andendpoint_url
, at least. Or, we may postpone introducing those arguments at this moment.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tentatively, I removed most of the redundant arguments without adding client-specific arguments. Let me know your thoughts on it.
eedf49d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already looking way better -- thank you!
One more small ask and then we can check it in: could we have the constructor accept
**kwargs
and pass them directly to theclient()
we create? That would give us three wins:1 - We don't have to manage the
retry_config
which I imagine some folks will want control over.2 - Users can specify whatever they want to override in their
Session
config without us needing to make any code changes.3 - We don't need to import
boto3
in this codebase anymore.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Fixed by using
**kwargs
a53e415