Embedding models v3 for integrated vectorization #1942

Closed
egor-yudkin opened this issue Aug 27, 2024 · 3 comments

@egor-yudkin

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Deploy application with azd up

Any log messages given by the failure

n/a

Expected/desired behavior

OS and Version?

Windows 10

azd version?

1.9.5

Versions

n/a

Mention any other details that might be useful

Integrated vectorization supports embedding v3 models now. I tried to set up the application with these environment variables (I have the model deployment called "embedding-3s" already):

USE_FEATURE_INT_VECTORIZATION="true"
AZURE_OPENAI_EMB_DEPLOYMENT="embedding-3s"
AZURE_OPENAI_EMB_DEPLOYMENT_VERSION=1
AZURE_OPENAI_EMB_DIMENSIONS=1536
AZURE_OPENAI_EMB_MODEL_NAME="text-embedding-3-small"

It seems to be working fine: prepdocs.py finished with no issues, and the indexer ran successfully on my small test set of documents.

However, the skillset has null values for the dimensions and modelName fields. Does that matter? I don't know enough about this...

{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "name": "#2",
  "description": "Skill to generate embeddings via Azure OpenAI",
  "context": "/document/pages/*",
  "resourceUri": "https://azoai-shared-eus2-dev.openai.azure.com",
  "apiKey": null,
  "deploymentId": "embedding-3s",
  "dimensions": null,
  "modelName": null,
  "inputs": [
    {
      "name": "text",
      "source": "/document/pages/*"
    }
  ],
  "outputs": [
    {
      "name": "embedding",
      "targetName": "vector"
    }
  ],
  "authIdentity": null
}

Could you confirm that the embedding v3 models work and are deployed correctly by prepdocs.py, and then update the documentation?
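
For anyone who wants to check their own skillset for the same symptom, here is a minimal sketch that scans skill definitions for unset `dimensions` and `modelName` fields. It assumes the skills have already been loaded into Python dicts (for example, from the skillset JSON shown in the portal); the helper name `find_unset_embedding_fields` is hypothetical and not part of this repo or any SDK.

```python
EMBEDDING_SKILL_TYPE = "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill"
FIELDS_TO_CHECK = ("dimensions", "modelName")


def find_unset_embedding_fields(skills):
    """Map each embedding skill's name to its null or missing fields.

    `skills` is a list of skill dicts, as found in a skillset definition.
    Skills of other types are ignored.
    """
    report = {}
    for skill in skills:
        if skill.get("@odata.type") != EMBEDDING_SKILL_TYPE:
            continue
        unset = [field for field in FIELDS_TO_CHECK if skill.get(field) is None]
        if unset:
            report[skill.get("name", "<unnamed>")] = unset
    return report


# Trimmed copy of the skill from this issue (JSON null becomes None).
skill_from_issue = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "name": "#2",
    "deploymentId": "embedding-3s",
    "dimensions": None,
    "modelName": None,
}

print(find_unset_embedding_fields([skill_from_issue]))
# → {'#2': ['dimensions', 'modelName']}
```

An empty result means every embedding skill has both fields populated. Whether null values are actually a problem for a given model is the question raised in this issue, so treat the output as a prompt to investigate rather than a verdict.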


@pamelafox
Collaborator

I'm checking in with the Azure AI Search team about this; it's possible that an Azure AI Search SDK update would be needed.

@pamelafox
Collaborator

Response from AI Search team:

This is supported using the latest SDK versions (preview and GA). Here's how to use them with Python:
azure-search-vector-samples/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo.ipynb at main · Azure/azure-search-vector-samples (github.com).
The newest AOAI embedding models have a model property that wasn't needed/present when using ada-002 so this is present only in the newest SDK versions.

So we'd need to update this repo to bring in the latest SDK version and verify everything still works as expected. If you have time to make that change, please consider making a PR. I don't know when I'll get to it.

@pamelafox
Collaborator

I've updated the SDK and I now pass in embedding dimensions, so this should work:
https://github.com/Azure-Samples/azure-search-openai-demo/releases/tag/2024-10-17
Please file a new issue if that's not working.
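
To illustrate what a skill definition with those fields filled in might look like, here is a rough sketch that builds the JSON payload from the environment variables listed in the original report. It is not the code from this repo or the release above: the helper `build_embedding_skill` is hypothetical, the resource URI is a placeholder, and the field names are simply copied from the skill JSON earlier in this thread.

```python
import json


def build_embedding_skill(env, resource_uri):
    """Build an embedding skill dict with dimensions and modelName populated.

    `env` is any mapping holding the AZURE_OPENAI_EMB_* settings
    (os.environ works too). Hypothetical helper for illustration only.
    """
    raw_dimensions = env.get("AZURE_OPENAI_EMB_DIMENSIONS")
    return {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "description": "Skill to generate embeddings via Azure OpenAI",
        "context": "/document/pages/*",
        "resourceUri": resource_uri,
        "deploymentId": env["AZURE_OPENAI_EMB_DEPLOYMENT"],
        # Environment values are strings, so convert dimensions to an int.
        "dimensions": int(raw_dimensions) if raw_dimensions else None,
        "modelName": env.get("AZURE_OPENAI_EMB_MODEL_NAME"),
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        "outputs": [{"name": "embedding", "targetName": "vector"}],
    }


# Values taken from the original report; the URI below is a placeholder.
issue_env = {
    "AZURE_OPENAI_EMB_DEPLOYMENT": "embedding-3s",
    "AZURE_OPENAI_EMB_DIMENSIONS": "1536",
    "AZURE_OPENAI_EMB_MODEL_NAME": "text-embedding-3-small",
}

skill = build_embedding_skill(issue_env, "https://example.openai.azure.com")
print(json.dumps(skill, indent=2))
```

Building the payload as a plain dict keeps the sketch independent of any particular SDK version; in practice the SDK's own skill class would be the more natural choice once it exposes these properties.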
