Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosmos Full Text Search support #35868

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Cosmos Full Text Search support #35868

wants to merge 3 commits into from

Conversation

maumar
Copy link
Contributor

@maumar maumar commented Mar 29, 2025

Cosmos Full Text Search support

  • Adding model building API to configure property as full-text search enabled, as well as setup the index for it,
  • Adding model validation (e.g. FTS index not matching FTS property),
  • Adding EF.Functions stubs and translations for FullTextContains, FullTextContainsAll, FullTextContainsAny, FullTextScore and RRF (for hybrid),
  • Adding logic in SelectExpression to produce ORDER BY RANK when necessary,
  • Adding validation when attempting to mix with ORDER BY RANK with regular ORDER BY,
  • Rewrite OFFSET/LIMIT from parameter to constant when ORDER BY RANK is present.

Also fixed / added support for vector search on owned types (since it shares logic with FTS) and added some tests.

outstanding work:

  • support for FTS Container building using Azure.ResourceManager.CosmosDb (currently blocked on updated package being released)

Fixes #35476
Fixes #35851
Fixes #35852
Fixes #35853
Fixes #35867

@maumar maumar requested a review from Copilot March 29, 2025 02:22
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for full-text search in Cosmos by introducing new full-text index types, translators, model validations, and extensions.

  • Introduces full-text indexing support in container creation and client wrappers.
  • Adds a new CosmosFullTextSearchTranslator and corresponding DbFunctions extensions.
  • Updates model validation and Fluent API extensions to include full-text search configuration.

Reviewed Changes

Copilot reviewed 23 out of 26 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/EFCore.Cosmos/Storage/Internal/CosmosClientWrapper.cs Adds support for full-text indexing in container creation, but the new full-text index assignment may override vector index settings.
src/EFCore.Cosmos/Query/Internal/Translators/CosmosVectorSearchTranslator.cs Modifies method checking logic that may prematurely reject valid method calls by changing && to
src/EFCore.Cosmos/Query/Internal/Translators/CosmosFullTextSearchTranslator.cs Implements a new translator for full-text search functions.
src/EFCore.Cosmos/Query/Internal/Expressions/SqlFunctionExpression.cs Introduces an extra constructor parameter to flag scoring functions.
src/EFCore.Cosmos/Query/Internal/Expressions/SelectExpression.cs Adds a RankOrdering property and methods to apply rank-based ordering.
src/EFCore.Cosmos/Query/Internal/CosmosQueryableMethodTranslatingExpressionVisitor.cs Adjusts ordering logic to support scoring functions and rank ordering.
src/EFCore.Cosmos/Query/Internal/CosmosQuerySqlGenerator.cs Updates SQL generation to include rank ordering handling.
src/EFCore.Cosmos/Metadata/Internal/CosmosAnnotationNames.cs Defines new annotations for full-text search configuration.
src/EFCore.Cosmos/Infrastructure/Internal/CosmosModelValidator.cs Validates full-text index configuration, ensuring only single-property indexes with language annotation are defined.
src/EFCore.Cosmos/Extensions/CosmosPropertyExtensions.cs, CosmosPropertyBuilderExtensions.cs, CosmosIndexExtensions.cs, CosmosIndexBuilderExtensions.cs, CosmosDbFunctionsExtensions.cs Adds Fluent API and extension methods for configuring full-text search on properties and indexes.
Files not reviewed (3)
  • Directory.Packages.props: Language not supported
  • src/EFCore.Cosmos/EFCore.Cosmos.csproj: Language not supported
  • src/EFCore.Cosmos/Properties/CosmosStrings.Designer.cs: Language not supported
Comments suppressed due to low confidence (2)

src/EFCore.Cosmos/Query/Internal/Translators/CosmosVectorSearchTranslator.cs:33

  • Changing the logical operator from '&&' to '||' alters the intended logic and causes the translator to return null for methods declared on CosmosDbFunctionsExtensions that are not named 'VectorDistance'. Revisit the condition to ensure it only returns null when both conditions are unmet.
if (method.DeclaringType != typeof(CosmosDbFunctionsExtensions) || method.Name != nameof(CosmosDbFunctionsExtensions.VectorDistance))

src/EFCore.Cosmos/Extensions/CosmosPropertyExtensions.cs:137

  • [nitpick] The Experimental attribute for full-text search methods is using 'CosmosVectorSearchExperimental' instead of the more appropriate 'CosmosFullTextSearchExperimental'. This may cause confusion regarding the feature's experiment status.
[Experimental(EFDiagnostics.CosmosVectorSearchExperimental)]

@maumar maumar force-pushed the cosmos_fts_take_omega branch 2 times, most recently from 0f39e44 to a836b63 Compare March 31, 2025 23:41
@maumar maumar force-pushed the cosmos_fts_take_omega branch from a836b63 to 898cc5f Compare April 1, 2025 00:17
@roji
Copy link
Member

roji commented Apr 1, 2025

@maumar just to note that I'm holding back from reviewing since this is still in draft (you seemed to maybe want people to take a look in yesterday's meeting). Let me know if you want a review etc.

@maumar
Copy link
Contributor Author

maumar commented Apr 1, 2025

@roji feel free to hold off on reviewing a draft. I'm pretty close to marking it for review I think

@maumar maumar force-pushed the cosmos_fts_take_omega branch from a45f839 to 245066a Compare April 1, 2025 23:20
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up doing a quick review even though the PR is still in draft... Here are some comments.

@maumar maumar force-pushed the cosmos_fts_take_omega branch 3 times, most recently from 73b0297 to 30e0bd9 Compare April 2, 2025 02:27
@maumar maumar force-pushed the cosmos_fts_take_omega branch from 30e0bd9 to ebebcfc Compare April 2, 2025 03:18
@maumar maumar force-pushed the cosmos_fts_take_omega branch from ebebcfc to 68ca715 Compare April 3, 2025 01:20
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @AndriySvyryd in case you want to take a quick look for the metadata side and Cosmos container building part.

@maumar maumar force-pushed the cosmos_fts_take_omega branch 3 times, most recently from e7d281d to 81f7758 Compare April 3, 2025 23:26
@maumar maumar marked this pull request as ready for review April 3, 2025 23:26
@maumar maumar requested a review from a team as a code owner April 3, 2025 23:26
@maumar
Copy link
Contributor Author

maumar commented Apr 3, 2025

@roji @AndriySvyryd the PR is now ready for review.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maumar see batch of comments.

I think we should maybe discuss the FTS property vs. index question in design...

@maumar maumar force-pushed the cosmos_fts_take_omega branch 2 times, most recently from 64a2808 to 0933839 Compare April 5, 2025 01:27
maumar added a commit that referenced this pull request Apr 7, 2025
Fixes #35476
Fixes #35853 (need to fix this one, otherwise vector translator will try to translate full text methods and fail)

Description
This PR enables full-text search queries using EF Core 9 when targeting Azure Cosmos Db. This is one of the flagship new features for Cosmos and the aim here is to help with it's adoption. This is very limited port of the full feature we are adding in EF 10 Preview 4. We are only adding querying capabilities: function stubs for FullTextContains, FullTextContainsAll, FullTextContainsAny, FullTextScore and RRF as well as logic translating these signatures to built-in Cosmos functions.
No model building or data manipulation - containers need to be created outside EF (using Cosmos SDK or in the Data Explorer).

Customer impact
Customers will be able to use the upcoming full text search capabilities when working with Azure Cosmos Db without the need to upgrade to EF 10 preview.

How found
Partner team ask.

Regression
No

Testing
Extensively tested on EF 10, manual testing on EF9. End-to-end testing is not possible because we can't create containers programmatically (no support for it inside EF Core itself, and the Cosmos SDK which supports it is currently only available in beta, so we can't take dependency on it). Instead, we created containers and data using EF 10, ported all the query tests from EF 10 and ran them using the EF9 bits.

Risk
Low. Code here is purely additive and actually localized to only a handful of places in the code: validation in ApplyOrdering/AppendOrdering, FTS method translator, parameter inliner and sql generator. Feature is marked as experimental and quirks have been added.
- Adding model building API to configure property as full-text search enabled, as well as setup the index for it,
- Adding model validation (e.g. FTS index not matching FTS property),
- Adding EF.Functions stubs and translations for FullTextContains, FullTextContainsAll, FullTextContainsAny, FullTextScore and RRF (for hybrid),
- Adding logic in SelectExpression to produce ORDER BY RANK when necessary,
- Adding validation when attempting to mix with ORDER BY RANK with regular ORDER BY,
- Rewrite OFFSET/LIMIT from parameter to constant when ORDER BY RANK is present.

Also fixed / added support for vector search on owned types (since it shares logic with FTS) and added some tests.

outstanding work:
- support for FTS Container building using Azure.ResourceManager.CosmosDb (currently blocked on updated package being released)
- add model building support for default language (superfluous for now, since only one language is supported),

Fixes #35476
Fixes #35853
Fixes #35867
Fixes #35852
@maumar maumar force-pushed the cosmos_fts_take_omega branch 3 times, most recently from d9fa4f3 to b554e29 Compare April 9, 2025 01:50
@maumar
Copy link
Contributor Author

maumar commented Apr 9, 2025

@roji @AndriySvyryd I updated the PR addressing the Property/Index API and fixing a few small errors. You can review just the second commit - all the changes are there.

  • I put the HasDefaultFullTextLanguage setting on the entity type builder, rather than creating a container builder just for it. We already have some container-level settings (e.g. TTL) and we defined them on entity type builder, so going for consistency here.
  • Added tests to make sure values are propagated correctly (if property specifies language, use that, otherwise use default, otherwise use en-US). These tests need to be adjusted once cosmos supports more languages, for now they all just throw Cosmos exception complaining about not supported language. I manually verified that we wire-up things correctly when creating the container.
  • Added test for using full text property without the index and removed validation preventing this scenario.
  • Conditionally skipping the tests when running on live service due to Azure.ResourceManager.CosmosDb not supporting FTS.

…reparation for Cosmos supporting more languages than just en-US.
@maumar maumar force-pushed the cosmos_fts_take_omega branch from b554e29 to 65cc1e5 Compare April 9, 2025 02:05
/// <param name="language">The language for the full-text search.</param>
/// <returns>The same builder instance so that multiple calls can be chained.</returns>
[Experimental(EFDiagnostics.CosmosFullTextSearchExperimental)]
public static PropertyBuilder IsFullTextProperty(
Copy link
Member

@roji roji Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find the discrepancy between builder (IsFullTextProperty) and metadata (SetFullTextSearchLanguage) names odd and unclear. For example, what does IsFullTextProperty() actually do when a language isn't passed? Since we just call SetFullTextSearchLanguage with null, it doesn't seem to actually do anything (apart erasing a previously-set language); if so, then this method doesn't actually make the property into a full-text property - it really just sets the language.

Here's another way to look at it. Since Cosmos allows defining a default language at the container level, I'm assuming it's possible to omit it from the property level (otherwise why have a default at all). If that's the case (should confirm), then there needs to be a way to way "this is a full text property" without setting a language. How would this work with the current scheme, i.e. how does you code know to emit a container policy specifying the property as a full-text property?

If the above is true on the Cosmos side - and it needs to be confirmed by simple experimentation - we simply need to have two separate metadata: one "IsFullTextSearchEnabled" (boolean), and one "FullTextSearchLanguage" (which is optional, if not specified the container default applies). We could still of course have a single convenience builder method (I'd call it EnableFullTextSearch), which sets both metadata. According to my understand of how Cosmos actually works - that the language specification on the property level is optional - this is what we need.

One last possibility here is to use a null language to mean "this is a full text property who's language hasn't been set by the user (and should default to the container default" (is this the current design @maumar?). If so, then there's no way to turn off a property being a full-text property (since passing a null language activate FTS).


On another note, an alternative possible naming for the builder API could simply be EnableFullTextSearch(), since the property still remains a regular property that can be used for other things (xmldoc can also be updated to reflect that). Constrast this with vector properties, which have no other possible use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last possibility here is to use a null language to mean "this is a full text property who's language hasn't been set by the user (and should default to the container default" (is this the current design @maumar?). If so, then there's no way to turn off a property being a full-text property (since passing a null language activate FTS).

you are right, this is the current design - will switch to 2 annotations, it is much cleaner

Copy link
Member

@AndriySvyryd AndriySvyryd Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood the Cosmos design correctly there is no such setting as "this is an FTS property", FTS can be used on anything by default, but to avoid specifying the language in the call it should be configured either on the container or the property. Then, if you want it to be read-efficient, you can also create an index

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method should be called HasFullTextSearchLanguage and have a required parameter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And EnableFullTextSearch() is completely unnecessary

Copy link
Contributor Author

@maumar maumar Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roji it is currently not allowed to set individual path's language to null or empty. SDK throws a validation error "Argument {0} can't be null or empty. (Parameter 'Language')". I'm following up with Cosmos folks to figure out exactly how it's supposed to work then, but it might be what @AndriySvyryd suggests (all properties are FTS enabled out of the box and you only specify path/language if the language is supposed to be different than the default). Documentation says something different: "For every text property you'd like to configure for full text search, you must declare both the path of the property with text and the language of the text.", so we'll see.

@maumar maumar force-pushed the cosmos_fts_take_omega branch from c52fdaa to 8d8771f Compare April 9, 2025 10:27
@maumar maumar force-pushed the cosmos_fts_take_omega branch from 8d8771f to 993bc56 Compare April 9, 2025 10:56
@maumar
Copy link
Contributor Author

maumar commented Apr 9, 2025

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants