Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A targeted multi-partition query on a sub-partitioned container becomes a fan-out cross-partition query if the filter condition is always false #4916

Open
alesk-kontent opened this issue Nov 28, 2024 · 3 comments
Assignees
Labels

Comments

@alesk-kontent
Copy link

Describe the bug

When I run a targeted multi-partition query on a sub-partitioned container, the query becomes a fan-out cross-partition query if the following conditions are met:

  • The filter condition is always false.
  • The filter condition contains the first part of the partition key.
  • The query options do not contain a partition key.

To reproduce

I wrote a console application to reproduce the issue. Please download the solution, open the Program.cs file and specify a connection string to an Azure Cosmos DB for NoSQL account. The application creates a new database with a container Subpartitioned. It also upserts random data, runs a query (with and without a partition key in the request options) and displays relevant information. You can run the application several times; it will only create resources and upsert data once.

The container contains 100 documents (articles) with the following properties:

Id : string
CustomerId : Guid
Label : string
Title : string

The value of the Id property is a random GUID. The value of the CustomerId and Label properties is always the same. The value of the Title property is unique for each document.

The indexing policy is automatic and consistent. As for the partition key, it has two components, customerId and id, so there are 100 logical partitions with one document. The maximum throughput is 20,000 RU/s so the container has two physical partitions, one of which is empty.

The query is SELECT * FROM c WHERE c.customerId = @customerId AND c.label = @label AND c.label <> @label and the application runs it twice:

  • Request options contain a partition key with the customer id.
  • Request options do not contain a partition key.

You can also simplify the query to SELECT * FROM c WHERE c.customerId = @customerId AND false.

I can see the following results:

Container: Subpartitioned
Partition key: ["df165b31-7641-4664-9549-37862ed806ee"]
Request charge: 2,25
Physical partitions: 1

Container: Subpartitioned
Partition key:
Request charge: 4,50
Physical partitions: 0, 1

Expected behavior

As the filter condition contains the first component of the partition key, the request charge should be the same and the SDK should only read data only from one physical partition.

Actual behavior

Without a partition key in the request options the query becomes a fan-out cross-partition query so the request charge is higher as the SDK reads data from all physical partitions.

Environment summary

SDK Version: 3.46.0
OS Version: Windows 11 Enterprise (10.0.22631 Build 22631)

Additional context

It happens in both Direct and Gateway mode.

@alesk-kontent
Copy link
Author

It's been over a month and still no response. Have you been able to reproduce the issue? What could be the cause? Is this an expected behavior?

@Pilchie
Copy link
Member

Pilchie commented Jan 10, 2025

Tagging @adityasa

@adityasa adityasa self-assigned this Jan 10, 2025
@adityasa
Copy link
Contributor

adityasa commented Jan 13, 2025

@alesk-kontent - this is the current limitation of the system. Note that it's not limited to hierarchical partitioned container, but also applies to single partition key container.
As you may be aware, due to the contradiction in the filter condition, query is not supposed to return any documents.

Here's what's common between both invocations:
Since the query filter is effectively false,
(WHERE c.customerId = @customerId AND c.label = @label AND c.label <> @label is same as c.customerId = @customerId AND false which is same as WHERE false), query plan returns empty partition range.
However, at the moment, client doesn't seem to distinguish between empty partition range (for e.g. due to the contradiction in the filter condition) v/s full range (because partition elimination could not be performed - either because filter condition doesn't contain partition key or it's too complex). Client instead interprets the range as all partitions in the collection. We are looking into this issue.

Here's what's different between both invocations:
The partition ranges returned by the query plan are subsequently intersected with those specified in query request options (if present), which restricts the execution to a single partition only in case of first query. The second query, due to lack of query request options fans out to all partitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants