feat: Vector Search #1639

Open
wants to merge 40 commits into
base: main
Commits
7284ac5
Add vector value type
cindy-peng Oct 22, 2024
fe5e966
Add vectorValue type
cindy-peng Oct 22, 2024
9c19afa
Add vector value test
cindy-peng Oct 22, 2024
8163c95
Add unit tests and system tests
cindy-peng Oct 26, 2024
6df2d8a
Fix formatting
cindy-peng Oct 26, 2024
32fdd41
Fix empty FindNearest pb instance
cindy-peng Oct 28, 2024
2b11399
Fix formatting
cindy-peng Oct 28, 2024
6165685
Fix javadoc
cindy-peng Oct 28, 2024
4461f5b
Merge from main
cindy-peng Oct 28, 2024
17b411f
fix(sample): change update entity sample to use transaction (#1633)
cindy-peng Oct 24, 2024
2aafa17
deps: update dependency com.google.cloud:sdk-platform-java-config to …
renovate-bot Oct 24, 2024
949a0ae
chore(main): release 2.24.0 (#1631)
release-please[bot] Oct 25, 2024
11b3227
deps: update googleapis/sdk-platform-java action to v2.49.0 (#1638)
renovate-bot Oct 28, 2024
742c7b9
chore(main): release 2.24.1-SNAPSHOT (#1635)
release-please[bot] Oct 28, 2024
089b68e
chore: Update generation configuration at Sun Oct 27 02:26:19 UTC 202…
cloud-java-bot Oct 28, 2024
81980d2
deps: update dependency com.google.cloud:sdk-platform-java-config to …
renovate-bot Oct 28, 2024
546cf81
chore(main): release 2.24.1 (#1641)
release-please[bot] Oct 28, 2024
4b21c3e
merging conflict
cindy-peng Oct 28, 2024
c495bb6
chore: generate libraries at Mon Oct 28 20:25:23 UTC 2024
cloud-java-bot Oct 28, 2024
e93ce5c
Fix import
cindy-peng Oct 28, 2024
0072de4
chore: generate libraries at Mon Oct 28 20:30:34 UTC 2024
cloud-java-bot Oct 28, 2024
a371467
Add Integration test
cindy-peng Oct 30, 2024
c8340bf
Add comment and fix formatting
cindy-peng Oct 30, 2024
62da35d
Modify comment and fix formatting
cindy-peng Oct 30, 2024
9694abb
Add setExcludeFromIndexes back to vectorvalue builder
cindy-peng Oct 30, 2024
f267339
Adjust testVectorSearch sample code
cindy-peng Oct 31, 2024
e2f8e87
Add system tests check details
cindy-peng Dec 2, 2024
d66e0d6
Merge branch 'main' of https://github.com/googleapis/java-datastore i…
cindy-peng Dec 2, 2024
1806f0a
chore: generate libraries at Mon Dec 2 20:09:57 UTC 2024
cloud-java-bot Dec 2, 2024
fca3689
fix initial numer of entities for ITDatastoreTst
cindy-peng Dec 2, 2024
9d79efc
Merge branch 'cindy/vector-search-1' of https://github.com/googleapis…
cindy-peng Dec 2, 2024
60dd19d
Add sample code for Java datastore vector search
cindy-peng Dec 3, 2024
d7c2559
chore: generate libraries at Tue Dec 3 00:29:52 UTC 2024
cloud-java-bot Dec 3, 2024
4d0a125
Merge branch 'main' into cindy/vector-search-1
cindy-peng Dec 5, 2024
6679cd2
Add tests to sample code
cindy-peng Dec 14, 2024
402d2cd
Resolving conflicts
cindy-peng Dec 14, 2024
cf1bb19
Fix added interface method warning
cindy-peng Dec 14, 2024
7c46b40
Fix mvn lint formatting
cindy-peng Dec 14, 2024
a4f2a6c
Merge branch 'main' into cindy/vector-search-1
cindy-peng Dec 14, 2024
1c73ec9
chore: generate libraries at Sat Dec 14 01:46:19 UTC 2024
cloud-java-bot Dec 14, 2024
82 changes: 22 additions & 60 deletions .readme-partials.yaml
@@ -110,33 +110,28 @@ custom_content: |
-------
In this feature launch, the [Java Datastore client](https://github.com/googleapis/java-datastore) now offers gRPC as a transport layer option with experimental support. Using [gRPC connection pooling](https://grpc.io/docs/guides/performance/) enables distributing RPCs over multiple connections which may improve performance.

#### Installation Instructions
The client can be built from the `grpc-experimental` branch on GitHub. For private preview, you can also download the artifact with the instructions provided below.

1. Download the datastore private preview package with dependencies:
```
curl -o <path-to-downloaded-jar> https://datastore-sdk-feature-release.web.app/google-cloud-datastore-2.20.0-grpc-experimental-1-SNAPSHOT-jar-with-dependencies.jar
```
2. Run the following commands to install JDK locally:
```
mvn install:install-file -Dfile=<path-to-downloaded-jar> -DgroupId=com.google.cloud -DartifactId=google-cloud-datastore -Dversion=2.20.0-grpc
```
3. Edit your pom.xml to add above package to `<dependencies/>` section:
```xml
<dependency>
#### Download Instructions
Instructions:
1. Clone the grpc-experimental branch from GitHub:
```python
git clone -b grpc-experimental https://github.com/googleapis/java-datastore.git
```
2. Run the following commands to build the library:
```python
# Go to the directory the code was downloaded to
cd java-datastore/

# Build the library
mvn clean install -DskipTests=true
```
3. Add the following dependency to your project:
```xml
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-datastore</artifactId>
<version>2.20.0-grpc-experimental-1-SNAPSHOT</version>
</dependency>
```

And if you have not yet, add below to `<repositories/>` section:
```xml
<repository>
<id>local-repo</id>
<url>file://${user.home}/.m2/repository</url>
</repository>
```
</dependency>
```

#### How to Use
To opt-in to the gRPC transport behavior, simply add the below line of code (`setTransportOptions`) to your Datastore client instantiation.
@@ -186,44 +181,11 @@ custom_content: |
#### New Features
There are new gRPC specific features available to use in this update.

##### Connection Pool
A connection pool, also known as a channel pool, is a cache of database connections that are shared and reused to improve connection latency and performance. With this update, now you will be able to configure the channel pool to improve application performance. This section guides you in determining the optimal connection pool size and configuring it within the Java datastore client.
To customize the number of channels your client uses, you can update the channel provider in the DatastoreOptions.
###### Determine the best connection pool size
The default connection pool size is right for most applications, and in most cases there's no need to change it.

However sometimes you may want to change your connection pool size due to high throughput or buffered requests. Ideally, to leave room for traffic fluctuations, a connection pool has about twice the number of connections it takes for maximum saturation. Because a connection can handle a maximum of 100 concurrent requests, between 10 and 50 outstanding requests per connection is optimal. The limit of 100 concurrent streams per gRPC connection is enforced in Google's middleware layer, and you are not able to reconfigure this number.

The following steps help you calculate the optimal number of connections in your channel pool using estimate per-client QPS and average latency numbers.

To calculate the optimal connections, gather the following information:

1. The maximum number of queries per second (QPS) per client when your application is running a typical workload.
2. The average latency (the response time for a single request) in ms.
3. Determine the number of requests that you can send serially per second by dividing 1,000 by the average latency value.
4. Divide the QPS in seconds by the number of serial requests per second.
5. Divide the result by 50 requests per channel to determine the minimum optimal channel pool size. (If your calculation is less than 2, use at least 2 channels anyway, to ensure redundancy.)
6. Divide the same result by 10 requests per channel to determine the maximum optimal channel pool size.

These steps are expressed in the following equations:
```java
(QPS ÷ (1,000 ÷ latency ms)) ÷ 50 streams = Minimum optimal number of connections
(QPS ÷ (1,000 ÷ latency ms)) ÷ 10 streams = Maximum optimal number of connections
```

###### Example
Your application typically sends 50,000 requests per second, and the average latency is 10 ms. Divide 1,000 by 10 ms to determine that you can send 100 requests serially per second.
Divide that number into 50,000 to get the parallelism needed to send 50,000 QPS: 500. Each channel can have at most 100 requests out concurrently, and your target channel utilization
is between 10 and 50 concurrent streams. Therefore, to calculate the minimum, divide 500 by 50 to get 10. To find the maximum, divide 500 by 10 to get 50. This means that your channel
pool size for this example should be between 10 and 50 connections.

It is also important to monitor your traffic after making changes and adjust the number of connections in your pool if necessary.

###### Set the pool size
The following code sample demonstrates how to configure the channel pool in the client libraries using `DatastoreOptions`.
##### Channel Pooling
To customize the number of channels your client uses, you can update the channel provider in the DatastoreOptions.
See [ChannelPoolSettings](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings) and [Performance Best Practices](https://grpc.io/docs/guides/performance/) for more information on channel pools and best practices for performance.

Code Example
Example:
```java
InstantiatingGrpcChannelProvider channelProvider =
DatastoreSettings.defaultGrpcTransportProviderBuilder()
89 changes: 29 additions & 60 deletions README.md
@@ -208,33 +208,28 @@ gRPC Java Datastore Client User Guide
-------
In this feature launch, the [Java Datastore client](https://github.com/googleapis/java-datastore) now offers gRPC as a transport layer option with experimental support. Using [gRPC connection pooling](https://grpc.io/docs/guides/performance/) enables distributing RPCs over multiple connections which may improve performance.

#### Installation Instructions
The client can be built from the `grpc-experimental` branch on GitHub. For private preview, you can also download the artifact with the instructions provided below.

1. Download the datastore private preview package with dependencies:
```
curl -o <path-to-downloaded-jar> https://datastore-sdk-feature-release.web.app/google-cloud-datastore-2.20.0-grpc-experimental-1-SNAPSHOT-jar-with-dependencies.jar
```
2. Run the following commands to install JDK locally:
```
mvn install:install-file -Dfile=<path-to-downloaded-jar> -DgroupId=com.google.cloud -DartifactId=google-cloud-datastore -Dversion=2.20.0-grpc
```
3. Edit your pom.xml to add above package to `<dependencies/>` section:
```xml
<dependency>
#### Download Instructions
Instructions:
1. Clone the grpc-experimental branch from GitHub:
```python
git clone -b grpc-experimental https://github.com/googleapis/java-datastore.git
```
2. Run the following commands to build the library:
```python
# Go to the directory the code was downloaded to
cd java-datastore/

# Build the library
mvn clean install -DskipTests=true
```
3. Add the following dependency to your project:
```xml
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-datastore</artifactId>
<version>2.20.0-grpc-experimental-1-SNAPSHOT</version>
</dependency>
```

And if you have not yet, add below to `<repositories/>` section:
```xml
<repository>
<id>local-repo</id>
<url>file://${user.home}/.m2/repository</url>
</repository>
```
</dependency>
```

#### How to Use
To opt-in to the gRPC transport behavior, simply add the below line of code (`setTransportOptions`) to your Datastore client instantiation.
@@ -284,44 +279,11 @@ boolean isHTTP = datastore.getOptions().getTransportOptions() instanceof HTTPTra
#### New Features
There are new gRPC specific features available to use in this update.

##### Connection Pool
A connection pool, also known as a channel pool, is a cache of database connections that are shared and reused to improve connection latency and performance. With this update, now you will be able to configure the channel pool to improve application performance. This section guides you in determining the optimal connection pool size and configuring it within the Java datastore client.
To customize the number of channels your client uses, you can update the channel provider in the DatastoreOptions.
###### Determine the best connection pool size
The default connection pool size is right for most applications, and in most cases there's no need to change it.

However sometimes you may want to change your connection pool size due to high throughput or buffered requests. Ideally, to leave room for traffic fluctuations, a connection pool has about twice the number of connections it takes for maximum saturation. Because a connection can handle a maximum of 100 concurrent requests, between 10 and 50 outstanding requests per connection is optimal. The limit of 100 concurrent streams per gRPC connection is enforced in Google's middleware layer, and you are not able to reconfigure this number.

The following steps help you calculate the optimal number of connections in your channel pool using estimate per-client QPS and average latency numbers.

To calculate the optimal connections, gather the following information:

1. The maximum number of queries per second (QPS) per client when your application is running a typical workload.
2. The average latency (the response time for a single request) in ms.
3. Determine the number of requests that you can send serially per second by dividing 1,000 by the average latency value.
4. Divide the QPS in seconds by the number of serial requests per second.
5. Divide the result by 50 requests per channel to determine the minimum optimal channel pool size. (If your calculation is less than 2, use at least 2 channels anyway, to ensure redundancy.)
6. Divide the same result by 10 requests per channel to determine the maximum optimal channel pool size.

These steps are expressed in the following equations:
```java
(QPS ÷ (1,000 ÷ latency ms)) ÷ 50 streams = Minimum optimal number of connections
(QPS ÷ (1,000 ÷ latency ms)) ÷ 10 streams = Maximum optimal number of connections
```

###### Example
Your application typically sends 50,000 requests per second, and the average latency is 10 ms. Divide 1,000 by 10 ms to determine that you can send 100 requests serially per second.
Divide that number into 50,000 to get the parallelism needed to send 50,000 QPS: 500. Each channel can have at most 100 requests out concurrently, and your target channel utilization
is between 10 and 50 concurrent streams. Therefore, to calculate the minimum, divide 500 by 50 to get 10. To find the maximum, divide 500 by 10 to get 50. This means that your channel
pool size for this example should be between 10 and 50 connections.

It is also important to monitor your traffic after making changes and adjust the number of connections in your pool if necessary.
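
The sizing steps above can be sketched in Java. This is a minimal illustration, not part of the Datastore client: the class `ChannelPoolSizing` and its method names are hypothetical, rounding up to a whole channel is an assumption, and the constants (50 and 10 streams per channel, a floor of 2 channels) are taken from the guidance above.

```java
/** Hypothetical helper for the sizing steps above; not part of the Datastore client. */
public final class ChannelPoolSizing {
  // Upper and lower bounds of the target utilization (outstanding requests per channel).
  private static final int MAX_TARGET_STREAMS = 50;
  private static final int MIN_TARGET_STREAMS = 10;
  // Floor recommended above so that the pool keeps some redundancy.
  private static final int MIN_CHANNELS = 2;

  private ChannelPoolSizing() {}

  /** Number of requests that must be in flight at once to sustain the given QPS. */
  public static double requiredParallelism(double qps, double avgLatencyMs) {
    double serialRequestsPerSecond = 1000.0 / avgLatencyMs;
    return qps / serialRequestsPerSecond;
  }

  /** Minimum pool size: parallelism divided by 50 streams, never below 2 channels. */
  public static int minChannels(double qps, double avgLatencyMs) {
    double parallelism = requiredParallelism(qps, avgLatencyMs);
    // Assumption: round up so a fractional result still gets a whole channel.
    int channels = (int) Math.ceil(parallelism / MAX_TARGET_STREAMS);
    return Math.max(MIN_CHANNELS, channels);
  }

  /** Maximum pool size: parallelism divided by 10 streams, never below the minimum. */
  public static int maxChannels(double qps, double avgLatencyMs) {
    double parallelism = requiredParallelism(qps, avgLatencyMs);
    int channels = (int) Math.ceil(parallelism / MIN_TARGET_STREAMS);
    return Math.max(minChannels(qps, avgLatencyMs), channels);
  }

  public static void main(String[] args) {
    // Figures from the example above: 50,000 QPS at 10 ms average latency.
    int min = minChannels(50_000, 10);
    int max = maxChannels(50_000, 10);
    System.out.println(min + " to " + max + " channels"); // prints "10 to 50 channels"
  }
}
```

For low-traffic clients the floor dominates: at 100 QPS and 10 ms latency both bounds come out as 2 channels.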

###### Set the pool size
The following code sample demonstrates how to configure the channel pool in the client libraries using `DatastoreOptions`.
##### Channel Pooling
To customize the number of channels your client uses, you can update the channel provider in the DatastoreOptions.
See [ChannelPoolSettings](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings) and [Performance Best Practices](https://grpc.io/docs/guides/performance/) for more information on channel pools and best practices for performance.

Code Example
Example:
```java
InstantiatingGrpcChannelProvider channelProvider =
DatastoreSettings.defaultGrpcTransportProviderBuilder()
@@ -413,6 +375,13 @@ Samples are in the [`samples/`](https://github.com/googleapis/java-datastore/tre
| Query Profile Explain Aggregation | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/queryprofile/QueryProfileExplainAggregation.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/queryprofile/QueryProfileExplainAggregation.java) |
| Query Profile Explain Analyze | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/queryprofile/QueryProfileExplainAnalyze.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/queryprofile/QueryProfileExplainAnalyze.java) |
| Query Profile Explain Analyze Aggregation | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/queryprofile/QueryProfileExplainAnalyzeAggregation.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/queryprofile/QueryProfileExplainAnalyzeAggregation.java) |
| Store Vectors | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/vectorsearch/StoreVectors.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/vectorsearch/StoreVectors.java) |
| Vector Search Basic | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchBasic.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchBasic.java) |
| Vector Search Distance Result Property | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchDistanceResultProperty.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchDistanceResultProperty.java) |
| Vector Search Distance Result Property Projection | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchDistanceResultPropertyProjection.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchDistanceResultPropertyProjection.java) |
| Vector Search Distance Threshold | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchDistanceThreshold.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchDistanceThreshold.java) |
| Vector Search Large Response | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchLargeResponse.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchLargeResponse.java) |
| Vector Search Prefilter | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchPrefilter.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/example/datastore/vectorsearch/VectorSearchPrefilter.java) |
| Task List | [source code](https://github.com/googleapis/java-datastore/blob/main/samples/snippets/src/main/java/com/google/datastore/snippets/TaskList.java) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/java-datastore&page=editor&open_in_editor=samples/snippets/src/main/java/com/google/datastore/snippets/TaskList.java) |


@@ -30,6 +30,7 @@
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.FindNearest;
import com.google.cloud.datastore.FullEntity;
import com.google.cloud.datastore.IncompleteKey;
import com.google.cloud.datastore.Key;
@@ -47,6 +48,7 @@
import com.google.cloud.datastore.StructuredQuery.OrderBy;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;
import com.google.cloud.datastore.Transaction;
import com.google.cloud.datastore.VectorValue;
import com.google.cloud.datastore.testing.LocalDatastoreHelper;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
@@ -407,6 +409,7 @@ private void setUpQueryTests() {
"description",
StringValue.newBuilder("Learn Cloud Datastore").setExcludeFromIndexes(true).build())
.set("tag", "fun", "l", "programming", "learn")
.set("vector_property", VectorValue.newBuilder(3.0, 1.0, 2.0).build())
.build());
}

@@ -1192,4 +1195,18 @@ public void testStaleReads() throws InterruptedException {
// [END datastore_stale_read]
assertValidQueryRealBackend(query);
}

@Test
public void testVectorSearch() {
setUpQueryTests();
// [START datastore_vector_search]
VectorValue vectorValue = VectorValue.newBuilder(1.78, 2.56, 3.88).build();
FindNearest vectorQuery =
new FindNearest(
"vector_property", vectorValue, FindNearest.DistanceMeasure.COSINE, 1, "distance");

Query<Entity> query = Query.newEntityQueryBuilder().setFindNearest(vectorQuery).build();
// [END datastore_vector_search]
assertValidQuery(query);
}
}
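
For intuition about what the `COSINE` distance measure in `testVectorSearch` ranks by, here is a minimal, self-contained sketch. It assumes cosine distance is defined as 1 minus the cosine similarity of the two vectors, which is the usual convention; the backend's exact computation may differ. The class `CosineDistanceSketch` is hypothetical and does not call the Datastore API. The two vectors are the stored `vector_property` value and the query vector from the test above.

```java
/** Hypothetical illustration of cosine distance; does not use the Datastore client. */
public final class CosineDistanceSketch {
  private CosineDistanceSketch() {}

  /** Returns 1 - cos(angle between a and b); assumed definition of cosine distance. */
  public static double cosineDistance(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("Vectors must have the same dimension");
    }
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
      throw new IllegalArgumentException("Cosine distance is undefined for a zero vector");
    }
    return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    double[] stored = {3.0, 1.0, 2.0}; // vector_property set in setUpQueryTests()
    double[] query = {1.78, 2.56, 3.88}; // query vector used in testVectorSearch()
    System.out.println("cosine distance = " + cosineDistance(stored, query));
  }
}
```

Under this definition, vectors pointing in the same direction have distance 0, orthogonal vectors have distance 1, and opposite vectors have distance 2, so a nearest-neighbor query with a limit of 1 would return the entity whose vector has the smallest angle to the query vector.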