feat: [discoveryengine] add Chunk resource in the search response (#5526)

* feat: add Chunk resource in the search response
feat: add NO_RELEVANT_CONTENT to Answer API
feat: support AlloyDB Connector
docs: keep the API doc up-to-date with recent changes

PiperOrigin-RevId: 649156977

Source-Link: googleapis/googleapis@ff081c9

Source-Link: googleapis/googleapis-gen@bbee862
Copy-Tag: eyJwIjoicGFja2FnZXMvZ29vZ2xlLWNsb3VkLWRpc2NvdmVyeWVuZ2luZS8uT3dsQm90LnlhbWwiLCJoIjoiYmJlZTg2MjU3ZTE3ZDU1MGU3OTU2ZDQyYTk1ZDljMWZlNTAyMWM4OCJ9

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: sofisl <[email protected]>
3 people authored Jul 9, 2024
1 parent 9a80089 commit b0dc1b2
Showing 67 changed files with 28,464 additions and 9,763 deletions.
2 changes: 2 additions & 0 deletions packages/google-cloud-discoveryengine/README.md
@@ -126,7 +126,9 @@ Samples are in the [`samples/`](https://github.com/googleapis/google-cloud-node/
| Sample | Source Code | Try it |
| --------------------------- | --------------------------------- | ------ |
| Completion_service.complete_query | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.complete_query.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.complete_query.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.import_completion_suggestions | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_completion_suggestions.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_completion_suggestions.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.import_suggestion_deny_list_entries | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_suggestion_deny_list_entries.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_suggestion_deny_list_entries.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.purge_completion_suggestions | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_completion_suggestions.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_completion_suggestions.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.purge_suggestion_deny_list_entries | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_suggestion_deny_list_entries.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_suggestion_deny_list_entries.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Control_service.create_control | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/control_service.create_control.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/control_service.create_control.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Control_service.delete_control | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/control_service.delete_control.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/control_service.delete_control.js,packages/google-cloud-discoveryengine/samples/README.md) |
@@ -188,6 +188,13 @@ message Answer {
// If citation_type is CHUNK_LEVEL_CITATION and chunk mode is on,
// populate chunk info.
repeated ChunkInfo chunk_info = 5;

// Data representation.
// The structured JSON data for the document.
// It's populated from the struct data from the Document (code
// pointer: http://shortn/_objzAfIiHq), or the Chunk in search result
// (code pointer: http://shortn/_Ipo6KFFGBL).
google.protobuf.Struct struct_data = 6;
}

// Search results observed by the search action, it can be snippets info
@@ -296,6 +303,12 @@ message Answer {
// Google skips the answer if there is a potential policy violation
// detected. This includes content that may be violent or toxic.
POTENTIAL_POLICY_VIOLATION = 4;

// The no relevant content case.
//
// Google skips the answer if there is no relevant content in the
// retrieved search results.
NO_RELEVANT_CONTENT = 5;
}
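
A client that surfaces skipped answers may want a distinct message for the new case. The following is a minimal sketch, not generated client code: it mirrors only the two enum values visible in this hunk, and the class name `AnswerSkippedReason` and the helper `explain_skip` are chosen here for illustration.

```python
from enum import IntEnum


class AnswerSkippedReason(IntEnum):
    """Local mirror of the two enum values shown in the hunk above."""

    POTENTIAL_POLICY_VIOLATION = 4
    NO_RELEVANT_CONTENT = 5


def explain_skip(reason_value: int) -> str:
    """Map a skipped-answer reason number to a user-facing message.

    Hypothetical helper; values not mirrored above fall back to a
    generic message instead of raising.
    """
    try:
        reason = AnswerSkippedReason(reason_value)
    except ValueError:
        return "The answer was skipped for an unrecognized reason."
    if reason is AnswerSkippedReason.NO_RELEVANT_CONTENT:
        return "No relevant content was found in the retrieved search results."
    return "The answer was skipped because of a potential policy violation."
```

Falling back to a generic message keeps older clients working when the server starts returning a value they do not know yet.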

// Immutable. Fully qualified name
@@ -0,0 +1,119 @@
// Copyright 2024 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax = "proto3";

package google.cloud.discoveryengine.v1;

import "google/api/field_behavior.proto";
import "google/api/resource.proto";
import "google/protobuf/struct.proto";

option csharp_namespace = "Google.Cloud.DiscoveryEngine.V1";
option go_package = "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb;discoveryenginepb";
option java_multiple_files = true;
option java_outer_classname = "ChunkProto";
option java_package = "com.google.cloud.discoveryengine.v1";
option objc_class_prefix = "DISCOVERYENGINE";
option php_namespace = "Google\\Cloud\\DiscoveryEngine\\V1";
option ruby_package = "Google::Cloud::DiscoveryEngine::V1";

// Chunk captures all raw metadata information of items to be recommended or
// searched in the chunk mode.
message Chunk {
option (google.api.resource) = {
type: "discoveryengine.googleapis.com/Chunk"
pattern: "projects/{project}/locations/{location}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
pattern: "projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
};

// Document metadata contains the information of the document of the current
// chunk.
message DocumentMetadata {
// Uri of the document.
string uri = 1;

// Title of the document.
string title = 2;

// Data representation.
// The structured JSON data for the document. It should conform to the
// registered [Schema][google.cloud.discoveryengine.v1.Schema] or an
// `INVALID_ARGUMENT` error is thrown.
google.protobuf.Struct struct_data = 3;
}

// Page span of the chunk.
message PageSpan {
// The start page of the chunk.
int32 page_start = 1;

// The end page of the chunk.
int32 page_end = 2;
}

// Metadata of the current chunk. This field is only populated on
// [SearchService.Search][google.cloud.discoveryengine.v1.SearchService.Search]
// API.
message ChunkMetadata {
// The previous chunks of the current chunk. The number is controlled by
// [SearchRequest.ContentSearchSpec.ChunkSpec.num_previous_chunks][google.cloud.discoveryengine.v1.SearchRequest.ContentSearchSpec.ChunkSpec.num_previous_chunks].
// This field is only populated on
// [SearchService.Search][google.cloud.discoveryengine.v1.SearchService.Search]
// API.
repeated Chunk previous_chunks = 1;

// The next chunks of the current chunk. The number is controlled by
// [SearchRequest.ContentSearchSpec.ChunkSpec.num_next_chunks][google.cloud.discoveryengine.v1.SearchRequest.ContentSearchSpec.ChunkSpec.num_next_chunks].
// This field is only populated on
// [SearchService.Search][google.cloud.discoveryengine.v1.SearchService.Search]
// API.
repeated Chunk next_chunks = 2;
}

// The full resource name of the chunk.
// Format:
// `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}/chunks/{chunk_id}`.
//
// This field must be a UTF-8 encoded string with a length limit of 1024
// characters.
string name = 1;

// Unique chunk ID of the current chunk.
string id = 2;

// Content is a string from a document (parsed content).
string content = 3;

// Output only. Represents the relevance score based on similarity.
// Higher score indicates higher chunk relevance.
// The score is in range [-1.0, 1.0].
// Only populated on [SearchService.SearchResponse][].
optional double relevance_score = 8
[(google.api.field_behavior) = OUTPUT_ONLY];

// Metadata of the document from the current chunk.
DocumentMetadata document_metadata = 5;

// Output only. This field is OUTPUT_ONLY.
// It contains derived data that are not in the original input document.
google.protobuf.Struct derived_struct_data = 4
[(google.api.field_behavior) = OUTPUT_ONLY];

// Page span of the chunk.
PageSpan page_span = 6;

// Output only. Metadata of the current chunk.
ChunkMetadata chunk_metadata = 7 [(google.api.field_behavior) = OUTPUT_ONLY];
}
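
The two resource patterns declared for `Chunk` differ only in the optional `collections/{collection}` segment. As a rough sketch of how a caller might split a chunk name into its parts, assuming plain regex matching is sufficient and using a hypothetical helper name `parse_chunk_name`:

```python
import re
from typing import Dict, Optional

# One expression covers both patterns: the collections segment is optional.
_CHUNK_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"(?:/collections/(?P<collection>[^/]+))?"
    r"/dataStores/(?P<data_store>[^/]+)"
    r"/branches/(?P<branch>[^/]+)"
    r"/documents/(?P<document>[^/]+)"
    r"/chunks/(?P<chunk>[^/]+)$"
)


def parse_chunk_name(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the path segments of a chunk resource name, or None if invalid.

    The 1024-character limit comes from the `name` field comment; counting
    Python string characters is an assumption about how the limit is applied.
    """
    if len(name) > 1024:
        return None
    match = _CHUNK_NAME.match(name)
    return match.groupdict() if match else None
```

When the name uses the shorter pattern, the `collection` entry is `None`.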
@@ -57,11 +57,6 @@ option (google.api.resource_definition) = {
type: "healthcare.googleapis.com/FhirStore"
pattern: "projects/{project}/locations/{location}/datasets/{dataset}/fhirStores/{fhir_store}"
};
- option (google.api.resource_definition) = {
- type: "discoveryengine.googleapis.com/Chunk"
- pattern: "projects/{project}/locations/{location}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
- pattern: "projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
- };

// The industry vertical associated with the
// [DataStore][google.cloud.discoveryengine.v1.DataStore].
@@ -50,3 +50,34 @@ message SuggestionDenyListEntry {
// exact phrase, or block any suggestions containing this phrase.
MatchOperator match_operator = 2 [(google.api.field_behavior) = REQUIRED];
}

// Autocomplete suggestions that are imported from Customer.
message CompletionSuggestion {
// Ranking metrics of this suggestion.
oneof ranking_info {
// Global score of this suggestion. Control how this suggestion would be
// scored / ranked.
double global_score = 2;

// Frequency of this suggestion. Will be used to rank suggestions when score
// is not available.
int64 frequency = 3;
}

// Required. The suggestion text.
string suggestion = 1 [(google.api.field_behavior) = REQUIRED];

// BCP-47 language code of this suggestion.
string language_code = 4;

// If two suggestions have the same groupId, they will not be
// returned together. Instead the one ranked higher will be returned. This can
// be used to deduplicate semantically identical suggestions.
string group_id = 5;

// The score of this suggestion within its group.
double group_score = 6;

// Alternative matching phrases for this suggestion.
repeated string alternative_phrases = 7;
}
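
The `group_id` comment describes deduplication: of the suggestions that share a group, only the highest-ranked one is returned. The service applies this itself; the sketch below only illustrates the rule on the client side. It assumes ranking by `global_score` alone and treats an empty `group_id` as "ungrouped", neither of which the proto states. The `Suggestion` dataclass is a stand-in for illustration, not the generated message type.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Suggestion:
    """Illustrative stand-in for a few CompletionSuggestion fields."""

    suggestion: str
    global_score: float = 0.0
    group_id: str = ""


def dedupe_by_group(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Keep the highest-scoring suggestion for each non-empty group_id."""
    ranked = sorted(suggestions, key=lambda s: s.global_score, reverse=True)
    seen_groups: Set[str] = set()
    kept: List[Suggestion] = []
    for item in ranked:
        if item.group_id and item.group_id in seen_groups:
            continue  # a higher-ranked member of this group was already kept
        if item.group_id:
            seen_groups.add(item.group_id)
        kept.append(item)
    return kept
```

For example, two suggestions in group `shoes` with scores 0.4 and 0.9 collapse to the 0.9 entry, while ungrouped suggestions pass through unchanged.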
@@ -86,6 +86,44 @@ service CompletionService {
metadata_type: "google.cloud.discoveryengine.v1.PurgeSuggestionDenyListEntriesMetadata"
};
}

// Imports
// [CompletionSuggestion][google.cloud.discoveryengine.v1.CompletionSuggestion]s
// for a DataStore.
rpc ImportCompletionSuggestions(ImportCompletionSuggestionsRequest)
returns (google.longrunning.Operation) {
option (google.api.http) = {
post: "/v1/{parent=projects/*/locations/*/collections/*/dataStores/*}/completionSuggestions:import"
body: "*"
additional_bindings {
post: "/v1/{parent=projects/*/locations/*/dataStores/*}/completionSuggestions:import"
body: "*"
}
};
option (google.longrunning.operation_info) = {
response_type: "google.cloud.discoveryengine.v1.ImportCompletionSuggestionsResponse"
metadata_type: "google.cloud.discoveryengine.v1.ImportCompletionSuggestionsMetadata"
};
}

// Permanently deletes all
// [CompletionSuggestion][google.cloud.discoveryengine.v1.CompletionSuggestion]s
// for a DataStore.
rpc PurgeCompletionSuggestions(PurgeCompletionSuggestionsRequest)
returns (google.longrunning.Operation) {
option (google.api.http) = {
post: "/v1/{parent=projects/*/locations/*/collections/*/dataStores/*}/completionSuggestions:purge"
body: "*"
additional_bindings {
post: "/v1/{parent=projects/*/locations/*/dataStores/*}/completionSuggestions:purge"
body: "*"
}
};
option (google.longrunning.operation_info) = {
response_type: "google.cloud.discoveryengine.v1.PurgeCompletionSuggestionsResponse"
metadata_type: "google.cloud.discoveryengine.v1.PurgeCompletionSuggestionsMetadata"
};
}
}
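
Both new RPCs accept a `parent` with or without the `collections/*` segment and map it to a `completionSuggestions:import` or `completionSuggestions:purge` path. A small sketch of that mapping, assuming a plain string `parent` and a hypothetical helper name `completion_suggestions_path` (this is not how the generated client builds requests):

```python
import re

# Matches both HTTP bindings: the collections segment is optional.
_PARENT = re.compile(
    r"^projects/[^/]+/locations/[^/]+(?:/collections/[^/]+)?/dataStores/[^/]+$"
)


def completion_suggestions_path(parent: str, action: str) -> str:
    """Build the REST path for the import or purge custom method."""
    if action not in ("import", "purge"):
        raise ValueError(f"unsupported action: {action!r}")
    if not _PARENT.match(parent):
        raise ValueError(f"parent does not match a dataStore pattern: {parent!r}")
    return f"/v1/{parent}/completionSuggestions:{action}"
```

Both methods return a long-running operation, so a real caller would still need to poll or await the result.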

// Request message for
@@ -583,6 +583,17 @@ message AnswerQueryRequest {
// If this field is unrecognizable, an `INVALID_ARGUMENT` is returned.
string order_by = 4;

// Specifies the search result mode. If unspecified, the
// search result mode is based on
// [DataStore.DocumentProcessingConfig.chunking_config][]:
// * If [DataStore.DocumentProcessingConfig.chunking_config][] is
// specified,
// it defaults to `CHUNKS`.
// * Otherwise, it defaults to `DOCUMENTS`.
// See [parse and chunk
// documents](https://cloud.google.com/generative-ai-app-builder/docs/parse-chunk-documents)
SearchRequest.ContentSearchSpec.SearchResultMode search_result_mode = 5;
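
The defaulting rule in the comment above can be sketched as a small function. This is only an illustration of the documented behavior: the string mode names follow the comment, while the function name and the boolean `has_chunking_config` parameter are assumptions made for the example.

```python
from typing import Optional


def resolve_search_result_mode(
    requested: Optional[str], has_chunking_config: bool
) -> str:
    """Return the effective search result mode.

    An explicit request wins; otherwise the mode follows whether the data
    store has a chunking config, as the field comment describes.
    """
    if requested:
        return requested
    return "CHUNKS" if has_chunking_config else "DOCUMENTS"
```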

// Specs defining dataStores to filter on in a search call and
// configurations for those dataStores. This is only considered for
// engines with multiple dataStores use case. For single dataStore within
@@ -706,6 +717,11 @@ message AnswerQueryRequest {
message QueryRephraserSpec {
// Disable query rephraser.
bool disable = 1;

// Max rephrase steps.
// The max number is 5 steps.
// If not set or set to < 1, it will be set to 1 by default.
int32 max_rephrase_steps = 2;
}
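
The comment on `max_rephrase_steps` gives a default of 1 for unset or sub-1 values and a maximum of 5. A minimal sketch of that rule follows; capping values above 5 is an assumption, since the comment does not say whether the service caps or rejects them.

```python
from typing import Optional

MAX_REPHRASE_STEPS = 5
DEFAULT_REPHRASE_STEPS = 1


def effective_rephrase_steps(value: Optional[int]) -> int:
    """Return the step count implied by the field comment.

    Unset or < 1 falls back to the default of 1. Values above 5 are capped
    here, which is an assumption about server behavior.
    """
    if value is None or value < 1:
        return DEFAULT_REPHRASE_STEPS
    return min(value, MAX_REPHRASE_STEPS)
```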

// Query classification specification.
@@ -777,6 +793,25 @@ message AnswerQueryRequest {
// The field must be a UTF-8 encoded string with a length limit of 128
// characters. Otherwise, an `INVALID_ARGUMENT` error is returned.
string user_pseudo_id = 12;

// The user labels applied to a resource must meet the following requirements:
//
// * Each resource can have multiple labels, up to a maximum of 64.
// * Each label must be a key-value pair.
// * Keys have a minimum length of 1 character and a maximum length of 63
// characters and cannot be empty. Values can be empty and have a maximum
// length of 63 characters.
// * Keys and values can contain only lowercase letters, numeric characters,
// underscores, and dashes. All characters must use UTF-8 encoding, and
// international characters are allowed.
// * The key portion of a label must be unique. However, you can use the same
// key with multiple resources.
// * Keys must start with a lowercase letter or international character.
//
// See [Google Cloud
// Document](https://cloud.google.com/resource-manager/docs/creating-managing-labels#requirements)
// for more details.
map<string, string> user_labels = 13;
}
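
The `user_labels` requirements lend themselves to a pre-flight check before sending a request. The sketch below is one possible reading of the rules listed in the comment, not the service's own validation: it treats any non-uppercase alphabetic character as an allowed "lowercase letter or international character", and the function names are hypothetical.

```python
from typing import Dict, List

MAX_LABELS = 64
MAX_LENGTH = 63


def _is_letter(ch: str) -> bool:
    # Lowercase or caseless letters; assumed to cover international characters.
    return ch.isalpha() and not ch.isupper()


def _is_allowed_char(ch: str) -> bool:
    return ch in "_-" or ch.isdigit() or _is_letter(ch)


def validate_user_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the labels look valid."""
    errors: List[str] = []
    if len(labels) > MAX_LABELS:
        errors.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if not 1 <= len(key) <= MAX_LENGTH:
            errors.append(f"key {key!r}: length must be 1 to {MAX_LENGTH}")
        elif not _is_letter(key[0]):
            errors.append(f"key {key!r}: must start with a lowercase letter")
        if len(value) > MAX_LENGTH:
            errors.append(f"value for {key!r}: longer than {MAX_LENGTH}")
        if not all(_is_allowed_char(ch) for ch in key + value):
            errors.append(f"label {key!r}: contains a disallowed character")
    return errors
```

Key uniqueness needs no separate check here because a Python `dict`, like the proto `map`, cannot hold duplicate keys.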

// Response message for
@@ -54,7 +54,7 @@ message Document {

// The URI of the content. Only Cloud Storage URIs (e.g.
// `gs://bucket-name/path/to/file`) are supported. The maximum file size
- // is 2.5 MB for text-based formats, 100 MB for other formats.
+ // is 2.5 MB for text-based formats, 200 MB for other formats.
string uri = 3;
}

@@ -41,6 +41,30 @@ message DocumentProcessingConfig {
pattern: "projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/documentProcessingConfig"
};

// Configuration for chunking config.
message ChunkingConfig {
// Configuration for the layout based chunking.
message LayoutBasedChunkingConfig {
// The token size limit for each chunk.
//
// Supported values: 100-500 (inclusive).
// Default value: 500.
int32 chunk_size = 1;

// Whether to include appending different levels of headings to chunks
// from the middle of the document to prevent context loss.
//
// Default value: False.
bool include_ancestor_headings = 2;
}

// Additional configs that defines the behavior of the chunking.
oneof chunk_mode {
// Configuration for the layout based chunking.
LayoutBasedChunkingConfig layout_based_chunking_config = 1;
}
}
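
The `chunk_size` comment documents a supported range of 100 to 500 tokens and a default of 500. A small sketch of a client-side check, assuming that an unset proto3 `int32` (which reads as 0) means "use the default" and that out-of-range values should be rejected locally; the function name is hypothetical.

```python
from typing import Optional

DEFAULT_CHUNK_SIZE = 500
MIN_CHUNK_SIZE = 100
MAX_CHUNK_SIZE = 500


def resolve_chunk_size(chunk_size: Optional[int]) -> int:
    """Return the effective token limit per chunk.

    None or 0 is treated as unset and mapped to the documented default.
    Raising on out-of-range values is a choice made for this sketch.
    """
    if chunk_size is None or chunk_size == 0:
        return DEFAULT_CHUNK_SIZE
    if not MIN_CHUNK_SIZE <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(
            f"chunk_size must be between {MIN_CHUNK_SIZE} and {MAX_CHUNK_SIZE}"
        )
    return chunk_size
```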

// Related configurations applied to a specific type of document parser.
message ParsingConfig {
// The digital parsing configurations for documents.
@@ -57,6 +81,9 @@ message DocumentProcessingConfig {
bool use_native_text = 2;
}

// The layout parsing configurations for documents.
message LayoutParsingConfig {}

// Configs for document processing types.
oneof type_dedicated_config {
// Configurations applied to digital parser.
@@ -65,6 +92,9 @@ message DocumentProcessingConfig {
// Configurations applied to OCR parser. Currently it only applies to
// PDFs.
OcrParsingConfig ocr_parsing_config = 2;

// Configurations applied to layout parser.
LayoutParsingConfig layout_parsing_config = 3;
}
}

@@ -73,6 +103,9 @@ message DocumentProcessingConfig {
// `projects/*/locations/*/collections/*/dataStores/*/documentProcessingConfig`.
string name = 1;

// Whether chunking mode is enabled.
ChunkingConfig chunking_config = 3;

// Configurations for default Document parser.
// If not specified, we will configure it as default DigitalParsingConfig, and
// the default parsing config will be applied to all file types for Document
@@ -85,8 +118,10 @@ message DocumentProcessingConfig {
// * `pdf`: Override parsing config for PDF files, either digital parsing, ocr
// parsing or layout parsing is supported.
// * `html`: Override parsing config for HTML files, only digital parsing and
- // or layout parsing are supported.
+ // layout parsing are supported.
// * `docx`: Override parsing config for DOCX files, only digital parsing and
- // or layout parsing are supported.
+ // layout parsing are supported.
+ // * `pptx`: Override parsing config for PPTX files, only digital parsing and
+ // layout parsing are supported.
map<string, ParsingConfig> parsing_config_overrides = 5;
}
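
The `parsing_config_overrides` comment pairs each file type with the parsers it supports. The sketch below encodes only the keys visible in this hunk (`pdf`, `html`, `docx`, `pptx`). The short parser names `digital`, `ocr`, and `layout` are labels invented for the example, standing in for the corresponding `ParsingConfig` oneof members.

```python
from typing import Dict, List

# File type -> parsers the comment lists as supported for that type.
SUPPORTED_PARSERS = {
    "pdf": {"digital", "ocr", "layout"},
    "html": {"digital", "layout"},
    "docx": {"digital", "layout"},
    "pptx": {"digital", "layout"},
}


def invalid_overrides(overrides: Dict[str, str]) -> List[str]:
    """Return a sorted list of problems found in a file-type -> parser map."""
    problems: List[str] = []
    for file_type, parser in sorted(overrides.items()):
        allowed = SUPPORTED_PARSERS.get(file_type)
        if allowed is None:
            problems.append(f"{file_type}: unsupported file type")
        elif parser not in allowed:
            problems.append(f"{file_type}: {parser} parsing is not supported")
    return problems
```

Under these assumptions, OCR parsing is accepted only for `pdf`, which matches the comment on `ocr_parsing_config` that it currently applies only to PDFs.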