feat: [discoveryengine] add Chunk resource in the search response #5526

Merged 5 commits on Jul 9, 2024
2 changes: 2 additions & 0 deletions packages/google-cloud-discoveryengine/README.md
@@ -126,7 +126,9 @@ Samples are in the [`samples/`](https://github.com/googleapis/google-cloud-node/
| Sample | Source Code | Try it |
| --------------------------- | --------------------------------- | ------ |
| Completion_service.complete_query | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.complete_query.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.complete_query.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.import_completion_suggestions | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_completion_suggestions.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_completion_suggestions.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.import_suggestion_deny_list_entries | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_suggestion_deny_list_entries.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.import_suggestion_deny_list_entries.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.purge_completion_suggestions | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_completion_suggestions.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_completion_suggestions.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Completion_service.purge_suggestion_deny_list_entries | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_suggestion_deny_list_entries.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/completion_service.purge_suggestion_deny_list_entries.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Control_service.create_control | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/control_service.create_control.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/control_service.create_control.js,packages/google-cloud-discoveryengine/samples/README.md) |
| Control_service.delete_control | [source code](https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samples/generated/v1/control_service.delete_control.js) | [![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/google-cloud-node&page=editor&open_in_editor=packages/google-cloud-discoveryengine/samples/generated/v1/control_service.delete_control.js,packages/google-cloud-discoveryengine/samples/README.md) |
@@ -188,6 +188,13 @@ message Answer {
// If citation_type is CHUNK_LEVEL_CITATION and chunk mode is on,
// populate chunk info.
repeated ChunkInfo chunk_info = 5;

// Data representation.
// The structured JSON data for the document.
// It's populated from the struct data from the Document (code
// pointer: http://shortn/_objzAfIiHq), or the Chunk in search result
// (code pointer: http://shortn/_Ipo6KFFGBL).
google.protobuf.Struct struct_data = 6;
}

// Search results observed by the search action, it can be snippets info
@@ -296,6 +303,12 @@ message Answer {
// Google skips the answer if there is a potential policy violation
// detected. This includes content that may be violent or toxic.
POTENTIAL_POLICY_VIOLATION = 4;

// The no relevant content case.
//
// Google skips the answer if there is no relevant content in the
// retrieved search results.
NO_RELEVANT_CONTENT = 5;
}

// Immutable. Fully qualified name
@@ -0,0 +1,119 @@
// Copyright 2024 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax = "proto3";

package google.cloud.discoveryengine.v1;

import "google/api/field_behavior.proto";
import "google/api/resource.proto";
import "google/protobuf/struct.proto";

option csharp_namespace = "Google.Cloud.DiscoveryEngine.V1";
option go_package = "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb;discoveryenginepb";
option java_multiple_files = true;
option java_outer_classname = "ChunkProto";
option java_package = "com.google.cloud.discoveryengine.v1";
option objc_class_prefix = "DISCOVERYENGINE";
option php_namespace = "Google\\Cloud\\DiscoveryEngine\\V1";
option ruby_package = "Google::Cloud::DiscoveryEngine::V1";

// Chunk captures all raw metadata information of items to be recommended or
// searched in the chunk mode.
message Chunk {
  option (google.api.resource) = {
    type: "discoveryengine.googleapis.com/Chunk"
    pattern: "projects/{project}/locations/{location}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
    pattern: "projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
  };

  // Document metadata contains the information of the document of the current
  // chunk.
  message DocumentMetadata {
    // Uri of the document.
    string uri = 1;

    // Title of the document.
    string title = 2;

    // Data representation.
    // The structured JSON data for the document. It should conform to the
    // registered [Schema][google.cloud.discoveryengine.v1.Schema] or an
    // `INVALID_ARGUMENT` error is thrown.
    google.protobuf.Struct struct_data = 3;
  }

  // Page span of the chunk.
  message PageSpan {
    // The start page of the chunk.
    int32 page_start = 1;

    // The end page of the chunk.
    int32 page_end = 2;
  }

  // Metadata of the current chunk. This field is only populated on
  // [SearchService.Search][google.cloud.discoveryengine.v1.SearchService.Search]
  // API.
  message ChunkMetadata {
    // The previous chunks of the current chunk. The number is controlled by
    // [SearchRequest.ContentSearchSpec.ChunkSpec.num_previous_chunks][google.cloud.discoveryengine.v1.SearchRequest.ContentSearchSpec.ChunkSpec.num_previous_chunks].
    // This field is only populated on
    // [SearchService.Search][google.cloud.discoveryengine.v1.SearchService.Search]
    // API.
    repeated Chunk previous_chunks = 1;

    // The next chunks of the current chunk. The number is controlled by
    // [SearchRequest.ContentSearchSpec.ChunkSpec.num_next_chunks][google.cloud.discoveryengine.v1.SearchRequest.ContentSearchSpec.ChunkSpec.num_next_chunks].
    // This field is only populated on
    // [SearchService.Search][google.cloud.discoveryengine.v1.SearchService.Search]
    // API.
    repeated Chunk next_chunks = 2;
  }

  // The full resource name of the chunk.
  // Format:
  // `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}/chunks/{chunk_id}`.
  //
  // This field must be a UTF-8 encoded string with a length limit of 1024
  // characters.
  string name = 1;

  // Unique chunk ID of the current chunk.
  string id = 2;

  // Content is a string from a document (parsed content).
  string content = 3;

  // Output only. Represents the relevance score based on similarity.
  // Higher score indicates higher chunk relevance.
  // The score is in range [-1.0, 1.0].
  // Only populated on [SearchService.SearchResponse][].
  optional double relevance_score = 8
      [(google.api.field_behavior) = OUTPUT_ONLY];

  // Metadata of the document from the current chunk.
  DocumentMetadata document_metadata = 5;

  // Output only. This field is OUTPUT_ONLY.
  // It contains derived data that are not in the original input document.
  google.protobuf.Struct derived_struct_data = 4
      [(google.api.field_behavior) = OUTPUT_ONLY];

  // Page span of the chunk.
  PageSpan page_span = 6;

  // Output only. Metadata of the current chunk.
  ChunkMetadata chunk_metadata = 7 [(google.api.field_behavior) = OUTPUT_ONLY];
}
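
As a rough illustration of how a client might consume the `Chunk` message above, the sketch below parses the collection-scoped resource name pattern and joins `previous_chunks`, the chunk's own `content`, and `next_chunks` into one passage. It operates on plain dictionaries keyed by the proto field names rather than on generated client types. The helper names `parse_chunk_name` and `stitch_context` are hypothetical, and the ordering of the neighbouring chunks is an assumption that the proto comments do not confirm.

```python
import re

# Collection-scoped pattern taken from the Chunk resource definition.
_CHUNK_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/collections/(?P<collection>[^/]+)/dataStores/(?P<data_store>[^/]+)"
    r"/branches/(?P<branch>[^/]+)/documents/(?P<document>[^/]+)"
    r"/chunks/(?P<chunk>[^/]+)$"
)


def parse_chunk_name(name: str) -> dict:
    """Split a collection-scoped chunk resource name into its segments."""
    match = _CHUNK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a collection-scoped chunk name: {name!r}")
    return match.groupdict()


def stitch_context(chunk: dict, separator: str = "\n") -> str:
    """Join neighbouring chunk text around the chunk's own content.

    Assumption: previous_chunks and next_chunks arrive in document order.
    """
    metadata = chunk.get("chunk_metadata", {})
    parts = [c.get("content", "") for c in metadata.get("previous_chunks", [])]
    parts.append(chunk.get("content", ""))
    parts.extend(c.get("content", "") for c in metadata.get("next_chunks", []))
    # Drop empty strings so missing content does not produce blank lines.
    return separator.join(p for p in parts if p)
```

A chunk returned without `chunk_metadata` simply yields its own `content`, which matches the note that the metadata is only populated by the Search API.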
@@ -57,11 +57,6 @@ option (google.api.resource_definition) = {
type: "healthcare.googleapis.com/FhirStore"
pattern: "projects/{project}/locations/{location}/datasets/{dataset}/fhirStores/{fhir_store}"
};
-option (google.api.resource_definition) = {
-  type: "discoveryengine.googleapis.com/Chunk"
-  pattern: "projects/{project}/locations/{location}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
-  pattern: "projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document}/chunks/{chunk}"
-};

// The industry vertical associated with the
// [DataStore][google.cloud.discoveryengine.v1.DataStore].
@@ -50,3 +50,34 @@ message SuggestionDenyListEntry {
// exact phrase, or block any suggestions containing this phrase.
MatchOperator match_operator = 2 [(google.api.field_behavior) = REQUIRED];
}

// Autocomplete suggestions that are imported from Customer.
message CompletionSuggestion {
  // Ranking metrics of this suggestion.
  oneof ranking_info {
    // Global score of this suggestion. Control how this suggestion would be
    // scored / ranked.
    double global_score = 2;

    // Frequency of this suggestion. Will be used to rank suggestions when score
    // is not available.
    int64 frequency = 3;
  }

  // Required. The suggestion text.
  string suggestion = 1 [(google.api.field_behavior) = REQUIRED];

  // BCP-47 language code of this suggestion.
  string language_code = 4;

  // If two suggestions have the same groupId, they will not be
  // returned together. Instead the one ranked higher will be returned. This can
  // be used to deduplicate semantically identical suggestions.
  string group_id = 5;

  // The score of this suggestion within its group.
  double group_score = 6;

  // Alternative matching phrases for this suggestion.
  repeated string alternative_phrases = 7;
}
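
The `group_id` comment describes a deduplication rule: of two suggestions that share a group, only the higher-ranked one is returned. A minimal sketch of that rule follows, using plain dictionaries keyed by the proto field names. The function name `dedupe_by_group` is hypothetical. Because the proto does not say which signal decides "ranked higher", the sketch assumes the caller has already sorted the list best-first, and it assumes an empty `group_id` means the suggestion belongs to no group.

```python
def dedupe_by_group(ranked_suggestions: list) -> list:
    """Keep only the first (highest-ranked) suggestion of each group_id.

    Assumptions: the input is already sorted best-first, and an empty or
    missing group_id means the suggestion is never deduplicated.
    """
    seen_groups = set()
    kept = []
    for suggestion in ranked_suggestions:
        group_id = suggestion.get("group_id", "")
        if group_id:
            if group_id in seen_groups:
                # A higher-ranked member of this group was already kept.
                continue
            seen_groups.add(group_id)
        kept.append(suggestion)
    return kept
```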
@@ -86,6 +86,44 @@ service CompletionService {
metadata_type: "google.cloud.discoveryengine.v1.PurgeSuggestionDenyListEntriesMetadata"
};
}

// Imports
// [CompletionSuggestion][google.cloud.discoveryengine.v1.CompletionSuggestion]s
// for a DataStore.
rpc ImportCompletionSuggestions(ImportCompletionSuggestionsRequest)
returns (google.longrunning.Operation) {
option (google.api.http) = {
post: "/v1/{parent=projects/*/locations/*/collections/*/dataStores/*}/completionSuggestions:import"
body: "*"
additional_bindings {
post: "/v1/{parent=projects/*/locations/*/dataStores/*}/completionSuggestions:import"
body: "*"
}
};
option (google.longrunning.operation_info) = {
response_type: "google.cloud.discoveryengine.v1.ImportCompletionSuggestionsResponse"
metadata_type: "google.cloud.discoveryengine.v1.ImportCompletionSuggestionsMetadata"
};
}

// Permanently deletes all
// [CompletionSuggestion][google.cloud.discoveryengine.v1.CompletionSuggestion]s
// for a DataStore.
rpc PurgeCompletionSuggestions(PurgeCompletionSuggestionsRequest)
returns (google.longrunning.Operation) {
option (google.api.http) = {
post: "/v1/{parent=projects/*/locations/*/collections/*/dataStores/*}/completionSuggestions:purge"
body: "*"
additional_bindings {
post: "/v1/{parent=projects/*/locations/*/dataStores/*}/completionSuggestions:purge"
body: "*"
}
};
option (google.longrunning.operation_info) = {
response_type: "google.cloud.discoveryengine.v1.PurgeCompletionSuggestionsResponse"
metadata_type: "google.cloud.discoveryengine.v1.PurgeCompletionSuggestionsMetadata"
};
}
}
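
The two new RPCs share the same pair of HTTP bindings and differ only in the custom verb. The sketch below builds the request path from a `parent` resource name and checks it against both binding shapes. It illustrates the URL templates in the `google.api.http` options only. The function name `completion_suggestions_path` is hypothetical, and nothing here calls the service or reflects a generated client.

```python
import re

# The two parent shapes accepted by the HTTP bindings above.
_PARENT_PATTERNS = (
    re.compile(
        r"^projects/[^/]+/locations/[^/]+/collections/[^/]+/dataStores/[^/]+$"
    ),
    re.compile(r"^projects/[^/]+/locations/[^/]+/dataStores/[^/]+$"),
)


def completion_suggestions_path(parent: str, action: str) -> str:
    """Return the POST path for the import or purge custom verb."""
    if action not in ("import", "purge"):
        raise ValueError(f"unsupported action: {action!r}")
    if not any(pattern.match(parent) for pattern in _PARENT_PATTERNS):
        raise ValueError(f"parent does not match either binding: {parent!r}")
    return f"/v1/{parent}/completionSuggestions:{action}"
```

Both RPCs return a `google.longrunning.Operation`, so a real caller would still need to poll the operation after posting to this path.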

// Request message for
@@ -583,6 +583,17 @@ message AnswerQueryRequest {
// If this field is unrecognizable, an `INVALID_ARGUMENT` is returned.
string order_by = 4;

// Specifies the search result mode. If unspecified, the
// search result mode is based on
// [DataStore.DocumentProcessingConfig.chunking_config][]:
// * If [DataStore.DocumentProcessingConfig.chunking_config][] is
// specified,
// it defaults to `CHUNKS`.
// * Otherwise, it defaults to `DOCUMENTS`.
// See [parse and chunk
// documents](https://cloud.google.com/generative-ai-app-builder/docs/parse-chunk-documents)
SearchRequest.ContentSearchSpec.SearchResultMode search_result_mode = 5;
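
The defaulting rule in the comment above can be sketched as a small function. This is an illustration of the documented behavior, not server code: the name `resolve_search_result_mode` is hypothetical, modes are represented as plain strings, and `None` stands in for an unspecified mode.

```python
from typing import Optional


def resolve_search_result_mode(
    requested: Optional[str], has_chunking_config: bool
) -> str:
    """Apply the documented default when no mode is requested.

    Assumption: None represents an unspecified search_result_mode.
    """
    if requested is not None:
        return requested
    # Unspecified: the default depends on whether the data store chunks documents.
    return "CHUNKS" if has_chunking_config else "DOCUMENTS"
```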

// Specs defining dataStores to filter on in a search call and
// configurations for those dataStores. This is only considered for
// engines with multiple dataStores use case. For single dataStore within
@@ -706,6 +717,11 @@ message AnswerQueryRequest {
message QueryRephraserSpec {
// Disable query rephraser.
bool disable = 1;

// Max rephrase steps.
// The max number is 5 steps.
// If not set or set to < 1, it will be set to 1 by default.
int32 max_rephrase_steps = 2;
}
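
A client could mirror the `max_rephrase_steps` rule before sending a request, roughly as sketched below. The function name `effective_rephrase_steps` is hypothetical. The comment only states the default for unset or sub-1 values and the maximum of 5, so rejecting larger values is an assumption of this sketch. The service might clamp or return an error instead.

```python
from typing import Optional

MAX_REPHRASE_STEPS = 5


def effective_rephrase_steps(requested: Optional[int]) -> int:
    """Return the step count implied by the max_rephrase_steps comment."""
    if requested is None or requested < 1:
        # Documented default for unset or sub-1 values.
        return 1
    if requested > MAX_REPHRASE_STEPS:
        # Assumption: values above the documented maximum are rejected here.
        raise ValueError(
            f"max_rephrase_steps must not exceed {MAX_REPHRASE_STEPS}"
        )
    return requested
```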

// Query classification specification.
@@ -777,6 +793,25 @@ message AnswerQueryRequest {
// The field must be a UTF-8 encoded string with a length limit of 128
// characters. Otherwise, an `INVALID_ARGUMENT` error is returned.
string user_pseudo_id = 12;

// The user labels applied to a resource must meet the following requirements:
//
// * Each resource can have multiple labels, up to a maximum of 64.
// * Each label must be a key-value pair.
// * Keys have a minimum length of 1 character and a maximum length of 63
// characters and cannot be empty. Values can be empty and have a maximum
// length of 63 characters.
// * Keys and values can contain only lowercase letters, numeric characters,
// underscores, and dashes. All characters must use UTF-8 encoding, and
// international characters are allowed.
// * The key portion of a label must be unique. However, you can use the same
// key with multiple resources.
// * Keys must start with a lowercase letter or international character.
//
// See [Google Cloud
// Document](https://cloud.google.com/resource-manager/docs/creating-managing-labels#requirements)
// for more details.
map<string, string> user_labels = 13;
}
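
The `user_labels` requirements listed above lend themselves to a client-side pre-check. The sketch below is one possible reading of those rules, with a hypothetical function name `validate_user_labels`. It assumes lengths are counted in Unicode code points and treats any letter that is not uppercase as a permitted "lowercase or international" character. The service's own validation may differ on both points.

```python
MAX_LABELS = 64
MAX_LENGTH = 63


def _is_lower_or_international(ch: str) -> bool:
    # Accepts any letter that is not uppercase.
    return ch.isalpha() and not ch.isupper()


def _is_label_char(ch: str) -> bool:
    return ch in ("_", "-") or ch.isdigit() or _is_lower_or_international(ch)


def validate_user_labels(labels: dict) -> list:
    """Return a list of human-readable problems; empty means the labels pass."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceed the maximum of {MAX_LABELS}")
    for key, value in labels.items():
        if not 1 <= len(key) <= MAX_LENGTH:
            problems.append(f"key {key!r} must be 1 to {MAX_LENGTH} characters long")
        elif not _is_lower_or_international(key[0]):
            problems.append(f"key {key!r} must start with a lowercase letter")
        if len(value) > MAX_LENGTH:
            problems.append(f"value for {key!r} exceeds {MAX_LENGTH} characters")
        if not all(_is_label_char(ch) for ch in key + value):
            problems.append(f"label {key!r} contains a disallowed character")
    return problems
```

Key uniqueness needs no separate check here because a Python `dict`, like a proto `map`, cannot hold the same key twice.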

// Response message for
@@ -54,7 +54,7 @@ message Document {

// The URI of the content. Only Cloud Storage URIs (e.g.
// `gs://bucket-name/path/to/file`) are supported. The maximum file size
-// is 2.5 MB for text-based formats, 100 MB for other formats.
+// is 2.5 MB for text-based formats, 200 MB for other formats.
string uri = 3;
}

@@ -41,6 +41,30 @@ message DocumentProcessingConfig {
pattern: "projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/documentProcessingConfig"
};

// Configuration for chunking config.
message ChunkingConfig {
// Configuration for the layout based chunking.
message LayoutBasedChunkingConfig {
// The token size limit for each chunk.
//
// Supported values: 100-500 (inclusive).
// Default value: 500.
int32 chunk_size = 1;

// Whether to include appending different levels of headings to chunks
// from the middle of the document to prevent context loss.
//
// Default value: False.
bool include_ancestor_headings = 2;
}

// Additional configs that defines the behavior of the chunking.
oneof chunk_mode {
// Configuration for the layout based chunking.
LayoutBasedChunkingConfig layout_based_chunking_config = 1;
}
}
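
The `chunk_size` bounds documented in `LayoutBasedChunkingConfig` can be checked before a data store is configured. The sketch below uses a hypothetical helper, `effective_chunk_size`. It assumes that an unset field shows up as `None` or as the proto3 default `0`, and that out-of-range values should be rejected. Neither assumption is stated in the proto comments.

```python
from typing import Optional

MIN_CHUNK_SIZE = 100
MAX_CHUNK_SIZE = 500
DEFAULT_CHUNK_SIZE = 500


def effective_chunk_size(chunk_size: Optional[int]) -> int:
    """Return the token limit per chunk, applying the documented default."""
    if chunk_size is None or chunk_size == 0:
        # Assumption: None or the proto3 zero value means "not set".
        return DEFAULT_CHUNK_SIZE
    if not MIN_CHUNK_SIZE <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(
            f"chunk_size must be between {MIN_CHUNK_SIZE} and {MAX_CHUNK_SIZE}"
        )
    return chunk_size
```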

// Related configurations applied to a specific type of document parser.
message ParsingConfig {
// The digital parsing configurations for documents.
@@ -57,6 +81,9 @@ message DocumentProcessingConfig {
bool use_native_text = 2;
}

// The layout parsing configurations for documents.
message LayoutParsingConfig {}

// Configs for document processing types.
oneof type_dedicated_config {
// Configurations applied to digital parser.
@@ -65,6 +92,9 @@ message DocumentProcessingConfig {
// Configurations applied to OCR parser. Currently it only applies to
// PDFs.
OcrParsingConfig ocr_parsing_config = 2;

// Configurations applied to layout parser.
LayoutParsingConfig layout_parsing_config = 3;
}
}

@@ -73,6 +103,9 @@ message DocumentProcessingConfig {
// `projects/*/locations/*/collections/*/dataStores/*/documentProcessingConfig`.
string name = 1;

// Whether chunking mode is enabled.
ChunkingConfig chunking_config = 3;

// Configurations for default Document parser.
// If not specified, we will configure it as default DigitalParsingConfig, and
// the default parsing config will be applied to all file types for Document
@@ -85,8 +118,10 @@ message DocumentProcessingConfig {
// * `pdf`: Override parsing config for PDF files, either digital parsing, ocr
// parsing or layout parsing is supported.
// * `html`: Override parsing config for HTML files, only digital parsing and
-// or layout parsing are supported.
+// layout parsing are supported.
// * `docx`: Override parsing config for DOCX files, only digital parsing and
-// or layout parsing are supported.
+// layout parsing are supported.
+// * `pptx`: Override parsing config for PPTX files, only digital parsing and
+// layout parsing are supported.
map<string, ParsingConfig> parsing_config_overrides = 5;
}
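
The per-file-type rules in the `parsing_config_overrides` comment amount to a small compatibility table. The sketch below encodes it, with a hypothetical function name `check_parsing_overrides`. Parsers are identified by their oneof field names. `ocr_parsing_config` and `layout_parsing_config` appear in the hunks above, while `digital_parsing_config` is assumed because that field falls outside the lines shown. File types not named in the comment are reported rather than silently accepted.

```python
_DIGITAL_AND_LAYOUT = frozenset({"digital_parsing_config", "layout_parsing_config"})

# File type -> parser fields permitted by the comment above.
ALLOWED_PARSERS = {
    "pdf": _DIGITAL_AND_LAYOUT | {"ocr_parsing_config"},
    "html": _DIGITAL_AND_LAYOUT,
    "docx": _DIGITAL_AND_LAYOUT,
    "pptx": _DIGITAL_AND_LAYOUT,
}


def check_parsing_overrides(overrides: dict) -> list:
    """Return problems for a {file_type: parser_field_name} mapping."""
    problems = []
    for file_type, parser in overrides.items():
        allowed = ALLOWED_PARSERS.get(file_type)
        if allowed is None:
            problems.append(f"{file_type!r} is not a file type listed in the comment")
        elif parser not in allowed:
            problems.append(f"{parser!r} is not supported for {file_type!r}")
    return problems
```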