Introduce EndStreamAction in Delta Sharing Protocol #734

Merged: 4 commits, May 29, 2025

linzhou-db
Collaborator

Introduce EndStreamAction and the header includeEndStreamAction in Delta Sharing Protocol.

PROTOCOL.md Outdated

### readerFeatures
readerfeatures is only useful when `responseformat=delta`, it includes values from [delta reader
features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#table-features). It's set by the
caller of `DeltaSharingClient` to indicate its ability to process delta readerFeatures.

## API Response Format in Parquet
### includeEndStreamAction
The key is `includeEndStreamAction` and the value is `true` of `false`, i.e. `includeEndStreamAction=true`.
Collaborator

@JialeTomTian May 23, 2025

NIT (spelling): true or false here?

Collaborator Author

fixed.

PROTOCOL.md Outdated
Client sets `includeEndStreamAction=true` in the request header.

The server can:
1) decide not to include `EndStreamAction` in the respnose, thus it has to set `includeEndStreamAction=false` or not set it in the response header.
Collaborator

@JialeTomTian May 23, 2025

Nit (spelling): response

Collaborator Author

fixed.
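
To make the negotiation in the diff above concrete, here is a minimal Python sketch of the client side. It is illustrative only: the header name `delta-sharing-capabilities`, the semicolon-separated `key=value` layout, and every function name are assumptions, not text from this PR. It shows the client setting `includeEndStreamAction=true` on the request and treating a missing or `false` value in the response header as the server declining to send the action.

```python
from typing import Dict

# Assumed header name; not confirmed by the snippets quoted in this thread.
CAPABILITIES_HEADER = "delta-sharing-capabilities"


def format_capabilities(capabilities: Dict[str, str]) -> str:
    """Serialize capabilities as semicolon-separated key=value pairs (assumed layout)."""
    return ";".join(f"{key}={value}" for key, value in capabilities.items())


def parse_capabilities(header_value: str) -> Dict[str, str]:
    """Parse a capabilities header value into a dict with lower-cased keys."""
    parsed: Dict[str, str] = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue  # skip empty or malformed segments
        key, value = part.split("=", 1)
        parsed[key.strip().lower()] = value.strip()
    return parsed


def server_will_send_end_stream_action(response_headers: Dict[str, str]) -> bool:
    """Return True only if the server echoed includeEndStreamAction=true."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in response_headers.items()}
    capabilities = parse_capabilities(normalized.get(CAPABILITIES_HEADER, ""))
    return capabilities.get("includeendstreamaction", "false").lower() == "true"


# The client opts in on the request.
request_headers = {
    CAPABILITIES_HEADER: format_capabilities({"includeEndStreamAction": "true"})
}
```

A missing response header and an explicit `includeEndStreamAction=false` are handled the same way here, matching option 1 in the quoted text.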

<tr>
<th>Client that doesn't specify the header</th>
<td colspan="2"> No changes in both request and response header, and the server will only return `EndStreamAction` at the
end of the response when needed*, the client shouldn't fail the request if not seeing the action in the response. </td>
Collaborator

Should we clarify what "needed" means here?

Collaborator Author

added.

I intended to add the cases but forgot to before sending this out.
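
The table row quoted above says a client that never sent the header must not fail when the action is absent. A hedged Python sketch of that check follows. It assumes the response body is newline-delimited JSON and that the action is wrapped under an `endStreamAction` key; both, along with the function and exception names, are assumptions for illustration. Failing when the action was negotiated but is missing is likewise an assumption about the intended behaviour, not something stated in this thread.

```python
import json
from typing import Iterable, List


class MissingEndStreamActionError(Exception):
    """Raised when an EndStreamAction was negotiated but the response lacks one."""


def read_actions(lines: Iterable[str], end_stream_action_expected: bool) -> List[dict]:
    """Parse newline-delimited JSON actions and check how the stream ends.

    end_stream_action_expected should be True only when both sides agreed on
    includeEndStreamAction=true (an assumed reading of the negotiation).
    """
    actions = [json.loads(line) for line in lines if line.strip()]
    # "endStreamAction" is an assumed wrapper key for the final action.
    ends_with_action = bool(actions) and "endStreamAction" in actions[-1]
    if end_stream_action_expected and not ends_with_action:
        raise MissingEndStreamActionError(
            "response ended without an EndStreamAction; it may be truncated"
        )
    # Without negotiation, a missing action is tolerated, as the quoted row requires.
    return actions
```

For example, `read_actions(body.splitlines(), end_stream_action_expected=False)` accepts a response with or without a trailing action.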

Collaborator

@JialeTomTian left a comment

LGTM. Since this is a public-facing doc change, it may be good to wait for a few more reviewers to take a look.

@linzhou-db
Collaborator Author

@charlenelyu-db @chakankardb please take a look when you have time, thanks!

PROTOCOL.md Outdated
@@ -2512,7 +2512,7 @@ Accepted timestamp format by a delta sharing server: in the ISO8601 format, in t

## Delta Sharing Capabilities Header
This section explains the details of delta sharing capabilities header, which was introduced to help
-delta sharing catch up with features in [delta protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md).
+delta sharing protocol evolve, such as with new feature support or catch up with delta features in [delta protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md).
Collaborator

Let's chatgpt this:

This section explains the details of the Delta Sharing Capabilities header, which was introduced to enable the Delta Sharing protocol to evolve over time. This includes supporting new features and maintaining compatibility with advancements in the Delta protocol.

Collaborator Author

updated

- **delta**: format can be used to read a shared delta table with minReaderVersion > 1, which contains
readerFeatures such as Deletion Vector or Column Mapping. `delta-sharing-spark` libraries
that are able to process `responseformat=delta` will be released soon.

Collaborator

### responseFormat
Specifies the expected format of the API Response in Parquet. Two values are supported:

- **parquet**: Represents the response format used by `delta-sharing-spark` version 1.0 and earlier. This is the default format if `responseFormat` is not specified in the header. All existing Delta Sharing connectors are compatible with this format.
- **delta**: Enables reading of shared Delta tables with minReaderVersion > 1, which may include advanced reader features such as Deletion Vectors or Column Mapping. Support for `responseFormat=delta` will be available in upcoming versions of the `delta-sharing-spark` library.

Collaborator Author

updated.

@linzhou-db requested a review from chakankardb on May 29, 2025 05:08
@linzhou-db merged commit 10d8a73 into main on May 29, 2025
11 checks passed