
(Discussion): canonical api draft #4041

Draft · wants to merge 1 commit into main

Conversation

walldiss (Member)
New gRPC API Design Document and Proto Definitions

This PR introduces a comprehensive design document (readme.md) and proto definitions for the new gRPC-based API for Celestia Node. The design focuses on improving security, simplicity, and developer experience while enabling smooth migration from the current JSON-RPC implementation.

Key Features:

  • Unified status codes and error handling across all services
  • Enhanced data availability verification
  • Support for both row root (V0) and data root (V1) verification approaches
  • Simplified blob submission and management
  • Clear separation between different service domains (blobs, shares, proofs)

Looking for feedback on:

  • Overall API structure and design decisions
  • Security considerations
  • Migration approach (V0/V1)
  • Error handling strategy
  • Any missing functionality

TODO:
[ ] Header service
[ ] HTTP annotations for HTTP gateway support

Note: The proto definitions and design doc should be reviewed together as they complement each other.

@codecov-commenter commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.46%. Comparing base (2469e7a) to head (aa44ce7).
Report is 414 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #4041      +/-   ##
==========================================
+ Coverage   44.83%   45.46%   +0.62%     
==========================================
  Files         265      309      +44     
  Lines       14620    22173    +7553     
==========================================
+ Hits         6555    10080    +3525     
- Misses       7313    11016    +3703     
- Partials      752     1077     +325     
```


Standardized Status Codes
```protobuf
// Common status codes used across all services
enum StatusCode {
  STATUS_CODE_UNSPECIFIED = 0; // zero value assumed per proto3 convention; excerpt abridged
  // Error codes (100+)
  ERROR_DATA_NOT_AVAILABLE = 100;
  ERROR_FRAUD_DETECTED = 101;
  ERROR_TIMEOUT = 102;
}
```
@liamsi (Member), Jan 14, 2025:
I do think we should give a dev the possibility to know whether an error stems from node or e.g. somewhere from app or the cosmos-sdk. Maybe that is already possible with the list somehow (e.g. not by the code but via the error string or something?).
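A sketch of one way to surface the error origin (the enum, field, and names below are hypothetical, not part of this PR):

```protobuf
// Hypothetical: tag each error with the layer it originated from.
enum ErrorSource {
  ERROR_SOURCE_UNSPECIFIED = 0;
  ERROR_SOURCE_NODE = 1;       // celestia-node itself
  ERROR_SOURCE_APP = 2;        // celestia-app
  ERROR_SOURCE_COSMOS_SDK = 3; // cosmos-sdk
}

message Status {
  StatusCode code = 1;
  string message = 2;
  ErrorSource source = 3; // hypothetical field: where the error came from
}
```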

Comment on lines +73 to +76
- The API will verify availability for both the requested block and all preceding blocks within the sampling window
- Data will only be returned if the entire chain segment is verified as available
- Failed availability checks will return a specific `data_not_available` error code

@liamsi (Member), Jan 14, 2025:

Could this slow anything down from an app perspective?

ah, nvm. It's optional for cases where this could matter.

Comment on lines +13 to +17
// Error codes (100+)
// ERROR_DATA_NOT_AVAILABLE is returned if data availability sampling guarantees were
// required for the request and failed. Keep in mind that data availability sampling
// will be ensured for all blocks in the sampling window up to the requested height
ERROR_DATA_NOT_AVAILABLE = 100;
Member:

Will the caller be able to differentiate between e.g. sampling timing out, and other cases?

rpc GetShare(GetShareRequest) returns (GetShareResponse);
rpc GetEDS(GetEDSRequest) returns (GetEDSResponse);

// Range operations - support both V0 (row roots) and V1 (data root) responses
Member:

Both responses below return V0 though?

@liamsi (Member), Jan 14, 2025:

Ah, in the protos I see there is also GetRangeResponseV1 etc. Maybe mention that we only list v0 here to keep the readme concise.

Comment on lines +12 to +14
double gas_price = 1; // Amount to be paid per gas unit
uint64 gas = 2; // Gas amount
string key_name = 3; // Keystore key for signing
Member:

I'd make them optional too.

Contributor:

Aren't all options optional?
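Sketched out, the suggestion to make these options explicitly optional could look like this (the message name SubmitOptions is an assumption):

```protobuf
message SubmitOptions {
  optional double gas_price = 1; // amount to be paid per gas unit; estimated by the node if unset
  optional uint64 gas = 2;       // gas limit; estimated by the node if unset
  optional string key_name = 3;  // keystore key for signing; default key if unset
}
```

With proto3 `optional`, the node can distinguish "field not set, use a default/estimate" from an explicit zero value.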

uint64 height = 1;
bytes namespace = 2;
bytes commitment = 3;
optional bool provide_commitment_proof = 4;
Member:

I don't think we need this field. The initial idea was to distinguish an inclusion proof from the commitment proof (blob proof). The difference between them is that the blob proof additionally contains the subtree roots needed to build the commitment (link). After discussions with @walldiss we agreed that it makes sense to always return the full (blob) proof.


message SubscribeAllRequest {
bytes namespaces = 1;
uint64 from_height = 2;
Member:

Let's clarify this: from_height should not be used to get historical data (we have Get methods for that purpose). Subscriptions should be used for recent data only.

Comment on lines +59 to +63
message SubmitDataResponse {
celestia.node.v1.common.Status status = 1;
uint64 height = 2;
repeated bytes commitments = 3;
}
Member:

Let's include the tx hash here. iirc, some teams are using SubmitPFB in the state module only because it additionally provides a tx hash.
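For illustration, the response with a tx hash added (the field name and number are assumptions):

```protobuf
message SubmitDataResponse {
  celestia.node.v1.common.Status status = 1;
  uint64 height = 2;
  repeated bytes commitments = 3;
  bytes tx_hash = 4; // hash of the PayForBlobs transaction, per the suggestion above
}
```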

repeated celestia.node.v1.types.BlobWithProofV1 blobs = 2;
}

message SubmitDataRequest {
Member:

What about submitting data under multiple namespaces?

I'd suggest adding one more method:

```protobuf
message SubmitBlobsRequest {
  repeated Blob blobs = 1; // blobs from different namespaces can be set
  SubmitOptions options = 2;
}
```

Comment on lines +77 to +82
message BlobsSubscribeRequest {
bytes namespaces = 1;
uint64 from_height = 2;
repeated bytes commitment = 3;
optional celestia.node.v1.common.Options options = 4;
}
Member:

Such a method could be useful in case we support async blob submission, but iirc it's not feasible right now as error handling in this case might be tricky.

optional bytes signer = 2;
bytes commitment = 3;
}

Member:

Let's define BlobProof (reference here).

// BlobService - handles blob operations
service BlobService {
// Core operations
rpc GetBlob(GetBlobRequest) returns (GetBlobResponseV0);
Member:

What if the submitted blob is v1, v2, ...? Maybe it's better to have a generic BlobResponse and extend the blob message with additional optional fields?
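A sketch of what a version-agnostic response could look like (all names here are hypothetical):

```protobuf
// One response type for all blob versions; version-specific data rides on
// optional fields instead of per-version response messages.
message GetBlobResponse {
  celestia.node.v1.common.Status status = 1;
  Blob blob = 2;
  optional uint32 share_version = 3; // hypothetical: which blob version this is
}
```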

Comment on lines +20 to +24
message Blob {
bytes data = 1;
optional bytes signer = 2;
bytes commitment = 3;
}
Member:

I've got your point: the Blob is part of the namespace, and by submitting/getting the blob the user already has it. But I'd still make it part of the message for consistency.
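The suggested shape, sketched (field number 4 is an assumption):

```protobuf
message Blob {
  bytes data = 1;
  optional bytes signer = 2;
  bytes commitment = 3;
  bytes namespace = 4; // carried explicitly for consistency, per the comment above
}
```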

Proof proof = 2;
}

message RowRootProof {
@vgonkivs (Member), Jan 17, 2025:

As we discussed with @walldiss, he does not want to import data structures from different repos, so let's copy-paste it from tendermint.
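For reference, the tendermint Merkle proof that would be copied looks roughly like this (mirroring tendermint.crypto.Proof; treat this as a sketch, not the final definition):

```protobuf
message RowRootProof {
  int64 total = 1;          // total number of leaves in the tree
  int64 index = 2;          // index of the leaf being proven
  bytes leaf_hash = 3;      // hash of the leaf
  repeated bytes aunts = 4; // sibling hashes on the path to the root
}
```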

@ramin (Contributor) left a comment:

I hope my comments are welcome. I've added some inline thoughts, both from the perspective of a keen observer and of an API user. High level, it feels like this first pass is a literal "proto-fication" of the existing APIs vs a clean re-design considering dev ergonomics.

As a user:

  • I'd appreciate some aggressive design iterations on naming; there's lots of name stuttering inside files, services, and messages.
  • I think we could consider modularizing more, adding more files (i.e. blob/v1 etc.) so a developer building against node could compose/import/use these protos piecemeal; that will then make gradual expansion easier.
  • Some of the protos defined as *_service.proto should (I think) be just <name>.proto; they can then, for example, define a Service, thus reading cleaner, like shares.Service vs share_service.ShareService. Various types could then be included there, and even nested in sub-folder protos.
  • I think another sweep or two of usability considerations should be made to support end user/developer usage: i.e. namespace being a proper message, helpers, etc., so the proto messages can carry more complex types. We could also consider protoc-gen-validate to define the length and validation of these types in the canonical API, vs "needing to know" how to submit a good request.

that's all for now, thank you 😸


// Subscription operations
rpc SubscribeBlobs(BlobsSubscribeRequest) returns (stream BlobsSubscribeResponse);
rpc SubscribeAll(SubscribeAllRequest) returns (stream SubscribeAllResponse);
Contributor:

I think it'd be really clean to have a single rpc in the service here:

blobs.Subscribe(SubscriptionRequest)

That would make any config that differentiates "all" from a particular namespace or commitment act like a filter, i.e. in plain English: "I am subscribing to everything UNLESS I specify some options/filters in my subscription request configuration".
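A sketch of that filter-style shape (all names here are hypothetical):

```protobuf
service BlobService {
  // Subscribes to everything unless the request narrows it down.
  rpc Subscribe(SubscribeRequest) returns (stream SubscribeResponse);
}

message SubscribeRequest {
  repeated bytes namespaces = 1;  // empty = all namespaces
  repeated bytes commitments = 2; // empty = all commitments
}

message SubscribeResponse {
  uint64 height = 1;
  repeated Blob blobs = 2;
}
```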

rpc GetAll(GetAllBEFPRequest) returns (GetAllBEFPResponse);

// Subscribe to receive fraud proofs as they are discovered
rpc SubscribeBEFPs(SubscribeBEFPRequest) returns (stream SubscribeBEFPResponse);
Contributor:

Again, cleaner naming here would be to NOT repeat BEFP everywhere; it's already in the name of the service.

eg:

BadEncodingFraudProofService.Subscribe() doesn't need the stutter of BEFP again if the service's concern is BEFP already.
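i.e., something like (sketch; request and response types are placeholders):

```protobuf
service BadEncodingFraudProofService {
  rpc GetAll(GetAllRequest) returns (GetAllResponse);
  rpc Subscribe(SubscribeRequest) returns (stream SubscribeResponse);
}
```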

// Proof system supports two verification approaches:
// 1. Row roots (V0) - Uses row roots from Data Availability Header (DAH) for verification
// 2. Data root (V1) - Uses data root proofs directly, more efficient and doesn't require DAH
enum ProofType {
Contributor:

proofs.Type imo

enum ProofType {
PROOF_TYPE_UNSPECIFIED = 0;
// Data proofs
PROOF_TYPE_DATA_INCLUSION = 1;
Contributor:

DATA_INCLUSION

// Shares should be from the same row
repeated celestia.node.v1.types.Share shares = 1;
celestia.node.v1.proof.RowRootProof row_proof = 2; // NMT proof to the row root
}
Contributor:

Stylistic: I think I disagree with a types.proto; the types should maybe be defined where the rest of their usage (mostly) is.

Member:

I think it makes sense to move definitions to the places where they are used.

Contributor:

Agree with this (and this is how I see it done in other repos)

uint64 index = 3; // index of the first share of the blob in the EDS
}

message BlobWithProofV1 {
Contributor:

I don't love the developer ergonomics of V0 and V1 at the end of the messages. Could we consider modularizing more:

api/v1/blobs/v0/proofs.proto
api/v1/blobs/v1/proofs.proto

Then be deliberate about which gets imported and referenced where? i.e. I think we could then simplify the Request to take one of either version and validate and respond properly in the service, which would make the interface nice and clean.
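Sketched out, a version-neutral request could then look like this (the oneof layout and names are assumptions):

```protobuf
// The service validates whichever proof version the caller supplies.
message VerifyBlobProofRequest {
  oneof proof {
    blobs.v0.BlobWithProof v0 = 1;
    blobs.v1.BlobWithProof v1 = 2;
  }
}
```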


message GetBlobRequest {
uint64 height = 1;
bytes namespace = 2;
Contributor:

I've been thinking it might be nice to make namespace a distinct type, where the API could then support some nicer features, like submitting in raw text, decode/encode, etc. At present it feels very literal, as if designed by someone who knows the internals, vs a developer using the API.
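For example, a structured namespace type (the layout mirrors Celestia's version byte plus 28-byte ID, but the exact message is an assumption):

```protobuf
message Namespace {
  uint32 version = 1; // namespace version byte
  bytes id = 2;       // 28-byte namespace ID; protoc-gen-validate could enforce the length
}
```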

@cmwaters (Contributor) left a comment:

Great work. Just tried to do a first pass of everything.

A more general topic regarding the proof API: what do you think about prioritising the data-root-based proving system first, before releasing this API?



Comment on lines +19 to +20
// TBD: structure for commitment non-inclusion
PROOF_TYPE_COMMITMENT_NON_INCLUSION = 4;
Contributor:

Is this used anywhere?

Comment on lines +54 to +69
// ProofsService - verification service that supports both V0 and V1 approaches
service ProofsService {
// V0 verification of inclusion of shares. Requires row root to perform verification.
// Does not verify namespace rules of the provided shares and should be used only
// to verify inclusion of data spread over multiple namespaces
rpc VerifySharesV0(VerifySharesRequestV0) returns (VerifyProofResponse);
// Performs what VerifySharesV0 does with additional namespace rules verification on top.
rpc VerifyNamespaceV0(VerifyNamespaceRequestV0) returns (VerifyProofResponse);

// V1 verification
// VerifyDataProof verifies raw blob data using data root proof
rpc VerifyDataProof(VerifyDataRequest) returns (VerifyProofResponse);
// VerifyCommitmentProof verifies inclusion of blob, by using commitment and commitment data root proof.
// Provided proof type should be PROOF_TYPE_COMMITMENT
rpc VerifyCommitmentProof(VerifyCommitmentRequest) returns (VerifyProofResponse);
}
Contributor:

I would prefer separate v0 and v1 proof services (especially if we plan on deprecating v0 in the future)
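i.e., roughly (service names are assumptions; the rpcs are taken from the draft above):

```protobuf
service ProofServiceV0 {
  rpc VerifyShares(VerifySharesRequestV0) returns (VerifyProofResponse);
  rpc VerifyNamespace(VerifyNamespaceRequestV0) returns (VerifyProofResponse);
}

// V1 can then evolve (and V0 be deprecated) independently.
service ProofServiceV1 {
  rpc VerifyDataProof(VerifyDataRequest) returns (VerifyProofResponse);
  rpc VerifyCommitmentProof(VerifyCommitmentRequest) returns (VerifyProofResponse);
}
```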

rpc VerifyDataProof(VerifyDataRequest) returns (VerifyProofResponse);
// VerifyCommitmentProof verifies inclusion of blob, by using commitment and commitment data root proof.
// Provided proof type should be PROOF_TYPE_COMMITMENT
rpc VerifyCommitmentProof(VerifyCommitmentRequest) returns (VerifyProofResponse);
Contributor:

Semantic nit: I know the usage of Commitment is widespread already, but I find it both vague and misleading (to commit over data is to sign over it). To make it more understandable to users I would separate them into the two data structs they prove inclusion over:
  1. BlobInclusionProofs
  2. ShareInclusionProofs (I think the way you have this, this can also be a range)
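Applied to the rpcs above, the renaming might read (sketch; request types are placeholders):

```protobuf
service ProofsService {
  rpc VerifyShareInclusion(VerifyShareInclusionRequest) returns (VerifyProofResponse);
  rpc VerifyBlobInclusion(VerifyBlobInclusionRequest) returns (VerifyProofResponse);
}
```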

// ERROR_DATA_NOT_AVAILABLE is returned if data availability sampling guarantees were
// required for the request and failed. Keep in mind that data availability sampling
// will be ensured for all blocks in the sampling window up to the requested height
ERROR_DATA_NOT_AVAILABLE = 100;
Contributor:

Is this only returned by the SharesAvailable method?

// sampling window till the requested height
ERROR_DATA_NOT_AVAILABLE = 100;
// Fraud was detected in the chain before requested block
ERROR_FRAUD_DETECTED = 101;
Contributor:

What would cause this error? (i.e. more specifically)

ERROR_DATA_NOT_AVAILABLE = 100;
// Fraud was detected in the chain before requested block
ERROR_FRAUD_DETECTED = 101;
ERROR_TIMEOUT = 102;
Contributor:

Are all timeouts handled by gRPC itself, i.e. contexts in the case of Go? Do they need to be included here?

}

// ShareService - handles data share operations
service ShareService {
Contributor:

We have subscription options for both blob and fraud services, why not share services?

I can imagine most base rollups (or rollup full nodes in general) will subscribe to all data in one or more namespaces.
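A sketch of what such a subscription could look like (all names are hypothetical):

```protobuf
service ShareService {
  // Streams all shares landing in the given namespaces as new blocks arrive.
  rpc SubscribeNamespaceData(SubscribeNamespaceDataRequest)
      returns (stream SubscribeNamespaceDataResponse);
}

message SubscribeNamespaceDataRequest {
  repeated bytes namespaces = 1;
}

message SubscribeNamespaceDataResponse {
  uint64 height = 1;
  repeated celestia.node.v1.types.Share shares = 2;
}
```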

Comment on lines +11 to +16
message GetShareRequest {
uint64 height = 1;
uint32 row = 2;
uint32 col = 3;
optional celestia.node.v1.common.Options options = 4;
}
Contributor:

Will you also be able to reference a share based on an index (rather than row and col)? Some of the responses return the starting index of a blob.
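For illustration, an index-based lookup could sit alongside the (row, col) one (names and numbering are assumptions):

```protobuf
message GetShareByIndexRequest {
  uint64 height = 1;
  uint64 index = 2; // flat index into the square, e.g. row * width + col
  optional celestia.node.v1.common.Options options = 3;
}
```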

@walldiss walldiss self-assigned this Feb 7, 2025